I intend to develop a web based application, which uses SQL server 2005 at back end and Visual studio 2.0 as front end.
Application serves two functionalities
Requirement1: It carryout a search (In SQL server) for a particular name entered from front end .net application against a huge DataBase of size about 1 million records.
Scenario: The above requirement can be complemented by following example
Consider we have a bank database which has its existing customer DataBase having containing attributes like Name, Age, and Profession e.t.c.
Now if some new customer want to open a new account in bank, then bank officials want to know whether the
new customer is one of the existing customer or not(without asking to customer itself).
System should be able to detect the combination of name also i.e if we enter "Jhon" from front end .net interface
then application should be able generate all list of all customer having "Jhon" as part of their name at any location(firstname, middlename, lastname).
Requirement 2: If some time change is detected in bank's extisting customer's DataBase then each record of this DataBase is searched against a external dataBase(having almost 2 -3 million records).
Scenario: The above requirement can be complemented by following example
If new user is added to bank's existing customer database(database change) then this new updated database's every record is serarched against another bank's database.
I would like to hear experts voice for database design of such application for optimal performance,and types of searches I should look for application.
The bank example is not a good one, as the banks are always requesting a unique identifying attribute from the customers (like an id from their id card or their SSN). If you want to search through all the fields, you would have to implement something like fulltext searching / soundex functionality (as I assume that you did not wrote Jhon instead of John accidentially)
Requirement 2 is not a database issue, as the other bank database is normally not on the same server and is normally reached via a Web service through a service bus.
Jens K. Suessmeyer
http://www.sqlserver2005.de
Thanks Jeans for replying to my post.
Perhaps you got me otherwise.Let me now tell you exact sitiuation.
Requirement1: It carryout a search (In SQL server) for a particular name entered from front end .net application against a huge DataBase of size about 1 million records.
We consider exact sitiuation here,
We have a huge (about 1 million records) database of people involved in loan, credit card or any kind of fraud against banks.It is combined database for all banks in a country and each bank is contributing a list of defaulters from it's side to this combined database.
Now if some new customer comes to avail the services of a bank, then bank wants to ensure that new customer never appeared in this defaulter list before.
But one important searching criteria for matching against this defaulter's dabase is that we should able to carryout sounlike search i.e (let me explain with one example mentioned below)
If a person names "Mohammed Ali", then our application should be able to find out variation of names which sounds similar to orignal name of person, i.e
"Mohammad Ali"
"Mohamad Ali"
This requirement is expected from banks if new customer comes to bank with fake ID Proof or with forged documents to avail the services. In this case there will be no pre defined unique id for customer or any identifier and application will have to solely on name matching logic.
My concern is mostly associated with the performance of application, especially to the database design (for optimal performance), rather then logic of search.
|||There are a lot of factors to take into account - hardware architecture, I/O, memory, database structure, etc. database architecture also has to be considered - creating filegroups which will contain the database files, storing the database files in multiple drive spindles, creating the tables to be stored in filegroups so that searches can be performed by multiple drive splindles, etc.
|||Thanks bass,for your reply
But our main focus is on database design rather then Hardware configuration...Hardware is not a issue as we have plenty for our use.
I shall appriciate your efforts if you can suggest something for DB design or tuning of database.
|||
Although the soundex functionality is implemented in SQL Server there might be more sophisticated algorithms out there that might fit your need better than the SOUNDEX in SQL Server does. but to your point about the performance of the system: there should be no problem in searching even "large" (though 1M is not that large) database for the names passed by the application. My design suggestion would be to store each name normalalized in tables, generate the soundex words either in your frontend application or using a CLR function and query your table for the soundex terms. By the score of matching and found items you can choose to display the TOP N customers who match the names passed or who have a certain score (e.g. passing John Smith might bring back a long list of entries :-) ) If the tables gets even bigger you could decide to use table partitiioning to scale out your design.
Jens K. Suessmeyer
http://www.sqlserver2005.de
No comments:
Post a Comment