Data Quality (feed last updated 2024-02-08)<br />
This blog will be a series of posts based on my experiences in handling data quality issues in several industries.<br />
My intention for writing this blog is to share my experiences and learning with you and to get your feedback on these subjects.<br />
Currently the blog is running into multiple pages. Please read from the first post in this blog.<br />
Author: Tirthankar Ghosh (http://www.blogger.com/profile/14687942450731369225)<br />
Demographic to Psychographic – Paradigm shift in Data Quality (published 2013-06-18)<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="MsoNormal" style="margin: 0in 0in 10pt;">
<span style="font-family: 'Georgia','serif'; font-size: 10pt; line-height: 115%;">The relationship between Data Quality and Direct Marketing is not just long-standing but inseparable. With the flood of personal and behavioral information about consumers, Direct Marketing has shifted its focus from mass marketing to target marketing. This write-up explores how that change is reflected in Data Quality practice.</span><span style="font-family: 'Georgia','serif';"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin: 0in 0in 10pt;">
<span style="font-family: 'Georgia','serif'; font-size: 10pt; line-height: 115%;">Bulk Direct Marketing, often termed database marketing, has a Data Quality solution at its centre. This solution scrubs, augments and then de-duplicates a set of name and address records, so that the same offer is not sent to two different consumer profiles that point to the same real-life individual. However, demographic information alone often yields large clusters of consumers such as “All male consumers from Kolkata in the age group 18-25”. It is almost impossible to send targeted promotional offers to such a cluster and expect a high taker rate.<o:p></o:p></span></div>
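<div class="MsoNormal" style="margin: 0in 0in 10pt;">
<span style="font-family: 'Georgia','serif'; font-size: 10pt; line-height: 115%;">As a rough illustration of the de-duplication step described above, the sketch below groups records by a normalised name-and-address key. The field names, the normalisation rule and the sample records are assumptions made for illustration only; real Data Quality tools rely on much richer parsing and fuzzy matching.</span></div>

```python
import re
from collections import defaultdict


def match_key(record):
    """Build a crude match key: lower-case, drop punctuation, collapse spaces."""
    raw = f"{record['name']} {record['address']}"
    cleaned = re.sub(r"[^a-z0-9 ]", "", raw.lower())
    return " ".join(cleaned.split())


def deduplicate(records):
    """Keep the first record seen for each match key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return [group[0] for group in groups.values()]


# Hypothetical sample data: the first two rows describe the same person.
records = [
    {"name": "John Smith", "address": "12, Park Street, Kolkata"},
    {"name": "JOHN  SMITH", "address": "12 Park Street Kolkata"},
    {"name": "Peggy Smith", "address": "12, Park Street, Kolkata"},
]
unique = deduplicate(records)
```

<div class="MsoNormal" style="margin: 0in 0in 10pt;">
<span style="font-family: 'Georgia','serif'; font-size: 10pt; line-height: 115%;">Exact matching on a cleaned key is the simplest possible choice; it misses misspellings and abbreviations, which is where scrubbing and augmentation come in.</span></div>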
<div class="MsoNormal" style="margin: 0in 0in 10pt;">
<span style="font-family: 'Georgia','serif'; font-size: 10pt; line-height: 115%;">In order to define clusters of consumers likely to purchase certain types of products, the clustering technique must include variables based on past purchase history and other related psychographics, besides the traditional demographic parameters.<o:p></o:p></span></div>
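<div class="MsoNormal" style="margin: 0in 0in 10pt;">
<span style="font-family: 'Georgia','serif'; font-size: 10pt; line-height: 115%;">The effect of adding such a variable can be sketched with a few made-up records. This is plain grouping rather than a statistical clustering technique, and the field names (including the hypothetical psychographic field "interest") are assumptions for illustration.</span></div>

```python
from collections import defaultdict

# Hypothetical consumer records; 'interest' stands in for a psychographic variable.
consumers = [
    {"id": 1, "city": "Kolkata", "gender": "M", "interest": "cricket"},
    {"id": 2, "city": "Kolkata", "gender": "M", "interest": "music"},
    {"id": 3, "city": "Kolkata", "gender": "M", "interest": "cricket"},
    {"id": 4, "city": "Kolkata", "gender": "M", "interest": "music"},
]


def segment(records, key_fields):
    """Group consumer ids by the values of the chosen fields."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in key_fields)
        groups[key].append(record["id"])
    return dict(groups)


# Demographics alone give one large cluster ...
demographic = segment(consumers, ["city", "gender"])
# ... while adding the psychographic field splits it into smaller target groups.
combined = segment(consumers, ["city", "gender", "interest"])
```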
<div class="MsoNormal" style="margin: 0in 0in 10pt;">
<span style="font-family: 'Georgia','serif'; font-size: 10pt; line-height: 115%;">Where does big data fit into all this? Big data, besides being big in volume, can be broadly categorized into two groups: data collected from social networking channels such as Facebook, Twitter or LinkedIn, and data collected from devices such as card readers, GPS tools etc. However, both groups share at least one characteristic: they contain a huge amount of personal and often psychographic information that can be extracted, parsed and mined.<o:p></o:p></span></div>
<div class="MsoNormal" style="margin: 0in 0in 10pt;">
<span style="font-family: 'Georgia','serif'; font-size: 10pt; line-height: 115%;">Direct Marketing today is in a position to use this vast collection of personal and psychographic information to cluster consumers into smaller and more effective target groups.<o:p></o:p></span></div>
<div class="MsoNormal" style="margin: 0in 0in 10pt;">
<span style="font-family: 'Georgia','serif'; font-size: 10pt; line-height: 115%;">Let us have a closer look into the nature of the psychographic parameters used in this kind of clustering.<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpFirst" style="margin: 0in 0in 0pt 0.5in; mso-list: l0 level1 lfo1; text-indent: -0.25in;">
<span style="font-family: 'Georgia','serif'; font-size: 10pt; line-height: 115%; mso-bidi-font-family: Georgia; mso-fareast-font-family: Georgia;"><span style="mso-list: Ignore;">1.<span style="font: 7pt 'Times New Roman';"> </span></span></span><span style="font-family: 'Georgia','serif'; font-size: 10pt; line-height: 115%;">Social Class<br /><a href="http://www.blogger.com/null" name="top"><span style="color: black;">It is the single most used variable for research purposes. It divides the population into groups based on the occupation of the 'Chief Income Earner' (CIE), and as such can be seen as a socio-economic scale. </span></a><o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="margin: 0in 0in 0pt 0.5in; mso-list: l0 level1 lfo1; text-indent: -0.25in;">
<span style="font-family: 'Georgia','serif'; font-size: 10pt; line-height: 115%; mso-bidi-font-family: Georgia; mso-fareast-font-family: Georgia;"><span style="mso-list: Ignore;">2.<span style="font: 7pt 'Times New Roman';"> </span></span></span><span style="font-family: 'Georgia','serif'; font-size: 10pt; line-height: 115%;">Lifestyle<br />This involves classifying people according to their values, beliefs, opinions and interests. There is no single standardized lifestyle segmentation model. Instead, market research firms and advertising agencies are constantly devising new categories that help target possible consumers of their clients' products.<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpLast" style="margin: 0in 0in 10pt 0.5in; mso-list: l0 level1 lfo1; text-indent: -0.25in;">
<span style="font-family: 'Georgia','serif'; font-size: 10pt; line-height: 115%; mso-bidi-font-family: Georgia; mso-fareast-font-family: Georgia;"><span style="mso-list: Ignore;">3.<span style="font: 7pt 'Times New Roman';"> </span></span></span><span style="font-family: 'Georgia','serif'; font-size: 10pt; line-height: 115%;">Behavioral<br />Parameters of this kind divide the market into groups based on consumers' knowledge of, attitudes towards, uses of and responses to the products they use.<o:p></o:p></span></div>
<div class="MsoNormal" style="margin: 0in 0in 10pt;">
<span style="font-family: 'Georgia','serif'; font-size: 10pt; line-height: 115%;">Keeping this trend in mind, we can see that the shift of focus in Direct Marketing must have a significant impact on Data Quality practice itself. Indeed, Data Quality solutions now consider this external third-party data for augmentation and then for de-duplication. With this additional information, the complexity of Data Quality work has grown manifold, making the journey challenging and all the more exciting.<o:p></o:p></span></div>
</div>
Genderization Issues in India (published 2011-10-31, updated 2020-05-15)<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="MsoNormal">
Deriving gender from name is important for two reasons.</div>
<div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;">
1.<span style="font-family: "times new roman";"> </span>For database marketing, using gender information in addressing the offer letter is crucial.<br />
For example, we can address “John Smith” as “Dear Mr. Smith” and “Peggy Smith” as “Dear Ms. Smith”.</div>
<div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;">
2.<span style="font-family: "times new roman";"> </span>Gender code can improve matching by restricting false positives.</div>
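<div class="MsoNormal">
The second point can be sketched as follows. The record layout and the rule that ‘U’ (unknown) is compatible with any gender are assumptions made for illustration, not a description of any particular matching tool.</div>

```python
def is_candidate_match(a, b):
    """Allow a match only when name and address agree and genders do not conflict."""
    if a["last_name"] != b["last_name"] or a["address"] != b["address"]:
        return False
    genders = {a["gender"], b["gender"]}
    # 'U' (unknown) is treated as compatible with anything; 'M' vs 'F' is a conflict.
    return not {"M", "F"} <= genders


# Hypothetical records sharing an initial, a last name and an address.
rec_m = {"first_name": "K", "last_name": "SEN", "address": "12 PARK STREET", "gender": "M"}
rec_f = {"first_name": "K", "last_name": "SEN", "address": "12 PARK STREET", "gender": "F"}
rec_u = {"first_name": "K", "last_name": "SEN", "address": "12 PARK STREET", "gender": "U"}

blocked = is_candidate_match(rec_m, rec_f)   # conflicting gender codes
allowed = is_candidate_match(rec_m, rec_u)   # unknown gender does not block
```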
<div class="MsoNormal">
Usually Genderization is done using the name components. We will discuss this process briefly for Anglo-Saxon names before moving on to the various issues of Genderization in the Indian context.</div>
<div class="MsoNormal">
Typical name components are: Salutation/Title, First Name, Middle Name, Last Name and Name Suffix.<br />
Among these, a salutation or title can determine gender uniquely. For example, values like ‘Mr.’ and ‘Mrs.’ are very helpful for gender determination. But the field may hold values like ‘Prof.’ or ‘Dr.’, which give no gender information, or it may be blank. In such cases, we check the first name. Typically a first name like ‘Robert’ corresponds to a male. Sometimes a first name cannot determine the gender uniquely. Then we check whether the middle name can uniquely determine gender. Usually the last name component is not used to determine gender. But name suffixes are certainly helpful: suffixes like ‘Sr.’ and ‘Jr.’ point to the male gender.</div>
<div class="MsoNormal">
Using the above logic, in most cases we follow these steps:</div>
<div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;">
1.<span style="font-family: "times new roman";"> </span>Determine gender from title (or salutation), if possible.</div>
<div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;">
2.<span style="font-family: "times new roman";"> </span>If gender code is blank, check the suffix and assign a gender code, if possible</div>
<div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;">
3.<span style="font-family: "times new roman";"> </span>If gender code is blank then check the first name if gender code can be derived</div>
<div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;">
4.<span style="font-family: "times new roman";"> </span>If gender code is still blank then check the middle name if gender code can be derived</div>
<div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;">
5.<span style="font-family: "times new roman";"> </span>If gender code is still blank, set it to ‘U’</div>
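<div class="MsoNormal">
The five steps above can be sketched in Python as shown below. The lookup tables are tiny hypothetical stand-ins for the much larger tables a real tool would use, and the function and parameter names are assumptions made for illustration.</div>

```python
# Hypothetical lookup tables; a real implementation would load far larger ones.
TITLE_GENDER = {"MR": "M", "MRS": "F", "MS": "F", "MISS": "F"}
SUFFIX_GENDER = {"SR": "M", "JR": "M"}
GIVEN_NAME_GENDER = {"ROBERT": "M", "JOHN": "M", "PEGGY": "F"}


def _lookup(value, table):
    """Normalise a name component and look it up; return None if not found."""
    key = (value or "").strip().rstrip(".").upper()
    return table.get(key)


def derive_gender(title=None, first=None, middle=None, suffix=None):
    """Apply the five steps in order and return 'M', 'F' or 'U'."""
    checks = [
        (title, TITLE_GENDER),        # step 1: title or salutation
        (suffix, SUFFIX_GENDER),      # step 2: name suffix
        (first, GIVEN_NAME_GENDER),   # step 3: first name
        (middle, GIVEN_NAME_GENDER),  # step 4: middle name
    ]
    for value, table in checks:
        gender = _lookup(value, table)
        if gender:
            return gender
    return "U"                        # step 5: still blank, so unknown
```

<div class="MsoNormal">
For instance, derive_gender(title="Dr.", first="Robert") falls through the title check and is resolved by the first name, while a name that appears in none of the tables comes back as ‘U’.</div>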
<div class="MsoNormal">
Above is the outline of the Genderization process for a typical Anglo-Saxon name. Now we will see how the above logic can be modified for tackling Indian names.</div>
<div class="MsoNormal">
We will first look at the challenges in the Indian naming system so that deriving the gender code becomes less complicated.</div>
<div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;">
1.<span style="font-family: "times new roman";"> </span>Middle Names should not be evaluated for genderization except under rules 6 and 7 below.<br />
This is because people in various parts of the country give their father’s (or husband’s, in the case of a married woman) first name as the middle name.<br />
Therefore, in a name like ARUNA PRASHANT IYER, PRASHANT could be the first name of ARUNA’s father or husband (ARUNA is a female name).</div>
<div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;">
2.<span style="font-family: "times new roman";"> </span>Sometimes the first name (remember, we derive the first name after parsing) leads to the wrong gender code. In such cases, the first name should be combined with the middle name (or the initial part of the middle name) to derive the gender code. Consider the name DEBIKA RANJAN SEN. Our parsing rule will classify DEBIKA as the first name, RANJAN as the middle name and SEN as the last name. Note that in the original Indian language the given name is DEBIKARANJAN, which points to the gender code ‘M’. But DEBIKA on its own is a female name, so the gender code derived from the first name alone would be ‘F’, which is incorrect.</div>
<div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;">
3.<span style="font-family: "times new roman";"> </span>Last names might come in handy in a few cases. Unlike with Anglo-Saxon names, last names like BIBI, BEGUM, DEBI, KAUR, KHATUN, SULTANA etc. indicate a female name.</div>
<div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;">
4.<span style="font-family: "times new roman";"> </span>Name Suffix is rarely used in India. </div>
<div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;">
5.<span style="font-family: "times new roman";"> </span>Presence of words like MOHD. (or any variation of this), KAZI, HAJI, SAYED etc. anywhere in the name indicates a male name.</div>
<div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;">
6.<span style="font-family: "times new roman";"> </span>If the first name ends with BHAI (or if the first word of the middle name is BHAI), it is a male name. <br />
Consider the name DADANBHAI KADVE. Here the first name ends with BHAI, so it is likely to be a male name. This name could also be written as DADAN BHAI KADVE. In this case, the entire middle name is BHAI, so the gender code derived from the middle name is ‘M’. Another name could be DADAN BHAI NIRMAL BHAI KADVE. Our parsing rule will store DADAN as the first name, BHAI NIRMAL BHAI as the middle name and KADVE as the last name; the first word of the middle name is BHAI, so the gender code is again ‘M’.</div>
<div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;">
7.<span style="font-family: "times new roman";"> </span>If the first name ends with BEN (or if the first word of the middle name is BEN), it is a female name. <br />
Look at the name SMITABEN V SOLANKI. In this case, the first name ends with BEN and consequently it is a female name.</div>
<div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;">
8.<span style="font-family: "times new roman";"> </span>There are some Indian first names that can be used by males as well as females. Examples of such names are KAMAL, SUMAN etc.</div>
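<div class="MsoNormal">
A minimal sketch of how these rules might be combined is given below. The order in which the rules are applied is an assumption (the list above does not fix a precedence), and the lookup tables hold only the example names from this post plus a few hypothetical entries.</div>

```python
FEMALE_LAST_NAMES = {"BIBI", "BEGUM", "DEBI", "KAUR", "KHATUN", "SULTANA"}
MALE_MARKERS = {"MOHD", "KAZI", "HAJI", "SAYED"}
AMBIGUOUS_FIRST_NAMES = {"KAMAL", "SUMAN"}
# Hypothetical lookup tables for illustration only.
COMBINED_NAME_GENDER = {"DEBIKARANJAN": "M"}
FIRST_NAME_GENDER = {"ARUNA": "F", "DEBIKA": "F", "PRASHANT": "M"}


def _words(text):
    """Upper-case a name component, drop periods and split it into words."""
    return text.upper().replace(".", " ").split()


def derive_gender_indian(first, middle="", last=""):
    """Return 'M', 'F' or 'U' for an already parsed Indian name."""
    first_words, middle_words, last_words = _words(first), _words(middle), _words(last)
    first_name = first_words[0] if first_words else ""
    first_middle = middle_words[0] if middle_words else ""

    # Rule 5: a male marker anywhere in the name.
    if MALE_MARKERS & set(first_words + middle_words + last_words):
        return "M"
    # Rules 6 and 7: BHAI / BEN ending the first name or leading the middle name.
    if first_name.endswith("BHAI") or first_middle == "BHAI":
        return "M"
    if first_name.endswith("BEN") or first_middle == "BEN":
        return "F"
    # Rule 3: last names that indicate a female name.
    if FEMALE_LAST_NAMES & set(last_words):
        return "F"
    # Rule 2: try the first name joined with the first word of the middle name.
    combined = first_name + first_middle
    if first_middle and combined in COMBINED_NAME_GENDER:
        return COMBINED_NAME_GENDER[combined]
    # Rule 8: first names used by both genders stay unknown.
    if first_name in AMBIGUOUS_FIRST_NAMES:
        return "U"
    # Rule 1: otherwise use the first name only, never the middle name.
    return FIRST_NAME_GENDER.get(first_name, "U")
```

<div class="MsoNormal">
With these tables, ARUNA PRASHANT IYER resolves to ‘F’ from the first name alone, while DEBIKA RANJAN SEN resolves to ‘M’ through the combined name. A plain suffix test such as the one for BEN will also fire on unrelated names that happen to end in the same letters, so a production rule set would need exception lists.</div>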
</div>
Discovery Phases (published 2011-09-18, updated 2020-05-15)<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<div class="MsoNormal">
Perhaps the most critical phase of any data quality implementation is “Data Discovery”, where we study the sample data collected from the site with the following goals:</div>
<div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;">
1.<span style="font: 7pt "Times New Roman";"> </span>Enrich metadata repository specific to the sample data</div>
<div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;">
2.<span style="font: 7pt "Times New Roman";"> </span>Profile the sample data to gain an insight with respect to the semantics of the data</div>
<div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;">
3.<span style="font: 7pt "Times New Roman";"> </span>Come up with the set of Data Quality rules for handling the sample data through the steps to be followed during the actual implementation</div>
<div class="MsoNormal">
In the title of this post I deliberately used the term “Phases”. This indicates that there is more than one such discovery phase in practice. Besides the “Data Discovery” phase that we carry out for each implementation, we also conduct the “Market Discovery” phase when we start Data Quality related practices in a new market (i.e. country/region). “Market Discovery” is usually carried out by Data Quality product development companies, while “Data Discovery” is carried out by the team responsible for data quality implementations.</div>
<div class="MsoNormal">
I find “Market Discovery” very fascinating, since you have almost nothing to start with. But let me talk about “Data Discovery” first, as this phase is encountered more frequently. We start with the metadata repository that we prepared during “Market Discovery” and enriched during previous “Data Discovery” and implementation activities.<br />
Let me list the things that we have at the start of the “Data Discovery” phase.</div>
<div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;">
1.<span style="font: 7pt "Times New Roman";"> </span>Data Quality tool</div>
<div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;">
2.<span style="font: 7pt "Times New Roman";"> </span>Metadata Repository including:</div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 1in; text-indent: -0.25in;">
a.<span style="font: 7pt "Times New Roman";"> </span>Master Lookup Tables such as: Given Name, Last Name, Street Type etc. </div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 1in; text-indent: -0.25in;">
b.<span style="font: 7pt "Times New Roman";"> </span>Supporting Lookup Tables such as Phonetic Sounds</div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 1in; text-indent: -0.25in;">
c.<span style="font: 7pt "Times New Roman";"> </span>Lookup Tables for parsing</div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 1in; text-indent: -0.25in;">
d.<span style="font: 7pt "Times New Roman";"> </span>Basic rules for initial cleanup</div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 1in; text-indent: -0.25in;">
e.<span style="font: 7pt "Times New Roman";"> </span>Understanding of the address correction processes for the underlying market </div>
<div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;">
3.<span style="font: 7pt "Times New Roman";"> </span>Sample Data from the site</div>
<div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;">
<br /></div>
<div class="MsoNormal">
The process of “Data Discovery” cannot be fully specified in advance, as it depends on the exact situation, but it has to include the following:</div>
<div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;">
1.<span style="font: 7pt "Times New Roman";"> </span>The entire sample data needs to be profiled. This will bring up many data quality issues in the sample data that need to be handled. In case there are multiple source systems, profiling should be carried out separately for each system.</div>
<div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;">
2.<span style="font: 7pt "Times New Roman";"> </span>After the data profiling, workflows should be set up in the data quality tool and samples from all the source systems need to be processed as per the requirement. Here, manual review of the intermediate results after every step in the workflow is necessary.</div>
<div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;">
3.<span style="font: 7pt "Times New Roman";"> </span>While step 2 is in progress, discussions with the business users must be carried out to finalize address correction formalities and incorporate the corresponding process in the workflow.</div>
<div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;">
4.<span style="font: 7pt "Times New Roman";"> </span>At the end of DQ processes, present the results/reports to the business users and get their feedback. Incorporate the feedback in the solution and re-generate the reports.<br />
Keep the following points in mind:</div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 1in; text-indent: -0.25in;">
a.<span style="font: 7pt "Times New Roman";"> </span>This is an iterative step</div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 1in; text-indent: -0.25in;">
b.<span style="font: 7pt "Times New Roman";"> </span>You may have to make the business users aware of various Data Quality related concepts including the context sensitiveness of matching (Refer to my earlier post on this topic in July 2011)</div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 1in; text-indent: -0.25in;">
c.<span style="font: 7pt "Times New Roman";"> </span>Discuss with the client the use of external lists (such as postal tables or telephone directories) for enrichment/augmentation of the address information.</div>
<div class="MsoNormal" style="margin-left: 0.25in;">
At the end of “Data Discovery” you will have updated all the initial data knowledge you had earlier. But be prepared to fine-tune the settings and the lookup tables during the implementation. If the sample is not representative, you may be in for surprises. It is always better practice to start with two independent samples. Use the first sample to arrive at the optimum settings, then apply them to the second sample and see what kind of gaps you get.</div>
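<div class="MsoNormal">
The two-sample check can be sketched in a few lines of Python. This is only an illustration: the function names <code>build_lookup</code> and <code>coverage_gaps</code> and the sample names are hypothetical, and a real lookup table would come from the metadata repository rather than from a plain set of words.</div>

```python
from collections import Counter


def build_lookup(names):
    """Collect the distinct, upper-cased name words seen in a sample."""
    return {word for name in names for word in name.upper().split()}


def coverage_gaps(lookup, names):
    """Return the share of known words and a count of the unknown ones."""
    gaps = Counter()
    total = 0
    for name in names:
        for word in name.upper().split():
            total += 1
            if word not in lookup:
                gaps[word] += 1
    covered = total - sum(gaps.values())
    return (covered / total if total else 1.0), gaps


# Hypothetical samples; real ones would come from the source systems.
sample_one = ["Ashim Biswas", "Rina Biswas", "Amit Kumar Roy"]
sample_two = ["Amit Biswas", "Sunita Roy", "Rina Ghosh"]

lookup = build_lookup(sample_one)          # settings tuned on sample one
rate, gaps = coverage_gaps(lookup, sample_two)  # gaps seen on sample two
print(round(rate, 2), sorted(gaps))
```

<div class="MsoNormal">
The words that turn up in <code>gaps</code> are the candidates for enriching the lookup tables before the actual implementation starts.</div>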
<div class="MsoNormal" style="margin-left: 0.25in;">
<br /></div>
<div class="MsoNormal" style="margin-left: 0.25in;">
Now let us talk about “Market Discovery”. It is often said that the discipline of data quality is a mix of art and science. The art in data quality seems to be the dominant part during the “Market Discovery” phase. The goals of “Market Discovery” are basically to identify the conventions and nuances in names (including SME and Corporate names) and addresses, besides building up the vocabulary and the associated rules. Let me briefly discuss the issue with respect to names:</div>
<div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;">
1.<span style="font: 7pt "Times New Roman";"> </span>Find out the possible components of a name. Typical components could be First Name, Middle Name, Last Name, Prefixes and Suffixes. But depending on the traditions and conventions of the market, you may have to include other fields like a second Last Name field and/or a Last Name Prefix and/or a Job Title field etc.</div>
<div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;">
2.<span style="font: 7pt "Times New Roman";"> </span>For each of these fields, you need to find the vocabulary which will serve as the initial set of Lookup Tables.</div>
<div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;">
3.<span style="font: 7pt "Times New Roman";"> </span>The next step is to figure out the standard naming conventions. Usually, names are written like Title/Salutation + First Name + Middle Name(s) + Last Name + Suffix. But such conventions vary from country to country. For example, people usually write the Last Name before the First Name in Japan. You may have some sample data to carry out the research. It is better to seek the help of a local expert to understand the nuances. Such research may include consulting books and other publications.</div>
<div class="MsoNormal">
Before carrying out this research, you may have to ensure that the data quality tool can handle double-byte (DBCS) or multi-byte (MBCS) character sets, if applicable.<br />
<br />
If you will be using a distance-function-based comparison for record linkage, in which the relative weight of a character mismatch depends on the position of the character in the string, you also need to know the writing convention (left to right or otherwise) in the region.</div>
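<div class="MsoNormal">
To make the idea of a position-dependent weight concrete, here is a minimal Python sketch. It is not the distance function of any particular tool: the function name <code>weighted_mismatch</code>, the geometric <code>decay</code> and the <code>from_start</code> switch are all assumptions chosen for illustration.</div>

```python
def weighted_mismatch(a, b, decay=0.5, from_start=True):
    """Position-weighted mismatch score between two strings.

    A mismatch at the most significant end costs 1.0 and each following
    position costs `decay` times the previous one. Pass from_start=False
    when the significant end of the stored string is its last character.
    """
    if not from_start:
        a, b = a[::-1], b[::-1]
    length = max(len(a), len(b))
    a, b = a.ljust(length), b.ljust(length)  # pad the shorter string
    score = 0.0
    weight = 1.0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            score += weight
        weight *= decay
    return score


print(weighted_mismatch("GHOSH", "BHOSH"))  # mismatch in the first position
print(weighted_mismatch("GHOSH", "GHOSE"))  # mismatch in the last position
```

<div class="MsoNormal">
With these settings a mismatch in the first character is penalized far more than one in the last character, which is exactly why the direction in which the string is read has to be known before the weights are fixed.</div>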
<div class="MsoNormal">
Address validation/augmentation is another important thing to consider. We need to figure out the various possible ways of performing it. The kind of postal tables available for the country, whether there is any connection between the telephone numbering system and the state (or city etc.), and whether address correction tables are available must all be looked into and documented.</div>
<div class="MsoNormal">
Another important activity to be carried out in this phase is to find the scope of standardization. This is the phase where the fields that need to be standardized must be identified and the associated vocabulary lists should be built. A related concept is the use of nicknames and aliases.</div>
<div class="MsoNormal">
“Phonetic Variation” depends on the culture and history of the underlying market and must be looked into during this phase. If the native language of the market is not the official language for communication then issues related to “Phonetic Variation” will be rampant. It is important not just to capture a few such examples but to understand if there is a pattern of such variations.</div>
<div class="MsoNormal">
<br /></div>
</div>
Tirthankar Ghoshhttp://www.blogger.com/profile/14687942450731369225noreply@blogger.com0tag:blogger.com,1999:blog-7991052372783853348.post-51810338712317980892011-09-07T13:20:00.007+05:302011-11-04T10:34:24.749+05:30Addresses in India<div class="MsoNormal">Addresses in India are so confusing, and their patterns vary so much from region to region, that address matching/parsing/enrichment becomes very challenging. It is really difficult to write an article on this. Someone who has done years of research into the subject would need an entire volume to present the results.<br />
However, the many issues that I have faced during various data quality centric implementations prompt me to write something on it.</div><div class="MsoNormal">In this post, I am going to discuss a few complicated patterns and some of their numerous exceptions.</div><div class="MsoNormal">According to the standard addressing convention, a typical street address has three components: a street number, followed by a street name, followed by a street type (there are places where the street type appears before the street name). I will begin my discussion with this convention. Yes, in many cities in India this is the standard addressing convention.</div><div class="MsoNormal"><br />
</div><div class="MsoNormal"><u>Ariff Road</u></div><div class="MsoNormal"><u><br />
</u>A close friend of mine lives at “27/2 Ariff Road”, which is not very far from my home. When I last visited her, out of curiosity, I took a tour of the entire Ariff Road and looked at the addresses written on the surrounding houses and shops.</div><div class="MsoNormal">By the way, Ariff Road is a relatively narrow lane in North Kolkata. A few lanes and by-lanes branch off from Ariff Road. Interestingly, most of these are called Ariff Road too. At least, the addresses on the houses in these lanes and by-lanes bear the Ariff Road name. I was rather surprised to see the numbers in these addresses. They were not just unordered but totally chaotic. The house opposite “27/2 Ariff Road” was “1H/2A Ariff Road”. I came across another house with the address “12/7/A/1 Ariff Road”.</div><div class="MsoNormal">We find such addresses in many areas of North Kolkata. We have to refer to the history of the city to find out how such addresses came into existence. It is not a planned city; it was formed by the British in the late 17<sup>th</sup> century, after the agent J. Charnock purchased three villages from the local landlord Sabarna Chowdhury. Slowly but steadily, the city started to grow as the British desired, without any master plan.<br />
So when the postal system was put in place, the numbers were assigned in some order. But subsequently new houses were built, and house-holds/families split, requiring separate addresses. This second observation is crucial here. As in many places, the joint family (extended family) system prevailed in this part of the country too. Obviously such families required large houses. But with the passage of time this system changed: some of the family members moved out, while others continued to live under the same roof but built separate dwelling units. Some of them rented out a portion of their premises. A significant number of these old houses have now been sold to real estate developers and promoters who are building multi-storied apartments. And the entire system is becoming complicated.</div><div class="MsoNormal"><br />
</div><div class="MsoNormal"><u>Main Road & Cross Road</u></div><div class="MsoNormal"><u><br />
</u>There are a few places (Bangalore, or Bengaluru, is one of them) where the entire area is divided by main roads running in one direction and cross roads running perpendicular to them. An address in such a place is described by the nearest main road and cross road, besides the house name/number.</div><div class="MsoNormal"><br />
</div><div class="MsoNormal"><u>Plot/Block/Sector</u></div><div class="MsoNormal"><u><br />
</u>Marking an address by these (or some of these) is found in the planned cities in the country. Addresses in Chandigarh (the capital of two neighboring states as well as a union territory itself) contain sector information. <br />
In the Salt Lake area (a suburb of Kolkata), the entire region is divided into sectors. There are blocks in each sector and plots in each block. So a typical address here looks like:<br />
“Plot Y 14, Block – EP, Sector 5 Salt Lake”</div><div class="MsoNormal"><br />
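<div class="MsoNormal">
For illustration, an address in this style can be split into its components with a regular expression. The Python sketch below is only one possible approach: the pattern <code>PLOT_BLOCK_SECTOR</code>, the fixed keyword order and the optional “Plot” prefix are my assumptions, and real data will need many more variants.</div>

```python
import re

# Hypothetical pattern for the Salt Lake style shown above. It assumes the
# order plot, block, sector and treats the "Plot" keyword as optional.
PLOT_BLOCK_SECTOR = re.compile(
    r"(?:Plot\s+)?([A-Z]*\s*\d+)\W+Block\W+([A-Z]+)\W+Sector\W+(\d+|[IVX]+)",
    re.IGNORECASE,
)


def parse_plot_block_sector(address):
    """Return the plot, block and sector as a dict, or None if not found."""
    match = PLOT_BLOCK_SECTOR.search(address)
    if match is None:
        return None
    plot, block, sector = match.groups()
    return {
        "plot": " ".join(plot.split()).upper(),
        "block": block.upper(),
        "sector": sector.upper(),
    }


print(parse_plot_block_sector("Plot Y 14, Block – EP, Sector 5 Salt Lake"))
print(parse_plot_block_sector("27/2 Ariff Road"))
```

<div class="MsoNormal">
The second call returns <code>None</code>, which is the signal to fall back to another parsing routine for addresses that do not follow the plot/block/sector pattern.</div>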
</div><div class="MsoNormal"><u>Laxminarayan Jewelers - different entities with similar names/addresses</u> </div><div class="MsoNormal"><br />
A few months back, I took a tour of the city of Kolkata. My intention was to observe the addresses on the houses I came across. In one area, I saw a number of shops with the same name <br />
“Laxminarayan Jewelers”. Sometimes I noticed a little variation, “Laxminarayan & Sons”. These shops were located in the basement of a huge building. <br />
It took me a few weeks to find out the history behind this. Someone called “Laxminarayan” established a shop many years ago. Later his sons separated and started their own businesses under the same roof but as different entities. They all bore matching names and addresses (the addresses differed only by a number, like UNO 1 55 XYZ Road, UNO 2 55 XYZ Road etc.)</div><div class="MsoNormal"><br />
</div><div class="MsoNormal"><u>“Diagonally Opposite to” -a land of landmarks</u></div><div class="MsoNormal"><br />
Last month I took a new telephone connection. In the process, an executive from the telecom company called me to verify/cross-check the address that I had provided in the application. She repeatedly asked for a landmark near my house. <br />
Profiling addresses in India reveals rampant use of landmark information. Some of the identifiers for landmarks are “Near”, “Opposite to”, “Behind”, “Beside”, “Next to” and, not to forget, “Diagonally Opposite to”.</div><div class="MsoNormal"><u>Adjacent houses or apartments</u><br />
<u><br />
</u><br />
Look at this address - Office Space 2 & 5, Paramount Complex, Navelim, Goa – 403707. It is a commercial address that points to a shop. This shop, however, is spread across two shopping units on the same floor of a shopping complex.</div><div class="MsoNormal">Also, one of my friends has this address: Apt 5C & 5D, 12 Mandevilla Gardens, Kolkata – 19<br />
<br />
This poses a serious challenge for address parsing, as we need multiple fields to hold similar information, such as two fields for apartment number, two fields for street number etc.</div><div class="MsoNormal"><br />
</div><div class="MsoNormal"><u>Personal Names in addresses</u> </div><div class="MsoNormal"><br />
Many Indian addresses, especially the ones from rural areas, begin with a personal name. Usually the name of the head of the family is mentioned in the address. Many times, postmen in these areas know people by name, and in case the letter is addressed to someone else in the family who is not known to the postman, the letter gets delayed. This is the primary reason for using the name of the head of the family in the address. The most popular keyword to identify such names is C/O or “care of”.<br />
There are other variations of C/O where the exact relation is mentioned, like “S/O or son of”, “D/O or daughter of”, “M/O or mother of” etc.</div><div class="MsoNormal">A typical address in this format looks like:</div><div class="MsoNormal">“C/O Ashim Biswas, 33 Govinda Naskar Lane, Sriharipara”</div><div class="MsoNormal"><br />
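<div class="MsoNormal">
A leading personal name of this kind can be separated from the rest of the address before parsing. The Python sketch below is a simplified illustration: the function name <code>split_personal_name</code> is hypothetical, and it assumes that the keyword comes first and that the name ends at the first comma, which will not always hold in real data.</div>

```python
import re

# Keywords taken from the examples above, mapped to their expansions.
RELATION_KEYWORDS = {
    "C/O": "care of",
    "S/O": "son of",
    "D/O": "daughter of",
    "M/O": "mother of",
}

# Assumption: the keyword is the first token and the name ends at a comma.
PREFIX = re.compile(r"^\s*([CSDM]/O)\s+([^,]+),\s*(.*)$", re.IGNORECASE)


def split_personal_name(address):
    """Return (relation, name, remaining address) for a C/O style address."""
    match = PREFIX.match(address)
    if match is None:
        return None, None, address
    keyword, name, rest = match.groups()
    return RELATION_KEYWORDS[keyword.upper()], name.strip(), rest


print(split_personal_name("C/O Ashim Biswas, 33 Govinda Naskar Lane, Sriharipara"))
```

<div class="MsoNormal">
The remaining part of the address can then go through the regular street address parsing, while the extracted name can be kept in a separate field.</div>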
</div><div class="MsoNormal"><u>Addresses in Goa</u></div><div class="MsoNormal"><br />
Goa is a famous tourist spot in India. There is another equally interesting fact surrounding this place. India was dominated by the British for over two hundred years. Goa, on the other hand, was dominated by the Portuguese for over four hundred years. India got its independence in 1947, but it took Operation “Vijay”, carried out by the Indian army in 1961, to liberate Goa.<br />
Names here, including personal names as well as the names of places, buildings and roads, sometimes follow the Portuguese style.<br />
Here one can find a street named “18<sup>th</sup> June Road”.<br />
<br />
A debate is now going on about renaming these streets and buildings!</div><div class="MsoNormal"><br />
</div><div class="MsoNormal"><u>Roman digits</u></div><div class="MsoNormal"><br />
Roman numerals are used abundantly in Indian addresses. Consider the address: <br />
“C-1 295/296 Rohini Sector-11 Near Bay Japanese Park Back Of welcome Hotel”. In addresses like this, Sector-11 is sometimes written as Sector – XI (or Sec XI). Sector numbers in Indian addresses are often written using Roman numerals.</div><div class="MsoNormal"><br />
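<div class="MsoNormal">
One way to make such variants comparable is to convert the Roman numeral to digits during standardization. The Python sketch below is an assumption on my part about how this could be done: the names <code>roman_to_int</code> and <code>normalize_sector</code> are hypothetical, only numerals built from I, V, X and L are handled, and the output form “Sector-11” is just one possible standard.</div>

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50}

# Matches "Sector" or "Sec", optional separators, then a Roman numeral.
SECTOR_ROMAN = re.compile(r"\b(Sector|Sec)\b[\s.–-]*([IVXL]+)\b", re.IGNORECASE)


def roman_to_int(numeral):
    """Convert a Roman numeral such as XI or IV to an integer."""
    total = 0
    previous = 0
    for symbol in reversed(numeral.upper()):
        value = ROMAN_VALUES[symbol]
        if previous > value:
            total -= value      # a smaller symbol before a larger one
        else:
            total += value
            previous = value
    return total


def normalize_sector(address):
    """Rewrite 'Sector – XI' or 'Sec XI' as 'Sector-11'."""
    return SECTOR_ROMAN.sub(
        lambda m: "Sector-" + str(roman_to_int(m.group(2))), address
    )


print(normalize_sector("C-1 295/296 Rohini Sector – XI Near Bay Japanese Park"))
print(normalize_sector("C-1 295/296 Rohini Sec XI"))
```

<div class="MsoNormal">
Addresses that already use digits, such as “Sector-11”, pass through unchanged, so the routine can be applied to every record.</div>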
</div><div class="MsoNormal"><u>New and Old</u> </div><div class="MsoNormal"><br />
Yesterday evening, I was walking down a street named “Camac Street”. While I was looking at a new sign board that displayed another name for this famous street, one of my friends happened to call me and asked me where I was. I said “Abanindranath Thakur Sarani”, and he expressed his concern that I was in a weird place. To settle things, I had to tell him that the new name for “Camac Street”, according to the Kolkata Municipal Corporation, was “Abanindranath Thakur Sarani”!<br />
Places in India are slowly coming out of their colonial structures and conventions, and as a part of this, renaming things is commonplace now. For this reason, “Bombay” has been renamed “Mumbai”, “Madras” has become “Chennai”, and the list continues. Well... “Kolkata” is no different here. A few years back, this city was known as “Calcutta”. It seems that my state “West Bengal” will soon become “Pashchimbanga”.<br />
<br />
Also, the postal department has introduced new postal code (or PIN code) numbers recently. The PIN code of the place where I live is 700136. Some of the courier companies initially refused to deliver packages to my residence as they were using the old number 700059.<br />
<br />
How about this address? “Old No 160, New No 111,6th Street Extension, 100 Feet Road, Gandhipuram”</div><div class="MsoNormal"><br />
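<div class="MsoNormal">
Old and new names like these are typically reconciled through an alias lookup table during standardization. The Python sketch below uses only the renamings mentioned in this post; the table name <code>ALIASES</code>, the function <code>standardize_names</code> and the choice of the new name as the standard form are assumptions made for illustration.</div>

```python
import re

# Old name -> new name, taken from the examples in this post.
ALIASES = {
    "CALCUTTA": "KOLKATA",
    "BOMBAY": "MUMBAI",
    "MADRAS": "CHENNAI",
    "CAMAC STREET": "ABANINDRANATH THAKUR SARANI",
}


def standardize_names(address, aliases=ALIASES):
    """Upper-case the address and replace old names with their new forms."""
    result = address.upper()
    # Replace longer phrases first so multi-word names win over single words.
    for old in sorted(aliases, key=len, reverse=True):
        result = re.sub(r"\b" + re.escape(old) + r"\b", aliases[old], result)
    return result


print(standardize_names("12 Camac Street, Calcutta"))
```

<div class="MsoNormal">
Applying the same table to both records before comparison lets an address written with the old name match one written with the new name.</div>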
</div><div class="MsoNormal"><u>Multiple Locations</u></div><div class="MsoNormal"><br />
Many addresses in this country contain one or two locations. An example will be: <br />
“36, Arakashan Road, Ram Nagar, Paharganj, Diagonally Opposite to New Delhi Railway Station”<br />
This address points to a commercial place in “Paharganj” area of “New Delhi” (capital of India). It also says that the place is in “Ram Nagar” area which is located in “Paharganj”. Usually the first location encountered when the address is read (from left to right) is located within the second location mentioned in the address.</div><div class="MsoNormal">But one will come across plenty of addresses involving more than two locations.<br />
An example will be:<br />
“8A/48, W.E.A. Channa Market, Karol Bagh, Behind Pusa Road”</div><div class="MsoNormal"><br />
</div><div class="MsoNormal"><u>Incomplete or partial address</u></div><div class="MsoNormal"><br />
Look at the business address: “C/O Star Investments, OPP. Head Post Office Panjim, Tiswadi – 403001”<br />
This address does not contain a street name or number, and yet it is a valid address, i.e. the addressee can be reached using the postal service. <br />
Most addresses such as the one above can be written in multiple ways that do not look similar at first glance.</div>Tirthankar Ghoshhttp://www.blogger.com/profile/14687942450731369225noreply@blogger.com3tag:blogger.com,1999:blog-7991052372783853348.post-76729571788700807922011-08-18T12:42:00.004+05:302011-11-04T10:27:45.992+05:30House-holding dilemma with Indian Data<div class="MsoNormal">House-holding, or finding the records that belong to the same house-hold, is a typical data quality activity as far as linking individual records goes. According to Wikipedia, a house-hold is defined as “the basic residential unit in which economic production, consumption, inheritance, child rearing, and shelter are organized and carried out”. Typically, it refers to a family unit that stays in the same dwelling unit.<br />
<br />
Household matches are identified using these properties:</div><div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;">1.<span style="font: 7pt "Times New Roman";"> </span>Last Name i.e. Family Name should be the same and</div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;">2.<span style="font: 7pt "Times New Roman";"> </span>Address (residential) on the records should be the same </div><div class="MsoNormal">Let us look at the first point, that is, last name (or family name) matching. This is done under the assumption that the family members share the same family name. But this often fails in the Indian context, for example:</div><div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;">1.<span style="font: 7pt "Times New Roman";"> </span>Muslim families (well…most of them) do not have a family name concept.</div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;">2.<span style="font: 7pt "Times New Roman";"> </span>Traditionally, the concept of a family name was not present in South India. Parents in South Indian families bestowed a single name on their child at birth and attached several initials to it. The abbreviations could stand for the ancestral village and the father’s first name in Karnataka, the house name in Kerala, the caste name in Tamil Nadu and, in Andhra Pradesh, the place of family origin. </div><div class="MsoNormal">I encountered this issue while performing name parsing for South Indian names. However, if we use a name component called last name instead of the family name (or surname) and use this component for individual matching, then the complexity reduces a little, provided cross-matching across the name components is also used. But for house-holding, this <span style="font-family: "Calibri","sans-serif"; font-size: 11pt; line-height: 115%;">poses </span>a tough challenge.</div><div class="MsoNormal">Let us now look at the issues in address matching. 
We need to look at this keeping in mind the issues we saw in last name matching. The biggest issue in address matching is incomplete or partial addresses.</div><div class="MsoNormal">Let us look at the following addresses:</div><div class="MsoNormal"><br />
</div><table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 3.2in;" valign="top" width="307"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 9pt;">Address</span></b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 225pt;" valign="top" width="300"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 9pt;">Potentially Matching Address</span></b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 3.2in;" valign="top" width="307"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 9pt;">Y 14, BLOCK EP, SECTOR V, SALT LAKE, KOLKATA, 700091</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 225pt;" valign="top" width="300"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 9pt;">BLOCK EP, SECTOR V, SALT LAKE, KOLKATA, 700091</span></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 3.2in;" valign="top" width="307"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 9pt;">16A GARIAHAT ROAD, APT 1C, KOLKATA-19</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 225pt;" valign="top" width="300"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 9pt;">16A GARIAHAT ROAD, KOLKATA 700019</span></div></td> </tr>
</tbody></table><div class="MsoNormal"><br />
</div><span style="font-family: "Calibri","sans-serif"; font-size: 11pt; line-height: 115%;">The addresses in each row are close. But a detailed inspection reveals that the second address in each row does not have the dwelling number. In fact, if these addresses appear on two records where the names match, then we would accept these as matches. But what if there is no family name on the records?<br />
That is a big question mark. Take for example the second address in the second row. It is a close match for the address </span><span style="font-family: "Calibri","sans-serif"; font-size: 9pt; line-height: 115%;">16A GARIAHAT ROAD, APT 2B, KOLKATA 700019 too.<br />
</span><span style="font-family: "Calibri","sans-serif"; font-size: 11pt; line-height: 115%;">Though residential telephone numbers are of much help, the presence of such incomplete addresses poses big challenges in house-holding. According to Graham Rhind (an expert in handling international data), house-holding should be avoided as far as possible (except in some traditional Anglo-Saxon communities) because it hardly ever works.</span><br />
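<div class="MsoNormal">
The reasoning above can be summarized as a small decision rule. The Python sketch below is only my illustration of it: the function name <code>household_decision</code>, the record fields <code>last_name</code>, <code>building</code> and <code>unit</code>, and the three outcomes are hypothetical, and the records are assumed to be parsed and standardized already.</div>

```python
def household_decision(rec_a, rec_b):
    """Classify a pair of records as 'match', 'no match' or 'review'."""
    if rec_a["building"] != rec_b["building"]:
        return "no match"
    # Two different dwelling numbers in the same building are different homes.
    if rec_a["unit"] and rec_b["unit"] and rec_a["unit"] != rec_b["unit"]:
        return "no match"
    # The address is compatible; the family name decides when it is present.
    if rec_a["last_name"] and rec_b["last_name"]:
        return "match" if rec_a["last_name"] == rec_b["last_name"] else "no match"
    # No family name to rely on: the big question mark.
    return "review"


with_unit = {"last_name": "BISWAS", "building": "16A GARIAHAT ROAD", "unit": "1C"}
no_unit = {"last_name": "BISWAS", "building": "16A GARIAHAT ROAD", "unit": ""}
no_name = {"last_name": "", "building": "16A GARIAHAT ROAD", "unit": ""}

print(household_decision(with_unit, no_unit))
print(household_decision(with_unit, no_name))
```

<div class="MsoNormal">
The first pair is accepted because the names agree and the addresses are compatible; the second pair ends up in manual review because there is no family name to support the partial address.</div>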
<br />
<span style="font-family: "Calibri","sans-serif"; font-size: 11pt; line-height: 115%;">Note: <u>Discussion only includes individual house-holds and not corporate house-holds</u> </span>Tirthankar Ghoshhttp://www.blogger.com/profile/14687942450731369225noreply@blogger.com1tag:blogger.com,1999:blog-7991052372783853348.post-68706287251177976002011-08-06T00:26:00.003+05:302011-08-07T00:36:28.514+05:30Requirements for a Data Quality solution (updated on 6th August)<div class="MsoNormal">My colleagues, clients and other associates have often asked me what the necessary properties are for software that links records. It largely depends on the objective of the record linking (please read my earlier post on Context Sensitiveness). However, I am giving a few points that can be considered as the necessary properties of such software.<br />
For convenience, I am dividing the properties into two mutually exclusive sets: “Consolidation” and “Matching”. “Consolidation” covers the data preparation steps that are carried out before any matching operation begins. The steps followed in a typical “Consolidation” are:</div><div class="MsoNormal"><u>Basic Cleanup<br />
</u>Here we do initial cleanup of the records like replacement or removal of special characters, replacing multiple consecutive spaces by a single whitespace.</div><div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Software should allow the users to write grammar rules for carrying out such cleanup. Grammar rules can be defined for any specific field. </div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Users should be able to write grammar rules applicable to the beginning/end of a field. Good matching software should be flexible enough to accept regular expressions.</div><div class="MsoNormal"><u>Adjusting the misfielded information</u><br />
Often values are put in the wrong fields; for example, Job Title values may be present in the name field even though there is a separate field in the system for Job Title. This routine should be able to identify such occurrences and rectify them. </div><div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Settings must accept at least two fields: one for the search and the other for the destination</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Users should be able to decide what happens when the destination field is already populated</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>The solution should handle different requirements, such as picking up and moving only the search keyword, the portion of the field from the start to the keyword, or the portion from the keyword to the end.</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Users should be able to add/edit/delete different keywords.</div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Settings should be able to accept different sets of keywords for different searches.</div><div class="MsoNormal">Misfielded data can also be handled using parsing techniques, which is why a separate routine for handling it must be fast enough to be worthwhile.</div><div class="MsoNormal"><u>Identification</u><br />
This routine can classify records into various types based on the presence of some keywords. An example of this would be to classify and flag individual customer records (B2C records) and Corporate/SME records (B2B records).</div><div class="MsoListParagraph" style="margin-left: 37.5pt; text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Users must be able to add/edit/delete keywords for each classification/identification.</div><div class="MsoNormal"><u>Branching</u><br />
This routine is closely related to identification. Often different types of records need to be treated differently down the line and hence need to be put in separate buckets. Branching does exactly this.</div><div class="MsoNormal"><br />
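<div class="MsoNormal">
Identification and branching can be sketched together in Python. This is a simplified illustration, not the behaviour of any particular product: the keyword set <code>B2B_KEYWORDS</code> is a hypothetical, user-editable list, and the functions <code>identify</code> and <code>branch</code> are names I made up for the example.</div>

```python
import re

# Hypothetical, user-editable keywords that flag a Corporate/SME record.
B2B_KEYWORDS = {"LTD", "LIMITED", "PVT", "SONS", "JEWELERS", "INVESTMENTS"}


def identify(name, keywords=B2B_KEYWORDS):
    """Flag a record as B2B if any keyword occurs in the name, else B2C."""
    words = set(re.findall(r"[A-Z]+", name.upper()))
    return "B2C" if words.isdisjoint(keywords) else "B2B"


def branch(names):
    """Put each record into the bucket chosen by identification."""
    buckets = {"B2B": [], "B2C": []}
    for name in names:
        buckets[identify(name)].append(name)
    return buckets


print(branch(["Laxminarayan & Sons", "Ashim Biswas", "Star Investments"]))
```

<div class="MsoNormal">
Each bucket can then be sent through its own workflow, for example personal name parsing for the B2C records and business name parsing for the B2B records.</div>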
</div><div class="MsoNormal"></div><div class="MsoNormal"><u>Parsing</u><br />
Parsing is the process of splitting the words in a field like Name or Address into multiple component fields and is discussed in detail in my earlier posts in June 2011.</div><div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Users must be able to add/edit/delete keywords in the lookup tables for parsing</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Users must be able to add new mask characters and lookup tables</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Users must be able to add/edit/delete parsing rules</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Good software must be able to handle partial parsing (see my earlier post on the topic)</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Users must be able to create new parsing routines for any field</div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Software must be able to generate parsing reports so that users can assess the quality and fine-tune the settings.</div><div class="MsoNormal"><u>Genderization</u><br />
The process of determining the gender of individual records based on various name components such as Title or Given Name is called Genderization.</div><div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Users should be able to prioritize the name components to be evaluated for Genderization</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Users should be able to add/edit/delete gender codes corresponding to various name words</div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Software must provide a Genderization report that includes the percentage of records for which a gender code could be assigned, the percentage of records for which a gender code could not be assigned, a sample of names where a gender code could be assigned and a sample of names where a gender code could not be assigned</div><div class="MsoNormal"><u>Augmentation</u><br />
It is the process of enriching the data. One of the major data quality concerns is missing values in important fields. Augmentation or Enrichment is the process by which a data quality solution can take care of a portion of this. Besides filling in the missing information, this routine can also be used for validating the existing information, e.g. a record might show the state name as NJ while the name of the city is Dallas, an inconsistency that validation should flag.<br />
This routine can be divided into two sub-routines, viz. internal augmentation and external augmentation.</div><div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;">1.<span style="font: 7pt "Times New Roman";"> </span>Internal Augmentation: It is the process of enriching information using the data values themselves. For example, on a particular record, the name of the city may reside in the address lines instead of the city field.</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;">2.<span style="font: 7pt "Times New Roman";"> </span>External Augmentation: It is the process of enriching/validating information using external data such as postal information, telephone directories etc.<br />
<br />
</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Users must be able to select the validation/enrichment options from a list</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Software will flag the records where validation fails for a particular setting</div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>A report must be generated showing the percentages of enriched records, percentages of records that could not be enriched, sample of enriched records etc.</div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;"><br />
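The two sub-routines can be illustrated with a small sketch. It assumes a tiny, hypothetical city-to-state reference table (a real tool would use postal data) and hypothetical field names; it fills a blank city from the address line and flags a city/state mismatch such as Dallas with NJ:<br />

```python
# Hypothetical reference data; a real tool would load postal files instead.
CITY_STATE = {"DALLAS": "TX", "NEWARK": "NJ"}


def augment(record):
    """Return a copy of the record with the city filled in and a validation flag."""
    rec = dict(record)
    rec["city_enriched"] = False
    city = (rec.get("city") or "").strip().upper()
    if not city:
        # Internal augmentation: look for a known city name in the address line.
        words = (rec.get("address") or "").upper().replace(",", " ").split()
        city = next((w for w in words if w in CITY_STATE), "")
        if city:
            rec["city"] = city
            rec["city_enriched"] = True
    # External validation: flag the record when city and state disagree.
    expected_state = CITY_STATE.get(city)
    state = (rec.get("state") or "").strip().upper()
    rec["validation_failed"] = bool(
        expected_state and state and expected_state != state
    )
    return rec


result = augment({"address": "12 Main St, Dallas", "city": "", "state": "NJ"})
```

Counting the city_enriched and validation_failed flags over all records gives the percentages needed for the report described above.<br />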
</div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;"></div><div class="MsoNormal"><u>Standardization</u><br />
It is the process of transforming similar data values into a single standard format. For example, different spellings and abbreviations of city names, state names etc. are converted into a standard format.</div><div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Users must be able to select the fields which are going to be standardized</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>For every field to be standardized, users must be able to specify lookup tables that contain the variations and the standard format.</div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Each lookup table containing possible variations and the standard format needs to be editable.</div><div class="MsoNormal"><u>Rejection Routines</u><br />
This routine can reject/flag records based on user-defined rules such as use of profanity, information that failed validation etc.</div><div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Users must be able to define rejection rules involving one or more fields and one or more lookup tables</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>All the lookup tables must be editable by the users</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Users must be able to specify whether records are to be rejected or only flagged</div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Software must be able to generate reports containing sample records (that got rejected or flagged) for each setting.</div><div class="MsoNormal">Once the “Consolidation” steps are executed, the steps in the “Matching” process are followed for the surviving records. The steps followed in the “Matching” routine are:</div><div class="MsoNormal"><u>Defining Match Groups/Hierarchy</u><br />
Users define the match hierarchy here. For example, the matching software can find the address matches and then, for the records with matching addresses, it can probe further and find households etc. So, defining more than one match group requires a relation among these match groups. Some of the match groups may be unrelated while others may be related.</div><div class="MsoNormal"><u>Two-step matching</u><br />
Good matching software is able to perform matching in two steps. The first step is called primary matching, where hard keys defined for each record are compared to arrive at a match. Readers can refer to my earlier post on key-based matching in May 2011.</div><div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;">1.<span style="font: 7pt "Times New Roman";"> </span>Primary Matching</div><div class="MsoListParagraphCxSpMiddle" style="margin-left: 0.75in; text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Users must be able to define one or more match keys</div><div class="MsoListParagraphCxSpMiddle" style="margin-left: 0.75in; text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Users must be able to specify, for each key, what should be done in case a key element is missing for a record.</div><div class="MsoListParagraphCxSpMiddle" style="margin-left: 0.75in; text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Users must be able to specify the string modes in which a field (or a portion of it) should be included in a match key. 
Two strings can be compared in various string modes such as</div><div class="MsoListParagraphCxSpMiddle" style="margin-left: 1.25in; text-indent: -0.25in;"><span style="font-family: "Courier New";">o<span style="font: 7pt "Times New Roman";"> </span></span>Consonated Mode: In this case, all the vowels are dropped from the string</div><div class="MsoListParagraphCxSpMiddle" style="margin-left: 1.25in; text-indent: -0.25in;"><span style="font-family: "Courier New";">o<span style="font: 7pt "Times New Roman";"> </span></span>Vowelized Mode: In this case, all the consonants are dropped from the string</div><div class="MsoListParagraphCxSpLast" style="margin-left: 1.25in; text-indent: -0.25in;"><span style="font-family: "Courier New";">o<span style="font: 7pt "Times New Roman";"> </span></span>Numeric Mode: In this case, all the non-numeric characters are dropped from the string</div><div class="MsoListParagraphCxSpFirst" style="margin-left: 1.25in; text-indent: -0.25in;"><span style="font-family: "Courier New";">o<span style="font: 7pt "Times New Roman";"> </span></span>Phonetized Mode: In this case, the entire string is phonetically transformed (for details, please read my earlier post in June 2011)</div><div class="MsoListParagraphCxSpMiddle" style="margin-left: 1.25in; text-indent: -0.25in;"><span style="font-family: "Courier New";">o<span style="font: 7pt "Times New Roman";"> </span></span>Alpha Mode: In this case, all the numeric and special characters are dropped from the string</div><div class="MsoListParagraphCxSpMiddle" style="margin-left: 1.25in; text-indent: -0.25in;"><span style="font-family: "Courier New";">o<span style="font: 7pt "Times New Roman";"> </span></span>Alpha-Numeric Mode: In this case, all the special characters are dropped from the string</div><div class="MsoListParagraphCxSpMiddle" style="margin-left: 0.75in; text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Users must be 
able to review the primary match results by generating sample reports<br />
<br />
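The string modes and the match keys listed above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the field names, the key layout and the handling of a missing key element are all hypothetical, the modes follow the literal definitions given above for ASCII letters, and Phonetized Mode is left out because it needs a phonetic algorithm of its own.<br />

```python
VOWELS = set("AEIOU")


def apply_mode(value, mode):
    """Transform a string according to one of the string modes described above."""
    s = value.upper()
    if mode == "consonated":      # drop all vowels
        return "".join(c for c in s if c not in VOWELS)
    if mode == "vowelized":       # drop all consonants
        return "".join(c for c in s if not (c.isalpha() and c not in VOWELS))
    if mode == "numeric":         # keep digits only
        return "".join(c for c in s if c.isdigit())
    if mode == "alpha":           # keep letters only (spaces are dropped too)
        return "".join(c for c in s if c.isalpha())
    if mode == "alphanumeric":    # keep letters and digits only
        return "".join(c for c in s if c.isalnum())
    raise ValueError("unknown string mode: " + mode)


def match_key(record, spec, on_missing="skip"):
    """Build a key from (field, mode, length) triples.

    on_missing="skip" returns None, so the record gets no key of this kind;
    any other value puts a "#" placeholder in place of the missing element.
    """
    parts = []
    for field, mode, length in spec:
        part = apply_mode(record.get(field) or "", mode)[:length]
        if not part:
            if on_missing == "skip":
                return None
            part = "#"
        parts.append(part)
    return "|".join(parts)


# Hypothetical key: first 4 consonated characters of the surname + numeric postcode.
spec = [("surname", "consonated", 4), ("postcode", "numeric", 6)]
key_a = match_key({"surname": "Peterson", "postcode": "700 091"}, spec)
key_b = match_key({"surname": "Petersen", "postcode": "700091"}, spec)
```

Here both records get the same key, so they become a primary match in spite of the different spellings of the surname.<br />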
</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;">2.<span style="font: 7pt "Times New Roman";"> </span>Fuzzy Matching<br />
Fuzzy Matching works on the probable matches discovered after primary matching. In this step, probable match pairs are classified into three subsets: definite matches, definite non-matches and suspected matches. This type of matching is also called probabilistic matching and can be implemented in many ways.</div><div class="MsoListParagraphCxSpMiddle"><br />
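One of the many possible ways to arrive at the three subsets is to compute a weighted similarity score for each probable pair and compare it with two thresholds. The sketch below is an assumption, not a prescribed method: it uses the ratio from Python's difflib.SequenceMatcher as a stand-in for a proper comparison function, and the field names, weights and thresholds are hypothetical. Handling of blank values is not shown.<br />

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def classify(rec_a, rec_b, weights, upper=0.90, lower=0.60):
    """Classify a probable pair as 'match', 'suspect' or 'non-match'."""
    total = sum(weights.values())
    score = sum(
        w * similarity(rec_a.get(f) or "", rec_b.get(f) or "")
        for f, w in weights.items()
    ) / total
    if score >= upper:
        return "match", score
    if score >= lower:
        return "suspect", score      # goes to manual review
    return "non-match", score


# Hypothetical weights: the name counts twice as much as the address.
weights = {"name": 2.0, "address": 1.0}
label, score = classify(
    {"name": "Tirthankar Ghosh", "address": "12 Lake Road"},
    {"name": "Tirthankar Ghose", "address": "12 Lake Rd"},
    weights,
)
```

The two thresholds are the settings users would tune after looking at the sample reports.<br />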
</div><div class="MsoListParagraphCxSpMiddle" style="margin-left: 1in; text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Fuzzy matching should be an optional step. Users may define the primary keys in a satisfactory way and may decide to consider the output of primary matching as the final set of matches.</div><div class="MsoListParagraphCxSpMiddle" style="margin-left: 1in; text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>This routine must allow the users to specify the action(s) to take when a blank value has to be compared with another blank or non-blank value. This setting may differ from field to field.</div><div class="MsoListParagraphCxSpMiddle" style="margin-left: 1in; text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Fuzzy matching must allow the users to specify the string modes in which fields will be compared</div><div class="MsoListParagraphCxSpMiddle" style="margin-left: 1in; text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>This routine must allow users to specify the possible cross-matching options (for details, please read my earlier post in May 2011)</div><div class="MsoListParagraphCxSpMiddle" style="margin-left: 1in; text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Users must be able to decide, for each fuzzy matching rule, the associated fields to be compared</div><div class="MsoListParagraphCxSpLast" style="margin-left: 1in; text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>At the end of fuzzy matching, users should be able to generate sample reports containing the matched and/or un-matched records for each group in the hierarchy.</div><div 
class="MsoNormal"><u>Manual Decision/Review</u><br />
Manual review is an important part of a data quality tool. The system must provide an appropriate interface to the users so that each of the matches can be reviewed and one of the following decisions can be made:</div><div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;">1.<span style="font: 7pt "Times New Roman";"> </span>Mark as match</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;">2.<span style="font: 7pt "Times New Roman";"> </span>Mark as un-match (this option makes sure that the records concerned are never matched in future)</div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;">3.<span style="font: 7pt "Times New Roman";"> </span>Hold a case for further verification and review by a higher authority</div><br />
<div class="MsoNormal"><u>Consolidation of indirect matches</u><br />
After the final matching (fuzzy matching in case of 2-step matching or primary matching in case of 1-step matching), the matched records need to be consolidated. This routine, besides doing that, must be able to assign a unique cluster number to every record such that matching records in a group (or cluster) get the same number. For more detail, please read my earlier post “Indirect Matching” in June 2011. </div><div class="MsoListParagraph" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>At the end of this consolidation, users must be able to generate sample match/un-match reports corresponding to each match group to review the results.<br />
<br />
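Assigning cluster numbers means that if record A matches B and B matches C, all three must end up in one cluster even if A and C were never matched directly. A common way to do this is a union-find (disjoint set) structure; the sketch below is one possible implementation with hypothetical names, not a description of any particular product:<br />

```python
def assign_clusters(record_ids, matched_pairs):
    """Return {record_id: cluster_number}; indirect matches share a cluster."""
    parent = {rid: rid for rid in record_ids}

    def find(x):
        # Walk up to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a      # merge the two clusters

    number_of_root = {}
    clusters = {}
    for rid in record_ids:
        root = find(rid)
        if root not in number_of_root:
            number_of_root[root] = len(number_of_root) + 1
        clusters[rid] = number_of_root[root]
    return clusters


# A-B and B-C were matched; A-C is an indirect match; D matched nothing.
clusters = assign_clusters(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])
```

In this example A, B and C receive cluster number 1, while the un-matched record D forms cluster 2 on its own.<br />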
<div class="MsoNormal"><u>Rationalization or selecting the survivor</u><br />
This routine works after the consolidation of final matches is done and the entire set of records has been put into several clusters, where each cluster contains matching records. Obviously, each un-matched record forms a cluster of size one.<br />
Often the business requires a single record representing a cluster of matching records. This was discussed in detail in my earlier post “Constructing the Survivor Record” posted in June 2011.</div><div class="MsoListParagraph" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Users must be able to write code snippets to build the routine in case it is complicated. The system must provide the users with a code editor, the available functions and fields, and the logical operators.</div></div><div class="MsoNormal">Data Profiling is an important function of good data quality software. Requirements around this would be:</div><div class="MsoNormal"><u>Column analysis</u><br />
This routine, given a table will generate the following reports:</div><div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Percentages of NULL or blank</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Percentages of initials</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Percentages of numeric values</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Percentages of alpha values</div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Percentages of unique values</div><div class="MsoNormal"><u>Frequency analysis</u><br />
This routine, given a table will generate the following reports:</div><div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Frequency Report</div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Pattern frequency Report</div><div class="MsoNormal"><u>Table analysis</u><br />
Given two tables, this routine should be able to identify:</div><div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Common unique key between the tables</div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;"><span style="font-family: Symbol;">·<span style="font: 7pt "Times New Roman";"> </span></span>Orphan records i.e. records present in child table but not in master</div><div class="MsoNormal"><u>Generate alerts</u><br />
This routine will allow the users to define business rules and will generate compliance reports. Optionally this routine can generate failure alerts and send e-mails.</div><div class="MsoNormal"><br />
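The column, frequency and table analyses described above can be sketched in a few lines. The function names are hypothetical, "unique" is taken here to mean distinct non-blank values as a share of all rows, and the pattern notation (A for a letter, 9 for a digit) is just one common convention:<br />

```python
from collections import Counter


def column_profile(values):
    """Percentages of blank, numeric, alpha and distinct values in a column."""
    n = len(values)
    non_blank = [v.strip() for v in values if v is not None and v.strip()]

    def pct(count):
        return round(100.0 * count / n, 1) if n else 0.0

    return {
        "blank_pct": pct(n - len(non_blank)),
        "numeric_pct": pct(sum(v.isdigit() for v in non_blank)),
        "alpha_pct": pct(sum(v.isalpha() for v in non_blank)),
        "unique_pct": pct(len(set(non_blank))),
    }


def pattern(value):
    """Replace letters with A and digits with 9; keep everything else."""
    return "".join(
        "A" if c.isalpha() else "9" if c.isdigit() else c for c in value
    )


def pattern_frequency(values):
    """Count how often each pattern occurs among the non-blank values."""
    return Counter(pattern(v) for v in values if v)


def orphans(child_rows, master_rows, key):
    """Child records whose key value is not present in the master table."""
    master_keys = {row[key] for row in master_rows}
    return [row for row in child_rows if row[key] not in master_keys]


postcodes = ["700091", "700 091", "", None, "Kolkata", "700091"]
profile = column_profile(postcodes)
patterns = pattern_frequency(postcodes)
```

A pattern frequency report like this quickly shows that the column mixes six-digit codes, codes with an embedded space and city names.<br />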
</div><div class="MsoNormal">There are, of course, detailed requirements around each topic mentioned above.</div><br />
<br />
<div class="MsoNormal">So far we have discussed the technical requirements of good matching software. There are, however, a few more requirements for this software, depending upon the context. These are:</div><div class="MsoNormal"><br />
</div><div class="MsoNormal"><u>Input and Output</u><br />
System should be able to read data from a number of sources including</div><div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;">1.<span style="font: 7pt "Times New Roman";"> </span>Text files</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;">2.<span style="font: 7pt "Times New Roman";"> </span>Delimited files</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;">3.<span style="font: 7pt "Times New Roman";"> </span>Excel files</div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;">4.<span style="font: 7pt "Times New Roman";"> </span>RDBMS files</div><div class="MsoNormal">Similarly, system should be able to output data in several formats.</div><div class="MsoNormal"><br />
</div><div class="MsoNormal"><u>GUI</u><br />
The system must have a good GUI. The steps we discussed earlier can be arranged in any order depending upon the requirement, so it is good to have a GUI with a drag-and-drop facility.<br />
<br />
<br />
<u>Workflow</u><br />
I have mentioned requirements corresponding to a number of subject areas for the entire software. Users should be able to create workflows incorporating one or more source files and the required processes/functions and settings. These processes/settings should be flexible enough so that different workflows may use these in different order or even may not use some of these as per the context. </div><div class="MsoNormal"><br />
</div><div class="MsoNormal"><u>Integration</u><br />
Such a matching tool can be used standalone, or another application may use its services.</div><div class="MsoNormal"><br />
</div><div class="MsoNormal"><u>Performance</u><br />
The functional requirements for matching software involve a lot of string manipulation. Even so, the software must be fast enough to process large volumes of data.</div><br />
<div class="MsoNormal"><br />
</div>Tirthankar Ghoshhttp://www.blogger.com/profile/14687942450731369225noreply@blogger.com0tag:blogger.com,1999:blog-7991052372783853348.post-28658600633556200412011-07-21T15:46:00.000+05:302011-07-21T15:46:30.674+05:30Context Sensitiveness in Matching<!--[if gte mso 9]><xml> <w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="true"
DefSemiHidden="true" DefQFormat="false" DefPriority="99"
LatentStyleCount="267"> <w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 5"/> <w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 6"/> <w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 6"/> <w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 6"/> <w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 6"/> <w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 6"/> <w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 6"/> <w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 6"/> <w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 6"/> <w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 6"/> <w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 6"/> <w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 6"/> <w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 6"/> <w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 6"/> <w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 6"/> <w:LsdException Locked="false" Priority="19" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtle Emphasis"/> <w:LsdException Locked="false" Priority="21" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Emphasis"/> <w:LsdException Locked="false" Priority="31" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtle Reference"/> <w:LsdException Locked="false" Priority="32" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Reference"/> <w:LsdException Locked="false" Priority="33" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Book Title"/> <w:LsdException Locked="false" Priority="37" Name="Bibliography"/> <w:LsdException Locked="false" Priority="39" QFormat="true" Name="TOC Heading"/> </w:LatentStyles> </xml><![endif]--><!--[if gte mso 10]> <style>
</style> <![endif]--> <br />
<div class="MsoNormal">At present there is a serious discussion going on in the LinkedIn group “Matching” (you need to be a member of LinkedIn, and of the group “Matching”, to access the thread) on the subject of context sensitivity in matching. The subject is closely related to probable errors in matching. Given the richness of that discussion and the breadth of the topic itself, I am tempted to share my own understanding.</div><div class="MsoNormal">Let me begin with an experience from a few years back, when a private bank was implementing a data quality solution for its large customer base.<span> </span>To fine-tune the matching algorithm, the bank gave us a control/test file of a few hundred records, and we tried various possible algorithms against it. It took us some time to arrive at a match algorithm that suited the control file. Both the business users and the IT users were happy with the results for the control file. But to our horror, the same algorithm proved a disaster when a portion of the actual customer data was processed. We finally had to rework the algorithm from the start. </div><div class="MsoNormal">Before I explain what happened, let me give one example of the disparity. Consider the two individual records (only a few fields are shown) in the table below:</div><div class="MsoNormal"><br />
</div><table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 108.9pt;" valign="top" width="145"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 10pt;">Name</span></b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 171pt;" valign="top" width="228"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 10pt;">Address</span></b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 10pt;">City</span></b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 10pt;">Tel1</span></b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 67.55pt;" valign="top" width="90"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 10pt;">Tel2</span></b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 108.9pt;" valign="top" width="145"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">ABHISEK C KOTCHER</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 171pt;" valign="top" width="228"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">C TOWER, UNO 12, JEEVAN MANZIL</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">SURAT</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">1111111111</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.55pt;" valign="top" width="90"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">2222222222</span></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 108.9pt;" valign="top" width="145"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">AVISEK C </span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 171pt;" valign="top" width="228"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">C12 OFF MG RD, NEAR JEEVAN MANZIL</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">SURAT</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">3333333333</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.55pt;" valign="top" width="90"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> </tr>
</tbody></table><div class="MsoNormal"><br />
</div><div class="MsoNormal">These two records were matched by the algorithm developed using the control file. But for the customer data integration activity they were not a match, as we realized later.<br />
<br />
We wanted to know whether this was a one-off case or whether something was fundamentally wrong. To our shock, we found that the control file given by the bank was an extract from its fraud detection de-duplication database, which had been prepared earlier by another vendor. Unfortunately, this vendor had not made the bank aware of the effect of using the same or a similar match algorithm in a different context.<br />
<br />
If you can spare some time, you may refer to my earlier post “Errors in matching”, posted in May 2011.<br />
In a nutshell, two types of error are possible when we say that there is a match (or no match) between two specific records. When the algorithm says it is a match but the records actually represent two different entities, the error is called a false positive. When the algorithm says there is no match but the records actually represent the same entity, the error is called a false negative. Depending on the context in which the match results will be used, there are two types of match objective. One situation demands that even a slight similarity be captured by the match algorithm, so the objective becomes to reduce false negatives. Another demands that two records match only when there is strong similarity, so the objective becomes to reduce false positives.</div><div class="MsoNormal">In a fraud detection context, the objective is to capture even a slight similarity so that no potential match escapes. But in a typical customer data integration (CDI) context, the objective is to allow two records to match only when there is strong evidence that they represent the same entity.<br />
<br />
I do not think there is any strategy for tuning the match algorithm so that both false positives and false negatives fall together (unless, of course, you change the input file or files!). Unfortunately there is no mathematical proof of this that I know of, but the experience of people in this field suggests so. <span> </span>That is why we have these two possible objectives rather than a single one that requires reducing both false positives and false negatives.<br />
<br />
The idea is this: when one makes the match settings stricter to reduce false positives, as in a typical CDI situation, one increases the risk of more false negatives. Conversely, when one relaxes the match settings to reduce false negatives, as in a typical fraud detection situation, one increases the risk of more false positives.</div><div class="MsoNormal">So, before you start working on the match algorithm (settings), be sure of the objective. <br />
<br />
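The trade-off described above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the similarity scores, the true-match labels and the two thresholds are invented, and a real match engine would compute its scores from the records themselves.<br />

```python
from typing import List, Tuple

# Each entry is (similarity score, do the records truly represent the same entity?).
# These values are invented for illustration only.
SCORED_PAIRS: List[Tuple[float, bool]] = [
    (0.95, True),
    (0.85, True),
    (0.70, True),
    (0.60, False),
    (0.55, True),
    (0.40, False),
]


def count_errors(pairs: List[Tuple[float, bool]], threshold: float) -> Tuple[int, int]:
    """Return (false positives, false negatives) when score >= threshold means 'match'."""
    # False positive: declared a match, but the records are different entities.
    false_positives = sum(1 for score, same in pairs if score >= threshold and not same)
    # False negative: declared no match, but the records are the same entity.
    false_negatives = sum(1 for score, same in pairs if threshold > score and same)
    return false_positives, false_negatives


# Relaxed setting, as in a fraud detection type of context.
print("relaxed (0.50):", count_errors(SCORED_PAIRS, 0.50))
# Strict setting, as in a customer data integration type of context.
print("strict  (0.80):", count_errors(SCORED_PAIRS, 0.80))
```

With these invented numbers, the relaxed threshold (0.50) leaves one false positive and no false negatives, while the strict threshold (0.80) leaves no false positives and two false negatives.<br />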
</div><div class="MsoNormal"><br />
</div><div class="MsoNormal"><br />
</div><div class="MsoNormal"><br />
</div>Tirthankar Ghoshhttp://www.blogger.com/profile/14687942450731369225noreply@blogger.com0tag:blogger.com,1999:blog-7991052372783853348.post-75525242476053980072011-07-01T10:35:00.004+05:302011-07-02T14:18:53.684+05:30Compound Words<div class="MsoNormal" style="line-height: normal;"><span style="font-size: 12pt;">[I will use many examples in this discussion. Most of these examples are taken from Indian files but a few are from international files.]<br />
<br />
Issues with compound words crop up often when de-duplicating records. <a href="http://liliendahl.wordpress.com/about/">Henrik Liliendahl Sørensen</a> has written a nice <a href="http://liliendahl.wordpress.com/2011/05/11/compound-words/">post</a> on this. Such issues arise when we need to match two field values and at least one of them consists of more than one word. For ease of discussion, I will split the topic into two parts. First, we will talk about compound words in name matching.<br />
Let me give a few examples of names:<br />
<br />
</span></div><table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 180.9pt;" valign="top" width="241"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 10pt;">Name – Record1</span></b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 2.25in;" valign="top" width="216"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 10pt;">Matching Name-Record2</span></b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 180.9pt;" valign="top" width="241"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">JOHN P SMITH</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 2.25in;" valign="top" width="216"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">JOHNP SMITH</span></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 180.9pt;" valign="top" width="241"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">DADAN BHAI BOTTLEWALA</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 2.25in;" valign="top" width="216"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">DADANBHAI BOTTLEWALA</span></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 180.9pt;" valign="top" width="241"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">AMAL KANTI SEN</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 2.25in;" valign="top" width="216"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">AMALK SEN</span></div></td> </tr>
</tbody></table><div class="MsoNormal" style="line-height: normal;"><span style="font-size: 12pt;"><br />
After parsing, these names will be<br />
<br />
</span></div><table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td colspan="3" style="background: none repeat scroll 0% 0% rgb(217, 217, 217); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 207.9pt;" valign="top" width="277"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 10pt;">Name – Record1</span></b></div></td> <td colspan="3" style="background: none repeat scroll 0% 0% rgb(217, 217, 217); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 229.5pt;" valign="top" width="306"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 10pt;">Matching Name-Record2</span></b></div></td> </tr>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.95in;" valign="top" width="91"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 10pt;">First Name</span></b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 0.75in;" valign="top" width="72"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 10pt;">Middle Name</span></b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 85.5pt;" valign="top" width="114"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 10pt;">Last Name</span></b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 81pt;" valign="top" width="108"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 10pt;">First Name</span></b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 0.75in;" valign="top" width="72"><div class="MsoNormal" style="line-height: normal; 
margin-bottom: 0.0001pt;"><b><span style="font-size: 10pt;">Middle Name</span></b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 94.5pt;" valign="top" width="126"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 10pt;">Last Name</span></b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.95in;" valign="top" width="91"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">JOHN</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 0.75in;" valign="top" width="72"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">P</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 85.5pt;" valign="top" width="114"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">SMITH</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 81pt;" valign="top" width="108"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">JOHNP</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 0.75in;" valign="top" width="72"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 94.5pt;" valign="top" width="126"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">SMITH</span></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.95in;" valign="top" width="91"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">DADAN</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 0.75in;" valign="top" width="72"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">BHAI</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 85.5pt;" valign="top" width="114"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">BOTTLEWALA</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 81pt;" valign="top" width="108"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">DADANBHAI</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 0.75in;" valign="top" width="72"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 94.5pt;" valign="top" width="126"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">BOTTLEWALA</span></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.95in;" valign="top" width="91"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">AMAL</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 0.75in;" valign="top" width="72"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">KANTI</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 85.5pt;" valign="top" width="114"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">SEN</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 81pt;" valign="top" width="108"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">AMALK</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 0.75in;" valign="top" width="72"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 94.5pt;" valign="top" width="126"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">SEN</span></div></td> </tr>
</tbody></table><div class="MsoNormal" style="line-height: normal;"><span style="font-size: 12pt;"><br />
Notice that in each of these three cases, the matching name (Record2) has no middle name after parsing. In the first two cases, the first name and middle name of the first record, concatenated, exactly match the first name of the second record.<br />
The names in the third row, on the other hand, are a bit different. Strictly speaking, the two names do not match exactly. But since initials are frequently used for middle names, we need to allow these two names to match, though with a probability less than 100%.<br />
This is because the two words KANTI and K can match as middle names only with a probability less than 100%.<br />
<br />
Names in this table can be matched by using the following rule:</span></div><div class="MsoListParagraphCxSpFirst" style="line-height: normal; margin-left: 0.25in; text-indent: -0.25in;"><span style="font-size: 12pt;">1.<span style="font: 7pt 'Times New Roman';"> </span></span><span style="font-size: 12pt;">For every probable matching pair of records:</span></div>
<div class="MsoListParagraphCxSpMiddle" style="line-height: normal; margin-left: 0.55in; text-indent: -0.3in;"><span style="font-size: 12pt;">1.1.<span style="font: 7pt 'Times New Roman';"> </span></span><span style="font-size: 12pt;">If the middle name is empty in exactly one record of the pair:</span></div>
<div class="MsoListParagraphCxSpMiddle" style="line-height: normal; margin-left: 0.85in; text-indent: -0.35in;"><span style="font-size: 12pt;">1.1.1.<span style="font: 7pt 'Times New Roman';"> </span></span><span style="font-size: 12pt;">If comparing the two first names does not give an adequate match probability, carry out the following:</span></div>
<div class="MsoListParagraphCxSpMiddle" style="line-height: normal; margin-left: 1.2in; text-indent: -0.45in;"><span style="font-size: 12pt;">1.1.1.1.<span style="font: 7pt 'Times New Roman';"> </span></span><span style="font-size: 12pt;">Concatenate the first name and middle name of the record that has a middle name, and compare this string with the first name of the record where the middle name is blank.</span></div>
<div class="MsoListParagraphCxSpMiddle" style="line-height: normal; margin-left: 1.2in; text-indent: -0.45in;"><span style="font-size: 12pt;">1.1.1.2.<span style="font: 7pt 'Times New Roman';"> </span></span><span style="font-size: 12pt;">If this comparison does not give a good result, check whether the first name on the record where the middle name is not blank is a prefix of the other first name. If it is:</span></div>
<div class="MsoListParagraphCxSpLast" style="line-height: normal; margin-left: 1.55in; text-indent: -0.55in;"><span style="font-size: 12pt;">1.1.1.2.1.<span style="font: 7pt 'Times New Roman';"> </span></span><span style="font-size: 12pt;">Take the remaining substring of the first name from the record where the middle name is blank. If the length of this substring is 1, check whether this character matches the initial of the middle name on the other record.</span></div><div class="MsoNormal" style="line-height: normal;"><span style="font-size: 12pt;">There are several types of occurrences of compound words in addresses.<br />
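Returning to the name-matching rule listed above: its steps can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the implementation used in any product. The function names, the use of the SequenceMatcher ratio from difflib as the match probability, the 0.90 threshold and the 0.95 score for an initial-only match are all assumptions, and UDAY is a made-up first name.<br />

```python
from difflib import SequenceMatcher

THRESHOLD = 0.90      # assumed cut-off for an "adequate" match probability
INITIAL_SCORE = 0.95  # assumed score for an initial-only match, kept below 1.0


def similarity(a, b):
    """Stand-in for the match probability of two words (0.0 to 1.0)."""
    return SequenceMatcher(None, a, b).ratio()


def match_first_names(first1, middle1, first2, middle2):
    """Score the first names of one candidate pair of records."""
    score = similarity(first1, first2)
    # Step 1.1: the rule applies only if exactly one middle name is empty.
    if bool(middle1) == bool(middle2):
        return score
    # Step 1.1.1: nothing more to do if the first names already match well.
    if score >= THRESHOLD:
        return score
    # "full" is the record with a middle name, "bare" the one without.
    if middle1:
        full_first, full_middle, bare_first = first1, middle1, first2
    else:
        full_first, full_middle, bare_first = first2, middle2, first1
    # Step 1.1.1.1: compare first + middle name with the other first name.
    joined = similarity(full_first + full_middle, bare_first)
    if joined >= THRESHOLD:
        return joined
    # Step 1.1.1.2: is the first name of "full" a prefix of the other one?
    if bare_first.startswith(full_first):
        rest = bare_first[len(full_first):]
        # Step 1.1.1.2.1: one leftover character may be the middle initial.
        if len(rest) == 1 and full_middle.startswith(rest):
            return INITIAL_SCORE
    return max(score, joined)


print(match_first_names("UDAY", "KANTI", "UDAYKANTI", ""))
print(match_first_names("UDAY", "KANTI", "UDAYK", ""))
```

With these assumed values, UDAY plus KANTI against UDAYKANTI is accepted by step 1.1.1.1 with a full score, while UDAY plus KANTI against UDAYK is accepted only by the initial check in step 1.1.1.2.1 and therefore gets the lower score.<br />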
The following examples show the main types of compound words found in addresses:<br />
<br />
</span></div><table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 0.7in;" valign="top" width="67"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 10pt;">Case #</span></b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 193.5pt;" valign="top" width="258"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 10pt;">Address Word – Record 1</span></b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 234.9pt;" valign="top" width="313"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 10pt;">Matching Address Word – Record 2</span></b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.7in;" valign="top" width="67"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">1</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 193.5pt;" valign="top" width="258"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">25 MAIN ROAD NEAR IIT CAMPUS</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 234.9pt;" valign="top" width="313"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">25 MAIN ROAD NEAR I I T CAMPUS</span></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.7in;" valign="top" width="67"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">2</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 193.5pt;" valign="top" width="258"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">21 MG ROAD BOWBAZAR</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 234.9pt;" valign="top" width="313"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">21 M G ROAD BOWBAZAR</span></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.7in;" valign="top" width="67"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">3</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 193.5pt;" valign="top" width="258"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">SCHORBACHSTRASSE 9</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 234.9pt;" valign="top" width="313"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">SCHORBACH STRASSE 9</span></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.7in;" valign="top" width="67"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">4</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 193.5pt;" valign="top" width="258"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">NEW YORK</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 234.9pt;" valign="top" width="313"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">NEWYORK</span></div></td> </tr>
</tbody></table><div class="MsoNormal" style="line-height: normal;"><span style="font-size: 12pt;"><br />
The first two instances (cases 1 and 2) show one kind of compound-word issue: an abbreviation made up of initials is written joined in one record (IIT, MG) and spaced out in the other (I I T, M G).<br />
The next instance (case 3) shows another kind: the street name and the corresponding street type are combined into a single word in one record (SCHORBACHSTRASSE) and kept separate in the other (SCHORBACH STRASSE).<br />
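One simple way to see that such pairs differ only in spacing is to compare them with all blanks removed. The Python sketch below is an illustration only: the helper names are hypothetical, and squeezing whole addresses like this can also produce false matches, so it is not meant as a complete matching method.<br />

```python
def squash(text):
    """Drop all whitespace so that spacing differences disappear."""
    return "".join(text.split())


def differs_only_by_spacing(a, b):
    """True when two strings are unequal but identical once blanks are removed."""
    return a != b and squash(a) == squash(b)


# Cases 1 to 3 from the table above.
PAIRS = [
    ("25 MAIN ROAD NEAR IIT CAMPUS", "25 MAIN ROAD NEAR I I T CAMPUS"),
    ("21 MG ROAD BOWBAZAR", "21 M G ROAD BOWBAZAR"),
    ("SCHORBACHSTRASSE 9", "SCHORBACH STRASSE 9"),
]

for first, second in PAIRS:
    # A word-by-word comparison fails because the word lists differ.
    same_words = first.split() == second.split()
    print(first, "|", second, "|", same_words, differs_only_by_spacing(first, second))
```

For each of the three pairs, the word-by-word comparison reports a difference, while the comparison without blanks reports that the two addresses are the same.<br />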
<br />
Let us see what happens to these addresses (cases 1, 2 and 3) after proper parsing:<br />
<br />
</span></div><div class="MsoNormal" style="line-height: normal;"><br />
<table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none; width: 661px;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 2.2in;" valign="top" width="211"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 10pt;">Original Address</span></b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 10pt;">House No.</span></b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 99pt;" valign="top" width="132"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 10pt;">Street Name</span></b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 58.5pt;" valign="top" width="78"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 10pt;">Street Type</span></b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 63pt;" valign="top" width="84"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 10pt;">Location</span></b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b><span style="font-size: 10pt;">Landmark</span></b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 2.2in;" valign="top" width="211"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">25 MAIN ROAD NEAR IIT CAMPUS</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">25</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 99pt;" valign="top" width="132"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">MAIN</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 58.5pt;" valign="top" width="78"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">ROAD</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 63pt;" valign="top" width="84"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">IIT CAMPUS</span></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 2.2in;" valign="top" width="211"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">25 MAIN ROAD NEAR I I T CAMPUS</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">25</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 99pt;" valign="top" width="132"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">MAIN</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 58.5pt;" valign="top" width="78"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">ROAD</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 63pt;" valign="top" width="84"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">I I T CAMPUS</span></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 2.2in;" valign="top" width="211"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">21 MG ROAD BOWBAZAR</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">21</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 99pt;" valign="top" width="132"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">MG</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 58.5pt;" valign="top" width="78"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">ROAD</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 63pt;" valign="top" width="84"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">BOWBAZAR</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"> <div 
class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 2.2in;" valign="top" width="211"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">21 M G ROAD BOWBAZAR</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">21</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 99pt;" valign="top" width="132"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">M G</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 58.5pt;" valign="top" width="78"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">ROAD</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 63pt;" valign="top" width="84"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">BOWBAZAR</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"> <div 
class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 2.2in;" valign="top" width="211"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">SCHORBACHSTRASSE 9</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">9</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 99pt;" valign="top" width="132"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">SCHORBACHSTRASSE</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 58.5pt;" valign="top" width="78"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 63pt;" valign="top" width="84"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 2.2in;" valign="top" width="211"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">SCHORBACH STRASSE 9</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">9</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 99pt;" valign="top" width="132"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">SCHORBACH</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 58.5pt;" valign="top" width="78"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 10pt;">STRASSE</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 63pt;" valign="top" width="84"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> </tr>
</tbody></table><br />
<span style="font-size: 12pt;"> In the first case, we need to match IIT CAMPUS to I I T CAMPUS. We can drop the keyword CAMPUS before matching and then remove the whitespace if what remains consists only of initials. <br />
In the second case, we can adopt the same technique to match MG to M G.<br />
The third case is unique. This address example was picked up from a data file from Germany.<br />
STRASSE is a popular street type in Germany that is often fused with the corresponding street name. One way to handle this: if the street type is STRASSE, combine the street name and street type, and compare this value to the street name of the other record in a probable match pair.</span></div><div class="MsoNormal" style="line-height: normal;"><span style="font-size: 12pt;">The last instance (case # 4) is an example of the city field, which can be tackled using standardization.<br />
<br />
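A minimal sketch of the first three cases, assuming the address has already been parsed into its components. The function names and the DROP_KEYWORDS set are hypothetical, chosen only for illustration; a real solution would more likely drive them from lookup tables.<br />

```python
# Keywords ignored when comparing landmark-like fields (hypothetical list).
DROP_KEYWORDS = {"CAMPUS"}


def collapse_initials(value: str) -> str:
    """Join the tokens when every token is a single letter: 'M G' -> 'MG'."""
    tokens = value.upper().split()
    if tokens and all(len(t) == 1 for t in tokens):
        return "".join(tokens)
    return " ".join(tokens)


def landmark_key(value: str) -> str:
    """Drop ignorable keywords, then collapse initials (case 1)."""
    kept = [t for t in value.upper().split() if t not in DROP_KEYWORDS]
    return collapse_initials(" ".join(kept))


def street_key(street_name: str, street_type: str = "") -> str:
    """Collapse initials (case 2) and fuse STRASSE into the name (case 3)."""
    name = collapse_initials(street_name)
    if street_type.strip().upper() == "STRASSE":
        return name + "STRASSE"
    return name


# Each comparison uses one row pair from the table.
print(landmark_key("IIT CAMPUS") == landmark_key("I I T CAMPUS"))
print(street_key("MG", "ROAD") == street_key("M G", "ROAD"))
print(street_key("SCHORBACHSTRASSE") == street_key("SCHORBACH", "STRASSE"))
```

All three comparisons print True. The keys are used only for matching; the parsed values themselves are left unchanged.<br />
<br />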
Lastly, I have seen many typos that cause matching issues involving compound words. For these I prefer a separate match technique, built on the earlier one in which we compared two strings that each contained a single word. I will briefly discuss this in my next post.</span></div>Tirthankar Ghoshhttp://www.blogger.com/profile/14687942450731369225noreply@blogger.com0tag:blogger.com,1999:blog-7991052372783853348.post-65075805669918530372011-06-20T13:52:00.003+05:302011-06-21T17:47:10.539+05:30Phonetic Similarity<div class="MsoNormal">Consider the following words taken from India:</div><div class="MsoNormal"><br />
</div><table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="border: 1pt solid windowtext; padding: 0in 5.4pt; width: 0.95in;" valign="top" width="91"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">SO<b>F</b>IA</div></td> <td style="border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 81pt;" valign="top" width="108"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">SO<b>PH</b>IA</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.95in;" valign="top" width="91"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>J</b>ENA</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 81pt;" valign="top" width="108"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>X</b>ENA</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.95in;" valign="top" width="91"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>S</b>ANTANU</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 81pt;" valign="top" width="108"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>SH</b>ANTANU</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.95in;" valign="top" width="91"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>B</b>IKA<b>SH</b></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 81pt;" valign="top" width="108"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>V</b>IKA<b>S</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.95in;" valign="top" width="91"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">BAI<b>BH</b>AB</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 81pt;" valign="top" width="108"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">BAI<b>V</b>AB</div></td> </tr>
</tbody></table><div class="MsoNormal"><br />
</div><div class="MsoNormal">The two words in each row should actually match, and the differences within each pair are shown in bold. These spelling differences reflect how people pronounce the words. The issue becomes more convoluted when the native language of the people concerned is not English (i.e. the names are non-English) and the influence of regional languages is evident in the spelling. <br />
There are some standard algorithms to handle such situations. For example, Soundex and Metaphone are two widely known and used algorithms that help bring two similar-sounding names closer. There are many other, even more sophisticated, algorithms to use.</div><div class="MsoNormal">But each of these algorithms has the same issue: they are more or less fixed. We cannot customize their lists and rules. And as I processed data from different countries and regions, I encountered more variations than these typical algorithms cover.<br />
<br />
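To see what "fixed" means in practice, here is a compact sketch of the classic American Soundex rules applied to the name pairs from the table. It is illustrative rather than a reference implementation; the function name soundex and the PAIRS list exist only for this example.<br />

```python
# Consonant groups of classic American Soundex, indexed from 1.
_CODES = {}
for digit, letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for ch in letters:
        _CODES[ch] = str(digit)


def soundex(name: str) -> str:
    """Return the 4-character Soundex code of a name ('' if it has no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]                 # the first letter is always kept as-is
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                  # H and W do not break a run of equal codes
        digit = _CODES.get(ch, "")    # vowels map to "" and break the run
        if digit and digit != prev:
            code += digit
        prev = digit
    return (code + "000")[:4]         # pad or truncate to 4 characters


PAIRS = [("SOFIA", "SOPHIA"), ("JENA", "XENA"), ("SANTANU", "SHANTANU"),
         ("BIKASH", "VIKAS"), ("BAIBHAB", "BAIVAB")]

for left, right in PAIRS:
    print(left, soundex(left), right, soundex(right), soundex(left) == soundex(right))
```

Under these rules SOFIA/SOPHIA, SANTANU/SHANTANU and BAIBHAB/BAIVAB receive the same code, while JENA/XENA and BIKASH/VIKAS do not: Soundex always keeps the first letter unchanged, and it offers no way to declare that J and X, or B and V, are interchangeable in a given region.<br />
<br />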
Let me give you a funny example. I came across a man named DHARMENDER a few years back, and he eventually represented me in a legal matter. When I was going through the initial draft papers of my case, I found that his legal name was DHARMENDRA. Eventually I realized that in a certain region, personal names ending with DRA are often pronounced and written as the same name ending with DER.<br />
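Variations like DRA/DER, together with the pairs in the table above, fit naturally into an editable table of equivalent syllables that is applied longest-first. A minimal sketch follows. The table entries are hypothetical, inferred only from the examples in this post, and build_normalizer is an illustrative name, not the original implementation. It also replaces a syllable wherever it occurs in the value, which is a simplification.<br />

```python
import re

# Hypothetical, editable table: variant syllable -> canonical syllable.
PHONETIC_TABLE = {
    "DER": "DRA",
    "PH": "F",
    "SH": "S",
    "BH": "V",
    "X": "J",
    "B": "V",
}


def build_normalizer(table):
    """Return a function that rewrites every variant syllable in one pass."""
    # Longest syllables first, so that 'BH' is tried before 'B'.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))

    def normalize(value: str) -> str:
        return pattern.sub(lambda m: table[m.group(0)], value.upper())

    return normalize


normalize = build_normalizer(PHONETIC_TABLE)

for left, right in [("SOFIA", "SOPHIA"), ("JENA", "XENA"),
                    ("SANTANU", "SHANTANU"), ("BIKASH", "VIKAS"),
                    ("BAIBHAB", "BAIVAB"), ("DHARMENDER", "DHARMENDRA")]:
    print(left, right, normalize(left) == normalize(right))
```

With this table every pair normalizes to the same value, including JENA/XENA and BIKASH/VIKAS. Adding a newly discovered regional variation only requires a new table entry, not a change to the algorithm.<br />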
</div><div class="MsoNormal">We wrote such phonetically similar syllables in a table sorted by length in decreasing order, so that longer syllables are matched before any shorter ones they contain. Our algorithm simply compared these syllables with the values in the desired field of each record and replaced them accordingly when found. We could edit this table subsequently.</div>Tirthankar Ghoshhttp://www.blogger.com/profile/14687942450731369225noreply@blogger.com0tag:blogger.com,1999:blog-7991052372783853348.post-65006247244101405402011-06-20T11:42:00.000+05:302011-06-20T11:42:28.421+05:30Partial Parsing<br />
<div class="MsoNormal">Again, we are concentrating on address parsing, and especially on the addresses that the earlier parsing techniques failed to parse. Note that handling name parsing through this method would be disastrous; for names, the solution must have an adequate number of name-parsing rules.<br />
For addresses, however, the number of possible variations is huge, and it is practically impossible to have address-parsing rules covering every possible pattern.<br />
<br />
Let us look at an address: SHIKHA TERRACE UNO 22A 27 MAGRI C OPP GAS DEPOT OFF M G RD<br />
<br />
Suppose we use the following lookup tables:</div><div class="MsoNormal"><br />
</div><table border="0" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="padding: 0in 5.4pt; width: 239.4pt;" valign="top" width="319"> <table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 227.85pt;" valign="top" width="304"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>House/Building: Mask Character H</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 227.85pt;" valign="top" width="304"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">TERRACE</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 227.85pt;" valign="top" width="304"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">BLDG</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 227.85pt;" valign="top" width="304"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">DEPOT</div></td> </tr>
</tbody></table></td> <td style="padding: 0in 5.4pt; width: 239.4pt;" valign="top" width="319"> <table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 227.85pt;" valign="top" width="304"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Landmark: Mask Character C</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 227.85pt;" valign="top" width="304"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">OPP</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 227.85pt;" valign="top" width="304"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">NEAR</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 227.85pt;" valign="top" width="304"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">OFF</div></td> </tr>
</tbody></table></td> </tr>
<tr> <td style="padding: 0in 5.4pt; width: 239.4pt;" valign="top" width="319"> <table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 227.85pt;" valign="top" width="304"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Street Type: Mask Character T</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 227.85pt;" valign="top" width="304"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">RD</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 227.85pt;" valign="top" width="304"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">ST</div></td> </tr>
</tbody></table></td> <td style="padding: 0in 5.4pt; width: 239.4pt;" valign="top" width="319"> <table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 227.85pt;" valign="top" width="304"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Apartment: Mask Character A</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 227.85pt;" valign="top" width="304"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">APT</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 227.85pt;" valign="top" width="304"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">UNIT</div></td> </tr>
</tbody></table></td> </tr>
</tbody></table><div class="MsoNormal"><br />
</div><div class="MsoNormal">Suppose that, before parsing, we apply the following cleansing rules to the above address:</div><div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;"><span><span>1.<span style="font: 7pt 'Times New Roman';"> </span></span></span>UNO is replaced by UNIT.</div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;"><span><span>2.<span style="font: 7pt 'Times New Roman';"> </span></span></span>Adjacent numeric characters and initials are separated, so 22A becomes 22 A.</div><div class="MsoNormal">The address then looks like this (note that the token OFF has also been dropped): SHIKHA TERRACE UNIT 22 A 27 MAGRI C OPP GAS DEPOT M G RD<br />
The pattern for this address is therefore UHANINUICUHIIT, with one mask character per token (U denotes an unidentified component, N a number and I an initial). Suppose we do not have any parsing rule defined for this pattern; the address will then remain unparsed.<br />
But one look at the identified pattern is enough to identify (and extract) meaningful information from it. </div><div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;"><span><span>1.<span style="font: 7pt 'Times New Roman';"> </span></span></span>The leading UH represents the building.</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;"><span><span>2.<span style="font: 7pt 'Times New Roman';"> </span></span></span>The sub-pattern AN that comes immediately after UH can be treated as the apartment information.</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;"><span><span>3.<span style="font: 7pt 'Times New Roman';"> </span></span></span>The sub-pattern CUH indicates a landmark.</div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;"><span><span>4.<span style="font: 7pt 'Times New Roman';"> </span></span></span>The sub-pattern IIT represents the street name and type and should be parsed as SST (the two initials being treated as street-name tokens).</div><div class="MsoNormal">Partial parsing is the identification of known sub-patterns within the identified pattern, followed by the extraction of the corresponding components from the original string.</div><br />
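<div class="MsoNormal">The idea can be sketched in a few lines of Python. This is only a minimal illustration, not the actual solution: the names LOOKUP, RULES, mask_token, find_subpatterns and partial_parse are hypothetical, the lookup entries are the ones from the tables above, and the four rules are simply the sub-patterns listed for the example address. It starts from the cleansed address as printed above, tries longer sub-patterns first and never lets two matches overlap, in line with the points noted below.</div>

```python
# Lookup tables from the post: keyword to mask character.
LOOKUP = {
    "TERRACE": "H", "BLDG": "H", "DEPOT": "H",  # house/building
    "OPP": "C", "NEAR": "C", "OFF": "C",        # landmark
    "RD": "T", "ST": "T",                       # street type
    "APT": "A", "UNIT": "A",                    # apartment
}

# Hypothetical partial-parsing rules: known sub-pattern to component label.
RULES = {"UH": "building", "AN": "apartment", "CUH": "landmark", "IIT": "street"}


def mask_token(token):
    """Return the mask character for one token of a cleansed address."""
    if token in LOOKUP:
        return LOOKUP[token]
    if token.isdigit():
        return "N"  # number
    if len(token) == 1 and token.isalpha():
        return "I"  # initial
    return "U"      # unidentified component


def find_subpatterns(pattern, rules):
    """Return non-overlapping (start, sub_pattern) matches, longest rules first."""
    used = [False] * len(pattern)
    matches = []
    for sub in sorted(rules, key=len, reverse=True):
        start = pattern.find(sub)
        while start != -1:
            end = start + len(sub)
            # Matching is done against the ORIGINAL pattern, and a match is
            # accepted only if none of its positions is already consumed.
            # A sub-pattern can therefore never straddle a removed piece.
            if not any(used[start:end]):
                used[start:end] = [True] * len(sub)
                matches.append((start, sub))
            start = pattern.find(sub, start + 1)
    return sorted(matches)


def partial_parse(address, rules):
    """Return (pattern, extracted components, tokens left unparsed)."""
    tokens = address.split()
    # One mask character per token, so pattern index equals token index.
    pattern = "".join(mask_token(t) for t in tokens)
    matches = find_subpatterns(pattern, rules)
    components = [(rules[sub], " ".join(tokens[s:s + len(sub)])) for s, sub in matches]
    covered = {i for s, sub in matches for i in range(s, s + len(sub))}
    leftover = [t for i, t in enumerate(tokens) if i not in covered]
    return pattern, components, leftover
```

<div class="MsoNormal">Under these assumptions, calling partial_parse on SHIKHA TERRACE UNIT 22 A 27 MAGRI C OPP GAS DEPOT M G RD with RULES should give the pattern UHANINUICUHIIT, extract SHIKHA TERRACE as the building, UNIT 22 as the apartment, OPP GAS DEPOT as the landmark and M G RD as the street, and leave the tokens A, 27, MAGRI and C unparsed.</div>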
<div class="MsoNormal">There are a few points to note for partial parsing.</div><ol><li>Several known sub-patterns may be identified from within an identified pattern.</li>
<li>For a given identified pattern ABCDEF, suppose CD is a known sub-pattern. After chopping it off, the remaining pattern is ABEF. Even if BEF is another known sub-pattern, it must not be matched, because the original pattern did not contain BEF as a sub-string. Removing CD leaves two separate sub-strings, AB and EF, and the search must continue within each of these.</li>
<li>Consider the same identified pattern ABCDEF, and suppose our partial parsing rules contain two known sub-patterns, CD and BCD. If CD comes up first in the search, we will chop off CD from the identified pattern, and the sub-pattern BCD, in spite of being present in our solution, will never match. One way to handle this is to sort the partial parsing rules in decreasing order of sub-pattern length.</li>
</ol>Tirthankar Ghoshhttp://www.blogger.com/profile/14687942450731369225noreply@blogger.com0tag:blogger.com,1999:blog-7991052372783853348.post-58462818720189880472011-06-12T14:33:00.000+05:302011-06-12T14:33:18.504+05:30Improvised Parsing Technique<!--[if gte mso 9]><xml> <w:WordDocument> <w:View>Normal</w:View> <w:Zoom>0</w:Zoom> <w:TrackMoves/> <w:TrackFormatting/> <w:PunctuationKerning/> <w:ValidateAgainstSchemas/> <w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid> <w:IgnoreMixedContent>false</w:IgnoreMixedContent> <w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText> <w:DoNotPromoteQF/> <w:LidThemeOther>EN-US</w:LidThemeOther> <w:LidThemeAsian>X-NONE</w:LidThemeAsian> <w:LidThemeComplexScript>X-NONE</w:LidThemeComplexScript> <w:Compatibility> <w:BreakWrappedTables/> <w:SnapToGridInCell/> <w:WrapTextWithPunct/> <w:UseAsianBreakRules/> <w:DontGrowAutofit/> <w:SplitPgBreakAndParaMark/> <w:DontVertAlignCellWithSp/> <w:DontBreakConstrainedForcedTables/> <w:DontVertAlignInTxbx/> <w:Word11KerningPairs/> <w:CachedColBalance/> </w:Compatibility> <m:mathPr> <m:mathFont m:val="Cambria Math"/> <m:brkBin m:val="before"/> <m:brkBinSub m:val="--"/> <m:smallFrac m:val="off"/> <m:dispDef/> <m:lMargin m:val="0"/> <m:rMargin m:val="0"/> <m:defJc m:val="centerGroup"/> <m:wrapIndent m:val="1440"/> <m:intLim m:val="subSup"/> <m:naryLim m:val="undOvr"/> </m:mathPr></w:WordDocument> </xml><![endif]--><!--[if gte mso 9]><xml> <w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="true"
DefSemiHidden="true" DefQFormat="false" DefPriority="99"
LatentStyleCount="267"> <w:LsdException Locked="false" Priority="0" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Normal"/> <w:LsdException Locked="false" Priority="9" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="heading 1"/> <w:LsdException Locked="false" Priority="33" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Book Title"/> <w:LsdException Locked="false" Priority="37" Name="Bibliography"/> <w:LsdException Locked="false" Priority="39" QFormat="true" Name="TOC Heading"/> </w:LatentStyles> </xml><![endif]--> <br />
<div class="MsoNormal">We have already discussed the basic technique that can be used in parsing, and we have seen some of its limitations. The biggest limitation is the effort required to implement (and maintain) this basic approach in a region or country where data is semi-structured. <br />
For example, the number of possible components in an address is well over 15.<br />
Consequently, the number of possible address patterns is huge. In fact, armed with around 30K address patterns, I could not parse more than 60% of Indian addresses during one implementation.<br />
I use address parsing as the example here because it seems to be the most challenging case: addresses come with far more variation than names. Name parsing has its own challenges, though.<br />
<br />
Suppose an address can consist of two blocks: BLK1 and BLK2.<br />
That is, an address might look like:<br />
BLK1 or BLK2 or BLK1||BLK2 or BLK2||BLK1 (|| is the concatenation operator).<br />
Now, there are components within each address block. Suppose these are:</div><div class="MsoNormal"><br />
</div><div class="MsoNormal">There are two blocks viz. BLK1 and BLK2 in an address. BLK1 contains the components with mask characters A, B and C. BLK2 contains the components with mask characters D, E and F. </div><span style="font-family: 'Calibri','sans-serif'; font-size: 11pt; line-height: 115%;">A, B, …, F are the mask characters (including U and I (initials) and N (numbers)).</span><br />
<br />
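<div class="MsoNormal">To get a feel for how quickly whole-address patterns multiply even in this toy setup, here is a minimal Python sketch. It rests on an assumption that is mine and not part of the scheme above: within a block, any non-empty ordered selection of that block's mask characters is treated as a valid pattern. The names <code>block_patterns</code> and <code>address_patterns</code> are hypothetical helpers, and only the mask characters A to F are used.</div>

```python
from itertools import permutations

# Toy setup from the text: BLK1 holds masks A, B, C and BLK2 holds D, E, F.
BLOCKS = {"BLK1": "ABC", "BLK2": "DEF"}


def block_patterns(masks):
    """Return every non-empty ordered selection of a block's mask characters.

    Assumption for illustration only: components may be missing and may
    appear in any order inside their block.
    """
    return [
        "".join(p)
        for size in range(1, len(masks) + 1)
        for p in permutations(masks, size)
    ]


def address_patterns(blocks):
    """Return the patterns for BLK1, BLK2, BLK1||BLK2 and BLK2||BLK1."""
    blk1 = block_patterns(blocks["BLK1"])
    blk2 = block_patterns(blocks["BLK2"])
    patterns = set(blk1) | set(blk2)                  # BLK1 alone, BLK2 alone
    patterns |= {a + b for a in blk1 for b in blk2}   # BLK1||BLK2
    patterns |= {b + a for a in blk1 for b in blk2}   # BLK2||BLK1
    return patterns


if __name__ == "__main__":
    print(len(block_patterns(BLOCKS["BLK1"])))  # patterns inside one block
    print(len(address_patterns(BLOCKS)))        # patterns for a whole address
```

<div class="MsoNormal">Under that assumption, each block has 15 patterns of its own, while a whole address has 15 + 15 + 225 + 225 = 480. That is with only six components; a real address has well over 15.</div>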
<br />
<div class="MsoNormal">Suppose we are creating patterns of length 7 where 4 characters correspond to BLK1 while 3 characters correspond to BLK2.<br />
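There are not more than 3<sup>4</sup> or 81 sub-patterns within BLK1 (four positions, each filled by one of the three mask characters A, B and C) and not more than 3<sup>3</sup> or 27 sub-patterns within BLK2 (three positions, each filled by D, E or F). So, we need to create at most 108 parsing rules.<br />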
But if we use the basic approach that was discussed earlier, we will create a maximum of 6<sup>7</sup> or 279936 possible parsing rules.<br />
Why do we have such a huge difference?<br />
Let us have a look at the following table which lists a few possible patterns in BLK1 and BLK2:</div><div class="MsoNormal"><br />
</div><table border="0" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="padding: 0in 5.4pt; width: 126.9pt;" valign="top" width="169"> <table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 115.35pt;" valign="top" width="154"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Pattern in BLK1</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 115.35pt;" valign="top" width="154"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">ABCA</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 115.35pt;" valign="top" width="154"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">ABCC</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 115.35pt;" valign="top" width="154"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">CBAB</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 115.35pt;" valign="top" width="154"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">BACB</div></td> </tr>
</tbody></table></td> <td style="padding: 0in 5.4pt; width: 319.5pt;" valign="top" width="426"> <table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none; margin-left: 58.85pt;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 115.35pt;" valign="top" width="154"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Pattern in BLK2</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 115.35pt;" valign="top" width="154"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">DEF</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 115.35pt;" valign="top" width="154"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">DDF</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 115.35pt;" valign="top" width="154"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">EDF</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 115.35pt;" valign="top" width="154"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">DFF</div></td> </tr>
</tbody></table></td> </tr>
</tbody></table><div class="MsoNormal"><br />
When these are combined into BLK1||BLK2 or BLK2||BLK1 they form 16 + 16 = 32 patterns.<br />
That is, we need to create only 8 parsing rules (basically these are sub-parsing rules, 4 for each block) to cover 32 full patterns.<br />
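The arithmetic above can be checked with a short sketch. It assumes only the four BLK1 and four BLK2 sub-patterns listed in the tables; the variable names are illustrative and not part of any tool discussed here.<br />

```python
from itertools import product

# Sub-patterns taken from the two tables above.
BLK1 = ["ABCA", "ABCC", "CBAB", "BACB"]
BLK2 = ["DEF", "DDF", "EDF", "DFF"]

# A full pattern is one BLK1 sub-pattern joined to one BLK2 sub-pattern,
# in either order (BLK1||BLK2 or BLK2||BLK1).
blk1_first = [a + b for a, b in product(BLK1, BLK2)]
blk2_first = [b + a for a, b in product(BLK1, BLK2)]
full_patterns = set(blk1_first) | set(blk2_first)

sub_rules = len(BLK1) + len(BLK2)  # rules that actually have to be written
basic_rules = 6 ** 7               # upper bound for the basic approach: 6 mask characters, length 7

print(sub_rules, len(full_patterns), basic_rules)
```

Run as is, this prints 8 32 279936: eight sub-rules are enough for 32 full patterns, against an upper bound of 279936 rules when every pattern of length 7 is handled separately.<br />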
Wow! This is just wonderful. But let us stop here. Does this improvisation cover everything?<br />
Unfortunately not. Let us consider the pattern: ADBEFCC<br />
Note that in this pattern, the characters of BLK1 and BLK2 are interleaved rather than separated. So, this pattern will not be covered if we stick to the same definition of BLK1 and BLK2.<br />
<u>In reality, we select the blocks (BLK1 and BLK2 here) so that such patterns are as unlikely as possible</u>.<br />
But eventually, we will be left with some unparsed addresses. How do we handle those? We will see that shortly.<br />
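One way to see which patterns fall through is sketched below. It assumes that BLK1 uses only the mask characters A, B, C and BLK2 only D, E, F (as in the tables above); the function name and its parameters are hypothetical.<br />

```python
BLK1_CHARS = set("ABC")  # mask characters assumed to belong to BLK1
BLK2_CHARS = set("DEF")  # mask characters assumed to belong to BLK2


def split_into_blocks(pattern, len1=4, len2=3):
    """Return (blk1, blk2) if the pattern is BLK1||BLK2 or BLK2||BLK1, else None."""
    if len(pattern) != len1 + len2:
        return None
    # Try BLK1 followed by BLK2.
    head1, tail2 = pattern[:len1], pattern[len1:]
    if set(head1).issubset(BLK1_CHARS) and set(tail2).issubset(BLK2_CHARS):
        return head1, tail2
    # Try BLK2 followed by BLK1.
    head2, tail1 = pattern[:len2], pattern[len2:]
    if set(head2).issubset(BLK2_CHARS) and set(tail1).issubset(BLK1_CHARS):
        return tail1, head2
    # The two blocks are interleaved: no sub-parsing rule applies.
    return None


for candidate in ["ABCADEF", "DEFABCA", "ADBEFCC"]:
    print(candidate, split_into_blocks(candidate))
```

The first two patterns split cleanly into their blocks, while ADBEFCC returns None and would be left for whatever handling is chosen for unparsed addresses.<br />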
In the meantime, we will see how the other limitation can be handled. Earlier, we gave a few examples of situations where a word or abbreviation can have different meanings in different addresses (or names). For example, the word MD at the end of a name string may denote a job title. It may also be an abbreviation for the word MOHAMMED. Let us see one such example in an address. Suppose we are using the mask character T to denote a street type and U to denote an unidentified word. If we try to parse addresses from a region where the street name is followed by the street type, we will encounter sub-patterns of the form UT or UUT etc. In such cases, the U (or the UU) refers to the street name and T points to the street type. But we may as well get TUT as a sub-pattern. The initial T here points to a street type, which is followed by what is possibly a street name, and then another street type appears. This looks a bit unusual. In reality, the first T may point to a word that has a typographical error, or it can be a situation like ST PETER AVENUE. Note that the first T points to the word ST, which is a popular abbreviation for STREET, but in this case it is probably an abbreviation for the word SAINT. Consequently, TUT should have the same rule as UUT. The lesson is that something that looks like a sure variation from the general convention may not be a variation at all. So, we need to consider the relative position of a mask character in the pattern before setting a parsing rule corresponding to it. This applies to either of the approaches for parsing.<br />
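A minimal sketch of such a position-aware rule follows. It uses the masks T (street type) and U (unidentified word) from the text; the rule itself (a leading T is re-read as U when the sub-pattern also ends in T) and the function name are illustrative assumptions, not a complete solution.<br />

```python
def resolve_leading_street_type(pattern):
    """Re-mask a leading T as U when another T closes the same sub-pattern.

    T = street type, U = unidentified word. A sub-pattern of the form
    T, one or more U, T is treated as if the first T were part of the
    street name, so TUT gets the same rule as UUT.
    """
    if len(pattern) >= 3 and pattern[0] == "T" and pattern[-1] == "T":
        middle = pattern[1:-1]
        if set(middle) == {"U"}:
            return "U" + middle + "T"
    return pattern


# ST PETER AVENUE is masked as TUT before the position check.
tokens = ["ST", "PETER", "AVENUE"]
mask = resolve_leading_street_type("TUT")
street_name = " ".join(t for t, m in zip(tokens, mask) if m == "U")
street_type = " ".join(t for t, m in zip(tokens, mask) if m == "T")
print(mask, "|", street_name, "|", street_type)
```

Here ST ends up in the street name (ST PETER) and only AVENUE is kept as the street type, while ordinary sub-patterns such as UT and UUT pass through unchanged.<br />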
The last limitation that is encountered during parsing and discussed earlier is the presence of compound words in names (personal names or otherwise).<br />
Let us discuss this (or the possible solution to this) using the same example that was given earlier.</div><div class="MsoNormal"><br />
</div><table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>First Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Middle Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Last Name</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">ATULPRASAD SEN</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">ATULPRASAD</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">SEN</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">ATUL PRASAD SEN</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">ATUL</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">PRASAD</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">SEN</div></td> </tr>
</tbody></table><div class="MsoNormal"><br />
</div><div class="MsoNormal">This issue of compound words surfaces during parsing but actually impacts matching or identity resolution. In this case, it is highly probable that neither the first names nor the middle names will match (even after using cross matching).<br />
During matching, we can use another value, viz. first name||middle name. Note that this is a derived field for the given name. The above two records will agree on this new field. Note that such solutions are purely ad hoc and the exact nature of such solutions varies from case to case.</div>Tirthankar Ghoshhttp://www.blogger.com/profile/14687942450731369225noreply@blogger.com0tag:blogger.com,1999:blog-7991052372783853348.post-56343833545307801422011-06-10T12:36:00.003+05:302011-06-12T15:42:23.906+05:30Limitations in the Parsing approach (updated on 11th. June 2011)<div class="MsoNormal">In the earlier post, we saw how parsing can be automated (with the help of name parsing). Why did we use name parsing as an example? Well… I used it because a name string (usually) has fewer possible components than an address string.<br />
It can be appreciated that the total number of possible combinations increases with the number of possible components. <br />
Let us get an idea of the magnitude of such possible combinations.<br />
Suppose the original string has n tokens and there are m possible components. <br />
We also assume that a valid string will have a minimum of p components.<br />
<u>Case1</u>: Each component appears only once in the string (and m > n).<br />
Here, the total number of possible patterns is: m!/(m - n)! <br />
<u>Case2</u>: Each component can be repeated. <br />
Here, the total number of possible patterns is: m<sup>n</sup></div><div class="MsoNormal">So, we see that in either case, for fixed n, the total number of possible patterns increases with m.<br />
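The two counts can be sketched in a few lines, assuming Python 3.8 or later for math.perm. Case 1 is computed as the number of ordered arrangements of n distinct components chosen from m, i.e. m!/(m - n)!, and Case 2 as m<sup>n</sup>; both are cross-checked by brute-force enumeration on a small example. The function names are illustrative.<br />

```python
from itertools import permutations, product
from math import perm  # available from Python 3.8


def count_without_repetition(m, n):
    """Ordered patterns of n tokens drawn from m components, none repeated: m!/(m-n)!."""
    return perm(m, n)


def count_with_repetition(m, n):
    """Ordered patterns of n tokens when any of the m components may repeat: m**n."""
    return m ** n


# Brute-force check on a small case: m = 4 components, n = 3 tokens.
components = "ABCD"
assert count_without_repetition(4, 3) == len(list(permutations(components, 3)))
assert count_with_repetition(4, 3) == len(list(product(components, repeat=3)))

# For a fixed n, both counts grow with m.
for m in range(3, 8):
    print(m, count_without_repetition(m, 3), count_with_repetition(m, 3))
```

With n fixed at 3, both columns increase as m goes from 3 to 7, which is the growth described above.<br />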
Well… in reality, neither of the above cases applies in full. There are complex relationships among the components, though such relationships are probabilistic in nature. For example, in the US a street type usually appears after a street name.</div><div class="MsoNormal">So, we see the first limitation of this kind of automation: the number of possible patterns is potentially high. In fact, in developing countries, i.e. where the addressing conventions are not standardized, the number of address components (and thus patterns) is very high. To give an idea, even 50K different patterns were not sufficient to process address data from India.<br />
A high number of patterns impacts an automated solution in two ways: </div><div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;">1.<span style="font: 7pt "Times New Roman";"> </span> It takes more effort to develop the vocabulary and the table containing the parsing rules.</div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;">2.<span style="font: 7pt "Times New Roman";"> </span>A large vocabulary and a large number of entries in the table containing the parsing rules lengthen the processing window for the automated solution.</div><div class="MsoNormal">Apart from the above, there is one more limitation in the parsing approach described earlier.<br />
Let us consider the initial A in an address string. Usually, it is an abbreviation for the word APARTMENT and is used as an apartment identifier. But many times it denotes something else. The initial A might be a part of the house number (or street number), block or even a street name.<br />
Similarly, the token KUMAR may be a person’s first name or middle name or last name. <br />
The word MD at the end of a name string may denote a job title. It may also be an abbreviation for the word MOHAMMED.<br />
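The effect of such ambiguous tokens on the number of candidate patterns can be sketched as follows. The vocabulary, the mask letters (F, M, L, J) and the sample name are hypothetical and chosen only for illustration.<br />

```python
from itertools import product

# Hypothetical vocabulary: each token maps to every mask character it may stand for.
# The mask letters are illustrative: F = first name, M = middle name,
# L = last name, J = job title.
VOCABULARY = {
    "KUMAR": ["F", "M", "L"],  # first, middle or last name
    "MD": ["F", "J"],          # MOHAMMED (a first name) or a job title
}


def candidate_patterns(tokens, unknown="U"):
    """List every mask pattern the token sequence could produce."""
    options = [VOCABULARY.get(token, [unknown]) for token in tokens]
    return ["".join(choice) for choice in product(*options)]


patterns = candidate_patterns(["MD", "ANIL", "KUMAR"])
print(len(patterns), patterns)
```

A three-token string with two ambiguous tokens already yields six candidate patterns, so a single mask per token is not enough to pick the parsing rule.<br />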
<br />
<br />
<div class="MsoNormal">One more issue that is seen during parsing (though it is actually an issue in matching) is the use of compound words in names (personal names as well as names of streets, cities, localities, companies etc.). Let us give one example with a name:</div><div class="MsoNormal"><br />
</div><table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>First Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Middle Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Last Name</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">ATULPRASAD SEN</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">ATULPRASAD</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">SEN</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">ATUL PRASAD SEN</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">ATUL</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">PRASAD</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">SEN</div></td> </tr>
</tbody></table><span style="font-family: "Calibri","sans-serif"; font-size: 11pt; line-height: 115%;"><br />
These names are taken from Indian data. Clearly, the two names are the same, but the space between ATUL and PRASAD in the given name creates a big issue here. Even with soft-key based matching, we end up with much less than 100% comparison probability both for the first name comparison and for the first name cross middle name comparison between these two records.<br />
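As a rough illustration of that drop, here is a minimal Python sketch. It uses difflib.SequenceMatcher as a stand-in character-level comparison; it is not the soft-key method discussed in this blog, and the similarity helper and the record layout are assumptions made only for this example.<br />

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1].

    Stand-in measure for illustration only; not a soft-key comparison.
    """
    return SequenceMatcher(None, a, b).ratio()


# The two parsed records from the table above (hypothetical field names).
rec1 = {"first": "ATULPRASAD", "middle": "", "last": "SEN"}
rec2 = {"first": "ATUL", "middle": "PRASAD", "last": "SEN"}

# First name compared with first name.
first_vs_first = similarity(rec1["first"], rec2["first"])

# First name of record 1 compared with middle name of record 2 (cross comparison).
first_vs_middle = similarity(rec1["first"], rec2["middle"])

print(f"first vs first : {first_vs_first:.2f}")
print(f"first vs middle: {first_vs_middle:.2f}")
```

Under this stand-in measure, neither comparison reaches 1.0.<br />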
But one look at the given names in these records tells us that these should match with a very high probability (close to 100%, if not 100%).</span></div><div class="MsoNormal"><span style="font-family: "Calibri","sans-serif"; font-size: 11pt; line-height: 115%;"> </span>In the next post, we will see how the parsing approach can be improvised to address these limitations.</div>Tirthankar Ghoshhttp://www.blogger.com/profile/14687942450731369225noreply@blogger.com0tag:blogger.com,1999:blog-7991052372783853348.post-53276889987762947722011-06-09T15:30:00.000+05:302011-06-09T15:30:49.439+05:30Discovering the hidden dimensions - Parsing
<br />
<div class="MsoNormal">So far we have discussed matching, or record linkage. I have said that if n fields from a set of records are used in matching, then these records can be considered as points in an n-dimensional space. <br />
Suppose a record has n fields to start with. Parsing is a process that splits these n fields into m fields, where m > n. In other words, parsing increases the granularity of a record. <br />
For example, a record might come in with a single name field. The parsing process may generate additional fields such as Title, First Name, Middle Name, Last Name Prefix, Last Name and Suffix.<br />
Similarly, an address field might be split into multiple granular fields. <br />
<br />
Let us look at the following examples (three names from a US file) first and then, we will see how parsing is done when we review records manually.</div><div class="MsoNormal"><br />
</div><table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none; width: 643px;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 135.9pt;" valign="top" width="181"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Title</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 85.5pt;" valign="top" width="114"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>First Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 76.5pt;" valign="top" width="102"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Middle Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 76.5pt;" valign="top" width="102"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Last Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 
58.5pt;" valign="top" width="78"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Suffix</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 135.9pt;" valign="top" width="181"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">ROBERT CANNING</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 85.5pt;" valign="top" width="114"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">ROBERT</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 76.5pt;" valign="top" width="102"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 76.5pt;" valign="top" width="102"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">CANNING</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 58.5pt;" valign="top" width="78"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 135.9pt;" valign="top" width="181"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">MR. STUART RODGER BINNY</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">MR.</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 85.5pt;" valign="top" width="114"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">STUART</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 76.5pt;" valign="top" width="102"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">RODGER</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 76.5pt;" valign="top" width="102"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">BINNY</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 58.5pt;" valign="top" width="78"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 135.9pt;" valign="top" width="181"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">ARNOLD JONES SR.</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 85.5pt;" valign="top" width="114"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">ARNOLD</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 76.5pt;" valign="top" width="102"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 76.5pt;" valign="top" width="102"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">JONES</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 58.5pt;" valign="top" width="78"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">SR.</div></td> </tr>
</tbody></table><div class="MsoNormal"><br />
If we check the first record, we find that </div><div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;"><span><span>1.<span style="font: 7pt "Times New Roman";"> </span></span></span>ROBERT is a standard given name and CANNING is a standard last name.<span> </span></div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;"><span><span>2.<span style="font: 7pt "Times New Roman";"> </span></span></span>General convention says, last name is written after the first name. </div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;"><span><span>3.<span style="font: 7pt "Times New Roman";"> </span></span></span>Our conclusion is ROBERT is the first name and CANNING is the last name</div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;"><br />
</div><div class="MsoNormal">For the second record, </div><div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;"><span><span>1.<span style="font: 7pt "Times New Roman";"> </span></span></span>We immediately identify MR. as a title and both STUART and RODGER as given names, while BINNY remains unidentified.</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;"><span><span>2.<span style="font: 7pt "Times New Roman";"> </span></span></span>By general convention, the title precedes the first name, and the middle name is usually written between the first name and the last name.</div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;"><span><span>3.<span style="font: 7pt "Times New Roman";"> </span></span></span>Following this convention, the unidentified word (or token) in the final position is most likely the last name. <br />
Two given names follow the title; the first is the first name and the second is the middle name. So the complete parsing is: MR. goes in the title field, STUART in the first name field, RODGER in the middle name field and BINNY in the last name field.</div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;"><br />
</div><div class="MsoNormal">As for the third record,</div><div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;"><span><span>1.<span style="font: 7pt "Times New Roman";"> </span></span></span>ARNOLD is a standard given name, JONES is a standard last name and SR. is a standard name suffix.</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;"><span><span>2.<span style="font: 7pt "Times New Roman";"> </span></span></span>By general convention, the last name follows the first name, and the suffix follows the last name. </div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;"><span><span>3.<span style="font: 7pt "Times New Roman";"> </span></span></span>Our conclusion is that ARNOLD is the first name, JONES is the last name and SR. is the suffix.</div><div class="MsoNormal">From these examples, we see that we use two rules for name parsing.</div><div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;"><span><span>1.<span style="font: 7pt "Times New Roman";"> </span></span></span>First, we identify each word or token in the name as one of the name components.</div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;"><span><span>2.<span style="font: 7pt "Times New Roman";"> </span></span></span>We also use the general conventions of writing names.</div><div class="MsoNormal">Note that both these rules depend on the region from which the names are taken.<br />
<br />
Armed with this idea, let us see how automation can be used to do name parsing.<br />
Let us use the following four tables for the automation.<br />
<br />
</div><table border="0" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"> <table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 77.4pt;" valign="top" width="103"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Title</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 77.4pt;" valign="top" width="103"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">MR.</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 77.4pt;" valign="top" width="103"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">MRS.</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 77.4pt;" valign="top" width="103"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">MS.</div></td> </tr>
</tbody></table><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> <td style="padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"> <table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 77.4pt;" valign="top" width="103"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Given Name</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 77.4pt;" valign="top" width="103"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">ROBERT</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 77.4pt;" valign="top" width="103"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">STUART</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 77.4pt;" valign="top" width="103"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">RODGER</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 77.4pt;" valign="top" width="103"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">ARNOLD</div></td> </tr>
</tbody></table></td> <td style="padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"> <table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 77.4pt;" valign="top" width="103"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Last Name</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 77.4pt;" valign="top" width="103"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">CANNING</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 77.4pt;" valign="top" width="103"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">JONES</div></td> </tr>
</tbody></table></td> <td style="padding: 0in 5.4pt; width: 119.7pt;" valign="top" width="160"> <table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 77.4pt;" valign="top" width="103"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Suffix</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 77.4pt;" valign="top" width="103"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">SR.</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 77.4pt;" valign="top" width="103"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">JR.</div></td> </tr>
</tbody></table></td> </tr>
</tbody></table><div class="MsoNormal">For each record, we look up every token of the name in the four tables above, checking the tables in the order in which they appear from left to right. We mark a token that matches the title table with T, the given name table with G, the last name table with L and the suffix table with S. We use the symbol U to mark any unidentified token.<br />
<br />
Once such evaluation is done, we get the patterns as displayed in the following table:</div><div class="MsoNormal"><br />
</div><table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 1.95in;" valign="top" width="187"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 94.5pt;" valign="top" width="126"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Identified Pattern</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 1.95in;" valign="top" width="187"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">ROBERT CANNING</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 94.5pt;" valign="top" width="126"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">GL</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 1.95in;" valign="top" width="187"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">MR. STUART RODGER BINNY</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 94.5pt;" valign="top" width="126"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">TGGU</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 1.95in;" valign="top" width="187"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">ARNOLD JONES SR.</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 94.5pt;" valign="top" width="126"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">GLS</div></td> </tr>
</tbody></table><div class="MsoNormal"><br />
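The pattern identification step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vocabulary holds just the entries shown in the four tables above, and the names VOCABULARY and identify_pattern are hypothetical, not part of any product.

```python
# Hypothetical vocabulary: only the entries from the four tables above,
# listed in the left-to-right order in which the tables are checked.
VOCABULARY = [
    ("T", {"MR.", "MRS.", "MS."}),                    # title table
    ("G", {"ROBERT", "STUART", "RODGER", "ARNOLD"}),  # given name table
    ("L", {"CANNING", "JONES"}),                      # last name table
    ("S", {"SR.", "JR."}),                            # suffix table
]


def identify_pattern(name):
    """Return one mask character per whitespace-separated token of the name."""
    mask = []
    for token in name.upper().split():
        for symbol, table in VOCABULARY:
            if token in table:
                mask.append(symbol)
                break
        else:
            # No table contained the token, so it is marked as unidentified.
            mask.append("U")
    return "".join(mask)


for name in ("ROBERT CANNING", "MR. STUART RODGER BINNY", "ARNOLD JONES SR."):
    print(name, "->", identify_pattern(name))
```

Because the tables are checked in a fixed order, a token that appears in more than one table always receives the symbol of the first table that contains it.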
</div><div class="MsoNormal">Once the patterns are identified, we need a rule for each pattern that tells us how that pattern is to be parsed. This approach lets us parse all names that share a pattern with a single rule.<br />
To create these rules (we need three for now), we require a few more symbols. Let T denote a title, F a first name, M a middle name, L a last name and S a suffix.<br />
We now build the following name parsing rules using the general conventions of writing names in the US:</div><div class="MsoNormal"><br />
</div><table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 94.5pt;" valign="top" width="126"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Identified Pattern</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 94.5pt;" valign="top" width="126"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Parsing Rule</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 94.5pt;" valign="top" width="126"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">GL</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 94.5pt;" valign="top" width="126"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">FL</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 94.5pt;" valign="top" width="126"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">TGGU</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 94.5pt;" valign="top" width="126"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">TFML</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 94.5pt;" valign="top" width="126"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">GLS</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 94.5pt;" valign="top" width="126"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">FLS</div></td> </tr>
</tbody></table><span style="font-family: "Calibri","sans-serif"; font-size: 11pt; line-height: 115%;"><br />
Using the four tables to identify tokens (viz. the title table, the given name table, the last name table and the suffix table) along with the table of parsing rules, we can automate name parsing to generate the results mentioned earlier.<br />
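Putting the token tables and the rule table together, a minimal end-to-end sketch might look like the following. The table contents come from this post; the Python names (VOCABULARY, PARSING_RULES, FIELD_NAMES, mask_character, parse_name) and the choice to return None for a pattern without a rule are assumptions made for illustration.

```python
# Token tables and parsing rules copied from this post.
# All Python names here are hypothetical.
VOCABULARY = {
    "T": {"MR.", "MRS.", "MS."},
    "G": {"ROBERT", "STUART", "RODGER", "ARNOLD"},
    "L": {"CANNING", "JONES"},
    "S": {"SR.", "JR."},
}
PARSING_RULES = {"GL": "FL", "TGGU": "TFML", "GLS": "FLS"}
FIELD_NAMES = {
    "T": "title",
    "F": "first_name",
    "M": "middle_name",
    "L": "last_name",
    "S": "suffix",
}


def mask_character(token):
    # Dicts preserve insertion order, so the tables are checked T, G, L, S.
    for symbol, table in VOCABULARY.items():
        if token in table:
            return symbol
    return "U"


def parse_name(name):
    """Map each token to an output field, or return None if no rule applies."""
    tokens = name.upper().split()
    pattern = "".join(mask_character(token) for token in tokens)
    rule = PARSING_RULES.get(pattern)
    if rule is None:
        return None  # the rule table has no entry for this pattern yet
    # A rule has one symbol per token, so the two sequences line up.
    return {FIELD_NAMES[symbol]: token for symbol, token in zip(rule, tokens)}


print(parse_name("MR. STUART RODGER BINNY"))
```

Names whose pattern has no rule are returned as None here so that they can be reviewed and used to extend the rule table.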
<br />
<br />
<br />
For reference, we will call the tables used to identify tokens the vocabulary, and the symbols used to represent the tokens matched to any such table (including U) the mask characters.<br />
To parse more names, we need to identify more tokens correctly. That is, we need to add more entries to the tables in our vocabulary. As a result, we will identify more patterns, and to process those we need more entries in our rule table.<br />
<br />
As discussed earlier, name parsing depends largely on the customs and conventions of writing names in the underlying region or country, so different regions call for different name components. For example, many names in Mexico have a last name prefix field. There are countries where names have two last name fields. Sometimes you will encounter multiple names separated by a delimiter in the name field, for example MR. & MRS. CLARK. One way to handle such data is to break the original record into two records, each with a different name, while keeping the remaining information the same on both records.<br />
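The record-splitting idea can be sketched as follows. This is a rough heuristic under stated assumptions, not a complete solution: the record is a plain dictionary, the field names "name" and "city" and the function split_joint_name are hypothetical, and tokens that the first name lacks (such as the shared last name) are borrowed from the end of the second name.

```python
def split_joint_name(record, delimiter="&"):
    """Split a record whose name field holds two names into two records."""
    name = record["name"]
    if delimiter not in name:
        return [dict(record)]

    left, right = (part.strip() for part in name.split(delimiter, 1))
    left_tokens = left.split()
    right_tokens = right.split()

    # Assumption: trailing tokens of the second name that have no counterpart
    # in the first name (e.g. CLARK) are shared by both people.
    shared = right_tokens[len(left_tokens):]
    names = [" ".join(left_tokens + shared), " ".join(right_tokens)]

    # Every other field is copied unchanged onto both new records.
    return [{**record, "name": split_name} for split_name in names]


# "city" is a made-up field standing in for the remaining information.
for row in split_joint_name({"name": "MR. & MRS. CLARK", "city": "KOLKATA"}):
    print(row)
```

Each resulting record can then be parsed on its own with the vocabulary and the parsing rules.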
<br />
We can easily use a similar technique to automate address parsing or the parsing of any other field.<br />
<br />
</span>Tirthankar Ghoshhttp://www.blogger.com/profile/14687942450731369225noreply@blogger.com0tag:blogger.com,1999:blog-7991052372783853348.post-15194256312516612112011-06-06T11:49:00.000+05:302011-06-06T11:49:11.011+05:30Constructing the Survivor Record<!--[if gte mso 9]><xml> <w:WordDocument> <w:View>Normal</w:View> <w:Zoom>0</w:Zoom> <w:TrackMoves/> <w:TrackFormatting/> <w:PunctuationKerning/> <w:ValidateAgainstSchemas/> <w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid> <w:IgnoreMixedContent>false</w:IgnoreMixedContent> <w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText> <w:DoNotPromoteQF/> <w:LidThemeOther>EN-US</w:LidThemeOther> <w:LidThemeAsian>X-NONE</w:LidThemeAsian> <w:LidThemeComplexScript>X-NONE</w:LidThemeComplexScript> <w:Compatibility> <w:BreakWrappedTables/> <w:SnapToGridInCell/> <w:WrapTextWithPunct/> <w:UseAsianBreakRules/> <w:DontGrowAutofit/> <w:SplitPgBreakAndParaMark/> <w:DontVertAlignCellWithSp/> <w:DontBreakConstrainedForcedTables/> <w:DontVertAlignInTxbx/> <w:Word11KerningPairs/> <w:CachedColBalance/> </w:Compatibility> <m:mathPr> <m:mathFont m:val="Cambria Math"/> <m:brkBin m:val="before"/> <m:brkBinSub m:val="--"/> <m:smallFrac m:val="off"/> <m:dispDef/> <m:lMargin m:val="0"/> <m:rMargin m:val="0"/> <m:defJc m:val="centerGroup"/> <m:wrapIndent m:val="1440"/> <m:intLim m:val="subSup"/> <m:naryLim m:val="undOvr"/> </m:mathPr></w:WordDocument> </xml><![endif]--><!--[if gte mso 9]><xml> <w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="true"
DefSemiHidden="true" DefQFormat="false" DefPriority="99"
LatentStyleCount="267"> <w:LsdException Locked="false" Priority="0" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Normal"/> <w:LsdException Locked="false" Priority="9" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="heading 1"/> <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 2"/> <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 3"/> <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 4"/> <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 5"/> <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 6"/> <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 7"/> <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 8"/> <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 9"/> <w:LsdException Locked="false" Priority="39" Name="toc 1"/> <w:LsdException Locked="false" Priority="39" Name="toc 2"/> <w:LsdException Locked="false" Priority="39" Name="toc 3"/> <w:LsdException Locked="false" Priority="39" Name="toc 4"/> <w:LsdException Locked="false" Priority="39" Name="toc 5"/> <w:LsdException Locked="false" Priority="39" Name="toc 6"/> <w:LsdException Locked="false" Priority="39" Name="toc 7"/> <w:LsdException Locked="false" Priority="39" Name="toc 8"/> <w:LsdException Locked="false" Priority="39" Name="toc 9"/> <w:LsdException Locked="false" Priority="35" QFormat="true" Name="caption"/> <w:LsdException Locked="false" Priority="10" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Title"/> <w:LsdException Locked="false" Priority="1" Name="Default Paragraph Font"/> <w:LsdException Locked="false" Priority="11" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtitle"/> <w:LsdException Locked="false" Priority="22" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Strong"/> <w:LsdException Locked="false" Priority="20" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Emphasis"/> <w:LsdException Locked="false" Priority="59" SemiHidden="false"
UnhideWhenUsed="false" Name="Table Grid"/> <w:LsdException Locked="false" UnhideWhenUsed="false" Name="Placeholder Text"/> <w:LsdException Locked="false" Priority="1" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="No Spacing"/> <w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading"/> <w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List"/> <w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid"/> <w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1"/> <w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2"/> <w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1"/> <w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2"/> <w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1"/> <w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2"/> <w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3"/> <w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List"/> <w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading"/> <w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List"/> <w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid"/> <w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 1"/> <w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 1"/> <w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 1"/> <w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 1"/> <w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 1"/> <w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 1"/> <w:LsdException Locked="false" UnhideWhenUsed="false" Name="Revision"/> <w:LsdException Locked="false" Priority="34" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="List Paragraph"/> <w:LsdException Locked="false" Priority="29" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Quote"/> <w:LsdException Locked="false" Priority="30" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Quote"/> <w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 1"/> <w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 1"/> <w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 1"/> <w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 1"/> <w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 1"/> <w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 1"/> <w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 1"/> <w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 1"/> <w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 2"/> <w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 2"/> <w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 2"/> <w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 2"/> <w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 2"/> <w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 2"/> <w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 2"/> <w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 2"/> <w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 2"/> <w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 2"/> <w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 2"/> <w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 2"/> <w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 2"/> <w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 2"/> <w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 3"/> <w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 3"/> <w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 3"/> <w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 3"/> <w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 3"/> <w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 3"/> <w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 3"/> <w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 3"/> <w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 3"/> <w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 3"/> <w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 3"/> <w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 3"/> <w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 3"/> <w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 3"/> <w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 4"/> <w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 4"/> <w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 4"/> <w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 4"/> <w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 4"/> <w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 4"/> <w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 4"/> <w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 4"/> <w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 4"/> <w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 4"/> <w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 4"/> <w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 4"/> <w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 4"/> <w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 4"/> <w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 5"/> <w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 5"/> <w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 5"/> <w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 5"/> <w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 5"/> <w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 5"/> <w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 5"/> <w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 5"/> <w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 5"/> <w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 5"/> <w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 5"/> <w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 5"/> <w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 5"/> <w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 5"/> <w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 6"/> <w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 6"/> <w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 6"/> <w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 6"/> <w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 6"/> <w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 6"/> <w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 6"/> <w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 6"/> <w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 6"/> <w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 6"/> <w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 6"/> <w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 6"/> <w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 6"/> <w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 6"/> <w:LsdException Locked="false" Priority="19" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtle Emphasis"/> <w:LsdException Locked="false" Priority="21" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Emphasis"/> <w:LsdException Locked="false" Priority="31" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtle Reference"/> <w:LsdException Locked="false" Priority="32" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Reference"/> <w:LsdException Locked="false" Priority="33" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Book Title"/> <w:LsdException Locked="false" Priority="37" Name="Bibliography"/> <w:LsdException Locked="false" Priority="39" QFormat="true" Name="TOC Heading"/> </w:LatentStyles> </xml><![endif]--><!--[if gte mso 10]> <style>
</style> <![endif]--> <br />
<div class="MsoNormal">So far, we have discussed record matching (record linking) at some length. At the end of that process, we get a few groups of matching records alongside a set of unmatched records.<br />
Let us look at the following example:</div><div class="MsoNormal"><br />
</div><table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 0.45in;" valign="top" width="43"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Id.</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 81pt;" valign="top" width="108"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Record</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.45in;" valign="top" width="43"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 81pt;" valign="top" width="108"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Record1</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.45in;" valign="top" width="43"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">2</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 81pt;" valign="top" width="108"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Record2</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.45in;" valign="top" width="43"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">3</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 81pt;" valign="top" width="108"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Record3</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.45in;" valign="top" width="43"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">4</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 81pt;" valign="top" width="108"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Record4</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.45in;" valign="top" width="43"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">5</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 81pt;" valign="top" width="108"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Record5</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.45in;" valign="top" width="43"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">6</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 81pt;" valign="top" width="108"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Record6</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.45in;" valign="top" width="43"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">7</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 81pt;" valign="top" width="108"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Record7</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.45in;" valign="top" width="43"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">8</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 81pt;" valign="top" width="108"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Record8</div></td> </tr>
</tbody></table><div class="MsoNormal"><br />
</div><div class="MsoNormal"></div><div class="MsoNormal">Suppose that, after matching, record1, record4 and record6 match one another, record3 and record5 match each other, and record2, record7 and record8 are unmatched.<br />
This is expressed by assigning the same identifier to the matching records as in the following table:</div><div class="MsoNormal"><br />
</div><table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 63.9pt;" valign="top" width="85"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Master Id.</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 27pt;" valign="top" width="36"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Id.</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 66pt;" valign="top" width="88"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Record</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 63.9pt;" valign="top" width="85"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 27pt;" valign="top" width="36"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 66pt;" valign="top" width="88"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Record1</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 63.9pt;" valign="top" width="85"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">3</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 27pt;" valign="top" width="36"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">2</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 66pt;" valign="top" width="88"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Record2</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 63.9pt;" valign="top" width="85"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">2</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 27pt;" valign="top" width="36"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">3</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 66pt;" valign="top" width="88"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Record3</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 63.9pt;" valign="top" width="85"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 27pt;" valign="top" width="36"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">4</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 66pt;" valign="top" width="88"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Record4</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 63.9pt;" valign="top" width="85"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">2</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 27pt;" valign="top" width="36"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">5</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 66pt;" valign="top" width="88"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Record5</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 63.9pt;" valign="top" width="85"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 27pt;" valign="top" width="36"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">6</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 66pt;" valign="top" width="88"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Record6</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 63.9pt;" valign="top" width="85"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">4</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 27pt;" valign="top" width="36"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">7</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 66pt;" valign="top" width="88"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Record7</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 63.9pt;" valign="top" width="85"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">5</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 27pt;" valign="top" width="36"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">8</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 66pt;" valign="top" width="88"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Record8</div></td> </tr>
</tbody></table><div class="MsoNormal"><br />
</div><div class="MsoNormal">Now, if we are asked for the record corresponding to master id. 1, we are at a loss, because there are three such records.<br />
What we need is one representative record for each master id. Such a record is called a surviving record or survivor. This is a non-issue for an unmatched record such as record7 in the above table, but it is a real challenge when a cluster of records shares one master id.<br />
There are several ways of building the survivor record for a cluster of matching records, depending on the situation or context. The choice is a business decision.<br />
We will briefly discuss one such method.<br />
Suppose records are coming in from several channels. Where the records are customer records captured by a manufacturing firm, the channels could be different types of POS (point of sale); in a mass-mailing scenario, the channels could be various third-party organizations, and so on. Let us also suppose that each record bears a time-stamp recording the date and time when it was last updated.<br />
Let us suppose that a record has fields such as Name, Address, City, State, Post Code, Phone and E-Mail (in reality there could be many more).<br />
<br />
For a cluster, the Name field of the survivor record will be built using logic similar to the following example:</div><div class="MsoNormal">Use the name from the latest channel1 record (if it is not blank); if there is no channel1 record in the cluster, take the name from the latest channel3 record (if it is not blank); otherwise take the name from the latest record.<br />
<br />
This way we can have rules defined for each field on the survivor record. Such rules, obviously, are business decisions and must be defined along with the users of the system.<br />
<br />
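The Name rule above can be sketched in Python. This is only an illustration under stated assumptions: the field names (channel, updated, name), the sample cluster and the helper functions are hypothetical, and a blank name on the latest channel1 record is assumed to fall through to channel3 in the same way as a missing channel1 record.<br />

```python
from datetime import datetime

# Hypothetical cluster: every record carries its channel and last-updated time.
cluster = [
    {"channel": "channel2", "updated": datetime(2011, 5, 1), "name": "J P Morkel"},
    {"channel": "channel1", "updated": datetime(2011, 3, 1), "name": "John Peter Morkel"},
    {"channel": "channel1", "updated": datetime(2011, 4, 1), "name": "Jon P Morkel"},
]


def latest(records):
    """Return the most recently updated record, or None for an empty list."""
    return max(records, key=lambda r: r["updated"], default=None)


def survivor_name(cluster, preferred_channels=("channel1", "channel3")):
    """Pick the survivor's Name by walking the channels in priority order."""
    for channel in preferred_channels:
        record = latest([r for r in cluster if r["channel"] == channel])
        # Assumption: a blank name falls through to the next channel.
        if record is not None and record["name"].strip():
            return record["name"]
    # Fallback: the name from the latest record in the cluster, whatever its channel.
    return latest(cluster)["name"]


print(survivor_name(cluster))
```

A real system would hold one such rule per field (Address, Phone, E-Mail and so on), each with its own channel priority agreed with the users of the system.<br />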
</div>Tirthankar Ghoshhttp://www.blogger.com/profile/14687942450731369225noreply@blogger.com0tag:blogger.com,1999:blog-7991052372783853348.post-80235968359046990862011-06-03T18:34:00.000+05:302011-06-04T14:12:40.368+05:30Indirect Matching<div class="MsoNormal">Consider the following records:</div><table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none; width: 697px;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 25.45pt;" valign="top" width="34"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>#</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 42.95pt;" valign="top" width="57"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Given Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Middle Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 0.75in;" valign="top" width="72"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Last Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>St. 
No.</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>St. Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>St. Type</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Apt</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Cell No.</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 63pt;" valign="top" width="84"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>City</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext 
-moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>ZIP</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 25.45pt;" valign="top" width="34"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 42.95pt;" valign="top" width="57"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">John</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Peter</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 0.75in;" valign="top" width="72"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Morkel</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">25</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Main</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: 
medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Street</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Apt 225</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1234567890</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 63pt;" valign="top" width="84"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Kansas</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">11111</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 25.45pt;" valign="top" width="34"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">2</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 42.95pt;" valign="top" width="57"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Jon</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">P</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 0.75in;" valign="top" width="72"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Morkel</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">25</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Main</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 
1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Street</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Ste 225</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1212121212</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 63pt;" valign="top" width="84"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Kansas</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">11111</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 25.45pt;" valign="top" width="34"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">3</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 42.95pt;" valign="top" width="57"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">J</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">P</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 0.75in;" valign="top" width="72"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Morkel</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1750</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Collins</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: 
medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Blvd</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">102</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1212121212</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 63pt;" valign="top" width="84"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Richardson</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">75068</div></td> </tr>
</tbody></table><div class="MsoNormal"><br />
</div><div class="MsoNormal">In this case, the key-based matching we discussed earlier will declare the first two records a match and the last two records a match. Ideally, however, we want all three records to be considered a match and to form one cluster/group of matched records.</div><div class="MsoNormal">This can only be done by performing an indirect match according to the rule:<br />
For any three records A, B and C, if A matches B and B matches C, then A indirectly matches C.<br />
<br />
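The indirect-match rule above amounts to taking the transitive closure of the direct matches. A minimal Python sketch, assuming the direct matches are already available as pairs of record numbers (the function name cluster_records and the union-find approach are illustrative choices, not part of any particular tool):<br />

```python
def cluster_records(record_ids, direct_matches):
    """Group records so that direct and indirect matches share one cluster."""
    # Every record starts as the representative of its own cluster.
    parent = {rid: rid for rid in record_ids}

    def find(rid):
        # Walk up to the cluster representative, shortening the path on the way.
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    # Each direct match merges the clusters of its two records.
    for a, b in direct_matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for rid in record_ids:
        clusters.setdefault(find(rid), []).append(rid)
    return sorted(clusters.values())


# Records 1-2 and 2-3 match directly; 1 and 3 are linked only through 2.
clusters = cluster_records([1, 2, 3, 4], [(1, 2), (2, 3)])
print(clusters)
```

Here records 1 and 3 never match directly, yet they land in the same cluster because both match record 2. Record 4 (a hypothetical unmatched record) stays on its own.<br />
<br />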
If n fields are used for linking records, we can consider a record to be a point in n-dimensional space and define a distance between two such points.</div><div class="MsoNormal">In effect, our key-based matching considers two records a match provided they are close enough, i.e. the distance between them is less than a predefined number.<br />
<br />
A distance function can easily be defined for two records using the highest comparison probability returned by the match keys.<br />
Let the highest comparison probability for two records A and B be λ<sub>AB</sub>. We can then define the distance function D(A, B) = 1 – λ<sub>AB</sub> to measure the distance between A and B.<br />
<br />
Now, A and B will match only if D(A, B) &lt; δ, where δ ∈ [0, 1] is a predefined number.<br />
<br />
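The definition above can be sketched in Python as follows. The function names, the sample probabilities and the threshold of 0.2 are hypothetical values chosen only for illustration:<br />

```python
def distance(match_key_probabilities):
    """D(A, B) = 1 - lambda_AB, with lambda_AB the highest comparison probability."""
    lam = max(match_key_probabilities)
    return 1.0 - lam


def is_direct_match(match_key_probabilities, delta):
    """A and B match only if D(A, B) is strictly below delta, with delta in [0, 1]."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return distance(match_key_probabilities) < delta


# Hypothetical probabilities returned by three match keys for one pair of records.
close_pair = [0.62, 0.91, 0.75]
# Hypothetical probabilities for a second, less similar pair.
far_pair = [0.40, 0.55]

print(is_direct_match(close_pair, delta=0.2))
print(is_direct_match(far_pair, delta=0.2))
```

With these assumed numbers the first pair has a distance of roughly 0.09 and matches, while the second pair has a distance of 0.45 and does not.<br />
<br />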
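</div><div class="MsoNormal">In the example in this section, the distance between the first two records and the distance between the last two records are both less than the predefined number δ, but the distance between the first and third records is more than δ. The first and third records therefore do not match directly and end up in the same cluster only through the indirect match.<br />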
<br />
In Data Quality in general, and in record linking especially, mathematics plays the central role but is never the ultimate decision maker. We will see this in the example below:</div><table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none; width: 697px;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 25.45pt;" valign="top" width="34"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>#</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 42.95pt;" valign="top" width="57"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Given Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Middle Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 0.75in;" valign="top" width="72"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Last Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>St. 
No.</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>St. Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>St. Type</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Apt</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Cell No.</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 63pt;" valign="top" width="84"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>City</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext 
-moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>ZIP</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 25.45pt;" valign="top" width="34"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 42.95pt;" valign="top" width="57"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">John</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Peter</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 0.75in;" valign="top" width="72"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Morkel</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">25</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Main</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: 
medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Street</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Apt 225</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1234567890</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 63pt;" valign="top" width="84"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Kansas</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">11111</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 25.45pt;" valign="top" width="34"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">2</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 42.95pt;" valign="top" width="57"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">J</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 0.75in;" valign="top" width="72"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Morkel</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">25</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Main</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Street</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Ste 225</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1212121212</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; 
border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 63pt;" valign="top" width="84"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Kansas</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">11111</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 25.45pt;" valign="top" width="34"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">3</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 42.95pt;" valign="top" width="57"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Jessie</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 0.75in;" valign="top" width="72"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Morkel</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">25</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Main</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Street</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Ste 225</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1212121212</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; 
border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 63pt;" valign="top" width="84"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Kansas</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">11111</div></td> </tr>
</tbody></table><div class="MsoNormal"><br />
As per the rule of indirect matching, all three records will be put in the same cluster and assigned the same master identifier.<br />
But we have an issue here. Clearly, the first and the third records do not match; John Peter Morkel and Jessie Morkel are probably two members of the same household.<br />
So what went wrong? Either the first two records do not match in reality, or the last two records do not match in reality. Unfortunately, both matches were concluded using the same logic. In fact, even when we review the records manually, it is not possible to decide whether the match between the first two records or the match between the last two records is the correct one.<br />
In practice, we look for other pieces of information, such as date of birth, tax ID, SSN or any other identifier. If nothing works, we simply contact the customers and find out.<br />
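One way to surface such cases, sketched here under the assumption that the direct matches are kept alongside the clusters, is to list every pair in a cluster that was linked only indirectly and route it to a reviewer who can check the additional identifiers. The function name pairs_needing_review is hypothetical:<br />

```python
from itertools import combinations


def pairs_needing_review(cluster, direct_matches):
    """Return the pairs in a cluster that were linked only indirectly."""
    # Store direct matches as unordered pairs so (1, 2) and (2, 1) are the same.
    direct = {frozenset(pair) for pair in direct_matches}
    return [
        (a, b)
        for a, b in combinations(sorted(cluster), 2)
        if frozenset((a, b)) not in direct
    ]


# The three Morkel records: 1-2 and 2-3 matched directly, 1-3 did not.
review = pairs_needing_review([1, 2, 3], [(1, 2), (2, 3)])
print(review)
```

The sketch only identifies which pairs deserve a second look; it cannot say which of the two direct matches is wrong.<br />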
<u>Automatic matching cannot resolve situations where even manual review fails.</u></div>Tirthankar Ghoshhttp://www.blogger.com/profile/14687942450731369225noreply@blogger.com0tag:blogger.com,1999:blog-7991052372783853348.post-68634962320631773192011-06-03T12:01:00.000+05:302011-06-03T12:01:48.420+05:30The Small Steps – Transformations<!--[if gte mso 10]> <style>
</style> <![endif]--> <br />
<div class="MsoNormal" style="line-height: normal;">Earlier, we mentioned transformations while defining a match key.</div><div class="MsoNormal">What kind of transformations? We apply transformations so that two or more apparently dissimilar strings (a string being a finite sequence of characters) come closer together, i.e. the dissimilarity reduces, or even vanishes, provided the strings actually match.<br />
<br />
Usually, two matching strings differ for one of four reasons:</div><div class="MsoListParagraphCxSpFirst" style="text-indent: -0.25in;"><span><span>1.<span style="font: 7pt 'Times New Roman';"> </span></span></span>Typographical errors and spelling mistakes</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;"><span><span>2.<span style="font: 7pt 'Times New Roman';"> </span></span></span>Different conventions for writing the same thing</div><div class="MsoListParagraphCxSpMiddle" style="text-indent: -0.25in;"><span><span>3.<span style="font: 7pt 'Times New Roman';"> </span></span></span>Words from regional languages transliterated into English, and regional influences</div><div class="MsoListParagraphCxSpLast" style="text-indent: -0.25in;"><span><span>4.<span style="font: 7pt 'Times New Roman';"> </span></span></span>A combination of some of the above.</div><div class="MsoNormal">Let us look at the following examples:</div><table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 0.45in;" valign="top" width="43"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Sl. #</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 2.75in;" valign="top" width="264"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Similar Strings</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 135pt;" valign="top" width="180"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Remark</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.45in;" valign="top" width="43"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 2.75in;" valign="top" width="264"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">HYDERABAD, HYDERAGAD, HYDRABAD</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 135pt;" valign="top" width="180"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Name of a city in India</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.45in;" valign="top" width="43"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">2</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 2.75in;" valign="top" width="264"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">STREET, ST, STR</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 135pt;" valign="top" width="180"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">A common street type</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.45in;" valign="top" width="43"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">3</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 2.75in;" valign="top" width="264"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">CHAVEZ, SAVEZ, CHAVEJ</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 135pt;" valign="top" width="180"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">A popular Family Name</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.45in;" valign="top" width="43"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">4</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 2.75in;" valign="top" width="264"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">ROAD, RD, RAOD</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 135pt;" valign="top" width="180"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">A common street type</div></td> </tr>
</tbody></table><div class="MsoNormal"><br />
</div><div class="MsoNormal">The three strings in the first row match, but with obvious spelling mistakes. The matching strings in the second row result from different conventions for writing the same street type. In the third row, a popular Latin American last name is written with different spellings, whereas the matching strings in the fourth row show a mix of different conventions and typographical errors.<br />
<br />
Techniques described earlier can handle typographical errors and spelling mistakes. But for handling other types of dissimilarities in matching strings, we need to use various transformations.<br />
<br />
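As a minimal sketch of one such transformation, assuming a small hand-built lookup table (the names <code>STREET_TYPES</code> and <code>standardize</code> are hypothetical and not taken from any particular match engine), the convention differences in rows 2 and 4 can be reduced by mapping every known variant to a single standard form:<br />

```python
# Hypothetical lookup table: each known variant maps to one standard form.
STREET_TYPES = {
    "STREET": "STREET", "ST": "STREET", "STR": "STREET",
    "ROAD": "ROAD", "RD": "ROAD",
}


def standardize(token: str) -> str:
    """Return the standard form of a street-type token.

    Unknown tokens are returned cleaned (trimmed, upper-cased,
    trailing dots removed) but otherwise unchanged.
    """
    key = token.strip().upper().rstrip(".")
    return STREET_TYPES.get(key, key)


variants = ["STREET", "ST", "STR", "ROAD", "RD", "RAOD"]
standardized = [standardize(v) for v in variants]
for before, after in zip(variants, standardized):
    print(before, "->", after)
```

In this sketch the misspelling RAOD is not in the table, so it passes through unchanged; a lookup of this kind only removes convention differences, and typographical errors still need the matching techniques described earlier.<br />
<br />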
I will discuss the transformations that I have used in different situations. But before that, let us see what happens after matching. We will talk about indirect matching and survivor selection, which take place after matching.</div>Tirthankar Ghoshhttp://www.blogger.com/profile/14687942450731369225noreply@blogger.com0tag:blogger.com,1999:blog-7991052372783853348.post-24915833576680538402011-05-30T10:41:00.000+05:302011-05-30T12:59:17.374+05:30Building Blocks – String Matching<div class="MsoNormal">We discussed earlier how a match key is formed using the transformed values of several fields and the associated match techniques. In essence, we need to compare two strings. <br />
Before discussing this, it is only fair to state that the transformations mentioned earlier bring two apparently distant strings closer.</div><div class="MsoNormal">One such transformation is standardization, which can bring the two strings CALCUTTA and KOLKATA together.<br />
<br />
Two strings may be compared for an exact match. <br />
Many match engines are based on such exact matches. Let us consider the following strings:</div><table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 239.4pt;" valign="top" width="319"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Base String</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 239.4pt;" valign="top" width="319"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Input String</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 239.4pt;" valign="top" width="319"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">INNOCENT</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 239.4pt;" valign="top" width="319"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">INNOCEMT</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 239.4pt;" valign="top" width="319"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">EXPRESSION</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 239.4pt;" valign="top" width="319"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">EPXRESSION</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 239.4pt;" valign="top" width="319"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">PRICEWATERHOUSECOOPERS</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 239.4pt;" valign="top" width="319"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">PRICEWATERHOUSECOOPER</div></td> </tr>
</tbody></table><div class="MsoNormal"><br />
</div><div class="MsoNormal"><br />
<div class="MsoNormal">In each row, the two strings are close enough to be considered a match, but none of these pairs is an exact match. This kind of situation arises largely from typographical errors. Consequently, any match engine that uses exact matching on the match keys will fail to match the corresponding keys.</div></div><div class="MsoNormal"><span style="font-family: 'Calibri','sans-serif'; font-size: 11pt; line-height: 115%;">A distance function may address such issues. It is a function that takes two strings and returns a number representing the distance between them. A distance function needs to have the following properties:<br />
<br />
</span><br />
1. The distance between two identical strings is 0<br />
2. The distance between any two strings is non-negative<br />
3. The distance between two strings does not decrease as the similarity between them decreases.</div><div class="MsoNormal">Let us define one such distance function.<br />
</div><!--[if gte mso 10]> <style>
</style> <![endif]--> <div class="MsoNormal" style="line-height: normal;">Suppose s<sub>1</sub> and s<sub>2</sub> are two strings of lengths l<sub>1</sub> and l<sub>2</sub> respectively, and denote the distance between them by d (s<sub>1</sub>, s<sub>2</sub>). With s<sub>3</sub> denoting any third string, the above three rules can be expressed as:</div><ol start="1" style="margin-top: 0in;" type="1"><li class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">d (s<sub>1</sub>, s<sub>1</sub>) = 0</li>
<li class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">d (s<sub>1</sub>, s<sub>2</sub>) ≥ 0</li>
<li class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">d (s<sub>1</sub>, s<sub>2</sub>) ≥ d (s<sub>1</sub>, s<sub>3</sub>) when s<sub>1</sub> is more similar to s<sub>3</sub> than to s<sub>2</sub></li>
</ol><div class="MsoNormal" style="line-height: normal;">Let us define one such function here.<br />
Let m be the number of positions at which s<sub>1</sub> and s<sub>2</sub> carry the same character.</div><div class="MsoNormal" style="line-height: normal;"><u>Case 1:</u> l<sub>1</sub> = l<sub>2</sub> = l</div><div class="MsoNormal" style="line-height: normal;">Here we can consider d (s<sub>1</sub>, s<sub>2</sub>) = 1 – m/l</div><div class="MsoNormal" style="line-height: normal;">Note that the maximum possible value for m is l, and that happens when s<sub>1</sub> and s<sub>2</sub> are exactly the same. In such a case the distance becomes 0.</div><div class="MsoNormal" style="line-height: normal;"><u>Case 2:</u> l<sub>1</sub> > l<sub>2</sub></div><div class="MsoNormal" style="line-height: normal;">Here we consider d (s<sub>1</sub>, s<sub>2</sub>) = 1 – m / [l<sub>2</sub> (l<sub>1</sub> – l<sub>2</sub> + δ)] where δ > 0 is a constant.</div><div class="MsoNormal" style="line-height: normal;">Note that the maximum possible value for m is l<sub>2</sub>, and that happens when s<sub>2</sub> matches the first l<sub>2</sub> characters of s<sub>1</sub>, i.e. s<sub>2</sub> is a leading sub-string of s<sub>1</sub>.<br />
While fixing a value for δ, keep in mind that if s<sub>1</sub> is one character longer than s<sub>2</sub> and s<sub>2</sub> is such a sub-string of s<sub>1</sub>, then m = l<sub>2</sub> and d (s<sub>1</sub>, s<sub>2</sub>) = 1 – 1/(1 + δ).<br />
If we want two such strings (lengths differ by one and the shorter one is a leading sub-string of the other) to be at a distance of 0.05, i.e. 5%, then 1 – 1/(1 + δ) = 0.05, which gives δ = 1/0.95 – 1 ≈ 0.0526, or roughly 0.05. In general, the closer we want such strings to be, the smaller the positive number δ has to be.<br />
<br />
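The function defined above can be sketched in Python as follows. This is only an illustration: the names positional_distance and match_probability, the default delta = 0.05, the swap that handles the case where the second string is the longer one, and the treatment of empty strings are all assumptions that the text does not specify.<br />

```python
def positional_distance(s1: str, s2: str, delta: float = 0.05) -> float:
    """Distance in [0, 1] based on position-wise matching characters."""
    if delta <= 0:
        raise ValueError("delta must be a positive constant")
    # Assumption: treat the longer string as s1 so that l1 >= l2 always holds.
    if len(s1) < len(s2):
        s1, s2 = s2, s1
    l1, l2 = len(s1), len(s2)
    if l1 == 0:
        return 0.0  # assumption: two empty strings are identical
    if l2 == 0:
        return 1.0  # assumption: nothing matches an empty string
    # m = number of positions at which both strings carry the same character.
    m = sum(1 for a, b in zip(s1, s2) if a == b)
    if l1 == l2:
        return 1.0 - m / l1                     # Case 1: equal lengths
    return 1.0 - m / (l2 * (l1 - l2 + delta))   # Case 2: l1 > l2


def match_probability(s1: str, s2: str, delta: float = 0.05) -> float:
    """Match probability p = 1 - d, as suggested in the text."""
    return 1.0 - positional_distance(s1, s2, delta)


if __name__ == "__main__":
    print(positional_distance("SMITH", "SMITH"))   # identical strings
    print(positional_distance("SMITH", "SMYTH"))   # one position differs out of five
    print(positional_distance("SMITHS", "SMITH"))  # 1 - 1/(1 + delta)
```

Because the comparison is strictly position-wise, a single character inserted near the start of a string shifts every later position and drives the distance towards 1. That is one of the ways in which the function is only indicative.<br />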
The distance function defined above lies between 0 and 1 and a match probability can be defined using this distance function. For example, the match probability can be p (s<sub>1</sub>, s<sub>2</sub>) = 1 - d (s<sub>1</sub>, s<sub>2</sub>)</div><div class="MsoNormal" style="line-height: normal;">However, the above distance function is only an indicative one and can be further improved.</div>Tirthankar Ghoshhttp://www.blogger.com/profile/14687942450731369225noreply@blogger.com0tag:blogger.com,1999:blog-7991052372783853348.post-2900798087492224472011-05-27T14:09:00.001+05:302011-05-27T14:09:58.370+05:30Restricting False Positives – Cut-Off point for each match-key <br />
<div class="MsoNormal">While discussing the basic framework, we saw two cut-off points m and M, both between 0 and 1, which are applied to the match probability λ(p) as follows: if λ(p) &gt; M we conclude that p ∈ M (the set of matches), if λ(p) &lt; m then p ∈ U (the set of non-matches), and if m ≤ λ(p) ≤ M then p ∈ S (the set of suspected matches).<br />
<br />
Here the pre-defined points m and M are called cut-off points.<br />
The cardinality of the set of matching pairs reduces when the cut-off point M increases. So, one way of reducing the number of false positive matches seems to be increasing the value of the cut-off point M.<br />
But this could push some genuine matches into the set of suspected matches, thereby increasing the total cost of error.<br />
This is a typical issue encountered while implementing data quality solutions.<br />
One way to address this issue is to introduce cut-off points for the individual match-keys.<br />
<br />
While discussing match-key, we talked about the probability/indicator returned by each match-key during a comparison involving a pair p. These probabilities/indicators are denoted as λ<sub>i</sub> where the suffix i runs from 1 to k in a system with k match-keys.<br />
<br />
Besides having a cut-off point M for the composite probability/indicator, we can define a cut-off for each match-key and call these M<sub>i</sub>. The composite probability/indicator for the underlying pair is calculated only if λ<sub>i</sub> ≥ M<sub>i</sub> for all i = 1 to k; otherwise it is set to 0.<br />
<br />
By introducing the M<sub>i</sub>’s we will be able to restrict false positive matches, because a pair that scores poorly on one match-key can no longer be carried over the composite cut-off by high scores on the other match-keys.<br />
<br />
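The gating by per-match-key cut-offs described above could look like the following minimal Python sketch. The post does not say how the composite probability/indicator is built from the individual λ<sub>i</sub>, so a weighted average is assumed here purely for illustration. The names gated_composite and classify are likewise hypothetical.<br />

```python
from typing import Optional, Sequence


def gated_composite(scores: Sequence[float],
                    key_cutoffs: Sequence[float],
                    weights: Optional[Sequence[float]] = None) -> float:
    """Composite score for one pair, forced to 0 if any match-key misses its cut-off."""
    if not scores:
        raise ValueError("at least one match-key score is required")
    if len(scores) != len(key_cutoffs):
        raise ValueError("one cut-off per match-key is required")
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("one weight per match-key is required")
    # Gate: every lambda_i must reach its own cut-off M_i.
    if any(s < c for s, c in zip(scores, key_cutoffs)):
        return 0.0
    # Assumption: the composite is a weighted average of the key scores.
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def classify(score: float, lower: float, upper: float) -> str:
    """Apply the composite cut-off points m (lower) and M (upper)."""
    if score > upper:
        return "match"
    if score < lower:
        return "non-match"
    return "suspect"


if __name__ == "__main__":
    scores = [1.0, 1.0, 0.45]            # third match-key disagrees strongly
    ungated = sum(scores) / len(scores)  # about 0.82, above M = 0.8
    gated = gated_composite(scores, [0.6, 0.6, 0.6])
    print(classify(ungated, 0.5, 0.8), classify(gated, 0.5, 0.8))
```

In the usage example, two perfect scores would lift the plain average above M = 0.8 even though the third match-key disagrees strongly. With a per-key cut-off of 0.6 the same pair gets a composite of 0 and lands among the non-matches.<br />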
</div>Tirthankar Ghoshhttp://www.blogger.com/profile/14687942450731369225noreply@blogger.com0tag:blogger.com,1999:blog-7991052372783853348.post-11468945576948424322011-05-27T14:08:00.000+05:302011-06-06T13:54:18.342+05:30Enhancing Match-Keys - Cross-Matching<div class="MsoNormal">Consider the following names:</div><table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 36.5pt;" valign="top" width="49"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Sl. #</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 76.9pt;" valign="top" width="103"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Given Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 1.25in;" valign="top" width="120"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Middle Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 94.5pt;" valign="top" width="126"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Last Name Prefix</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 1in;" valign="top" width="96"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Last Name</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 36.5pt;" valign="top" width="49"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 76.9pt;" valign="top" width="103"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">GABRIEL</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 1.25in;" valign="top" width="120"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">MARQUIZ</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 94.5pt;" valign="top" width="126"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">DE</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 1in;" valign="top" width="96"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">PEREIRA</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 36.5pt;" valign="top" width="49"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">2</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 76.9pt;" valign="top" width="103"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">MARQUIS</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 1.25in;" valign="top" width="120"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">GEBRIEL</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 94.5pt;" valign="top" width="126"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">DE</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 1in;" valign="top" width="96"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">PERERA</div></td> </tr>
</tbody></table><div class="MsoNormal">Suppose the address information matches on these two records. A manual inspection tells us that the records are indeed a match.<br />
How do we establish the match in names?<br />
Well, the Last Names do match (well…close enough) and we do not compare the Prefix. But what about the Given Name and Middle Name? Apparently, they have switched places, besides being spelled slightly differently on the second record.<br />
The only way to find the match in Given Name and Middle Name is to perform a cross-matching between these fields. That is, the match-keys should be defined in such a way that the Given Name on one record is also compared with the Middle Name on the other record and vice-versa.<br />
<br />
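The cross-matching of Given Name and Middle Name described above can be sketched in Python as follows. The fuzzy comparison uses difflib.SequenceMatcher from the standard library as a stand-in for whatever comparison routine a real match-key would use. The 0.8 threshold, the Name record and the function names are assumptions made for this illustration.<br />

```python
from difflib import SequenceMatcher
from typing import NamedTuple


class Name(NamedTuple):
    given: str
    middle: str
    last: str


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Stand-in fuzzy comparison; blank fields never count as a match."""
    if not a or not b:
        return False
    return SequenceMatcher(None, a.upper(), b.upper()).ratio() >= threshold


def two_way_cross_match(x: Name, y: Name, threshold: float = 0.8) -> bool:
    """Given/Middle match either field-to-field or with the two fields swapped."""
    straight = (similar(x.given, y.given, threshold)
                and similar(x.middle, y.middle, threshold))
    crossed = (similar(x.given, y.middle, threshold)
               and similar(x.middle, y.given, threshold))
    return straight or crossed


if __name__ == "__main__":
    rec1 = Name("GABRIEL", "MARQUIZ", "PEREIRA")
    rec2 = Name("MARQUIS", "GEBRIEL", "PERERA")
    print(similar(rec1.given, rec2.given))   # field-to-field comparison fails
    print(two_way_cross_match(rec1, rec2))   # swapped comparison succeeds
    print(similar(rec1.last, rec2.last))     # Last Names are close enough
```

Both swapped comparisons must succeed for the crossed match to count, which keeps the check from firing when only one of the two name components happens to resemble a field on the other record.<br />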
This is a good time to introduce another matching objective: house-holding.<br />
Here we try to cluster (or link) the records belonging to the same house-hold. Well…in many cases, we define a house-hold as comprising members who share the same Last Name, Address and residential Phone Number.</div><div class="MsoNormal">Now, let us look at the following two names and consider them for house-holding:</div><table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 36.5pt;" valign="top" width="49"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Sl. #</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 102.1pt;" valign="top" width="136"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Given Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 87.3pt;" valign="top" width="116"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Middle Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Last Name</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 36.5pt;" valign="top" width="49"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 102.1pt;" valign="top" width="136"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">JONATHAN</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 87.3pt;" valign="top" width="116"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">A</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">ABOTT</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 36.5pt;" valign="top" width="49"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">2</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 102.1pt;" valign="top" width="136"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">ABOTT</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 87.3pt;" valign="top" width="116"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">MARGARET</div></td> </tr>
</tbody></table><div class="MsoNormal">Suppose the address information matches on these two records. A manual inspection tells us that the records belong to the same household. How do the Last Names match? Well, the Last Name on the first record matches the Given Name on the second record. Again, name components have switched places. But there is a small difference from the earlier example in the way we do the cross-matching.<br />
In the earlier example, we checked whether the Given Name on one record matches the Middle Name on the other record and the Middle Name on the first record matches the Given Name on the second. For the household matching example, we only check whether the Last Name on one record matches the Given Name on the other record.<br />
<br />
In DQ terminology, these are known as 2-way cross-matching and 1-way cross-matching, respectively.<br />
Let us look at one more example where cross-matching has occurred involving three name fields:</div><table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 36.5pt;" valign="top" width="49"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Sl. #</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 102.1pt;" valign="top" width="136"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Given Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 87.3pt;" valign="top" width="116"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Middle Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Last Name</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 36.5pt;" valign="top" width="49"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 102.1pt;" valign="top" width="136"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">RAJ</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 87.3pt;" valign="top" width="116"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">SINGH</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">THAKOR</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 36.5pt;" valign="top" width="49"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">2</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 102.1pt;" valign="top" width="136"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">THAKORE</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 87.3pt;" valign="top" width="116"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">RAJ</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">SINGH</div></td> </tr>
</tbody></table><span style="font-family: "Calibri","sans-serif"; font-size: 11pt; line-height: 115%;"><br />
Though such component switching can, in theory, take place between any two (or more) fields in the database, we usually restrict cross-matching to a few pairs of fields.</span><br />
<span style="font-family: "Calibri","sans-serif"; font-size: 11pt; line-height: 115%;"></span><br />
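<div class="MsoNormal">The difference between 1-way and 2-way cross-matching can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary keys <code>given</code>, <code>middle</code> and <code>last</code>, the helper <code>same</code> and the exact, case-insensitive comparison are all hypothetical choices, not the behaviour of any particular DQ tool.</div>

```python
from typing import Dict

Record = Dict[str, str]


def same(a: str, b: str) -> bool:
    """Exact, case-insensitive comparison; blank values never match."""
    a, b = a.strip().upper(), b.strip().upper()
    return bool(a) and a == b


def one_way_cross_match(r1: Record, r2: Record, field_a: str, field_b: str) -> bool:
    """field_a on one record matches field_b on the other record."""
    return same(r1[field_a], r2[field_b]) or same(r2[field_a], r1[field_b])


def two_way_cross_match(r1: Record, r2: Record, field_a: str, field_b: str) -> bool:
    """field_a and field_b have swapped places between the two records."""
    return same(r1[field_a], r2[field_b]) and same(r1[field_b], r2[field_a])


# The household example: Last Name on record 1 equals Given Name on record 2.
household_1 = {"given": "JONATHAN", "middle": "A", "last": "ABOTT"}
household_2 = {"given": "ABOTT", "middle": "", "last": "MARGARET"}

# The three-field example: the name components have rotated.
rotated_1 = {"given": "RAJ", "middle": "SINGH", "last": "THAKOR"}
rotated_2 = {"given": "THAKORE", "middle": "RAJ", "last": "SINGH"}

household_one_way = one_way_cross_match(household_1, household_2, "last", "given")
household_two_way = two_way_cross_match(household_1, household_2, "last", "given")
rotated_given_middle = one_way_cross_match(rotated_1, rotated_2, "given", "middle")
rotated_last_given = one_way_cross_match(rotated_1, rotated_2, "last", "given")
```

<div class="MsoNormal">With an exact comparison, the household pair passes the 1-way check but not the 2-way check. In the three-field pair, THAKOR and THAKORE do not match exactly, so that comparison would need a softer rule than the one assumed here.</div>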
<span style="font-family: "Calibri","sans-serif"; font-size: 11pt; line-height: 115%;">Cross matching can be implemented when we use soft match-keys as discussed earlier. But there are many data quality tools out there which are built using hard match-keys alone.<br />
So what do we do if one such tool is being used?<br />
There is an alternative route that takes care of cross matching in a limited way. Suppose we have two similar fields, Field1 and Field2, which we want to cross match. First, we make sure that there is no record with a non-blank Field2 and a blank Field1. Then we make two copies of the records where both fields are non-blank. One set of these records passes through as it is; in the second set, we overwrite Field1 with the value in Field2. Finally, we append the two sets together to make a bigger set of records, tagging each record with an identifier that says whether it came from set 1 or set 2. A hard-key based match can then be performed on this bigger set of records to arrive at exactly the same conclusions.<br />
However, this approach has one limitation. For each such pair of fields we have to add new records to the database, and this takes time and effort besides disk space. </span><br />
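<div class="MsoNormal">The record-duplication route described above can be sketched as follows. This is a minimal sketch, not a reference implementation: the field names <code>field1</code>, <code>field2</code>, <code>id</code> and <code>source</code> are hypothetical, and the first step (no record with a blank Field1 and a non-blank Field2) is assumed here to be achieved by moving Field2 into Field1, which is only one possible reading.</div>

```python
from collections import defaultdict
from typing import Dict, List, Set


def expand_for_cross_match(records: List[dict], f1: str = "field1", f2: str = "field2") -> List[dict]:
    """Build the bigger record set on which a hard-key match can run."""
    expanded = []
    for rec in records:
        rec = dict(rec)  # work on a copy, leave the input untouched
        # Step 1 (assumption): promote field2 when field1 is blank.
        if not rec[f1].strip() and rec[f2].strip():
            rec[f1], rec[f2] = rec[f2], ""
        # Set 1: the record passes through as it is.
        expanded.append({**rec, "source": "set1"})
        # Set 2: only when both fields are non-blank, overwrite field1 with field2.
        if rec[f1].strip() and rec[f2].strip():
            expanded.append({**rec, f1: rec[f2], "source": "set2"})
    return expanded


def hard_key_groups(expanded: List[dict], key_field: str = "field1", id_field: str = "id") -> Dict[str, Set[int]]:
    """Group record ids by an exact (hard) key; keep keys shared by 2+ records."""
    groups: Dict[str, Set[int]] = defaultdict(set)
    for rec in expanded:
        key = rec[key_field].strip().upper()
        if key:  # a blank key is never used for matching
            groups[key].add(rec[id_field])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


records = [
    {"id": 1, "field1": "RAJ", "field2": "SINGH"},
    {"id": 2, "field1": "SINGH", "field2": "RAJ"},
    {"id": 3, "field1": "JOHN", "field2": ""},
]
expanded = expand_for_cross_match(records)
groups = hard_key_groups(expanded)
```

<div class="MsoNormal">In this toy input, three records grow to five, which illustrates the limitation noted above: every cross-matched pair of fields adds records to the set that the matching step has to process.</div>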
<span style="font-family: "Calibri","sans-serif"; font-size: 11pt; line-height: 115%;"> <br />
</span>Tirthankar Ghoshhttp://www.blogger.com/profile/14687942450731369225noreply@blogger.com0tag:blogger.com,1999:blog-7991052372783853348.post-28187767418638437612011-05-26T18:29:00.000+05:302011-05-29T16:46:54.284+05:30Rome was not built in a day – Key based matching<div class="MsoNormal">The basic framework discussed earlier talks about match keys. It also says that, for each pair of records, the comparison at each match key returns a result λ<sub>i</sub>, where 0 ≤ λ<sub>i</sub> ≤ 1.<br />
λ<sub>i</sub> is called a match probability if it can take any real value in the unit interval, and a match indicator if it can take only the two values 0 and 1.<br />
<br />
The λ<sub>i</sub>’s play a pivotal role in determining whether a pair of records should be put in M, U or S.<br />
<br />
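To make the distinction between a match indicator and a match probability concrete, here is a small Python sketch for a single field. It rests on assumptions: the function names are hypothetical, and the similarity ratio from Python's standard <code>difflib</code> module is used only as a stand-in for a score in the unit interval, not as a true probability or as the scoring rule of any DQ tool.<br />

```python
from difflib import SequenceMatcher


def match_indicator(a: str, b: str) -> int:
    """Hard comparison: returns 1 on exact (case-insensitive) equality, else 0."""
    return int(a.strip().upper() == b.strip().upper())


def match_probability(a: str, b: str) -> float:
    """Soft comparison: a similarity score between 0.0 and 1.0 (assumed stand-in)."""
    return SequenceMatcher(None, a.strip().upper(), b.strip().upper()).ratio()


# Two spellings of the same given name.
indicator = match_indicator("Johnn", "John")
probability = match_probability("Johnn", "John")
```

Here the indicator rejects the pair outright, while the soft score stays strictly between 0 and 1 and leaves room for a later decision on the pair.<br />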
Earlier, I said that each match key involves several fields. The idea of a match key is that a pair of records should match on it if and only if the records closely resemble each other on the fields that constitute the match key.<br />
Let me explain this by an example:<br />
<br />
</div><table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none; width: 649px;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 25.45pt;" valign="top" width="34"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>#</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 65.45pt;" valign="top" width="87"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Given Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Middle Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Last Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>St. 
No.</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>St. Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>St. Type</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Apt</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>City</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>ZIP</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 25.45pt;" valign="top" width="34"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 65.45pt;" valign="top" width="87"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Johnn</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">P</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Morkel</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">25</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Main</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: 
medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Street</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Apt 225</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Kansas</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">11111</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 25.45pt;" valign="top" width="34"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">2</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 65.45pt;" valign="top" width="87"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">John</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Morkel</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">25</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">M</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Ste 225</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Kansas</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">11111</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 25.45pt;" valign="top" width="34"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">3</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 65.45pt;" valign="top" width="87"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Jason</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Peter</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Morkel</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">25</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">M</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: 
medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Blvd</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">A 225</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">11111</div></td> </tr>
</tbody></table><div class="MsoNormal"><br />
</div><div class="MsoNormal">A closer look tells us that the first two records are probably the same, i.e. they represent the same individual.<br />
But it is highly likely that the third record belongs to a different individual.</div><div class="MsoNormal">The logic behind automatic matching should closely follow our own reasoning when we judge that the first two records are probably the same.<br />
<br />
Now let us look at the values on these two records. The Given Names are close: not an exact match, but very close. The Middle Names do not contradict each other. The Last Names are exactly the same. The street numbers are the same. On the street name, the initial characters match and there is no contradiction. The numeric digits in the Apartment information are the same, and the Cities as well as the ZIP codes are the same.<br />
<br />
If we have a match key comprising the Last Name, St. Number, the numeric portion of the Apartment information and the ZIP Code, then the first two records will agree on this key.<br />
<br />
But the third record will also agree with the first two records on this match key. That is simply because we have overlooked the Given Name.<br />
To address this issue, we include the Given Name in the match key definition. Unfortunately, on the first two records the Given Names are similar but not exactly the same. So, instead of the raw Given Name value, our match key needs to include a transformed value of the Given Name: a transformation under which JOHNN becomes JOHN but JASON does not become JOHN.<br />
<br />
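As a rough illustration of such a match key, here is a minimal Python sketch. The transformation used for the Given Name (upper-casing and collapsing runs of repeated letters) is only an assumed example of a transform with the required property, and the function and field names are hypothetical; real Data Quality tools typically offer richer, configurable transforms.<br />

```python
from itertools import groupby

def normalize_given_name(name):
    """Assumed transform: upper-case, keep letters, collapse repeated letters."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    return "".join(ch for ch, _ in groupby(letters))

def numeric_part(text):
    """Keep only the digits, so 'Apt 225' and 'Ste 225' both give '225'."""
    return "".join(ch for ch in text if ch.isdigit())

def match_key(record):
    """Match key: transformed Given Name, Last Name, St. No., Apt digits, ZIP."""
    return (
        normalize_given_name(record["given"]),
        record["last"].strip().upper(),
        record["st_no"].strip(),
        numeric_part(record["apt"]),
        record["zip"].strip(),
    )

# The three records from the table above (only the fields used by the key).
records = [
    {"given": "Johnn", "last": "Morkel", "st_no": "25", "apt": "Apt 225", "zip": "11111"},
    {"given": "John", "last": "Morkel", "st_no": "25", "apt": "Ste 225", "zip": "11111"},
    {"given": "Jason", "last": "Morkel", "st_no": "25", "apt": "A 225", "zip": "11111"},
]
keys = [match_key(r) for r in records]
print(keys[0] == keys[1])  # True: records 1 and 2 agree on the key
print(keys[0] == keys[2])  # False: JASON is not transformed into JOHN
```

Collapsing repeated letters also shortens legitimately doubled letters, but since the same transform is applied to both records being compared, the keys remain comparable.<br />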
Now let us look at the two records in the following table:</div><table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none; width: 649px;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 25.45pt;" valign="top" width="34"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>#</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 51.95pt;" valign="top" width="69"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Given Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 58.5pt;" valign="top" width="78"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Middle Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Last Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>St. 
No.</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>St. Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>St. Type</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Apt</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>City</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>ZIP</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 25.45pt;" valign="top" width="34"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 51.95pt;" valign="top" width="69"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">John</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 58.5pt;" valign="top" width="78"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Peter</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Morkel</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">25</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Main</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; 
border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Street</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Apt 225</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Kansas</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">11111</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 25.45pt;" valign="top" width="34"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">2</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 51.95pt;" valign="top" width="69"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">John</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 58.5pt;" valign="top" width="78"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Proctor</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Morkel</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">25</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Main</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; 
border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Street</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Ste 225</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Kansas</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">11111</div></td> </tr>
</tbody></table><div class="MsoNormal"><br />
</div><div class="MsoNormal">This pair of records has a good amount of similarity, and the two probably belong to the same household. But they probably represent different individuals, since their Middle Names contradict each other.<br />
We therefore need to modify our match key so that the Middle Name is compared as follows: when both Middle Names are longer than one character, the entire values are compared; when at least one of them is a single character, only the initials are compared; and a blank Middle Name is allowed to match a non-blank one.<br />
<br />
Note that the above match technique for the Middle Name is not a transformation, in the sense that it does not change the underlying Middle Name values.<br />
<br />
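The Middle Name rule described above can be sketched as a small comparison function. This is only an illustrative Python sketch; the function name and the decision to ignore case, surrounding spaces and periods are assumptions, not part of any particular tool.<br />

```python
def middle_names_match(a, b):
    """Compare two Middle Names without changing the stored values."""
    a = a.strip(" .").upper()
    b = b.strip(" .").upper()
    # A blank Middle Name is allowed to match anything.
    if not a or not b:
        return True
    # If at least one side is a single character, compare initials only.
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    # Both sides are longer than one character: compare the full values.
    return a == b

print(middle_names_match("P", ""))           # True: blank matches non-blank
print(middle_names_match("Peter", "P"))      # True: initials agree
print(middle_names_match("Peter", "Proctor"))  # False: full names contradict
```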
So a match key is a combination of a few (possibly transformed) field values, each with its associated match technique(s).<br />
<br />
We defined one match key above. However, we need multiple match keys.<br />
To understand this, let us look at the following records:</div><table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none; width: 697px;"><tbody>
<tr> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border: 1pt solid windowtext; padding: 0in 5.4pt; width: 25.45pt;" valign="top" width="34"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>#</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 42.95pt;" valign="top" width="57"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Given Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Middle Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 0.75in;" valign="top" width="72"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Last Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>St. 
No.</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>St. Name</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>St. Type</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Apt</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>Cell No.</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 63pt;" valign="top" width="84"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>City</b></div></td> <td style="background: none repeat scroll 0% 0% rgb(223, 223, 223); border-color: windowtext windowtext windowtext 
-moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>ZIP</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 25.45pt;" valign="top" width="34"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 42.95pt;" valign="top" width="57"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">John</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Peter</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 0.75in;" valign="top" width="72"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Morkel</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">25</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Main</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: 
medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Street</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Apt 225</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1234567890</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 63pt;" valign="top" width="84"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Kansas</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">11111</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 25.45pt;" valign="top" width="34"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">2</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 42.95pt;" valign="top" width="57"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">John</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">P</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 0.75in;" valign="top" width="72"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Morkel</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">25</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Main</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: 
medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Street</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Ste 225</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1212121212</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 63pt;" valign="top" width="84"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Kansas</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">11111</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 25.45pt;" valign="top" width="34"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">3</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 42.95pt;" valign="top" width="57"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">John</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Peter</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 0.75in;" valign="top" width="72"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Morkel</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 45pt;" valign="top" width="60"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1750</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Collins</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; 
border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Blvd</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 49.5pt;" valign="top" width="66"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">102</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 67.5pt;" valign="top" width="90"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">1212121212</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 63pt;" valign="top" width="84"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Richardson</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 40.5pt;" valign="top" width="54"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">75068</div></td> </tr>
</tbody></table><div class="MsoNormal"><br />
</div><div class="MsoNormal">A close look at the above records reveals that the first two records match on the match key we discussed earlier, but neither of them matches the third record, whose address information is totally different. However, the cell number on the third record matches the cell number on the second record. It looks like the same person lived at different locations at different points in time but kept the same cell number for at least some of that time.<br />
To capture this match, we have to use a different match key involving Given Name, Middle Name, Last Name and Cell Number.</div><div class="MsoNormal">In a real-life scenario, we have to deal with many more address fields as well as other fields such as SSN, Tax Id etc. So we need many match keys defined in the system.<br />
A field (or more than one field) will often be common to two match keys. Even then, the associated match technique or transformation may differ from one key to the other.<br />
<br />
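To make the need for more than one match key concrete, here is a minimal Python sketch. The first two records, the field names and the choice of fields in the address key are hypothetical stand-ins (only the third record mirrors the table above), and exact comparison of upper-cased values is an assumed simplification of the match techniques discussed earlier.<br />

```python
# Hypothetical records; only record 3 mirrors the table above.
RECORDS = [
    {"id": 1, "given": "John", "middle": "Peter", "last": "Morkel",
     "street_no": "12", "street": "Main", "zip": "75001", "cell": ""},
    {"id": 2, "given": "John", "middle": "Peter", "last": "Morkel",
     "street_no": "12", "street": "Main", "zip": "75001", "cell": "1212121212"},
    {"id": 3, "given": "John", "middle": "Peter", "last": "Morkel",
     "street_no": "1750", "street": "Collins", "zip": "75068", "cell": "1212121212"},
]

# Two match keys that share the name fields (assumed field choices).
ADDRESS_KEY = ("given", "last", "street_no", "street", "zip")
CELL_KEY = ("given", "middle", "last", "cell")


def make_key(record, fields):
    """Concatenate the trimmed, upper-cased values of the chosen fields."""
    return "|".join(record[f].strip().upper() for f in fields)


def matching_pairs(records, fields):
    """Return sorted id pairs of records whose key values are identical."""
    groups = {}
    for rec in records:
        groups.setdefault(make_key(rec, fields), []).append(rec["id"])
    pairs = []
    for ids in groups.values():
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                pairs.append((ids[i], ids[j]))
    return sorted(pairs)


print(matching_pairs(RECORDS, ADDRESS_KEY))  # [(1, 2)]
print(matching_pairs(RECORDS, CELL_KEY))     # [(2, 3)]
```

Neither key alone links all three records; it takes both keys together to connect record 1 to record 3 through record 2.<br />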
<br />
<div class="MsoNormal">Match-keys can be defined in two ways. Suppose the transformed values of n fields are used to define a match-key. We may simply concatenate the values to obtain the key; such a key is called a hard match-key. Alternatively, we can define the key as an ordered set of n string values; such a key is called a soft match-key. Though many matching engines are built using hard match-keys (sometimes called match codes), there are tools that use both soft and hard keys. A soft key has the advantage of flexibility, since each component can be compared on its own, but comparing every component value makes matching on the entire key slower. That is why I prefer using two sets of keys. Records are passed through the hard match-key first to select possible matching pairs, and these pairs are then evaluated once more using the soft match-key. This is called 2-step matching.</div><div class="MsoNormal"><br />
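The 2-step matching described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the records, the field names, the choice of fields in each key and the 0.85 threshold are all hypothetical, and the per-component similarity uses <code>difflib.SequenceMatcher</code> from the standard library as a stand-in for whatever match technique a real engine applies.<br />

```python
from difflib import SequenceMatcher
from itertools import combinations


def transform(value):
    """Assumed transformation: trim and upper-case."""
    return value.strip().upper()


def hard_key(record, fields):
    """Hard match-key: transformed values concatenated into one string."""
    return "".join(transform(record[f]) for f in fields)


def soft_key(record, fields):
    """Soft match-key: ordered tuple of transformed values."""
    return tuple(transform(record[f]) for f in fields)


def soft_score(key_a, key_b):
    """Average similarity over the components of two soft keys."""
    ratios = [SequenceMatcher(None, a, b).ratio() for a, b in zip(key_a, key_b)]
    return sum(ratios) / len(ratios)


def two_step_match(records, hard_fields, soft_fields, threshold=0.85):
    # Step 1: records sharing a hard key form the candidate pairs.
    candidates = {}
    for rec in records:
        candidates.setdefault(hard_key(rec, hard_fields), []).append(rec)
    # Step 2: candidate pairs are re-evaluated component by component.
    matches = []
    for group in candidates.values():
        for rec_a, rec_b in combinations(group, 2):
            score = soft_score(soft_key(rec_a, soft_fields),
                               soft_key(rec_b, soft_fields))
            if score >= threshold:
                matches.append((rec_a["id"], rec_b["id"]))
    return sorted(matches)


# Hypothetical records for illustration only.
RECORDS = [
    {"id": 1, "given": "Jon", "last": "Morkel", "street": "Main St", "zip": "75001"},
    {"id": 2, "given": "John", "last": "Morkel", "street": "Main St", "zip": "75001"},
    {"id": 3, "given": "Mary", "last": "Morkel", "street": "Main St", "zip": "75001"},
    {"id": 4, "given": "John", "last": "Morkel", "street": "Main St", "zip": "75068"},
]

print(two_step_match(RECORDS, ("last", "zip"), ("given", "last", "street")))
# [(1, 2)]
```

The cheap hard key keeps record 4 out of the comparison altogether, and the slower soft comparison is then spent only on the three records that share a hard key.<br />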
</div></div><span style="font-family: "Calibri","sans-serif"; font-size: 11pt; line-height: 115%;">We will come back to key based matching and discuss match probability and match indicator that I touched upon at the beginning of this post. But before that we need to examine one more property of a match key and the transformations that were discussed in this post.</span>Tirthankar Ghoshhttp://www.blogger.com/profile/14687942450731369225noreply@blogger.com0tag:blogger.com,1999:blog-7991052372783853348.post-36811531312461679722011-05-25T16:49:00.001+05:302020-04-06T12:31:01.291+05:30Errors in matching<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="MsoNormal">
As a continuation of my previous post, here we discuss the possible errors in a matching process that fits the framework described there.<br />
<br />
In fact, if we divide the possible pairs into two groups, M (the group of matching pairs) and U (the group of non-matching pairs), then there are two types of errors:<br />
Type I errors, or false positive matches, and Type II errors, or false negative matches.<br />
A false positive match P<sub>F</sub> ∈ M ∩ U<sub>P</sub> and a false negative match N<sub>F</sub> ∈ U ∩ M<sub>P</sub>, where M<sub>P</sub> and U<sub>P</sub> are the sets of truly matching and truly non-matching pairs.<br />
Let the average cost of a false positive match be C<sub>P</sub> and the average cost of a false negative match be C<sub>N</sub>.<br />
So the total cost of matching error can be defined as:<br />
E(L) = C<sub>P</sub>N(P<sub>F</sub>) + C<sub>N</sub>N(N<sub>F</sub>) [2]<br />
where N(P<sub>F</sub>) denotes the number of false positive matches and N(N<sub>F</sub>) is the number of false negative matches.<br />
However, in reality, the actual values of N(P<sub>F</sub>) and N(N<sub>F</sub>) are very difficult to obtain, and hence we estimate them by executing the matching on a smaller, representative set of records.<br />
<br />
If the match engine divides the possible pairs into three groups M, S and U, as mentioned in the framework, there will be one more component in the error expression: the contribution of S.<br />
Suppose the cost of processing/resolving a suspect match (a member of S) is C<sub>S</sub> and the number of pairs in S is N(P<sub>S</sub>); then this error component will be C<sub>S</sub>N(P<sub>S</sub>).<br />
Hence, the error expression becomes:<br />
E(L) = C<sub>P</sub>N(P<sub>F</sub>) + C<sub>N</sub>N(N<sub>F</sub>) + C<sub>S</sub>N(P<sub>S</sub>) [3]<br />
<br />
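Expressions [2] and [3] translate directly into a few lines of Python. The sketch below is illustrative only: the function name and the cost and count values are hypothetical, and the counts are assumed to come from a labelled, representative sample as suggested above.<br />

```python
def matching_error(c_p, n_fp, c_n, n_fn, c_s=0.0, n_s=0):
    """Total cost of matching error.

    With the default c_s and n_s this is expression [2];
    passing the suspect cost and count gives expression [3].
    """
    return c_p * n_fp + c_n * n_fn + c_s * n_s


# Hypothetical costs and sample counts, for illustration only.
print(matching_error(c_p=5.0, n_fp=10, c_n=2.0, n_fn=20))                    # 90.0
print(matching_error(c_p=5.0, n_fp=10, c_n=2.0, n_fn=20, c_s=0.5, n_s=40))  # 110.0
```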
Let us concentrate on the expression [3] because as per the basic framework, we have produced three subsets of the possible pairs as the output of matching.</div>
<div class="MsoNormal">
Obviously, we would want to reduce the matching error or E (L). Note that the variables in this expression are N(P<sub>F</sub>), N(N<sub>F</sub>), N(P<sub>S</sub>) i.e. the number of false positive pairs, number of false negative pairs and the number of suspected matching pairs.</div>
<div class="MsoNormal">
The number of false positive matches can be reduced by making the match criteria (or the match keys) more stringent. But this causes some genuine matches to be identified as non-matches, i.e. it increases the number of false negative matches.</div>
<div class="MsoNormal">
Similarly, the number of false negative matches can be reduced by making the match criteria (or the match keys) more relaxed. But this causes some genuine non-matches to be identified as matches, i.e. it increases the number of false positive matches.</div>
<div class="MsoNormal">
So, the match rules are made more stringent or more relaxed based on the relative values of the cost of a false positive (C<sub>P</sub>) and the cost of a false negative (C<sub>N</sub>).<br />
<br />
<br /></div>
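<div class="MsoNormal">
The trade-off between false positives and false negatives can be illustrated with a small Python sketch. It assumes, purely for illustration, that stringency is controlled by a single score threshold and that a labelled sample of scored pairs is available; the sample values, candidate thresholds and costs are hypothetical.</div>

```python
def error_counts(scored_pairs, threshold):
    """Count false positives and false negatives at a given threshold.

    scored_pairs is a list of (score, is_true_match) tuples.
    """
    false_pos = sum(1 for score, is_match in scored_pairs
                    if score >= threshold and not is_match)
    false_neg = sum(1 for score, is_match in scored_pairs
                    if score < threshold and is_match)
    return false_pos, false_neg


def total_cost(scored_pairs, threshold, c_p, c_n):
    """Cost of matching error, as in expression [2]."""
    false_pos, false_neg = error_counts(scored_pairs, threshold)
    return c_p * false_pos + c_n * false_neg


def best_threshold(scored_pairs, candidates, c_p, c_n):
    """Pick the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: total_cost(scored_pairs, t, c_p, c_n))


# Hypothetical labelled sample: (match score, truly a match?)
SAMPLE = [(0.95, True), (0.90, True), (0.80, False), (0.75, True),
          (0.60, False), (0.55, True), (0.30, False)]
CANDIDATES = [0.5, 0.7, 0.85]

print(best_threshold(SAMPLE, CANDIDATES, c_p=10, c_n=1))  # 0.85
print(best_threshold(SAMPLE, CANDIDATES, c_p=1, c_n=10))  # 0.5
```

<div class="MsoNormal">
When a false positive is the costlier error, the stringent threshold wins; when a false negative is costlier, the relaxed one does.</div>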
<div class="MsoNormal">
The last variable that contributes to the matching error E(L) is the number of suspected matching pairs, i.e. N(P<sub>S</sub>).<br />
It depends on the value of (M-m), i.e. the length of the suspect interval, and on the factors discussed below.<br />
<br />
Before trying to reduce the number of suspect matches, let us stop here and investigate why we have suspect matches in the first place. Ideally, matching records should look similar and non-matching records should not. But there are reasons why the distinction between a match and a non-match gets blurred.<br />
Let us look at some of those reasons:<br />
<br />
<u>Accidental closeness of the records (the values in the fields)</u><br />
As an example, the name strings TIRTHANKAR and DIPANKAR are fairly close. If two records with the given names TIRTHANKAR and DIPANKAR also have similar surnames and address information, the pair may very well land in the set of suspected matching pairs.<br />
<br /></div>
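<div class="MsoNormal">
The closeness of the two names can be checked with any string similarity measure. The sketch below uses <code>difflib.SequenceMatcher</code> from the Python standard library, which is an assumed stand-in and not necessarily the measure a given matching engine uses.</div>

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio between two strings, ignoring case (0 = nothing in common, 1 = identical)."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


score = similarity("TIRTHANKAR", "DIPANKAR")
print(round(score, 2))  # 0.67
```

<div class="MsoNormal">
Two different given names sharing the ending ANKAR already score about two thirds on their own, before any surname or address similarity is added.</div>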
<div class="MsoNormal">
<u>Cultural Mix</u><br />
These days, the effect of this is proving to be costly. Let me give an example, a slightly extreme one.<br />
Suppose we are processing data from a country where the popular nickname BILL does not mean WILLIAM, and someone from the USA named WILLIAM has settled in this country.<br />
Since the rule does not allow nickname matching, we do not match BILL and WILLIAM, and hence the two records (both corresponding to the immigrant from the USA), instead of going into the set of definite matching pairs, end up in the set of suspected matching pairs.<br />
<br /></div>
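<div class="MsoNormal">
The effect of a country-specific nickname rule can be sketched as below. The nickname tables, the locale codes and the function names are hypothetical; "XX" simply stands for a country whose rules carry no BILL to WILLIAM mapping.</div>

```python
# Hypothetical nickname tables per locale; "XX" has no nickname rules.
NICKNAMES = {
    "US": {"BILL": "WILLIAM", "BOB": "ROBERT"},
    "XX": {},
}


def canonical_given_name(name, locale):
    """Map a nickname to its formal form if the locale's rules know it."""
    cleaned = name.strip().upper()
    return NICKNAMES.get(locale, {}).get(cleaned, cleaned)


def given_names_match(name_a, name_b, locale):
    """Exact match on canonical given names under the locale's rules."""
    return canonical_given_name(name_a, locale) == canonical_given_name(name_b, locale)


print(given_names_match("Bill", "William", "US"))  # True
print(given_names_match("Bill", "William", "XX"))  # False
```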
<div class="MsoNormal">
<u>Missing Values and Typographical Errors</u><br />
Missing values at times prevent otherwise matching records from being close enough. As an example, consider a pair where one record does not have the given name (or any other critical field) filled in.<br />
In such a case, the match score, instead of being high, will be comparatively low, which may land the pair in the set of suspected matching pairs. Sometimes a missing value (or several missing values) may bring two otherwise dissimilar records closer, perhaps into the set of suspected matching pairs. Similar observations can be made for typographical errors.</div>
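<div class="MsoNormal">
Both effects of a missing value can be seen in a small Python sketch. The scoring scheme (the average of exact field comparisons), the field names and the records are assumptions made for illustration; the two ways of handling a blank field are shown side by side.</div>

```python
FIELDS = ("given", "last", "zip")


def field_score(a, b):
    """1.0 for equal values, 0.0 for different ones, None if either is missing."""
    if not a or not b:
        return None
    return 1.0 if a.strip().upper() == b.strip().upper() else 0.0


def match_score(rec_a, rec_b, fields, skip_missing=False):
    """Average field score; missing fields count as 0 unless skipped."""
    scores = [field_score(rec_a.get(f, ""), rec_b.get(f, "")) for f in fields]
    if skip_missing:
        scores = [s for s in scores if s is not None]
    else:
        scores = [0.0 if s is None else s for s in scores]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical records for illustration only.
full = {"given": "John", "last": "Morkel", "zip": "75068"}
blank = {"given": "", "last": "Morkel", "zip": "75068"}
other = {"given": "Mary", "last": "Morkel", "zip": "75068"}

# A missing given name pulls a genuine match down.
print(round(match_score(full, blank, FIELDS), 2))        # 0.67
# Ignoring the missing field makes a different person look identical.
print(match_score(other, blank, FIELDS, skip_missing=True))  # 1.0
```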
<div class="MsoNormal">
<u>Insufficient match settings</u><br />
Factors that drive match settings can be discussed at length. Without taking a deep dive, it can be said that incorrect settings (perhaps an incorrect parsing rule) can increase or decrease the match probability and thus bring a pair of records into the set of suspected matching pairs instead of one of the other two sets.<br />
<br />
From the above discussion, we see that a considerable portion of N(P<sub>S</sub>) depends on factors beyond our control. Besides, reducing N(P<sub>S</sub>) may result in an increase in N(P<sub>F</sub>) and/or N(N<sub>F</sub>). It is better to reduce N(P<sub>F</sub>) and/or N(N<sub>F</sub>) by fine-tuning the match settings instead.</div>
</div>
Tirthankar Ghoshhttp://www.blogger.com/profile/14687942450731369225noreply@blogger.com0tag:blogger.com,1999:blog-7991052372783853348.post-91286674731547703332011-05-24T17:50:00.000+05:302020-04-06T12:57:38.835+05:30A basic framework for matching or linking records<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="MsoNormal">
Let A be a file containing n records. The Cartesian product A × A, i.e. the set of all possible pairs, is ideally composed of three subsets M<sub>P</sub>, S<sub>P</sub> and U<sub>P</sub>.<br />
M<sub>P</sub> is the set of pairs of matching records, S<sub>P</sub> is the set of pairs of suspected matching records and U<sub>P</sub> is the set of non-matching pairs.<br />
<br />
Our aim is to find a match rule (L) such that any pair of records in A × A falls in one of the following three sets:<br />
M = set of definite matching pairs.<br />
S = set of suspected matching pairs.<br />
U = set of non-matching pairs.</div>
<div class="MsoNormal">
Any record in A is composed of several fields. Ideally, we build several match keys involving these fields. For example, let us consider the fields on the records to be:</div>
<div class="MsoNormal">
Given Name, Middle Name, Surname, Name Suffix, Street Number, Street Type, Apartment Number, Floor, Post Code, Locality, City, State, Country, Telephone Number, Mobile Number, SSN.<br />
A match key may involve Given Name, Surname, Street Number, Street Type, and Apartment Number.<br />
Another match key may involve Given Name, Surname, Telephone Number, Post Code and City.<br />
<br />
Suppose there are k match keys defined in the system.<br />
<br />
A pair (p) is compared on every defined match key, and for each match key a number λ<sub>i</sub>(p) (or just λ<sub>i</sub>) is returned to indicate the comparison result for the i-th match key.<br />
Here, 0 ≤ λ<sub>i</sub> ≤ 1 for i = 1, 2, …, k.</div>
<div class="MsoNormal">
Let {λ<sub>1</sub>, λ<sub>2</sub>, …, λ<sub>k</sub>} be the comparison vector.<br />
<br />
Let us define match probability to be the maximum value of these comparison results. <br />
If the match probability is λ (or λ(p)), then λ = MAX{λ<sub>1</sub>, λ<sub>2</sub>, …, λ<sub>k</sub>} [1]<br />
[1] ensures that, in order to be a match, a pair must match on at least one match key.</div>
<div class="MsoNormal">
Let us denote the match probability corresponding to the pair p by λ(p). <br />
<br />
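Definition [1] can be written as a short Python function. The function name and the sample comparison vector are hypothetical; the range check simply enforces the condition that each comparison result lies between 0 and 1.<br />

```python
def match_probability(comparison_vector):
    """Match probability of a pair: the maximum of its comparison results."""
    if not comparison_vector:
        raise ValueError("at least one match key is required")
    for value in comparison_vector:
        if not 0.0 <= value <= 1.0:
            raise ValueError("comparison results must lie in [0, 1]")
    return max(comparison_vector)


# Hypothetical comparison vector for a pair compared on k = 3 match keys.
lambdas = [0.35, 0.92, 0.10]
print(match_probability(lambdas))  # 0.92
```

A strong result on any single match key is therefore enough to give the pair a high match probability.<br />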
<!--[if gte mso 9]><xml> <w:WordDocument> <w:View>Normal</w:View> <w:Zoom>0</w:Zoom> <w:TrackMoves/> <w:TrackFormatting/> <w:PunctuationKerning/> <w:ValidateAgainstSchemas/> <w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid> <w:IgnoreMixedContent>false</w:IgnoreMixedContent> <w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText> <w:DoNotPromoteQF/> <w:LidThemeOther>EN-US</w:LidThemeOther> <w:LidThemeAsian>X-NONE</w:LidThemeAsian> <w:LidThemeComplexScript>X-NONE</w:LidThemeComplexScript> <w:Compatibility> <w:BreakWrappedTables/> <w:SnapToGridInCell/> <w:WrapTextWithPunct/> <w:UseAsianBreakRules/> <w:DontGrowAutofit/> <w:SplitPgBreakAndParaMark/> <w:DontVertAlignCellWithSp/> <w:DontBreakConstrainedForcedTables/> <w:DontVertAlignInTxbx/> <w:Word11KerningPairs/> <w:CachedColBalance/> </w:Compatibility> <m:mathPr> <m:mathFont m:val="Cambria Math"/> <m:brkBin m:val="before"/> <m:brkBinSub m:val="--"/> <m:smallFrac m:val="off"/> <m:dispDef/> <m:lMargin m:val="0"/> <m:rMargin m:val="0"/> <m:defJc m:val="centerGroup"/> <m:wrapIndent m:val="1440"/> <m:intLim m:val="subSup"/> <m:naryLim m:val="undOvr"/> </m:mathPr></w:WordDocument> </xml><![endif]--><!--[if gte mso 9]><xml> <w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="true"
DefSemiHidden="true" DefQFormat="false" DefPriority="99"
LatentStyleCount="267"> <w:LsdException Locked="false" Priority="0" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Normal"/> <w:LsdException Locked="false" Priority="33" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Book Title"/> <w:LsdException Locked="false" Priority="37" Name="Bibliography"/> <w:LsdException Locked="false" Priority="39" QFormat="true" Name="TOC Heading"/> </w:LatentStyles> </xml><![endif]--><!--[if gte mso 10]> <style>
</style> <![endif]--><span style="font-family: 'calibri', 'sans-serif'; font-size: 11pt; line-height: 115%;">Alternatively, some other single-valued function that returns a value between 0 and 1 (a weighted average, for example) can be used instead of MAX in [1].</span><br />
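<br />
To make the choice concrete, here is a minimal sketch in Python. It assumes that λ(p) is built from several component scores for a pair p, each already between 0 and 1; the function names, the example scores and the weights are hypothetical and are not taken from [1].<br />

```python
def combine_max(scores):
    """Combine component similarity scores by taking the largest one."""
    return max(scores)


def combine_weighted(scores, weights):
    """Weighted average of component scores.

    The result stays between 0 and 1 as long as every score is between
    0 and 1 and the weights are non-negative and not all zero.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Hypothetical component scores for one pair p (say name, address, phone).
scores = [0.9, 0.4, 0.2]
# Hypothetical weights expressing how much each component is trusted.
weights = [0.5, 0.3, 0.2]

print("MAX:", combine_max(scores))
print("weighted average:", combine_weighted(scores, weights))
```

MAX lets a single strong component decide the score, while a weighted average lets weak components pull the score down, so the two can place the same pair in different subsets.<br />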
<br />
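We also define two thresholds c and l with 0 &lt; c &lt; l &lt; 1 such that: if λ(p) &gt; l then we conclude that p ∈ M; if λ(p) &lt; c then p ∈ U; and if c ≤ λ(p) ≤ l then p ∈ S. The boundary values are assigned to S so that every pair falls into exactly one subset.<br />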
<br />
The above rule puts every possible pair of distinct records from A X A (there are <sup>n</sup>C<sub>2</sub> such unordered pairs) into exactly one of the three subsets M, U and S.</div>
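<br />
The thresholding rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: toy_score is a hypothetical stand-in for λ (a character-set overlap on non-empty strings, not the function defined in [1]), the sample records and threshold values are made up, and boundary scores are placed in S.<br />

```python
from itertools import combinations
from math import comb


def classify(score, c, l):
    """Return 'M', 'U' or 'S' for a score, given thresholds 0 < c < l < 1."""
    if not 0 < c < l < 1:
        raise ValueError("thresholds must satisfy 0 < c < l < 1")
    if score > l:
        return "M"
    if score < c:
        return "U"
    return "S"  # c <= score <= l, boundaries included by assumption


def partition_pairs(records, score_fn, c, l):
    """Place every unordered pair of records into exactly one subset."""
    buckets = {"M": [], "U": [], "S": []}
    for a, b in combinations(records, 2):
        buckets[classify(score_fn(a, b), c, l)].append((a, b))
    return buckets


def toy_score(a, b):
    """Hypothetical stand-in for lambda(p): overlap of the character sets
    of two non-empty strings, always between 0 and 1."""
    sa, sb = set(a.lower()), set(b.lower())
    return len(sa & sb) / len(sa | sb)


records = ["Ghosh", "Gosh", "Kumar", "Kumari"]  # made-up sample of A
buckets = partition_pairs(records, toy_score, c=0.3, l=0.9)

# Every one of the nC2 pairs lands in exactly one of M, U and S.
assert sum(len(v) for v in buckets.values()) == comb(len(records), 2)
for name, pairs in buckets.items():
    print(name, pairs)
```

Note that the loop visits all <sup>n</sup>C<sub>2</sub> pairs, so the work grows quadratically with the number of records in A.<br />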
</div>