Deriving gender from a name is important for two reasons.
1. For database marketing, using gender information in addressing the offer letter is crucial.
For example, we can address “John Smith” as “Dear Mr. Smith” and “Peggy Smith” as “Dear Ms. Smith”.
2. Gender code can improve matching by restricting false positives.
Genderization is usually done using the name components. We will briefly discuss this process for Anglo-Saxon names before turning to the issues that arise with Genderization in the Indian context.
Typical name components are: Salutation/Title, First Name, Middle Name, Last Name and Name Suffix.
Among these, a salutation or title can determine gender uniquely. For example, values like ‘Mr.’ and ‘Mrs.’ are very helpful for gender determination. However, the field may contain values like ‘Prof.’ or ‘Dr.’, which carry no gender information, or it may be blank. In such cases, we check the first name. Typically a first name like ‘Robert’ corresponds to a male. Sometimes a first name cannot determine the gender uniquely; we then check whether the middle name can. Usually the last name component is not used to determine gender, but name suffixes are helpful: suffixes like ‘Sr.’ and ‘Jr.’ point to the male gender.
Based on the above logic, in most cases we use the following steps:
1. Determine gender from title (or salutation), if possible.
2. If the gender code is still blank, check the suffix and assign a gender code, if possible.
3. If the gender code is still blank, check whether it can be derived from the first name.
4. If the gender code is still blank, check whether it can be derived from the middle name.
5. If the gender code is still blank, set it to ‘U’ (unknown).
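The five steps can be sketched in Python as a simple cascade of lookups. This is only a minimal illustration, not a definitive implementation: the function name `derive_gender`, the tiny lookup tables, the extra titles ‘Ms’ and ‘Miss’, and the reuse of the first-name table for middle names are all assumptions made for the example. A real system would rely on much larger reference lists.

```python
from typing import Optional

# Illustrative lookup tables (assumptions); real reference lists are far larger.
TITLE_GENDER = {"MR": "M", "MRS": "F", "MS": "F", "MISS": "F"}
SUFFIX_GENDER = {"SR": "M", "JR": "M"}
FIRST_NAME_GENDER = {"ROBERT": "M", "JOHN": "M", "PEGGY": "F", "MARY": "F"}


def _normalize(value: Optional[str]) -> str:
    """Upper-case a name component and drop periods and surrounding spaces."""
    return (value or "").replace(".", "").strip().upper()


def derive_gender(title=None, first=None, middle=None, suffix=None) -> str:
    """Return 'M', 'F' or 'U', checking title, suffix, first name and
    middle name in that order (steps 1 to 5)."""
    checks = [
        (title, TITLE_GENDER),        # step 1
        (suffix, SUFFIX_GENDER),      # step 2
        (first, FIRST_NAME_GENDER),   # step 3
        (middle, FIRST_NAME_GENDER),  # step 4 (assumes the same table applies)
    ]
    for value, table in checks:
        code = table.get(_normalize(value))
        if code:
            return code
    return "U"                        # step 5: gender could not be derived
```

For instance, `derive_gender(title="Dr.", first="Robert")` skips the uninformative title and falls through to the first name, while a record whose components all miss the tables ends up with ‘U’.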
The above is an outline of the Genderization process for a typical Anglo-Saxon name. Now we will see how this logic can be modified to handle Indian names.
We will first look at the challenges posed by the Indian naming system, so that deriving the gender code becomes less complicated.
1. Middle names should not be evaluated for genderization, except under rules 6 and 7 below.
This is because people in various parts of the country use their father’s first name (or, in the case of a married woman, her husband’s) as the middle name.
Therefore, in a name like ARUNA PRASHANT IYER (ARUNA is a female name), PRASHANT could be her father or her husband.
2. Sometimes the first name (recall that the first name is obtained by parsing the full name) leads to the wrong gender code. In such cases, the first name should be combined with the middle name (or the initial part of the middle name) to derive the gender code. For example, consider the name DEBIKA RANJAN SEN. Our parsing rule will classify DEBIKA as the first name, RANJAN as the middle name and SEN as the last name. In the original Indian language, however, the given name is the single word DEBIKARANJAN, which points to the gender code ‘M’. DEBIKA on its own is a female name, so the gender code derived from the first name alone would be ‘F’, which is incorrect.
3. Last names can come in handy in a few cases. Unlike with Anglo-Saxon names, last names like BIBI, BEGUM, DEBI, KAUR, KHATUN, SULTANA etc. indicate a female name.
4. Name Suffix is rarely used in India.
5. The presence of words like MOHD. (or any variation of it), KAZI, HAJI, SAYED etc. anywhere in the name indicates a male name.
6. If the first name ends with BHAI, or if the first word of the middle name is BHAI, it is a male name.
Consider the name DADANBHAI KADVE. Here the first name ends with BHAI, so it is likely to be a male name. This name could also be written as DADAN BHAI KADVE. In this case, the entire middle name is BHAI, so the gender code derived from the middle name is ‘M’. Another example is DADAN BHAI NIRMAL BHAI KADVE. Our parsing rule will store DADAN as the first name, BHAI NIRMAL BHAI as the middle name and KADVE as the last name; the first word of the middle name is BHAI, so the gender code is again ‘M’.
7. If the first name ends with BEN, or if the first word of the middle name is BEN, it is a female name.
Consider the name SMITABEN V SOLANKI. In this case, the first name ends with BEN, so it is a female name.
8. Some Indian first names can be used by males as well as females. Examples of such names are KAMAL, SUMAN etc.
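The observations above can be combined into a rough rule-based sketch in Python. It is an illustration under stated assumptions, not a definitive implementation: the text does not specify the order in which the rules should be applied, so the precedence used here is an assumption, as are the function name `derive_gender_indian` and the small `NAME_GENDER` table, which is a hypothetical stand-in for a real reference list. The name is assumed to have been parsed into first, middle and last components already.

```python
# Word lists taken from the rules above; real lists would be longer.
FEMALE_LAST_NAMES = {"BIBI", "BEGUM", "DEBI", "KAUR", "KHATUN", "SULTANA"}
MALE_MARKERS = {"MOHD", "KAZI", "HAJI", "SAYED"}
UNISEX_FIRST_NAMES = {"KAMAL", "SUMAN"}
# Hypothetical lookup table; it also holds combined first + middle names (rule 2).
NAME_GENDER = {"ARUNA": "F", "DEBIKA": "F", "DEBIKARANJAN": "M", "PRASHANT": "M"}


def _clean(value):
    """Upper-case, turn periods into spaces and collapse repeated spaces."""
    return " ".join((value or "").replace(".", " ").upper().split())


def derive_gender_indian(first, middle="", last=""):
    """Return 'M', 'F' or 'U' for an already parsed Indian name.
    The rule precedence below is an assumption made for this sketch."""
    first, last = _clean(first), _clean(last)
    middle_words = _clean(middle).split()
    first_middle = middle_words[0] if middle_words else ""
    words = first.split() + middle_words + last.split()

    # Rule 5: a male marker anywhere in the name.
    if any(word in MALE_MARKERS for word in words):
        return "M"
    # Rule 3: last names that indicate a female name.
    if last in FEMALE_LAST_NAMES:
        return "F"
    # Rules 6 and 7: BHAI / BEN at the end of the first name,
    # or as the first word of the middle name.
    if first.endswith("BHAI") or first_middle == "BHAI":
        return "M"
    if first.endswith("BEN") or first_middle == "BEN":
        return "F"
    # Rule 2: try the first name joined to the first middle-name word.
    combined = first + first_middle
    if first_middle and combined in NAME_GENDER:
        return NAME_GENDER[combined]
    # Rule 8: unisex first names cannot decide the gender on their own.
    if first in UNISEX_FIRST_NAMES:
        return "U"
    # Rule 1: otherwise use the first name only, never the middle name.
    return NAME_GENDER.get(first, "U")
```

With these tables, DEBIKA RANJAN SEN resolves to ‘M’ through the combined name, ARUNA PRASHANT IYER resolves to ‘F’ because PRASHANT is never consulted on its own, and SUMAN is left as ‘U’. A plain suffix test on BEN or BHAI will also match unrelated first names that happen to end in those letters, so a production rule set would likely need an exception list.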