However, many issues that I have faced during various data quality centric implementations prompt me to write something on it.
A close friend of mine lives in the address “27/2 Ariff Road” which is not very far from my home. When I visited her last, out of my curiosity, I took a tour of the entire Ariff Road and looked at the addresses written on the surrounding houses and shops.
So when the postal system was put in place, the numbers were assigned in some order. But subsequently new houses were built and house-holds/families got split requiring separate addresses. Second observation is crucial here. Like in many places, in this part of the country too, the system of joint family (extended family) was prevailing. Obviously they required large houses. But with the passage of time, this system changed and some of the family member moved out while some others continued to live under the same roof but built separate dwelling units. Some of them rented out a portion of their premises. A significant number of these old houses are now sold to the real estate developers and promoter who are building multi-storied apartments. And the entire system is becoming complicated.
There are a few places (Bangalore or Bangaluru is one of those) where the entire area is divided by main roads running in one direction and cross roads running perpendicular to it. An address in such a place is described by the nearest main and cross road information besides the house name/number.
Marking an address by these (or some of these) is found in the planned cities in the country. Addresses in Chandigarh (capital of two neighboring states as well as a union territory itself) contain sector information.
In Salt lake area (a suburb of Kolkata), the entire region is divided into sectors. There are blocks in each sector and plots in each block. So a typical address here looks like:
“Plot Y 14, Block – EP, Sector 5 Salt Lake”
Few months back, I took a tour of the city Kolkata. My intention was to observe the addresses on the houses I come across. In one area, I saw a number of shops with the same name
“Laxminarayan Jewelers”. Sometimes I noticed a little variation “Laxminarayan & Sons”. These shops were located in the basement of a huge building.
It took a few weeks for me to find out the history behind this. Someone called “Laxminarayan” established a shop many years ago. But his sons got separated and started their own business under the same roof but as different entities. They all bore matching names and addresses (addresses differed by a number like UNO 1 55 XYZ Road, UNO 2 55 XYZ Road etc.)
Last month I took a new telephone connection. In the process, one executive from this telecom company called me to verify/cross-check the address that I provided in the application. She repeatedly asked for a landmark near my house.
While profiling addresses in India, rampant usage of landmark information is noticed. Some of the identifiers for landmark are “Near”, “Opposite to”, “Behind”, “Beside”, “Next to” and not to forget “Diagonally Opposite to”
Look at this address - Office Space 2 & 5, Paramount Complex, Navelim, Goa – 403707. It is a commercial address that points to a shop. This shop, however, is spread across two shopping units in the same floor of a shopping complex.
This possess a serious challenge for address parsing as we need to have multiple fields for containing similar information like two fields for apartment number, two fields for street number etc.
Many Indian addresses and esp. the ones from rural areas begin with a personal name. Usually the name of the head of the family is mentioned in the addresses. Many times, post men, in these areas, know people by the name and in-case the letter is addressed to someone else in the family who is not known to the post man, the letter gets delayed. This is the primary reason for using the name of the head of the family in the address. Most popular keyword to identify such names is C/O or “care of”.
There are other variations of C/O where the exact relation is mentioned like “or S/O or son of”, “D/O or daughter of”, “M/O or mother of” etc.
Goa is a famous tourist spot in India. There is another equally interesting fact surrounding this place. India was dominated by the British for over two hundred years. Goa, on the other hand, was dominated by the Portuguese for over four hundred years. India got its independence in 1947 but the operation “Vijay” was carried out by Indian army in 1961 to liberate Goa.
Names including individual names as well as name of places/buildings/roads in this place sometimes follow the Portuguese style.
Here one can find a street named “18th. June Road”
Now another controversy and debate is going on regarding renaming these streets and buildings!
Usage of roman digits is abundant in Indian addresses. Consider the address:
“C-1 295/296 Rohini Sector-11 Near Bay Japanese Park Back Of welcome Hotel”. In addresses like this, Sector-11 sometimes written as Sector – XI (or Sec XI). Usually sector numbers in Indian addresses at times, are written using roman digits.
Yesterday evening, I was walking down a street named “Camac Street”. When I was looking at a new sign board which displaying another name for this famous street, one my friend happened to call me and asked me where I was. I said “Abanindranath Thakur Sarani” and he expressed his concerns that I was in a weird place. I had to tell him that the new name for the “Camac Street”, according to the Kolkata Municipal Corporation, was “Abanindranath Thakur Sarani” to settle things!
Places in India are slowly coming out of its colonial structure and conventions and as a part of this, renaming things is a commonplace now. For this reason, cities like “Bombay” has renamed as “Mumbai” or “Madras” has become “Chennai” and the list continues. Well... “Kolkata” is no different here. Few years back, this city was known as “Calcutta”. It seems that my state “West Bengal” will soon become “Pashchimbanga”.
Also, the postal department has introduced new postal code (or PIN code) numbers recently. PIN code of the place where I live is 700136. Some of the courier companies refused to deliver packages to my residence initially as they were using the old number 700059
How about this address? “Old No 160, New No 111,6th Street Extension, 100 Feet Road, Gandhipuram”
Many addresses in this country contains one or two locations. An example will be:
“36, Arakashan Road, Ram Nagar, Paharganj, Diagonally Opposite to New Delhi Railway Station”
This address points to a commercial place in “Paharganj” area of “New Delhi” (capital of India). It also says that the place is in “Ram Nagar” area which is located in “Paharganj”. Usually the first location encountered when the address is read (from left to right) is located within the second location mentioned in the address.
An example will be:
“8A/48, W.E.A. Channa Market, Karol Bagh, Behind Pusa Road”
Look at the business address: “C/O Star Investments, OPP. Head Post Office Panjim, Tiswadi – 403001”
This address does not contain a street name and number and yet a valid address i.e. addressee can be reached using the postal service.
Most of addresses such as the above can be written in multiple ways which are apparently not similar.