Wednesday, September 7, 2011

Addresses in India

Addresses in India are so confusing, and follow so many patterns from region to region, that address matching, parsing and enrichment become very challenging. It is difficult to cover the topic in a single article; someone who has spent years researching it would need an entire volume to present the results.
However, the issues I have faced during various data-quality implementations prompt me to write something about it.
In this post, I am going to discuss a few complicated patterns and some of their numerous exceptions.
According to the standard addressing convention, a typical street address has three components: a street number, followed by a street name, followed by a street type (in some places the street type comes before the street name). I will begin my discussion with this convention. And yes, in many Indian cities this is indeed the standard convention.

Ariff Road

A close friend of mine lives at “27/2 Ariff Road”, not very far from my home. When I last visited her, I took a tour of the entire Ariff Road out of curiosity and looked at the addresses written on the surrounding houses and shops.
By the way, Ariff Road is a relatively narrow lane in North Kolkata. A few lanes and by-lanes branch off from it, and interestingly, most of these are called Ariff Road too; at least, the addresses on the houses in these lanes and by-lanes bear the Ariff Road name. I was rather surprised by the house numbers, which were not merely unordered but totally chaotic. The house opposite “27/2 Ariff Road” was “1H/2A Ariff Road”, and I came across another house with the address “12/7/A/1 Ariff Road”.
We find such addresses in many areas of North Kolkata, and to see how they came into existence we have to look at the history of the city. Kolkata is not a planned city. It was founded by the British in the late 17th century, after the East India Company, whose agent J. Charnock had set up a settlement there, purchased three villages from the local landlords, the Sabarna Chowdhury family. Slowly but steadily, the city grew as the British desired, but without any master plan.
So when the postal system was put in place, numbers were assigned in some order. Subsequently, however, new houses were built, and households and families split, each requiring a separate address. This second observation is crucial. As in many places, the joint (extended) family system prevailed in this part of the country, and such families obviously needed large houses. With the passage of time this system changed: some family members moved out, while others continued to live under the same roof but built separate dwelling units. Some rented out a portion of their premises. A significant number of these old houses have now been sold to real estate developers and promoters, who are building multi-storied apartments, and the whole system is becoming ever more complicated.
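For a parser, the practical consequence is that a North Kolkata house number cannot be treated as a single integer. Below is a minimal Python sketch, assuming that the address starts with a house number beginning with a digit, followed by the street name, and that each “/” marks a further subdivision of an original holding. The names `ParsedAddress` and `split_house_number` are hypothetical.

```python
import re
from typing import NamedTuple


class ParsedAddress(NamedTuple):
    house_number: str
    subdivisions: list[str]  # "/"-separated parts, original holding first
    street: str


# Assumption: the house number is the leading token, starts with a digit and
# may contain further "/"-separated parts made of letters and/or digits.
HOUSE_NUMBER = re.compile(r"^\s*(\d[0-9A-Za-z]*(?:/[0-9A-Za-z]+)*)\s+(.+?)\s*$")


def split_house_number(address: str) -> ParsedAddress:
    """Split '12/7/A/1 Ariff Road' into the full number, its parts and the street."""
    match = HOUSE_NUMBER.match(address)
    if match is None:
        raise ValueError(f"no leading house number in {address!r}")
    number, street = match.groups()
    return ParsedAddress(number, number.split("/"), street)


for sample in ["27/2 Ariff Road", "1H/2A Ariff Road", "12/7/A/1 Ariff Road"]:
    print(split_house_number(sample))
```

Splitting only exposes the parts; it says nothing about whether 1H/2A stands next to 27/2, which is why ordering or range-based matching on such streets is unreliable.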

Main Road & Cross Road

There are a few places (Bangalore, or Bengaluru, is one of them) where the entire area is divided by main roads running in one direction and cross roads running perpendicular to them. An address in such a place is described by the nearest main road and cross road, besides the house name or number.
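Because main and cross roads form a grid, the pair of road numbers behaves almost like a coordinate, which is useful for matching. A minimal sketch, assuming the common “5th Main” / “3rd Cross” spelling (optionally followed by “Road”); the pattern, the function name `grid_position` and the sample address are all hypothetical.

```python
import re

# Assumed spelling: an ordinal ("5th", "3rd") followed by "Main" or "Cross",
# optionally followed by "Road".
GRID_ROAD = re.compile(r"\b(\d+)(?:st|nd|rd|th)\s+(Main|Cross)(?:\s+Road)?\b", re.IGNORECASE)


def grid_position(address: str) -> dict[str, int]:
    """Return the first main road and cross road numbers found in the address."""
    position: dict[str, int] = {}
    for number, kind in GRID_ROAD.findall(address):
        position.setdefault(kind.lower(), int(number))
    return position


print(grid_position("House 42, 5th Main, 3rd Cross"))  # {'main': 5, 'cross': 3}
```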

Plot/Block/Sector

Marking an address by plot, block and sector (or some of these) is common in the planned cities of the country. Addresses in Chandigarh (the capital of two neighboring states as well as a union territory itself) contain sector information.
In Salt Lake (a suburb of Kolkata), the entire region is divided into sectors. Each sector contains blocks, and each block contains plots. So a typical address here looks like:
“Plot Y 14, Block – EP, Sector 5 Salt Lake”
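Addresses of this kind are comparatively easy to parse because each part is introduced by a keyword. A minimal sketch using the Salt Lake example above; the patterns are assumptions (a plot value runs up to the next comma, while block and sector values are single tokens), and `plot_block_sector` is a hypothetical name.

```python
import re

# Assumed field patterns; the separator may be a space, hyphen, en dash or colon.
FIELD_PATTERNS = {
    "plot": re.compile(r"\bPlot\s*[-–:]?\s*([^,]+?)\s*(?:,|$)", re.IGNORECASE),
    "block": re.compile(r"\bBlock\s*[-–:]?\s*([A-Z0-9]+)\b", re.IGNORECASE),
    "sector": re.compile(r"\bSector\s*[-–:]?\s*([A-Z0-9]+)\b", re.IGNORECASE),
}


def plot_block_sector(address: str) -> dict[str, str]:
    """Extract whichever of plot, block and sector the address contains."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(address)
        if match:
            fields[name] = match.group(1)
    return fields


print(plot_block_sector("Plot Y 14, Block – EP, Sector 5 Salt Lake"))
```

The same patterns accept “Sector-11” or “Sector – XI”, though Roman numerals still have to be converted before sector values can be compared.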

Laxminarayan Jewelers - different entities with similar names/addresses

A few months back, I took a tour of Kolkata to observe the addresses on the houses I came across. In one area, I saw a number of shops with the same name, “Laxminarayan Jewelers”, and sometimes a slight variation, “Laxminarayan & Sons”. These shops were located in the basement of a huge building.
It took me a few weeks to find out the history behind this. Someone called “Laxminarayan” had established a shop many years ago. His sons later separated and started their own businesses under the same roof, but as different entities. They all bore matching names and addresses (the addresses differed only by a unit number, like UNO 1 55 XYZ Road, UNO 2 55 XYZ Road etc.).
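For a de-duplication routine these shops are a trap: compare only the name and the street address and they collapse into a single customer. A minimal sketch of a match key that keeps the unit number as its own component; treating “UNO” as the unit designator follows the example above, and `normalise` and `match_key` are hypothetical names.

```python
import re

# Assumption: "UNO <value>" marks the unit, as in the example addresses.
UNIT = re.compile(r"\bUNO\s*(\w+)\s*", re.IGNORECASE)


def normalise(text: str) -> str:
    """Upper-case and collapse runs of whitespace."""
    return " ".join(text.upper().split())


def match_key(name: str, address: str) -> tuple[str, str, str]:
    """Build a (name, street address, unit) key; records match only if all three agree."""
    unit = UNIT.search(address)
    street = UNIT.sub("", address, count=1)
    return normalise(name), normalise(street), unit.group(1).upper() if unit else ""


shops = [("Laxminarayan Jewelers", "UNO 1 55 XYZ Road"), ("Laxminarayan Jewelers", "UNO 2 55 XYZ Road")]
print(len({match_key(name, address) for name, address in shops}))  # 2 distinct entities
```

Fuzzy name matching would still be needed to relate “Laxminarayan & Sons” to the others; the point here is only that the unit number must never be discarded during normalisation.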

“Diagonally Opposite to” - a land of landmarks

Last month I applied for a new telephone connection. In the process, an executive from the telecom company called me to verify the address I had provided in the application. She repeatedly asked for a landmark near my house.
Profiling addresses in India reveals rampant use of landmark information. Some of the keywords that identify a landmark are “Near”, “Opposite to”, “Behind”, “Beside”, “Next to” and, not to forget, “Diagonally Opposite to”.
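A parser therefore has to recognise these phrases and set the landmark aside from the postal components. A minimal sketch using exactly the keywords listed above, tried longest first so that “Diagonally Opposite to” is never read as “Opposite to”. That a landmark ends at the next comma or the next keyword is an assumption of this sketch, abbreviations such as “Opp.” would have to be added, and `extract_landmarks` is a hypothetical name.

```python
import re

# Landmark keywords from the post, sorted longest first so that the alternation
# prefers "Diagonally Opposite to" over "Opposite to".
LANDMARK_KEYWORDS = ["Near", "Opposite to", "Behind", "Beside", "Next to", "Diagonally Opposite to"]
_KEYWORDS = "|".join(re.escape(k) for k in sorted(LANDMARK_KEYWORDS, key=len, reverse=True))

# Assumption: a landmark runs until the next comma, the next keyword or the end.
LANDMARK = re.compile(
    rf"\b({_KEYWORDS})\s+(.+?)(?=\s*,|\s+(?:{_KEYWORDS})\b|\s*$)",
    re.IGNORECASE,
)


def extract_landmarks(address: str) -> list[tuple[str, str]]:
    """Return (keyword, landmark) pairs in the order they appear in the address."""
    return LANDMARK.findall(address)


print(extract_landmarks(
    "36, Arakashan Road, Ram Nagar, Paharganj, Diagonally Opposite to New Delhi Railway Station"
))
```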

Adjacent houses or apartments

Look at this address: Office Space 2 & 5, Paramount Complex, Navelim, Goa – 403707. It is a commercial address that points to a shop. This shop, however, is spread across two shopping units on the same floor of a shopping complex.
Similarly, a friend of mine has this address: Apt 5C & 5D, 12 Mandevilla Gardens, Kolkata – 19

This poses a serious challenge for address parsing, as we need multiple fields to hold the same kind of information, e.g. two fields for the apartment number, two for the street number and so on.
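One way around the fixed-field problem is to expand such an address into one record per unit before storing it. A minimal sketch, assuming units are joined by “&” and introduced by a designator such as “Apt” or “Office Space”; the designator list and the name `expand_units` are hypothetical, and the list would need extending.

```python
import re

# Hypothetical unit designators; extend as needed ("Flat", "Shop", ...).
UNIT_GROUP = re.compile(r"\b(Apt|Office Space)\s+(\w+(?:\s*&\s*\w+)+)", re.IGNORECASE)


def expand_units(address: str) -> list[str]:
    """Return one address string per unit when several units are joined by '&'."""
    match = UNIT_GROUP.search(address)
    if match is None:
        return [address]
    designator, units = match.groups()
    return [
        address[: match.start()] + f"{designator} {unit.strip()}" + address[match.end():]
        for unit in units.split("&")
    ]


for record in expand_units("Apt 5C & 5D, 12 Mandevilla Gardens, Kolkata – 19"):
    print(record)
```

Emitting one record per unit avoids duplicate columns, at the cost of having to link the records back to the same shop or household afterwards.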

Personal Names in addresses

Many Indian addresses, especially those from rural areas, begin with a personal name, usually that of the head of the family. Postmen in these areas often know people by name, and if a letter is addressed to a family member the postman does not know, it gets delayed. This is the primary reason for using the name of the head of the family in the address. The most popular keyword to identify such names is C/O, or “care of”.
There are other variations of C/O that state the exact relation, like “S/O” or “son of”, “D/O” or “daughter of”, “M/O” or “mother of” etc.
A typical address in this format looks like:
“C/O Ashim Biswas, 33 Govinda Naskar Lane, Sriharipara”
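Since the personal name is not part of the postal address proper, a parser should split it off together with the relation. A minimal sketch covering the keywords mentioned above; it assumes the name ends at the first comma, and the relation labels and the name `split_care_of` are hypothetical.

```python
import re

# Relation keywords from the post, mapped to readable labels.
RELATIONS = {"C/O": "care of", "S/O": "son of", "D/O": "daughter of", "M/O": "mother of"}
# Assumption: the name runs from the keyword to the first comma.
RELATION_PREFIX = re.compile(
    r"^\s*(C/O|S/O|D/O|M/O|care of|son of|daughter of|mother of)\s+([^,]+),\s*(.*)$",
    re.IGNORECASE,
)


def split_care_of(address: str) -> dict[str, str]:
    """Split a leading 'C/O <name>,' (or S/O, D/O, M/O) prefix from the postal address."""
    match = RELATION_PREFIX.match(address)
    if match is None:
        return {"address": address.strip()}
    keyword, name, rest = match.groups()
    # Abbreviations map to their long form; long forms are simply lower-cased.
    relation = RELATIONS.get(keyword.upper(), keyword.lower())
    return {"relation": relation, "name": name.strip(), "address": rest.strip()}


print(split_care_of("C/O Ashim Biswas, 33 Govinda Naskar Lane, Sriharipara"))
```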

Addresses in Goa

Goa is a famous tourist spot in India, and there is another equally interesting fact about this place. India was dominated by the British for over two hundred years, whereas Goa was dominated by the Portuguese for over four hundred years. India gained independence in 1947, but it took Operation Vijay, carried out by the Indian Army in 1961, to liberate Goa.
Names here, both personal names and the names of places, buildings and roads, sometimes follow the Portuguese style.
Here one can find a street named “18th June Road”.

There is now an ongoing controversy and debate about renaming these streets and buildings!

Roman numerals

Roman numerals are used abundantly in Indian addresses. Consider the address:
“C-1 295/296 Rohini Sector-11 Near Bay Japanese Park Back Of welcome Hotel”. In addresses like this, Sector-11 is sometimes written as Sector – XI (or Sec XI); sector numbers in Indian addresses are often written in Roman numerals.
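Before sector values can be compared, Roman numerals have to be converted. A minimal sketch, assuming the sector keyword is written “Sector”, “Sec” or “Sec.” and that the numeral uses only I, V, X and L (sectors up to 89). It does not validate the numeral, and a lone “L” or “V” could just as well be a lettered block, so results need a sanity check; `roman_to_int` and `normalise_sector` are hypothetical names.

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral to an integer using subtractive notation (IV = 4)."""
    total = 0
    for current, following in zip(numeral, numeral[1:] + " "):
        value = ROMAN_VALUES[current]
        # A smaller symbol before a larger one is subtracted (the I in XIV).
        total += -value if ROMAN_VALUES.get(following, 0) > value else value
    return total


# The keyword is case-insensitive; the numeral must be upper case so that
# ordinary words such as "Civil" are not read as numerals.
SECTOR_ROMAN = re.compile(r"\b(?i:Sector|Sec\.?)\s*[-–]?\s*([IVXL]+)\b")


def normalise_sector(address: str) -> str:
    """Rewrite 'Sector – XI' or 'Sec XI' as 'Sector-11'."""
    return SECTOR_ROMAN.sub(lambda m: f"Sector-{roman_to_int(m.group(1))}", address)


print(normalise_sector("C-1 295/296 Rohini Sec XI"))  # C-1 295/296 Rohini Sector-11
```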

New and Old

Yesterday evening, I was walking down a street named “Camac Street”. While I was looking at a new signboard displaying another name for this famous street, a friend happened to call and asked me where I was. I said “Abanindranath Thakur Sarani”, and he was worried that I was somewhere strange. To settle things, I had to tell him that, according to the Kolkata Municipal Corporation, the new name of “Camac Street” was “Abanindranath Thakur Sarani”!
Places in India are slowly emerging from their colonial structures and conventions, and as part of this, renaming has become commonplace. “Bombay” has been renamed “Mumbai”, “Madras” has become “Chennai”, and the list goes on. Well... “Kolkata” is no different: a few years back, the city was known as “Calcutta”. It seems that my state, “West Bengal”, will soon become “Pashchimbanga”.
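For matching, renamed places mean that old and new names must be treated as equivalent. A minimal sketch that rewrites old names to current ones using a hand-maintained alias table; the table holds only the renamings mentioned in this post, and `canonical_place_names` is a hypothetical name.

```python
import re

# Old name -> current name; only the renamings mentioned in this post.
PLACE_ALIASES = {
    "Calcutta": "Kolkata",
    "Bombay": "Mumbai",
    "Madras": "Chennai",
    "Camac Street": "Abanindranath Thakur Sarani",
}
_LOOKUP = {old.lower(): new for old, new in PLACE_ALIASES.items()}
# Longest names first, matched as whole words, ignoring case.
_ALIAS = re.compile(
    r"\b(" + "|".join(re.escape(old) for old in sorted(PLACE_ALIASES, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def canonical_place_names(address: str) -> str:
    """Replace known old names with their current names."""
    return _ALIAS.sub(lambda m: _LOOKUP[m.group(1).lower()], address)


print(canonical_place_names("Camac Street, Calcutta"))  # Abanindranath Thakur Sarani, Kolkata
```

Blind substitution has a cost: it would also rewrite a business name that deliberately keeps the old city name, so in practice the table is better applied to the city, locality and street fields than to the whole address string.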

The postal department has also recently introduced new postal codes (PIN codes). The PIN code of the place where I live is 700136. Initially, some courier companies refused to deliver packages to my residence because they were still using the old number, 700059.

How about this address? “Old No 160, New No 111, 6th Street Extension, 100 Feet Road, Gandhipuram”
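A parser should capture both numbers rather than just one, because older records may carry only the old number. A minimal sketch, assuming the “Old No …, New No …” order shown above; `old_and_new_number` and its output keys are hypothetical.

```python
import re

# Assumption: "Old No <x>, New No <y>" appears in that order; "No." is also accepted.
OLD_NEW = re.compile(r"\bOld\s+No\.?\s*(\w+)\s*,\s*New\s+No\.?\s*(\w+)", re.IGNORECASE)


def old_and_new_number(address: str) -> dict[str, str]:
    """Separate 'Old No X, New No Y' from the rest of the address."""
    match = OLD_NEW.search(address)
    if match is None:
        return {"rest": address.strip()}
    rest = (address[: match.start()] + address[match.end():]).strip(" ,")
    return {"old_number": match.group(1), "new_number": match.group(2), "rest": rest}


print(old_and_new_number("Old No 160, New No 111, 6th Street Extension, 100 Feet Road, Gandhipuram"))
```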

Multiple Locations

Many addresses in this country contain one or two locations. For example:
“36, Arakashan Road, Ram Nagar, Paharganj, Diagonally Opposite to New Delhi Railway Station”
This address points to a commercial place in the “Paharganj” area of “New Delhi” (the capital of India). It also says that the place is in the “Ram Nagar” area, which is located in “Paharganj”. Usually, when the address is read from left to right, the first location encountered lies within the second location mentioned.
However, one will come across plenty of addresses involving more than two locations.
For example:
“8A/48, W.E.A. Channa Market, Karol Bagh, Behind Pusa Road”
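The left-to-right rule lends itself to a simple heuristic for building a locality hierarchy. A minimal sketch, assuming localities have already been recognised against a gazetteer (here a hypothetical two-entry set) and that each lies within the next one listed. With more than two locations, or with landmarks mixed in, the rule needs further checks, so the output is a hint rather than a fact; `locality_hierarchy` is a hypothetical name.

```python
# Hypothetical gazetteer; a real system would load locality names from reference data.
KNOWN_LOCALITIES = {"ram nagar", "paharganj"}


def locality_hierarchy(address: str) -> list[tuple[str, str]]:
    """Return (inner, outer) pairs, assuming each locality lies within the next one."""
    segments = [segment.strip() for segment in address.split(",")]
    localities = [segment for segment in segments if segment.lower() in KNOWN_LOCALITIES]
    return list(zip(localities, localities[1:]))


print(locality_hierarchy(
    "36, Arakashan Road, Ram Nagar, Paharganj, Diagonally Opposite to New Delhi Railway Station"
))  # [('Ram Nagar', 'Paharganj')]
```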

Incomplete or partial address

Look at the business address: “C/O Star Investments, OPP. Head Post Office Panjim, Tiswadi – 403001”
This address contains no street name or number, yet it is valid, i.e. the addressee can be reached through the postal service.
Most addresses like this can be written in several ways that look quite dissimilar.

3 comments:

  1. Hi Thirthankar,
    The points you mention are definitely encountered; however, all of them can be solved, and I know of a corporate house that has solved them. Many more cases, such as those from rural areas and partial addresses, have also been dealt with.

  2. Dear Anonymous,

    I have not said these are not solvable. In fact, I have implemented a few solutions with various clients in the country. I intended to showcase the complexity of handling Indian data (its conventions and nuances).
    Having said this, I must say that none of these can be addressed 100%. You need to live with some margin of error. 100% data quality is as imaginary as reaching the destination in Zeno's dichotomy paradox.
