Table of Contents:
- Origins of the problem
- Internet Revolution v1.0 and v2.0
- Future outlook and ongoing projects
Back in 2018, I worked as an ML engineer/data scientist at a last-mile delivery startup in India (similar to DoorDash / Postmates here). Some of our team's hardest projects involved a ton of geocoding data and using ML to solve the operational issues around it.
In this post, I'll talk about the unique challenges of geocoding Indian address nomenclature and how it became one of the harder problems for e-commerce companies in India to solve. Addresses in India are as subjective and verbose as they can get. A couple of examples:
And the problems went beyond verbosity:
- Places adjacent to each other might choose extremely different nomenclature.
- It was common for city names, pin codes, or even street names to be missing.
- People often didn't know their pin codes and would frequently enter them incorrectly.
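To make the missing/incorrect pin code problem concrete, here is a minimal rule-based sketch (not what we ran in production) that pulls a PIN-like token out of a free-text address. It relies only on the format of Indian PIN codes: six digits, never starting with 0, sometimes written with a space after the third digit. The function name `extract_pin` is hypothetical, and a format check like this cannot tell you whether the PIN actually matches the address.

```python
import re
from typing import Optional

# Six digits, first digit 1-9, optionally written as "560 034".
# The lookarounds stop us from matching a slice of a longer digit run.
PIN_PATTERN = re.compile(r"(?<!\d)([1-9]\d{2})\s?(\d{3})(?!\d)")


def extract_pin(address: str) -> Optional[str]:
    """Return the last PIN-like token in a free-text address, or None.

    We take the last match because the PIN usually comes at the end of an
    address; a house or plot number can still produce a false positive.
    """
    matches = PIN_PATTERN.findall(address)
    if not matches:
        return None
    first3, last3 = matches[-1]
    return first3 + last3


print(extract_pin("Flat 4B, 2nd Cross, Koramangala, Bengaluru 560 034"))  # 560034
print(extract_pin("Near Hanuman temple, Indiranagar"))                    # None
```

Even this tiny check catches addresses with no PIN at all, which is the cheapest case to fix before dispatch.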
Origins of the problem:
Linguistic diversity (a topic for another day), decentralised city planning (more like no city planning 🥲) and rampant growth over the last few decades have only made this challenge bigger.
Before the internet, and even now, much of India's postal delivery has relied on the government-run India Post. Fun fact: it is the most "loss-making" public organisation in India, with a loss of ~$2.4B against revenue of $1.4B in 2022-23. It has the widest "pincode coverage" (the Indian equivalent of ZIP codes), and in its effort to make every part of the country reachable by post, it has had to build post offices in many places that would never make financial sense.
Internet Revolution v1.0 -- E-commerce Deliveries:
Indian e-commerce started to pick up around 2007, driven largely by startups like Flipkart (bought by Walmart for $16B) and by Amazon's entry into India.
While these companies tried to promise 1-2 day delivery, the last mile was really hard, and the ability to deliver fast became a moat. Up to ~30% of shipments required a phone call with the customer to coordinate last-mile delivery (anecdotal evidence from my own work; I have no hard data).
Logistics startups in the ecosystem spent a lot of effort improving address mapping, and it still remained far from trivial.
We worked extensively on geocoding at the startup, though not always with great success. One of our bigger wins was pin code misclassification detection using Named Entity Recognition (NER): being able to flag, before last-mile delivery begins, that an address is mapped to the wrong pin code. As you might expect, even this involved a significant amount of human-in-the-loop training.
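The core idea behind flagging a wrong pin code can be sketched without any ML. The snippet below swaps NER for plain substring matching against a small locality-to-PIN lookup table: find the localities mentioned in the address, and flag the order if the entered PIN belongs to none of them. The gazetteer entries, locality names and PINs are all made up for illustration, and `check_pin` / `PinCheck` are hypothetical names; a real system would need an authoritative, much larger gazetteer and a proper entity extractor to cope with spelling variants.

```python
from dataclasses import dataclass

# Hypothetical gazetteer: locality name -> PIN codes it falls under.
# Every entry here is invented for illustration only.
GAZETTEER: dict[str, set[str]] = {
    "sunrise colony": {"560101"},
    "lake view layout": {"560102", "560103"},
    "station road": {"560101", "560104"},
}


@dataclass
class PinCheck:
    entered_pin: str
    localities: list[str]
    expected_pins: set[str]

    @property
    def suspicious(self) -> bool:
        # Flag only when we recognised at least one locality and the entered
        # PIN matches none of them; with no locality we have no evidence.
        return bool(self.localities) and self.entered_pin not in self.expected_pins


def check_pin(address: str, entered_pin: str) -> PinCheck:
    text = address.lower()
    # Naive substring match stands in for the NER step described above.
    found = [name for name in GAZETTEER if name in text]
    # Union is deliberately lenient: any recognised locality can vouch for the PIN.
    expected: set[str] = set().union(*(GAZETTEER[name] for name in found))
    return PinCheck(entered_pin, found, expected)


result = check_pin("12, 3rd Main, Lake View Layout, near water tank", "560101")
print(result.localities, result.suspicious)  # ['lake view layout'] True
```

Flagged orders would then go to a human reviewer, which is where the human-in-the-loop effort comes in.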
Internet Revolution v2.0 -- Hyperlocal & On-demand Services -- Onset of GPS:
As in other countries, India's on-demand economy boom has been real. I'd go further and say it has been crazier than in other markets (for some region-specific economic reasons).
As you might expect, ride-hailing (Uber, Ola) is essentially a staple in most major cities. But there's more:
- Bike taxis (motorcycles): Rapido picks you up and drops you off on a motorcycle.
- P2P deliveries: You can get a forgotten key delivered from your house to your office (~5 miles away) in 20 minutes for $1.
- Hyperfast commerce: You can order $3-4 worth of groceries to your home in 10 minutes for almost no delivery fee.
The market leaders already do 3-4M+ deliveries every day (possibly more). Revenue per order is lower, but the scale of operations is phenomenal. GPS was the real unlock in this segment.
With accurate GPS locations, the delivery partner or cab would at least get within a 100-metre radius of your house and then call you! Here's a screenshot of what adding an address in Swiggy looks like (just look at the amount of information they collect to tackle the problem: GPS + landmark + voice instructions + text instructions):
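The "get within 100 metres, then call" step boils down to a distance check between two GPS fixes. Below is a minimal sketch using the standard haversine great-circle formula with a mean Earth radius of 6,371 km; the function names, the 100 m default and the sample coordinates are illustrative assumptions, not any particular company's logic. Real GPS fixes in dense Indian neighbourhoods can be off by tens of metres, so the threshold would need tuning.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(rider: tuple[float, float], customer: tuple[float, float],
                  radius_m: float = 100.0) -> bool:
    """True once the rider's GPS fix is within radius_m of the customer's pin."""
    return haversine_m(*rider, *customer) <= radius_m


customer = (12.9716, 77.5946)  # hypothetical customer location
print(within_radius((12.9721, 77.5946), customer))  # ~56 m away -> True
print(within_radius((12.9726, 77.5946), customer))  # ~111 m away -> False
```

This is why GPS helped so much: the rider no longer needs to parse the address at all until the last few dozen metres.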
We always pushed e-commerce companies to mandate, or at least ask for, GPS location in their mobile apps, but we never really saw it happen, even though they could easily have done it and saved a lot of operational hassle and cost.
I'm confident that this problem will be largely solved for metropolitan areas within the next decade, for two reasons:
(a) Tech innovation -- Like our project at Shadowfax, internal ML teams at other large tech-driven logistics companies have put real effort into similar projects. I also came across a startup trying to solve this problem (honestly, that was the trigger for this write-up).
(b) Standardisation -- There are many ongoing projects to standardise address nomenclature, both government-driven and led by private companies or non-profits. These are really hard to execute and operationalise at India's scale. There have been plenty of hard failures in this space, but I'm hopeful some of these projects scale up. In case you're interested, this is an Indian govt project and this was a study of some private efforts (although I'm honestly skeptical of initiatives the government doesn't enforce, since the incentive-alignment problem here is really hard).
I have many more thoughts on the unique lessons from building, working at, and investing in startups in India that I'd love to share. If you enjoyed reading this, do let me know!