Published on

What is Geohash

Table of Contents

Geohash algorithm is an encoding system that encodes geographical location (latitude, longitude) into a short string of letters and digits and this short string is called a Geohash.

Geohashing guarantees that the longer the prefix, shared by the two geohashes, means they are spatially closer.

For example:
Let’s say we have these 3 points:

LocationLat, Lng
Punjabi University Outside30.356644, 76.455333
Punjabi University Inside30.359125, 76.452885
Patiala30.330939, 76.413182
alt text

Their Geohashes are:

LocationGeohash
Punjabi University Outside11001110011011010010011011111000
Punjabi University Inside11001110011011010010011011110011
Patiala11001110011011010010011000101101

As you can see Point 1 and 2 are relatively closer and end up having a longer shared prefix whereas Point 3 which is a few kilometers away has a relatively smaller shared prefix.

Geohash is a string of letters and digits but in the above example geohashes only contained zeroes’ and one’s that is because this bit string is further encoded using base32 encoding to get the final output. But for the purpose of this post, we’ll be using the bit string as it is sufficient.

Use Case of Geohash

Geohash is used to optimize Location-based services.

Let’s say you need to find all the restaurants near you. You can use Geohash to efficiently get those locations.

For example:

You can store the geohashes of the locations in your database and create an index on it and then you can issue a simple LIKE predicate: LIKE ‘1100111%’ to get all the locations that share the same prefix.

The length of the prefix you specify in the LIKE predicate determines the precision of the search. The longer the prefix the smaller the search radius would be. (Radius is not technically the correct term as you’ll see, because we’ll be dealing with rectangular grids.)

How does it work?

Subdivision

Geohash works by dividing the planet into four quadrants and then assigning each quadrant a unique ID. This unique ID will be the geohash for all the lat-lng values which fall into that quadrant.

Geohash with depth 1

As you can see in the image the planet is divided into four quadrants along the equator and prime meridian. Now if you are given a geohash that begins with 01 you can easily tell in which part of the world this geohash falls but the precision is very low as it could be anywhere in the huge quadrant.

Now we’ll subdivide each quadrant recursively into smaller grids till we reach a sufficiently small size. An important point here to note is that each child grid’s ID will start with its parent’s ID thus enforcing our shared prefix property and this is why geohashes that have longer shared prefixes are spatially closer.

Geohash with depth 2

ID Generation

The Ids assigned to each quadrant are not random but they have meaning.

Ids are formed by checking each quadrant’s longitude and latitude range and then assigning appropriate bits for each respectively. The first bit represents the longitude value and the second bit represents the latitude value. The bits you choose depend upon the lat-lng range which you can view in the table below.

RangeBit Value
Longitude range [-180, 0] is represented by0
Longitude range [0, 180] is represented by1
Latitude range [0, 90] is represented by1
Latitude range [-90, 0] is represented by0

Interactive example

Click anywhere in the window to highlight nearby points. You can also increase the search depth which would shorten the search radius.

Geohash algorithm implementation

Geohash is a string of letters and digits but in the above example geohashes only contained zeroes’ and one’s that is because this bit string is further encoded using base32 encoding to get the final output. But for the purpose of this post, we’ll be using the bit string as it is sufficient.

Geohash.cs
internal static string Encode(double lat, double lng, int depth)
{
    double minLat = -90;
    double maxLat = 90;
    double minLng = -180;
    double maxLng = 180;
    var hash = new int[2 * depth];

    for (var i = 0; i < 2 * depth; i++)
    {
        if (i % 2 == 0)
        {
            var mid = (minLng + maxLng) / 2;
            if (lng > mid)
            {
                hash[i] = 1;
                minLng = mid;
            }
            else
            {
                hash[i] = 0;
                maxLng = mid;
            }
        }
        else
        {
            var mid = (maxLat + minLat) / 2;
            if (lat > mid)
            {
                hash[i] = 1;
                minLat = mid;
            }
            else
            {
                hash[i] = 0;
                maxLat = mid;
            }
        }
    }

    return string.Join("", hash);
}

In the Encode method, we have taken the lat and lng arguments which we want to encode as geohash and a depth argument that will determine the recursive depth.

Each iteration, we are either calculating bit value for lat or lng depending upon the index of the iteration.

Drawbacks

When two points are closer but are on different sides of an edge then they will have different geohashes thus our nearby search will fail to retrieve those on the other side. We can reduce the precision of the search to include those but then there will be too many points included in the final result.