# Bokeh Maps

For some time now I have been searching for a correct shapefile for India. There are some shapefiles at Global Administrative Areas, but they are outdated and they don’t correctly represent Kashmir. While cleaning my desktop I came across some shapefiles that seem to be up to date and correct; unfortunately I don’t know their source. If you know the source of these shapefiles, please do let me know and I will duly credit the original author.

Shapefile for India with State Boundaries

With an updated shapefile and a recent interest in exploring Python for data analysis, I thought of giving Bokeh with Python a shot. I really like where Bokeh is going: it’s pretty powerful, albeit not so accessible just yet. But looking at the roadmap, I believe more user-friendly and intuitive interfaces (like ggplot) will be coming soon.

Some things I really like about Bokeh are:

• Pretty clean integration with IPython Notebook.
• Standalone HTML files without the need for any external dependencies.
• Interactive controls with pan, zoom, and hover capabilities.
• Pretty good coverage of visualization capabilities (more coming).

Considering it’s still under heavy development, I am very positive about the future of Bokeh. Below is a small snippet that shows just how easy it is to create pretty interactive plots. Note that the majority of the code below deals with reading and massaging plot data from the shapefile; the actual plot needs fewer than 10 lines of code.

Note: I have moved the plot to an external page due to its size. The HTML file is fairly large for a slow connection (~6.5MB), so please give it some time to load. The embedded plot has been replaced with a static image.

Launch Interactive Plot

import pandas as pd
import numpy as np
import shapefile

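# Open the shapefile with pyshp.
# NOTE: the path below is a placeholder, not the original file name;
# point it at your own copy of the shapefile.
dat = shapefile.Reader("india_states")

# Create a unique set of states (administrative regions)
states = set(i[2] for i in dat.iterRecords())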

# Assign colors to Indian States. The three colors from Indian Flag.
colors = { 'Andaman & Nicobar': "#138808" , 'Andhra Pradesh'      : "#138808",
'Arunachal Pradesh': "#FF9933" , 'Assam'               : "#FF9933",
'Bihar'            : "#FF9933" , 'Chandigarh'          : "#FF9933",
'Chhattisgarh'     : "#FFFEFF" , 'Dadra & Nagar Haveli': "#138808",
'Daman'            : "#FFFEFF" , 'Daman & Diu'         : "#FFFEFF",
'Delhi'            : "#FF9933" , 'Diu'                 : "#FFFEFF",
'Goa'              : "#138808" , 'Gujarat'             : "#FFFEFF",
'Haryana'          : "#FF9933" , 'Himachal Pradesh'    : "#FF9933",
'Jammu & Kashmir'  : "#FF9933" , 'Jharkhand'           : "#FFFEFF",
'Karnataka'        : "#138808" , 'Kerala'              : "#138808",
'Maharashtra'      : "#138808" , 'Manipur'             : "#FFFEFF",
'Meghalaya'        : "#FF9933" , 'Mizoram'             : "#FFFEFF",
'Nagaland'         : "#FFFEFF" , 'Orissa'              : "#FFFEFF",
'Pondicherry'      : "#138808" , 'Punjab'              : "#FF9933",
'Rajasthan'        : "#FF9933" , 'Sikkim'              : "#FF9933",
'Tamil Nadu'       : "#138808" , 'Tripura'             : "#FFFEFF",
'Uttar Pradesh'    : "#FF9933" , 'Uttaranchal'         : "#FF9933",
'West Bengal'      : "#FFFEFF" }

# Create the Plot

from bokeh.plotting import *
output_file("india_states.html")

hold()

TOOLS="pan,wheel_zoom,box_zoom,reset,previewsave"
figure(title="Map of India", tools=TOOLS, plot_width=900, plot_height=800)

for state_name in states:
    data = getDict(state_name, dat)
    patches(data[state_name]['lat_list'], data[state_name]['lng_list'],
            fill_color=colors[state_name], line_color="black")

show()
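The listing above uses the early Bokeh interface (`hold()` and module-level `figure`/`patches` calls), which newer Bokeh releases no longer provide. As a rough sketch of how the same plot might look today, assuming Bokeh 3.x: `patch_columns`, `draw_map`, and the `state_data` argument are names made up for this illustration, and `state_data` is assumed to map each state name to a dict holding the `lat_list` and `lng_list` keys used in this post.

```python
def patch_columns(state_data, colors, default="#CCCCCC"):
    """Flatten per-state rings into the parallel lists that patches() expects."""
    xs, ys, fill = [], [], []
    for name, data in state_data.items():
        # One entry per ring; every ring of a state shares that state's color.
        for ring_x, ring_y in zip(data["lat_list"], data["lng_list"]):
            xs.append(ring_x)
            ys.append(ring_y)
            fill.append(colors.get(name, default))
    return xs, ys, fill


def draw_map(state_data, colors, filename="india_states.html"):
    """Draw all states with a single patches() call (assumes Bokeh 3.x)."""
    # Imported here so patch_columns() can be used without Bokeh installed.
    from bokeh.models import ColumnDataSource
    from bokeh.plotting import figure, output_file, show

    xs, ys, fill = patch_columns(state_data, colors)
    source = ColumnDataSource(data=dict(xs=xs, ys=ys, fill=fill))

    output_file(filename)
    p = figure(title="Map of India", width=900, height=800,
               tools="pan,wheel_zoom,box_zoom,reset,save")
    p.patches("xs", "ys", fill_color="fill", line_color="black", source=source)
    show(p)


# Made-up data for one state with a single ring.
demo = {"Goa": {"lat_list": [[73.7, 74.3, 74.0]], "lng_list": [[15.0, 15.0, 15.8]]}}
xs, ys, fill = patch_columns(demo, {"Goa": "#138808"})
print(fill)  # ['#138808']
```

Calling `draw_map(state_data, colors)` would then write the HTML file and open it in a browser. Collecting everything into one data source means a single glyph call instead of one per state, which is a design choice of this sketch rather than something the original code does.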

The two functions below are used to extract and transform the shape data from the shapefile. To highlight the simplicity of Bokeh plots, I have moved them into a separate section.

# Given a shape object, return a list of point lists, one per part
#       - Handles shapes that are made up of multiple parts

def getParts ( shapeObj ):

    points = []

    num_parts = len( shapeObj.parts )
    # shapeObj.parts holds the start index of each part; use the total
    # point count as the end of the last part (slice ends are exclusive,
    # so len - 1 would drop the final point)
    end = len( shapeObj.points )
    segments = list( shapeObj.parts ) + [ end ]

    for i in range( num_parts ):
        points.append( shapeObj.points[ segments[i]:segments[i+1] ] )

    return points
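To see how the `parts` offsets carve up the flat `points` list, here is a small sketch that uses a stand-in object rather than a real pyshp shape. `split_parts` and `demo` are names invented for this illustration, and the coordinates are made up.

```python
from types import SimpleNamespace


def split_parts(shape):
    """Split a shape's flat point list into one list of points per part."""
    # shape.parts holds the start index of each part; append the total
    # point count so that every part has an explicit (exclusive) end index.
    bounds = list(shape.parts) + [len(shape.points)]
    return [shape.points[start:stop] for start, stop in zip(bounds, bounds[1:])]


# Hypothetical stand-in for a shape: two rings stored back to back.
demo = SimpleNamespace(
    parts=[0, 4],
    points=[(0, 0), (1, 0), (1, 1), (0, 0),   # first ring
            (5, 5), (6, 5), (5, 5)],          # second ring
)

rings = split_parts(demo)
print(len(rings))  # 2
print(rings[1])    # [(5, 5), (6, 5), (5, 5)]
```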

# Return a dict keyed by state_name whose value has three elements
#        - total_area
#        - lat_list: list of lists of first (x) coordinates
#        - lng_list: list of lists of second (y) coordinates
#
#  Input: State Name & ShapeFile Object

def getDict ( state_name, sf ):
    # The parameter is named sf so it does not shadow the shapefile module

    stateDict = {state_name: {} }

    rec = []
    shp = []
    points = []

    # Select only the records representing the
    # "state_name" and discard all other
    for i in sf.shapeRecords( ):

        if i.record[2] == state_name:
            rec.append(i.record)
            shp.append(i.shape)

    # In a multi record state for calculating total area
    # sum up the area of all the individual records
    #        - first record element represents area in cms^2
    total_area = sum( [float(i[0]) for i in rec] ) / (1000*1000)

    # For each selected shape object get
    # list of points while considering the cases where there may be
    # multiple parts in a single record
    for j in shp:
        for i in getParts(j):
            points.append(i)

    # Prepare the dictionary
    # Separate the points into two lists of lists (easier for bokeh to consume)
    # NOTE: pyshp returns points as (x, y), which for geographic data is
    # normally (longitude, latitude). So 'lat_list' ends up on the x axis and
    # 'lng_list' on the y axis; the key names are kept because the plotting
    # code looks them up.

    lat = []
    lng = []
    for i in points:
        lat.append( [j[0] for j in i] )
        lng.append( [j[1] for j in i] )

    stateDict[state_name]['lat_list'] = lat
    stateDict[state_name]['lng_list'] = lng
    stateDict[state_name]['total_area'] = total_area

    return stateDict
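`getDict` walks through every record in the shapefile each time it is called, so the plotting loop rescans the file once per state. That is fine for a few dozen states, but one alternative is to group everything in a single pass. Below is a minimal sketch on made-up data; `group_by_state` and the `(state_name, rings)` input shape are assumptions for illustration, not part of pyshp or of the code above.

```python
from collections import defaultdict


def group_by_state(records):
    """Group ring coordinates by state name in one pass.

    `records` is an iterable of (state_name, rings) pairs, where each ring
    is a list of (x, y) points.
    """
    grouped = defaultdict(lambda: {"xs": [], "ys": []})
    for state_name, rings in records:
        for ring in rings:
            grouped[state_name]["xs"].append([x for x, _ in ring])
            grouped[state_name]["ys"].append([y for _, y in ring])
    return dict(grouped)


# Made-up records: "Goa" appears twice, as a multi-record state would.
records = [
    ("Goa", [[(73.7, 15.0), (74.3, 15.0), (74.0, 15.8)]]),
    ("Sikkim", [[(88.0, 27.1), (88.9, 27.1), (88.5, 28.1)]]),
    ("Goa", [[(73.9, 15.3), (74.0, 15.3), (73.9, 15.4)]]),
]

by_state = group_by_state(records)
print(sorted(by_state))            # ['Goa', 'Sikkim']
print(len(by_state["Goa"]["xs"]))  # 2
```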