Thursday, August 20, 2015

Tutorial with Map Visualization in Apache Zeppelin

Zeppelin uses Leaflet, an open source, mobile-friendly interactive map library.

Before starting the tutorial you will need a dataset with geographical information. The dataset should contain location coordinates, that is, longitude and latitude. An online CSV file is used for the next steps; a sample dataset is shared as a gist.

import org.apache.commons.io.IOUtils
import java.net.URL
import java.nio.charset.Charset

// Load the map data: download the CSV and split it into one string per line
val myMapText = sc.parallelize(
  IOUtils.toString(
    new URL("https://gist.githubusercontent.com/Madhuka/74cb9a6577c87aa7d2fd/raw/2f758d33d28ddc01c162293ad45dc16be2806a6b/data.csv"),
    Charset.forName("utf8")).split("\n"))

Refine Data


Next, to transform the data from CSV format into an RDD of Map objects, run the following script. The filter function removes the CSV header row.


 


// Note: `lan` presumably holds the longitude; the field name is kept as in the dataset.
case class Map(Country: String, Name: String, lat: Float, lan: Float, Altitude: Float)

// Split each line on commas, drop the header row, and build one Map per record
val myMap = myMapText.map(s => s.split(",")).filter(s => s(0) != "Country").map(
  s => Map(s(0),
    s(1),
    s(2).toFloat,
    s(3).toFloat,
    s(4).toFloat
  )
)

// Below line works only in spark 1.3.0.
// For spark 1.1.x and spark 1.2.x,
// use myMap.registerTempTable("myMap") instead.
myMap.toDF().registerTempTable("myMap")

Data Retrieval and Data Validation


Here is how the dataset looks when viewed as a table:


[Image: map_dataset]



The dataset can be validated by calling `dataValidatorSrv`. It validates longitude and latitude. If any record is out of range, it reports the recordId and the record value with a meaningful error message.



var msg = dataValidatorSrv.validateMapData(data);
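
The post does not show what `dataValidatorSrv` does internally. As a rough sketch only, a range check of this kind could look like the plain JavaScript below. The function name `validateCoordinates`, the record fields `lat` and `lan`, and the message wording are assumptions made for illustration (the field names mirror the case class used earlier); this is not Zeppelin's actual implementation.

```javascript
// Hypothetical helper, not part of Zeppelin: checks that each record's
// coordinates fall inside the valid geographic ranges.
// Assumes records look like { lat: Number, lan: Number }, with `lan` as longitude.
function validateCoordinates(records) {
  var errors = [];
  records.forEach(function (record, recordId) {
    // Latitude must lie in [-90, 90]; the negated test also rejects NaN and undefined.
    if (!(record.lat >= -90 && record.lat <= 90)) {
      errors.push('Record ' + recordId + ': latitude ' + record.lat +
        ' is out of range [-90, 90]');
    }
    // Longitude must lie in [-180, 180].
    if (!(record.lan >= -180 && record.lan <= 180)) {
      errors.push('Record ' + recordId + ': longitude ' + record.lan +
        ' is out of range [-180, 180]');
    }
  });
  // An empty array means every record passed both checks.
  return errors;
}
```

Returning a list of messages rather than stopping at the first failure lets every bad record be reported in one pass, which matches the idea of pointing out each offending recordId and value.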



Now the data distribution can be viewed on a geographical map, as below.


%sql
select * from myMap
where Country = "${Country=United States}"

%sql
select * from myMap
where Altitude > ${Altitude=300}

[Image: maps]
