Madhuka: Zeppelin Note for Loading and Analyzing Data

The previous post [1] was an introduction to the Zeppelin notebook. Here we take a more detailed look at how a researcher can use it. With the shell interpreter we can download or retrieve data sets and files from a remote server or the internet. We then use Scala in Spark to map that data onto a class, and use SQL to explore it. You can analyze data very quickly, because Zeppelin supports dynamic forms in its display.

1. Loading data into Zeppelin from a local CSV file

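// sc (SparkContext) and sqlContext are created by Zeppelin
val bankText = sc.textFile("/home/max/zeppelin/zp1/bank.csv")

case class Bank(age:Integer,job:String,marital:String,education:String,balance:Integer)

// Split each line on ';', skip the header row ("age";"job";...)
// and strip the double quotes around string fields
val bank = bankText.map(s=>s.split(";")).filter(s=>s(0)!="\"age\"").map(
 s=>Bank(s(0).toInt,
  s(1).replaceAll("\"",""),
  s(2).replaceAll("\"",""),
  s(3).replaceAll("\"",""),
  s(5).replaceAll("\"","").toInt   // field 5 is balance
 )
)

// Spark 1.3+: convert the RDD to a DataFrame and register it as a temporary table
// (older SchemaRDD-based releases used bank.registerAsTable("bank"))
import sqlContext.implicits._
bank.toDF().registerTempTable("bank")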

Here we create an RDD of Bank objects and register it as a table called ‘bank’.
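The parsing inside the map above can be tried outside Spark. The following is a minimal plain-Scala sketch, not the notebook code itself: it assumes the bank.csv layout of semicolon-separated fields with quoted strings and a quoted header row, and the sample lines and the BankParser object are hypothetical, made up for illustration. It uses Int instead of Integer because no Spark schema is involved.

```scala
// Plain-Scala sketch of the per-line parsing done inside the Spark map.
// Assumption: rows look like bank.csv (';'-separated, quoted strings,
// field 5 = balance). Sample lines below are illustrative only.
case class Bank(age: Int, job: String, marital: String, education: String, balance: Int)

object BankParser {
  // Remove the double quotes that wrap string fields.
  private def unquote(s: String): String = s.replaceAll("\"", "")

  // True for the header row, whose first field is the quoted word "age".
  def isHeader(fields: Array[String]): Boolean = unquote(fields(0)) == "age"

  // Map one already-split row onto a Bank record.
  def toBank(fields: Array[String]): Bank =
    Bank(
      unquote(fields(0)).toInt,
      unquote(fields(1)),
      unquote(fields(2)),
      unquote(fields(3)),
      unquote(fields(5)).toInt
    )

  // Same pipeline as the notebook: split on ';', drop the header, build Banks.
  def parse(lines: Seq[String]): Seq[Bank] =
    lines.map(_.split(";")).filterNot(isHeader).map(toBank)

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input in the assumed bank.csv format.
    val sample = Seq(
      "\"age\";\"job\";\"marital\";\"education\";\"default\";\"balance\"",
      "30;\"unemployed\";\"married\";\"primary\";\"no\";1787",
      "33;\"services\";\"married\";\"secondary\";\"no\";4789"
    )
    parse(sample).foreach(println)
  }
}
```

Keeping the header check in its own function shows why it matters: without it, the header's first field reaches toInt as "age" and throws a NumberFormatException.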

Note: Case classes in Scala 2.10 can support only up to 22 fields.

2. Next, run SQL against the newly created table

%sql select count(*) as count from bank

This gives the total number of records in the table.

%sql
select age, count(1) value
from bank
where age < 30
group by age
order by age

The chart supports tooltips that show each age and the record count for that age.
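To make the grouping query concrete, here is a rough plain-Scala equivalent over an in-memory collection. It is only a sketch of what the SQL computes, not how Spark executes it; AgeHistogram, countByAge, the maxAge parameter and the sample records are all hypothetical.

```scala
// Plain-Scala sketch of:
//   select age, count(1) value from bank where age < 30 group by age order by age
// It runs over an in-memory Seq instead of the Spark 'bank' table.
case class Bank(age: Int, job: String, marital: String, education: String, balance: Int)

object AgeHistogram {
  // Returns (age, count) pairs for ages below maxAge, sorted by age.
  def countByAge(rows: Seq[Bank], maxAge: Int = 30): Seq[(Int, Int)] =
    rows
      .filter(_.age < maxAge)                         // where age < 30
      .groupBy(_.age)                                 // group by age
      .map { case (age, group) => (age, group.size) } // count(1)
      .toSeq
      .sortBy(_._1)                                   // order by age

  def main(args: Array[String]): Unit = {
    // Hypothetical records for illustration only.
    val rows = Seq(
      Bank(25, "admin.", "single", "secondary", 100),
      Bank(25, "technician", "single", "tertiary", 250),
      Bank(28, "services", "married", "secondary", 0),
      Bank(41, "management", "married", "tertiary", 900)
    )
    countByAge(rows).foreach { case (age, n) => println(s"$age\t$n") }
  }
}
```

Making the age limit a parameter loosely mirrors the way a Zeppelin dynamic form lets you change a value in the query without editing the query text.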

[1] http://madhukaudantha.blogspot.com/2015/04/zeppelin-notebook.html

Thursday, April 23, 2015
