Thursday, March 19, 2015

Building Apache Zeppelin

Apache Zeppelin (incubating) is a collaborative data analytics and visualization tool for Apache Spark and Apache Flink. It is a web-based tool that lets data scientists collaborate on large-scale data exploration. Zeppelin is independent of the execution framework: its interpreter mechanism allows any language or data-processing backend to be plugged in (for example Scala with Apache Spark, SparkSQL, Markdown and Shell). Zeppelin ships with full support for Apache Spark, so I will try the samples with Spark itself.
Let’s build it from source.
1. Clone the repository to your machine.
git clone https://github.com/apache/incubator-zeppelin.git
2. Build the code
mvn clean package
You can add '-DskipTests' to skip the tests during the build.
[note]
For cluster mode, build against the Spark and Hadoop versions your cluster runs:
mvn install -DskipTests -Dspark.version=1.1.0 -Dhadoop.version=2.2.0
Change spark.version and hadoop.version to match the versions on your cluster.
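The clone and cluster-mode build steps above can be combined into one script. This is a minimal sketch assuming a POSIX shell with git and Maven on the PATH; SPARK_VERSION, HADOOP_VERSION and DRY_RUN are hypothetical variable names chosen for this example, and by default the script only prints the Maven command instead of running it.

```shell
#!/bin/sh
# Sketch: clone and build Zeppelin for a cluster (commands taken from this post).
# SPARK_VERSION, HADOOP_VERSION and DRY_RUN are hypothetical names used only by
# this script; Maven itself only sees the -D properties on its command line.
SPARK_VERSION="${SPARK_VERSION:-1.1.0}"
HADOOP_VERSION="${HADOOP_VERSION:-2.2.0}"

# Keep the Maven arguments in the positional parameters so they are defined once.
set -- install -DskipTests "-Dspark.version=${SPARK_VERSION}" "-Dhadoop.version=${HADOOP_VERSION}"

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Default: only show the command that would run.
  echo "mvn $*"
else
  git clone https://github.com/apache/incubator-zeppelin.git &&
    cd incubator-zeppelin &&
    mvn "$@"
fi
```

Set SPARK_VERSION and HADOOP_VERSION to your cluster's versions, and DRY_RUN=0 to actually clone and build.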
3. Add jars and files
Setting the spark.jars and spark.files properties in ZEPPELIN_JAVA_OPTS adds jars and files to the SparkContext:
ZEPPELIN_JAVA_OPTS="-Dspark.jars=/mylib1.jar,/mylib2.jar -Dspark.files=/myfile1.dat,/myfile2.dat"
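Both properties take comma-separated lists, so with many jars it can help to build the value from separate lists. This is a minimal sketch assuming a POSIX shell; join_by_comma is a hypothetical helper written for this example, and the paths are the placeholder ones from the line above. One common place to keep such an export is conf/zeppelin-env.sh (copied from conf/zeppelin-env.sh.template), which Zeppelin's startup scripts read.

```shell
#!/bin/sh
# Sketch: assemble ZEPPELIN_JAVA_OPTS from separate lists of jars and files.
# join_by_comma is a hypothetical helper, not part of Zeppelin.

# Join all arguments with commas; the body runs in a subshell so the IFS
# change does not leak into the calling shell.
join_by_comma() (
  IFS=,
  printf '%s\n' "$*"
)

JARS=$(join_by_comma /mylib1.jar /mylib2.jar)
FILES=$(join_by_comma /myfile1.dat /myfile2.dat)

# Produces the same value as the one-line ZEPPELIN_JAVA_OPTS example.
export ZEPPELIN_JAVA_OPTS="-Dspark.jars=${JARS} -Dspark.files=${FILES}"
echo "$ZEPPELIN_JAVA_OPTS"
```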
4. Start Zeppelin
bin/zeppelin-daemon.sh start
The console will print ‘Zeppelin start’. Then go to http://localhost:8080/
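Because the daemon script starts the server in the background, the web UI may take a few seconds to respond after the script returns. This is a minimal sketch assuming curl is installed; wait_for_url is a hypothetical helper, not part of Zeppelin, that polls a URL once per second until the server answers or the attempts run out.

```shell
#!/bin/sh
# Sketch: poll a URL until the server answers or we give up.
# wait_for_url is a hypothetical helper, not part of Zeppelin; it needs curl.
# Usage: wait_for_url URL [TRIES]   (one attempt per second, default 30)
wait_for_url() {
  url=$1
  tries=${2:-30}
  i=1
  while :; do
    # Without -f, curl exits 0 once the server sends any HTTP response;
    # -m 5 caps each attempt at 5 seconds.
    if curl -s -m 5 -o /dev/null "$url"; then
      return 0
    fi
    if [ "$i" -ge "$tries" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep 1
  done
}
```

For example: `bin/zeppelin-daemon.sh start && wait_for_url http://localhost:8080/ && echo "Zeppelin UI is up"`.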
Go to Notebook -> Tutorial
There you can see charts and graphs produced by queries, and you can pick chart attributes by drag and drop.
