Madhuka: Building Apache Zeppelin

Apache Zeppelin (incubating) is a collaborative data analytics and visualization tool for Apache Spark and Apache Flink. It is a web-based tool that lets data scientists collaborate on large-scale data exploration. Zeppelin is independent of the execution framework, but it comes with full Apache Spark support built in, so I will try a sample with Spark itself. The Zeppelin interpreter concept allows any language or data-processing backend to be plugged into Zeppelin (Scala with Apache Spark, SparkSQL, Markdown and Shell).
Let’s build from the source.
1. Get the repo onto your machine.
git clone https://github.com/apache/incubator-zeppelin.git
2. Build the code
mvn clean package
You can use '-DskipTests' to skip the tests during the build.
[note]
For cluster mode
mvn install -DskipTests -Dspark.version=1.1.0 -Dhadoop.version=2.2.0
Change spark.version and hadoop.version to match the versions running on your cluster.
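If you build against more than one cluster, it can help to keep the two versions in variables. The following is only a sketch: the variable names SPARK_VERSION, HADOOP_VERSION and BUILD_CMD are my own, and the defaults are simply the versions used in the command above.

```shell
#!/bin/sh
# Hypothetical variable names; defaults are the versions from the post.
SPARK_VERSION="${SPARK_VERSION:-1.1.0}"
HADOOP_VERSION="${HADOOP_VERSION:-2.2.0}"

# Assemble the cluster-mode build command shown in the note.
BUILD_CMD="mvn install -DskipTests -Dspark.version=${SPARK_VERSION} -Dhadoop.version=${HADOOP_VERSION}"

# Print the command so it can be checked before running it.
echo "$BUILD_CMD"

# To actually run the build, execute this from the repository root:
# $BUILD_CMD
```

Setting SPARK_VERSION or HADOOP_VERSION in the environment before running the script overrides the defaults.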

3. Add jars and files
The spark.jars and spark.files properties in ZEPPELIN_JAVA_OPTS add jars and files to the SparkContext.
ZEPPELIN_JAVA_OPTS="-Dspark.jars=/mylib1.jar,/mylib2.jar -Dspark.files=/myfile1.dat,/myfile2.dat"
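Both properties take a comma-separated list, so with many jars the value is easier to assemble than to type by hand. Here is a minimal sketch, assuming the variable is exported in the environment before the daemon starts. The helper join_by_comma is hypothetical (not part of Zeppelin), and the paths are the placeholder paths from the line above.

```shell
#!/bin/sh
# Hypothetical helper: join all arguments into one comma-separated string.
# The subshell keeps the IFS change from leaking into the rest of the script.
join_by_comma() {
  ( IFS=,; echo "$*" )
}

# Placeholder paths taken from the example in the post.
JARS=$(join_by_comma /mylib1.jar /mylib2.jar)
FILES=$(join_by_comma /myfile1.dat /myfile2.dat)

# Build the same value as the hand-written line.
export ZEPPELIN_JAVA_OPTS="-Dspark.jars=${JARS} -Dspark.files=${FILES}"
echo "$ZEPPELIN_JAVA_OPTS"
```

The result is the same string as the hand-written version, so either form should work.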
4. Start Zeppelin
bin/zeppelin-daemon.sh start
In the console you will see ‘Zeppelin start’ printed. Then go to http://localhost:8080/
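The web UI may take a moment to come up after the daemon reports that it has started. A small polling function can wait for it; this is a sketch only, wait_for_url is a hypothetical helper of mine, and it assumes curl is installed.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until it answers successfully.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 once the URL responds, 1 if every attempt fails.
wait_for_url() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP errors such as 404 or 500.
    if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example call, after starting the daemon:
# wait_for_url http://localhost:8080/ && echo "Zeppelin is up"
```

The defaults of 30 attempts and a 2 second delay are arbitrary; adjust them to how long your machine needs.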

Go to NoteBook –> Tutorial
There you can see charts and graphs built from queries, and you can pick the chart attributes by drag and drop.

Madhuka

Thursday, March 19, 2015
