BigLoupe Documentation

BigLoupe is a web application that helps you manage the data in your Apache Hadoop cluster.

Installation

Download the latest version from Bitbucket and install a Java Development Kit (JDK) 1.6 or higher.

Unzip BigLoupe and launch it from your console (you can also use bigloupe.sh if you have no specific parameters):

java -jar bigloupe.jar

BigLoupe launches Jetty, a JSP/Servlet container, and deploys the Java web application (WAR) available in the web directory (bigloupe-war-${version}.war).

By default, BigLoupe starts on port 9090. You can change this port with the -port option:

java -jar bigloupe.jar -port 9090

To list the available options:

java -jar bigloupe.jar -help

The BigLoupe launcher deploys bigloupe-war in a working directory ($BIGLOUPE_HOME/work). You can modify libraries and configuration files in this directory; the launcher will never delete it.

Log files

BigLoupe uses Log4j for logging. You can edit log4j.properties in work/webapp/WEB-INF/classes to change the log level.
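For example, to switch the root logger to DEBUG (a minimal Log4j 1.x sketch; the appender names in the shipped log4j.properties may differ):

# work/webapp/WEB-INF/classes/log4j.properties
# Log DEBUG and above to the console (the appender name "console" is an assumption)
log4j.rootLogger=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p [%c] %m%n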

Hadoop Support

BigLoupe is built to natively support Cloudera CDH3u4 (the Hadoop platform from Cloudera).
If you want to use BigLoupe with another Hadoop distribution, replace hadoop-core-0.20.2-cdh3u4.jar in $BIGLOUPE_HOME/work/webapp/WEB-INF/lib with your own hadoop-core library.
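For example (a sketch; the name of the replacement jar depends on your distribution):

cd $BIGLOUPE_HOME/work/webapp/WEB-INF/lib
rm hadoop-core-0.20.2-cdh3u4.jar
cp /path/to/your/hadoop-core-x.y.z.jar .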

Add a new Hadoop cluster in BigLoupe

BigLoupe uses configuration files in $BIGLOUPE_HOME/work/webapp/hdfs/cluster to define which Hadoop clusters can be watched and indexed.
You need to modify this directory to add your own configuration.
Each subdirectory name must start with conf_$CLUSTER_NAME, where you replace $CLUSTER_NAME with the DNS address of your Hadoop cluster.
Under this directory you will find the three configuration files required by the Hadoop libraries, plus a user file (see the example layout after this list):
– core-site.xml : defines the NameNode hostname and port
– hdfs-site.xml : HDFS settings
– mapred-site.xml : defines the JobTracker hostname and port
– user.txt : defines the user identity used by BigLoupe (works only with simple security)
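For example, for a cluster reachable at mycluster.example.com (a hypothetical address), the layout would be:

$BIGLOUPE_HOME/work/webapp/hdfs/cluster/
  conf_mycluster.example.com/
    core-site.xml
    hdfs-site.xml
    mapred-site.xml
    user.txt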

Installation in another Servlet/JSP container

You can use another Servlet/JSP container instead of the Jetty embedded in BigLoupe.
BigLoupe has been tested with Apache Tomcat 6.x and 7.x.
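For example, to deploy on Tomcat you can copy the WAR into its webapps directory (a sketch; adjust ${version} and $CATALINA_HOME to your installation):

cp web/bigloupe-war-${version}.war $CATALINA_HOME/webapps/bigloupe.war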

Check your configuration

After the first installation you should see a screen like this:

[Screenshot: initial BigLoupe screen]

BigLoupe writes a file, $BIGLOUPE_HOME/work/webapp/bigloupe-server-url.properties, containing your URL.


Manage your jobs

BigLoupe integrates all features of LinkedIn's Azkaban scheduler. Some additional job types have been added: Map-Reduce jobs, Apache Sqoop jobs, and ElasticSearch indexing jobs.

Declare a new job

Each job must be declared under $BIGLOUPE_HOME/work/webapp/scheduler/jobs.
Jobs can be organized in subdirectories.
A job is a file with the .job extension containing properties in key=value format.
The available properties depend on the type parameter.
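For example, a minimal job declaration (a sketch; the file name and command are hypothetical):

# $BIGLOUPE_HOME/work/webapp/scheduler/jobs/hello.job
type=command
command=echo hello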

The generic job types and their associated properties are listed in the following tables.

Command job (type=command):
– type (required): the job type. Example: command
– command (required): the command to execute. Example: ls -lh
– command.n (optional): additional commands, run sequentially after command. Example: command.1=ls -lh
– working.dir (optional): the directory in which the command is invoked; the default working directory is the job's directory. Example: /home/ejk
– env.property (optional): an environment variable to set before running the command; property defines the name of the environment variable, so env.VAR_NAME=VALUE creates an environment variable $VAR_NAME and gives it the value VALUE.
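A command job combining these properties might look like this (a sketch; values are hypothetical):

# list-home.job
type=command
command=ls -lh
command.1=echo done
working.dir=/home/ejk
env.MY_VAR=hello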
Java process job (type=javaprocess):
– java.class (required): the class that contains the main function. Example: your.package.HelloWorld
– classpath (optional): a comma-delimited list of JAR files and directories to add to the classpath; if not set, all JARs in the working directory are added to the classpath. Example: commons-io.jar,helloworld.jar
– Xms (optional): the initial memory pool size to start the JVM with; the default is 64M. Example: 64M
– Xmx (optional): the maximum memory pool size; the default is 256M. Example: 256M
– main.args (optional): a comma-delimited list of arguments to pass to the Java main function. Example: arg1,arg2
– jvm.args (optional): arguments set for the JVM; this is not a list, the entire string is passed intact as a VM argument. Example: -Dmyprop=test -Dhello=world
– working.dir (optional): inherited from command jobs. Example: /home/ejk
– env.property (optional): inherited from command jobs. Example: env.MY_ENV_VARIABLE=testVariable
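A java process job might look like this (a sketch reusing the example values above):

# helloworld.job
type=javaprocess
java.class=your.package.HelloWorld
classpath=commons-io.jar,helloworld.jar
Xmx=256M
main.args=arg1,arg2
jvm.args=-Dmyprop=test -Dhello=world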
Map-Reduce job (type=map-reduce). Inherited from the java job: all java job properties can be used.
This job uses the main class org.apache.hadoop.util.RunJar.

– type (required): the job type. Example: map-reduce
– classpath.ext (optional): a comma-delimited list of directories to add to the distributed classpath. Uses the libjars option (from GenericOptionsParser) to upload the given JARs to the cluster and make them available on the classpath of each mapper/reducer instance. Example: lib,hadoop-client
– jar (required): the MapReduce JAR file to run. Example: lib/compactor-1.1.jar
– jar.args (required): arguments to pass to the main class of the MapReduce JAR file. Example: --input-format org.apache.avro.mapred.AvroInputFormat --avro-input-schema --output-format org.apache.avro.mapred.AvroOutputFormat --compress none --verbose --tmp-dir /user/karma/demo
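A Map-Reduce job might look like this (a sketch reusing the example values above; jar.args is shortened):

# compactor.job
type=map-reduce
jar=lib/compactor-1.1.jar
jar.args=--compress none --verbose --tmp-dir /user/karma/demo
classpath.ext=lib,hadoop-client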
Pig job (type=pig). Inherited from the map reduce job: all map reduce job properties can be used.
This job uses Pig's main class, org.apache.pig.Main.

– pig.script (optional): the Pig script to run; if not set, the job name is used to find jobname.pig. Example: pig-example.pig
– udf.import.list (optional): a comma-delimited list of UDF imports. Example: oink.,linkedin.udf.
– param.name (optional): used for parameter replacement, to pass parameters from your job into your Pig script; order is not guaranteed. See the Pig documentation for information on using Pig parameters in your scripts. Example: param.variable1=myvalue
– paramfile (optional): a comma-delimited list of files used for variable replacement in your Pig script; order is not guaranteed, and param.name takes precedence. Example: paramfile1,paramfile2
– hadoop.job.ugi (optional): the user name and group for Hadoop jobs. Example: hadoop,group
– classpath (optional): inherited from javaprocess jobs. Example: commons-io.jar,helloworld.jar
– Xms (optional): inherited from javaprocess jobs. Example: 64M
– Xmx (optional): inherited from javaprocess jobs. Example: 256M
– jvm.args (optional): inherited from javaprocess jobs. Example: -Dmyprop=test -Dhello=world
– working.dir (optional): inherited from command jobs. Example: /home/ejk
– env.property (optional): inherited from command jobs.
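A Pig job might look like this (a sketch; the script and parameter values are hypothetical):

# pig-example.job
type=pig
pig.script=pig-example.pig
udf.import.list=oink.,linkedin.udf.
param.variable1=myvalue
Xmx=512M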
Sqoop job (type=sqoop). Inherited from the java job: all map reduce job properties can be used.
This job uses the main class org.apache.sqoop.Sqoop.

– type (required): the job type. Example: sqoop
– usage (optional): the Sqoop mode; export to export from HDFS to an RDBMS, or import to import from an RDBMS to HDFS.
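A Sqoop job might look like this (a sketch; only the properties documented above are shown, connection arguments go through the inherited properties):

# export-users.job
type=sqoop
usage=export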
ElasticSearch indexing job (type=indexfile). Inherited from the pig job. This job calls a specific Pig script, index_generic_avro_file_with_elasticsearch.pig, to index an Avro file or directory.

– type (required): the job type. Example: indexfile
– fileToIndex (required): the file or directory to index. Example: /yourpath/yourfile
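An indexing job might look like this (a sketch reusing the example value above):

# index-demo.job
type=indexfile
fileToIndex=/yourpath/yourfile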
