Part-7 : Install Oozie

This article shows how to install Oozie on a Hadoop 2.8 single-node cluster. Oozie is a workflow scheduler system for managing Apache Hadoop jobs. I assume you have followed the previous articles on setting up a single-node Hadoop cluster, or that you already have a Hadoop server running.

Maven must also be installed in order to compile the Oozie source. You can refer to the instructions at the link below to install Maven.
http://hadooptutorials.info/2017/11/13/installing-apache-maven-on-ubuntu/

Download the Oozie 4.3 source tarball from the Apache site and save it to any directory. At the time of writing, the latest version is 4.3. If a newer version is available on the Apache site, you should get that one instead.

https://oozie.apache.org/

cd ~/Downloads
wget http://apache.mirrors.hoobly.com/oozie/4.3.0/oozie-4.3.0.tar.gz
tar -zxf oozie-4.3.0.tar.gz

Compile Oozie to create the binary distribution. Make sure Maven is installed before compiling.

cd oozie-4.3.0/bin
./mkdistro.sh -DskipTests
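The Maven check mentioned above can be scripted so the build stops early with a readable message. This is only a minimal sketch: the function names require_cmd and build_oozie are made up for illustration, and it assumes the extracted oozie-4.3.0 source tree sits in the current directory.

```shell
# Hypothetical helper: succeed only if the named command is on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing prerequisite: $1" >&2
    return 1
}

# Hypothetical wrapper: run the build only when Java and Maven are present.
# Assumes ./oozie-4.3.0 is the extracted source tree.
build_oozie() {
    require_cmd java && require_cmd mvn || return 1
    (cd oozie-4.3.0/bin && ./mkdistro.sh -DskipTests)
}

# Usage (from ~/Downloads):
#   build_oozie
```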

For further information on the various build options, see the link below:
https://oozie.apache.org/docs/4.3.0/DG_QuickStart.html#Building_Oozie

Once the build succeeds, the Oozie binary distribution is available in the target folder; in my case it is /root/Downloads/oozie-4.3.0/distro/target.
Extract it and move it to /usr/local so that it can be used as the Oozie home directory.

cd /root/Downloads/oozie-4.3.0/distro/target
tar -zxvf oozie-4.3.0-distro.tar.gz
cd oozie-4.3.0-distro
mkdir -p /usr/local/oozie
mv oozie-4.3.0 /usr/local/oozie
cd /usr/local/oozie/oozie-4.3.0
mkdir libext

Add the line below to your ~/.bashrc file. It sets an environment variable for the Oozie home directory.

export OOZIE_HOME=/usr/local/oozie/oozie-4.3.0

Reload environment variables

source ~/.bashrc
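If you script this step, it helps to avoid adding the same export line to ~/.bashrc more than once when the script is re-run. The sketch below shows one way to do that; append_once is a hypothetical helper name, not something provided by Oozie or Hadoop.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present.
append_once() {
    target_file=$1
    new_line=$2
    touch "$target_file"
    # -x matches whole lines, -F treats the text literally, -q stays silent.
    if ! grep -qxF -- "$new_line" "$target_file"; then
        printf '%s\n' "$new_line" >> "$target_file"
    fi
}

# Usage:
#   append_once ~/.bashrc 'export OOZIE_HOME=/usr/local/oozie/oozie-4.3.0'
```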

Copy the jar files below from the Hadoop home directory to the $OOZIE_HOME/libext folder. This step is necessary to avoid errors.

cp $HADOOP_HOME/share/hadoop/common/*.jar $OOZIE_HOME/libext/
cp $HADOOP_HOME/share/hadoop/common/lib/*.jar $OOZIE_HOME/libext/
cp $HADOOP_HOME/share/hadoop/mapreduce/*.jar $OOZIE_HOME/libext/
cp $HADOOP_HOME/share/hadoop/mapreduce/lib/*.jar $OOZIE_HOME/libext/
cp $HADOOP_HOME/share/hadoop/hdfs/*.jar $OOZIE_HOME/libext/
cp $HADOOP_HOME/share/hadoop/hdfs/lib/*.jar $OOZIE_HOME/libext/
cp $HADOOP_HOME/share/hadoop/yarn/*.jar $OOZIE_HOME/libext/
cp $HADOOP_HOME/share/hadoop/yarn/lib/*.jar $OOZIE_HOME/libext/
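The eight copy commands above follow one pattern, so they can also be written as a loop. The following is a sketch under the same directory layout; copy_hadoop_jars is a hypothetical function name, and the two arguments are the Hadoop home and the destination folder.

```shell
# Hypothetical helper: copy jars from the common, mapreduce, hdfs and yarn
# folders (and their lib subfolders) into a destination directory.
copy_hadoop_jars() {
    hadoop_home=$1
    dest=$2
    mkdir -p "$dest"
    for component in common mapreduce hdfs yarn; do
        for dir in "$hadoop_home/share/hadoop/$component" \
                   "$hadoop_home/share/hadoop/$component/lib"; do
            for jar in "$dir"/*.jar; do
                # Skip the unexpanded pattern when a folder has no jars.
                [ -e "$jar" ] || continue
                cp "$jar" "$dest/"
            done
        done
    done
    return 0
}

# Usage:
#   copy_hadoop_jars "$HADOOP_HOME" "$OOZIE_HOME/libext"
```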

Note: Without copying these jar files from the Hadoop home directory to the Oozie libext folder, I was getting the error below.

Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/cli/ParseException
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
        at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
        at java.lang.Class.getMethod0(Class.java:3018)
        at java.lang.Class.getMethod(Class.java:1784)
        at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
        at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.cli.ParseException
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 7 more

After copying the Hadoop jars as described above, this error went away.
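The missing class in the trace, org.apache.commons.cli.ParseException, belongs to the Apache Commons CLI library, so a quick way to confirm the copy worked is to look for a commons-cli jar in libext. A small sketch follows; has_jar is a hypothetical helper, and the exact jar version in your Hadoop distribution may differ.

```shell
# Hypothetical helper: succeed if the directory contains a jar whose
# name starts with the given prefix followed by a version.
has_jar() {
    for jar in "$1"/"$2"-*.jar; do
        [ -e "$jar" ] && return 0
    done
    return 1
}

# Usage:
#   has_jar "$OOZIE_HOME/libext" commons-cli || echo "commons-cli jar not found in libext"
```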

Now create the WAR file needed to run the Oozie server.

./bin/oozie-setup.sh prepare-war

The WAR file should now be created successfully.

Add or update the lines below in $HADOOP_CONF_DIR/core-site.xml. This configuration lets the root user impersonate other users. If you run Oozie jobs as a different user, replace root with that user name.

  <property>
   <name>hadoop.proxyuser.root.hosts</name>
   <value>*</value>
  </property>
  <property>
   <name>hadoop.proxyuser.root.groups</name>
   <value>*</value>
  </property>
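If you run Oozie as a user other than root, the two property names change accordingly. The sketch below prints the matching snippet for any user name so it can be pasted into core-site.xml; proxyuser_props is a hypothetical helper written for this article, not a Hadoop tool.

```shell
# Hypothetical helper: print the two proxy-user properties for a given user.
proxyuser_props() {
    proxy_user=$1
    for suffix in hosts groups; do
        printf '  <property>\n'
        printf '   <name>hadoop.proxyuser.%s.%s</name>\n' "$proxy_user" "$suffix"
        printf '   <value>*</value>\n'
        printf '  </property>\n'
    done
}

# Usage:
#   proxyuser_props oozie
```

Hadoop has to re-read core-site.xml before the change takes effect. Restarting the Hadoop daemons should do it; running hdfs dfsadmin -refreshSuperUserGroupsConfiguration and yarn rmadmin -refreshSuperUserGroupsConfiguration is an alternative I would expect to work on a running cluster, though I did not need it on this setup.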

Start the Oozie server

cd $OOZIE_HOME
bin/oozied.sh start

Hopefully the Oozie server starts without any errors.
You can verify that it is running with the command below:

bin/oozie admin -oozie http://localhost:11000/oozie -status
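The server can take a little while to come up after oozied.sh start, so a status check run immediately may fail. A minimal retry loop is sketched below; wait_until is a hypothetical helper, and it assumes the oozie CLI exits with a non-zero status while the server is unreachable.

```shell
# Hypothetical helper: retry a command until it succeeds or the
# attempts run out. Arguments: attempts, delay in seconds, command...
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        # Sleep only if another attempt is coming.
        [ "$attempts" -gt 0 ] && sleep "$delay"
    done
    return 1
}

# Usage (from $OOZIE_HOME): try up to 30 times, 2 seconds apart.
#   wait_until 30 2 bin/oozie admin -oozie http://localhost:11000/oozie -status \
#       && echo "Oozie is up"
```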

Or try accessing http://localhost:11000

On accessing http://localhost:11000/oozie/, I got the error below.

HTTP Status 500 - java.lang.NullPointerException
type Exception report
message java.lang.NullPointerException
description The server encountered an internal error that prevented it from fulfilling this request.
exception
org.apache.jasper.JasperException: java.lang.NullPointerException
    org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:542)
    org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:370)
    org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:321)
    org.apache.jasper.servlet.JspServlet.service(JspServlet.java:267)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:723)
    org.apache.oozie.servlet.AuthFilter$2.doFilter(AuthFilter.java:171)
    org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:636)
    org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:588)
    org.apache.oozie.servlet.AuthFilter.doFilter(AuthFilter.java:176)
    org.apache.oozie.servlet.HostnameFilter.doFilter(HostnameFilter.java:86)
root cause

java.lang.NullPointerException
    org.apache.jsp.index_jsp._jspInit(index_jsp.java:25)
    org.apache.jasper.runtime.HttpJspBase.init(HttpJspBase.java:52)
    org.apache.jasper.servlet.JspServletWrapper.getServlet(JspServletWrapper.java:164)
    org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:340)
    org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:321)
    org.apache.jasper.servlet.JspServlet.service(JspServlet.java:267)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:723)
    org.apache.oozie.servlet.AuthFilter$2.doFilter(AuthFilter.java:171)
    org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:636)
    org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:588)
    org.apache.oozie.servlet.AuthFilter.doFilter(AuthFilter.java:176)
    org.apache.oozie.servlet.HostnameFilter.doFilter(HostnameFilter.java:86)

To fix this error, delete or rename the JAR file below in the WEB-INF/lib folder.

cd $OOZIE_HOME/oozie-server/webapps/oozie/WEB-INF/lib
mv jsp-api-2.1.jar jsp-api-2.1.xxjar

Now restart the Oozie server.

cd $OOZIE_HOME
bin/oozied.sh stop

The command above may fail with an error saying it is unable to remove the pid file. In that case, remove the file manually and start the server again with the commands below.

rm oozie-server/temp/*.pid
bin/oozied.sh start
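Before deleting a pid file by hand, it is worth confirming that the process it names is really gone, otherwise a second server may be started on top of a running one. Here is a sketch of that check; remove_stale_pid is a hypothetical helper, and it should be run as the same user that started Oozie, because kill -0 also fails for processes owned by someone else.

```shell
# Hypothetical helper: delete a pid file only when the process it
# names is no longer running.
remove_stale_pid() {
    pid_file=$1
    [ -f "$pid_file" ] || return 0
    old_pid=$(cat "$pid_file")
    # kill -0 sends no signal; it only tests whether the process exists.
    if [ -n "$old_pid" ] && kill -0 "$old_pid" 2>/dev/null; then
        echo "process $old_pid is still running; leaving $pid_file in place" >&2
        return 1
    fi
    rm -f "$pid_file"
}

# Usage (from $OOZIE_HOME):
#   for f in oozie-server/temp/*.pid; do remove_stale_pid "$f"; done
```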

Try accessing http://localhost:11000 again; it should work this time.

Once the Oozie server is up and running, it’s time to run a few of the samples bundled with Oozie.

Run Examples
To run the examples bundled with Oozie, first extract them.

cd $OOZIE_HOME
tar -zxvf oozie-examples.tar.gz
cd examples/apps/map-reduce

We will run the basic MapReduce sample. Edit the job properties file for the MapReduce job.

vi job.properties

Update the two lines below to match your Hadoop configuration.

nameNode=hdfs://localhost:54310
jobTracker=localhost:8032

Note that for Hadoop 2.6 and above you should use jobTracker=localhost:8032, which is the default YARN ResourceManager port. For older versions, use jobTracker=localhost:54310. If you changed the ResourceManager port in yarn-site.xml, use that value here instead.
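If you prefer not to edit job.properties by hand, the two values can be set from a script. The sketch below rewrites a key=value line in place, or appends it if the key is missing; set_prop is a hypothetical helper, and it assumes the value contains no backslashes.

```shell
# Hypothetical helper: set key=value in a properties file, replacing an
# existing line for that key or appending a new one.
set_prop() {
    prop_file=$1
    prop_key=$2
    prop_value=$3
    prop_tmp=$(mktemp)
    if awk -v k="$prop_key" -v v="$prop_value" '
        BEGIN { found = 0 }
        # A line that starts with "key=" is replaced.
        index($0, k "=") == 1 { print k "=" v; found = 1; next }
        { print }
        END { if (!found) print k "=" v }
    ' "$prop_file" > "$prop_tmp"; then
        # Write back through the original file to keep its permissions.
        cat "$prop_tmp" > "$prop_file"
    fi
    rm -f "$prop_tmp"
}

# Usage (from examples/apps/map-reduce):
#   set_prop job.properties nameNode hdfs://localhost:54310
#   set_prop job.properties jobTracker localhost:8032
```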

Copy the examples to HDFS:

hadoop fs -copyFromLocal $OOZIE_HOME/examples hdfs://localhost:54310/user/root/

To run the example:

cd $OOZIE_HOME
bin/oozie job -oozie http://localhost:11000/oozie -config $OOZIE_HOME/examples/apps/map-reduce/job.properties -run
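On success, the submit command prints a line of the form job: followed by the job ID. Capturing that ID lets you check the job from the command line as well as from the web console. The sketch below pulls the ID out of that output; extract_job_id is a hypothetical helper name.

```shell
# Hypothetical helper: read the output of "oozie job ... -run" on stdin
# and print the job ID from the first "job: <id>" line.
extract_job_id() {
    sed -n 's/^job: *//p' | head -n 1
}

# Usage (from $OOZIE_HOME):
#   job_id=$(bin/oozie job -oozie http://localhost:11000/oozie \
#       -config "$OOZIE_HOME/examples/apps/map-reduce/job.properties" -run | extract_job_id)
#   bin/oozie job -oozie http://localhost:11000/oozie -info "$job_id"
```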

Note: When I ran the map-reduce example, the Oozie web console showed the error below.

File /user/root/share/lib does not exist

To fix this, run the command below, which installs the Oozie sharelib in HDFS.

cd $OOZIE_HOME
bin/oozie-setup.sh sharelib create -fs hdfs://localhost:54310 -locallib oozie-sharelib-4.3.0.tar.gz

Then edit $OOZIE_HOME/conf/oozie-site.xml and add or update the configuration as below:

  <property>
    <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
    <value>*=/usr/local/hadoop/etc/hadoop/</value>
    <description>
      Comma separated AUTHORITY=HADOOP_CONF_DIR, where AUTHORITY is the HOST:PORT of
      the Hadoop service (JobTracker, HDFS). The wildcard '*' configuration is
      used when there is no exact match for an authority. The HADOOP_CONF_DIR contains
      the relevant Hadoop *-site.xml files. If the path is relative is looked within
      the Oozie configuration directory; though the path can be absolute (i.e. to point
      to Hadoop client conf/ directories in the local filesystem.
    </description>
  </property>

  <property>
    <name>oozie.service.WorkflowAppService.system.libpath</name>
    <value>hdfs:///user/${user.name}/share/lib</value>
    <description>
      System library path to use for workflow applications.
      This path is added to workflow application if their job properties sets
      the property 'oozie.use.system.libpath' to true.
    </description>
  </property>

  <property>
    <name>oozie.service.ProxyUserService.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>oozie.service.ProxyUserService.proxyuser.root.groups</name>
    <value>*</value>
  </property>

Now restart the Oozie server.

bin/oozied.sh stop

The command above may fail with an error saying it is unable to remove the pid file, in which case you need to remove it manually.

rm oozie-server/temp/*.pid
bin/oozied.sh start

Now try running the example again:

cd $OOZIE_HOME
bin/oozie job -oozie http://localhost:11000/oozie -config $OOZIE_HOME/examples/apps/map-reduce/job.properties -run

Hopefully it runs successfully this time, and the Oozie web console at http://localhost:11000/oozie/ shows the job with RUNNING as its status.
