Part-4: Install Spark

In this tutorial I will show you how to install Spark on an Ubuntu machine. Spark does not have a hard dependency on Hadoop or other tools, but if you are planning to use Spark with Hadoop, you should first follow my Part-1, Part-2 and Part-3 tutorials, which cover installation of the HDP platform, Hive and Sqoop.

To install Spark, first ensure you have Java installed. Run the command “java -version” to check the installed version.

root@quickstart:/# java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

If instead of output like the above you see a message saying Java is not installed, you can install it with the command below.

sudo apt-get install default-jdk
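
Once the installation finishes, run “java -version” again to confirm the JDK is on your path.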

For Spark to work, Scala also needs to be installed.

sudo apt-get install scala

Once installed, type “scala” in a terminal and you should see the Scala prompt like below.

root@quickstart:/# scala
Welcome to Scala version 2.11.6 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144).
Type in expressions to have them evaluated.
Type :help for more information.
scala>
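
To verify the REPL works, try evaluating a simple expression at the prompt; the exact result name (res0 here) may vary by version:

scala> 1 + 1
res0: Int = 2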

You can then quit the Scala prompt with the “:q” command.

scala> :q

Visit the link below and copy the download link for the latest pre-built version of Spark that matches your Hadoop major version. For example, if you are using Hadoop version 2.x.x, get the link for the .tgz file built for Hadoop 2.x.

https://spark.apache.org/downloads.html

cd ~/Downloads
wget http://ftp.wayne.edu/apache/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz

Extract the downloaded file:

tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz

Move the extracted Spark folder to /usr/lib/spark/.

mkdir /usr/lib/spark
mv ~/Downloads/spark-2.2.0-bin-hadoop2.7 /usr/lib/spark

Set the Spark environment variables in the .bashrc file.

vi ~/.bashrc

Add the lines below at the end of the .bashrc file.

#Set SPARK home
export SPARK_HOME=/usr/lib/spark/spark-2.2.0-bin-hadoop2.7
PATH=$PATH:$SPARK_HOME/bin
export PATH
export SPARK_MASTER_HOST=localhost
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=7180
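
Note that SPARK_MASTER_WEBUI_PORT overrides the Spark master web UI's default port of 8080; it is set to 7180 here so it does not clash with other services that may already be using 8080.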

Save and exit the .bashrc file, then reload it:

source ~/.bashrc
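
To confirm the new variables took effect, you can print SPARK_HOME and ask Spark for its version (spark-submit is on the PATH we just set):

echo $SPARK_HOME
spark-submit --version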

The setup is now complete. To start the Spark master and worker services, use the command below.

root@quickstart:/# $SPARK_HOME/sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/lib/spark/spark-2.2.0-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.master.Master-1-quickstart.hadoop.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /usr/lib/spark/spark-2.2.0-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-quickstart.hadoop.out
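
To confirm the master and worker JVMs are actually running, you can check with the JDK's jps tool (the process IDs below are illustrative):

root@quickstart:/# jps
2345 Master
2456 Worker
2789 Jps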

Now you can access the Spark Web UI at the URL below.
http://localhost:7180

You should see the Spark master status page listing the running worker.

Spark installation is now complete.
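
As a quick smoke test, and assuming the master is listening on the host and port configured in .bashrc above, you can point spark-shell at the standalone master and run a small job (startup banner omitted):

root@quickstart:/# spark-shell --master spark://localhost:7077

scala> sc.parallelize(1 to 100).sum()
res0: Double = 5050.0

scala> :q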

To make future startups easier, add the Spark startup command to the start.sh file.

vi ~/start.sh

Then add the line below at the end.

$SPARK_HOME/sbin/start-all.sh
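
Similarly, the distribution ships a stop-all.sh script, so you can shut the master and worker down with:

$SPARK_HOME/sbin/stop-all.sh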

2 thoughts to “Part-4: Install Spark”

  1. Kamal says:

I am not able to access http://localhost:7180/ after following all the steps. What could be the issue?

  2. Hadoop Tutorials says:

Make sure you have executed the start-all.sh script and that it ran successfully. Can you also check the error log? It should exist at the path below.

/usr/lib/spark/spark-2.2.0-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.master.Master-1-quickstart.hadoop.out
