Part-4 : Install Spark

In this tutorial I will show you how to install Spark on an Ubuntu machine. Spark does not depend on Hadoop or other tools, but if you are planning to use Spark with Hadoop then you should follow my Part-1, Part-2 and Part-3 tutorials, which cover installation of the HDP platform, Hive and Sqoop.

To install Spark, first make sure you have Java installed. Run the command “java -version” to check the installed version.

root@quickstart:/# java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

If you don’t see output like the above and instead see a message saying Java is not installed, you can install it with the command below.

sudo apt-get install default-jdk
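
The check-then-install step above can be wrapped in a small script. This is only a minimal sketch; the helper name have_cmd is hypothetical and not part of Spark or Ubuntu.

```shell
#!/bin/sh
# have_cmd is a hypothetical helper: succeeds if the given command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
    # java -version writes to stderr, so send it to stdout before filtering
    java -version 2>&1 | head -n 1
else
    echo "java not found; install it with: sudo apt-get install default-jdk"
fi
```

The script only reports what it finds; it leaves the actual install command for you to run.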

Next, install Scala. The pre-built Spark package ships with its own Scala libraries, so this step mainly gives you a standalone Scala prompt.

sudo apt-get install scala

Once it is installed, type “scala” in the terminal and you should see the Scala prompt like below.

root@quickstart:/# scala
Welcome to Scala version 2.11.6 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144).
Type in expressions to have them evaluated.
Type :help for more information.

You can then quit the Scala prompt with the “:q” command.

scala> :q

Visit the link below and get the latest link for the pre-built version of Spark that matches your Hadoop major version. For example, if you are using Hadoop version 2.x.x, get the link for the .tgz file built for Hadoop 2.x.

cd ~/Downloads

Extract the downloaded file.

tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz

Move the extracted Spark folder to the /usr/lib/spark/ folder.

mkdir /usr/lib/spark
mv ~/Downloads/spark-2.2.0-bin-hadoop2.7 /usr/lib/spark
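
If you download a different Spark version, the folder name in the commands above changes with it. The sketch below derives the folder name from the archive name, assuming the archive extracts into a folder with the same name minus the .tgz suffix. The helper spark_dir_from_tgz is hypothetical, and the script only prints the commands so you can review them before running anything.

```shell
#!/bin/sh
# spark_dir_from_tgz is a hypothetical helper: strips the directory part and
# the .tgz suffix, giving the folder name we assume tar creates on extraction.
spark_dir_from_tgz() {
    basename "$1" .tgz
}

TARBALL="$HOME/Downloads/spark-2.2.0-bin-hadoop2.7.tgz"   # file name used in this tutorial
SPARK_DIR=$(spark_dir_from_tgz "$TARBALL")

# Print the commands instead of running them, so they can be checked first.
echo "tar -zxvf $TARBALL -C $HOME/Downloads"
echo "mkdir -p /usr/lib/spark"
echo "mv $HOME/Downloads/$SPARK_DIR /usr/lib/spark"
```

Using mkdir -p here means the command does not fail if /usr/lib/spark already exists.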

Set the Spark environment variables in the .bashrc file.

vi ~/.bashrc

Add the lines below at the end of the .bashrc file.

#Set SPARK home
export SPARK_HOME=/usr/lib/spark/spark-2.2.0-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin
export SPARK_MASTER_HOST=localhost
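
If you rerun these steps, the same lines can end up in .bashrc more than once. As an alternative to editing by hand, the sketch below appends a line only when it is not already there. The helper name append_once is hypothetical.

```shell
#!/bin/sh
# append_once is a hypothetical helper: adds a line to a file only if that
# exact line is not already present.
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example calls (commented out so that nothing is changed until you decide to):
# append_once 'export SPARK_HOME=/usr/lib/spark/spark-2.2.0-bin-hadoop2.7' "$HOME/.bashrc"
# append_once 'export SPARK_MASTER_HOST=localhost' "$HOME/.bashrc"
```

Single quotes keep the text literal, so each line is written to the file exactly as typed.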

Save and exit the .bashrc file, then reload it.

source ~/.bashrc
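
Before going further, it can help to confirm that SPARK_HOME really points at the Spark folder. The sketch below checks for the spark-shell launcher that the Spark package keeps in its bin folder. The helper name check_spark_home is hypothetical.

```shell
#!/bin/sh
# check_spark_home is a hypothetical helper: succeeds only if the given
# directory contains an executable bin/spark-shell.
check_spark_home() {
    [ -n "$1" ] && [ -x "$1/bin/spark-shell" ]
}

if check_spark_home "${SPARK_HOME:-}"; then
    echo "SPARK_HOME looks good: $SPARK_HOME"
else
    echo "SPARK_HOME is unset or does not point to a Spark folder"
fi
```

If the second message appears, recheck the path in .bashrc against the folder you moved under /usr/lib/spark.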

Spark is now installed. To start the Spark master and all other services, use the command below.

root@quickstart: $SPARK_HOME/sbin/
starting org.apache.spark.deploy.master.Master, logging to /usr/lib/spark/spark-2.2.0-bin-hadoop2-without-hive/logs/spark--org.apache.spark.deploy.master.Master-1-quickstart.hadoop.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /usr/lib/spark/spark-2.2.0-bin-hadoop2-without-hive/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-quickstart.hadoop.out
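
Besides reading the log lines above, you can check whether the daemons are actually running. The sketch below uses jps, which comes with the JDK and lists running Java processes as a process ID followed by a class name. It assumes the Spark daemons show up there as Master and Worker; the helper name has_daemon is hypothetical.

```shell
#!/bin/sh
# has_daemon is a hypothetical helper: succeeds if jps-style output
# ("<pid> <class name>" per line) lists the given class name.
has_daemon() {
    printf '%s\n' "$1" | awk -v name="$2" '$2 == name { found = 1 } END { exit !found }'
}

# jps ships with the JDK; fall back to empty output if it is not on PATH.
JVMS=$(jps 2>/dev/null || true)

for daemon in Master Worker; do
    if has_daemon "$JVMS" "$daemon"; then
        echo "$daemon is running"
    else
        echo "$daemon is not running"
    fi
done
```

If either daemon is reported as not running, the log files named in the startup output are the first place to look.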

Now you can access the Spark web UI using the URL below.

You should see a web page like the one below.

Spark installation is complete now.

To make this easier in the future, we can add the Spark service to the file below.

vi ~/

Then add the line below at the end.