In this tutorial I will show you how to install Spark on an Ubuntu machine. Spark does not have a hard dependency on Hadoop or other tools, but if you are planning to use Spark with Hadoop then you should follow my Part-1, Part-2 and Part-3 tutorials, which cover installation of the HDP platform, Hive and Sqoop.
To install Spark, first ensure you have Java installed. Run the command “java -version” to check the installed version.
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
If you don’t see output like the above and instead get a message that java is not installed, you can install it with the command below.
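The version output above is from OpenJDK 8, so assuming that is the version you want, it can be installed from the standard Ubuntu repositories (adjust the package name if you prefer a different Java version):
sudo apt-get update
sudo apt-get install openjdk-8-jdk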
For Spark to work, Scala also needs to be installed. You can install it with the command below.
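On Ubuntu, a simple option is the scala package from the standard repositories (this installs whatever version the repository carries; you can instead download a specific release from scala-lang.org if you need one):
sudo apt-get install scala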
Once installed, type “scala” in a terminal and you should see a Scala prompt like the one below.
Welcome to Scala version 2.11.6 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144).
Type in expressions to have them evaluated.
Type :help for more information.
scala>
You can then quit the Scala prompt with the “:q” command.
Visit the link below and get the latest download link for a pre-built version of Spark matching your Hadoop major version. For example, if you are using Hadoop version 2.x.x, get the link for the .tgz file built against that 2.x.x version.
https://spark.apache.org/downloads.html
wget http://ftp.wayne.edu/apache/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz
Extract the downloaded file.
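A standard way to extract the .tgz archive (assuming it was saved to ~/Downloads, as the mv command below expects):
cd ~/Downloads
tar -xvzf spark-2.2.0-bin-hadoop2.7.tgz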
Move the extracted Spark folder into the /usr/lib/spark/ folder. Create the folder first; writing to /usr/lib requires root privileges.
sudo mkdir -p /usr/lib/spark
sudo mv ~/Downloads/spark-2.2.0-bin-hadoop2.7 /usr/lib/spark
Set the Spark environment variables in your .bashrc file by adding the lines below at the end.
export SPARK_HOME=/usr/lib/spark/spark-2.2.0-bin-hadoop2.7
PATH=$PATH:$SPARK_HOME/bin
export PATH
export SPARK_MASTER_HOST=localhost
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=7180
Save and exit the .bashrc file, then reload it.
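Reloading with the source builtin makes the new variables take effect in the current shell without opening a new terminal:
source ~/.bashrc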
Now the Spark installation is complete. To start the Spark master and all other services, use the command below.
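Spark ships a start-all.sh script in its sbin directory; sbin is not on the PATH we set above, so call it by its full path:
$SPARK_HOME/sbin/start-all.sh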
starting org.apache.spark.deploy.master.Master, logging to /usr/lib/spark/spark-2.2.0-bin-hadoop2.7/logs/spark--org.apache.spark.deploy.master.Master-1-quickstart.hadoop.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /usr/lib/spark/spark-2.2.0-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-quickstart.hadoop.out
Now you can access the Spark Web UI using the URL below.
http://localhost:7180
You should see the Spark master status page, with the worker you started listed on it.
Spark is now installed and running.
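As a quick sanity check (my suggestion, not a required step), you can connect an interactive shell to the master you just started:
spark-shell --master spark://localhost:7077
You should get a scala> prompt with a SparkSession already created for you; quit with “:q”.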
To make this easier in future, we should add the Spark service to the start.sh file. Add the line below at the end.
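Assuming start.sh is the startup script created in the earlier parts of this series, appending the start-all.sh call will bring Spark up along with the other services:
$SPARK_HOME/sbin/start-all.sh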