In this tutorial we will discuss how to use Spark as the execution engine for Hive. MapReduce is Hive's default execution engine, but it is usually slow. Spark is a faster engine for running Hive queries.
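As a rough illustration of what switching engines looks like, the sketch below builds a Hive CLI invocation that sets the `hive.execution.engine` property for a single query. It is a minimal sketch, not the tutorial's own procedure: the helper name `hive_command` and the table name `sample_table` are hypothetical, and it assumes the `hive` CLI is on the PATH and that Hive has already been configured to find Spark.

```python
import shlex

# Values accepted by Hive's hive.execution.engine property.
VALID_ENGINES = {"mr", "tez", "spark"}


def hive_command(query, engine="spark"):
    """Build an argument list for the Hive CLI that runs `query` on `engine`.

    Hypothetical helper for illustration; it only builds the command and
    does not execute anything.
    """
    if engine not in VALID_ENGINES:
        raise ValueError(f"unknown execution engine: {engine!r}")
    return [
        "hive",
        "--hiveconf", f"hive.execution.engine={engine}",
        "-e", query,
    ]


if __name__ == "__main__":
    # sample_table is a placeholder table name.
    cmd = hive_command("SELECT COUNT(*) FROM sample_table;")
    # Print a shell-quoted form that could be pasted into a terminal.
    print(shlex.join(cmd))
```

The same property can also be set inside a Hive session with `set hive.execution.engine=spark;`, which is handy for comparing the same query on `mr` and `spark`.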
In this tutorial we will discuss how to install Spark on an Ubuntu VM. Spark does not have a particular dependency on Hadoop or other tools. But if you are planning to use Spark with Hadoop, you should follow my Part-1, Part-2 and Part-3 tutorials, which cover the installation of Hadoop and Hive. Install Java and…
In this part we will discuss how to install Hive on Hadoop HDFS.
In this tutorial, we will discuss how to install the Hortonworks Hadoop platform with Ambari server. I am using a Google Cloud VM for this tutorial. You can do the same or create your own VM using either VirtualBox or AWS. You can also use your own physical server if you wish. The steps will remain the same except some…
In this video tutorial I will show you how to install Cloudera Hadoop version 5.14 on a Google Cloud virtual machine. The setup includes one master node and two slave nodes. Follow the steps in the video. Below are the initial commands that you need for starting the Cloudera installation. Download the Cloudera Manager installer from the Cloudera site. Make installer file as…
This article will show you how to install Hue on a Hadoop cluster. It assumes that you have a working Hadoop cluster with Hive installed and working. If not, follow the relevant articles on this site to install Hadoop and Hive first.
This article will show how you can install Apache Oozie on a Hadoop 2.8 single-node cluster. Oozie is a workflow scheduler system to manage Apache Hadoop jobs. I assume you have followed the previous articles on how to set up a Hadoop single-node cluster or have a Hadoop server already running. Apache Maven should be installed first.…
In this part we will discuss how to install Sqoop on the HDP platform. I assume that you have followed the instructions from Part-1 to 6 on how to install HDP and Hive on a single-node cluster. Get the link to the latest version of the “sqoop-*-bin.tar.gz” file from the official Sqoop site below, http://sqoop.apache.org Go to the Downloads directory and download…