
Author Archive for Hadoop Tutorials

Install Hortonworks HDP Hadoop platform with Ambari server

Create a master node on Google Cloud with a highmem 8-core CPU and 52 GB of RAM, then SSH into the node and run:
    sudo yum update
    sudo yum install wget
Note the master host name with the command below; it will be needed later:
    hostname -f
Log in as the root account:
    sudo su root
Set up passwordless SSH for the root account. Generate SSH keys:
    ssh-keygen
    cd ~/.ssh […]
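Ambari's install wizard identifies cluster hosts by their fully qualified domain names (FQDNs), which is why the output of hostname -f matters later. A rough sketch of a sanity check follows, assuming a name counts as qualified when it contains at least one dot with characters on both sides; is_fqdn and check_master_fqdn are hypothetical helpers, not part of Ambari.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ambari): rough heuristic that a host
# name is fully qualified, i.e. has at least one character on each side
# of a dot.
is_fqdn() {
  case "$1" in
    ?*.?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: print the master FQDN so it can be noted for the
# Ambari wizard, or warn if it does not look fully qualified.
check_master_fqdn() {
  local fqdn
  fqdn="$(hostname -f)" || return 1
  if is_fqdn "$fqdn"; then
    echo "Master FQDN: $fqdn"
  else
    echo "Warning: '$fqdn' does not look fully qualified" >&2
    return 1
  fi
}
```

Running check_master_fqdn on the master node prints the name to record for the later steps.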

Install Cloudera Hadoop 5.14 on a Google Cloud virtual machine

In this video tutorial I will show you how to install Cloudera Hadoop version 5.14 on a Google Cloud virtual machine. The setup includes one master node and two slave nodes; follow the steps in the video. Below are the initial commands you need before starting the Cloudera installation. Update the CentOS packages:
    sudo yum update
    sudo yum install wget
Download […]

Installing Apache Maven on Ubuntu

This article will guide you through installing Apache Maven on Ubuntu. The same instructions can be followed for other Linux distributions as well. Get the latest version of the Maven binary from the official site below. Download Maven:
    cd ~/Downloads
    wget
Extract Maven and move it to a local directory:
    tar -xvf apache-maven-3.5.2-bin.tar.gz
    mv apache-maven-3.5.2 /usr/local/
Set […]
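After Maven is moved to /usr/local, its bin directory still has to be on PATH before the mvn command works. A minimal sketch, assuming Maven lives in /usr/local/apache-maven-3.5.2 as in the steps above and that your shell reads ~/.bashrc; the function add_maven_to_path and the MAVEN_HOME variable are illustrative names, not taken from the article.

```shell
#!/usr/bin/env bash
# Sketch: append Maven environment settings to a shell profile.
# Assumptions: Maven is in /usr/local/apache-maven-3.5.2 and the profile
# is ~/.bashrc. add_maven_to_path and MAVEN_HOME are illustrative names.
add_maven_to_path() {
  local profile="${1:-$HOME/.bashrc}"
  local maven_dir="${2:-/usr/local/apache-maven-3.5.2}"
  # Skip if the profile already exports MAVEN_HOME, so re-runs are harmless
  if grep -q '^export MAVEN_HOME=' "$profile" 2>/dev/null; then
    return 0
  fi
  {
    echo "export MAVEN_HOME=$maven_dir"
    # Single quotes keep $MAVEN_HOME and $PATH unexpanded until the
    # profile is sourced
    echo 'export PATH="$MAVEN_HOME/bin:$PATH"'
  } >> "$profile"
}
```

After running add_maven_to_path, open a new shell (or source ~/.bashrc) and check the installation with mvn -version.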

Part-8: Install Hue

This article will show you how to install Hue on a Hadoop cluster. It assumes that you have a working Hadoop cluster with Hive installed and working. If not, follow the other articles on this site to install Hadoop and Hive first. Install the Hue dependencies:
    apt-get install ant gcc g++ libkrb5-dev libffi-dev libmysqlclient-dev libssl-dev […]

Part-6: Installing HBase

In this article I will show you how to install HBase. I assume you have followed the previous articles on installing an HDFS single-node cluster; at minimum you need an HDP platform (HDFS) running on your server. Download the HBase tar.gz file from the official site below:
    cd ~/Downloads
    wget
Extract the .tar.gz file […]
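Once the archive is extracted, HBase has to be told where to keep its data. A minimal sketch for a pseudo-distributed setup on top of the existing HDFS follows. It assumes HDFS is reachable at hdfs://localhost:9000 (this must match fs.defaultFS in your core-site.xml) and that you pass the conf directory of the extracted HBase; write_hbase_site is a hypothetical helper, and it overwrites any existing hbase-site.xml.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal hbase-site.xml that stores HBase
# data in HDFS. Assumption: the NameNode URI is hdfs://localhost:9000 and
# must match fs.defaultFS in core-site.xml. Overwrites the existing file.
write_hbase_site() {
  local conf_dir="$1"
  local rootdir="${2:-hdfs://localhost:9000/hbase}"
  mkdir -p "$conf_dir"
  cat > "$conf_dir/hbase-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <!-- Directory in HDFS where HBase keeps its data -->
  <property>
    <name>hbase.rootdir</name>
    <value>${rootdir}</value>
  </property>
  <!-- true: run HBase daemons as separate processes (pseudo-distributed) -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
EOF
}
```

For example, write_hbase_site "$HBASE_HOME/conf", then start HBase with bin/start-hbase.sh from the HBase directory.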

Part-5: Using Spark as the execution engine for Hive

In this tutorial I will demonstrate how to use Spark as the execution engine for Hive. MapReduce is the default execution engine for Hive, but it tends to be slow because multi-stage queries run as chains of MapReduce jobs that write intermediate results to disk. Spark can keep intermediate data in memory, so it is usually a faster engine for running Hive queries. I assume you already have working Hive and Spark installations. If not […]
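The engine is selected by the Hive property hive.execution.engine, whose accepted values include mr, tez and spark. For a single session, running set hive.execution.engine=spark; in the Hive CLI or Beeline is enough. To make Spark the default, the property can be added to hive-site.xml. A minimal sketch, assuming the closing </configuration> tag sits on its own line and the property is not already defined in the file; set_hive_engine is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: insert a hive.execution.engine property just before
# the </configuration> line of hive-site.xml.
# Assumptions: </configuration> is on its own line and the property is not
# already present (a second entry would make the setting ambiguous).
set_hive_engine() {
  local site="$1" engine="${2:-spark}"
  local tmp
  tmp="$(mktemp)" || return 1
  awk -v engine="$engine" '
    /<\/configuration>/ {
      print "  <property>"
      print "    <name>hive.execution.engine</name>"
      print "    <value>" engine "</value>"
      print "  </property>"
    }
    { print }
  ' "$site" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy back with cat so the original file keeps its permissions
  cat "$tmp" > "$site" && rm -f "$tmp"
}
```

Hive on Spark also needs Hive to find the Spark libraries, which this sketch does not configure.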

Set up passwordless SSH for Linux servers using private/public keys

This post describes how to set up passwordless SSH access to a Linux server for a particular user. First log in to the Linux server with a username and password. Generate an SSH key for this user with the command below:
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
The above command will generate 2 files in the ~/.ssh/ directory: (1) id_rsa and […]
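A common next step is to append the public key to ~/.ssh/authorized_keys and tighten permissions, because sshd with its default StrictModes setting refuses keys stored in files or directories that other users can write to. A minimal sketch, assuming the key pair was written to ~/.ssh/id_rsa as in the command above; authorize_own_key is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: authorize the user's own public key for SSH logins
# to this server. Assumes the key pair is id_rsa / id_rsa.pub in ssh_dir.
authorize_own_key() {
  local ssh_dir="${1:-$HOME/.ssh}"
  local pub="$ssh_dir/id_rsa.pub" auth="$ssh_dir/authorized_keys"
  [ -f "$pub" ] || { echo "missing $pub" >&2; return 1; }
  # sshd (default StrictModes) ignores keys in overly permissive paths
  chmod 700 "$ssh_dir"
  touch "$auth"
  # Append the key only if that exact line is not already present
  if ! grep -qxF "$(cat "$pub")" "$auth"; then
    cat "$pub" >> "$auth"
  fi
  chmod 600 "$auth"
}
```

After running authorize_own_key, ssh localhost should log in without asking for a password (it may still ask you to confirm the host key the first time).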

Part-7: Install Oozie

This article shows how to install Oozie on a Hadoop 2.8 single-node cluster. Oozie is a workflow scheduler system for managing Apache Hadoop jobs. I assume you have followed the previous articles on setting up a Hadoop single-node cluster, or that you already have a Hadoop server running. We also need Maven to be installed […]
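Oozie is built from source with Maven, and Maven in turn needs a JDK, so a quick check that the required commands are on PATH can save a failed build. A minimal sketch; the helper name require_tools and the list of tools to check (mvn, javac, hadoop) are assumptions rather than steps from the article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report any required commands missing from PATH.
# Returns 0 when all are found, 1 otherwise.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, require_tools mvn javac hadoop prints a line for each missing command and returns a non-zero status if any is absent.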

Part-4: Install Spark

In this tutorial I will show you how to install Spark on an Ubuntu machine. Spark does not strictly depend on Hadoop or other tools, but if you plan to use Spark with Hadoop, you should follow my Part-1, Part-2 and Part-3 tutorials, which cover the installation of the HDP platform, Hive and Sqoop. To […]