Part-2: Install Hive

In this part we will discuss how to install Hive on the HDP platform. I assume that you have followed the instructions from Part-1 on installing HDP on a single-node cluster.

Get the download link for the latest “apache-hive-*-bin.tar.gz” release from the official Apache Hive downloads site.

Go to the Downloads directory and download the latest tar.gz file.

cd ~/Downloads
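The download command itself was omitted above. As a sketch, assuming the archive.apache.org mirror layout (verify the exact link on the Hive downloads page first), the URL for a given release can be built like this:

```shell
# Build the download URL for a given Hive release.
# The archive.apache.org layout is an assumption -- confirm it on the downloads page.
HIVE_VERSION=2.3.0
HIVE_URL="https://archive.apache.org/dist/hive/hive-${HIVE_VERSION}/apache-hive-${HIVE_VERSION}-bin.tar.gz"
echo "Downloading $HIVE_URL"
# then fetch it:
# wget "$HIVE_URL"
```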

Extract the tar file,

tar -zxvf apache-hive-2.3.0-bin.tar.gz

Create a directory for Hive and move the extracted folder there.

mkdir /usr/lib/hive
mv apache-hive-2.3.0-bin /usr/lib/hive

Set the Hive home path in the .bashrc file,

vi ~/.bashrc

Add the lines below at the end,

export HIVE_HOME="/usr/lib/hive/apache-hive-2.3.0-bin"
export PATH=$PATH:$HIVE_HOME/bin

Reload environment variables,

source ~/.bashrc
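To confirm the variables took effect, a quick check (paths assumed from the steps above):

```shell
# Check that HIVE_HOME is set and that its bin directory is on PATH
echo "HIVE_HOME=$HIVE_HOME"
case ":$PATH:" in
  *":$HIVE_HOME/bin:"*) echo "hive bin directory is on PATH" ;;
  *) echo "hive bin directory is NOT on PATH" ;;
esac
```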

Edit the Hive configuration script in the bin directory,

vi /usr/lib/hive/apache-hive-2.3.0-bin/bin/

Add the following line, which sets the home path for Hadoop:

# Directory path where Hadoop is installed
export HADOOP_HOME=/usr/local/hadoop

Create the Hive warehouse directory on HDFS (the default warehouse location is /user/hive/warehouse),

hadoop fs -mkdir /user
hadoop fs -mkdir /user/hive
hadoop fs -mkdir /user/hive/warehouse
hadoop fs -mkdir /tmp

Set group write permission on the warehouse and temporary directories,

hadoop fs -chmod g+w /user/hive/warehouse
hadoop fs -chmod g+w /tmp

Edit the Hive environment variables file,

cd /usr/lib/hive/apache-hive-2.3.0-bin/conf
vi /usr/lib/hive/apache-hive-2.3.0-bin/conf/hive-env.sh

Add the lines below at the end of the file,

export HADOOP_HOME=/usr/local/hadoop
export HIVE_CONF_DIR=/usr/lib/hive/apache-hive-2.3.0-bin/conf
export JAVA_HOME=/usr/local/java/jdk1.8.0_144

Set hive-env.sh as executable,

chmod +x hive-env.sh

To enable logging when starting Hive, you can rename the log4j template files below. This step is optional but can be useful for getting additional debugging information when Hive reports errors.
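The rename commands were omitted here. As a sketch (template filenames assumed from a stock Hive 2.3.0 conf directory), each *-log4j2.properties.template can be copied to its active name:

```shell
# Copy every log4j2 template in a conf directory to its active filename,
# e.g. hive-log4j2.properties.template -> hive-log4j2.properties
activate_log_templates() {
  for f in "$1"/*-log4j2.properties.template; do
    [ -e "$f" ] && cp "$f" "${f%.template}"
  done
  return 0
}

# Usage (conf path assumed from this guide):
activate_log_templates /usr/lib/hive/apache-hive-2.3.0-bin/conf
```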


Add or update the classpath property below in the mapred-site.xml file. This property may already be correct in mapred-site.xml, but in my case the default paths using environment variables did not work and threw errors while running Hive queries. So I had to add absolute paths without environment variables, which you can see at the end of the property value, starting with /usr/local/hadoop…
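The property itself was dropped from this post. A sketch of what it looks like, assuming the standard mapreduce.application.classpath property name and this guide's /usr/local/hadoop layout (adjust the paths to your installation):

```xml
<!-- In mapred-site.xml: classpath for MapReduce applications.
     The trailing absolute /usr/local/hadoop entries are the workaround
     described above for the environment-variable defaults failing. -->
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,/usr/local/hadoop/share/hadoop/mapreduce/*,/usr/local/hadoop/share/hadoop/mapreduce/lib/*</value>
</property>
```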


By default, Hive uses the Derby database as its metastore. If you want to use the default Derby database, initialize its schema using the commands below.

cd /usr/lib/hive/apache-hive-2.3.0-bin
bin/schematool -initSchema -dbType derby

It is also possible to use other databases, such as MySQL, as the metastore. I was constantly getting Derby database corruption issues while running Hive queries, so I recommend that you use MySQL as the metastore database; Hive runs much more stably with it.

To use MySQL for the Hive metastore, you need a MySQL server installed. If you don't have one, first install it using the command below,

apt-get install mysql-server

During installation it will ask you to set a password for the database user “root”. Set it and note it down.

Install the MySQL Java connector, which is needed for making the Hive connection.

apt-get install libmysql-java

Link the MySQL Java connector into the Hive library folder,

ln -s /usr/share/java/mysql-connector-java.jar $HIVE_HOME/lib/mysql-connector-java.jar

Start MySQL server,

service mysql start

Connect to MySQL and create the metastore database for Hive,

mysql -u root -p
mysql>CREATE DATABASE hive_metastore;
mysql>USE hive_metastore;

On the MySQL prompt, create the metastore tables using the database schema template (.sql file) shipped with the Hive installation. Make sure to use the template matching your Hive version; in our case that is 2.3.0.

mysql>SOURCE /usr/lib/hive/apache-hive-2.3.0-bin/scripts/metastore/upgrade/mysql/hive-schema-2.3.0.mysql.sql
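Optionally, still at the MySQL prompt, you can sanity-check that the schema loaded (table and column names assumed from the 2.3.0 schema script):

```sql
-- List the metastore tables and confirm the recorded schema version
SHOW TABLES;
SELECT SCHEMA_VERSION FROM VERSION;
```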

Create a metastore-specific MySQL user to use with Hive.

mysql>CREATE USER 'hive_user'@'%' IDENTIFIED BY 'hive_password';
mysql>GRANT ALL ON *.* TO 'hive_user'@'%';
mysql>FLUSH PRIVILEGES;

Create the hive-site.xml file and add the configuration below to use the MySQL database as the default metastore.

vi $HIVE_HOME/conf/hive-site.xml

Add or update the lines below. Note that these properties may already exist with Derby settings; in that case delete the Derby entries and add these instead, or you may get errors.

The configuration should: use MapReduce as the default execution engine, store the Hive metadata in the MySQL server, set the MySQL JDBC driver class, and set the user name and password for connecting to the MySQL server.
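A sketch of the corresponding hive-site.xml entries, assuming the standard javax.jdo.option.* property names and the database, user, and password created above (adjust the values to yours):

```xml
<configuration>
  <!-- Use MapReduce as the default execution engine -->
  <property>
    <name>hive.execution.engine</name>
    <value>mr</value>
  </property>
  <!-- Hive metadata is stored in a MySQL server -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/hive_metastore</value>
  </property>
  <!-- MySQL JDBC driver class -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <!-- User name for connecting to the MySQL server -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive_user</value>
  </property>
  <!-- Password for connecting to the MySQL server -->
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive_password</value>
  </property>
</configuration>
```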

The installation is complete. Now run the hive command and you should see the “hive>” prompt.

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hive/apache-hive-2.3.0-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/spark/spark-2.2.0-bin-hadoop2-without-hive/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/lib/hive/apache-hive-2.3.0-bin/lib/hive-common-2.3.0.jar!/ Async: true

Run “show tables” command,

hive> show tables;
Time taken: 1.403 seconds, Fetched: 0 row(s)

If you receive the error below, duplicate Derby metastore properties may still exist in hive-site.xml. Remove those properties and keep only the MySQL properties.

hive> show tables;
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

Now let's create our first Hive table with the command below on the hive prompt.

hive> create table employee (id string, name string, dept string);
Time taken: 0.617 seconds

Insert a few records into this table. Note that Hive will generate MapReduce jobs to perform the inserts.

hive>insert into employee values("1","Allen","IT");
hive>insert into employee values("2","Mag","Sales");
hive>insert into employee values("3","Rob","Sales");
hive>insert into employee values("4","Dana","IT");

If the MapReduce job gets stuck while inserting data and does not return, try stopping HDFS and starting it again, then wait about five minutes before starting Hive. Sometimes HDFS is not fully ready when Hive starts, so waiting a few minutes after starting HDFS is recommended.

Finally, let's look at the employee table values.

hive> select * from employee;
1 Allen IT
2 Mag Sales
3 Rob Sales
4 Dana IT
Time taken: 0.157 seconds, Fetched: 4 row(s)

To see this table's directory on HDFS, first exit the Hive prompt and then run the commands below,

hive> quit;
root@quickstart:/# hadoop fs -ls /user/hive/warehouse
Found 1 item
drwxr-xr-x - root supergroup 0 2017-09-20 14:51 /user/hive/warehouse/employee

Hive installation is complete now.

For easy startup of the Hive and MySQL services in the future, we can add the command below to the startup script file we created in the “Part-1” tutorial,

vi ~/
#Add below line at the end of file
service mysql start

Next time you start Docker, you can just run that script file to start the various Hadoop services.