Part-2: Install Hive

In this part we will discuss how to install Hive on the HDP platform. I assume you have followed the instructions in Part-1 on how to install HDP on a single-node cluster.

Get the link for the latest “hive-*-bin.tar.gz” file from the official Hive site below:
https://hive.apache.org/downloads.html

Go to the Downloads directory and download the latest tar.gz file:

cd ~/Downloads
wget http://apache.claz.org/hive/hive-2.3.0/apache-hive-2.3.0-bin.tar.gz
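If the mirror link above no longer works, older releases are also available at https://archive.apache.org/dist/hive/. Optionally, you can sanity-check the downloaded archive before extracting; this quick check (not part of the original steps) lists the contents without unpacking them:

# Verify the archive is readable and not corrupt
tar -tzf apache-hive-2.3.0-bin.tar.gz > /dev/null && echo "archive OK"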

Extract the tar file:

tar -zxvf apache-hive-2.3.0-bin.tar.gz

Create a directory for Hive and move the extracted folder there:

mkdir /usr/lib/hive
mv apache-hive-2.3.0-bin /usr/lib/hive

Set the Hive home path in the .bashrc file:

vi ~/.bashrc

Add the below lines at the end:

# Set HIVE_HOME
export HIVE_HOME="/usr/lib/hive/apache-hive-2.3.0-bin"
PATH=$PATH:$HIVE_HOME/bin
export PATH

Reload the environment variables:

source ~/.bashrc
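To confirm the variables took effect, a quick verification (not part of the original steps):

# Both should print paths under /usr/lib/hive
echo $HIVE_HOME
which hive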

Edit the Hive configuration script:

vi /usr/lib/hive/apache-hive-2.3.0-bin/bin/hive-config.sh

Go to the section where the following lines exist:

# Allow alternate conf dir location.
HIVE_CONF_DIR="${HIVE_CONF_DIR:-$HIVE_HOME/conf}"

export HIVE_CONF_DIR=$HIVE_CONF_DIR
export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH

Below those lines, add the following line, which sets the Hadoop home path:

# Directory where Hadoop is installed
export HADOOP_HOME=/usr/local/hadoop
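If you are unsure of your Hadoop path, you can verify it before saving; this assumes the /usr/local/hadoop layout from Part-1:

# Should print the installed Hadoop version
/usr/local/hadoop/bin/hadoop version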

Create the Hive warehouse directories on HDFS. Note that Hive's default warehouse location is /user/hive/warehouse, so the path starts with /user (not /usr):

hadoop fs -mkdir /user
hadoop fs -mkdir /user/hive
hadoop fs -mkdir /user/hive/warehouse
hadoop fs -mkdir /tmp

Set group write permission on the warehouse and temporary directories:

hadoop fs -chmod g+w /user/hive/warehouse
hadoop fs -chmod g+w /tmp
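You can verify the directories and permissions with a listing (a quick check, not part of the original steps):

hadoop fs -ls /user/hive
hadoop fs -ls -d /tmp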

Edit the Hive environment variables:

cd /usr/lib/hive/apache-hive-2.3.0-bin/conf
cp hive-env.sh.template hive-env.sh
vi /usr/lib/hive/apache-hive-2.3.0-bin/conf/hive-env.sh

Add the below lines at the end of the hive-env.sh file:

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_HEAPSIZE=512
export HIVE_CONF_DIR=/usr/lib/hive/apache-hive-2.3.0-bin/conf
export JAVA_HOME=/usr/local/java/jdk1.8.0_144
export YARN_HOME=$HADOOP_YARN_HOME

Make hive-env.sh executable:

chmod +x hive-env.sh

To enable logging when starting Hive, you can rename the below log4j template files. This step is optional but can be useful for additional debugging information when Hive reports errors.

mv hive-exec-log4j2.properties.template hive-exec-log4j2.properties
mv hive-log4j2.properties.template hive-log4j2.properties

Add/update the below classpath property in the mapred-site.xml file. This property may already be present, but in my case the default paths built from environment variables did not resolve for some reason and threw errors while running Hive queries, so I had to append absolute paths without environment variables, which you can see at the end of the property value starting with /usr/local/hadoop…

<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH,/usr/local/hadoop/share/hadoop/mapreduce/*,/usr/local/hadoop/share/hadoop/mapreduce/lib/*,/usr/local/hadoop/share/hadoop/hdfs/*,/usr/local/hadoop/share/hadoop/hdfs/lib/*,/usr/local/hadoop/share/hadoop/common/lib/*,/usr/local/hadoop/share/hadoop/common/*,/usr/local/hadoop/share/hadoop/yarn/lib/*,/usr/local/hadoop/share/hadoop/yarn/*</value>
</property>
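mapred-site.xml lives under Hadoop's configuration directory; assuming the /usr/local/hadoop layout from Part-1, you would edit:

vi /usr/local/hadoop/etc/hadoop/mapred-site.xml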

By default, Hive uses the Derby database as its metastore. If you want to use the default Derby database, initialize its schema using the below commands.

cd /usr/lib/hive/apache-hive-2.3.0-bin
bin/schematool -initSchema -dbType derby
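You can verify the schema initialized correctly using schematool's -info option, which prints the metastore schema version:

bin/schematool -info -dbType derby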

It's also possible to use other databases, like MySQL, as the metastore. I was constantly getting Derby database corruption issues while running Hive queries, so I recommend using MySQL as the metastore database; Hive runs much more stably with it.

To use MySQL as the Hive metastore, you need the MySQL server installed. If you don't have it, first install it using the below command:

apt-get install mysql-server

During installation it will ask you to set a password for the database user “root”. Set it and note it down.
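Optionally, you can harden the new MySQL installation with the helper script bundled with mysql-server; this is not required for this tutorial:

mysql_secure_installation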

Install the MySQL Java connector, which is needed for the Hive connection:

apt-get install libmysql-java

Link the MySQL Java connector into the Hive library folder:

ln -s /usr/share/java/mysql-connector-java.jar $HIVE_HOME/lib/mysql-connector-java.jar

Start the MySQL server:

service mysql start

Connect to MySQL and create the metastore database for Hive:

mysql -u root -p
mysql>CREATE DATABASE hive_metastore;
mysql>USE hive_metastore;

On the MySQL prompt, create the metastore tables using the schema template (.sql file) provided in the Hive installation directory. Make sure to use the template matching your Hive version; in our case that is 2.3.0:

mysql>SOURCE /usr/lib/hive/apache-hive-2.3.0-bin/scripts/metastore/upgrade/mysql/hive-schema-2.3.0.mysql.sql
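You can confirm the metastore tables were created; the output should include tables such as DBS, TBLS, and COLUMNS_V2:

mysql>SHOW TABLES;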

Create a metastore-specific MySQL user for Hive to use:

mysql>CREATE USER 'hive_user'@'%' IDENTIFIED BY 'hive_password';
mysql>GRANT ALL ON *.* TO 'hive_user'@'localhost' IDENTIFIED BY 'hive_password';
mysql>FLUSH PRIVILEGES;
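To confirm the new user and grants work, exit and reconnect as hive_user (enter hive_password when prompted); this is a quick verification, not part of the original steps:

mysql>quit;
mysql -u hive_user -p hive_metastore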

Create the hive-site.xml file and add the below configuration to use the MySQL database as the default metastore:

vi $HIVE_HOME/conf/hive-site.xml

Add/update the below lines. Note that some of these properties may already exist with Derby values; in that case, delete those properties and add the ones below, otherwise you may get errors.

<configuration>
  <property>
    <name>hive.execution.engine</name>
    <value>mr</value>
    <description>Use MapReduce as the default execution engine</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/hive_metastore?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    <description>Hive metadata is stored in a MySQL server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>MySQL JDBC driver class</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive_user</value>
    <description>User name for connecting to the MySQL server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive_password</value>
    <description>Password for connecting to the MySQL server</description>
  </property>
</configuration>
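As an alternative to the SOURCE step earlier, Hive's schematool can also initialize the MySQL schema once hive-site.xml points at MySQL; use one approach or the other, not both:

$HIVE_HOME/bin/schematool -initSchema -dbType mysql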

The installation is complete. Now run the hive command and you should see the “hive>” prompt:

hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hive/apache-hive-2.3.0-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/spark/spark-2.2.0-bin-hadoop2-without-hive/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/lib/hive/apache-hive-2.3.0-bin/lib/hive-common-2.3.0.jar!/hive-log4j2.properties Async: true
hive>

Run the “show tables” command:

hive> show tables;
OK
Time taken: 1.403 seconds, Fetched: 0 row(s)

If you receive the below error, duplicate Derby metastore properties may still exist in hive-site.xml. Remove those properties and keep only the MySQL properties.

hive> show tables;
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

Now let's create our first Hive table with the below command:

hive> create table employee (id string, name string, dept string);
OK
Time taken: 0.617 seconds

Insert a few records into this table. Note that while inserting records, Hive will generate MapReduce jobs to perform the inserts.

hive>insert into employee values("1","Allen","IT");
hive>insert into employee values("2","Mag","Sales");
hive>insert into employee values("3","Rob","Sales");
hive>insert into employee values("4","Dana","IT");

If the MapReduce job gets stuck during an insert and does not return, try stopping HDFS and starting it again, waiting about 5 minutes, and then starting Hive. Sometimes HDFS is not ready before we start Hive, so waiting a few minutes after HDFS starts is recommended.
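One way to check whether HDFS is ready is the NameNode safe mode status; -safemode wait blocks until HDFS leaves safe mode. This is a suggestion on top of the original steps:

# Prints ON while the NameNode is still starting up
hdfs dfsadmin -safemode get
# Blocks until safe mode is OFF
hdfs dfsadmin -safemode wait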

Finally, let's look at the employee table values:

hive> select * from employee;
OK
1 Allen IT
2 Mag Sales
3 Rob Sales
4 Dana IT
Time taken: 0.157 seconds, Fetched: 4 row(s)
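Unlike a plain SELECT *, an aggregate query launches a MapReduce job, so it is a good end-to-end check of the execution engine; with the rows above you should get IT 2 and Sales 2:

hive> select dept, count(*) from employee group by dept;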

To see this table's directory on HDFS, first exit from the Hive prompt and then run the below commands:

hive> quit;
root@quickstart:/# hadoop fs -ls /user/hive/warehouse
Found 1 item
drwxr-xr-x - root supergroup 0 2017-09-20 14:51 /user/hive/warehouse/employee
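The table data itself is stored as plain files under that directory, which you can inspect directly:

hadoop fs -cat /user/hive/warehouse/employee/*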

Hive installation is now complete.

For easy startup of the services Hive needs in the future, we can add the below line to the start.sh file we created in the Part-1 tutorial:

vi ~/start.sh
# Add the below line at the end of the file
service mysql start

Next time you start Docker, you can just run the start.sh file to start the various Hadoop services.
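For reference, after this change your start.sh might look roughly like the below; the HDFS/YARN lines are assumed from Part-1, so adjust them to match your actual file:

#!/bin/bash
# Start HDFS and YARN daemons (from Part-1; paths assumed)
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
# Start MySQL for the Hive metastore (added in this part)
service mysql start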

2 thoughts to “Part-2: Install Hive”

  1. sree says:

    Hi
I have followed the above steps and I am getting permission denied on the below step:
    mv apache-hive-2.3.0-bin /usr/lib/hive

Then I used sudo and it worked fine.

Finally Hive installed, but I get the below error while running the show databases command in Hive:

    SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

    Thanks
    Sreeni

    1. Hadoop Tutorials says:

      Hi,
My instructions above use the root account on a Docker instance. If you are trying to install on a regular Ubuntu machine, then I suggest using sudo with every command in this article. That will ensure Hive runs with the highest privileges.
