In this part we will discuss how to install Hive on the HDP platform. I assume that you have followed the instructions from Part-1 on how to install HDP on a single-node cluster.
Get the download link for the latest “hive-*-bin.tar.gz” file from the official Hive site below,
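https://hive.apache.org/downloads.html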
Go to the Downloads directory and download the latest tar.gz file, for example:
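# Apache archive URL for the 2.3.0 release used in this tutorial; adjust for your version
wget https://archive.apache.org/dist/hive/hive-2.3.0/apache-hive-2.3.0-bin.tar.gz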
Extract the tar file,
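tar -xzf apache-hive-2.3.0-bin.tar.gz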
Create a directory for Hive and move the extracted folder there.
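mkdir /usr/lib/hive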
mv apache-hive-2.3.0-bin /usr/lib/hive
Set the Hive home path in the .bashrc file by adding lines like the below at the end,
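# HIVE_HOME points at the folder we moved under /usr/lib/hive above
export HIVE_HOME=/usr/lib/hive/apache-hive-2.3.0-bin
export PATH=$PATH:$HIVE_HOME/bin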
Reload the environment variables,
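source ~/.bashrc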
Edit the Hive configuration script, hive-config.sh, in $HIVE_HOME/bin. Go to the lines that export HIVE_CONF_DIR and HIVE_AUX_JARS_PATH, and below them add the following line, which is the home path for Hadoop.
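# Hadoop installation path used throughout this setup
export HADOOP_HOME=/usr/local/hadoop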
Create the Hive database directories on HDFS,
hadoop fs -mkdir /user/hive
hadoop fs -mkdir /user/hive/warehouse
hadoop fs -mkdir /tmp
Set READ/WRITE permission for the warehouse and temporary directories,
hadoop fs -chmod g+w /user/hive/warehouse
hadoop fs -chmod g+w /tmp
Edit the Hive environment variables. The templates live in the Hive conf directory,
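cd $HIVE_HOME/conf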
cp hive-env.sh.template hive-env.sh
add lines like the below at the end of the hive-env.sh file,
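# point Hive at the Hadoop installation; same path as used above
export HADOOP_HOME=/usr/local/hadoop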
Set hive-env.sh as executable,
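chmod +x hive-env.sh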
To enable logging when starting Hive, you can rename the below log4j template files. This step is optional but can be useful for additional debugging information when you receive errors from Hive. (Hive 2.x ships log4j2 templates, which is why the file names carry a log4j2 suffix.)
mv hive-log4j2.properties.template hive-log4j2.properties
mv hive-exec-log4j2.properties.template hive-exec-log4j2.properties
Add/update the below classpath property in the mapred-site.xml file. This property may already be correct in mapred-site.xml, but what I found in my case is that, for some reason, the default paths using environment variables did not work and were throwing errors while running Hive queries. So I had to set absolute paths without using environment variables, which you can see at the end of the property value starting with /usr/local/hadoop…
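An illustrative version of the property, with the MapReduce jars spelled out as absolute paths (the exact paths depend on your Hadoop layout), might look like this:
<property>
  <name>mapreduce.application.classpath</name>
  <value>/usr/local/hadoop/share/hadoop/mapreduce/*,/usr/local/hadoop/share/hadoop/mapreduce/lib/*</value>
</property>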
By default, Hive uses the Derby database as its metastore. If you want to use the default Derby database, then initialize its schema using the below command.
bin/schematool -initSchema -dbType derby
It’s also possible to use other databases like MySQL as the metastore. I was constantly getting Derby database corruption issues while running Hive queries, so I recommend that you use MySQL as the metastore database. Hive will run much more stably with it.
To use MySQL as the HIVE metastore, you need a MySQL server installed. If you don’t have one, then first install the MySQL server using the below command,
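# assuming a Debian/Ubuntu-based container from Part-1; use yum on a CentOS-based image
apt-get install mysql-server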
During installation it will ask you to set a password for the database user “root”. Set it and note it down.
Install the MySQL Java connector, which is needed for making the Hive connection.
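# on Debian/Ubuntu, libmysql-java provides /usr/share/java/mysql-connector-java.jar
apt-get install libmysql-java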
Link the MySQL Java connector into the Hive library folder,
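ln -s /usr/share/java/mysql-connector-java.jar $HIVE_HOME/lib/mysql-connector-java.jar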
Start the MySQL server,
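service mysql start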
Connect to MySQL and create the metastore database for Hive,
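# enter the root password set during installation
mysql -u root -p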
mysql>CREATE DATABASE hive_metastore;
On the MySQL prompt, generate the metastore tables using the database template (.sql file) provided with the Hive installation. Make sure to use the correct template version for your Hive version; in our case the Hive version is 2.3.0.
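The templates ship under scripts/metastore/upgrade/mysql inside the Hive folder, so something like the below should work:
mysql>USE hive_metastore;
mysql>SOURCE /usr/lib/hive/apache-hive-2.3.0-bin/scripts/metastore/upgrade/mysql/hive-schema-2.3.0.mysql.sql;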
Create a metastore-specific MySQL user to use with HIVE.
mysql>GRANT ALL ON *.* TO 'hive_user'@'localhost' IDENTIFIED BY 'hive_password';
Create the hive-site.xml file in $HIVE_HOME/conf and add the below configuration to use the MySQL database as the default metastore.
Add/update the below properties. Please note that these properties may already exist with Derby values; in that case you should delete the Derby versions and add these instead, or else you may get an error.
The properties set MapReduce as the default execution engine and tell Hive that its metadata is stored in a MySQL server: the JDBC connection URL, the MySQL JDBC driver class, and the user name and password for connecting to the MySQL server.
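Sketched as hive-site.xml properties, assuming the hive_metastore database and hive_user account created above (adjust the JDBC URL if MySQL is not on localhost):
<property>
  <name>hive.execution.engine</name>
  <value>mr</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost/hive_metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive_user</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive_password</value>
</property>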
Installation is complete; now run the hive command and you should see the “hive>” prompt.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hive/apache-hive-2.3.0-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/spark/spark-2.2.0-bin-hadoop2-without-hive/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/usr/lib/hive/apache-hive-2.3.0-bin/lib/hive-common-2.3.0.jar!/hive-log4j2.properties Async: true
Run the “show tables” command,
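hive>show tables;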
Time taken: 1.403 seconds, Fetched: 0 row(s)
If you receive the below error, then duplicate Derby metastore database properties may still exist in hive-site.xml. You should remove those properties and keep just the MySQL database properties.
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Now let’s create our first Hive table with the below command on the hive prompt.
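A minimal definition matching the three columns used below (all strings, since every inserted value is quoted) would be:
hive>create table employee (id string, name string, dept string);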
Time taken: 0.617 seconds
Insert a few records into this table. Please note that while inserting records, Hive will generate MapReduce jobs to do the inserts.
hive>insert into employee values("2","Mag","Sales");
hive>insert into employee values("3","Rob","Sales");
hive>insert into employee values("4","Dana","IT");
If the MapReduce job gets stuck while inserting data and does not return, try stopping HDFS and starting it again, then wait for 5 minutes before starting Hive. Sometimes HDFS is not ready before we start HIVE, so waiting a few minutes after HDFS starts is recommended.
Finally, let’s look at the employee table values.
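hive>select * from employee;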
1 Allen IT
2 Mag Sales
3 Rob Sales
4 Dana IT
Time taken: 0.157 seconds, Fetched: 4 row(s)
To see this table’s directory on HDFS, first exit from the Hive prompt and then run the below command,
root@quickstart:/# hadoop fs -ls /user/hive/warehouse
Found 1 item
drwxr-xr-x - root supergroup 0 2017-09-20 14:51 /user/hive/warehouse/employee
Hive installation is complete now.
For easy starting of the Hive and MySQL services in the future, we can add the below command line to the start.sh file. We created this file in the “Part-1” tutorial.
#Add below line at the end of file
service mysql start
Next time when you start the Docker container, you can just run the start.sh file to start the various Hadoop services.