
Installing and Configuring Hive


Before installing Hive, check the Hive/Hadoop version compatibility matrix on the official site. The Hive and Hadoop versions chosen here are:

  • hadoop-2.7.4
  • hive-2.3.5

1. Edit Hive's environment variables
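The post doesn't show the variables themselves; a minimal sketch, assuming Hive is unpacked at /home/hadoop/hive-235 (the path used throughout this walkthrough), appended to ~/.bashrc:

```shell
# Assumed install path from this walkthrough; adjust to your layout.
export HIVE_HOME=/home/hadoop/hive-235
export PATH=$PATH:$HIVE_HOME/bin
```

After editing, run `source ~/.bashrc` so the current shell picks up the change.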

2. Edit the hive-env.sh file

# The heap size of the jvm started by hive shell script can be controlled via:
#
export HADOOP_HEAPSIZE=2048
#
# Larger heap size may be required when running queries over large number of files or partitions. 
# By default hive shell scripts use a heap size of 256 (MB).  Larger heap size would also be 
# appropriate for hive server.

# export JAVA_HOME=/usr/local/java

# Set HADOOP_HOME to point to a specific hadoop install directory
export HADOOP_HOME=/home/hadoop/hadoop-2.7.4/

# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/home/hadoop/hive-235/conf

# Folder containing extra libraries required for hive compilation/execution can be controlled by:
# export HIVE_AUX_JARS_PATH=
export HIVE_AUX_JARS_PATH=/home/hadoop/hive-235/lib

3. Edit the hive-site.xml file

<configuration>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
        <description>
            Enforces metastore schema consistency. When enabled, Hive verifies that the schema version recorded in the metastore matches the version of the Hive jars, and disables automatic schema migration, so the user must upgrade Hive and migrate the schema manually; when disabled, only a warning is issued on a version mismatch. The default is false (not enforced).
        </description>
    </property>

    <property>
        <name>datanucleus.schema.autoCreateAll</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.auto.convert.join</name>
        <value>false</value>
        <description>An optimization that converts a common join into a map join based on the size of the input files; the default is false (disabled).</description>
    </property>

    <property>
        <name>hive.server2.enable.impersonation</name>
        <description>Enable user impersonation for HiveServer2</description>
        <value>true</value>
    </property>

    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>node1</value>
    </property>

    <property>
      <name>hive.exec.scratchdir</name>
      <value>/user/hive/tmp</value>
      <description>Directory where Hive stores the map/reduce execution plans for the different stages of a query, as well as intermediate output. The default is /tmp/${user.name}/hive; in practice this is usually split per team, with each team keeping its own tmp directory.</description>
    </property>

    <property>
      <name>hive.querylog.location</name>
      <value>/user/hive/log/hadoop</value>
      <description>Location of Hive run time structured log file</description>
    </property>

    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
        <description>Default HDFS location for Hive-managed table data (the warehouse directory)</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://node6:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
        <description>JDBC connection URL for the Hive metastore database in MySQL</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>MySQL JDBC driver class</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>leo</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>Yyf5211314!</value>
    </property>

    <property>
        <name>hive.support.concurrency</name>
        <value>false</value>
        <description>Whether Hive supports concurrency; the default is false. Enabling read/write locks requires a running ZooKeeper.</description>
    </property>

    <property>
        <name>hive.enforce.bucketing</name>
        <value>false</value>
        <description>Whether bucketing is enforced; the default is false. When enabled, writes to a bucketed table are bucketed accordingly.</description>
    </property>
    <property>
        <name>hive.exec.dynamic.partition.mode</name>
        <value>nonstrict</value>
        <description>The default is strict. In strict mode, dynamic partitioning may only be used when at least one static partition is specified; the remaining partitions may be dynamic.</description>
    </property>
    <property>
        <name>hive.txn.manager</name>
        <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
    </property>
    <property>
        <name>hive.compactor.initiator.on</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.compactor.worker.threads</name>
        <value>10</value>
    </property>

    <property>
        <name>hive.exec.max.dynamic.partitions</name>
        <value>100000</value>
        <description>Upper limit on the total number of dynamic partitions; the default is 1000.</description>
    </property>
    <property>
        <name>hive.exec.max.dynamic.partitions.pernode</name>
        <value>100000</value>
        <description>Maximum number of dynamic partitions each mapper/reducer node may create; the default is 100.</description>
    </property>

    <property>
        <name>hive.exec.parallel.thread.number</name>
        <value>8</value>
        <description>Controls the maximum number of jobs that may run in parallel for a single SQL statement; the default is 8, i.e. up to 8 jobs at once.</description>
    </property>
</configuration>
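The JDBC settings above assume a MySQL database named hive on node6 that user leo can access. If the account doesn't exist yet, it can be created on the MySQL host with something like the following (a sketch using MySQL 5.7+ syntax; user and password mirror the values in hive-site.xml above, and createDatabaseIfNotExist=true in the URL will create the hive database itself on first connect):

```shell
# Run on the MySQL server (node6); credentials mirror hive-site.xml.
mysql -u root -p <<'SQL'
CREATE USER IF NOT EXISTS 'leo'@'%' IDENTIFIED BY 'Yyf5211314!';
GRANT ALL PRIVILEGES ON hive.* TO 'leo'@'%';
FLUSH PRIVILEGES;
SQL
```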
  • Create a local temporary directory for Hive
mkdir -p /home/hadoop/hive-data/tmp/

Then make the following changes in hive-site.xml:

Replace ${system:java.io.tmpdir} with /home/hadoop/hive-data/tmp/

Replace ${system:user.name} with ${user.name}
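These substitutions can be scripted with sed; the snippet below demonstrates them on a sample line rather than the real file (in practice, back up hive-site.xml and run the same expressions against it, assuming GNU sed):

```shell
# Demo of the two substitutions on a sample value; in practice run the
# same sed expressions against /home/hadoop/hive-235/conf/hive-site.xml.
echo '<value>${system:java.io.tmpdir}/${system:user.name}</value>' > /tmp/demo.xml
sed -i 's|${system:java.io.tmpdir}|/home/hadoop/hive-data/tmp|g' /tmp/demo.xml
sed -i 's|${system:user.name}|${user.name}|g' /tmp/demo.xml
cat /tmp/demo.xml   # → <value>/home/hadoop/hive-data/tmp/${user.name}</value>
```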

  • Create Hive's directories on HDFS
# warehouse (table data) directory
hadoop fs -mkdir -p /user/hive/warehouse

# scratch/temporary directory
hadoop fs -mkdir -p /user/hive/tmp

# query log directory
hadoop fs -mkdir -p /user/hive/log

hadoop fs -chmod -R 777 /user/hive/warehouse  
hadoop fs -chmod -R 777 /user/hive/tmp  
hadoop fs -chmod -R 777 /user/hive/log

4. Download or copy the MySQL JDBC driver jar into Hive's lib directory

cd  /home/hadoop/hive-235/lib/
wget http://central.maven.org/maven2/mysql/mysql-connector-java/5.1.38/mysql-connector-java-5.1.38.jar

5. Initialize the Hive metastore database in MySQL

cd  /home/hadoop/hive-235/bin/
./schematool -initSchema -dbType mysql
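If initialization succeeded, schematool can also report the metastore schema version it finds in MySQL (same bin directory as above):

```shell
cd /home/hadoop/hive-235/bin/
./schematool -dbType mysql -info
```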


6. Quick test: enter the hive shell

hive> create database test;
OK
Time taken: 0.175 seconds
hive> create table test_tab (name string,age int);
OK
Time taken: 0.82 seconds
hive> insert into test_tab values('yyf',23);
hive> select * from test_tab;
OK
yyf 23
Time taken: 0.167 seconds, Fetched: 1 row(s)
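Since hive-site.xml above binds HiveServer2 to node1:10000, the setup can also be exercised over JDBC with beeline (a sketch; the connecting user name is an assumption, and HiveServer2 must be started first):

```shell
# Start HiveServer2 in the background.
$HIVE_HOME/bin/hiveserver2 &

# Connect via the thrift host/port configured in hive-site.xml.
beeline -u jdbc:hive2://node1:10000 -n hadoop -e 'select * from test.test_tab;'
```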

Summary

The above is a brief record of the Hive installation and configuration process. For more detailed configuration options, please refer to the official documentation; corrections to any mistakes above are welcome.
