Setting Up Hive on Spark

Switch Hive's execution engine from MapReduce to Spark.

Configure Hive

Copy the required Spark JARs into Hive's lib directory

$ cp ~/spark/jars/scala-library-2.11.8.jar ~/hive/lib/
$ cp ~/spark/jars/spark-network-common_2.11-2.3.1.jar ~/hive/lib/
$ cp ~/spark/jars/spark-core_2.11-2.3.1.jar ~/hive/lib/
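
To double-check that the copies landed, you can list the matching JARs in Hive's lib directory (a quick sanity check, not part of the original steps):

$ ls ~/hive/lib | grep -E 'scala|spark'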

Configure hive-site.xml

$ vi ~/hive/conf/hive-site.xml 

Configuration contents:

<configuration>
  <!--jdbc-->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>shark</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>shark</value>
  </property>
  <property>
    <name>datanucleus.schema.autoCreateAll</name>
    <value>true</value>
  </property>
  <!--spark engine -->
  <property>
    <name>hive.execution.engine</name>
    <value>spark</value>
  </property>
  <property>
    <name>hive.enable.spark.execution.engine</name>
    <value>true</value>
  </property>
  <!--sparkcontext -->
  <property>
    <name>spark.master</name>
    <!--
    <value>yarn-cluster</value>
    -->
    <value>spark://hadoop1:7077</value>
  </property>
  <property>
    <name>spark.serializer</name>
    <value>org.apache.spark.serializer.KryoSerializer</value>
  </property>
</configuration>

When configuring spark.master, I found in testing that with yarn-cluster the Spark job would sometimes finish while YARN kept the application stuck in a pending state, so I point spark.master directly at the standalone Spark master instead. With this approach, however, the hive directory must be copied from hadoop1 to every host that runs a Spark Worker, because the workers need to reference Hive's lib directory; see the sketch below.
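
For example, assuming the Spark Workers run on hosts named hadoop2 and hadoop3 (hypothetical hostnames; substitute your own), the copy could be done with scp:

$ scp -r ~/hive hadoop2:~/
$ scp -r ~/hive hadoop3:~/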

Test Hive on Spark

Start Hive
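
Assuming the layout used throughout this guide (Hive installed under ~/hive), the CLI can be started with:

$ ~/hive/bin/hive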

Test
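
A minimal smoke test is any query that forces a job, since pure metadata operations never touch the execution engine. The sketch below assumes a throwaway table named t1 (hypothetical); if the engine switch worked, the query output should show Spark stages being launched instead of MapReduce jobs:

hive> create table t1 (id int);
hive> insert into table t1 values (1), (2), (3);
hive> select count(*) from t1;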
