Setting Up Hive on Spark
Switch Hive's execution engine from MapReduce to Spark.
Configuring Hive
Copy the required Spark jars into Hive's lib directory
$ cp ~/spark/jars/scala-library-2.11.8.jar ~/hive/lib/
$ cp ~/spark/jars/spark-network-common_2.11-2.3.1.jar ~/hive/lib/
$ cp ~/spark/jars/spark-core_2.11-2.3.1.jar ~/hive/lib/
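The jar names above match Spark 2.3.1 built with Scala 2.11; a different Spark build will have different version suffixes. A hedged alternative, assuming the same ~/spark and ~/hive layout, is to copy by wildcard instead of hard-coding the versions:
$ cp ~/spark/jars/scala-library-*.jar ~/spark/jars/spark-core_*.jar ~/spark/jars/spark-network-common_*.jar ~/hive/lib/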
Configure hive-site.xml
$ vi ~/hive/conf/hive-site.xml
Configuration contents:
<configuration>
  <!-- jdbc -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>shark</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>shark</value>
  </property>
  <property>
    <name>datanucleus.schema.autoCreateAll</name>
    <value>true</value>
  </property>
  <!-- spark engine -->
  <property>
    <name>hive.execution.engine</name>
    <value>spark</value>
  </property>
  <property>
    <name>hive.enable.spark.execution.engine</name>
    <value>true</value>
  </property>
  <!-- sparkcontext -->
  <property>
    <name>spark.master</name>
    <!--
    <value>yarn-cluster</value>
    -->
    <value>spark://hadoop1:7077</value>
  </property>
  <property>
    <name>spark.serializer</name>
    <value>org.apache.spark.serializer.KryoSerializer</value>
  </property>
</configuration>
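The spark.* values above can also be overridden per session from the Hive CLI with the set command instead of editing hive-site.xml; a minimal sketch (the executor memory value is only an example, not part of this setup):
hive> set spark.master=spark://hadoop1:7077;
hive> set spark.executor.memory=2g;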
When configuring spark.master, I found in testing that with yarn-cluster the Spark job would sometimes finish while the YARN application stayed stuck in Pending, so I simply point directly at the Spark standalone master instead. Doing it this way means the hive directory must be copied from hadoop1 to every host running a Spark Worker, because the Spark Workers need to reference Hive's lib.
$ scp -r ~/hive hadoop3:/home/hadoop
$ scp -r ~/hive hadoop4:/home/hadoop
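If more worker hosts are added later, the same copy can be done in a loop (the host list here just mirrors the two commands above):
$ for host in hadoop3 hadoop4; do scp -r ~/hive ${host}:/home/hadoop; done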
Testing Hive on Spark
Start Hive
$ ~/hive/bin/hive
Run a test query
hive> use demo;
hive> select count(*) from phone;
Query ID = hadoop_20180827103004_0fbfe6c7-f3c7-42ab-9161-9c2a06da2102
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Running with YARN Application = application_1534922925601_0006
Kill Command = /home/hadoop/hadoop/bin/yarn application -kill application_1534922925601_0006
Hive on Spark Session Web UI URL: http://hadoop6:34641
Query Hive on Spark job[0] stages: [0, 1]
Spark job[0] status = RUNNING
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED
--------------------------------------------------------------------------------------
Stage-0 ........         0      FINISHED      1          1        0        0       0
Stage-1 ........         0      FINISHED      1          1        0        0       0
--------------------------------------------------------------------------------------
STAGES: 02/02    [==========================>>] 100%  ELAPSED TIME: 10.19 s
--------------------------------------------------------------------------------------
Spark job[0] finished successfully in 10.19 second(s)
OK
7
Time taken: 51.209 seconds, Fetched: 1 row(s)
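To double-check which engine a session is actually using, or to fall back to MapReduce for a quick comparison, the engine can be printed and overridden per session (the override only lasts for that session):
hive> set hive.execution.engine;
hive> set hive.execution.engine=mr;
hive> select count(*) from phone;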