Setting up Spark

Hadoop server roles

  • hadoop1 Master

  • hadoop3 Worker

  • hadoop4 Worker
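
All three machines must be able to resolve one another's hostnames (typically via /etc/hosts, carried over from the Hadoop setup). A sketch with placeholder IPs — substitute the addresses of your own network:

```
192.168.1.101   hadoop1
192.168.1.103   hadoop3
192.168.1.104   hadoop4
```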

Configure hadoop1, hadoop3, and hadoop4

Configure Scala

Download Scala to the hadoop user's home directory

$ wget https://downloads.lightbend.com/scala/2.12.6/scala-2.12.6.tgz
$ tar -zxvf scala-2.12.6.tgz 
$ mv scala-2.12.6 scala

Set the required environment variables

Add the following environment variables to .bashrc

$ vi ~/.bashrc

Variables

export SCALA_HOME=/home/hadoop/scala
export PATH=$SCALA_HOME/bin:$PATH
export SPARK_HOME=/home/hadoop/spark
export PATH=$SPARK_HOME/bin:$PATH

Apply the environment variables

$ source ~/.bashrc
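
A quick sanity check that the new variables took effect in the current shell (a sketch; `check_var` is a helper defined here, and the paths are the ones assumed above):

```shell
# Report whether each variable from ~/.bashrc is set in this shell.
check_var() {
  eval "val=\$$1"
  if [ -n "$val" ]; then
    echo "OK: $1=$val"
  else
    echo "MISSING: $1 (re-run 'source ~/.bashrc')"
  fi
}
check_var SCALA_HOME
check_var SPARK_HOME
```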

Check the Scala version

$ scala -version
Scala code runner version 2.12.6 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.

Install Spark

Download Spark 2.3.1 to the hadoop user's home directory on the hadoop1 server

Download the source-code release, because we need to build a Spark distribution without Hive

$ cd ~
$ wget http://apache.stu.edu.tw/spark/spark-2.3.1/spark-2.3.1.tgz
$ tar -zxvf spark-2.3.1.tgz
$ ~/spark-2.3.1/dev/make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided,orc-provided"
$ cp ~/spark-2.3.1/spark-2.3.1-bin-hadoop2-without-hive.tgz ~/
$ tar -zxvf spark-2.3.1-bin-hadoop2-without-hive.tgz
$ mv spark-2.3.1-bin-hadoop2-without-hive spark
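
As a rough sanity check (a sketch; `check_layout` is a helper defined here, not part of Spark), verify the unpacked distribution has the expected layout:

```shell
# Report which of the distribution's expected top-level dirs exist.
check_layout() {
  for d in bin sbin jars conf; do
    if [ -d "$1/$d" ]; then echo "found: $d"; else echo "missing: $d"; fi
  done
}
check_layout ~/spark
```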

Configure spark-env.sh

$ cp ~/spark/conf/spark-env.sh.template ~/spark/conf/spark-env.sh
$ vi ~/spark/conf/spark-env.sh

Configuration

export SCALA_HOME=/home/hadoop/scala
export JAVA_HOME=/usr
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/home/hadoop/spark
export SPARK_MASTER_HOST=hadoop1
export SPARK_EXECUTOR_MEMORY=1G
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
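
SPARK_DIST_CLASSPATH is what lets this hadoop-provided build find Hadoop's jars at runtime. To preview what it expands to (assumes the hadoop command from the earlier Hadoop setup is on PATH):

```shell
# Preview the Hadoop classpath Spark will pick up via SPARK_DIST_CLASSPATH.
# Prints a hint instead if the hadoop command is not available.
if command -v hadoop >/dev/null 2>&1; then
  hadoop classpath
else
  echo "hadoop not on PATH -- check HADOOP_HOME/bin"
fi
```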

Configure slaves

$ vi ~/spark/conf/slaves

Configuration

hadoop3
hadoop4

Copy Spark from hadoop1 to the hadoop user's home directory on the other servers

$ scp -r ~/spark hadoop3:/home/hadoop/
$ scp -r ~/spark hadoop4:/home/hadoop/
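
An optional check (a sketch; `check_worker` is a helper defined here, and it assumes passwordless SSH between the nodes, as already configured for Hadoop) that the copy landed on each worker:

```shell
# Confirm spark-submit exists and is executable on each worker.
check_worker() {
  if ssh "$1" 'test -x ~/spark/bin/spark-submit' 2>/dev/null; then
    echo "$1: ok"
  else
    echo "$1: spark not found -- re-run scp"
  fi
}
check_worker hadoop3
check_worker hadoop4
```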

Start Spark

Run the start script on the hadoop1 server

$ ~/spark/sbin/start-all.sh 
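
After start-all.sh, jps (bundled with the JDK) should list a Master process here on hadoop1, and a Worker process on each of hadoop3 and hadoop4:

```shell
# Filter jps output for Spark's standalone daemons.
jps 2>/dev/null | grep -E 'Master|Worker' || echo "no Spark daemons found"
```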

Test

Browse the Spark web UI on hadoop1

Check that the page opens and that 2 workers are listed

http://hadoop1:8080/
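
The same check can be done from the command line (a sketch; assumes curl is installed, and the exact page text may differ between Spark versions):

```shell
# The standalone Master UI reports its alive worker count; expect 2 here.
curl -s --max-time 5 http://hadoop1:8080 | grep 'Alive Workers' \
  || echo "Master UI not reachable"
```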

On the hadoop1 server, test the Pi example in local mode

$ ~/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master local ~/spark/examples/jars/spark-examples_2.11-2.3.1.jar 
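
The result is easy to miss among the log lines; SparkPi prints it to stdout while logs go to stderr, so the same command can be filtered:

```shell
# Same local-mode submit as above, keeping only the result line.
~/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master local ~/spark/examples/jars/spark-examples_2.11-2.3.1.jar \
  2>/dev/null | grep 'Pi is roughly' \
  || echo "no result line -- check the submit output"
```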

On the hadoop1 server, test the Pi example on the standalone Spark cluster

$ ~/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://hadoop1:7077 ~/spark/examples/jars/spark-examples_2.11-2.3.1.jar 

On the hadoop1 server, test the Pi example in YARN cluster mode

$ ~/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster ~/spark/examples/jars/spark-examples_2.11-2.3.1.jar
