Setting Up Spark
Hadoop server roles
hadoop1 Master
hadoop3 Worker
hadoop4 Worker
Configure hadoop1, hadoop3, and hadoop4
Configure Scala
Download Scala into the hadoop user's home directory
$ wget https://downloads.lightbend.com/scala/2.12.6/scala-2.12.6.tgz
$ tar -zxvf scala-2.12.6.tgz
$ mv scala-2.12.6 scala
Set the required environment variables
Add the following environment variables to .bashrc
$ vi ~/.bashrc
Variables
export SCALA_HOME=/home/hadoop/scala
export PATH=$SCALA_HOME/bin:$PATH
export SPARK_HOME=/home/hadoop/spark
export PATH=$SPARK_HOME/bin:$PATH
Apply the environment variables
$ source ~/.bashrc
Check the Scala version
$ scala -version
Scala code runner version 2.12.6 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.
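A quick sanity check of the variables added to .bashrc can be sketched as follows. The helper name check_home is hypothetical and is not part of Scala or Spark. It only reports whether each variable is set and points at an existing directory.

```shell
# Report whether the variable named in $1 is set and points at a directory.
# check_home is a hypothetical helper name used only for this sketch.
check_home() {
  name="$1"
  # Look up the value of the variable whose name is stored in $name.
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    echo "$name is not set"
    return 1
  elif [ ! -d "$value" ]; then
    echo "$name=$value is not a directory"
    return 1
  fi
  echo "$name=$value OK"
}

# Check both variables; keep going even if one of them fails.
for var in SCALA_HOME SPARK_HOME; do
  check_home "$var" || true
done
```

At this point SPARK_HOME is expected to fail the directory check, because /home/hadoop/spark is only created in the Spark installation step.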
Install Spark
Download Spark 2.3.1 into the hadoop user's home directory on the hadoop1 server
Download the source code release, because Spark has to be packaged as a build without Hive
$ cd ~
$ wget http://apache.stu.edu.tw/spark/spark-2.3.1/spark-2.3.1.tgz
$ tar -zxvf spark-2.3.1.tgz
$ ~/spark-2.3.1/dev/make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided,orc-provided"
$ cp ~/spark-2.3.1/spark-2.3.1-bin-hadoop2-without-hive.tgz ~/
$ tar -zxvf spark-2.3.1-bin-hadoop2-without-hive.tgz
$ mv spark-2.3.1-bin-hadoop2-without-hive spark
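To confirm that the packaged distribution really excludes Hive, one option is to count the jar files whose names contain "hive". This is a minimal sketch: count_hive_jars is a hypothetical helper, and it assumes that a Hive-enabled build would place jars with "hive" in their file names under the jars directory.

```shell
# Count jar files whose name contains "hive" (case-insensitive) in a directory.
# count_hive_jars is a hypothetical helper name used only for this sketch.
count_hive_jars() {
  jars_dir="$1"
  if [ ! -d "$jars_dir" ]; then
    echo "no such directory: $jars_dir" >&2
    return 1
  fi
  # grep -c prints the number of matching lines; it exits non-zero when the
  # count is 0, so "|| true" keeps the function's exit status at 0.
  ls "$jars_dir" | grep -i -c 'hive.*\.jar$' || true
}

# Assumes the distribution was moved to ~/spark as in the steps above.
count_hive_jars "$HOME/spark/jars" || true
```

If the build behaved as intended, the count should be 0.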
Configure spark-env.sh
$ cp ~/spark/conf/spark-env.sh.template ~/spark/conf/spark-env.sh
$ vi ~/spark/conf/spark-env.sh
Settings
export SCALA_HOME=/home/hadoop/scala
export JAVA_HOME=/usr
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/home/hadoop/spark
export SPARK_MASTER_HOST=hadoop1
export SPARK_EXECUTOR_MEMORY=1G
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
Configure slaves
$ vi ~/spark/conf/slaves
Contents
hadoop3
hadoop4
Copy Spark from the hadoop1 server to the hadoop user's home directory on the other servers
$ scp -r ~/spark hadoop3:/home/hadoop/
$ scp -r ~/spark hadoop4:/home/hadoop/
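With more workers, the copy step could be driven by the slaves file instead of one scp line per host. The sketch below is a dry run that only prints the commands. list_workers and copy_to_workers are hypothetical helper names, and the destination /home/hadoop/ is taken from the commands above.

```shell
# Print the hostnames listed in a slaves file, skipping blank lines
# and '#' comments. list_workers is a hypothetical helper name.
list_workers() {
  sed -e 's/#.*$//' \
      -e 's/^[[:space:]]*//' \
      -e 's/[[:space:]]*$//' \
      -e '/^$/d' "$1"
}

# Dry run: print one scp command per worker instead of executing it.
# Remove the leading "echo" to perform the copy for real.
copy_to_workers() {
  slaves_file="$1"
  src_dir="$2"
  for host in $(list_workers "$slaves_file"); do
    echo scp -r "$src_dir" "$host:/home/hadoop/"
  done
}

# Only run when the slaves file from the previous step exists.
if [ -f "$HOME/spark/conf/slaves" ]; then
  copy_to_workers "$HOME/spark/conf/slaves" "$HOME/spark"
fi
```

Keeping the loop as a dry run by default makes it easy to review the target hosts before anything is copied.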
Start Spark
Run this command on the hadoop1 server
$ ~/spark/sbin/start-all.sh
Testing
Open the Spark web UI on hadoop1
Check that the page loads and that it lists 2 workers
http://hadoop1:8080/
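The worker count can also be checked from the command line. This is a rough sketch that assumes the standalone master serves a JSON status document at http://hadoop1:8080/json/ in which each live worker entry carries "state" : "ALIVE". Verify both assumptions against your Spark version. count_alive_workers is a hypothetical helper that reads that JSON from standard input.

```shell
# Count occurrences of "state": "ALIVE" in JSON text read from stdin.
# count_alive_workers is a hypothetical helper; the JSON field name is
# an assumption about the master's status output.
count_alive_workers() {
  grep -o '"state"[[:space:]]*:[[:space:]]*"ALIVE"' | wc -l | tr -d ' '
}

# Example usage on the cluster (assumes curl is installed):
#   curl -s http://hadoop1:8080/json/ | count_alive_workers
```

For the cluster described here, the expected result is 2.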
On the hadoop1 server, test the Pi job in local (single-machine) mode
$ ~/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master local ~/spark/examples/jars/spark-examples_2.11-2.3.1.jar
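For context, SparkPi estimates pi by sampling random points in a square and counting how many fall inside the unit circle. The same idea can be sketched on a single machine with awk. estimate_pi is a hypothetical helper, and the fixed seed and sample count are arbitrary choices made for this illustration.

```shell
# Monte Carlo estimate of pi: the fraction of random points in the square
# [-1, 1] x [-1, 1] that land inside the unit circle approaches pi / 4.
# estimate_pi is a hypothetical helper name used only for this sketch.
estimate_pi() {
  samples="${1:-100000}"
  awk -v n="$samples" 'BEGIN {
    srand(42)                      # fixed seed, chosen arbitrarily
    inside = 0
    for (i = 0; i < n; i++) {
      x = rand() * 2 - 1
      y = rand() * 2 - 1
      if (x * x + y * y <= 1) inside++
    }
    printf "Pi is roughly %f\n", 4 * inside / n
  }'
}

estimate_pi 100000
```

Spark distributes the sampling loop across tasks, which is why the job makes a convenient smoke test for each master setting.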
On the hadoop1 server, test the Pi job in Spark standalone cluster mode
$ ~/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://hadoop1:7077 ~/spark/examples/jars/spark-examples_2.11-2.3.1.jar
On the hadoop1 server, test the Pi job on YARN in client mode
$ ~/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client ~/spark/examples/jars/spark-examples_2.11-2.3.1.jar
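Each of these runs produces a lot of log output, so it can help to filter for the result line. SparkPi prints a line beginning with "Pi is roughly". The sketch below pulls that line out of whatever is piped in. extract_pi is a hypothetical helper name.

```shell
# Print the first "Pi is roughly <number>" line found on stdin.
# extract_pi is a hypothetical helper name used only for this sketch.
extract_pi() {
  grep -o 'Pi is roughly [0-9.]*' | head -n 1
}

# Example usage on the cluster (merges stderr so log lines are filtered too):
#   ~/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi \
#     --master local ~/spark/examples/jars/spark-examples_2.11-2.3.1.jar 2>&1 | extract_pi
```

This works when the driver runs on the submitting machine, so its output reaches the terminal. If a job is submitted with the cluster deploy mode, the driver output ends up in the YARN container logs instead.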