Setting Up Hadoop HA

We skip pseudo-distributed mode and set up HA directly; for the underlying concepts, see YouTube.

Hadoop Server Roles

We set up seven servers, named hadoop1 through hadoop7, with the following roles:

  • hadoop1 NameNode, ZKFC

  • hadoop2 NameNode, ZKFC

  • hadoop3 ResourceManager

  • hadoop4 ResourceManager

  • hadoop5 ZooKeeper, JournalNode, DataNode, NodeManager

  • hadoop6 ZooKeeper, JournalNode, DataNode, NodeManager

  • hadoop7 ZooKeeper, JournalNode, DataNode, NodeManager

Configuring Servers hadoop1 - 7

Setting Up Hosts

Every server needs this configuration. The IPs in the file below are the actual IPs of my environment; substitute your own.

$ vi /etc/hosts

Add the following entries:

12.345.6.145 hadoop1
12.345.6.144 hadoop2
12.345.6.143 hadoop3
12.345.6.142 hadoop4
12.345.6.141 hadoop5
12.345.6.140 hadoop6
12.345.6.139 hadoop7
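Since the last octets simply count down from 145 to 139 in this example, the seven entries can be generated with a short loop (the IPs are placeholders from this environment; substitute your own):

```shell
# Emit one /etc/hosts entry per node. The 12.345.6.x addresses mirror the
# example above and are assumptions; replace them with your real IPs.
octet=145
for i in 1 2 3 4 5 6 7; do
  echo "12.345.6.$octet hadoop$i"
  octet=$((octet - 1))
done
```

Append the output to /etc/hosts as root, for example by piping it through `sudo tee -a /etc/hosts`.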

Disabling the Firewall and SELinux

It is recommended to disable these first; you can re-enable them later.

$ setenforce 0
$ systemctl stop firewalld
$ systemctl disable firewalld

Creating the hadoop Account

Create a hadoop account on every server, then log in as the hadoop user:

$ adduser hadoop
$ passwd hadoop
$ su - hadoop

Setting Up Passwordless Login

Every server must be able to log in to every other server (including itself) without a password.

Generate a public key:

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

Passwordless login must be set up for all of hadoop1 - 7. The command below only sets up passwordless login from one server to hadoop1:

$ ssh-copy-id hadoop1
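Since each server needs the same ssh-copy-id run against all seven names, a small loop can generate the full set of commands (pipe the output to `sh` to execute them; each first connection prompts for the hadoop password):

```shell
# Emit one ssh-copy-id command per node, including this server itself.
for host in hadoop1 hadoop2 hadoop3 hadoop4 hadoop5 hadoop6 hadoop7; do
  echo "ssh-copy-id $host"
done
```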

Installing Java

$ yum install java-1.8.0-openjdk java-1.8.0-openjdk-devel

Setting Required Environment Variables

$ vi ~/.bashrc

Add the following environment variables to .bashrc:

export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export JAVA_HOME=/usr
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"

Make the environment variables take effect:

$ source ~/.bashrc

Installing Hadoop

To support HBase 2.1.0 in the future, install Hadoop 2.7.7.

Download Hadoop 2.7.7 into the hadoop user's home directory on the hadoop1 server:

$ cd ~
$ wget http://apache.stu.edu.tw/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
$ tar -zxvf hadoop-2.7.7.tar.gz
$ mv hadoop-2.7.7 hadoop

Configuring core-site.xml

$ vi ~/hadoop/etc/hadoop/core-site.xml

Configuration:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1/</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop/tmp</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop5:2181,hadoop6:2181,hadoop7:2181</value>
  </property>
</configuration>

Configuring hdfs-site.xml

$ vi ~/hadoop/etc/hadoop/hdfs-site.xml

Configuration:

<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>hadoop1:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>hadoop1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>hadoop2:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>hadoop2:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop5:8485;hadoop6:8485;hadoop7:8485/ns1</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/hadoop/journaldata</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>

Configuring yarn-site.xml

$ vi ~/hadoop/etc/hadoop/yarn-site.xml

Configuration:

<configuration>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yrc</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop3</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop4</value>
  </property>
  <property> 
    <name>yarn.resourcemanager.webapp.address.rm1</name>  
    <value>hadoop3:8088</value> 
  </property>  
  <property> 
    <name>yarn.resourcemanager.webapp.address.rm2</name>  
    <value>hadoop4:8088</value> 
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop5:2181,hadoop6:2181,hadoop7:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property> 
    <name>yarn.nodemanager.vmem-check-enabled</name> 
    <value>false</value> 
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
</configuration>

Configuring mapred-site.xml

In Hadoop 2.7.7 this file does not exist by default, so create it from the bundled template first:

$ cp ~/hadoop/etc/hadoop/mapred-site.xml.template ~/hadoop/etc/hadoop/mapred-site.xml
$ vi ~/hadoop/etc/hadoop/mapred-site.xml

Configuration:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>/home/hadoop/hadoop/etc/hadoop,/home/hadoop/hadoop/share/hadoop/common/lib/*,/home/hadoop/hadoop/share/hadoop/common/*,/home/hadoop/hadoop/share/hadoop/hdfs,/home/hadoop/hadoop/share/hadoop/hdfs/lib/*,/home/hadoop/hadoop/share/hadoop/hdfs/*,/home/hadoop/hadoop/share/hadoop/mapreduce/*,/home/hadoop/hadoop/share/hadoop/yarn,/home/hadoop/hadoop/share/hadoop/yarn/lib/*,/home/hadoop/hadoop/share/hadoop/yarn/*</value>
  </property>
</configuration>

Configuring slaves

As an aside, in Hadoop 3.x the slaves file was renamed to workers.

$ vi ~/hadoop/etc/hadoop/slaves

Contents:

hadoop5
hadoop6
hadoop7

Copy the Hadoop directory from the hadoop1 server to the hadoop user's home directory on the other servers:

$ scp -r ~/hadoop hadoop2:/home/hadoop/
$ scp -r ~/hadoop hadoop3:/home/hadoop/
$ scp -r ~/hadoop hadoop4:/home/hadoop/
$ scp -r ~/hadoop hadoop5:/home/hadoop/
$ scp -r ~/hadoop hadoop6:/home/hadoop/
$ scp -r ~/hadoop hadoop7:/home/hadoop/
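The six scp commands above can equivalently be generated with a loop (pipe the output to `sh` to execute them):

```shell
# Emit the copy command for every node except hadoop1 itself.
for host in hadoop2 hadoop3 hadoop4 hadoop5 hadoop6 hadoop7; do
  echo "scp -r ~/hadoop $host:/home/hadoop/"
done
```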

Installing Zookeeper

Download Zookeeper 3.4.12 into the hadoop user's home directory on the hadoop5 server:

$ wget http://apache.stu.edu.tw/zookeeper/stable/zookeeper-3.4.12.tar.gz
$ tar -xvzf zookeeper-3.4.12.tar.gz
$ mv zookeeper-3.4.12 zookeeper

Configuring zoo.cfg

$ vi ~/zookeeper/conf/zoo.cfg

Configuration:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoop/zookeeper/data
clientPort=2181
server.1=hadoop5:2888:3888
server.2=hadoop6:2888:3888
server.3=hadoop7:2888:3888

Copy Zookeeper from the hadoop5 server to the hadoop user's home directory on the other servers:

$ scp -r ~/zookeeper hadoop6:/home/hadoop/
$ scp -r ~/zookeeper hadoop7:/home/hadoop/

Configuring myid

The data directory set in zoo.cfg does not exist yet, so create it before writing the id.

On the hadoop5 server:

$ mkdir -p ~/zookeeper/data
$ echo 1 > ~/zookeeper/data/myid

On the hadoop6 server:

$ mkdir -p ~/zookeeper/data
$ echo 2 > ~/zookeeper/data/myid

On the hadoop7 server:

$ mkdir -p ~/zookeeper/data
$ echo 3 > ~/zookeeper/data/myid
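If passwordless SSH from hadoop5 already works, the three assignments can be generated in one pass; the id on each node must match its server.N line in zoo.cfg. A sketch that emits the commands (pipe the output to `sh` to run them):

```shell
# Emit one remote command per node, assigning myid 1..3 to hadoop5..hadoop7
# to match server.1..server.3 in zoo.cfg.
id=1
for host in hadoop5 hadoop6 hadoop7; do
  echo "ssh $host 'mkdir -p ~/zookeeper/data && echo $id > ~/zookeeper/data/myid'"
  id=$((id + 1))
done
```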

Startup Order

Start ZooKeeper on hadoop5 - 7:

$ ~/zookeeper/bin/zkServer.sh start

Start the JournalNode on hadoop5 - 7:

$ ~/hadoop/sbin/hadoop-daemon.sh start journalnode

Formatting the HDFS NameNode

Format on the hadoop1 server, then copy the result (the tmp directory) to the hadoop2 server:

$ ~/hadoop/bin/hdfs namenode -format
$ scp -r ~/hadoop/tmp hadoop2:/home/hadoop/hadoop

Formatting ZKFC

$ ~/hadoop/bin/hdfs zkfc -formatZK

Starting HDFS on the hadoop1 Server

Starting from hadoop1 also starts the NameNode on hadoop2:

$ ~/hadoop/sbin/start-dfs.sh

Starting YARN on the hadoop3 Server

$ ~/hadoop/sbin/start-yarn.sh

Starting the Second ResourceManager on the hadoop4 Server

$ ~/hadoop/sbin/yarn-daemon.sh start resourcemanager
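With everything started, running `jps` on each node should show the daemons matching the role table at the top. The expected process names, as jps prints them (QuorumPeerMain is ZooKeeper; DFSZKFailoverController is ZKFC):

```shell
# Reference list of expected jps output per node; run `ssh <host> jps`
# on each server and compare against this.
cat <<'EOF'
hadoop1: NameNode DFSZKFailoverController
hadoop2: NameNode DFSZKFailoverController
hadoop3: ResourceManager
hadoop4: ResourceManager
hadoop5: QuorumPeerMain JournalNode DataNode NodeManager
hadoop6: QuorumPeerMain JournalNode DataNode NodeManager
hadoop7: QuorumPeerMain JournalNode DataNode NodeManager
EOF
```

You can also confirm the HA state directly: `hdfs haadmin -getServiceState nn1` (and nn2) should report one active and one standby NameNode, and `yarn rmadmin -getServiceState rm1` (and rm2) the same for the ResourceManagers.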

Testing

Run the MapReduce example on the hadoop3 server.

If it completes successfully, the deployment worked:

$ hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar pi 5 5 

GitHub Example

Please download my example from GitHub for testing. Before testing, add hosts entries for hadoop1 - 7 on your local machine.
