Setting Up Hadoop HA
This guide skips the pseudo-distributed setup and builds an HA cluster directly; for the underlying concepts, see YouTube.
Hadoop Server Role Assignments
Seven servers are set up, named hadoop1 - 7. Each server's roles are as follows (hadoop1 - 4 host the HA NameNode and ResourceManager pairs configured later in this guide):
hadoop1
NameNode, ZKFC (DFSZKFailoverController)
hadoop2
NameNode, ZKFC (DFSZKFailoverController)
hadoop3
ResourceManager
hadoop4
ResourceManager
hadoop5
Zookeeper, Journal Node, Data Node, Node Manager
hadoop6
Zookeeper, Journal Node, Data Node, Node Manager
hadoop7
Zookeeper, Journal Node, Data Node, Node Manager
Configuring the hadoop1 - 7 Servers
Configure Hosts
Every server must be configured; the IPs in the entries below are the actual IPs of my current environment.
Add the following entries:
12.345.6.145 hadoop1
12.345.6.144 hadoop2
12.345.6.143 hadoop3
12.345.6.142 hadoop4
12.345.6.141 hadoop5
12.345.6.140 hadoop6
12.345.6.139 hadoop7
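A quick way to sanity-check the entries is getent, which resolves names through /etc/hosts:
$ getent hosts hadoop1 hadoop7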
Disable the Firewall and SELinux
It is recommended to disable them for now; you can re-enable them later.
$ setenforce 0
$ systemctl stop firewalld
$ systemctl disable firewalld
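Note that setenforce 0 only lasts until the next reboot. To keep SELinux permissive across reboots, you can also update its config file, for example:
$ sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config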
Create the hadoop Account
Create a hadoop account on every server, then log in as that user.
$ adduser hadoop
$ passwd hadoop
$ su - hadoop
Set Up Passwordless SSH
Every server must be able to log in to every other server (including itself) without a password.
Generate a key pair:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Copy the public key to every server. This must be done for all of hadoop1 - 7; the command below only sets up passwordless login from one server to hadoop1:
$ ssh-copy-id hadoop1
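Since every server needs to reach every other server, a loop saves some typing; run this on each server (it will prompt for the hadoop password of each target):
$ for h in hadoop1 hadoop2 hadoop3 hadoop4 hadoop5 hadoop6 hadoop7; do ssh-copy-id $h; done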
Install Java
Run this on every server as root (or with sudo):
$ yum install java-1.8.0-openjdk java-1.8.0-openjdk-devel
Set the Required Environment Variables
Add the following environment variable settings to .bashrc:
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export JAVA_HOME=/usr
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
Apply the environment variables
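Assuming bash is the login shell, reload .bashrc so the variables take effect in the current session:
$ source ~/.bashrc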
Install Hadoop
If you want to support HBase 2.1.0 in the future, install Hadoop 2.7.7.
Download Hadoop 2.7.7 to the hadoop user's home directory on the hadoop1 server:
$ cd ~
$ wget http://apache.stu.edu.tw/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
$ tar -zxvf hadoop-2.7.7.tar.gz
$ mv hadoop-2.7.7 hadoop
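To confirm the unpacked distribution works, you can print its version (the full path is used here in case .bashrc has not been reloaded yet):
$ ~/hadoop/bin/hadoop version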
Configure core-site.xml
$ vi ~/hadoop/etc/hadoop/core-site.xml
Configuration:
<configuration>
  <!-- Default filesystem URI; ns1 is the logical nameservice defined in hdfs-site.xml -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1/</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop/tmp</value>
  </property>
  <!-- ZooKeeper ensemble used by ZKFC for automatic NameNode failover -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop5:2181,hadoop6:2181,hadoop7:2181</value>
  </property>
</configuration>
Configure hdfs-site.xml
$ vi ~/hadoop/etc/hadoop/hdfs-site.xml
Configuration:
<configuration>
  <!-- Logical name of this HA nameservice -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <!-- The two NameNodes that make up ns1: nn1 on hadoop1, nn2 on hadoop2 -->
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>hadoop1:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>hadoop1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>hadoop2:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>hadoop2:50070</value>
  </property>
  <!-- JournalNodes that store the shared edit log -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop5:8485;hadoop6:8485;hadoop7:8485/ns1</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/hadoop/journaldata</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Try sshfence first; fall back to /bin/true so failover can still proceed
       when the old active NameNode is unreachable over SSH -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
Configure yarn-site.xml
$ vi ~/hadoop/etc/hadoop/yarn-site.xml
Configuration:
<configuration>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yrc</value>
  </property>
  <!-- The two ResourceManagers: rm1 on hadoop3, rm2 on hadoop4 -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop3</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop4</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>hadoop3:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>hadoop4:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop5:2181,hadoop6:2181,hadoop7:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Persist ResourceManager state in ZooKeeper so it survives failover -->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
</configuration>
Configure mapred-site.xml
$ vi ~/hadoop/etc/hadoop/mapred-site.xml
Configuration:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>/home/hadoop/hadoop/etc/hadoop,/home/hadoop/hadoop/share/hadoop/common/lib/*,/home/hadoop/hadoop/share/hadoop/common/*,/home/hadoop/hadoop/share/hadoop/hdfs,/home/hadoop/hadoop/share/hadoop/hdfs/lib/*,/home/hadoop/hadoop/share/hadoop/hdfs/*,/home/hadoop/hadoop/share/hadoop/mapreduce/*,/home/hadoop/hadoop/share/hadoop/yarn,/home/hadoop/hadoop/share/hadoop/yarn/lib/*,/home/hadoop/hadoop/share/hadoop/yarn/*</value>
  </property>
</configuration>
Configure slaves
As an aside, in Hadoop 3.x the slaves file was renamed to workers.
$ vi ~/hadoop/etc/hadoop/slaves
Configuration:
hadoop5
hadoop6
hadoop7
Copy Hadoop from the hadoop1 server to the hadoop user's home directory on the other servers:
$ scp -r ~/hadoop hadoop2:/home/hadoop/
$ scp -r ~/hadoop hadoop3:/home/hadoop/
$ scp -r ~/hadoop hadoop4:/home/hadoop/
$ scp -r ~/hadoop hadoop5:/home/hadoop/
$ scp -r ~/hadoop hadoop6:/home/hadoop/
$ scp -r ~/hadoop hadoop7:/home/hadoop/
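The same six copies can also be done with a loop:
$ for h in hadoop2 hadoop3 hadoop4 hadoop5 hadoop6 hadoop7; do scp -r ~/hadoop $h:/home/hadoop/; done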
Install Zookeeper
Download Zookeeper 3.4.12 to the hadoop user's home directory on the hadoop5 server:
$ wget http://apache.stu.edu.tw/zookeeper/stable/zookeeper-3.4.12.tar.gz
$ tar -xvzf zookeeper-3.4.12.tar.gz
$ mv zookeeper-3.4.12 zookeeper
Configure zoo.cfg
$ vi ~/zookeeper/conf/zoo.cfg
Configuration:
# Basic timing settings; tickTime is in milliseconds
tickTime=2000
initLimit=10
syncLimit=5
# dataDir holds the myid file created below
dataDir=/home/hadoop/zookeeper/data
clientPort=2181
# server.N=host:peerPort:leaderElectionPort; N must match each server's myid
server.1=hadoop5:2888:3888
server.2=hadoop6:2888:3888
server.3=hadoop7:2888:3888
Copy Zookeeper from the hadoop5 server to the hadoop user's home directory on the other servers:
$ scp -r ~/zookeeper hadoop6:/home/hadoop/
$ scp -r ~/zookeeper hadoop7:/home/hadoop/
Configure myid
The data directory is not included in the tarball, so create it before writing myid.
Configure myid on the hadoop5 server:
$ mkdir -p ~/zookeeper/data && echo 1 > ~/zookeeper/data/myid
Configure myid on the hadoop6 server:
$ mkdir -p ~/zookeeper/data && echo 2 > ~/zookeeper/data/myid
Configure myid on the hadoop7 server:
$ mkdir -p ~/zookeeper/data && echo 3 > ~/zookeeper/data/myid
Startup Sequence
Start Zookeeper on hadoop5 - 7:
$ ~/zookeeper/bin/zkServer.sh start
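Once all three are up, verify that the ensemble has elected a leader; one server should report Mode: leader and the other two Mode: follower:
$ ~/zookeeper/bin/zkServer.sh status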
Start the Journal Nodes on hadoop5 - 7:
$ ~/hadoop/sbin/hadoop-daemon.sh start journalnode
Format the HDFS NameNode
Format on the hadoop1 server, then copy the result to the hadoop2 server:
$ ~/hadoop/bin/hdfs namenode -format
$ scp -r ~/hadoop/tmp hadoop2:/home/hadoop/hadoop/
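A more standard alternative to copying tmp by hand is to let the standby fetch the metadata itself; with the freshly formatted NameNode on hadoop1 already running, run this on hadoop2:
$ ~/hadoop/bin/hdfs namenode -bootstrapStandby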
Format ZKFC on the hadoop1 server
$ ~/hadoop/bin/hdfs zkfc -formatZK
Start HDFS on the hadoop1 Server
Starting it from hadoop1 also brings up the NameNode on hadoop2:
$ ~/hadoop/sbin/start-dfs.sh
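To confirm automatic failover is wired up, check the state of both NameNodes; one should report active and the other standby:
$ ~/hadoop/bin/hdfs haadmin -getServiceState nn1
$ ~/hadoop/bin/hdfs haadmin -getServiceState nn2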
Start YARN on the hadoop3 Server
$ ~/hadoop/sbin/start-yarn.sh
Start the second ResourceManager on the hadoop4 Server
$ ~/hadoop/sbin/yarn-daemon.sh start resourcemanager
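As with HDFS, you can check which ResourceManager is active, and run jps on each server to confirm the daemons listed in the role table are up:
$ ~/hadoop/bin/yarn rmadmin -getServiceState rm1
$ ~/hadoop/bin/yarn rmadmin -getServiceState rm2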
Testing
Run the MapReduce example on the hadoop3 server.
If it runs successfully, the deployment succeeded:
$ hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar pi 5 5
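The web UIs are another quick check: the NameNode UIs configured above are at hadoop1:50070 and hadoop2:50070 (one should show active, the other standby), and the ResourceManager UI is at hadoop3:8088.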
GitHub Example