# Setting Up Hadoop HA

## Hadoop Server Roles

Set up seven servers named hadoop1 through hadoop7. Each server's roles are as follows:

* **hadoop1**\
  Name Node, zkfc
* **hadoop2**\
  Name Node, zkfc
* **hadoop3**\
  Resource Manager
* **hadoop4**\
  Resource Manager
* **hadoop5**\
  Zookeeper, Journal Node, Data Node, Node Manager
* **hadoop6**\
  Zookeeper, Journal Node, Data Node, Node Manager
* **hadoop7**\
  Zookeeper, Journal Node, Data Node, Node Manager

## Configure Servers hadoop1 - 7

### Set Up Hosts

Do this on every server. The IPs below are from my environment; replace them with your own servers' addresses.

```
$ vi /etc/hosts
```

Add the following entries:

```
12.345.6.145 hadoop1
12.345.6.144 hadoop2
12.345.6.143 hadoop3
12.345.6.142 hadoop4
12.345.6.141 hadoop5
12.345.6.140 hadoop6
12.345.6.139 hadoop7
```
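Editing /etc/hosts by hand on seven machines is easy to get wrong. Below is a minimal sketch, assuming a bash-compatible shell and the example IPs from this guide, that appends only the entries whose hostname is not already in the file, so it can be re-run safely. The function name `add_cluster_hosts` is hypothetical; the target file is a parameter so you can try it on a copy first.

```shell
# Hypothetical helper: append the cluster entries to a hosts file,
# skipping any hostname that already appears in it.
# Run as root against /etc/hosts; replace the example IPs with your own.
add_cluster_hosts() {
  local hosts_file="${1:-/etc/hosts}"   # default target: the system hosts file
  local ip name
  while read -r ip name; do
    # -w matches whole words only, so "hadoop1" does not match "hadoop10"
    if ! grep -qw -- "$name" "$hosts_file"; then
      printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
  done <<'EOF'
12.345.6.145 hadoop1
12.345.6.144 hadoop2
12.345.6.143 hadoop3
12.345.6.142 hadoop4
12.345.6.141 hadoop5
12.345.6.140 hadoop6
12.345.6.139 hadoop7
EOF
}

# Usage (as root): add_cluster_hosts
# Then check resolution with: getent hosts hadoop5
```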

### Disable the Firewall and SELinux

I recommend turning both off for now; you can turn them back on later. These commands need root.

```
$ setenforce 0
$ systemctl stop firewalld
$ systemctl disable firewalld
```
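`setenforce 0` only switches SELinux to permissive mode until the next reboot. To keep it off after a restart, the `SELINUX=` line in `/etc/selinux/config` has to change as well. A minimal sketch, assuming a CentOS/RHEL-style config file and a sed that supports `-i` (GNU sed); `disable_selinux_persistently` is a hypothetical name.

```shell
# Hypothetical helper: make the SELinux change survive a reboot.
# The new value takes effect at the next boot; setenforce 0 covers the current one.
disable_selinux_persistently() {
  local config="${1:-/etc/selinux/config}"   # pass a copy to try it safely
  # Rewrite the SELINUX=... line (enforcing or permissive) to disabled.
  # SELINUXTYPE=... is untouched because the pattern needs "=" right after "SELINUX".
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$config"
}

# Usage (as root): disable_selinux_persistently
```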

### Add a hadoop Account

As root, create the hadoop account on every server, then switch to the hadoop user:

```
$ adduser hadoop
$ passwd hadoop
$ su - hadoop
```

### Set Up Passwordless SSH Login

Every server must be able to log in to every server, itself included, without a password.

Generate a key pair:

```
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
```

Each server needs passwordless login to all of hadoop1 - 7. The command below sets it up from the current server to hadoop1; repeat it for hadoop2 - 7:

```
$ ssh-copy-id hadoop1
```
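Because every server needs its key on every node, a loop saves typing. A sketch, assuming the hostnames from this guide; run it as the hadoop user on each of hadoop1 - 7 (you type the hadoop password once per target). The function names are hypothetical, and the optional argument exists only so the loop can be dry-run with `echo`.

```shell
# Hypothetical helpers for the passwordless-SSH step.
NODES="hadoop1 hadoop2 hadoop3 hadoop4 hadoop5 hadoop6 hadoop7"

copy_key_to_all() {
  local cmd="${1:-ssh-copy-id}"   # pass "echo" for a dry run
  local node
  for node in $NODES; do          # unquoted on purpose: split into hostnames
    $cmd "$node" || echo "failed: $node" >&2
  done
}

verify_passwordless() {
  local node
  for node in $NODES; do
    # BatchMode=yes makes ssh fail instead of prompting for a password
    if ! ssh -o BatchMode=yes "$node" true 2>/dev/null; then
      echo "cannot log in without a password: $node"
    fi
  done
}

# Usage: copy_key_to_all && verify_passwordless   # no output from verify means OK
```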

### Install Java

yum install needs root privileges, so run this as root rather than as the hadoop user.

```
$ yum install java-1.8.0-openjdk java-1.8.0-openjdk-devel
```
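The .bashrc in the next step sets `JAVA_HOME=/usr`, which works because the yum packages provide a `java` launcher at `/usr/bin/java`. If you prefer `JAVA_HOME` to point at the JDK directory itself, you can resolve it from the `javac` symlink chain. A sketch; `java_home_from` is a hypothetical name and assumes `readlink -f` (GNU coreutils).

```shell
# Hypothetical helper: turn the path of javac into a JDK home directory.
java_home_from() {
  local javac_path
  # Follow every symlink, e.g. /usr/bin/javac -> /etc/alternatives/javac
  # -> /usr/lib/jvm/<jdk>/bin/javac
  javac_path="$(readlink -f "$1")"
  # Strip the trailing /bin/javac to get the JDK root
  dirname "$(dirname "$javac_path")"
}

# Usage: java_home_from "$(command -v javac)"
# The printed directory could replace /usr in "export JAVA_HOME=..." below.
```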

### Set the Required Environment Variables

```
$ vi ~/.bashrc
```

Add the following environment variables to .bashrc:

```
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export JAVA_HOME=/usr
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
```

Load the new variables into the current shell:

```
$ source ~/.bashrc
```
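Hadoop's scripts launch Java as `$JAVA_HOME/bin/java`, so a quick check right after sourcing .bashrc catches a wrong `JAVA_HOME` before any daemon is started. A minimal sketch; `check_java_home` is a hypothetical name.

```shell
# Hypothetical helper: confirm that JAVA_HOME (or the given path) has bin/java.
check_java_home() {
  local home="${1:-$JAVA_HOME}"
  if [ -x "$home/bin/java" ]; then
    echo "ok: $home/bin/java"
  else
    echo "JAVA_HOME=$home has no executable bin/java" >&2
    return 1
  fi
}

# Usage: check_java_home   # with JAVA_HOME=/usr this prints "ok: /usr/bin/java"
```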

## Install Hadoop

To be able to run HBase 2.1.0 on this cluster later, install Hadoop 2.7.7.

### Download Hadoop 2.7.7 to the hadoop User's Home Directory on hadoop1

```
$ cd ~
$ wget http://apache.stu.edu.tw/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
$ tar -zxvf hadoop-2.7.7.tar.gz
$ mv hadoop-2.7.7 hadoop
```

### Configure core-site.xml

```
$ vi ~/hadoop/etc/hadoop/core-site.xml
```

Contents:

```
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1/</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop/tmp</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop5:2181,hadoop6:2181,hadoop7:2181</value>
  </property>
</configuration>
```

### Configure hdfs-site.xml

```
$ vi ~/hadoop/etc/hadoop/hdfs-site.xml
```

Contents:

```
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>hadoop1:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>hadoop1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>hadoop2:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>hadoop2:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop5:8485;hadoop6:8485;hadoop7:8485/ns1</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/hadoop/journaldata</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
```

### Configure yarn-site.xml

```
$ vi ~/hadoop/etc/hadoop/yarn-site.xml
```

Contents:

```
<configuration>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yrc</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop3</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop4</value>
  </property>
  <property> 
    <name>yarn.resourcemanager.webapp.address.rm1</name>  
    <value>hadoop3:8088</value> 
  </property>  
  <property> 
    <name>yarn.resourcemanager.webapp.address.rm2</name>  
    <value>hadoop4:8088</value> 
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop5:2181,hadoop6:2181,hadoop7:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property> 
    <name>yarn.nodemanager.vmem-check-enabled</name> 
    <value>false</value> 
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
</configuration>
```
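A malformed tag in any of these XML files, such as a closing `</property` without its `>`, makes Hadoop fail to load the configuration when a daemon starts. `xmllint --noout <file>` (from libxml2, if installed) is the thorough check; the hypothetical function below is a cruder, dependency-free sketch that only compares the counts of opening and closing `property` tags.

```shell
# Hypothetical helper: compare the number of <property> and </property> tags.
# Not a real XML validator; it only catches unbalanced property tags.
check_property_tags() {
  local file="$1" open close
  open="$(grep -o '<property>' "$file" | wc -l)"
  close="$(grep -o '</property>' "$file" | wc -l)"
  if [ "$open" -ne "$close" ]; then
    echo "$file: $open <property> but $close </property>" >&2
    return 1
  fi
}

# Usage: for f in ~/hadoop/etc/hadoop/*-site.xml; do check_property_tags "$f"; done
```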

### Configure mapred-site.xml

```
$ vi ~/hadoop/etc/hadoop/mapred-site.xml
```

Contents:

```
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>/home/hadoop/hadoop/etc/hadoop,/home/hadoop/hadoop/share/hadoop/common/lib/*,/home/hadoop/hadoop/share/hadoop/common/*,/home/hadoop/hadoop/share/hadoop/hdfs,/home/hadoop/hadoop/share/hadoop/hdfs/lib/*,/home/hadoop/hadoop/share/hadoop/hdfs/*,/home/hadoop/hadoop/share/hadoop/mapreduce/*,/home/hadoop/hadoop/share/hadoop/yarn,/home/hadoop/hadoop/share/hadoop/yarn/lib/*,/home/hadoop/hadoop/share/hadoop/yarn/*</value>
  </property>
</configuration>
```

### Configure slaves

As an aside, in Hadoop 3.x the slaves file is renamed to workers.

```
$ vi ~/hadoop/etc/hadoop/slaves
```

Contents:

```
hadoop5
hadoop6
hadoop7
```

### Copy Hadoop from hadoop1 to the hadoop User's Home Directory on the Other Servers

```
$ scp -r ~/hadoop hadoop2:/home/hadoop/
$ scp -r ~/hadoop hadoop3:/home/hadoop/
$ scp -r ~/hadoop hadoop4:/home/hadoop/
$ scp -r ~/hadoop hadoop5:/home/hadoop/
$ scp -r ~/hadoop hadoop6:/home/hadoop/
$ scp -r ~/hadoop hadoop7:/home/hadoop/
```

## Install Zookeeper

### Download Zookeeper 3.4.12 to the hadoop User's Home Directory on hadoop5

```
$ wget http://apache.stu.edu.tw/zookeeper/stable/zookeeper-3.4.12.tar.gz
$ tar -xvzf zookeeper-3.4.12.tar.gz
$ mv zookeeper-3.4.12 zookeeper
```

### Configure zoo.cfg

```
$ vi ~/zookeeper/conf/zoo.cfg 
```

Contents:

```
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoop/zookeeper/data
clientPort=2181
server.1=hadoop5:2888:3888
server.2=hadoop6:2888:3888
server.3=hadoop7:2888:3888
```

### Copy Zookeeper from hadoop5 to the hadoop User's Home Directory on hadoop6 and hadoop7

```
$ scp -r ~/zookeeper hadoop6:/home/hadoop/
$ scp -r ~/zookeeper hadoop7:/home/hadoop/
```

### Configure myid

#### Configure myid on hadoop5

```
$ mkdir -p ~/zookeeper/data
$ echo 1 > ~/zookeeper/data/myid
```

#### Configure myid on hadoop6

```
$ mkdir -p ~/zookeeper/data
$ echo 2 > ~/zookeeper/data/myid
```

#### Configure myid on hadoop7

```
$ mkdir -p ~/zookeeper/data
$ echo 3 > ~/zookeeper/data/myid
```
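The three myid commands differ only in the number, which is the `N` of that host's `server.N` line in zoo.cfg. A sketch that looks the number up instead, assuming each machine's short hostname (`hostname -s`) is exactly the name used in zoo.cfg (hadoop5 - 7); `write_myid` is a hypothetical name.

```shell
# Hypothetical helper: write this host's ZooKeeper id into dataDir/myid.
# $1: path to zoo.cfg (default: the path used in this guide)
# $2: host name to look up (default: this machine's short hostname)
write_myid() {
  local cfg="${1:-$HOME/zookeeper/conf/zoo.cfg}"
  local host="${2:-$(hostname -s)}"
  local data_dir id
  data_dir="$(sed -n 's/^dataDir=//p' "$cfg")"
  # "server.2=hadoop6:2888:3888" yields 2 when host is hadoop6
  id="$(sed -n "s/^server\.\([0-9]*\)=${host}:.*/\1/p" "$cfg")"
  if [ -z "$data_dir" ] || [ -z "$id" ]; then
    echo "no dataDir or server.N entry for $host in $cfg" >&2
    return 1
  fi
  mkdir -p "$data_dir"   # the data directory is not part of the tarball
  echo "$id" > "$data_dir/myid"
}

# Usage on each of hadoop5 - 7: write_myid && cat ~/zookeeper/data/myid
```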

## Startup Order

### Start Zookeeper on hadoop5 - 7

```
$ ~/zookeeper/bin/zkServer.sh start
```
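Once all three servers are up, `zkServer.sh status` should report one `leader` and two `follower`s across hadoop5 - 7; `standalone` usually means that node's zoo.cfg has no `server.N` lines. A sketch that collects every node's role from one machine over the passwordless SSH set up earlier; the function names are hypothetical.

```shell
# Hypothetical helpers for checking the ZooKeeper ensemble.
# zkServer.sh status prints a line such as "Mode: leader"; keep only the role.
zk_mode() {
  sed -n 's/^Mode: //p'
}

zk_cluster_status() {
  local node mode
  for node in hadoop5 hadoop6 hadoop7; do
    # ~ is single-quoted so it expands to the hadoop user's home on the remote side
    mode="$(ssh "$node" '~/zookeeper/bin/zkServer.sh status' 2>&1 | zk_mode)"
    echo "$node: ${mode:-unknown (not running?)}"
  done
}

# Usage: zk_cluster_status
```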

### Start the Journal Node on hadoop5 - 7

```
$ ~/hadoop/sbin/hadoop-daemon.sh start journalnode
```
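The Journal Node hosts are already listed in dfs.namenode.shared.edits.dir, so instead of logging in to hadoop5 - 7 one by one you can derive the list from that URI and start each node over SSH. A sketch; `journal_hosts` is hypothetical, and its sed expression is similar to how Hadoop 2.x's start-dfs.sh finds the Journal Nodes in a `qjournal://` URI.

```shell
# Hypothetical helper: list the hosts in a qjournal:// URI, e.g.
# qjournal://hadoop5:8485;hadoop6:8485;hadoop7:8485/ns1 -> hadoop5 hadoop6 hadoop7
journal_hosts() {
  # keep the host:port list, turn ";" into spaces, drop the ":port" suffixes
  echo "$1" | sed 's,qjournal://\([^/]*\)/.*,\1,; s/;/ /g; s/:[0-9]*//g'
}

# Usage on a configured node such as hadoop1:
#   for h in $(journal_hosts "$(hdfs getconf -confKey dfs.namenode.shared.edits.dir)"); do
#     ssh "$h" '~/hadoop/sbin/hadoop-daemon.sh start journalnode'
#   done
```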

### Format the HDFS Name Node

Format on hadoop1, then copy the result (the tmp directory) to hadoop2:

```
$ ~/hadoop/bin/hdfs namenode -format
$ scp -r ~/hadoop/tmp hadoop2:/home/hadoop/hadoop/
```

### Format ZKFC

Run this once, on hadoop1:

```
$ ~/hadoop/bin/hdfs zkfc -formatZK
```

### Start HDFS on hadoop1

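Running start-dfs.sh on hadoop1 also starts the Name Node on hadoop2, the Data Nodes listed in slaves, and zkfc on both Name Node hosts, because automatic failover is enabled. The script also tries to start the Journal Nodes; since they are already running, it only reports that.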

```
$ ~/hadoop/sbin/start-dfs.sh
```

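### Start YARN on hadoop3

In Hadoop 2.x, start-yarn.sh starts a Resource Manager only on the machine it runs on, plus the Node Managers listed in slaves.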

```
$ ~/hadoop/sbin/start-yarn.sh
```

### Start the Second Resource Manager on hadoop4

```
$ ~/hadoop/sbin/yarn-daemon.sh start resourcemanager
```
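With both Name Nodes and both Resource Managers running, exactly one of each pair should be active. `hdfs haadmin -getServiceState` and `yarn rmadmin -getServiceState` print `active` or `standby` for a given ID (nn1/nn2 and rm1/rm2 from the configs above). A sketch that checks a pair; `check_ha_pair` is hypothetical, and the state command is a parameter so it can be exercised with a stub.

```shell
# Hypothetical helper: succeed only if exactly one ID is active
# and all the others are standby.
# $1: command that prints the state of one service ID
# $2...: the service IDs to check
check_ha_pair() {
  local getter="$1" active=0 standby=0 id state
  shift
  for id in "$@"; do
    # $getter is unquoted so "hdfs haadmin -getServiceState" splits into words
    state="$($getter "$id" 2>/dev/null)"
    echo "$id: ${state:-unreachable}"
    case "$state" in
      active) active=$((active + 1)) ;;
      standby) standby=$((standby + 1)) ;;
    esac
  done
  [ "$active" -eq 1 ] && [ "$standby" -eq $(($# - 1)) ]
}

# Usage (on hadoop1, after everything above is running):
#   check_ha_pair "hdfs haadmin -getServiceState" nn1 nn2
#   check_ha_pair "yarn rmadmin -getServiceState" rm1 rm2
```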

## Testing

### Run the MapReduce Example on hadoop3

If the job runs successfully, the deployment succeeded.

```
$ hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar pi 5 5 
```

### GitHub Example <a href="#github-li" id="github-li"></a>

Download my [example](https://github.com/Shark0/HadoopExample) from GitHub to test the cluster. Before testing, add hadoop1 - 7 to the hosts file on your local machine.


