Hadoop for DBAs (2/13): Building Hadoop for Oracle Linux 7

You may prefer to install Hadoop from a distribution such as Cloudera’s or Hortonworks’, or, even better, deploy it in the cloud or on an appliance like NetApp’s or Oracle’s. Those solutions help you build and manage a cluster, and they are ready for your operating system. That’s not so obvious with the Apache Hadoop software library.
No pain, no gain! This series of articles is written so that you can taste the benefits of Hadoop as well as some of its challenges. It relies on what is roughly the latest release from Apache, i.e. 2.4.1, which lets you test the latest features if needed. And because Apache Hadoop comes with 32-bit compiled libraries, we’ll need to rebuild it from source. I’m keen on it, so I’ll be using Oracle Linux 7. It should not be too difficult to adapt the steps to RHEL 7, CentOS 7 or Fedora…

Package Installation

To build Hadoop from source, several packages, libraries and tools are required. The three commands below install more than is strictly necessary for that task; run them as root or with sudo:

# Tools and libraries useful for Oracle
yum install procps module-init-tools ethtool \
initscripts bc bind-utils nfs-utils \
util-linux-ng pam xorg-x11-utils \
xorg-x11-xauth smartmontools binutils \
compat-libstdc++-33 gcc gcc-c++ glibc \
glibc-devel ksh libaio libaio-devel \
libgcc libstdc++ libstdc++-devel make \
sysstat openssh-clients compat-libcap1
# Additional tools and libraries for Hadoop
yum install lzo zlib-devel autoconf automake \
libtool  openssl-devel cmake
# Additional tools for me (and probably you)
yum install curl zip unzip gzip bzip2 rsync git mlocate \
strace gdb perf openssh-server elinks
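Before moving on, it can help to confirm that the main build tools really are on the PATH. The snippet below is a minimal sketch: the `missing_tools` helper and the list of commands passed to it are illustrative choices, not something prescribed by the Hadoop build.

```shell
# Print every command from the argument list that is not found on the PATH.
# missing_tools is a hypothetical helper written for this example.
missing_tools() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Assumed selection of build tools; adjust the list to your needs.
missing=$(missing_tools gcc g++ make cmake autoconf automake libtool git)
if [ -n "$missing" ]; then
    echo "Missing build tools:" $missing
else
    echo "All build tools found"
fi
```

If anything is reported as missing, rerun the corresponding yum command and check its output for errors.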

Google Protocol Buffers Installation

An important prerequisite for compiling Apache Hadoop is Google Protocol Buffers 2.5. You might want to install it from the EPEL 7 repository (in beta for now). You can also install it from source:

curl -O https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.bz2
tar -jxvf protobuf-2.5.0.tar.bz2
cd protobuf-*
./configure
make
sudo make install

Note:
When built from source with the default prefix, protobuf installs the protoc compiler in /usr/local/bin. Make sure that directory is in your PATH before building Hadoop, or pass a different --prefix to ./configure.
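A quick way to check the installation is to ask protoc for its version, which should read `libprotoc 2.5.0` for this release. The sketch below wraps that check in a hypothetical `is_protoc_2_5` helper. The hint about the library path is an assumption based on a common setup issue (libprotoc.so installed under /usr/local/lib, which the dynamic loader may not search by default), not something stated in this article.

```shell
# Return success if the given `protoc --version` output reports a 2.5.x release.
# is_protoc_2_5 is a hypothetical helper written for this example.
is_protoc_2_5() {
    case "$1" in
        "libprotoc 2.5."*) return 0 ;;
        *) return 1 ;;
    esac
}

if command -v protoc >/dev/null 2>&1 && is_protoc_2_5 "$(protoc --version 2>/dev/null)"; then
    echo "protoc 2.5 found at $(command -v protoc)"
else
    # Assumption: a shared library loading error usually means /usr/local/lib
    # is not searched by the dynamic loader.
    echo "protoc 2.5 not usable; check PATH, then the library path (ldconfig or LD_LIBRARY_PATH)"
fi
```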

Java SE 8 JDK Installation

Most of Hadoop is written in Java, so you’ll need a Java SE JDK too. You can use the Java SE 8 RPM from the Oracle website or rely on OpenJDK [2]:

yum install jdk-8u11-linux-x64.rpm

Add the two lines below to ~/.bashrc or another profile file so that Java is available during the build:

export JAVA_HOME=/usr/java/jdk1.8.0_11
export PATH=$JAVA_HOME/bin:$PATH
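The build needs a full JDK, not just a Java runtime, so it is worth checking that JAVA_HOME points to a directory containing the javac compiler. This is a minimal sketch; `is_jdk_home` is a hypothetical helper name.

```shell
# Return success if the directory passed as $1 looks like a JDK,
# i.e. it contains an executable bin/javac.
# is_jdk_home is a hypothetical helper written for this example.
is_jdk_home() {
    [ -n "$1" ] && [ -x "$1/bin/javac" ]
}

if is_jdk_home "${JAVA_HOME:-}"; then
    "$JAVA_HOME/bin/javac" -version
else
    echo "JAVA_HOME is unset or does not point to a JDK: '${JAVA_HOME:-}'"
fi
```

Run it in a new shell, or after sourcing ~/.bashrc, so that the exported variables are taken into account.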

Maven Installation

Maven is used to build Hadoop. Download a Maven distribution from one of the Apache mirror sites and install it:

cd /usr/local/
sudo tar -zxvf /home/hadoop/apache-maven-3.2.2-bin.tar.gz \
   --transform s/apache-maven-3.2.2/apache-maven/
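The --transform option used above tells GNU tar to rewrite member names with a sed-style expression during extraction, which is how the versioned apache-maven-3.2.2 directory ends up as plain apache-maven. The sketch below reproduces the idea on a throwaway archive; the demo-1.0 name and the temporary directories are made up for the illustration, and GNU tar is assumed.

```shell
# Build a throwaway archive whose top-level directory carries a version number.
# All names here (demo-1.0, bin/tool) are hypothetical.
work=$(mktemp -d)
mkdir -p "$work/src/demo-1.0/bin"
echo "hello" > "$work/src/demo-1.0/bin/tool"
tar -czf "$work/demo-1.0.tar.gz" -C "$work/src" demo-1.0

# Extract it while rewriting the leading "demo-1.0" into "demo" (GNU tar only).
mkdir -p "$work/out"
tar -xzf "$work/demo-1.0.tar.gz" -C "$work/out" --transform 's/^demo-1.0/demo/'

# List what was extracted.
ls "$work/out"
```

Keeping a version-free directory name means the M2_HOME setting does not have to change when Maven is upgraded.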

Add the lines below to ~/.bashrc or another profile file so that Maven is available during the build:

export M2_HOME=/usr/local/apache-maven
export M2=$M2_HOME/bin
export PATH=$M2:$PATH

Download Hadoop Source

Like other Apache projects, Hadoop keeps its source code in Subversion. The good news is that Apache also provides a Git repository. Clone it and check out the 2.4.1 release:

git clone git://git.apache.org/hadoop-common.git
cd hadoop-common
git tag -l
git checkout tags/release-2.4.1

Build Hadoop

With the prerequisites installed, you should be good to run the build. The command below generates the distribution, including the 64-bit dynamic C libraries, as a compressed archive in the hadoop-dist/target directory:

mvn package -Dmaven.javadoc.skip=true -Pdist,native -DskipTests -Dtar

Note:
The Hadoop Javadoc is not well formed: it includes some unescaped punctuation characters. That is why you must skip it during the build.
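Since the whole point of the rebuild is to get 64-bit native libraries, it is worth checking the result. The sketch below reads the first five bytes of a file: an ELF file starts with the magic bytes 7f 45 4c 46, and the fifth byte is 02 for 64-bit objects (01 for 32-bit). The `is_elf64` helper is hypothetical, and the library path is an assumption about where the 2.4.1 build places libhadoop.so; adjust it if your tree differs.

```shell
# Return success if $1 is a 64-bit ELF file:
# magic 7f 45 4c 46 followed by the class byte 02.
# is_elf64 is a hypothetical helper written for this example.
is_elf64() {
    local header
    header=$(od -An -tx1 -N5 "$1" 2>/dev/null | tr -d ' \n')
    [ "$header" = "7f454c4602" ]
}

# Assumed location of the native library inside the build tree.
lib=hadoop-dist/target/hadoop-2.4.1/lib/native/libhadoop.so
if is_elf64 "$lib"; then
    echo "$lib is a 64-bit ELF library"
else
    echo "$lib is missing or is not a 64-bit ELF library"
fi
```

The `file` utility, when installed, reports the same information in a more readable form.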

Here we are, ready to install a Hadoop cluster on Oracle Linux 7…

References:
To learn more about building Hadoop, read:
[1] How to Contribute to Hadoop Common.
[2] Hadoop Wiki Java Versions Page
