You may prefer to install Hadoop from a distribution like the ones from Cloudera or Hortonworks… Even better, deploy Hadoop in the cloud or with an appliance like NetApp’s or Oracle’s. Those solutions help you build and manage Hadoop, and they come ready for your operating system. That is not so obvious with the Apache Hadoop software library.
No pain, no gain! This series of articles is written so that you can taste the benefits, and also some of the challenges, of Hadoop. It relies on the latest release from Apache, i.e. 2.4.1, which lets you test the latest features if needed. And because Apache Hadoop ships with 32-bit compiled libraries, we’ll need to rebuild it from source. I’m keen on Oracle Linux 7, so that is what I’ll be using. It should not be too difficult to adapt the instructions to RHEL 7, CentOS 7 or Fedora…
Package Installation
To build Hadoop from source, several packages, libraries and tools are required. The 3 commands below install more than is strictly necessary for that task:
# Tools and libraries useful for Oracle
yum install procps module-init-tools ethtool \
    initscripts bc bind-utils nfs-utils \
    util-linux-ng pam xorg-x11-utils \
    xorg-x11-xauth smartmontools binutils \
    compat-libstdc++-33 gcc gcc-c++ glibc \
    glibc-devel ksh libaio libaio-devel \
    libgcc libstdc++ libstdc++-devel make \
    sysstat openssh-clients compat-libcap1

# Additional tools and libraries for Hadoop
yum install lzo zlib-devel autoconf automake \
    libtool openssl-devel cmake

# Additional tools for me (and probably you)
yum install curl zip unzip gzip bzip2 rsync git mlocate \
    strace gdb perf openssh-server elinks
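If you want a quick sanity check before going further, the build tools installed above can be queried for their versions; these are standard commands and are not required by the build itself:

gcc --version
cmake --version
autoconf --version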
Google Protocol Buffers Installation
An important prerequisite to compiling Apache Hadoop is the availability of Google Protocol Buffers 2.5. You might want to install it from the EPEL 7 repository (currently in beta). You can also install it from source:
curl -O https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.bz2
tar -jxvf protobuf-2.5.0.tar.bz2
cd protobuf-*
./configure
make
sudo make install
Note: when built from source, protobuf installs to /usr/local/bin by default. Make sure that directory is included in the PATH variable, or change the prefix before building Hadoop.
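To make sure the build will pick up that release, you can check what protoc reports; it should print libprotoc 2.5.0:

which protoc
protoc --version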
Java SE 8 JDK Installation
Most of Hadoop is written in Java, so you’ll need to install a Java SE JDK too. You can use the Java SE 8 RPM from the Oracle website or rely on OpenJDK [2]:
yum install jdk-8u11-linux-x64.rpm
Add the 2 lines below to ~/.bashrc or a profile file so that Java can be found during the build:
export JAVA_HOME=/usr/java/jdk1.8.0_11
export PATH=$JAVA_HOME/bin:$PATH
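A quick check that the build will see the JDK you just installed can save a failed compile later; both commands below are standard:

source ~/.bashrc
echo $JAVA_HOME
java -version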
Maven Installation
Maven is used to build Hadoop. Download and install the Maven distribution from one of the Apache mirror sites:
cd /usr/local/
sudo tar -zxvf /home/hadoop/apache-maven-3.2.2-bin.tar.gz \
    --transform s/apache-maven-3.2.2/apache-maven/
Add the lines below to ~/.bashrc or a profile file so that Maven can be found during the build:
export M2_HOME=/usr/local/apache-maven
export M2=$M2_HOME/bin
export PATH=$M2:$PATH
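You can then verify that Maven resolves correctly and runs with the intended JDK; mvn -version prints both the Maven release and the Java home it uses:

source ~/.bashrc
mvn -version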
Download Hadoop Source
Like all Apache projects, Hadoop uses Subversion as its software configuration manager. The good news is that Apache also provides a Git repository. Download Hadoop and check out the 2.4.1 version:
git clone git://git.apache.org/hadoop-common.git
cd hadoop-common
git tag -l
git checkout tags/release-2.4.1
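If you want to confirm you are on the expected tag before starting a long build, git describe does the trick:

git describe --tags   # should report release-2.4.1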
Build Hadoop
You are done installing the prerequisites and should be good to run the build. The command below generates the distribution, including the 64-bit dynamic C libraries; it ends up archived and compressed in the hadoop-dist/target directory:
mvn package -Dmaven.javadoc.skip=true -Pdist,native -DskipTests -Dtar
Note: the Hadoop Javadoc is not properly formed and contains some unescaped punctuation characters, which is why you must skip it in the build.
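Once the build completes, you may want to confirm the native libraries are indeed 64-bit. The paths below assume the default layout of a 2.4.1 build under hadoop-dist/target; adjust them if your output differs:

ls hadoop-dist/target/hadoop-2.4.1.tar.gz
file hadoop-dist/target/hadoop-2.4.1/lib/native/libhadoop.so.1.0.0
# file should report an ELF 64-bit LSB shared object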
Here we are, ready to install a Hadoop cluster on Oracle Linux 7…
References:
To know more about the Hadoop build, read:
[1] How to Contribute to Hadoop Common
[2] Hadoop Wiki, Java Versions page