InfoSphere BigInsights is IBM’s big data offering. It helps
organizations discover and analyze business insights hidden in large volumes of
diverse data: data that is often ignored or discarded because it is
too large, impractical or difficult to process using traditional means. Examples
include log records, click streams, social media data, news feeds, emails,
electronic sensor output, and even transactional data.
BigInsights brings the power of the open source Apache Hadoop
project to the enterprise. In addition,
a number of IBM value-add components make up this enterprise
analytics platform. These value-adds are in the areas of analysis and
discovery, security, enterprise software integration, and administrative and
platform enhancements. For more details, see the references at the end of this post.
You can also download the no-charge Quick Start Edition of IBM
InfoSphere BigInsights.
In this post we’ll walk through the steps involved in installing and
configuring BigInsights on RHEL. There are three major parts to it.
1) Meet the pre-requisites (Hardware & Software)
2) Complete pre-installation activities
3) Install BigInsights 3.0
Meet the pre-requisites (Hardware & Software)
Let’s start with step 1. You can go through the standard supported
environment specification on the IBM site (http://www-01.ibm.com/support/docview.wss?uid=swg27027565).
Here I am going to install single-node BigInsights 3.0 on a RHEL 6.4 system with
the specification shown in the screenshot below.
We need to verify or install the expect, numactl,
and ksh Linux packages. One way to get these packages is to download
them independently from various Linux websites and install them. The other, and
probably better, way is to use your OS (RHEL 6.4 in this case) disk or .iso
image. I am going to use the second option here. First I copied the
“RHEL6.4-20130130.0-Server-x86_64-DVD1.iso” file to the newly created /data
folder, then mounted it on /media and updated the yum repository configuration.
mount -o loop RHEL6.4-20130130.0-Server-x86_64-DVD1.iso /media
vi /etc/yum.repos.d/server.repo
rpm --import /media/*GPG*
yum clean all
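The post does not list what goes into server.repo, so here is a minimal sketch of one possible definition, written from a small shell function instead of typed into vi. The repository id "rhel-dvd", the display name, and the assumption that the DVD's repository metadata sits directly under the mount point are mine, not taken from the original setup; adjust baseurl if your image keeps its metadata in a subdirectory.

```shell
# write_repo_file REPO_FILE MOUNT_POINT
# Writes a minimal yum repository definition pointing at a locally mounted
# installation DVD. The repo id "rhel-dvd" is a made-up, illustrative name.
write_repo_file() {
    repo_file=$1
    mount_point=$2
    cat > "$repo_file" <<EOF
[rhel-dvd]
name=RHEL DVD (local)
baseurl=file://$mount_point
enabled=1
gpgcheck=1
EOF
}

# Example (run as root):
#   write_repo_file /etc/yum.repos.d/server.repo /media
```

With gpgcheck=1, yum relies on the Red Hat keys that the rpm --import command above has already loaded.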
The next step is to verify that the expect, numactl and ksh
Linux packages are installed.
rpm -qa | grep expect
rpm -qa | grep numactl
rpm -qa | grep ksh
If any of the packages are not installed, run the following
commands to install them.
yum install expect
yum install numactl
yum install ksh
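The check-then-install steps can also be folded into one small script. This is only a sketch: the function names pkg_installed and missing_packages are hypothetical helpers, and it uses rpm -q (an exact package-name query) rather than grepping the full package list, which can also match unrelated packages whose names merely contain the search string.

```shell
# pkg_installed NAME: succeeds if the RPM database contains the package.
pkg_installed() {
    rpm -q "$1" >/dev/null 2>&1
}

# missing_packages NAME...: prints each given name that is not installed.
missing_packages() {
    for pkg in "$@"; do
        pkg_installed "$pkg" || printf '%s\n' "$pkg"
    done
}

# Example (run as root):
#   missing=$(missing_packages expect numactl ksh)
#   [ -z "$missing" ] || yum install -y $missing
```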
Now we are ready for step 2.
Complete pre-installation activities
In addition to product prerequisites, there are tasks common
to all InfoSphere BigInsights installation and upgrade paths. You must complete
these common tasks before you start an installation or upgrade.
Task – 1) Ensure that adequate disk space exists for these directories
- / (10GB), /tmp (5GB), /opt (15GB), /var (5GB) & /home (5GB).
df -h
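Reading the df output by eye works, but the comparison is easy to script. The sketch below assumes the sizes listed in Task 1 are treated as minimum free space on the file system that holds each directory; avail_kb and has_space are hypothetical helper names.

```shell
# avail_kb DIR: prints the available space, in KB, on the file system
# that holds DIR (df -P guarantees one output line per file system).
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_space DIR MIN_GB: succeeds if DIR has at least MIN_GB gigabytes free.
has_space() {
    [ "$(avail_kb "$1")" -ge $(( $2 * 1024 * 1024 )) ]
}

# Example:
#   for spec in /:10 /tmp:5 /opt:15 /var:5 /home:5; do
#       has_space "${spec%%:*}" "${spec##*:}" || echo "low space: ${spec%%:*}"
#   done
```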
Task – 2) Check that all devices have a Universally Unique
Identifier (UUID) and that the devices are mapped to their mount points.
sudo blkid
vi /etc/fstab
Before you edit /etc/fstab, save a copy of the original file.
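One way to take that copy is a timestamped backup, so repeated edits never overwrite an earlier copy. This is a sketch; backup_file is a hypothetical helper name and the .bak.<timestamp> suffix is just a convention I picked.

```shell
# backup_file PATH: copies PATH to PATH.bak.<timestamp>, preserving mode,
# ownership and timestamps, and prints the name of the copy.
backup_file() {
    backup="$1.bak.$(date +%Y%m%d%H%M%S)"
    cp -p "$1" "$backup" && printf '%s\n' "$backup"
}

# Example (run as root):
#   backup_file /etc/fstab
```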
Task – 3) Create the biadmin user and group.
# Add the biadmin group.
groupadd -g 123 biadmin
# Add the biadmin user to the biadmin group.
useradd -g biadmin -u 123 biadmin
# Set the password for the biadmin user.
passwd biadmin
# Give the biadmin user sudo privileges by editing the sudoers file.
sudo visudo -f /etc/sudoers
Find the following line and, if it is not already commented out, add ‘#’ in front of it.
# Defaults requiretty
Also add these lines just below the “# %wheel ALL=(ALL) NOPASSWD: ALL” line.
biadmin ALL=(ALL) NOPASSWD:ALL
root ALL=(ALL) NOPASSWD:ALL
Open the /etc/security/limits.d/90-nproc.conf file and add the lines below.
@biadmin soft nofile 65536
@biadmin soft nproc 65536
@root soft nofile 65536
@root soft nproc unlimited
Open the /etc/security/limits.conf file and add the lines below.
Task – 4) Configure your network.
Edit the /etc/hosts file to include the IP address, fully qualified domain name
and short name of the host. The format is IP_address domain_name short_name. For example,
127.0.0.1 localhost.localdomain localhost
172.21.6.151 bda.iicbang.ibm.com bda
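If you are preparing several nodes, the entry can be generated rather than typed. The sketch below assumes the short name is simply the part of the fully qualified name before the first dot; hosts_entry and add_hosts_entry are hypothetical helper names.

```shell
# hosts_entry IP FQDN: prints "IP FQDN SHORT", where SHORT is the part of
# FQDN before the first dot.
hosts_entry() {
    printf '%s %s %s\n' "$1" "$2" "${2%%.*}"
}

# add_hosts_entry FILE IP FQDN: appends the entry to FILE unless FQDN is
# already present there as a whole word.
add_hosts_entry() {
    grep -qwF -- "$3" "$1" || hosts_entry "$2" "$3" >> "$1"
}

# Example (run as root):
#   add_hosts_entry /etc/hosts 172.21.6.151 bda.iicbang.ibm.com
```

Because the function checks for the name first, running it twice does not produce a duplicate line.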
Edit the /etc/resolv.conf file to include the name servers.
domain iicbang.ibm.com
search iicbang.ibm.com
nameserver 172.21.4.40
Save your changes and then restart your network.
service network restart
We need to configure passwordless SSH for the root and
biadmin users.
su root
ssh-keygen -t rsa
(When asked, accept the default file location and leave the passphrase blank.)
ssh-copy-id -i ~/.ssh/id_rsa.pub root@bda.iicbang.ibm.com
Ensure that you can log in to the remote server without a
password.
ssh root@bda.iicbang.ibm.com
exit
Repeat this SSH setup for the biadmin user as well.
Run the following commands in succession to disable the
firewall.
service iptables save
service iptables stop
chkconfig iptables off
Now disable IPv6.
echo "install ipv6 /bin/true" >> /etc/modprobe.d/disable-ipv6.conf
Edit the /etc/sysconfig/network file and append the following lines.
NETWORKING=yes
NETWORKING_IPV6=no
Edit /etc/sysconfig/network-scripts/ifcfg-eth0
(assuming eth0 is used for networking) and add this line.
IPV6INIT=no
Append the following lines at the end of the /etc/sysctl.conf file.
net.ipv6.conf.all.disable_ipv6 = 1
kernel.pid_max = 4194303
net.ipv4.ip_local_port_range = 1024 64000
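Appending by hand can leave two conflicting lines if a key is already set earlier in the file. As a sketch, the hypothetical helper below rewrites the file so that each key appears exactly once; set_sysctl is my own name, not part of any BigInsights tooling.

```shell
# set_sysctl FILE KEY VALUE: makes sure FILE contains exactly one
# "KEY = VALUE" line, replacing any earlier setting of KEY.
set_sysctl() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    # Keep every line that does not set KEY (grep exits 1 if nothing is left).
    grep -v "^[[:space:]]*$key[[:space:]]*=" "$file" > "$tmp" || true
    # Append the new setting, then write back in place to keep permissions.
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example (run as root):
#   set_sysctl /etc/sysctl.conf net.ipv6.conf.all.disable_ipv6 1
#   set_sysctl /etc/sysctl.conf kernel.pid_max 4194303
#   set_sysctl /etc/sysctl.conf net.ipv4.ip_local_port_range "1024 64000"
```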
Restart your machine.
reboot
Verify that IPv6 is disabled.
ifconfig
IPv6 is disabled if no line in the output contains inet6.
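The same check can be scripted instead of read by eye. In this sketch, ipv6_disabled is a hypothetical helper that reads an interface listing on standard input and succeeds only when inet6 does not appear anywhere in it.

```shell
# ipv6_disabled: reads an interface listing on stdin and succeeds when
# no line mentions inet6.
ipv6_disabled() {
    ! grep -q 'inet6'
}

# Example:
#   ifconfig | ipv6_disabled && echo "IPv6 is disabled"
```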
Task – 5) Synchronize the clocks of all servers using a
Network Time Protocol (NTP) source.
Add the line below to /etc/ntp.conf.
server 172.21.4.40 iburst
Update the NTPD service with the time servers
that you specified.
chkconfig --add ntpd
Start the NTPD service.
service ntpd start
Verify that the clocks are synchronized with a time server.
ntpstat
Task – 6) Run the pre-installation checker utility to verify
your Linux environment’s readiness.
I have copied the BigInsights software archive to the /data folder. Let’s extract it.
tar -xvf IS_BigInsights_EE_30_LNX64.tar.gz
cd IS_BigInsights_EE_30_LNX64/installer/hdm/bin
ls bi-prechecker.sh
We must run and pass all bi-prechecker.sh
tests before starting the BigInsights installation. Before that, let’s create a file containing
the host name.
echo "bda.iicbang.ibm.com" > hostlist.txt
./bi-prechecker.sh -m ENTERPRISE -f hostlist.txt -u biadmin
If all the checks are [ OK ] then we are ready for the next step. If there are [FAILED]
entries, go through the log file created by the utility in the same folder and
correct the problems it reports.
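On a long report it is easy to miss a failed check, so a filter can help. The sketch below assumes each failed check appears on a line containing the literal marker [FAILED]; the exact layout of the checker's output and log is not shown in this post, and failed_checks is a hypothetical helper name.

```shell
# failed_checks: reads checker output on stdin, prints the lines marked
# [FAILED], and succeeds only when there are none.
failed_checks() {
    ! grep -F '[FAILED]'
}

# Example:
#   ./bi-prechecker.sh -m ENTERPRISE -f hostlist.txt -u biadmin | failed_checks \
#       && echo "all checks passed"
```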
Install BigInsights 3.0
Let’s start the installation steps, which are pretty easy if the
previous steps completed successfully.
Navigate to the directory where you extracted the BigInsights archive.
cd /data/IS_BigInsights_EE_30_LNX64/
Run the start.sh script.
./start.sh
The script starts WebSphere Application Server Community
Edition on port 8300. The script provides you with a URL to the installation wizard.
In my case I received:
http://172.21.6.151:8300/Install/
Open it in the browser. On the License Agreement panel,
accept the license agreement and then click Next.
On the Installation Type panel, select Cluster
installation, select the check box to Create a response file and save
your selections without completing an installation, and then click Next.
On the File System panel, enter a name for your cluster
(BICluster is the default), select Install Hadoop Distributed File
System (HDFS), enter the mount
point where you want to install HDFS, and then click Next. You
can also choose another file system.
On the 'Nodes' panel, select the node to use for HDFS. I can
see bda.ibm.com listed here.
Next, on the 'Components 1' screen, enter the ‘catalog’ and ‘bigsql’
passwords of your choice.
Click Next on the remaining panels until you reach
the Summary panel. On the Summary panel, click Create response file. The
installation program displays the location where your response file is saved.
Take note of this location so that you can easily locate your response file
after you install HDFS and are ready to install InfoSphere BigInsights.
Make sure you can see all the services running on your node on the
‘Results’ panel.
Next it’ll take you to the BigInsights Console screen, which
shows that your installation completed successfully. You can browse the information on the Welcome tab and
decide your next action.
Now if you want to add more nodes to the cluster, prepare
them and add them from the Cluster Status tab.
To stop all the services, run the commands below.
cd /opt/ibm/biginsights/bin/
./stop-all.sh
Similarly, ./start-all.sh starts all the services.
We also need to install the “IBM InfoSphere
BigInsights Eclipse tools” for developing and deploying applications to the
BigInsights server and for writing programs using Java MapReduce, JAQL, Pig, Hive
and BigSQL. First, download Eclipse 4.3 or later from www.eclipse.org. Then add the http://<server>:<port>/updatesite/
URL to your Eclipse software updater (Help menu -> Install) as shown below.
Select the location and all entries under the IBM InfoSphere BigInsights
category. Then simply follow the steps to install the InfoSphere BigInsights
plugins.
References:
Planning to install InfoSphere BigInsights 3.0
http://www-01.ibm.com/support/knowledgecenter/SSPT3X_3.0.0/com.ibm.swg.im.infosphere.biginsights.install.doc/doc/c0057867.html?cp=SSPT3X
Preparing to install InfoSphere BigInsights 3.0
http://www-01.ibm.com/support/knowledgecenter/SSPT3X_3.0.0/com.ibm.swg.im.infosphere.biginsights.install.doc/doc/bi_install_prep_overview.html
Installing InfoSphere BigInsights 3.0
BigInsights 3.0 Tutorials