Wednesday, 27 August 2014

InfoSphere BigInsights 3.0 Installation and Configuration on Red Hat Linux



InfoSphere BigInsights is IBM’s big data offering. It helps organizations discover and analyze business insights hidden in large volumes of diverse data: data that is often ignored or discarded because it is too large, impractical, or difficult to process by traditional means. Examples include log records, click streams, social media data, news feeds, emails, electronic sensor output, and even transactional data.

BigInsights brings the power of the open source Apache Hadoop project to the enterprise. In addition, a number of IBM value-add components make up this enterprise analytics platform. These value-adds cover analysis and discovery, security, enterprise software integration, and administrative and platform enhancements. For more details, please visit the URL below.


You can also download the no-charge Quick Start Edition of IBM InfoSphere BigInsights.

In this post we’ll walk through the steps involved in installing and configuring BigInsights on RHEL. There are three major parts:

1)      Meet the pre-requisites (Hardware & Software)
2)      Complete pre-installation activities
3)      Install BigInsights 3.0

Meet the pre-requisites (Hardware & Software)

Let’s start with step 1. You can review the standard supported environment specification on the IBM site (http://www-01.ibm.com/support/docview.wss?uid=swg27027565). Here I am going to install a single-node BigInsights 3.0 on a RHEL 6.4 system with the specification shown in the screenshot below.


We need to verify or install the Expect, Numactl, and Ksh Linux packages. One way to get these packages is to download them individually from various Linux websites and install them. The other, and probably better, way is to use your OS (RHEL 6.4 in this case) disk or .ISO image. I am going to use the second option here. First I copied the “RHEL6.4-20130130.0-Server-x86_64-DVD1.iso” file into the newly created /data folder, then mounted it at /media and updated the yum repository configuration.

mount -o loop RHEL6.4-20130130.0-Server-x86_64-DVD1.iso /media
vi /etc/yum.repos.d/server.repo
rpm --import /media/*GPG*
yum clean all
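
The post does not show what goes into server.repo. A minimal sketch of a repository definition for the mounted DVD might look like the following. The repo id rhel-dvd, the function name write_local_repo, and the baseurl are assumptions, not taken from the original setup; depending on the media layout, the baseurl may need to point at a subdirectory of the mount point instead.

```shell
# Hypothetical helper: write a yum repo definition for a locally mounted DVD.
# $1 = repo file to create (e.g. /etc/yum.repos.d/server.repo)
# $2 = mount point of the ISO (e.g. /media)
write_local_repo() {
    repo_file="$1"
    mount_point="$2"
    cat > "$repo_file" <<EOF
[rhel-dvd]
name=RHEL DVD (local)
baseurl=file://$mount_point
enabled=1
gpgcheck=1
EOF
}

# Example (run as root):
#   write_local_repo /etc/yum.repos.d/server.repo /media
```

With gpgcheck=1, the rpm --import step above is what lets yum verify the packages from the DVD.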


The next step is to verify that the Expect, Numactl, and Ksh Linux packages are installed.

rpm -qa | grep expect
rpm -qa | grep numactl
rpm -qa | grep ksh

If the packages are not installed, run the following commands to install them.

yum install expect
yum install numactl
yum install ksh
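
The check and the install can be combined so that only the missing packages are installed. This is a small sketch, not part of the original procedure: the function name missing_packages is hypothetical, and it uses rpm -q (exact package name) rather than grepping the full package list.

```shell
# Hypothetical helper: print the names of packages that rpm does not
# report as installed, one per line.
missing_packages() {
    for pkg in "$@"; do
        rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# Example (run as root):
#   pkgs=$(missing_packages expect numactl ksh)
#   [ -n "$pkgs" ] && yum install -y $pkgs
```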

Now we are ready for step 2.

Complete pre-installation activities

In addition to product prerequisites, there are tasks common to all InfoSphere BigInsights installation and upgrade paths. You must complete these common tasks before you start an installation or upgrade.

Task – 1) Ensure that adequate disk space exists for these directories - / (10GB), /tmp (5GB), /opt (15GB), /var (5GB) & /home (5GB).

df -h
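
Rather than reading the df output by eye, the minimums listed in Task 1 can be checked with a short script. This is a sketch under assumptions: the function names are hypothetical, and df reports the free space of the filesystem that contains each directory, so directories sharing one filesystem are each compared against the same free space rather than the sum of their requirements.

```shell
# Hypothetical helper: succeed if the filesystem holding $1 has at least
# $2 gigabytes available.
check_space() {
    dir="$1"
    min_gb="$2"
    avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR==2 {print $4}')
    avail_kb=${avail_kb:-0}
    need_kb=$((min_gb * 1024 * 1024))
    if [ "$avail_kb" -ge "$need_kb" ]; then
        echo "OK $dir"
        return 0
    fi
    echo "FAILED $dir (need ${min_gb}GB)"
    return 1
}

# Check every directory from Task 1; returns non-zero if any check fails.
check_all_space() {
    rc=0
    check_space /     10 || rc=1
    check_space /tmp   5 || rc=1
    check_space /opt  15 || rc=1
    check_space /var   5 || rc=1
    check_space /home  5 || rc=1
    return $rc
}

# Example:
#   check_all_space
```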

Task – 2) Check that all devices have a Universally Unique Identifier (UUID) and that the devices are mapped to their mount points.

sudo blkid

vi /etc/fstab

Before you edit /etc/fstab, save a copy of the original file.
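
As a rough aid for this task, the sketch below lists the fstab entries that still refer to a raw /dev path instead of UUID=..., so you know which lines to review against the blkid output. The function name non_uuid_entries is hypothetical, and whether a given /dev entry (for example an LVM volume) really needs changing is a judgment call that this script does not make.

```shell
# Hypothetical helper: print "device mountpoint" for every non-comment
# fstab entry whose first field is a /dev path rather than UUID=...
non_uuid_entries() {
    awk '!/^[[:space:]]*#/ && $1 ~ /^\/dev\// { print $1, $2 }' "$1"
}

# Example (run as root):
#   cp -p /etc/fstab /etc/fstab.orig    # save a copy first
#   non_uuid_entries /etc/fstab
```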
            

Task – 3) Create the biadmin user and group.

# Add the biadmin group.
groupadd -g 123 biadmin

# Add the biadmin user to the biadmin group.
useradd -g biadmin -u 123 biadmin

# Set the password for the biadmin user.
passwd biadmin

# Add the biadmin user to the sudoers file.
sudo visudo -f /etc/sudoers

Find the line below and, if it is not already commented out, add ‘#’ at the start to comment it:
# Defaults requiretty

Also add these lines just below the “# %wheel ALL=(ALL) NOPASSWD: ALL” line:
biadmin ALL=(ALL) NOPASSWD:ALL
root ALL=(ALL) NOPASSWD:ALL

Open the /etc/security/limits.d/90-nproc.conf file and add the lines below.

@biadmin  soft nofile    65536
@biadmin  soft nproc     65536
@root     soft nofile    65536
@root     soft nproc     unlimited
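
If you script these edits instead of using an editor, re-running the script will append the same lines again. A minimal sketch of an "append only if missing" helper is shown below; the name append_once is hypothetical, and the match is on the exact line text, so a line that differs only in spacing counts as a different line.

```shell
# Hypothetical helper: append line $2 to file $1 unless an identical
# line is already present.
append_once() {
    file="$1"
    line="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (run as root):
#   f=/etc/security/limits.d/90-nproc.conf
#   append_once "$f" "@biadmin  soft nofile    65536"
#   append_once "$f" "@biadmin  soft nproc     65536"
#   append_once "$f" "@root     soft nofile    65536"
#   append_once "$f" "@root     soft nproc     unlimited"
```

The same helper works for the later edits to /etc/sysctl.conf and /etc/ntp.conf.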

Open the /etc/security/limits.conf file and add below lines.

Task – 4) Configure your network.

Edit /etc/hosts to include the IP address, fully qualified domain name, and short name of the server. The format is IP_address domain_name short_name. For example:

127.0.0.1 localhost.localdomain localhost
172.21.6.151 bda.iicbang.ibm.com bda
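
A quick way to confirm the entry is to look up the name in the file and check which address comes back. The sketch below is illustrative only: the function name lookup_host is hypothetical, and it reads the hosts file directly rather than going through the system resolver.

```shell
# Hypothetical helper: print the first IP address that hosts file $1
# maps to the host name or alias $2 (comment lines are skipped).
lookup_host() {
    awk -v n="$2" '!/^[[:space:]]*#/ {
        for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit }
    }' "$1"
}

# Example:
#   lookup_host /etc/hosts bda.iicbang.ibm.com
#   lookup_host /etc/hosts bda
# Both should print the server's real address, not 127.0.0.1.
```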

Edit /etc/resolv.conf to include the name servers:

domain iicbang.ibm.com
search iicbang.ibm.com
nameserver 172.21.4.40

Save your changes and then restart your network.
service network restart

We need to configure passwordless SSH for the root and biadmin users. When ssh-keygen prompts you, accept the default file location and leave the passphrase blank.
su root
ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub root@bda.iicbang.ibm.com

Ensure that you can log in to the remote server without a password.
ssh root@bda.iicbang.ibm.com
exit

Repeat this SSH setup for the biadmin user as well.
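
For the biadmin user, or when preparing several nodes, the key generation can be done without prompts. The sketch below is one way to do it, not the procedure from the IBM documentation: the function name ensure_key is hypothetical, and it uses the ssh-keygen options -N "" (empty passphrase), -f (key file), and -q (quiet). It leaves an existing key untouched.

```shell
# Hypothetical helper: create an RSA key pair at path $1 with an empty
# passphrase, unless a key already exists there.
ensure_key() {
    key="$1"
    [ -f "$key" ] && return 0
    mkdir -p "$(dirname "$key")"
    chmod 700 "$(dirname "$key")"
    ssh-keygen -q -t rsa -N "" -f "$key"
}

# Example (run as biadmin, e.g. after "su - biadmin"):
#   ensure_key "$HOME/.ssh/id_rsa"
#   ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" biadmin@bda.iicbang.ibm.com
#   ssh -o BatchMode=yes biadmin@bda.iicbang.ibm.com true && echo "passwordless login works"
```

BatchMode=yes makes ssh fail instead of asking for a password, which turns the final login test into a yes-or-no check.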

Run the following commands in succession to disable the firewall.
service iptables save
service iptables stop
chkconfig iptables off

Now disable IPv6.

echo "install ipv6 /bin/true" >> /etc/modprobe.d/disable-ipv6.conf

Edit the /etc/sysconfig/network file and append the following lines.
NETWORKING=yes
NETWORKING_IPV6=no

Edit /etc/sysconfig/network-scripts/ifcfg-eth0 (assuming eth0 is used for networking) and add this line:

IPV6INIT=no

Append the following lines to the end of the /etc/sysctl.conf file.
net.ipv6.conf.all.disable_ipv6 = 1
kernel.pid_max = 4194303
net.ipv4.ip_local_port_range = 1024   64000

Restart your machine.
reboot

Verify that IPv6 is disabled.
ifconfig
IPv6 is disabled if no lines containing inet6 appear in the output.
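
The same check can be scripted so that it gives a clear pass or fail. This is a minimal sketch; the function name no_inet6 is hypothetical, and it only looks for the string inet6 in whatever interface listing it is given on standard input.

```shell
# Hypothetical helper: read an interface listing on stdin and succeed
# only if it contains no inet6 address lines.
no_inet6() {
    if grep -q 'inet6'; then
        return 1
    fi
    return 0
}

# Example:
#   ifconfig | no_inet6 && echo "IPv6 is disabled" || echo "IPv6 is still active"
```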
 
Task – 5) Synchronize the clocks of all servers using a Network Time Protocol (NTP) source.

Add the line below to /etc/ntp.conf
server 172.21.4.40 iburst

 Update the NTPD service with the time servers that you specified.
chkconfig --add ntpd

 Start the NTPD service.
service ntpd start

 Verify that the clocks are synchronized with a time server.
ntpstat
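
Right after ntpd starts, ntpstat usually reports that the clock is not yet synchronized, and it can take a few minutes to settle. ntpstat exits with status 0 once the clock is synchronized, so a small retry loop can wait for that. The sketch below is an illustration; the function name wait_for_sync and the retry counts are assumptions.

```shell
# Hypothetical helper: run the given command up to $1 times, sleeping
# $2 seconds between attempts, until it succeeds.
# Usage: wait_for_sync <tries> <delay_seconds> <command> [args...]
wait_for_sync() {
    tries="$1"
    delay="$2"
    shift 2
    i=0
    while [ "$i" -lt "$tries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example: wait up to 5 minutes for the clock to synchronize.
#   wait_for_sync 30 10 ntpstat && echo "clock synchronized"
# To have ntpd start at boot, "chkconfig ntpd on" may also be needed
# (assumption: --add alone can leave the service disabled at boot).
```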


Task – 6) Run the pre-installation checker utility to verify your Linux environment’s readiness.

I have copied the BigInsights software package into the /data folder. Let’s extract it.

tar -xvf IS_BigInsights_EE_30_LNX64.tar.gz
cd IS_BigInsights_EE_30_LNX64/installer/hdm/bin
ls bi-prechecker.sh

All bi-prechecker.sh tests must pass before we start the BigInsights installation. First, let’s create a file containing the host name.

echo "bda.iicbang.ibm.com" > hostlist.txt
./bi-prechecker.sh -m ENTERPRISE -f hostlist.txt -u biadmin

If all the checks are [ OK ], we are ready for the next step. If there are [FAILED] entries, go through the log file that the utility creates in the same folder and correct the problems.
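
On a long report it is easy to miss a failed entry. One option is to save the checker's console output and filter it. The sketch below assumes only what is described above, that failed checks are marked with the text FAILED; the function name failed_checks and the file name precheck.out are hypothetical.

```shell
# Hypothetical helper: print every line of file $1 that contains FAILED.
# Exits non-zero when there are no such lines.
failed_checks() {
    grep -F 'FAILED' "$1"
}

# Example:
#   ./bi-prechecker.sh -m ENTERPRISE -f hostlist.txt -u biadmin 2>&1 | tee precheck.out
#   if failed_checks precheck.out; then
#       echo "Fix the entries above and run the checker again."
#   fi
```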

Install BigInsights 3.0

Let’s start the installation, which is pretty easy if the previous steps completed successfully.

Navigate to the directory where you extracted BigInsights.
cd /data/IS_BigInsights_EE_30_LNX64/
Run the start.sh script.
./start.sh

The script starts WebSphere Application Server Community Edition on port 8300 and provides you with a URL to the installation wizard. In my case I received:

http://172.21.6.151:8300/Install/

Open it in the browser. On the License Agreement panel, accept the license agreement and then click Next.
 

On the Installation Type panel, select Cluster installation, select the check box to Create a response file and save your selections without completing an installation, and then click Next.

On the File System panel, enter a name for your cluster (BICluster is the default), select Install Hadoop Distributed File System (HDFS), enter the mount point where you want to install HDFS, and then click Next. You can also choose another file system.
 

On the 'Secure Shell' panel, select the user (root in my case) that you want to install with, enter any required information, and then click Next.

On the 'Nodes' panel, click your node to use for HDFS. I can see bda.ibm.com listed here.

Next, on the 'Components 1' screen, set the ‘catalog’ and ‘bigsql’ passwords to whatever you want.


 Click Next on the remaining panels until you reach the Summary panel. On the Summary panel, click Create response file. The installation program displays the location where your response file is saved. Take note of this location so that you can easily locate your response file after you install HDFS and are ready to install InfoSphere BigInsights.
 

Make sure you can see all the services running on your node on the ‘Results’ panel.
 

Next it will take you to the BigInsights Console screen, which shows that your installation completed successfully. You can browse the information on the Welcome tab and decide on your next action.



If you want to add more nodes to the cluster, prepare them and add them from the Cluster Status tab.

To stop all the services, run the commands below:
cd /opt/ibm/biginsights/bin/
./stop-all.sh
Similarly, ./start-all.sh starts all the services.


We also need to install the IBM InfoSphere BigInsights Eclipse tools for developing and deploying applications to the BigInsights server and for writing programs using Java MapReduce, JAQL, Pig, Hive, and BigSQL. First, download Eclipse 4.3 or later from www.eclipse.org. Then add the http://<server>:<port>/updatesite/ URL to your Eclipse software updater (Help menu -> Install) as shown below. Select the location and all entries under the IBM InfoSphere BigInsights category, then follow the steps to install the InfoSphere BigInsights plugins.
 

References:

Planning to install InfoSphere BigInsights 3.0

http://www-01.ibm.com/support/knowledgecenter/SSPT3X_3.0.0/com.ibm.swg.im.infosphere.biginsights.install.doc/doc/c0057867.html?cp=SSPT3X

Preparing to install InfoSphere BigInsights 3.0

http://www-01.ibm.com/support/knowledgecenter/SSPT3X_3.0.0/com.ibm.swg.im.infosphere.biginsights.install.doc/doc/bi_install_prep_overview.html

Installing InfoSphere BigInsights 3.0


BigInsights 3.0 Tutorials


3 comments:

  1. Thanks for taking the time to document this Vikas. Very helpful! This is the best set of instructions I've seen by far.

  2. By using big data analytics you can extract only the relevant information from terabytes, petabytes and exabytes, and analyze it to transform your business decisions for the future. business analytics training

  3. Thanks Mr Vikas to share your knowledge. This IBM Biginsights installation steps are easy to follow. Pls Share more tips. Thanks
