In this blog we’ll develop Cognos BI reports using BigInsights (IBM’s Hadoop distribution) alongside warehouse data sources. “Data warehouse augmentation” is a big data use case of great importance to the traditional analytics industry (visit http://www.ibm.com/developerworks/library/ba-augment-data-warehouse1/index.html to learn more). To explore and implement a big data project, you can augment an existing data warehouse environment by introducing one or more of the use cases below, as the business requires.
- Use case 1: Use the Hadoop environment as a landing zone to pull in data from various sources, process it, and transfer the processed data to the existing data warehouse or other repositories. For details please visit http://www.ibm.com/developerworks/library/ba-augment-data-warehouse2/index.html
- Use case 2: Use the Hadoop environment to set up an active (queryable) archive for data from the enterprise data warehouse. For details please visit http://www.ibm.com/developerworks/library/ba-augment-data-warehouse3/index.html
- Use case 3: Use the Hadoop environment to incrementally augment the existing data warehouse and gain better business insight from unstructured data that was previously unavailable for analysis due to the limitations of traditional data warehouse technology. For details please visit http://www.ibm.com/developerworks/library/ba-augment-data-warehouse4/index.html
This blog applies directly to use case 3, though Big SQL can be used effectively in all three. My previous blogs discussed Cognos BI in detail; if you are new to it, you can find more here - http://www.ibm.com/software/products/en/business-intelligence. Below is a brief description of BigInsights and Big SQL before we start the integration steps.
IBM InfoSphere BigInsights (http://www.ibm.com/software/data/infosphere/biginsights/) combines Apache Hadoop (including the MapReduce framework and the Hadoop Distributed File System) with unique, enterprise-ready technologies and capabilities from across IBM, including Big SQL, built-in analytics, visualization, BigSheets, and security. InfoSphere BigInsights is a single platform to manage all of the data, and it offers many benefits:
- Provides flexible, enterprise-class support for processing large volumes of data by using streams and MapReduce
- Enables applications to work with thousands of nodes and petabytes of data in a highly parallel, cost effective manner
- Applies advanced analytics to information in its native form to enable ad hoc analysis
- Integrates with enterprise software
Big SQL (http://www.ibm.com/developerworks/library/bd-bigsql/) provides SQL access to data stored in InfoSphere BigInsights by using JDBC, ODBC, and other connections. Big SQL supports large ad hoc queries by using IBM SQL/PL support, SQL stored procedures, SQL functions, and IBM Data Server drivers. These are low-latency queries that return information quickly to reduce response time and improve access to data. Big SQL offers unmatched simplicity, performance and security for SQL on Hadoop, providing a single point of access and view across all big data, exactly where it lives.
OK, with this background we are ready to start the implementation tasks using Big SQL. In case you are interested in Hive-based work, please refer to http://www.ibm.com/developerworks/library/ba-cognosbi10-infospherebiginsights/index.html.
We’ll complete 3 tasks here –
1) Set up the environment with BigInsights 3.0, DB2 10.5 Warehouse (BLU) and Cognos BI 10.2.1 FP3.
2) Prepare data sources: create tables and load data in the warehouse and Hadoop environments.
3) Create Cognos data sources, a meta-data model and a sample report.
Task 1 - Setting up the environment with BigInsights 3.0, DB2 10.5 (BLU) and Cognos BI 10.2.1 FP3
In my case, all the software below is installed on Red Hat Enterprise Linux 6.3. In your case it can all be on different machines as well.
- For the Cognos BI 10.2.1 setup you can either download the free developer edition for Windows from the IBM website (http://www.ibm.com/developerworks/downloads/im/cognosbi/) or use the installation steps given in my previous blog (http://vmanoria.blogspot.in/2014/08/ibm-cognos-bi-installation.html) if you have the software for Linux.
- If you don’t have a licensed version of DB2 10.5, please download and install the DB2 10.5 Express edition (http://www-01.ibm.com/software/data/db2/express-c/download.html). Installation steps for Windows are shown here: https://www.youtube.com/watch?v=2AtSEHC6iAQ
- For the BigInsights 3.0 setup you can either download the free QuickStart edition images from the IBM website (http://www.ibm.com/developerworks/downloads/im/biginsightsquick/) or use the installation steps given in my previous blog (http://vmanoria.blogspot.in/2014/08/infosphere-biginsight-30-installation.html) if you have the software. If you are not using the images, you also need to follow the step below.
- Copy the Big SQL JDBC driver into the Cognos library folder and restart the Cognos BI services:
cp /opt/ibm/biginsights/bigsql/bigsql1/jdbc/bigsql-jdbc-driver.jar /opt/ibm/cognos/c10_64/webapps/p2pd/WEB-INF/lib/
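Before restarting the services, it is worth confirming the driver actually landed where Cognos will look for it. A minimal check along these lines (the paths are the ones from this install; adjust them for yours):

```shell
# Confirm the Big SQL JDBC driver is in the Cognos lib folder
# before restarting services. Paths match the install above.
JAR=/opt/ibm/cognos/c10_64/webapps/p2pd/WEB-INF/lib/bigsql-jdbc-driver.jar
if [ -f "$JAR" ]; then
  echo "driver in place: $JAR"
else
  echo "driver missing: $JAR"
fi
```

If the jar is missing after the restart, Cognos will fail to create the Big SQL connection later, so catching it here saves a debugging round-trip.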
Task 2 - Prepare data sources. Create tables and load data in the warehouse and Hadoop environments.
To keep things simple we are going to work with three tables: 1) Student, 2) Student_Details and 3) Student_Facts. The first two tables are created in the DB2 environment; the third is created in the BigInsights HDFS environment using Big SQL. After that, we'll also create a Student_Details table in the HDFS environment and load its data from the DB2 database using the JDBC driver.
In DB2 BLU, let's create the tables 1) Student and 2) Student_Details and load data from the CSV files. The commands below are run in the RHEL shell.
[root@scekvm1 sample]# su db2inst1
[db2inst1@scekvm1 sample]$ ls
ER.jpg  Exam.csv  Old  Performance.csv  QBank.csv  Student.csv  Student_Details.csv  StuFact.csv
[db2inst1@scekvm1 sample]$ db2 connect to gs_db

   Database Connection Information

 Database server        = DB2/LINUXX8664 10.5.3
 SQL authorization ID   = DB2INST1
 Local database alias   = GS_DB

[db2inst1@scekvm1 sample]$ db2 -tvf db2ddl.sql
CREATE TABLE DB2INST1.STUDENT (
  STUDENT_ID INTEGER NOT NULL,
  STUDENT_NAME VARCHAR(30) NOT NULL,
  YEAR_OF_ADMISSION INTEGER NOT NULL,
  SCHOOL VARCHAR(30) NOT NULL,
  CLASS VARCHAR(10) NOT NULL,
  SECTION VARCHAR(3) NOT NULL,
  HOSTELER VARCHAR(3) NOT NULL
)
DB20000I  The SQL command completed successfully.

CREATE TABLE DB2INST1.STUDENT_DETAILS (
  STUDENT_ID INTEGER NOT NULL,
  DOB DATE NOT NULL,
  GENDER VARCHAR(2) NOT NULL,
  HOME_CITY VARCHAR(15) NOT NULL,
  HOME_STATE VARCHAR(3) NOT NULL,
  ADMISSION_CATEGORY VARCHAR(15) NOT NULL,
  SOCIAL_CATEGORY VARCHAR(15) NOT NULL,
  SCHOOL_CATEGORY VARCHAR(15) NOT NULL,
  NATIONALITY VARCHAR(15) NOT NULL,
  RELIGION VARCHAR(15) NOT NULL
)
DB20000I  The SQL command completed successfully.
[db2inst1@scekvm1 sample]$ db2 import from Student.csv of del messages msg.txt insert into student
Number of rows read       = 1000
Number of rows skipped    = 0
Number of rows inserted   = 1000
Number of rows updated    = 0
Number of rows rejected   = 0
Number of rows committed  = 1000

[db2inst1@scekvm1 sample]$ db2 import from Student_Details.csv of del messages msg.txt insert into student_details
Number of rows read       = 1000
Number of rows skipped    = 0
Number of rows inserted   = 1000
Number of rows updated    = 0
Number of rows rejected   = 0
Number of rows committed  = 1000
SQL3107W  At least one warning message was encountered during LOAD processing.
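Before pointing `db2 import` at a delimited file, a quick structural check can save a failed load. A hedged sketch using a throwaway sample file (the real Student.csv has 1000 rows and the seven columns of the STUDENT DDL above; swap in your own file path):

```shell
# Sanity-check a comma-delimited file before import: line count and
# field count on the first row. The sample file here is a stand-in
# with the 7-column layout of the STUDENT table.
sample=/tmp/student_sample.csv
printf '1,Alice,2010,SchoolA,X,A,Yes\n2,Bob,2011,SchoolB,X,B,No\n' > "$sample"
lines=$(wc -l < "$sample" | tr -d ' ')
fields=$(head -n 1 "$sample" | awk -F',' '{print NF}')
echo "lines=$lines fields=$fields"
```

If the field count doesn't match the table's column count, the import will reject rows, which is exactly what warnings like SQL3107W above point you toward in msg.txt.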
Now in BigInsights, let us create Hadoop tables for Student_Details and Student_Facts. After that we’ll load data into Student_Details from DB2 and into Student_Facts from a CSV file. Before we start, please make sure BigInsights is running; if not, start it by running /opt/ibm/biginsights/bin/start-all.sh. There are two ways to work with Big SQL:
A) Using JSqsh. For details please refer to https://developer.ibm.com/hadoop/docs/tutorials/big-sql-hadoop-tutorial/big-sql-hadoop-lab-2-big-sql-command-line-interface/
B) Using the Eclipse environment. For details please visit https://developer.ibm.com/hadoop/docs/tutorials/big-sql-hadoop-tutorial/big-sql-hadoop-lab-3-using-eclipse/
Here we’ll use JSqsh. BigInsights supports a command-line interface for Big SQL through the Java SQL Shell (JSqsh, pronounced “jay-skwish”). JSqsh is an open source project for querying JDBC databases. You may find it handy to become familiar with basic JSqsh capabilities, particularly if you don’t expect to have access to an Eclipse environment at all times for your work.
The commands below are run in the RHEL shell.
[root@scekvm1 sample]# su biadmin
[biadmin@scekvm1 sample]$ cd /opt/ibm/biginsights/jsqsh/bin/
[biadmin@scekvm1 bin]$ ls
jsqsh  jsqsh.bat
[biadmin@scekvm1 bin]$ ./jsqsh bigsql
Password: ********
WARN [State: ][Code: 0]: Statement processing was successful.. SQLCODE=0, SQLSTATE= , DRIVER=3.67.33
JSqsh Release 2.1.2, Copyright (C) 2007-2014, Scott C. Gray
Type \help for available help topics. Using JLine.
[scekvm1.iicbang.ibm.com][biadmin] 1>
Just copy and paste the commands below at the JSqsh prompt to create the tables:
CREATE HADOOP TABLE IF NOT EXISTS STUDENT_FACTS (
STUDENT_ID INTEGER NOT NULL,
ATTENDANCE INTEGER NOT NULL,
FEE_COLLECTED INTEGER NOT NULL,
FEE_BALANCE INTEGER NOT NULL,
MARKS INTEGER NOT NULL
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
;
CREATE HADOOP TABLE IF NOT EXISTS STUDENT_DETAILS (
STUDENT_ID INTEGER NOT NULL,
DOB DATE NOT NULL,
GENDER VARCHAR (2) NOT NULL,
HOME_CITY VARCHAR (15) NOT NULL,
HOME_STATE VARCHAR (3) NOT NULL,
ADMISSION_CATEGORY VARCHAR (15) NOT NULL,
SOCIAL_CATEGORY VARCHAR (15) NOT NULL,
SCHOOL_CATEGORY VARCHAR (15) NOT NULL,
NATIONALITY VARCHAR (15) NOT NULL,
RELIGION VARCHAR (15) NOT NULL
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
You'll get response like this -
0 rows affected (total: 0.49s)
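Note that both tables above declare FIELDS TERMINATED BY '\t'. The LOAD HADOOP statement in the next step handles the comma-to-tab conversion itself via SOURCE PROPERTIES, but if you ever stage files into a table's HDFS directory by hand, they must already be tab-delimited. A small sketch of that conversion (the file names are illustrative):

```shell
# Convert a comma-delimited sample into the tab-delimited layout the
# STUDENT_FACTS table declares (5 integer columns).
printf '1,85,5000,0,72\n2,90,4500,500,81\n' > /tmp/stufact_sample.csv
tr ',' '\t' < /tmp/stufact_sample.csv > /tmp/stufact_sample.tsv
fields=$(head -n 1 /tmp/stufact_sample.tsv | awk -F'\t' '{print NF}')
echo "tab-delimited fields=$fields"
```

Files staged directly must match the declared row format exactly, or the columns will read back as NULLs.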
Now let's load the data into the STUDENT_FACTS table from the CSV file. Here's the command:
LOAD HADOOP USING FILE URL
'file:///images/vmanoria/engagements/sample/StuFact.csv'
WITH SOURCE PROPERTIES ('field.delimiter'=',')
INTO TABLE STUDENT_FACTS OVERWRITE;
On successful completion, you'll get response like this -
WARN [State: ][Code: 5108]: The LOAD HADOOP statement completed. Number of rows loaded into the Hadoop table: "1000". Total number of source records: "1000". If the source is a file, number of lines skipped: "0". Number of source records that were rejected: "0". Job identifier: "job_201409242156_0009".. SQLCODE=5108, SQLSTATE= , DRIVER=3.67.33
0 rows affected (total: 21.26s)
Now let's load data into the STUDENT_DETAILS table from the DB2 database table we created earlier.
LOAD HADOOP USING JDBC CONNECTION URL
'jdbc:db2://scekvm1:50000/GS_DB'
WITH PARAMETERS (user = 'db2inst1',password='db2inst1')
FROM TABLE STUDENT_DETAILS SPLIT COLUMN STUDENT_ID
INTO TABLE STUDENT_DETAILS APPEND
WITH LOAD PROPERTIES ('num.map.tasks' = 1);
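The connection URL follows the usual DB2 pattern jdbc:db2://<host>:<port>/<database>. A quick shell sketch of how the URL used above breaks down (the host name and port are this install's, not general defaults):

```shell
# Decompose the JDBC URL used in the LOAD HADOOP statement above.
URL='jdbc:db2://scekvm1:50000/GS_DB'
rest=${URL#jdbc:db2://}      # strip the scheme -> scekvm1:50000/GS_DB
hostport=${rest%%/*}         # scekvm1:50000
db=${rest#*/}                # GS_DB
host=${hostport%%:*}
port=${hostport##*:}
echo "host=$host port=$port db=$db"
```

If the load fails to connect, checking each of these three pieces against your DB2 instance (hostname resolution, listener port, database alias) is the fastest way to isolate the problem.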
For more info on the Big SQL LOAD command please visit -
http://www-01.ibm.com/support/knowledgecenter/api/content/SSPT3X_3.0.0/com.ibm.swg.im.infosphere.biginsights.commsql.doc/doc/biga_load_from.html
Task 3 - Create Cognos data sources, meta-data model and a sample report.
Let’s quickly create two data source connections from the Cognos Administration interface: one to the DB2 database (GS_DB in our case) and another JDBC connection using the “IBM InfoSphere BigInsights (Big SQL)” connection type. Provide valid sign-on details and test each connection.
Now let’s open Framework Manager and pull in the tables from their respective sources. Create relationships between them and set the query item properties correctly. In my case, I changed the usage property of ‘Student_ID’ from ‘fact’ to ‘identifier’, since its integer data type had caused it to default to ‘fact’. Before you create and publish the package, test that the aggregate data comes out correctly. Now you can create a package and publish it to Cognos Connection.
You are then ready to create your report using Report Studio.
References:
- Cognos Business Intelligence 10.2 reporting on InfoSphere BigInsights (using Hive)
- Big data and data warehouse augmentation
- Use big data technologies as a landing zone for source data
- Use big data technology for an active archive
- Use big data technologies for initial data exploration
- What's the big deal about Big SQL?