Wednesday, 19 March 2014

Stream Computing for Real-time Analytics: Overview, Integration and Use Cases

Stream computing delivers real-time analytic processing on constantly changing data in motion. It enables descriptive and predictive analytics to support real-time decisions. Stream computing allows you to capture and analyze all data, all the time, just in time. Relational databases and warehouses find information stored on disk; Streams analyzes data before you store it. The key points are:


1) Streams is the right capability when the primary big data challenge is to analyze data in motion (Velocity), either because the business imperative requires a real-time response or action based on that analysis, or because the data is very large and you want to filter and remove data cost-effectively before moving it into your data warehouse or Hadoop system. It can handle continuous or bursty streams of data: millions of events per second with microsecond latency.
2) Streams can process any type of data (Variety) – audio, video, network logs, sensors, social media such as Twitter, in addition to structured data.
3) Streams is designed to scale to process any size of data, from terabytes to zettabytes per day.

Stream computing changes where, when and how much data you can analyze. Store less, analyze more, and make better decisions, faster with stream computing. The benefits of streaming analytics are immediate: dramatic cost savings from analyzing data and storing only what is necessary, and the ability to detect events and make real-time decisions in areas ranging from customer retention to fraud detection to cross-selling a product.
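
To make the "analyze everything, store only what is necessary" idea concrete, here is a minimal Python sketch. It is not InfoSphere Streams code; the `Reading` type, the threshold and the in-memory `stored` list (standing in for a warehouse or Hadoop sink) are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Reading(NamedTuple):
    """Hypothetical event shape used only for this illustration."""
    sensor: str
    value: float


def keep_anomalies(events: Iterable[Reading], threshold: float) -> Iterator[Reading]:
    """Look at every event as it arrives, pass on only those above the threshold."""
    for event in events:
        if event.value > threshold:
            yield event


def run(events: Iterable[Reading], threshold: float) -> List[Reading]:
    """Analyze the whole stream but keep only the interesting events."""
    stored: List[Reading] = []  # stands in for a warehouse or Hadoop sink
    for event in keep_anomalies(events, threshold):
        stored.append(event)
    return stored


readings = [
    Reading("a", 20.5),
    Reading("b", 98.1),
    Reading("a", 21.0),
    Reading("c", 101.7),
]
stored = run(readings, threshold=90.0)
print(len(readings), "events analyzed,", len(stored), "stored")
```

Because `keep_anomalies` is a generator, each event is examined once and discarded unless it passes the filter, so nothing has to be written to disk before the analysis happens.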

IBM InfoSphere Streams for Stream Computing

IBM InfoSphere Streams is an advanced analytic platform that allows user-developed applications to quickly ingest, analyze and correlate information as it arrives from real-time sources. InfoSphere Streams is designed to handle very high data throughput rates, up to millions of events per second. A market leader in providing sophisticated analytics for IoT, IBM received the 2013 Ventana Research award for Operational Intelligence in the IT Innovation category for InfoSphere Streams.

Core highlights are -

  • Perform advanced real-time analytics on data in motion
  • Rapidly ingest, correlate and continuously analyze a massive volume and variety of structured and unstructured streaming data as it arrives from thousands of sources
  • Make real-time predictions and discoveries as data arrives
  • Visualize data easily with drag-and-drop development tools
  • Detect and respond to critical events immediately
  • Learn and update models for future analysis and trend prediction with cognitive computing

InfoSphere Streams helps you:
  • Analyze data in motion: provides sub-millisecond response times, allowing you to view information and events as they unfold. Tools facilitate sophisticated analytics, such as geospatial, voice, image and text, and also update models on the fly.
  • Simplify development of streaming applications: uses an Eclipse-based integrated development environment (IDE). Developers are able to easily and rapidly build applications and connect to new data sources. Drag-and-drop editors, wizards, visualization tools, and runtime monitoring and debuggers are available.
  • Extend the value of existing systems: integrates with your applications, and supports both structured and unstructured data sources. The supporting infrastructure adapts to rapidly changing data formats, types and messaging protocols. It also reads from and writes to a vast number of data sources. A massively parallel architecture is designed to deliver unlimited compute potential.
IBM InfoSphere Streams capabilities are designed to work together and with existing big data and analytics applications such as BI and predictive analytics.  Here’s an example scenario:

1)     Historic data is stored in the database or warehouse (DB2, InfoSphere Warehouse, Informix, Oracle, solidDB, MySQL, SQL Server, Netezza etc.), where interesting patterns are detected using database toolkit operators, such as the pattern of credit card transactions that would indicate possible fraud. Support for XML allows developers to fuse a broader range of traditional and nontraditional data.

2)     Analysts use IBM SPSS Modeler to develop and build predictive models, and then deploy them using the SPSS Scoring Operator. The PMML models are then imported into InfoSphere Streams Studio to generate Streams programs that score the incoming records in real time without suspending InfoSphere Streams applications.

3)     Additional data sources such as RFID tags, blogs, or other information might be used to improve the confidence levels of the scoring algorithms.

4)     These measures can be sent to Dashboards like IBM Cognos Real Time Monitoring or business process management (BPM) systems to trigger business processes to take immediate action as required.

5)     IBM InfoSphere BigInsights lets you store streaming data in an enterprise-class Hadoop environment for additional analysis or historic retention. InfoSphere Streams and InfoSphere BigInsights use the same advanced text analytics capabilities to simplify natural language processing applications for both data in motion and data at rest. In addition, InfoSphere BigInsights can be used to augment streaming sources with contextual information, and users can visualize InfoSphere Streams data in the InfoSphere BigInsights console.

6)     Streams real-time analytics can be integrated with ETL solutions such as IBM InfoSphere DataStage to get more timely results and offload some of the analytics load from the warehouse. IBM InfoSphere DataStage helps users perform deep analysis and gain additional insight using contextual and source data from other parts of the infrastructure.

7)     Messaging queues allow InfoSphere Streams to receive data from or send data to IBM WebSphere MQ, IBM MessageSight and Java Message Service (JMS) offerings.

8)  IBM InfoSphere Data Explorer enables users to visualize InfoSphere Streams data in the InfoSphere Data Explorer CXO dashboard and add streaming data to the InfoSphere Data Explorer index. 

Stream computing use cases

When companies can analyze ALL of their available data, rather than a subset, they gain a powerful advantage over their competition. Many customers are seeing tangible ROI using IBM Stream solutions to address their big data challenges:
  • Healthcare: 20% decrease in patient mortality by analyzing streaming patient data
  • Telco: 92% decrease in processing time by analyzing networking and call data
  • Utilities: 99% improved accuracy in placing power generation resources by analyzing 2.8 petabytes of untapped data
Below are a few cross-industry scenarios well suited to stream computing:

1)     Know Everything about your Customers
·         Social media customer sentiment analysis
·         Promotion optimization
·         Segmentation
·         Customer profitability
·         Click-stream analysis
·         CDR processing
·         Multi-channel interaction analysis
·         Loyalty program analytics
·         Churn prediction

2)     Innovate New Products at Speed and Scale
·         Social Media - Product/brand Sentiment analysis
·         Brand strategy
·         Market analysis
·         RFID tracking & analysis
·         Transaction analysis to create insight-based product/service offerings

3)     Instant Awareness of Risk and Fraud - Lower risk, detect fraud and monitor cyber security in real time. Augment and enhance cyber security and intelligence analysis platforms with big data technologies to process and analyze new types (e.g. social media, emails, sensors) and sources of under-leveraged data to significantly improve intelligence, security and law enforcement insight.
·         Multimodal surveillance
·         Cyber security
·         Fraud modeling & detection
·         Risk modeling & management
·         Regulatory reporting

4)     Exploit Instrumented Assets
·         Network analytics
·         Asset management and predictive issue resolution
·         Website analytics
·         IT log analysis

5)     Run Zero Latency Operations
·         Smart Grid/meter management
·         Distribution load forecasting
·         Sales reporting
·         Inventory & merchandising optimization
·         Options trading
·         ICU patient monitoring
·         Disease surveillance
·         Transportation network optimization
·         Store performance
·         Environmental analysis
·         Experimental research

Here are a few industry use cases that give an idea of the breadth of possibilities that stream technology, along with other big data products, can offer.


  • Data warehouse optimization
  • Predictive asset optimization
  • Connected vehicle
  • Actionable customer insight


  • Optimize offers and cross sell
  • Contact center efficiency and problem resolution
  • Payment fraud detection and investigation
  • Counterparty credit risk management

Consumer Products

  • Optimized promotions effectiveness
  • Micro-market campaign management
  • Real-time demand forecast

Energy and Utilities

  • Distribution load forecasting and scheduling
  • Create targeted customer offerings
  • Condition-based maintenance
  • Enable customer energy management
  • Smart meter analytics


Geospatial analysis requires complex mathematics such as set theory and geospatial geometry. It is used for location intelligence and location-based services for security and surveillance, geographic information systems, traffic patterns and more. The city of Dublin, Ireland, uses InfoSphere Streams to analyze 50 bus locations per second for its fleet of roughly 1,000 buses.
  • Threat prediction and prevention
  • Social program fraud, waste and errors
  • Tax compliance - fraud and abuse
  • Crime prediction and prevention
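
As a small illustration of the geospatial geometry mentioned above, the following Python sketch computes great-circle distances with the haversine formula and picks out the buses currently within a given radius of a stop. It is only a sketch: the bus identifiers, coordinates and radius are made-up values, and nothing here reflects how the Dublin deployment is actually implemented.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def buses_near(stop, positions, radius_km):
    """Return the ids of buses whose latest position is within radius_km of stop."""
    return [bus_id for bus_id, (lat, lon) in positions.items()
            if haversine_km(stop[0], stop[1], lat, lon) <= radius_km]


# Hypothetical latest positions, keyed by a made-up bus id.
positions = {
    "bus-1": (53.3501, -6.2610),
    "bus-2": (53.4264, -6.2499),
}
stop = (53.3498, -6.2603)  # hypothetical stop location
print(buses_near(stop, positions, radius_km=0.5))
```

In a streaming setting the `positions` dictionary would be updated continuously as location events arrive, and the proximity check would run on each update rather than on a stored batch.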


  • Measure and act on population health
  • Engage consumers in their healthcare
  • Health monitoring and intervention


Knowing the order of events can have profound impacts, for example in predicting the path of a natural disaster or picking the next best stock trade. InfoSphere Streams helps insurance companies plan for natural disasters and enables real-time public alerts. It also performs real-time analysis of sensor data collected from the Hudson River, one of the most instrumented bodies of water in the world.
  • Claims fraud detection
  • Next best action and customer retention
  • Catastrophe risk modeling
  • Usage-based insurance
  • Portfolio management
  • Producer optimization

Oil & Gas

  • Advanced condition monitoring
  • Drilling surveillance & optimization
  • Production surveillance & optimization


  • Merchandise optimization
  • Actionable customer insight


Telecommunications service providers continue to experience a huge growth in smartphone and mobile device use. Growing text and data usage creates a deluge of context- and time-sensitive data. InfoSphere Streams enables telecommunications providers to analyze billions of call data records per day to detect fraud, ensure high asset utilization and create accurate customer profiles for heightened customer service and retention. Using InfoSphere Streams, Sprint reduced storage costs by 90 percent.
  • Pro-active call center
  • Smarter campaigns
  • Network analytics
  • Location-based services

Travel & Transportation

  • Customer analytics and loyalty marketing
  • Capacity & pricing optimization
  • Predictive maintenance optimization

Wednesday, 12 March 2014

Predictive Analytics with IBM SPSS: Basic Q&As

What is Predictive Analytics?

Predictive analytics is a transformational technology that enables more proactive decision making. It analyzes patterns found in historical and current transaction data, as well as attitudinal survey data, to predict potential future outcomes. This helps organizations cut costs, reduce risk, increase profitability, optimize their business and build new forms of competitive advantage. The figure below shows how decision making has changed over time.

“Predictive analytics helps connect data to effective action by drawing reliable conclusions about current conditions and future events. It enables organizations to make predictions and then proactively act upon that insight to drive better business outcomes and achieve measurable competitive advantage.” - Gareth Herschel, Research Director, Gartner Group

New approaches are being employed in order to take advantage of predictive analytics capabilities. Business leaders know that to meet their goals for profitability, revenue, cost reduction, and risk management, especially in the current economy, they cannot continue to operate the way they have in the past. Today’s marketplace involves an exponential increase in the number and source of customer interactions; it is now a high-volume, multi-channel game.

Through better management and use of information, business leaders can remove the blind spots that hinder informed decisions, and also achieve the next generation of efficiencies by providing precise, contextual analytics and insight at the point where these items can make a direct impact on business (point of impact). Doing so can enable micro-optimization, improving insight into patterns of customers, processes, and businesses, and deliver better real-time decisions and actions in every area of the organization.

This micro-optimization is made possible by establishing well-constructed processes and empowering individuals throughout the organization with pervasive, predictive real-time analytics. This approach can help shift from a sense-and-respond focus to a forward-looking predict-and-act focus. This approach also moves analysis from a back-office activity limited to a handful of experts to an approach that can empower everyone in the organization at the point of impact and in the context of the current situation. The result is rapid, informed, and confident decisions and actions throughout the organization, based on consistent and trusted information.

Why predictive analytics?

Eric Siegel laid this out well in “Seven Reasons You Need Predictive Analytics Today”. His seven reasons, in summary:
1. Compete – Secure the Most Powerful and Unique Competitive Stronghold
2. Grow – Increase Sales and Retain Customers Competitively
3. Enforce – Maintain Business Integrity by Managing Fraud
4. Improve – Advance Your Core Business Capacity Competitively
5. Satisfy – Meet Today's Escalating Consumer Expectations
6. Learn – Employ Today's Most Advanced Analytics
7. Act – Render Business Intelligence and Analytics Truly Actionable

The power of predictive analytics in driving optimal outcomes and profitable revenue growth is clearly demonstrated by organizations that deploy predictive solutions. An independent financial impact study by IDC found that the median return on investment (ROI) for the projects that incorporated predictive technologies was 145%, compared with a median ROI of 89% for those projects that did not. Source: IDC Report: Predictive Analytics Yield Significant ROI - SPSS Inc. available at

An independent assessment of SPSS customers found that 94% achieved a positive ROI with an average payback period of 10.7 months. Returns were achieved through reduced costs, increased productivity, increased employee and customer satisfaction, and greater visibility. Flexibility, performance, and price were all key factors in purchase decisions. Source: Nucleus Research Report: The Real ROI of SPSS - SPSS Inc., available at

IBM offers strong capabilities in information management, reporting and analysis, and with the addition of SPSS, now can offer users predictive power that leverages both structured and unstructured data. This provides IBM SPSS users a distinct advantage as advanced analytics becomes a mainstream table stake in today’s hyper-competitive marketplace. Source: Nucleus Research Report: IBM and SPSS: Analytics for Everyone, available at

Where can predictive analytics help in the private and public sectors?

Most commercial organizations in the private sector share similar goals. Typical application areas are as follows:
  • Attracting the best, most profitable customers through well-targeted campaigns.
  • Increasing revenues through cross- and up-sell to new and existing customers.
  • Reducing defection of high-quality customers and, conversely, identifying those who are costly and should be allowed to go through attrition.
  • Minimizing the effect of fraudulent activity by focusing the work of investigators appropriately.
  • Increasing customer satisfaction through faster response and processing of legitimate claims.
  • Building customer loyalty through effective and reliable inventory management.
  • Reducing operating costs by predicting maintenance needs proactively.

Typical public sector application areas are as follows:

  • Government agencies manage functions as diverse as tax audit selections, military force recruitment, and proactive policing and public safety.
  • Healthcare organizations seek to proactively manage their resources and fine-tune their practices to provide better patient care.
  • Colleges and universities manage the entire student life cycle more efficiently, recruiting the right mix of students, offering students a selection of programs and assistance to keep them enrolled, and managing alumni development programs with greater success.

Predictive analytics helps your organization predict with confidence what will happen next so that you can make smarter decisions and improve business outcomes. IBM offers easy-to-use predictive analytics products and solutions that meet the specific needs of different users and skill levels from beginners to experienced analysts.

With IBM SPSS predictive analytics software, you can:
  • Transform data into predictive insights to guide front-line decisions and interactions.
  • Predict what customers want and will do next to increase profitability and retention.
  • Maximize the productivity of your people, processes and assets.
  • Detect and prevent threats and fraud before they affect your organization.
  • Measure the social media impact of your products, services and marketing campaigns.
  • Perform statistical analysis including regression analysis, cluster analysis and correlation analysis.
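
For readers new to the last item, here is what correlation and simple regression analysis compute, written as a plain-Python sketch rather than SPSS syntax. The advertising and sales figures are invented for illustration, and the function names `pearson` and `fit_line` are assumptions of this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept


# Invented data: advertising spend versus sales, in arbitrary units.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson(ad_spend, sales)
slope, intercept = fit_line(ad_spend, sales)
print(f"correlation={r:.3f} slope={slope:.2f} intercept={intercept:.2f}")
```

A correlation close to 1 says the two series move together; the fitted slope estimates how much sales change per unit of spend, which is the kind of relationship a predictive model then builds on.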

The SPSS portfolio is designed to serve the three main phases of the analytical process: capture, predict, and act.

Capture information
  • Ability to capture attributes, interactions, behaviors, and attitudes for customers, employees or constituents
  • Data collection capabilities for market research and feedback management

Predict behavior and preferences
  • Top-down statistical analysis, useful for all data types and frequently used for survey data, delivers deeper insight
  • Data mining enables predictive modeling
  • Text analytics extracts and categorizes concepts from unstructured text, making qualitative data more quantifiable and delivering new insights

Act on results
  • Unique technology and methodology streamlines deployment of analytical results throughout the enterprise to enable better decision making
  • Provides reliable automation of analytical processes for better orchestration and discipline
  • Enables collaboration to deliver more effective analytical results

Can you please explain the SPSS product portfolio in detail?

Here we’ll discuss the full suite of IBM SPSS Predictive Analytics software:

  • IBM SPSS Data Collection for capturing attitudes, preferences, and feedback [Capture]
  • IBM SPSS Statistics Suite for research and analysis [Predict]
  • IBM SPSS Modeler for predicting future behavior [Predict]
  • IBM SPSS Decision Management for optimizing operational decisions [Act]
  • IBM SPSS Collaboration and Deployment Services for enterprise-wide management of analytical assets and results [Act]

IBM SPSS Data Collection is a complete suite of products for survey, market, or business researchers. It enables you to quickly and efficiently acquire clean data from the widest range of sources by using an expansive array of methods, and actively bring data about people’s attitudes and preferences into your analytical decision-making. It is the best way to capture a complete perspective about your important constituents, making research efforts more accurate and more efficient.

This is increasingly important as business today demands faster, more representative and more cost-effective surveys for deeper insight into thoughts and opinions of customers. Both commercial organizations and market research firms rely on data collection’s advanced technologies.

IBM SPSS Modeler is a predictive analytics platform that is designed to bring predictive intelligence to decisions made by individuals, groups, systems and the enterprise. It provides a range of advanced algorithms and techniques, including text analytics, entity analytics, decision management and optimization, to help you select the actions that result in better outcomes. Available in several editions, SPSS Modeler can scale from desktop deployments to integration within operational systems. Key benefits of using IBM SPSS Modeler are as follows:
  • Access, prepare, and integrate structured data and text, and web and survey data.
  • Support the entire data mining process with a broad set of tools that are based on Cross Industry Standard Process for Data Mining (CRISP-DM) methodology.
  • Identify and extract sentiments from text in more than 30 languages and use this insight to build more accurate predictive models.
  • Deploy textual insights so your entire organization benefits from a comprehensive, 360-degree view of the people you serve.

IBM SPSS Modeler helps analysts build accurate predictive models quickly and intuitively, without programming. Modeling, also known as data mining, helps organizations take seemingly unrelated data and find hidden relationships in it. Using these models, an organization can look into the future and understand what will happen in any current or future case based on what has happened before. From predicting which offer will have the most impact, to understanding and preventing churn, the modeling family helps people consistently make decisions, maximizing the results. This process repeatability makes modeling a powerful tool for embedding best practices inside the systems and processes of a business. In addition to predicting outcomes, models can explain what factors influence them so users can take advantage of opportunities and mitigate risks.

IBM SPSS Statistics uses sophisticated mathematics to help researchers validate assumptions and test hypotheses. From testing opinions about the latest product feature ideas or the viability of a political candidate, to the efficacy of a new drug treatment or prospective supply-chain allocation, statistics enables an organization to examine its own beliefs and validate whether those views are based in fact. Statistics can give you confidence in the results and final outcomes of decisions you make.

SPSS Statistics provides essential statistical analysis tools for every step of the analytical process.  It is used by commercial, government, and academic organizations to solve business and research problems. IBM SPSS Statistics is one of the most accessible statistics tools on the market, enabling organizations to apply mathematical discipline to their decision-making.

  • A comprehensive range of statistical procedures for conducting accurate analysis.
  • Built-in techniques to prepare data for analysis quickly and easily.
  • Sophisticated reporting functionality for highly effective chart creation.
  • Powerful visualization capabilities that clearly show the significance of your findings.
  • Support for all types of data including very large data sets.

The deployment family of SPSS includes the following products:
  • Decision Management
  • Collaboration and Deployment Services

The deployment family can help you maximize the impact of analytics in your operation by embedding the results of your analytic efforts at the heart of enterprise systems. Deployment is about making analytics practical for the people who handle real-world challenges every day. From helping call center agents by alerting them to the risk of a churning customer, to recommending corrective action for a failing student, the deployment family ensures that the processes of your organization operate at peak efficiency and that objectives are met.

SPSS Decision Management harnesses the power of a variety of technologies that IBM offers, such as data mining, business intelligence, rules, event processing, data management, and then blends them all together. It is a great mashup. No longer is the business solely reliant on back-office data analysts, data miners, or expert statisticians. It facilitates the ability to create web-based, business user applications that are designed for a specific business problem. These applications help business people participate in the use of predictive analytics to meet their challenges.

Decision Management is a highly effective method for optimizing and automating your business decisions. It also helps organizations make the best decisions in real time. Supported by predictive analytics, it offers organizations the ability to move beyond reactive decisions to anticipate which actions are most likely to create successful outcomes in the future. All this technology is wrapped in a graphical user interface (GUI), using language that is familiar and meaningful to the business user.

Decision Management features:
  • Predictive tools and mathematical techniques to optimize transactional decisions.
  • Combined and integrated predictive models, rules and decision logic to deliver recommended actions.
  • “What if…” simulations to accommodate changing conditions based on the volume, variety and velocity of incoming data.
  • A flexible and intuitive user interface to support the development and implementation of targeted configurations and content.
  • Seamless integration with IBM Business Analytics and other software solutions.
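
The second feature, combining predictive models with rules and decision logic, can be pictured with a short Python sketch. This is not the Decision Management product API: the `Customer` fields, the 0.7 threshold and the action names are assumptions chosen only to show business rules and a model score producing one recommended action.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    churn_score: float    # output of a predictive model, assumed to lie in 0..1
    is_premium: bool
    open_complaints: int


def recommend(customer: Customer) -> str:
    """Blend business rules with a model score to pick a single action."""
    # Rule: unresolved complaints take precedence over any model output.
    if customer.open_complaints > 0:
        return "route to service recovery"
    # Decision logic: the model score selects among retention actions.
    if customer.churn_score >= 0.7 and customer.is_premium:
        return "offer retention discount"
    if customer.churn_score >= 0.7:
        return "send loyalty email"
    return "no action"


print(recommend(Customer(churn_score=0.82, is_premium=True, open_complaints=0)))
```

Keeping the rules in ordinary, readable conditions is what lets business users adjust thresholds and priorities without retraining the underlying model.
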
SPSS Collaboration and Deployment Services (C&DS) lets you manage analytical assets, automate processes and efficiently share results widely and securely. When the people developing analytics and the people using them can collaborate, your analytic efficiency increases.
SPSS C&DS capabilities can be described under these three headings:
Collaboration refers to the ability to share and reuse analytical assets efficiently, and is the key to developing and implementing analytics across an enterprise.
  • Analysts place files in the C&DS repository, where they are made available to other analysts or business users with appropriate permissions.
  • The repository offers a search facility to assist users in finding assets, and a backup and restore mechanism to protect the business from losing these crucial assets.
  • Logging features provide the ability to track file and system modifications.
Automate so you can construct flexible analytical processes that can be deployed throughout your operations, ensuring consistent results.
  • C&DS brings greater consistency to results by giving analysts the power to construct flexible, repeatable analytical processes that can then be operationalized.
  • C&DS enables management to efficiently govern the analytical environments in which automated processes take place.
  • Analytical processes can be defined and executed in a job. A job is a container for a set of steps. Each step has parameters associated with it. Before you execute a step, you must embed it within a job. Individual files stored in the repository can be included in processing jobs as job steps. Job steps can be executed sequentially or conditionally. The execution results can be stored in the repository, or on a file system. More importantly, the jobs themselves can be triggered according to defined time-based or message-based schedules.
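
The job and step model described above can be sketched in a few lines of Python. This is a conceptual illustration, not the C&DS API: the `Step` and `Job` classes, the `run_if` field and the step names are all hypothetical, and scheduling is left out.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Step:
    name: str
    action: Callable[[Dict[str, Any]], bool]  # returns True on success
    params: Dict[str, Any] = field(default_factory=dict)
    run_if: Optional[str] = None  # earlier step that must have succeeded


@dataclass
class Job:
    """A container for steps, executed in order with optional conditions."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def run(self) -> Dict[str, str]:
        results: Dict[str, str] = {}
        for step in self.steps:
            # Conditional execution: skip when the prerequisite did not succeed.
            if step.run_if is not None and results.get(step.run_if) != "success":
                results[step.name] = "skipped"
                continue
            ok = step.action(step.params)
            results[step.name] = "success" if ok else "failure"
        return results


job = Job("nightly-churn", [
    Step("prepare", lambda p: True, {"source": "crm"}),
    Step("score", lambda p: p["rows"] > 0, {"rows": 0}, run_if="prepare"),
    Step("report", lambda p: True, run_if="score"),
])
results = job.run()
print(results)
```

Here the `score` step fails because it receives no rows, so the dependent `report` step is skipped; the returned dictionary plays the role of the execution results that would be stored in the repository or on a file system.
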
Deploy by embedding analytic results in front-line business processes while integrating with your existing infrastructure with standard programming tools and interfaces.
  • C&DS supports application server clustering to optimize the performance of applications.
  • Single sign-on reduces the need to manually provide credentials. Moreover, the system can be configured to be compliant with the Federal Information Processing Standard (FIPS) for encryption (AES algorithm).
  • The scoring service of C&DS allows analytical results from deployed models to be delivered in real time when interacting with a customer. An analytical model configured for scoring can combine data collected from a current customer interaction with historical data to produce a score.
  • The deployment facilities of C&DS are designed to easily integrate with your enterprise infrastructure and other SPSS products, and built with enterprise readiness in mind.
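
To illustrate the scoring idea in the third bullet, the Python sketch below merges historical attributes with data from the current interaction and feeds them to a simple logistic model. It is not the C&DS scoring service interface; the feature names and the coefficients in `WEIGHTS` are invented, whereas in practice they would come from a trained and deployed model.

```python
import math

# Hypothetical coefficients; a real deployment would load a trained model.
WEIGHTS = {
    "bias": -2.0,
    "calls_last_30d": 0.4,    # historical attribute
    "basket_value": -0.01,    # historical attribute
    "mentions_cancel": 1.5,   # captured during the current interaction
}


def score(history, interaction):
    """Combine stored history with live interaction data into one probability."""
    features = {**history, **interaction}
    z = WEIGHTS["bias"] + sum(
        WEIGHTS[name] * value
        for name, value in features.items()
        if name in WEIGHTS
    )
    return 1.0 / (1.0 + math.exp(-z))  # logistic function


history = {"calls_last_30d": 5, "basket_value": 100}
with_signal = score(history, {"mentions_cancel": 1})
without_signal = score(history, {"mentions_cancel": 0})
print(f"{with_signal:.3f} vs {without_signal:.3f}")
```

The same customer history yields a higher score once the live interaction contributes its signal, which is why scoring at the moment of contact can change the recommended action.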


1)     Redpaper - IBM SPSS predictive analytics: Optimizing decisions at the point of impact
2)     Seven Reasons You Need Predictive Analytics Today
3)     Predicting the future, Part 1: What is predictive analytics?
4)     Predicting the future, Part 2: Predictive modeling techniques
5)     Predicting the future, Part 3: Create a predictive solution
6)     Predicting the future, Part 4: Put a predictive solution to work