Stream computing delivers real-time analytic processing on constantly changing data in motion. It enables descriptive and predictive analytics to support real-time decisions, and lets you capture and analyze all data, all the time, just in time. Relational databases and warehouses query information that is already stored on disk; Streams analyzes data before you store it. The key points are:
1) Streams is the right capability when the primary big data challenge is analyzing data that is in motion (Velocity): either the business imperative requires a real-time response or action based on analyzing the data, or the data is very large and you want to filter and discard part of it more cost-effectively before moving it into your data warehouse or Hadoop system. It can handle continuous or bursty streams of data, at millions of events per second with microsecond latency.
2) Streams can process any type of data (Variety): audio, video, network logs, sensors, and social media such as Twitter, in addition to structured data.
3) Streams is designed to scale to process any size of data, from terabytes to zettabytes per day.
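The filter-before-store idea from point 1 can be sketched in a few lines of plain Python. This is not InfoSphere Streams code; the `Reading` type, its field names, and the threshold are assumptions made up for illustration, and a short list stands in for a live source.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Reading:
    sensor_id: str  # hypothetical field names, for illustration only
    value: float


def keep_for_storage(readings: Iterable[Reading], threshold: float) -> Iterator[Reading]:
    """Inspect each reading as it arrives and pass on only those above the threshold."""
    for reading in readings:
        if reading.value > threshold:
            yield reading


def run_demo() -> List[Reading]:
    # A short list stands in for a live, unbounded source.
    incoming = [
        Reading("s1", 20.5),
        Reading("s2", 130.0),
        Reading("s1", 99.9),
        Reading("s3", 250.0),
    ]
    # Only the filtered readings would be written to the warehouse.
    return list(keep_for_storage(incoming, threshold=100.0))


if __name__ == "__main__":
    for reading in run_demo():
        print(reading)
```

Because `keep_for_storage` is a generator, each reading is examined once as it passes through and nothing is buffered, which is the property that lets a streaming filter sit in front of a store.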
Stream computing changes where, when and how much data you can analyze: store less, analyze more, and make better decisions faster. The benefits of streaming analytics are immediate. Analyzing data first and storing only what is necessary yields dramatic cost savings, and the ability to detect events and make decisions in real time pays off in areas ranging from customer retention and fraud detection to cross-selling a product.
IBM InfoSphere Streams for Stream Computing
IBM InfoSphere Streams is an
advanced analytic platform that allows user-developed applications to quickly
ingest, analyze and correlate information as it arrives from real-time sources.
InfoSphere Streams is designed to handle very high data throughput rates, up to
millions of events per second. A market leader in providing sophisticated
analytics for IoT, IBM received the
2013 Ventana Research award for Operational Intelligence in the IT Innovation
category for InfoSphere Streams.
Core highlights are:
- Perform advanced real-time analytics on data in
motion
- Rapidly ingest, correlate and continuously
analyze a massive volume and variety of structured and unstructured
streaming data as it arrives from thousands of sources
- Make real-time predictions and discoveries as
data arrives
- Visualize data easily with drag-and-drop
development tools
- Detect and respond to critical events
immediately
- Learn and update models for future analysis and
trend prediction with cognitive computing

InfoSphere Streams helps you:
- Analyze data in motion: provides sub-millisecond response times,
allowing you to view information and events as they unfold. Tools
facilitate sophisticated analytics, such as geospatial, voice, image and
text, and also update models on the fly.
- Simplify development of streaming applications: uses an Eclipse-based integrated development
environment (IDE). Developers can easily and rapidly build
applications and connect to new data sources. Drag-and-drop editors,
wizards, visualization tools, runtime monitoring and debuggers are
available.
- Extend the value of existing systems: integrates with your applications, and supports
both structured and unstructured data sources. The supporting
infrastructure adapts to rapidly changing data formats, types and messaging
protocols. It also reads from and writes to a vast number of data sources.
A massively parallel architecture is designed to deliver unlimited compute
potential.
IBM InfoSphere Streams capabilities are designed to
work together and with existing big data and analytics applications such as BI
and predictive analytics. Here’s an
example scenario:
1) Historic data is stored in the database or warehouse (DB2, InfoSphere Warehouse, Informix,
Oracle, solidDB, MySQL, SQL Server, Netezza, etc.), where interesting patterns
are detected using database toolkit operators, such as the pattern of credit
card transactions that would indicate possible fraud. Support for XML allows developers to fuse a broader range of traditional and non-traditional data.
2) IBM SPSS Modeler is used to develop and build predictive models, which are then deployed using the SPSS Scoring Operator. The PMML models are imported into InfoSphere Streams Studio to
generate Streams programs that score the incoming records in
real time without suspending InfoSphere Streams applications.
3) Additional data sources such as RFID tags, blogs, or other information might be used to improve the
confidence levels of the scoring algorithms.
4) These measures can be sent to dashboards like IBM Cognos Real Time Monitoring, or to
business process management (BPM) systems that trigger business processes to take
immediate action as required.
5) IBM InfoSphere BigInsights lets you store streaming data in an enterprise-class Hadoop environment for additional analysis or historic retention. InfoSphere Streams and InfoSphere BigInsights use the same advanced text analytics capabilities to simplify natural language processing applications for both data in motion and data at rest. In addition, InfoSphere BigInsights can be used to augment streaming sources with contextual information, and users can visualize InfoSphere Streams data in the InfoSphere BigInsights console.
6) Streams real-time analytics can be integrated with
ETL solutions like IBM InfoSphere DataStage. This integration
helps deliver more timely results and offloads some of the analytics load from the
warehouse. IBM InfoSphere DataStage also helps users perform deep analysis and gain additional insight using contextual and source data from other parts of the infrastructure.
7) Messaging queues allow InfoSphere Streams to receive data from or send data to IBM WebSphere MQ, IBM MessageSight and Java Message Service (JMS) offerings.
8) IBM InfoSphere Data Explorer enables users to visualize InfoSphere Streams data in the InfoSphere Data Explorer CXO dashboard and add streaming data to the InfoSphere Data Explorer index.
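Steps 1 to 4 of the scenario above (a model built on historic data, scoring of incoming records, and alerts pushed to a dashboard or BPM system) can be sketched roughly as follows. This is plain Python, not the SPSS Scoring Operator or a PMML runtime: the `Transaction` fields, the logistic coefficients in `WEIGHTS`, and the `alert` callback are all hypothetical stand-ins.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Transaction:
    card_id: str   # hypothetical fields, for illustration only
    amount: float
    foreign: bool


# Made-up coefficients standing in for a model trained offline on historic data.
WEIGHTS = {"bias": -4.0, "amount": 0.002, "foreign": 2.0}


def score(tx: Transaction) -> float:
    """Return a logistic score between 0 and 1; higher means more suspicious."""
    z = (WEIGHTS["bias"]
         + WEIGHTS["amount"] * tx.amount
         + WEIGHTS["foreign"] * float(tx.foreign))
    return 1.0 / (1.0 + math.exp(-z))


def process(stream: Iterable[Transaction],
            alert: Callable[[Transaction, float], None],
            threshold: float = 0.5) -> int:
    """Score each record as it arrives and call `alert` for risky ones.

    Returns the number of alerts raised.
    """
    raised = 0
    for tx in stream:
        s = score(tx)
        if s >= threshold:
            alert(tx, s)  # stand-in for a dashboard or BPM trigger
            raised += 1
    return raised


if __name__ == "__main__":
    demo = [
        Transaction("A", 50.0, False),
        Transaction("B", 3000.0, True),
        Transaction("C", 1500.0, False),
    ]
    process(demo, lambda tx, s: print(f"ALERT {tx.card_id}: score={s:.2f}"))
```

Keeping the model (`score`) separate from the loop (`process`) mirrors the scenario's point that the model is built elsewhere and only applied in the stream.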
Stream computing use cases
When companies can analyze all of their
available data, rather than a subset, they gain a powerful advantage over their
competition. Many customers are seeing tangible ROI using IBM Streams solutions
to address their big data challenges:
- Healthcare: 20% decrease
in patient mortality by analyzing streaming patient data
- Telco: 92% decrease in
processing time by analyzing networking and call data
- Utilities: 99% improved
accuracy in placing power generation resources by analyzing 2.8 petabytes
of untapped data
Below are a few cross-industry scenarios well suited
to stream computing:
1) Know Everything about your Customers
- Social media customer sentiment analysis
- Promotion optimization
- Segmentation
- Customer profitability
- Click-stream analysis
- CDR processing
- Multi-channel interaction analysis
- Loyalty program analytics
- Churn prediction
2) Innovate New Products at Speed and Scale
- Social media product/brand sentiment analysis
- Brand strategy
- Market analysis
- RFID tracking & analysis
- Transaction analysis to create insight-based product/service offerings
3) Instant Awareness of Risk and Fraud - Lower risk, detect fraud and monitor cyber security in real time.
Augment and enhance cyber security and intelligence analysis platforms with big
data technologies to process and analyze new types (e.g. social media, emails,
sensors) and sources of under-leveraged data to significantly improve
intelligence, security and law enforcement insight.
- Multimodal surveillance
- Cyber security
- Fraud modeling & detection
- Risk modeling & management
- Regulatory reporting
4) Exploit Instrumented Assets
- Network analytics
- Asset management and predictive issue resolution
- Website analytics
- IT log analysis
5) Run Zero Latency Operations
- Smart Grid/meter management
- Distribution load forecasting
- Sales reporting
- Inventory & merchandising optimization
- Options trading
- ICU patient monitoring
- Disease surveillance
- Transportation network optimization
- Store performance
- Environmental analysis
- Experimental research
Here are a few industry use cases that give an idea
of the breadth of possibilities that stream technology, along with other
big data products, can offer.
- Data warehouse optimization
- Predictive asset optimization
- Connected vehicle
- Actionable customer insight
- Optimize offers and cross sell
- Contact center efficiency and problem resolution
- Payment fraud detection and investigation
- Counterparty credit risk management
- Optimized promotions effectiveness
- Micro-market campaign management
- Real-time demand forecast
- Distribution load forecasting and scheduling
- Create targeted customer offerings
- Condition-based maintenance
- Enable customer energy management
- Smart meter analytics
- Threat prediction and prevention
- Social program fraud, waste and errors
- Tax compliance - fraud and abuse
- Crime prediction and prevention
- Measure and act on population health
- Engage consumers in their healthcare
- Health monitoring and intervention
Knowing the order of events can have a profound
impact, for example in predicting the path of a natural disaster or picking
the next best stock trade. InfoSphere Streams helps insurance companies plan
for natural disasters and enables real-time public alerts. It also performs
real-time analysis of sensor data collected from the Hudson River, one of the most instrumented bodies of
water in the world. See this video: https://www.youtube.com/watch?v=y3CZQOtVx6s&list=PLA98824D75176BAEB&index=18
- Claims fraud detection
- Next best action and customer retention
- Catastrophe risk modeling
- Usage-based insurance
- Portfolio management
- Producer optimization
- Advanced condition monitoring
- Drilling surveillance & optimization
- Production surveillance & optimization
- Merchandise optimization
- Actionable customer insight
Telecommunications service providers continue to
experience huge growth in smartphone and mobile device use. Growing text and
data usage creates a deluge of context- and time-sensitive data. InfoSphere
Streams enables telecommunications providers to analyze billions of call data
records per day to detect fraud, ensure high asset utilization and create
accurate customer profiles for heightened customer service and retention. Using
InfoSphere Streams, Sprint reduced storage
costs by 90 percent. See this video:
https://www.youtube.com/watch?v=eg8KSLAZ2HM&feature=player_embedded
- Pro-active call center
- Smarter campaigns
- Network analytics
- Location-based services
- Customer analytics and loyalty marketing
- Capacity & pricing optimization
- Predictive maintenance optimization
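As a rough illustration of the kind of check that analyzing call data records in motion makes possible, the Python sketch below flags callers who place unusually many calls within a sliding time window. It is not taken from any InfoSphere Streams toolkit; the record layout (timestamp in seconds, caller id), the window length, and the call limit are assumptions, and records are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple


def flag_bursty_callers(records: Iterable[Tuple[float, str]],
                        window_seconds: float = 60.0,
                        max_calls: int = 3) -> List[str]:
    """Return callers with more than `max_calls` calls inside any sliding window.

    `records` yields (timestamp_in_seconds, caller_id) pairs in time order.
    """
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    flagged: List[str] = []
    for timestamp, caller in records:
        calls = recent[caller]
        calls.append(timestamp)
        # Drop calls that have fallen out of the window ending at this record.
        while timestamp - calls[0] > window_seconds:
            calls.popleft()
        if len(calls) > max_calls and caller not in flagged:
            flagged.append(caller)
    return flagged


if __name__ == "__main__":
    demo = [(0, "a"), (10, "a"), (20, "b"), (30, "a"),
            (40, "a"), (200, "b"), (210, "b")]
    print(flag_bursty_callers(demo))
```

Only the timestamps still inside the window are kept per caller, so memory depends on recent activity rather than on the total number of records seen.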
Reference: