1) Streams is the right capability when the primary big data challenge is
analyzing data in motion (Velocity), either because the business requires a
real-time response or action based on analyzing the data, or because the data
is very large and you want to filter and remove data more cost-effectively
before moving it into your data warehouse or Hadoop system. It can handle
continuous or bursty streams of data: millions of events per second with
microsecond latency.
2) Streams can process any type of data
(Variety) – audio, video, network logs, sensors, social media such as Twitter,
in addition to structured data.
3) Streams is also designed to scale to
process any size of data, from terabytes to zettabytes per day.
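To make the "filter before you store" idea from point 1 concrete, here is a minimal sketch in plain Python. It is not the Streams API; the function name `filter_before_load`, the record fields, and the threshold are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator


def filter_before_load(events: Iterable[dict], threshold: float) -> Iterator[dict]:
    """Yield only the events worth storing; the rest are dropped in flight.

    Hypothetical rule: keep a reading only if its value reaches the threshold.
    """
    for event in events:
        if event["value"] >= threshold:
            yield event


# Hypothetical sensor readings arriving one at a time.
readings = [
    {"sensor": "a", "value": 0.2},
    {"sensor": "b", "value": 7.5},
    {"sensor": "a", "value": 0.1},
    {"sensor": "c", "value": 9.1},
]

# Only the readings that pass the filter would be written to the warehouse.
kept = list(filter_before_load(readings, threshold=5.0))
print([event["sensor"] for event in kept])
```

Because the filter is a generator, each event is examined once as it arrives and nothing is buffered, which is the property that makes this pattern cheaper than loading everything and filtering afterwards.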
IBM InfoSphere Streams for Stream Computing
IBM InfoSphere Streams is an
advanced analytic platform that allows user-developed applications to quickly
ingest, analyze and correlate information as it arrives from real-time sources.
InfoSphere Streams is designed to handle very high data throughput rates, up to
millions of events per second. A market leader in providing sophisticated
analytics for IoT, IBM received the
2013 Ventana Research award for Operational Intelligence in the IT Innovation
category for InfoSphere Streams.
Core highlights are -
- Perform advanced real-time analytics on data in motion
- Rapidly ingest, correlate and continuously analyze a massive volume and variety of structured and unstructured streaming data as it arrives from thousands of sources
- Make real-time predictions and discoveries as data arrives
- Visualize data easily with drag-and-drop development tools
- Detect and respond to critical events immediately
- Learn and update models for future analysis and trend prediction with cognitive computing
InfoSphere Streams helps you:
- Analyze data in motion: provides sub-millisecond response times, allowing you to view information and events as they unfold. Tools facilitate sophisticated analytics, such as geospatial, voice, image and text, and also update models on the fly.
- Simplify development of streaming applications: uses an Eclipse-based integrated development environment (IDE). Developers can easily and rapidly build applications and connect to new data sources. Drag-and-drop editors, wizards, visualization tools, runtime monitoring and debuggers are available.
- Extend the value of existing systems: integrates with your applications, and supports both structured and unstructured data sources. The supporting infrastructure adapts to rapidly changing data formats, types and messaging protocols. It also reads from and writes to a vast number of data sources. A massively parallel architecture is designed to deliver unlimited compute potential.
IBM InfoSphere Streams capabilities are designed to
work together and with existing big data and analytics applications such as BI
and predictive analytics. Here is an
example scenario:
1) Historic data is stored in the database or warehouse (DB2, InfoSphere Warehouse, Informix,
Oracle, solidDB, MySQL, SQL Server, Netezza, etc.), where interesting patterns
are detected using database toolkit operators, such as the pattern of credit
card transactions that would indicate possible fraud. Support for XML allows developers to fuse a broader range of traditional and nontraditional data.
2) IBM SPSS Modeler is used to develop and build predictive models, which are then deployed using the SPSS Scoring Operator. The PMML models are imported into InfoSphere Streams Studio to
generate Streams programs that score the incoming records in
real time without suspending InfoSphere Streams applications.
3) Additional data sources such as RFID tags, blogs, or other information might be used to improve the
confidence levels of the scoring algorithms.
4) These measures can be sent to dashboards such as IBM Cognos Real Time Monitoring or to
business process management (BPM) systems, which trigger business processes to take
immediate action as required.
5) IBM InfoSphere BigInsights lets you store streaming data in an enterprise-class Hadoop environment for additional analysis or historic retention. InfoSphere Streams and InfoSphere BigInsights use the same advanced text analytics capabilities to simplify natural language processing applications for both data in motion and data at rest. In addition, InfoSphere BigInsights can be used to augment streaming sources with contextual information, and users can visualize InfoSphere Streams data in the InfoSphere BigInsights console.
6) Streams real-time analytics can be integrated with
ETL solutions such as IBM InfoSphere DataStage,
which helps get more timely results and offloads some of the analytics load from the
warehouse. IBM InfoSphere DataStage helps users perform deep analysis and gain additional insight using contextual and source data from other parts of the infrastructure.
7) Messaging queues allow InfoSphere Streams to receive data from or send data to IBM WebSphere MQ, IBM MessageSight and Java Message Service (JMS) offerings.
8) IBM InfoSphere Data Explorer enables users to visualize InfoSphere Streams data in the InfoSphere Data Explorer CXO dashboard and add streaming data to the InfoSphere Data Explorer index.
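The scoring part of the scenario above (steps 1 to 4) can be sketched in a few lines of plain Python. This is only an illustration of the flow, not the SPSS Scoring Operator or any Streams API: the `Transaction` fields, the `toy_fraud_score` rule, the `home_country` lookup and the alert threshold are all hypothetical stand-ins for a real predictive model and real reference data.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass
class Transaction:
    card_id: str
    amount: float
    country: str


def toy_fraud_score(tx: Transaction, home_country: Dict[str, str]) -> int:
    """Hypothetical stand-in for a deployed model: returns a score from 0 to 100."""
    score = 0
    if tx.amount > 1000:
        score += 50  # unusually large purchase
    if tx.country != home_country.get(tx.card_id, tx.country):
        score += 40  # purchase outside the card's usual country
    return score


def score_stream(
    transactions: Iterable[Transaction],
    home_country: Dict[str, str],
    alert_threshold: int = 80,
) -> Iterator[Tuple[Transaction, int]]:
    """Score each record as it arrives and emit only those that need action."""
    for tx in transactions:
        score = toy_fraud_score(tx, home_country)
        if score >= alert_threshold:
            yield tx, score


# Hypothetical reference data (the "historic" side) and incoming records.
home_country = {"c1": "IE", "c2": "US"}
incoming = [
    Transaction("c1", 50.0, "IE"),
    Transaction("c1", 2500.0, "BR"),
    Transaction("c2", 1500.0, "US"),
]

alerts = list(score_stream(incoming, home_country))
for tx, score in alerts:
    print(f"ALERT card={tx.card_id} amount={tx.amount} score={score}")
```

In a real deployment the alerts would go to a dashboard or BPM system rather than to `print`, and the scoring function would be replaced by the imported model.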
Stream computing use cases
When companies can analyze ALL of their
available data, rather than a subset, they gain a powerful advantage over their
competition. Many customers are seeing tangible ROI using IBM Stream solutions
to address their big data challenges:
- Healthcare: 20% decrease in patient mortality by analyzing streaming patient data
- Telco: 92% decrease in processing time by analyzing networking and call data
- Utilities: 99% improved accuracy in placing power generation resources by analyzing 2.8 petabytes of untapped data
Below are a few cross-industry scenarios best suited
to stream computing:
1) Know Everything about your Customers
- Social media customer sentiment analysis
- Promotion optimization
- Segmentation
- Customer profitability
- Click-stream analysis
- CDR processing
- Multi-channel interaction analysis
- Loyalty program analytics
- Churn prediction
2) Innovate New Products at Speed and Scale
- Social Media - Product/brand Sentiment analysis
- Brand strategy
- Market analysis
- RFID tracking & analysis
- Transaction analysis to create insight-based product/service offerings
3) Instant Awareness of Risk and Fraud - Lower risk, detect fraud and monitor cyber security in real time.
Augment and enhance cyber security and intelligence analysis platforms with big
data technologies to process and analyze new types (e.g. social media, emails,
sensors) and sources of under-leveraged data to significantly improve
intelligence, security and law enforcement insight.
- Multimodal surveillance
- Cyber security
- Fraud modeling & detection
- Risk modeling & management
- Regulatory reporting
4) Exploit Instrumented Assets
- Network analytics
- Asset management and predictive issue resolution
- Website analytics
- IT log analysis
5) Run Zero Latency Operations
- Smart Grid/meter management
- Distribution load forecasting
- Sales reporting
- Inventory & merchandising optimization
- Options trading
- ICU patient monitoring
- Disease surveillance
- Transportation network optimization
- Store performance
- Environmental analysis
- Experimental research
Here are a few use cases by industry to give an idea
of the breadth of possibilities that stream technology, along with other
big data products, can offer. To explore more details, click on the industry
title below.
Automotive
- Data warehouse optimization
- Predictive asset optimization
- Connected vehicle
- Actionable customer insight
Banking
- Optimize offers and cross sell
- Contact center efficiency and problem resolution
- Payment fraud detection and investigation
- Counterparty credit risk management
Consumer Products
- Optimized promotions effectiveness
- Micro-market campaign management
- Real-time demand forecast
Energy and Utilities
- Distribution load forecasting and scheduling
- Create targeted customer offerings
- Condition-based maintenance
- Enable customer energy management
- Smart meter analytics
Government
Geospatial analysis requires complex mathematics such
as set theory and geospatial geometry. It is used for location intelligence and
location-based services for security and surveillance, geographic information
systems, traffic patterns and more. The city of Dublin, Ireland,
uses InfoSphere Streams to analyze 50 bus locations per second for its
fleet of roughly 1,000 buses. Check it out - http://www-01.ibm.com/software/success/cssdb.nsf/CS/RNAE-9C9PN5?OpenDocument&Site=software&cty=en_us
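As a small illustration of the kind of geospatial calculation involved, the sketch below uses the standard haversine formula to find which vehicles are within a given radius of a point. It is plain Python, not the Streams geospatial toolkit; the function names, the coordinates and the 0.5 km radius are hypothetical and are not taken from the Dublin deployment.

```python
import math
from typing import Iterable, Iterator, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def buses_near(
    positions: Iterable[Tuple[str, float, float]],
    stop: Tuple[float, float],
    radius_km: float,
) -> Iterator[str]:
    """Yield the id of each bus whose reported position is within radius_km of the stop."""
    for bus_id, lat, lon in positions:
        if haversine_km(lat, lon, stop[0], stop[1]) <= radius_km:
            yield bus_id


# Hypothetical stop location and position reports (bus id, latitude, longitude).
stop = (53.3498, -6.2603)
reports = [
    ("A", 53.3500, -6.2600),  # a few tens of metres away
    ("B", 53.4000, -6.2600),  # several kilometres north
]

nearby = list(buses_near(reports, stop, radius_km=0.5))
print(nearby)
```

Each position report is checked independently, so the same function can be applied to reports one at a time as they arrive.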
- Threat prediction and prevention
- Social program fraud, waste and errors
- Tax compliance - fraud and abuse
- Crime prediction and prevention
Healthcare
- Measure and act on population health
- Engage consumers in their healthcare
- Health monitoring and intervention
Insurance
Knowing the order of events can have profound
impacts, for example in predicting the path of a natural disaster or picking
the next best stock trade. InfoSphere Streams helps insurance companies plan
for natural disasters and enables real-time public alerts. It also performs
real-time analysis of sensor data collected from the Hudson River, one of the most instrumented bodies of
water in the world. Check this out - https://www.youtube.com/watch?v=y3CZQOtVx6s&list=PLA98824D75176BAEB&index=18
- Claims fraud detection
- Next best action and customer retention
- Catastrophe risk modeling
- Usage-based insurance
- Portfolio management
- Producer optimization
Oil & Gas
- Advanced condition monitoring
- Drilling surveillance & optimization
- Production surveillance & optimization
Retail
- Merchandise optimization
- Actionable customer insight
Telecommunications
Telecommunications service providers continue to
experience a huge growth in smartphone and mobile device use. Growing text and
data usage creates a deluge of context- and time-sensitive data. InfoSphere
Streams enables telecommunications providers to analyze billions of call data
records per day to detect fraud, ensure high asset utilization and create
accurate customer profiles for heightened customer service and retention. Using
InfoSphere Streams, Sprint reduced storage
costs by 90 percent. Check this out -
https://www.youtube.com/watch?v=eg8KSLAZ2HM&feature=player_embedded
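One common building block for this kind of call data record (CDR) analysis is a windowed aggregate. The sketch below, in plain Python rather than any Streams API, counts calls per caller in fixed (tumbling) time windows and flags unusually busy callers. The record layout `(timestamp_seconds, caller)`, the 60-second window and the limit of three calls are hypothetical choices for illustration, and the input is assumed to arrive in timestamp order.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def calls_per_window(
    cdrs: Iterable[Tuple[int, str]], window_seconds: int
) -> Iterator[Tuple[int, Counter]]:
    """Group time-ordered (timestamp, caller) records into tumbling windows.

    Yields (window_start, per-caller call counts) each time a window closes.
    """
    current_start = None
    counts: Counter = Counter()
    for ts, caller in cdrs:
        start = ts - ts % window_seconds  # start of the window this record falls in
        if current_start is None:
            current_start = start
        elif start != current_start:
            yield current_start, counts  # previous window is complete
            current_start, counts = start, Counter()
        counts[caller] += 1
    if current_start is not None:
        yield current_start, counts  # flush the last, possibly partial, window


# Hypothetical call records: (seconds since start, caller id).
records = [(0, "A"), (10, "A"), (20, "B"), (65, "A"), (70, "A"), (80, "A")]

windows = list(calls_per_window(records, window_seconds=60))
suspicious = [
    (start, caller)
    for start, counts in windows
    for caller, n in counts.items()
    if n >= 3  # hypothetical limit for "too many calls in one window"
]
print(suspicious)
```

Only the counts for the current window are held in memory, so the approach does not depend on how many records have already been processed.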
- Pro-active call center
- Smarter campaigns
- Network analytics
- Location-based services
Travel & Transportation
- Customer analytics and loyalty marketing
- Capacity & pricing optimization
- Predictive maintenance optimization
Reference:
- InfoSphere Streams Playbook
- Real Time Analytic Processing with IBM InfoSphere Streams
- InfoSphere Streams Information Center
- White paper: IBM InfoSphere Streams: Redefining Real Time Analytics
- SPSS and InfoSphere Streams
- IBM InfoSphere Streams Quick Start Edition software can be downloaded free from here: http://www.ibm.com/software/data/infosphere/streams/quick-start/