IT organizations around the world are actively wrestling with the practical challenges of creating a big data program. The tantalizing combination of advanced analytics, a wide variety of interesting new data sets, an attractive cost model, and proven scientific rigor puts big data on firm footing as an investment target for CIOs. But because of the rapid innovation cycle, the dizzying variety of new tools, and the relative newness of many of the key concepts (e.g., microservices, containerization, schemaless database operations, and data science), some IT organizations are finding it difficult to know where to begin and, perhaps most importantly, how to progress and avoid having their big data ambitions derailed.
This article offers IT leaders and enterprise architects guidance on what to plan for and what to expect as they progress through four phases of big data adoption:
• Phase 1. Experimentation — Understanding the capabilities of a big data platform
• Phase 2. Implementation — Developing first production use cases
• Phase 3. Expansion — Expanding to multiple use cases across the company
• Phase 4. Optimization — Optimizing and integrating apps on the converged data platform
These phases represent discernible patterns that we have observed across hundreds of engagements with our production customers.
In the first phase of the big data journey, companies are exploring how the Hadoop/Spark ecosystem works and how it can fit in with their existing enterprise data architecture. Those organizations that are trying to tackle the scale and richness of this new data with standard tools (RDBMS, enterprise middleware, premium storage arrays, and data warehouses) find that traditional IT economics don’t make sense beyond a certain threshold.
Initially, IT organizations are motivated by two factors. The first is IT preparedness. IT may not always drive or control big data adoption, but it does control its own readiness to implement big data when called upon. The second factor is cost takeout. The combination of low-cost commodity servers, cheap direct-attached storage, and free open source software provides an enticing reason to explore big data software.
An IT-driven big data program will typically start with cost takeout use cases such as a data warehouse offload, where cool or cold data is moved to Hadoop. This can reduce spending on premium storage, server hardware, and database licenses. Other targets include mainframe offload, lowering dependence on premium storage arrays, and even replatforming RDBMS applications.
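As a rough illustration of how a team might shortlist tables for a data warehouse offload, the sketch below labels each table hot, cool, or cold by the number of days since it was last accessed. The table names, sizes, and the 90-day and 365-day thresholds are hypothetical values chosen for the example, not recommendations; a real program would take them from its own warehouse usage statistics.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TableStats:
    """Usage statistics for one warehouse table (hypothetical shape)."""
    name: str
    size_gb: float
    last_accessed: date


def temperature(stats, today, cool_after_days=90, cold_after_days=365):
    """Label a table hot, cool, or cold by days since last access.

    The default thresholds are assumptions made for this illustration.
    """
    age_days = (today - stats.last_accessed).days
    if age_days >= cold_after_days:
        return "cold"
    if age_days >= cool_after_days:
        return "cool"
    return "hot"


def offload_candidates(tables, today):
    """Return the tables that are cool or cold, in their original order."""
    return [t for t in tables if temperature(t, today) != "hot"]


# Hypothetical inventory, invented for the example.
tables = [
    TableStats("orders_current", 120.0, date(2023, 12, 20)),
    TableStats("orders_2022", 800.0, date(2023, 8, 1)),
    TableStats("clickstream_2019", 2400.0, date(2022, 6, 15)),
]
today = date(2024, 1, 1)

candidates = offload_candidates(tables, today)
reclaimable_gb = sum(t.size_gb for t in candidates)

for t in candidates:
    print(t.name, temperature(t, today), t.size_gb)
print("Premium storage that could be reclaimed (GB):", reclaimable_gb)
```

Access age is only one possible signal. Query frequency, regulatory retention rules, and downstream report dependencies would likely also factor into a real offload decision.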
As for concrete actions, start by identifying a team or creating the beginnings of a big data Center of Excellence (COE) that includes distinguished staffers from IT Ops (for cluster administration and data management strategies), the BI/DW/Analytics team, and development. Engage in technology experimentation, including proofs of concept (POCs) and competitive bake-offs between vendors and approaches, such as Hadoop/Spark vendors, NoSQL databases, and commercial vs. open source analytics.
In the implementation phase, companies are ready to start creating production use cases. They should be confident in their ability to build a solution on this platform. A key step is creating a professional big data discipline that moves beyond the proof of concept or proof of value stage, and big data proponents must take steps to establish key platform components (like Spark, Hadoop, Elasticsearch, etc.) as permanent parts of their technology portfolio.
The most common production use case in Phase 2 is an enterprise data lake or data hub that serves as the basis for a broad spectrum of analytics. Other common use cases are analytics applications that move closer to the business. These could include security log analytics, fraud detection, and sales or marketing use cases such as creating a 360-degree view of the customer, reducing customer churn, and building a recommendation engine.
The implementation phase is all about creating a Lighthouse Win. The big data program team must drive a small number of use cases into production that deliver a demonstrable benefit to the business. These early use cases will set the tone for the entire program going forward. Looking ahead, the next step is to expand the set of use cases through an ideation process.
The expansion phase is where your big data program accelerates rapidly. Success with early use cases shows others in the company what can be accomplished. Implementing new applications will be a little easier since the platform is already in place. However, if the use cases cross organizational boundaries, then multi-tenancy, security, and data governance become concerns.
Now that your organization has a firm handle on the tools, data sources, formats, and algorithms, dozens if not hundreds of use cases become reasonable investment targets. With this expanded understanding of the big data ecosystem, the focus on use cases often turns to the organization’s vertical industry. For financial services, it may be risk management, fraud detection, and large-scale self-service analytics. For healthcare, the focus may be on reducing readmissions, population health management, and coordinating smart medical devices.
Organizations are now focused on industrializing their big data capability and making it relevant to all lines of business (LOBs). That means establishing dedicated IT or development staff and a dedicated cluster, and providing budget, staffing, and training for big data teams in central IT, BI/DW, and enterprise development, as well as for LOB leads.
In this final phase, there is critical mass in the organization and all LOBs are seeing the benefits of big data. You should leverage this capability to improve business and operational processes, which will reshape the business and give you a competitive edge. The goal is to be predictive about most aspects of the business and to respond and change operations in real time across more than one line of business.
At this point, you may have created a Chief Data Officer (CDO) role and hired data scientists and data engineers to look for problems that no one has solved yet. There is deep investment in and commitment to data science across all lines of business. Any actionable insights that are created have the potential to be automated. All key decision makers should have real-time visibility into the performance of their operations.
Admittedly, the number of organizations at this advanced level is relatively small. However, companies that are operating at this level are disrupting their industries and changing our notion of how responsive and adaptable a modern enterprise can be. As in any important discipline, it pays to observe the most advanced players to help you raise your own game and progress successfully through the four phases of big data adoption.