Enterprises need to start asking whether they should keep building out physical Hadoop compute and storage infrastructure. Until recently, the only way to perform production compute on big data was on clusters of physical nodes running some Hadoop distribution with HDFS, mostly as YARN batch jobs. Data IT organizations spent a great deal of effort convincing corporate business units to bring their data into the “Data Lake.” However, ingestion is one thing; consumption is a different consideration altogether. For large enterprises that succeeded with data consolidation, this meant their infrastructure continued to grow. It’s time to start thinking about making it shrink instead, because Big Data is now just Data.
While business units typically have a few data experts who can write SQL queries and do useful joins within their domain, most requests for reshaping, curating or creating special views of data are handled as IT projects. The problem is that you can never hire enough data engineers to keep up with consumption demand from the business, and it’s not even the type of work most of them like to do. There are many data tools, some licensed and some open source; however, most are made for the power users or data engineers who can already bring about real business value. The key is to stratify the architecture of your environment and match the best solution to each layer. By abstracting away the physical attributes of the data and where it’s stored, you can decouple data access from data storage and decouple the consumption platform from access methods. This makes consumption more self-service.
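As a rough illustration of that decoupling, here is a minimal Python sketch. Everything in it is hypothetical: the `Catalog` and `Dataset` names, the two reader functions and the storage URIs are made up for the example, and the "backends" are in-memory stand-ins rather than real HDFS or object-store clients. The point is only that consumer code asks for a logical name and never sees where the data lives.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


@dataclass
class Dataset:
    """A logical dataset: consumers see the name, not the storage location."""
    name: str
    location: str                            # hypothetical physical URI
    reader: Callable[[str], Iterable[Row]]   # backend-specific access method


class Catalog:
    """Maps logical names to whichever backend currently holds the data."""

    def __init__(self) -> None:
        self._datasets: Dict[str, Dataset] = {}

    def register(self, dataset: Dataset) -> None:
        # Re-registering a name repoints it at a new backend.
        self._datasets[dataset.name] = dataset

    def read(self, name: str) -> List[Row]:
        ds = self._datasets[name]
        return list(ds.reader(ds.location))


# Stand-in backends: in-memory fakes, not real storage clients.
def read_from_hdfs(location: str) -> Iterable[Row]:
    return [{"region": "east", "sales": 120}, {"region": "west", "sales": 80}]


def read_from_object_store(location: str) -> Iterable[Row]:
    return [{"region": "east", "sales": 120}, {"region": "west", "sales": 80}]


def total_sales(catalog: Catalog) -> int:
    """Consumer code: it knows only the logical name 'sales'."""
    return sum(row["sales"] for row in catalog.read("sales"))


catalog = Catalog()
catalog.register(Dataset("sales", "hdfs://cluster/warehouse/sales", read_from_hdfs))
before = total_sales(catalog)

# Move the data to object storage; the consumer function is untouched.
catalog.register(Dataset("sales", "s3://example-bucket/sales", read_from_object_store))
after = total_sales(catalog)
```

In this sketch the storage move is a one-line change to the catalog, and `total_sales` returns the same answer before and after it.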
Much progress has already been made in data science use cases. For example, Spark can run without Hadoop, and commercial or in-house applications can read and write data in Azure Blob or AWS S3 storage to perform data wrangling or analysis; HDFS is not needed. To replace the scaling and resource management capabilities provided by Hadoop, you can use Mesos or Kubernetes along with a wider choice of orchestration tools. Spin up an Azure Databricks (Spark as a Service) compute cluster in the cloud that reads and writes Azure Blob storage, provision all the connections, and you’re good to go. Programmable auto-scaling and auto-termination of Spark clusters based on workload patterns can drastically reduce costs, so business people can choose how fast they want results based on their budget.
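To make the speed-versus-budget trade-off concrete, here is a back-of-the-envelope sketch in Python. The workload size and the hourly rate are invented numbers, not real cloud prices, and the model assumes perfect linear scaling, which real Spark jobs rarely achieve. It only shows the shape of the arithmetic: with pay-per-use clusters, cost follows node-hours actually consumed, so idle time is what auto-termination removes.

```python
def job_hours(work_node_hours: float, workers: int) -> float:
    """Wall-clock hours for a job, assuming (optimistically) linear scaling."""
    return work_node_hours / workers


def cluster_cost(workers: int, hours_up: float, rate_per_node_hour: float) -> float:
    """Cost of keeping `workers` nodes up for `hours_up` hours."""
    return workers * hours_up * rate_per_node_hour


WORK = 40.0   # hypothetical job size, in node-hours
RATE = 0.50   # hypothetical price per node-hour

# Small cluster: slower result, same bill under the linear-scaling assumption.
small_hours = job_hours(WORK, 4)
small_cost = cluster_cost(4, small_hours, RATE)

# Large cluster: faster result for the same node-hours.
large_hours = job_hours(WORK, 20)
large_cost = cluster_cost(20, large_hours, RATE)

# The small cluster left running all day with no auto-termination.
idle_cost = cluster_cost(4, 24.0, RATE)
```

Under these assumptions the 4-node and 20-node runs cost the same, while the cluster that is never shut down costs more than twice as much for the same work.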
There will be little innovation in Hadoop going forward. I expect Spark and other Apache projects to make Hadoop obsolete within three years. Hadoop is 10 years old; that’s old even in elephant years. I see Hadoop being relegated to the island of misfit legacy systems. Its strength as an inexpensive storage repository and ETL batch workload engine has lost its muscle. Most everyone agrees that legacy big data distributions are not suitable for interactive, client-facing applications. Business people want to join, reshape and curate their structured or unstructured data - regardless of where it is. The time is right to free the data, and let the elephant go while you’re at it.
The widespread use of open source container technology has enabled many development shops to quickly adopt DevOps, hybrid cloud and microservices. Unlike virtual machines, containers let you apply “minimalism” by paring down to only the parts needed to run whatever the container was designed to do. Containers avoid packing non-essential functions into the same VM or physical machine.
Think of containers in two parts. At the base level, they are an OS kernel technology that can run anywhere, whether on premises on a virtual host or in the public cloud. They provide access to resources like networking, security and storage. Containers are the instances that run on virtualized operating systems. The second part is the payload: the application code and its dependencies, optimized by DevOps teams as a unit of work. The payload itself can be anything, whether an isolated package of legacy code or modern code based on microservices. The real value of containerized infrastructure is that it allows enterprises to quickly introduce “born-in-the-cloud” microservices running in the building-block environment that containers enable.
Microservices is a modular software architecture in which application components are designed and advanced independently of each other and of their surrounding environment. They are built in an agile way by small teams creating discrete modules with published APIs. As long as these APIs are managed effectively, you can develop at a much faster pace using a team of teams building, testing and publishing the many parts that make up a large application. All functionality is decomposed and separated at an architectural level, allowing every team to develop and experiment without the need to coordinate every change. You don’t have to worry about breaking something outside of your own code stack. The culture supports better creativity and innovation, which is then amplified by CI/CD (Continuous Integration / Continuous Delivery) automation tools. This sounds wonderful; however, there is one important thing to remember. You have to spend extra time and effort up front on API integration, design principles and architecture to make sure all the services can talk to each other. You have to resist the temptation to just sit down and start writing code. Many tech companies with complex application platforms use these approaches, including Amazon, Facebook and Google. I see the combination of containers and microservices becoming the force multiplier many organizations are looking for to deliver working code faster with the highest business value.
It’s interesting that contemporary architectural approaches such as factoring and tiered layers are being aggressively applied to enterprise blockchain digital contracts. These Smart Contracts consist of three basic components: the logic, the static and variable properties, and the ledger. They are deployed on a single computer or node, and then the node is replicated so that exactly the same results are produced for each contract step no matter which node in the network produces them. In trustless environments like Bitcoin or Ethereum, counterparties place their trust in cryptography. For enterprise consortium networks, a semi-trust construct can be formed where certain levels of identity are required. The contract can be separated into three layers familiar to developers - the data, business and presentation layers. For Smart Contracts, the properties represent the data schema, the logic represents code and the ledger represents a database. Complex business logic is removed from the execution path so that the data tier can be optimized to reflect the distributed nature of a network.
Dependent services the code needs can be provided in a “fabric” where the code can execute and then send transactions to blockchain nodes, where they are bound to the contract’s schema in the data layer. The host containers running in the fabric are referred to as “Cryptlets,” and the nice thing is that they can run on a different compute platform or cloud rather than having to be executed across every node in a network. This framework allows developers to write in their preferred programming languages and deploy into environments that meet the business needs of a Smart Contract. So, going back to our familiar tiered architecture, we can summarize the analogies. The data layer defines the data schema and contains only logic for validating inserts, written in languages like Solidity and Ivy. The business layer contains the logic for the Smart Contracts and the APIs for interacting with the presentation layer. These are the Cryptlets, which can be written in any language. The presentation layer handles interfaces to other platforms or applications built using the exposed APIs of the Cryptlets.
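The three tiers can be sketched in a few lines of Python. This is a toy model under stated assumptions, not an implementation of any real blockchain or Cryptlet framework: the schema fields, the `Node` class, the `price_order` discount rule and the `submit_order` entry point are all invented for illustration. It shows the division of labor described above: the data layer only validates inserts, the business logic runs once outside the nodes, and every replicated node ends up with the same ledger.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

Tx = Dict[str, Any]

# Data layer: the schema plus validation of inserts, and nothing else.
SCHEMA = {"buyer": str, "seller": str, "amount": int}


def validate(tx: Tx) -> None:
    if set(tx) != set(SCHEMA):
        raise ValueError("transaction fields do not match the schema")
    for key, expected in SCHEMA.items():
        if not isinstance(tx[key], expected):
            raise ValueError(f"{key} must be of type {expected.__name__}")
    if tx["amount"] <= 0:
        raise ValueError("amount must be positive")


@dataclass
class Node:
    """One replicated node holding its own copy of the ledger."""
    ledger: List[Tx] = field(default_factory=list)

    def append(self, tx: Tx) -> None:
        validate(tx)                  # every node runs the same cheap check
        self.ledger.append(dict(tx))


# Business layer: hypothetical Cryptlet-style logic, run once off the nodes.
def price_order(buyer: str, seller: str, units: int, unit_price: int) -> Tx:
    discount_pct = 10 if units >= 100 else 0   # made-up business rule
    amount = units * unit_price * (100 - discount_pct) // 100
    return {"buyer": buyer, "seller": seller, "amount": amount}


# Presentation layer: the entry point another application would call.
def submit_order(nodes: List[Node], buyer: str, seller: str,
                 units: int, unit_price: int) -> Tx:
    tx = price_order(buyer, seller, units, unit_price)
    for node in nodes:
        node.append(tx)
    return tx


nodes = [Node() for _ in range(3)]
tx = submit_order(nodes, "hospital-a", "supplier-b", units=100, unit_price=5)
```

Because the pricing rule runs once in the business layer, the nodes only have to agree on a validated record, which is why all three ledgers in the sketch are identical.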
I expect Smart Contracts based on blockchain technology to become more prevalent in healthcare during the next three years. Think of your medical record and provider encounters forming a digital care path where every trusted entity along the way adds a “block” to your healthcare blockchain - whether it’s an x-ray, biometric data, a lab report, a prescription or a doctor’s notes. You control the key, and your identity is verified and protected along with that of every other counterparty in the ecosystem.
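The “block” idea can be illustrated with a minimal hash-linked list in Python. This is a sketch only: the record contents and author names are invented, and it deliberately leaves out the parts that matter most in practice, such as digital signatures, patient-held keys, identity verification and consensus between parties. It shows just the chaining: each block stores the hash of the previous one, so altering an earlier record is detectable.

```python
import hashlib
import json
from typing import Any, Dict, List

Block = Dict[str, Any]
GENESIS_HASH = "0" * 64


def block_hash(block: Block) -> str:
    """SHA-256 over the block's contents (everything except its own hash)."""
    body = {k: block[k] for k in ("index", "author", "record", "prev_hash")}
    payload = json.dumps(body, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain: List[Block], author: str, record: Dict[str, str]) -> Block:
    """Append a record, linking it to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    block: Block = {
        "index": len(chain),
        "author": author,
        "record": record,
        "prev_hash": prev_hash,
    }
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def is_valid(chain: List[Block]) -> bool:
    """Check every link and recompute every hash."""
    prev_hash = GENESIS_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block):
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical care path: three providers each add one block.
care_path: List[Block] = []
add_block(care_path, "radiology", {"type": "x-ray", "result": "clear"})
add_block(care_path, "lab", {"type": "lab report", "result": "normal"})
add_block(care_path, "pharmacy", {"type": "prescription", "drug": "example-drug"})
```

Calling `is_valid(care_path)` on the untouched chain returns `True`; editing any stored record afterward makes it return `False`, because the recomputed hash no longer matches.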
Historically, the term Internet of Things (IoT) has described a physical network of dedicated embedded objects that sense and interact with their own and external environments. Advances in context-aware software that can “learn and analyze” are creating scenarios where things become active players in digital relationships. Imagine a near future where “Things” become independent business entities with a pre-determined capacity to act like “customers” or “suppliers” within a commercial construct. Through automation, Things would be able to make their own purchasing decisions, receive messages, request service, negotiate for the best terms and report disputes - essentially just as a human would. Along the same growth trajectory is “algorithmic business,” where the interaction, exchange, interplay and network effect of value is encapsulated in programming logic and inserted into the transaction flow between customers and suppliers. At the intersection of these two trends lie not only new opportunities for revenue generation and operational efficiencies but also new ways of managing relationships. Much as we do today with Customer Relationship Management (CRM), leaders will need to develop strategies for Thing Relationship Management (TRM).
A useful example would be a Thing that has a service utility requiring replenishment of supply, such as a soap dispenser in a hospital. The monitoring system detects that it needs a refill and, before sending an alert to housekeeping, checks the on-site inventory. Pre-determined business logic might require refilling within an hour, and if the on-site inventory is depleted, the Thing initiates a refill order with the preferred supplier. However, the preferred supplier cannot deliver until the following day, and this is where the Thing becomes a proactive commercial participant in the supply chain. It sends successive requests to alternative suppliers and negotiates the best price and terms for delivery. It places the order when it finds a supplier who can meet the specification. The implications of this scenario are far-reaching. The Thing will need a digital identity, delegated authority, trust levels and financial compliance for auditing, just to name a few. Managing these attributes would be very similar to how we manage relationships today in the sales process. Things would essentially be viewed like “people” within a broad set of commercial transactions. I expect to see an adaptation of CRM to TRM in the very near future.
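The dispenser’s decision flow can be sketched as a small Python function. The `Quote` type, the supplier names, the prices and the two specification limits (`max_delivery_hours`, `max_price`) are all assumptions made up for the example, and real negotiation, identity and audit requirements are not modeled. The sketch only captures the order of checks in the scenario: on-site stock first, then the preferred supplier, then the best acceptable alternative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Quote:
    supplier: str
    price: float          # hypothetical price per refill case
    delivery_hours: int   # hours until delivery


def plan_refill(on_site_units: int, preferred: Quote, alternatives: List[Quote],
                max_delivery_hours: int, max_price: float) -> str:
    """Return the action the Thing takes, following the scenario's order."""

    def meets_spec(quote: Quote) -> bool:
        return (quote.delivery_hours <= max_delivery_hours
                and quote.price <= max_price)

    # 1. On-site inventory available: just alert housekeeping.
    if on_site_units > 0:
        return "alert housekeeping: refill from on-site inventory"

    # 2. Inventory depleted: try the preferred supplier first.
    if meets_spec(preferred):
        return f"order from {preferred.supplier}"

    # 3. Preferred supplier cannot meet the spec: compare the alternatives.
    acceptable = [q for q in alternatives if meets_spec(q)]
    if not acceptable:
        return "escalate to a human: no supplier meets the specification"

    # Best price wins; faster delivery breaks ties.
    best = min(acceptable, key=lambda q: (q.price, q.delivery_hours))
    return f"order from {best.supplier}"


preferred = Quote("preferred-co", price=18.0, delivery_hours=24)
alternatives = [
    Quote("alt-a", price=22.0, delivery_hours=3),
    Quote("alt-b", price=20.0, delivery_hours=4),
    Quote("alt-c", price=15.0, delivery_hours=30),
]

decision = plan_refill(0, preferred, alternatives,
                       max_delivery_hours=4, max_price=25.0)
```

With these made-up quotes, the preferred supplier and `alt-c` are too slow, so the Thing chooses the cheaper of the two suppliers that can deliver in time. The explicit escalation branch reflects the delegated-authority point: the Thing acts only inside limits someone has set for it.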
Medical institutions are showing growing interest in sharing clinical algorithms through proven open marketplace models. The idea is to provide advanced analytic algorithms freely while charging provider delivery organizations for other related tools. Hospitals, surgical centers, home-health agencies, outpatient facilities, labs and urgent care centers do not have economical access to best-in-class analytic engines without contracting with specialty vendors. Many of these provider groups have their own data scientists, researchers and clinicians mining and analyzing their vast amounts of healthcare information. However, developing and testing algorithms designed to improve quality, patient outcomes or administrative operations using this data is expensive and time-consuming.
So what are these algorithms? Authored by some of the most advanced data and medical scientists in the world, they include clinical pathways, protocols, quality and safety measures and disease predictors, just to name a few. They are all evidence-based and offered by organizations such as the Mayo Clinic, Cleveland Clinic and many other medical institutions globally. Technology providers in Big Data and Analytics have historically offered these tools and services to their provider customers. But now the medical centers themselves are offering to share, promote and sell their expertise and knowledge via marketplace models such as those offered by Apervita or Teradata’s Aster Community. These are not simple formulas or canned reports. They are precision algorithms designed to produce the highest level of accuracy with the lowest level of false positives for clinical outcomes. With widespread use, peer review and outcome case studies by provider users, the best ones will naturally rise in popularity and quality ratings. Much of consumer selection of products and services is “review” based, with recommendations and star-type ratings available from relevant online sites. More of this transparency is coming to healthcare.
These “pre-packaged” insights are important to providers who are moving to value-based healthcare, where delivering preemptive and predictive clinical decisions at the point of care is critical. By improving patient outcomes with fewer repeat visits and less trial and error, we believe this lowers the overall cost of healthcare for everyone. Many providers are not equipped to create and deliver an extensive portfolio of predictive or prescriptive models to improve population health. An open marketplace reduces or prevents vendor lock-in, and by testing and validating the offered models with their own data, providers also make them better. It’s a continuous loop of discovery, testing and validation that is driven by data portability, self-service and open access. More importantly, this forces these disparate providers and institutions to adopt new valuation models for information assets. Through subscription fees on a trading platform, stakeholders create monetization opportunities to fund ongoing investment in their own capabilities. As industry participants learn how to sell, trade or license their intellectual property, while providing much of it freely, they can help drive innovation in the healthcare sector, where it is greatly needed.