Rethinking Enterprise Data Pipelines: One Pipeline for Any Source and Any Target
Andy Blum
July 7, 2025

The Enterprise Problem: Pipeline Chaos
Across enterprises, data integration is fragmented. Every new source and target, whether it is Salesforce, Snowflake, APIs, or cloud storage, leads to another custom-built pipeline.
Each pipeline becomes a separate project with its own logic, codebase, and monitoring stack. The result:
- Duplicate effort, as teams rebuild the same processes repeatedly
- Inconsistent standards that make governance and auditing difficult
- Long delays in delivering new data integrations
- Increased costs from maintaining redundant systems
When you step back, nearly every data flow follows the same pattern. Data moves from a Source to a Target, passing through common stages along the way.
"Why rebuild that pattern every single time?"
The Shift: Metadata-Driven, Universal Pipelines
We took a different approach.
Instead of writing new visual or programmatic routines for every source-to-target connection, we describe each pipeline through a single universal schema. That schema tells the engine which source to read, which target to write, which behaviors to run, and which transformations to apply.
At runtime, the pipeline reads this metadata and configures itself dynamically for each job, allowing one pipeline framework to process any combination of sources and targets.
This approach works with our framework, which powers EveryArrow, and with any engine that supports runtime parameter injection, including Spark, Airflow, NiFi, and your in-house tools.
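To make the idea concrete, here is a minimal sketch of what a metadata-driven pipeline definition could look like. The field names and structure below are illustrative assumptions for this article, not DITTOE's actual schema:

```json
{
  "pipeline": "salesforce_accounts_to_snowflake",
  "source":  { "type": "API",      "system": "salesforce", "object": "Account" },
  "target":  { "type": "DATABASE", "system": "snowflake",  "table": "ANALYTICS.ACCOUNTS" },
  "behaviors": ["deduplicate", "mask_pii"],
  "mappings": [
    { "from": "Name",          "to": "ACCOUNT_NAME" },
    { "from": "AnnualRevenue", "to": "REVENUE_USD" }
  ]
}
```

Point the source block at a file or a stream instead, and the same engine runs the job; nothing else changes.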
For enterprise teams, this is more than an engineering shortcut. It is the foundation of a consistent standard across all data integrations, reducing friction and improving speed.
My Journey to This Solution
This idea is not theoretical. It is the result of decades of solving these problems inside some of the world's most complex organizations.
In the 1990s, I built one of the first metadata-driven programming tools. Applications described themselves in data instead of code.
Later, as a technical architect at companies like Pepsi, HarperCollins, Merck, Morgan Stanley, and NBC, I applied the same approach. Even before ETL became a defined practice, I was building metadata layers on top of tools like Informatica and SSIS, which allowed the same pipeline structure to run hundreds of distinct processes. Those patterns once saved a merger from going awry: when Macquarie Bank set out to acquire Delaware Investments, the teams discovered that the mainframe was a giant ETL machine, and hundreds of routines had to be replaced within a few months or the deal would incur heavy fines under the agreement.
The problem reappeared in the Big Data era. At a Fortune 200 company, I used metadata-driven designs to reduce a 50-person data engineering team to 8 people while delivering better results.
But there was resistance. A Big Four consulting leader once told me, "Your ideas worked when we tried them. But the person who made them work did not get a Christmas bonus." Faster results conflicted with the consulting model, where profitability depends on large teams, not small ones.
Building the Future of Data Integration
Rather than compromise, I invested nearly a million dollars of my own capital and lost income to build a better way.
The result is DITTOE: Data Integration, The Theory of Everything.
The name is a nod to both the universal nature of the framework and the old ditto machines that copied information from one page to another.
DITTOE is built on first principles, encapsulated in a single formula: every data integration routine ever written moves resources from a Source to a Target. Those resources fall into four fundamental categories: Files, APIs, Databases, and Streams, or what we call FADS. Irony alert: these are the least faddish facets of computing.
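As a rough model of those categories, here is a minimal Java sketch. The type and field names are hypothetical, chosen for illustration rather than taken from DITTOE:

```java
// The four fundamental resource categories: Files, APIs, Databases, Streams (FADS).
enum ResourceType { FILE, API, DATABASE, STREAM }

// Every integration is a Source and a Target, each typed by a FADS category.
// Endpoint and Pipeline are illustrative names, not DITTOE's API.
record Endpoint(ResourceType type, String system, String locator) {}
record Pipeline(Endpoint source, Endpoint target) {}

class FadsDemo {
    public static void main(String[] args) {
        // Any FADS source can pair with any FADS target: 4 x 4 combinations.
        Pipeline p = new Pipeline(
            new Endpoint(ResourceType.API, "salesforce", "Account"),
            new Endpoint(ResourceType.DATABASE, "snowflake", "ANALYTICS.ACCOUNTS"));
        System.out.println(p.source().type() + " -> " + p.target().type());
    }
}
```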
Every data flow follows the same core stages. By maintaining consistent activity swim lanes across pipelines, we enhance both governance and analysis. Whether it is a scheduled batch job, a real-time event stream, or an AI Agent MCP request, the pattern remains the same.
A Cloud-Native Framework for Today's Workloads
DITTOE is more than a schema. It is a production-ready Java framework that powers EveryArrow.io and is available for your enterprise.
Key features include:
- Defining pipelines as JSON metadata that describes sources, targets, transformations, mappings, and behaviors
- Configuring behaviors such as encryption, masking, deduplication, and even AI Agents dynamically at runtime, without additional code (see the sketch after this list)
- Securing pipelines with Role-Based Access Control, secrets management, authentication, and private key access
- Handling pipeline dependencies, retries, and error recovery automatically
- Generating DITTOE schemas from your organization's own metadata
- Integrating seamlessly with any File, API, Database, or Stream; Snowflake, BigQuery, Salesforce, JDBC/ODBC targets, and REST APIs work today, with more to come
- Supporting triggers from schedulers, message queues, REST APIs, file watchers, and user interfaces
- Providing a full observability layer, audit trails, and real-time monitoring in a SaaS UI
- Running each pipeline in a temporary cloud server that shuts down when the work is done, reducing infrastructure costs dramatically
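To illustrate the runtime-configuration idea from the second bullet, here is a minimal Java sketch of how an engine might map behavior names from metadata onto pluggable steps. Everything here, from BehaviorEngine to the behavior names, is a hypothetical stand-in, not DITTOE's internals:

```java
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Illustrative sketch: behaviors resolved from metadata at runtime, no new code per job.
class BehaviorEngine {
    // Registry mapping metadata behavior names to record-level transformations.
    private static final Map<String, UnaryOperator<String>> REGISTRY = Map.of(
        "mask_pii", rec -> rec.replaceAll("\\d(?=\\d{4})", "*"), // mask all but last 4 digits
        "uppercase", String::toUpperCase
    );

    // Compose the behaviors named in the pipeline's metadata into one step.
    static UnaryOperator<String> fromMetadata(List<String> behaviorNames) {
        return behaviorNames.stream()
            .map(name -> REGISTRY.getOrDefault(name, UnaryOperator.identity()))
            .reduce(UnaryOperator.identity(), (a, b) -> rec -> b.apply(a.apply(rec)));
    }

    public static void main(String[] args) {
        var step = fromMetadata(List.of("mask_pii", "uppercase"));
        System.out.println(step.apply("card 4111111111111111")); // CARD ************1111
    }
}
```

The point of the sketch is the design choice: behaviors live in a registry, and the metadata picks which ones run, so adding a behavior to a pipeline is a configuration change rather than a code change.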
In November, we validated this framework by moving billions of rows of historical election data into Snowflake and BigQuery. We outperformed traditional ETL tools in both speed and cost.
The Takeaway for Enterprise Leaders
If your company is still hand-building pipelines for every new source and target, you are wasting time and money. Worse, you are making governance and scalability nearly impossible.
The solution is not another tool. It is a universal pipeline standard that your entire enterprise can align around, regardless of the pipeline engine you run beneath it.
DITTOE is our implementation of that standard. However, the approach is one that every enterprise should adopt to build faster and smarter.