From Whiteboard to Workflow: How Diskover Built Its Openflow Connector for Snowflake

Summary of our recent article published on the Snowflake Medium blog

Traditional ingestion methods rely on custom-built JDBC connectors – flexible, but difficult to scale, maintain, and deploy. Diskover needed a different approach. Snowflake Openflow provided a low-code, processor-driven framework that handled much of the orchestration behind the scenes, while Kafka acted as the durable, future-proof transport layer.
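
To illustrate Kafka's role as the transport layer, here is a minimal sketch that publishes one metadata record to a topic with the segmentio/kafka-go client. The broker address, topic name, message key, and payload are placeholder assumptions for illustration, not Diskover's actual configuration.

```go
// Minimal sketch: publish a file-metadata record to Kafka.
// Broker, topic, key, and payload are illustrative assumptions.
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	w := &kafka.Writer{
		Addr:     kafka.TCP("localhost:9092"), // assumed broker address
		Topic:    "diskover.file-metadata",    // assumed topic name
		Balancer: &kafka.Hash{},               // same key -> same partition
	}
	defer w.Close()

	// Keying by a volume or index name spreads records across partitions
	// while keeping related files ordered for downstream consumers.
	err := w.WriteMessages(context.Background(), kafka.Message{
		Key:   []byte("volume-01"),
		Value: []byte(`{"path":"/data/report.pdf","size":1048576,"owner":"alice"}`),
	})
	if err != nil {
		log.Fatal(err)
	}
}
```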

This allowed Diskover to:

  • Scale ingestion without custom code, using Openflow’s built-in orchestration
  • Standardize on Kafka, enabling future expansion to other warehouses with Kafka consumers
  • Accelerate onboarding, even for customers without deep engineering teams

Diskover’s connector processes two major data flows:

  1. Quota records: storage allocation and usage over time
  2. File metadata: billions of filenames, paths, timestamps, sizes, and owners indexed by Diskover
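
For illustration only, the two record types might be modeled roughly like the Go structs below; the field names are assumptions, not Diskover's published schema.

```go
// Hypothetical shapes of the two record types described above.
package records

import "time"

// QuotaRecord tracks storage allocation and usage for a volume over time.
type QuotaRecord struct {
	Volume     string    `json:"volume"`
	QuotaBytes int64     `json:"quota_bytes"`
	UsedBytes  int64     `json:"used_bytes"`
	SampledAt  time.Time `json:"sampled_at"`
}

// FileRecord is one of the billions of file-metadata entries indexed by Diskover.
type FileRecord struct {
	Path     string    `json:"path"`
	Name     string    `json:"name"`
	Size     int64     `json:"size"`
	Owner    string    `json:"owner"`
	Modified time.Time `json:"modified"`
}
```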

Openflow organizes these flows into parallel branches on the canvas, separates metadata-only records from those requiring additional handling, and streams the results into Snowflake using Snowpipe Streaming.
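
Conceptually, the separation step behaves like the routing sketch below. In Diskover's connector this logic lives in Openflow processors on the canvas rather than in application code, and the topic name, routing test, and branch functions here are assumptions.

```go
// Conceptual sketch of the branch that separates metadata-only records
// from those needing additional handling. Topic and routing test are assumed.
package main

import (
	"context"
	"encoding/json"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		Topic:   "diskover.file-metadata", // assumed topic name
		GroupID: "openflow-demo",
	})
	defer r.Close()

	for {
		msg, err := r.ReadMessage(context.Background())
		if err != nil {
			log.Fatal(err)
		}
		var rec map[string]any
		if err := json.Unmarshal(msg.Value, &rec); err != nil {
			log.Printf("skipping malformed record: %v", err)
			continue
		}
		// Hypothetical test: records carrying extra payload take the
		// additional-handling branch; the rest stream straight toward
		// Snowflake via Snowpipe Streaming.
		if _, ok := rec["blob_ref"]; ok {
			handleExtra(rec)
		} else {
			streamToSnowflake(rec)
		}
	}
}

func handleExtra(rec map[string]any)       { /* additional-handling branch (stub) */ }
func streamToSnowflake(rec map[string]any) { /* Snowpipe Streaming branch (stub) */ }
```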

[Figure: Diskover + Openflow overview]

As Byron built the first working version, a few themes emerged:

  1. Low-code still requires engineering discipline.
    • Each processor needs careful configuration – thread counts, schemas, partitioning strategies. The visual workflow hides orchestration, but not design responsibility.
  2. Error messages can be…cryptic.
    • Some Openflow errors were vague or misleading. Debugging often required digging into the underlying SQL or using AI tools to interpret internal messages.
  3. Scaling is about design, not just settings.
    • Kafka partitioning proved essential. Too few partitions bottleneck the pipeline; too many add overhead. Openflow helps surface bottlenecks, but thoughtful upfront design matters most.
  4. Automate everything possible.
    • To avoid manual copy-paste errors across dozens of processors, Byron built a Go-based code generator to produce consistent JSON, SQL, and YAML configs, a key step toward repeatability (see the sketch after the list below).

Byron's advice for teams building similar pipelines:

  • Start small and iterate. Build a functional pipeline before stressing it with scale.
  • Leverage the Apache NiFi community. Openflow concepts closely follow NiFi, and existing documentation is invaluable.
  • Automate configs early. Avoid manual parameter editing at all costs.
  • Expect tuning and refinement. Openflow accelerates development, but production pipelines still require thoughtful engineering.
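
To make the code-generation point concrete, a generator along these lines can render every per-flow config from one source of truth. The template contents, processor names, and table definitions below are illustrative assumptions, not Diskover's actual generator.

```go
// Sketch of config generation: render SQL and processor configs from one
// list of flows instead of hand-editing dozens of parameters.
package main

import (
	"log"
	"os"
	"text/template"
)

// Flow is the single source of truth for one Kafka-topic-to-Snowflake-table pipeline.
type Flow struct {
	Topic      string
	Table      string
	Partitions int
}

var sqlTmpl = template.Must(template.New("sql").Parse(
	"CREATE TABLE IF NOT EXISTS {{.Table}} (record VARIANT, loaded_at TIMESTAMP_NTZ);\n"))

var yamlTmpl = template.Must(template.New("yaml").Parse(
	"processor: ConsumeKafka\ntopic: {{.Topic}}\npartitions: {{.Partitions}}\n---\n"))

func main() {
	flows := []Flow{
		{Topic: "diskover.quota", Table: "QUOTA_RECORDS", Partitions: 6},
		{Topic: "diskover.file-metadata", Table: "FILE_METADATA", Partitions: 24},
	}
	for _, f := range flows {
		// Emit one DDL statement and one processor config per flow so every
		// pipeline is generated the same way, with no copy-paste drift.
		if err := sqlTmpl.Execute(os.Stdout, f); err != nil {
			log.Fatal(err)
		}
		if err := yamlTmpl.Execute(os.Stdout, f); err != nil {
			log.Fatal(err)
		}
	}
}
```

Regenerating every config from a single Flow list means a schema or partition change touches one place rather than dozens of processors.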

The connector is fully functional today, with beta deployments beginning soon. These real-world environments will answer open questions around autoscaling, throughput, and handling more complex metadata or blob-level extraction.

Long term, the architecture positions Diskover to extend similar pipelines to other cloud warehouses: because Kafka remains at the core, the pipeline can evolve without major redesign.
