Data Quality · Multi-Engine
Agile data quality for functional analysts.
Define rules in business language. The engineering team implements them in SQL or notebooks. NeptunoDQ runs them on Apache Spark and Databricks — no lock-in, full traceability.
Apache Spark · Databricks · Multi-engine ready
Who it's for
One platform, three roles.
Functional Analyst · Audit
Define what to validate in business language. Submit proposals, review changes, and sign off on approvals with full traceability.
Data Engineer
Implement rules in SQL or notebooks, configure variables and thresholds, integrate with your existing pipeline on Spark or Databricks.
Platform Team
Deploy a single quality engine across multiple compute engines. No lock-in. No in-house framework to maintain.
Capabilities
Define once. Deploy with confidence.
Built for functional analysts
Proposals, reviews, and approvals in business language. Analysts don't need to fight Spark just to define a rule.
Declarative rules
JSON or YAML. SQL_FILE, SQL_TEXT, TABLE, FILE, or ADBNOTEBOOK. Zero boilerplate code.
DAG with dependencies
If a rule depends on another, the engine waits. If the parent fails, the child is skipped. No orphan executions.
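The skip semantics can be sketched in a few lines of Python. This is an illustration of the behavior described above, not NeptunoDQ's actual implementation; rule ids and the dependency map are invented for the example:

```python
from collections import deque

def run_dag(rules, deps, execute):
    """Run rules in dependency order; skip any rule whose parent did not pass.

    rules: list of rule ids.
    deps: dict mapping child rule id -> list of parent rule ids.
    execute: callable(rule_id) -> True (pass) or False (fail).
    Returns dict rule_id -> "PASS" | "FAIL" | "SKIPPED".
    """
    status = {}
    children = {r: [] for r in rules}
    indegree = {r: len(deps.get(r, [])) for r in rules}
    for child, parents in deps.items():
        for p in parents:
            children[p].append(child)

    ready = deque(r for r in rules if indegree[r] == 0)
    while ready:
        rule = ready.popleft()
        parents = deps.get(rule, [])
        if any(status[p] != "PASS" for p in parents):
            # Parent failed or was skipped: propagate the skip, never run orphans.
            status[rule] = "SKIPPED"
        else:
            status[rule] = "PASS" if execute(rule) else "FAIL"
        for c in children[rule]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    return status

# Simulate the root rule failing: both descendants are skipped, not executed.
result = run_dag(
    ["row_count", "check_total_money", "report"],
    {"check_total_money": ["row_count"], "report": ["check_total_money"]},
    lambda r: r != "row_count",
)
# result == {"row_count": "FAIL", "check_total_money": "SKIPPED", "report": "SKIPPED"}
```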
Parallel execution
Spark Fair Scheduler distributes independent DAG branches across pools. Full suite in a fraction of the time.
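As a sketch of how independent branches map to pools, a standard Spark fair-scheduler allocation file might look like this. The pool names are illustrative, not NeptunoDQ's; the file format itself is Spark's own:

```xml
<?xml version="1.0"?>
<!-- fairscheduler.xml: one pool per independent DAG branch (names illustrative) -->
<allocations>
  <pool name="branch_users">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="branch_orders">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
</allocations>
```

Spark picks this file up when `spark.scheduler.mode=FAIR` and `spark.scheduler.allocation.file` are set; each submitting thread then selects its pool with `sc.setLocalProperty("spark.scheduler.pool", "branch_users")`.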
True multi-engine
Open-source Apache Spark and Databricks runtimes. Same workflow, same inventory, different deployment target.
End-to-end auditability
Every change is recorded: proposals, reviews, hotfixes, and executions. Traceability is not a feature — it's the design.
Engines
Genuinely multi-engine.
The same rule inventory. The same approval workflow. Different execution engines.
Configuration
project_id: "neptuno-demo"
department: "analytics"
rules:
  - control_type: SQL_FILE
    rule_id: check_total_money
    file: "${PATH}/rules/validate_money.sql"
    table: "${USERS_TABLE}"
    umbral: "0.01"
    variables:
      table: "${NEPTUNO_SCHEMA}.${USERS_TABLE}"
      max_age: "75"
    sql_aggregations:
      total_money: "sum(money)"

YAML or JSON. No boilerplate.
Define each quality control in a declarative file. The engine handles the rest: dependencies, parallelism, notifications, and traceability.
- SQL_FILE / SQL_TEXT
Logic in .sql files or inline. Variables and thresholds configurable per environment.
- TABLE / FILE
Null, duplicate, and whitespace checks without writing SQL.
- ADBNOTEBOOK
Run a Databricks notebook as a quality step inside the same DAG.
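For illustration, a TABLE check and an ADBNOTEBOOK step could sit next to the SQL rule in the same inventory. Only `control_type`, `rule_id`, `file`, `table`, and `umbral` appear in the documented example above; the exact keys each control type accepts are an assumption here:

```yaml
rules:
  - control_type: TABLE          # null/duplicate/whitespace checks, no SQL needed
    rule_id: users_table_checks
    table: "${NEPTUNO_SCHEMA}.${USERS_TABLE}"
    umbral: "0.0"                # tolerated failure ratio (assumed semantics)
  - control_type: ADBNOTEBOOK    # run a Databricks notebook as a DAG step
    rule_id: custom_notebook_check
    file: "/Repos/dq/notebooks/custom_check"   # illustrative path; key reuse assumed
```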
Product
Define together. Deploy with trust.
A data quality platform built for audit and functional analysts — without sacrificing technical rigor.