NeptunoDQ

Data Quality · Multi-Engine

Agile data quality for functional analysts.

Define rules in business language. The engineering team implements them in SQL or notebooks. NeptunoDQ runs them on Apache Spark and Databricks — no lock-in, full traceability.

Apache Spark · Databricks · Multi-engine ready

Who it's for

One platform, three roles.

Functional Analyst · Audit

Define what to validate in business language. Submit proposals, review changes, and sign off approvals with full traceability.

Data Engineer

Implement rules in SQL or notebooks, configure variables and thresholds, integrate with your existing pipeline on Spark or Databricks.

Platform Team

Deploy a single quality engine across multiple compute engines. No lock-in. No in-house framework to maintain.

Capabilities

Define once. Deploy with confidence.

Built for functional analysts

Proposals, reviews, and approvals in business language. Analysts don't need to fight Spark just to define a rule.

Declarative rules

JSON or YAML. Five control types: SQL_FILE, SQL_TEXT, TABLE, FILE, and ADBNOTEBOOK. Zero boilerplate code.

DAG with dependencies

If a rule depends on another, the engine waits. If the parent fails, the child is skipped. No orphan executions.
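A minimal sketch of how such a dependency could be declared, assuming a hypothetical depends_on key (NeptunoDQ's actual field name may differ):

rules:
  - control_type: SQL_FILE
    rule_id: check_users_loaded              # parent rule
    file: "${PATH}/rules/users_loaded.sql"

  - control_type: SQL_FILE
    rule_id: check_total_money
    file: "${PATH}/rules/validate_money.sql"
    # Hypothetical key: the child waits for the parent to finish;
    # if the parent fails, this rule is skipped.
    depends_on: [check_users_loaded]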

Parallel execution

Spark Fair Scheduler distributes independent DAG branches across pools. Full suite in a fraction of the time.
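For context, a minimal PySpark sketch of how Spark's Fair Scheduler is enabled (standard Spark configuration, not NeptunoDQ-specific API):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("neptuno-demo")
    # Switch from the default FIFO scheduler to FAIR so jobs share executors.
    .config("spark.scheduler.mode", "FAIR")
    # Optional: pool weights and minimum shares are defined in an XML file.
    .config("spark.scheduler.allocation.file", "fairscheduler.xml")
    .getOrCreate()
)

# Jobs submitted from a given thread go to the pool set on that thread;
# independent DAG branches submitted from separate threads then run
# concurrently across pools.
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "branch_a")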

True multi-engine

Open-source Apache Spark and Databricks runtimes. Same workflow, same inventory, different deployment target.

End-to-end auditability

Every change is recorded: proposals, reviews, hotfixes, and executions. Traceability is not a feature — it's the design.

Engines

Genuinely multi-engine.

The same rule inventory. The same approval workflow. Different execution engines.

Apache Spark · Available · OSS 3.5, 4.0, 4.1
Databricks · Available · Runtimes 15.4, 16.4, 17.3
Snowflake · Roadmap
AWS · Compatible · EMR, Glue, S3
Azure · Compatible · ADB, ADLS

Configuration

project_id: "neptuno-demo"
department: "analytics"

rules:
  - control_type: SQL_FILE
    rule_id: check_total_money
    file: "${PATH}/rules/validate_money.sql"
    table: "${USERS_TABLE}"
    umbral: "0.01"                # threshold
    variables:
      table: "${NEPTUNO_SCHEMA}.${USERS_TABLE}"
      max_age: "75"
    sql_aggregations:
      total_money: "sum(money)"

YAML or JSON. No boilerplate.

Define each quality control in a declarative file. The engine handles the rest: dependencies, parallelism, notifications, and traceability.

  • SQL_FILE / SQL_TEXT

    Logic in .sql files or inline. Variables and thresholds configurable per environment (see the sketch after this list).

  • TABLE / FILE

    Null, duplicate, and whitespace checks without writing SQL.

  • ADBNOTEBOOK

    Run a Databricks notebook as a quality step inside the same DAG.
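To make the SQL_TEXT and TABLE variants concrete, here is a hedged sketch extrapolated from the configuration example above. The sql and checks field names are assumptions; the actual keys in NeptunoDQ's schema may differ.

rules:
  # Inline SQL instead of a file (SQL_TEXT)
  - control_type: SQL_TEXT
    rule_id: check_adult_users
    sql: "select count(*) as adults from ${table} where age > ${max_age}"   # hypothetical key
    variables:
      table: "${NEPTUNO_SCHEMA}.${USERS_TABLE}"
      max_age: "75"

  # Built-in checks, no SQL required (TABLE)
  - control_type: TABLE
    rule_id: check_users_integrity
    table: "${NEPTUNO_SCHEMA}.${USERS_TABLE}"
    checks: [nulls, duplicates, whitespace]   # hypothetical key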

Product

Define together. Deploy with trust.

A data quality platform built for audit teams and functional analysts — without sacrificing technical rigor.