CHANGELOG
v2.0.0 (2026-03-09)
Bug Fixes
- pandas: Resolve indexing mismatch when filling empty error lists (ea89997)
- sql: Ensure proper table identifier formatting and import structure (cb2de80)
Build System
- deps: Update Python version to 3.10 and bump dependencies (9f9d32c)
Chores
- Clean up pyproject.toml dependencies and extras (571aa3b)
- Remove legacy monolithic engine (5c96bcc)
- Remove legacy monolithic engine files (a21b217)
- cleanup: Remove empty baseline and sinks modules (53fffbf)
- tests: Remove outdated v1.3.0 test suite (d7b2f21)
Code Style
- Apply consistent code formatting across all modules (7db9778)
- Fix formatting and imports across sink modules (aac1e9f)
- Fix import ordering and remove unused imports across codebase (01d6c59)
- Format code with consistent imports and line breaks (cbff8e0)
- Normalize quotes and formatting across codebase (bbd15a3)
Documentation
- Update module docstrings and implement lazy engine loading (85aafaa)
- init: Shorten top-level engine exports comment (30d5173)
Features
- Add auto-profiler, schema validation, and engine capabilities (6a47750)
- Add AWS Athena SQL-based validation engine (7a2b018)
- Add CSV rule loader and SQL generators (32bee27)
- Add DataFrame accessors to ValidationReport (5c426c5)
- Add distributed Dask engine with lazy evaluation (976f2f3)
- Add get_validation_sql helper to BigQuery and DuckDB engines (f304dc3)
- Add OpenMetadata sink with SinkProtocol (fe4a940)
- Add PyFlink streaming validation engine (e62d2d5)
- Add PySpark engine with pure Column API (zero UDFs) (f488d47)
- Add rule validation and metadata preservation (0722918)
- Add Snowflake engine with pure SQL validation (5c5d416)
- Add SQL generator for Flink validation queries (15df523)
- Add SQLCore-based BigQuery engine (00b6b23)
- Add Trino, Redshift, and Doris SQL engines (13a3d3e)
- Complete analyzers with all date, comparison, and aggregation rules (02616fe)
- Complete README rewrite and project restructure for v2.0 release (6ff4d39)
- Expand PyFlink validation engine to full feature parity (c5840d4)
- Implement DuckDB engine with SQLCore validation (8aef6a7)
- duckdb: Add data bifurcation and standardize code formatting (b64bbc1)
- exporters: Add IExporter protocol for pure metadata formatting (4866921)
- init: Replace dynamic engine loading with explicit imports for better IDE support (cfc3859)
- polars: Add Polars engine with optimized bulk error aggregation (6b337ef)
- sumeh: V2.0 rewrite with analyzer/constraint architecture (cc05b17)
Performance Improvements
- Optimize pandas and polars engines with bulk error aggregation (bba5d81)
Refactoring
- Remove OOM-prone ID collection from analyzers (d5b6401)
- Simplify Ray Data engine implementation (4e753d9)
- Split CLI into modular command files (29253e3)
- core: Consolidate protocols and remove dead code (ef6b43e)
- core: Rename RuleDef to RuleDefinition for consistency (c54432c)
- core: Reorganize core module structure for better maintainability (79a3298)
- engines: Simplify engine package exports (deb9c45)
- models: Remove duplicate SinkResult class (a8579bb)
- polars: Use consistent timestamp and simplify error aggregation (c57fd5e)
- profiler: Clean up DataProfiler with better docs and safety checks (3ba2565)
v1.3.0 (2025-10-16)
Bug Fixes
- Dask dependency error (f8c1b62)
Code Style
- Improve readability with standardized line breaks and spacing (a220071)
- Standardize and clean up import ordering (99cc1ae)
Continuous Integration
- Simplify Poetry install by using --all-extras in publish workflow (a8d59b3)
Features
- Add full BigQuery table-level validation support and unify rule model across engines (af58f5c)
- Enhance DuckDB detection and refactor Pandas date checks (39a64ca)
- Enhance rule parsing and standardize rule usage in engines (945ae8b)
- Refactor validation engine to use RuleDef model and fix ambiguity issues (808ff22)
- Standardize aggregation checks and implement multi-level validation (bc56830)
- Unify table-level validation engine interface across all backends (e1234fa)
- core: Implement Dispatcher pattern for core modules (13a4349)
- duckdb: Enhance validation dispatchers, add robust error handling & input checks (82c4274)
Refactoring
- Clean up and organize imports across core modules and engines (7953b20)
- Introduce RuleDef model and registry for configuration (90522e5)
- Remove obsolete extract_params test and align test suite with current codebase (5bfc07b)
- Standardize code formatting and improve error handling in BigQuery engine (20c994a)
- Unify and modernize configuration dispatchers with clear, consistent API (8915815)
- Unify date validation aliases across all engines for consistency (7becdd5)
- cli: Migrate CLI implementation from argparse to Typer (e161748)
- pyspark: Standardize validation functions and remove legacy logic (fc03b78)
v1.2.0 (2025-10-09)
Chores
- deps: Update AWS, caching, and core dependencies (e6821e0)
Documentation
- Show private members in MkDocs API documentation (3f447e5)
Features
- bigquery: Implement native Data Quality validation and summarization (eeaf615)
- bigquery: Rewrite validation to use 100% SQLGlot and improve docs (5ad0d7f)
v1.1.0 (2025-10-08)
Features
- schema: Decouple schema extraction and improve validation output (852c36b)
Refactoring
- core, duckdb: Minor cleanup and improved schema error formatting (cfbb695)
v1.0.1 (2025-10-08)
Bug Fixes
- Correctly parse field lists and handle complex string inputs (66a5a39)
v1.0.0 (2025-10-08)
Bug Fixes
- Sync version numbers with latest release tag (b65b420)
v1.0.0-rc.1 (2025-10-07)
Bug Fixes
- engines: Correct inverse logic for comparison validation functions (64dd3da)
Build System
- Update pyproject.toml with complete metadata (03f4fd2)
Continuous Integration
- Adopt Trusted Publishers for PyPI deployment and refactor release flow (e507717)
- Fix CI/CD deployment (420454f)
- config: Add python-semantic-release configuration (95b3113)
- workflow: Configure conditional PyPI publishing for releases (72f3bb6)
Documentation
- Improve configuration examples and workflow clarity (bf09a5f)
- Update documentation structure following module refactoring (f938382)
Features
- Add Schema Validation feature and various data source support (4415c92)
- Centralized schema definition using Schema Registry (53ee185)
- Implement interactive Streamlit dashboard for validation results (7c9804a)
- Introduce Databricks rule source and refine configuration methods (03ef55c)
- ci: Major package refactoring, automate PyPI publishing, and enhance SQL connections (69bd9c7)
- cli: Add SQL DDL generation for 8 database dialects (82ca12c)
- dashboard: Rework Streamlit dashboard with advanced visuals and filters (64dd3da)
Refactoring
- General code cleanup and API simplification (99368aa)
- Make schema lookup flexible and enhance security checks (ee2b41d)
- core, cli: Introduce core utility modules and prepare for 'validate' command (7911465)
- core, config: Standardize config/schema API and enforce required parameters (75614bc)
v0.3.0 (2025-05-16)
Bug Fixes
- dask_engine: Invert validation logic to flag non-compliant records (2a76fe7)
Code Style
- Apply code formatting and cleanup across core and engine files (dd0a0ad)
- Clean up whitespace and formatting in test files (aa47a97)
Documentation
- Complete Pandas engine docstrings and enhance core module documentation (65d5110)
- Enhance documentation and reorganize validation rules (0873ca3)
- polars_engine: Add comprehensive docstrings for data quality functions (dcc93c5)
Features
- Add 'is_in' and 'not_in' rule aliases to engines (b522cb0)
- Add comprehensive date and numeric validation functions to pandas engine (c23d513)
- Add date and numeric validation functions to Polars engine (2675e84)
- Improve date and numeric validation rules in DuckDB engine (6de088c)
- dask: Implement numeric threshold and detailed date/weekday validation rules (522d332)
- duckdb: Implement numeric threshold and detailed date/weekday validation rules (8112e96)
- pandas: Add new engine for Pandas DataFrames with comprehensive rule support (d31234b)
- pyspark: Implement numeric threshold and detailed date validation rules (073c0ad)
v0.2.6 (2025-05-16)
Documentation
- Add docstrings for date validation rules in Dask and DuckDB engines (42cb80a)
Features
- dask: Implement date validation rules and add dedicated tests (0a719d4)
- duckdb: Implement date and additional validation rules (d45afc8)
- polars: Implement multiple validation rules and enhance documentation (fc83ae6)
v0.2.5 (2025-05-16)
Documentation
- Update README with logo path and completed tasks (85fcc94)
v0.2.4 (2025-05-16)
Chores
- Version (92fa3c5)
Features
- Add quickstart guide and list supported validation rules (d307c85)
v0.2.0 (2025-04-29)
- Initial Release