Changes in version 0.2.0 A correctness and robustness release driven by a code review of 0.1.0 (see issue #12 for the full catalogue). Two changes alter previously-returned numeric output and are called out separately below. ⚠ Default-output changes (potentially breaking) - nndr() now standardises (z-scores) each numerical column by the real-data mean and standard deviation before the nearest-neighbour distance is computed. Without this, a single large-scale column (e.g. income in dollars) dominated the Euclidean distance and the score moved with measurement units rather than with row similarity. Pass normalize = FALSE to recover the previous behaviour exactly. - correlation_similarity() and contingency_similarity() now return score = NA_real_ (rather than 1) when there are fewer than two columns of the relevant type, and diagnostic_report() returns NA_real_ per column when the synthetic column is entirely NA. Aggregated property scores in quality_report() / diagnostic_report() skip these NAs (na.rm = TRUE) so they no longer overstate fidelity with a synthetic "1" where there is no signal to measure. New features - equality_constraint() gains a tolerance argument: with tolerance > 0 on numeric columns, the check is abs(a - b) <= tolerance instead of exact ==. Default 0 preserves prior behaviour. - custom_constraint() gains a vectorized argument: when TRUE, the predicate is called once with the whole data frame instead of once per row. Substantially faster on large synthetic samples for vectorisable predicates. - ml_efficacy() gains a seed argument for reproducible train/test splits. The caller's global RNG state is restored on exit, so callers using set.seed() elsewhere are unaffected. - nndr() gains a normalize argument (default TRUE) — see the default-output note above. - print() methods for equality_constraint, inequality_constraint, fixed_combinations_constraint, and custom_constraint. Robustness and correctness - metadata_to_json() / metadata_from_json() now round-trip the structural constraint types (equality, inequality, and fixed_combinations). custom_constraint cannot be serialised — it holds an R closure — and is dropped with a warning. Previously metadata_to_json() crashed on any constraint, so save_metadata() was effectively broken for non-trivial metadata. - check_constraint.equality_constraint and check_constraint.inequality_constraint now return FALSE (not NA) for rows containing NA. This prevents NA from propagating into the row selector used by sample()'s rejection loop, which previously inserted phantom NA-only rows. - sample_conditions() now honours metadata constraints alongside the user-supplied conditions (previously it filtered only on the conditions). - tvd_similarity() now strips NAs from both sides and divides by the non-NA count on each side; previously NA-padding inflated TVD. - ks_similarity() now suppresses the ks.test() "p-value will be approximate in the presence of ties" warning, which it leaked to users on any tied integer column (very common in tables with integer ages, capital gains, etc.). - fixed_combinations_constraint now uses a collision-free length-prefix key encoding (":"), removing a theoretical separator collision in the previous paste-based comparison. Clearer errors and validations - fit.gaussian_copula_synthesizer() errors clearly when a modeled column is entirely NA or when no row is complete across all modeled columns. Previously the user saw a cryptic 'dim' must be an integer (>= 2) from inside copula::normalCopula. - ml_efficacy() validates target_col (must be a column of real) and test_fraction (must be strictly between 0 and 1) up front. - attribute_disclosure_risk() validates that known_cols are present and numeric (one-hot encode categorical knowns first); previously triggered a cryptic FNN::knnx.index error. - gaussian_copula_synthesizer() cross-checks numerical_distributions names against the metadata's numerical columns; silently-ignored typos like list(capitl_gain = "gamma") now raise a clear error. - sample_conditions() validates that .n values are positive whole numbers (was silently truncating or accepting negatives). - privacy_report() errors when only one of sensitive_col / known_cols is supplied (previously silently dropped disclosure-risk computation). - set_primary_key() emits an advisory warning when the column's metadata type is not "id", since the column would otherwise be modeled as ordinary data and the diagnostic key-uniqueness check would typically fail. Documentation - set_column_type() docstring documents the level-ordering rule for categorical columns — factor keeps levels() order, character is sorted lexicographically (c("2", "10") becomes levels c("10", "2")). Changes in version 0.1.0 (2026-06-08) Initial CRAN release. Synthesizer - Gaussian copula synthesizer (gaussian_copula_synthesizer()) that fits a single joint copula over all modeled columns: numerical, categorical, and boolean. - Parametric marginal fitting for numerical columns with best-fit selection among norm, beta, gamma, truncnorm, and uniform by Kolmogorov-Smirnov distance. Per-column overrides via numerical_distributions; global default via default_distribution. - Categorical and boolean columns are embedded into the copula via their cumulative-frequency intervals, preserving cross-column dependence (numeric↔categorical and categorical↔categorical). - sample() for unconditional generation and sample_conditions() for conditional generation on categorical or boolean values via rejection sampling. - Per-column missingness rates from training data are reproduced in synthetic output. Metadata and constraints - Column-type metadata system (metadata(), set_column_type(), set_primary_key()) with auto-detection and JSON serialization (metadata_to_json(), save_metadata()). - Declarative constraint system: equality, inequality, fixed-combinations, and custom row-level predicates (add_constraint(), check_constraints()), enforced via rejection sampling. Evaluation - quality_report() aggregates metrics into the two-property hierarchy used by the Python SDMetrics library: - Column Shapes — per-column marginal fidelity (KS similarity for numerical, TVD similarity for categorical). - Column Pair Trends — pairwise dependence (correlation_similarity() for numerical pairs, contingency_similarity() for categorical pairs). ML efficacy (train-on-synthetic / test-on-real, TSTR/TRTR) is reported separately, not folded into the overall score. - diagnostic_report() checks structural validity: boundary adherence (numerical ranges), category adherence (categorical values), and key uniqueness for primary keys. - privacy_report() reports the nearest-neighbour distance ratio (NNDR) and, optionally, attribute disclosure risk. - autoplot() methods for quality, diagnostic, and privacy reports. Data - Bundled dataset adult_income — a 500-row sample of the UCI Adult Income dataset used in examples and vignettes. Vignettes - "Getting Started with rsdv" — practitioner-oriented guide covering metadata, fitting, conditional sampling, quality and diagnostic reports, privacy evaluation, constraints, and missing-data handling. - "Migrating from synthpop" — side-by-side comparison and feature table.