Package: rsdv 0.2.0

rsdv: Synthetic Tabular Data Generation with Gaussian Copulas

Generates synthetic tabular data from real datasets using Gaussian copula models, with parametric marginal selection for numerical columns and a cumulative-frequency embedding that brings categorical and boolean columns into the same joint copula. Includes a metadata system with column types and primary keys, declarative constraints enforced via rejection sampling, conditional sampling, and quality, validity and privacy reports modeled on those of the 'SDMetrics' library. Inspired by the Python 'SDV' (Synthetic Data Vault) library by 'DataCebo'; see Patki, Wedge and Veeramachaneni (2016) "The Synthetic Data Vault" <doi:10.1109/DSAA.2016.49>.
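
To make the description above concrete, the following base-R sketch walks through the general Gaussian copula idea on a toy data frame: a categorical column is embedded into (0, 1) by cumulative frequency, a numerical column gets a parametric marginal, and both are sampled jointly through correlated normal scores. It is an illustration under stated assumptions, not rsdv's implementation: the toy columns `age` and `plan`, the fixed normal marginal, and every variable name are made up for the example, and no rsdv function is called.

```r
# Illustrative Gaussian copula sketch in base R (not rsdv's actual code).
set.seed(42)

# Toy "real" data: one numerical and one categorical column (hypothetical).
n    <- 500
age  <- round(rnorm(n, mean = 40, sd = 10))
plan <- ifelse(age + rnorm(n, sd = 8) > 45, "premium", "basic")
real <- data.frame(age = age, plan = plan, stringsAsFactors = FALSE)

# Cumulative-frequency embedding: each category owns a sub-interval of (0, 1)
# as wide as its observed frequency; a row becomes a uniform draw inside it.
tab    <- table(real$plan)
cats   <- names(tab)
freq   <- as.numeric(tab) / sum(tab)
upper  <- cumsum(freq)
lower  <- upper - freq
idx    <- match(real$plan, cats)
u_plan <- runif(n, min = lower[idx], max = upper[idx])

# Numerical marginal: a plain normal fit stands in for marginal selection.
mu    <- mean(real$age)
sigma <- sd(real$age)
u_age <- pnorm(real$age, mu, sigma)

# Map both columns to normal scores and estimate the copula correlation.
clamp <- function(u) pmin(pmax(u, 1e-6), 1 - 1e-6)
z     <- cbind(qnorm(clamp(u_age)), qnorm(clamp(u_plan)))
rho   <- cor(z)

# Sample: correlated normals -> uniforms -> invert each marginal.
m     <- 1000
z_new <- matrix(rnorm(m * 2), ncol = 2) %*% chol(rho)
u_new <- pnorm(z_new)
synthetic <- data.frame(
  age  = round(qnorm(u_new[, 1], mu, sigma)),
  # Interior break points only, so the index always lands in 1..length(cats).
  plan = cats[findInterval(u_new[, 2], upper[-length(upper)]) + 1],
  stringsAsFactors = FALSE
)

head(synthetic)
```

Because the categorical column lives in the same normal-score space as the numerical one, the association between `age` and `plan` carries over to the synthetic rows through `rho`.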

Authors: Kailas Venkitasubramanian [aut, cre]

# Install 'rsdv' in R:
install.packages('rsdv', repos = c('https://kvenkita.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/kvenkita/rsdv/issues

Pkgdown/docs site: https://kvenkita.github.io

4.48 score, 1 star, 30 exports, 38 dependencies

Last updated from: 0f20dbd496. Checks: 9 OK. Indexed: yes.

Target | Result | Time
linux-devel-x86_64 | OK | 142
source / vignettes | OK | 259
linux-release-x86_64 | OK | 152
macos-release-arm64 | OK | 133
macos-oldrel-arm64 | OK | 145
windows-devel | OK | 127
windows-release | OK | 84
windows-oldrel | OK | 112
wasm-release | OK | 112

Exports: add_constraint, attribute_disclosure_risk, check_constraint, check_constraints, contingency_similarity, correlation_similarity, custom_constraint, diagnostic_report, equality_constraint, fit, fixed_combinations_constraint, gaussian_copula_synthesizer, inequality_constraint, is_fitted, ks_similarity, load_metadata, metadata, metadata_from_json, metadata_to_json, ml_efficacy, nndr, privacy_report, quality_report, sample, sample_conditions, save_metadata, set_column_type, set_primary_key, tvd_similarity, validate_data
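
Several of the exports above concern constraints, which the package description says are enforced via rejection sampling. The base-R sketch below shows that general technique for a single inequality constraint: draw candidate rows in batches, keep only rows that pass the predicate, and stop once enough valid rows exist. `draw_rows`, `is_valid` and `sample_with_constraint` are hypothetical helpers written for this illustration. They are not rsdv functions, and the package's own batching and retry logic may differ.

```r
# Rejection sampling sketch (illustrative; helper names are hypothetical).
set.seed(1)

# Stand-in for a fitted synthesizer: draws k unconstrained candidate rows.
draw_rows <- function(k) {
  data.frame(start = runif(k, 0, 10), end = runif(k, 0, 10))
}

# Row-wise predicate for an inequality constraint: start must be below end.
is_valid <- function(df) df$start < df$end

sample_with_constraint <- function(n, max_tries = 100) {
  kept <- draw_rows(0)  # zero-row frame with the right columns
  for (i in seq_len(max_tries)) {
    batch <- draw_rows(n)
    kept  <- rbind(kept, batch[is_valid(batch), , drop = FALSE])
    if (nrow(kept) >= n) {
      return(kept[seq_len(n), , drop = FALSE])
    }
  }
  stop("could not satisfy the constraint within max_tries batches")
}

valid_rows <- sample_with_constraint(200)
```

The `max_tries` cap matters in practice: a constraint that almost no candidate row satisfies would otherwise loop for a very long time.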

Dependencies: ADGofTest, cli, cluster, colorspace, copula, cpp11, farver, FNN, generics, ggplot2, glue, gsl, gtable, isoband, jsonlite, labeling, lattice, lifecycle, magrittr, Matrix, mvtnorm, numDeriv, pcaPP, pillar, pkgconfig, pspline, R6, RColorBrewer, rlang, rpart, S7, scales, stabledist, tibble, utf8, vctrs, viridisLite, withr

Getting Started with rsdv: A Practitioner's Guide to Synthetic Data Generation

Rendered from getting-started.Rmd using knitr::rmarkdown on Jun 09 2026.

Last update: 2026-06-08
Started: 2026-05-25

Migrating from synthpop

Rendered from migrating-from-synthpop.Rmd using knitr::rmarkdown on Jun 09 2026.

Last update: 2026-05-27
Started: 2026-05-25

Readme and manuals

Help Manual

Help page | Topics
Add a constraint to metadata | add_constraint
Adult Income dataset (500-row sample) | adult_income
Attribute disclosure risk | attribute_disclosure_risk
Plot a diagnostic report | autoplot.rsdv_diagnostic_report
Plot a privacy report | autoplot.rsdv_privacy_report
Plot a quality report | autoplot.rsdv_quality_report
Check a single constraint against each row of a data frame | check_constraint
Check all constraints in metadata against a data frame | check_constraints
Contingency similarity between real and synthetic categorical column pairs | contingency_similarity
Correlation similarity between real and synthetic numerical column pairs | correlation_similarity
Constraint: arbitrary row-wise predicate | custom_constraint
Generate a diagnostic (validity) report for synthetic data | diagnostic_report
Constraint: two columns must be equal row-wise | equality_constraint
Constraint: only observed column combinations are valid | fixed_combinations_constraint
Create a Gaussian Copula synthesizer | gaussian_copula_synthesizer
Constraint: col_a must be less than / greater than col_b | inequality_constraint
Check whether a synthesizer has been fitted | is_fitted
Kolmogorov-Smirnov similarity score per numerical column | ks_similarity
Load metadata from a JSON file | load_metadata
Create a metadata object describing a dataset's column types | metadata
Deserialize metadata from a JSON string | metadata_from_json
Serialize metadata to a JSON string | metadata_to_json
ML efficacy: train-on-synthetic / test-on-real accuracy ratio (TSTR) | ml_efficacy
Nearest-Neighbor Distance Ratio privacy score | nndr
Print method for a custom_constraint | print.custom_constraint
Print method for an equality_constraint | print.equality_constraint
Print method for a fixed_combinations_constraint | print.fixed_combinations_constraint
Print method for an inequality_constraint | print.inequality_constraint
Print method for rsdv_diagnostic_report | print.rsdv_diagnostic_report
Print method for rsdv_metadata | print.rsdv_metadata
Print method for rsdv_privacy_report | print.rsdv_privacy_report
Print method for rsdv_quality_report | print.rsdv_quality_report
Generate a privacy report comparing real and synthetic data | privacy_report
Generate a quality report comparing real and synthetic data | quality_report
Sample synthetic rows from a fitted synthesizer | sample
Sample synthetic rows that match fixed column values (conditional sampling) | sample_conditions
Save metadata to a JSON file | save_metadata
Set the type of a column in metadata | set_column_type
Set the primary key column of the metadata | set_primary_key
Total variation distance similarity score per categorical column | tvd_similarity
Validate that a data frame is compatible with metadata | validate_data
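
The similarity and privacy topics listed above name well-known statistics. As a rough guide to what such scores measure, the base-R sketch below computes three of them, assuming the SDMetrics-style "complement" definitions (1 minus the KS statistic, 1 minus the total variation distance) and a nearest to second-nearest distance ratio. The function names `ks_complement`, `tv_complement` and `nn_distance_ratio` are hypothetical. They are not the rsdv functions `ks_similarity`, `tvd_similarity` or `nndr`, whose exact definitions and arguments should be taken from the help pages.

```r
# Metric sketches in base R (illustrative; definitions are assumptions).

# 1 minus the two-sample Kolmogorov-Smirnov statistic; 1 means identical ECDFs.
ks_complement <- function(real, synth) {
  1 - unname(suppressWarnings(ks.test(real, synth))$statistic)
}

# 1 minus the total variation distance between category frequencies.
tv_complement <- function(real, synth) {
  lv <- union(unique(real), unique(synth))
  p  <- prop.table(table(factor(real,  levels = lv)))
  q  <- prop.table(table(factor(synth, levels = lv)))
  1 - 0.5 * sum(abs(p - q))
}

# For each synthetic row: distance to the nearest real row divided by the
# distance to the second-nearest real row. Ratios near 0 flag synthetic rows
# that sit unusually close to one specific real record.
nn_distance_ratio <- function(real, synth) {
  real  <- as.matrix(real)
  synth <- as.matrix(synth)
  apply(synth, 1, function(s) {
    d <- sort(sqrt(colSums((t(real) - s)^2)))
    d[1] / d[2]
  })
}

# Toy usage with made-up data.
set.seed(7)
real_num  <- rnorm(300)
synth_num <- rnorm(300, mean = 0.1)
ks_complement(real_num, synth_num)
tv_complement(c("a", "a", "b", "b"), c("a", "b", "b", "b"))
nn_distance_ratio(matrix(rnorm(200), ncol = 2), matrix(rnorm(20), ncol = 2))
```

All three scores are bounded between 0 and 1, which makes them easy to average into a single report-level number.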