rsdv - Synthetic Tabular Data Generation with Gaussian Copulas
Generates synthetic tabular data from real datasets using
Gaussian copula models. Each numerical column is fitted with a
parametric marginal distribution selected from a set of
candidates, and categorical and boolean columns are brought into
the same joint copula through a cumulative-frequency embedding.
Includes a metadata system with column types and
primary keys, declarative constraints enforced via rejection
sampling, conditional sampling, and quality, validity and
privacy reports modeled on those of the 'SDMetrics' library.
Inspired by the 'Python' library 'SDV' (Synthetic Data Vault)
from DataCebo; see Patki, Wedge and Veeramachaneni (2016) "The
Synthetic Data Vault" <doi:10.1109/DSAA.2016.49>.