GenAI-Evaluation: New Open Source Python Library Now Available
Sep 22, 2023
Tested on multiple public data sets with my own NoGAN synthesizers (1000x faster and consistently better than solutions offered by synthetic data companies), this Python library implements the best evaluation metric to compare your synthetic data with the real data it is supposed to mimic.
A few highlights:
- First implementation of the multivariate Kolmogorov-Smirnov distance in any dimension, for categorical or numerical features, or a mix of both.
- Fast, returning results in a few seconds. The value ranges from zero (best fit) to one (worst fit), so it is easy to interpret.
- Outperforms all other evaluation metrics currently implemented by vendors. Will correctly identify poor synthesizations even on the very challenging “circle dataset” pictured below.
- Adjusted for the number of features (the dimension). Produces a comparison scatterplot that is easy to interpret regardless of dimension; see the bottom picture below.
- Also returns the multivariate ECDF (empirical cumulative distribution function) attached to your datasets, synthetic and real. This generalizes the unidimensional ECDF function available in Python to any dimension. Based on multivariate quantiles.
- Free and easy to install with “pip install genai-evaluation”.
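To make the idea behind the highlights concrete, here is a minimal sketch of a multivariate ECDF and the resulting Kolmogorov-Smirnov distance, written with NumPy only. It is not the code or the API of the library: the function names `multivariate_ecdf` and `ks_distance` are hypothetical, the ECDFs are compared only at the observed points (so the result approximates the true maximum gap from below), it handles numerical features only, and it applies no adjustment for the dimension.

```python
import numpy as np

def multivariate_ecdf(data, points):
    """Fraction of rows in `data` that are <= each row of `points` in every coordinate."""
    data = np.asarray(data, dtype=float)
    points = np.asarray(points, dtype=float)
    # below[i, j] is True when data row j is componentwise <= point i
    below = np.all(data[None, :, :] <= points[:, None, :], axis=2)
    return below.mean(axis=1)

def ks_distance(real, synth):
    """Largest absolute ECDF gap, checked at the observed points of both datasets.

    Illustrative only: hypothetical helper, not the library's implementation.
    """
    real = np.asarray(real, dtype=float)
    synth = np.asarray(synth, dtype=float)
    points = np.vstack([real, synth])
    gap = np.abs(multivariate_ecdf(real, points) - multivariate_ecdf(synth, points))
    return float(gap.max())

# Toy data (made up for this sketch): one faithful and one shifted "synthetic" sample
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 2))
good = rng.normal(size=(500, 2))            # same distribution as `real`
bad = rng.normal(loc=1.5, size=(500, 2))    # shifted, so a poor imitation

d_good = ks_distance(real, good)
d_bad = ks_distance(real, bad)
print(f"good synthetic: {d_good:.3f}, bad synthetic: {d_bad:.3f}")
```

The distance always falls between zero and one, and the shifted sample scores clearly worse than the faithful one. The pairwise comparison costs memory proportional to the product of the two sample sizes, so this sketch only suits small datasets.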
To learn more, see the use cases; for documentation, visit the reference page.
First illustration: circle dataset
Second illustration: evaluation scatterplot, Telecom dataset