v0.2: now with marketing spend, support tickets, and NPS

Synthetic retail data that feels real.

Generate hundreds of thousands of rows of realistic, multi-year retail data across nine fully relational tables in seconds. AdventureWorks-style schema, scientifically grounded customer behavior, fully reproducible.

scripts-and-tables.github.io/erp-synthetic-data-generator
Eleven years of monthly revenue unfolding frame by frame, with November and December holiday spikes flaring up as they arrive
Sales pulse: $2.20M last 12 months, with year-over-year growth and monthly revenue including holiday spikes
Cohort health: 1,000 customers across 6 behavioral segments with sticky cohort assignment
Markets out of the box: US, GCC, EU presets with currency, VAT rate, weekend definition, and holidays
100k+ rows in seconds
9 relational tables
11 yrs default horizon
3 market presets
6 sticky cohorts
100% reproducible
why teams pick erp-synth

Made for analytics teams that need data yesterday.

Six product decisions that turn synthetic data from "dummy CSVs" into a production-grade dataset you can train models on, ship demos with, or teach from.

⚡

Massive volume, instantly

Generate hundreds of thousands of sales lines across nine relational tables in seconds. Streaming output keeps memory flat: push to millions of rows on a laptop without breaking a sweat.
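To make "streaming output keeps memory flat" concrete, here is a minimal sketch of the general technique, not erp-synth's actual code. The function names (`sales_lines`, `write_csv`) and the columns are hypothetical. Rows come from a generator and are written one at a time, so memory use does not grow with the row count.

```python
import csv
import io
import random
from typing import Iterator, TextIO


def sales_lines(n_rows: int, seed: int = 42) -> Iterator[dict]:
    """Yield one hypothetical sales line at a time instead of building a list."""
    rng = random.Random(seed)
    for line_id in range(1, n_rows + 1):
        yield {
            "line_id": line_id,
            "customer_id": rng.randint(1, 1000),
            "quantity": rng.randint(1, 5),
            "unit_price": round(rng.uniform(1.0, 200.0), 2),
        }


def write_csv(rows: Iterator[dict], out: TextIO) -> int:
    """Write rows to an open text stream as they arrive; return the row count."""
    writer = None
    count = 0
    for row in rows:
        if writer is None:
            # Take the header from the first row, then keep streaming.
            writer = csv.DictWriter(out, fieldnames=list(row), lineterminator="\n")
            writer.writeheader()
        writer.writerow(row)
        count += 1
    return count


# Small demo against an in-memory buffer; a real run would pass an open file.
buffer = io.StringIO()
print(write_csv(sales_lines(5), buffer), "rows written")
```

Only one row is alive at any moment, so pointing `write_csv` at an open file instead of a buffer scales to millions of rows without extra memory.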

★

Scientifically grounded

Cohort behavior anchored in buy-till-you-die customer-base models from the marketing-science literature. Discrete choice from McFadden's Nobel-winning framework. Schema follows Microsoft AdventureWorks. Ten cited papers.
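For readers new to discrete choice, the core of McFadden's multinomial logit fits in a few lines: the probability of choosing option i is exp(V_i) divided by the sum of exp(V_j) over all options, where V is each option's utility. The sketch below is a generic illustration. The product names and utility values are invented, and this page does not say how erp-synth applies the model internally.

```python
import math
import random


def choice_probabilities(utilities: dict[str, float]) -> dict[str, float]:
    """Multinomial logit: P(i) = exp(V_i) / sum_j exp(V_j)."""
    # Subtract the maximum utility first for numerical stability;
    # the resulting probabilities are unchanged.
    top = max(utilities.values())
    weights = {name: math.exp(v - top) for name, v in utilities.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def choose(utilities: dict[str, float], rng: random.Random) -> str:
    """Draw one option according to its logit probability."""
    probs = choice_probabilities(utilities)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


# Hypothetical utilities for three products in one shopping trip.
utilities = {"basic": 0.2, "standard": 1.0, "premium": 0.5}
rng = random.Random(7)
print(choice_probabilities(utilities))
print(choose(utilities, rng))
```

Raising one option's utility raises its share smoothly rather than making it win every time, which is what lets simulated baskets vary the way real ones do.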

⚙

Highly customizable

Twenty-plus controls over market, seasonality, inflation rate, returns probability, promotion frequency, customer field completeness, basket composition. Every realism axis has a knob.

↗

Unlimited data export

Plain CSV. Works with pandas, Excel, Power BI, Tableau, dbt, DuckDB, Postgres, Snowflake: anything that reads a delimited file. No vendor lock-in. No row caps. No throttling.

Multi-market by design

US, GCC, and EU presets out of the box. Locale, currency, VAT, weekend (Sat/Sun vs Fri/Sat), payment methods, plus market-specific holiday calendars (Black Friday, Christmas, Ramadan, Eid, Boxing Week), all swappable.
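As a rough picture of what a market preset involves, the sketch below models just the weekend definition. `MarketPreset`, `PRESETS`, and `is_weekend` are hypothetical names rather than erp-synth's API, and mapping Fri/Sat to the GCC preset is an assumption based on the Sat/Sun vs Fri/Sat contrast above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MarketPreset:
    """Hypothetical preset holding only the weekend definition."""
    name: str
    weekend: frozenset  # weekday numbers as in date.weekday(): Monday=0 ... Sunday=6


PRESETS = {
    "US": MarketPreset("US", frozenset({5, 6})),    # Saturday, Sunday
    "EU": MarketPreset("EU", frozenset({5, 6})),    # Saturday, Sunday
    "GCC": MarketPreset("GCC", frozenset({4, 5})),  # Friday, Saturday (assumed)
}


def is_weekend(day: date, market: str) -> bool:
    """Return True if `day` falls on the weekend of the given market preset."""
    return day.weekday() in PRESETS[market].weekend


friday = date(2024, 1, 5)
print(is_weekend(friday, "US"), is_weekend(friday, "GCC"))
```

Keeping the definition as data means a new market is one more entry in the table, and the same shape would extend to currency, VAT, or holiday fields.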

⟲

Reproducible to the byte

One seed value drives every random number in the system. Run twice → identical CSVs, byte-for-byte. Continuous integration verifies this on every push. Your demos won't drift.
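The idea behind byte-for-byte reproducibility can be sketched as follows. This is a toy illustration with invented columns and function names (`generate_csv`, `digest`), not the project's generator or its CI check. All randomness flows from one seeded `random.Random` instance, the line terminator is fixed, and two runs are compared by hash.

```python
import csv
import hashlib
import io
import random


def generate_csv(seed: int, n_rows: int = 1000) -> bytes:
    """Build a small CSV in memory using a single seeded random source."""
    rng = random.Random(seed)  # the only source of randomness
    buf = io.StringIO()
    # A fixed line terminator keeps the bytes identical across platforms.
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["order_id", "customer_id", "amount"])
    for order_id in range(1, n_rows + 1):
        customer_id = rng.randint(1, 1000)
        amount = f"{rng.uniform(5, 500):.2f}"  # fixed formatting, no locale effects
        writer.writerow([order_id, customer_id, amount])
    return buf.getvalue().encode("utf-8")


def digest(data: bytes) -> str:
    """SHA-256 hex digest, convenient for comparing two outputs."""
    return hashlib.sha256(data).hexdigest()


first = digest(generate_csv(seed=2024))
second = digest(generate_csv(seed=2024))
print("identical" if first == second else "different")
```

A check of this kind compares the two digests and fails when they differ, which catches any code path that draws from an unseeded source.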

three steps

From zero to production-grade data in one workflow.

No accounts, no API keys, no cloud setup, no NDAs. Pure Python, pure CSV out, pure determinism.

⚙
STEP 01

Configure

Pick a market: US, GCC, or EU. Set a date range. Decide how many customers, products, stores, and promotions you want. Every realism axis comes with a sensible default and an override.

⚡
STEP 02

Generate

Realistic, multi-year retail data fans out across nine fully relational tables in seconds. Streaming output keeps memory flat, so generating millions of rows on a laptop is routine.

↗
STEP 03

Analyze

Plug straight into pandas, Power BI, Tableau, dbt, DuckDB, your warehouse: anything that reads CSV. No vendor lock-in, no row caps. The data is yours, forever, byte-reproducible.

used for

Built for the work you actually do.

Whether you're prototyping a churn model, demoing a dashboard, teaching SQL, or staging a data pipeline, erp-synth gives you data that looks and behaves like the real thing, without an NDA.

BI & analytics dashboards
ML training & evaluation
Cohort & churn modeling
Data engineering portfolios
Pricing & promo experiments
dbt & warehouse staging
SQL teaching material
Synthetic-data benchmarks
Vendor demo datasets
RFM segmentation
CAC & LTV analysis
NPS & CSAT pipelines

Ready to see the data?

One workflow, a few seconds, hundreds of thousands of rows. Or browse the technical deep-dive for the schema, cohort math, and bibliography.