v0.2: now with marketing spend, support tickets, and NPS

Synthetic retail data that feels real.

Generate hundreds of thousands of rows of realistic, multi-year retail data across nine fully relational tables in seconds. AdventureWorks-style schema, scientifically grounded customer behavior, fully reproducible.


scripts-and-tables.github.io/erp-synthetic-data-generator

Animation: eleven years of monthly revenue, with holiday spikes each November and December

Sales pulse: $2.20M in revenue over the last 12 months, with year-over-year growth and monthly revenue showing holiday spikes

Cohort health: 1,000 customers across 6 behavioral segments with sticky cohort assignment

Markets out of the box: US, GCC, and EU presets, each with currency, VAT rate, weekend definition, and holidays
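"Sticky" cohort assignment means a customer keeps the same behavioral segment for the whole simulation. One common way to get that property is to hash a customer ID together with the seed into a fixed point in [0, 1) and map it onto cumulative segment weights. The sketch below assumes that approach. The segment labels and weights are hypothetical placeholders, not erp-synth's actual six segments.

```python
import hashlib

# Hypothetical labels and proportions for six segments (illustrative only).
SEGMENTS = ["seg_a", "seg_b", "seg_c", "seg_d", "seg_e", "seg_f"]
WEIGHTS = [0.30, 0.25, 0.15, 0.15, 0.10, 0.05]


def assign_segment(customer_id: int, seed: int = 42) -> str:
    """Deterministically map a customer to one segment.

    The same (seed, customer_id) pair always hashes to the same point in
    [0, 1), so the assignment never changes between months or between runs.
    """
    digest = hashlib.sha256(f"{seed}:{customer_id}".encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    cumulative = 0.0
    for segment, weight in zip(SEGMENTS, WEIGHTS):
        cumulative += weight
        if u < cumulative:
            return segment
    return SEGMENTS[-1]  # guard against floating-point round-off in the sum


# Assign 1,000 customers and tally segment sizes.
counts: dict = {}
for cid in range(1, 1001):
    seg = assign_segment(cid)
    counts[seg] = counts.get(seg, 0) + 1
print(counts)
```

Because each customer's segment depends only on the seed and their own ID, adding more customers later does not reshuffle anyone already assigned.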

why teams pick erp-synth

Made for analytics teams that need data yesterday.

Six product decisions that turn synthetic data from "dummy CSVs" into a production-grade dataset you can train models on, ship demos with, or teach from.

⚡

Massive volume, instantly

Generate hundreds of thousands of sales lines across nine relational tables in seconds. Streaming output keeps memory flat — push to millions of rows on a laptop without breaking a sweat.
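Flat memory while streaming usually comes from producing rows lazily and writing each one to disk immediately instead of building the whole table first. The sketch below shows that pattern with a Python generator and `csv.DictWriter`. The column names and value ranges are hypothetical, and erp-synth's own writer is not shown here.

```python
import csv
import os
import random
import tempfile
from typing import Iterable, Iterator

# Hypothetical column layout for a sales-line table.
FIELDS = ["sales_line_id", "customer_key", "quantity", "unit_price", "line_total"]


def sales_lines(n_rows: int, seed: int = 0) -> Iterator[dict]:
    """Yield synthetic sales lines one at a time (nothing is accumulated)."""
    rng = random.Random(seed)
    for i in range(1, n_rows + 1):
        qty = rng.randint(1, 5)
        unit_price = round(rng.uniform(5.0, 200.0), 2)
        yield {
            "sales_line_id": i,
            "customer_key": rng.randint(1, 1000),
            "quantity": qty,
            "unit_price": unit_price,
            "line_total": round(qty * unit_price, 2),
        }


def write_csv(path: str, rows: Iterable[dict], fieldnames: list) -> int:
    """Stream rows to disk; only the current row is held in memory."""
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            writer.writerow(row)
            count += 1
    return count


path = os.path.join(tempfile.gettempdir(), "sales_lines_demo.csv")
n_written = write_csv(path, sales_lines(100_000), FIELDS)
```

Memory use stays roughly constant whether `n_rows` is a thousand or ten million; only run time grows.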

★

Scientifically grounded

Cohort behavior is anchored in buy-till-you-die (BTYD) customer-base models from the marketing-science literature, and discrete choice follows McFadden's Nobel-winning framework. The schema mirrors Microsoft AdventureWorks, and the design cites ten papers.
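To make the two modeling ideas concrete, here is a textbook-style sketch, not erp-synth's code. The first function follows the Pareto/NBD generative story used in BTYD models: each customer draws a purchase rate and a dropout rate from gamma distributions, buys as a Poisson process, and silently churns after an exponential lifetime. The second function is a multinomial logit (McFadden-style discrete choice) over product utilities. All parameter values and the price-only utility are hypothetical.

```python
import math
import random


def simulate_purchase_times(rng: random.Random, horizon_days: float,
                            r: float = 0.5, alpha: float = 30.0,
                            s: float = 0.5, beta: float = 200.0) -> list:
    """Pareto/NBD-style purchase days for one customer (hypothetical parameters)."""
    # lambda ~ Gamma(shape=r, rate=alpha), mu ~ Gamma(shape=s, rate=beta).
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    lam = max(rng.gammavariate(r, 1.0 / alpha), 1e-12)
    mu = max(rng.gammavariate(s, 1.0 / beta), 1e-12)
    # Unobserved lifetime: the customer "dies" after an exponential time.
    end = min(rng.expovariate(mu), horizon_days)
    # While alive, purchases arrive as a Poisson process with rate lambda.
    times, t = [], 0.0
    while True:
        t += rng.expovariate(lam)
        if t > end:
            return times
        times.append(t)


def mnl_probabilities(utilities: list) -> list:
    """Logit choice probabilities P(i) = exp(V_i) / sum_j exp(V_j)."""
    m = max(utilities)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(v - m) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def mnl_choice(rng: random.Random, utilities: list) -> int:
    """Draw one alternative index according to the logit probabilities."""
    probs = mnl_probabilities(utilities)
    return rng.choices(range(len(utilities)), weights=probs, k=1)[0]


rng = random.Random(7)
purchase_days = simulate_purchase_times(rng, horizon_days=3 * 365)
prices = [19.99, 24.99, 14.99]
utilities = [-0.15 * p for p in prices]  # hypothetical price sensitivity
probs = mnl_probabilities(utilities)
basket_picks = [mnl_choice(rng, utilities) for _ in purchase_days]
```

With a negative price coefficient, the cheapest product gets the highest choice probability, while the BTYD layer decides how many purchase occasions a customer has before churning.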

⚙

Highly customizable

More than twenty controls cover market, seasonality, inflation rate, returns probability, promotion frequency, customer-field completeness, and basket composition. Every realism axis has a knob.
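As an illustration of how a single knob can shape one realism axis, the sketch below applies three of the listed controls (inflation rate, seasonality, returns probability) to a sales line. The function names, default values, and formulas are assumptions chosen for clarity, not erp-synth's actual parameters.

```python
import random
from datetime import date
from typing import Optional


def inflated_price(base_price: float, day: date, start: date,
                   annual_inflation: float = 0.03) -> float:
    """Compound a base price forward by a hypothetical annual inflation knob."""
    years = (day - start).days / 365.25
    return round(base_price * (1 + annual_inflation) ** years, 2)


def seasonal_multiplier(day: date, monthly_boost: Optional[dict] = None) -> float:
    """Demand multiplier by calendar month; Nov/Dec get hypothetical holiday lifts."""
    if monthly_boost is None:
        monthly_boost = {11: 1.4, 12: 1.8}
    return monthly_boost.get(day.month, 1.0)


def is_returned(rng: random.Random, returns_probability: float = 0.05) -> bool:
    """Bernoulli draw: the line is returned with the configured probability."""
    return rng.random() < returns_probability


rng = random.Random(1)
start = date(2015, 1, 1)
day = date(2024, 12, 20)
price = inflated_price(20.00, day, start)       # price after ~10 years of drift
units = round(3 * seasonal_multiplier(day))     # December demand lift
returned = is_returned(rng)
```

Keeping each knob as a plain, independent parameter makes it easy to switch one realism axis off (for example `annual_inflation=0.0`) without touching the others.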

↗

Unlimited data export

Plain CSV. Works with pandas, Excel, Power BI, Tableau, dbt, DuckDB, Postgres, Snowflake — anything that reads a delimited file. No vendor lock-in. No row caps. No throttling.
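Because the output is plain CSV, loading it into a SQL engine needs nothing beyond a CSV reader. The sketch below uses SQLite from the Python standard library as a stand-in for Postgres, DuckDB, or Snowflake. The table name, column names, and sample rows are hypothetical; a real run would read the generated files from disk instead of an inline string.

```python
import csv
import io
import sqlite3

# Tiny inline stand-in for a generated sales file (hypothetical columns).
SAMPLE = """order_date,customer_key,line_total
2024-11-29,17,120.50
2024-11-29,42,35.00
2024-12-02,17,60.25
"""


def load_csv(conn: sqlite3.Connection, table: str, text: str) -> sqlite3.Connection:
    """Create a table from the CSV header and bulk-insert the remaining rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    return conn


conn = load_csv(sqlite3.connect(":memory:"), "sales", SAMPLE)
# CSV values arrive as text, so cast explicitly before aggregating.
rows = conn.execute(
    "SELECT CAST(customer_key AS INTEGER), SUM(CAST(line_total AS REAL)) "
    "FROM sales GROUP BY customer_key ORDER BY CAST(customer_key AS INTEGER)"
).fetchall()
print(rows)  # [(17, 180.75), (42, 35.0)]
```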

🌍

Multi-market by design

US, GCC, and EU presets out of the box. Each sets locale, currency, VAT, weekend (Sat/Sun or Fri/Sat), payment methods, and a market-specific holiday calendar (Black Friday, Christmas, Ramadan, Eid, Boxing Week), and every piece is swappable.
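The weekend definition is one preset field whose effect is easy to see in code. The sketch below encodes the Sat/Sun versus Fri/Sat split described above and adds a VAT-inclusive price helper. The dictionary layout and function names are hypothetical, and the VAT rate is passed in explicitly because the presets' actual rates are not listed here.

```python
from datetime import date, timedelta

# date.weekday(): Monday == 0 ... Sunday == 6.
WEEKEND_DAYS = {
    "US": {5, 6},   # Saturday, Sunday
    "EU": {5, 6},   # Saturday, Sunday
    "GCC": {4, 5},  # Friday, Saturday
}


def is_weekend(day: date, market: str) -> bool:
    """True if the date falls on the market's weekend."""
    return day.weekday() in WEEKEND_DAYS[market]


def weekend_days_between(start: date, end: date, market: str) -> int:
    """Count weekend days in the inclusive range [start, end]."""
    n = (end - start).days + 1
    return sum(is_weekend(start + timedelta(days=i), market) for i in range(n))


def gross_price(net: float, vat_rate: float) -> float:
    """VAT-inclusive price; the rate would come from the active preset."""
    return round(net * (1 + vat_rate), 2)


# Friday 29 Nov 2024 (Black Friday) is a weekend day only in the GCC preset.
print(is_weekend(date(2024, 11, 29), "GCC"), is_weekend(date(2024, 11, 29), "US"))
```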

⟲

Reproducible to the byte

One seed value drives every random number in the system. Run twice with the same seed and you get identical CSVs, byte for byte. Continuous integration verifies this on every push, so your demos won't drift.
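A byte-level reproducibility check can be as simple as generating output twice from the same seed and comparing SHA-256 digests. The sketch below assumes every draw comes from a single seeded `random.Random` instance; the column layout is hypothetical, and erp-synth's actual CI job is not shown here.

```python
import csv
import hashlib
import io
import random


def generate_csv(seed: int, n_rows: int = 1000) -> bytes:
    """Build a small CSV in which every random draw comes from one seeded RNG."""
    rng = random.Random(seed)  # single source of randomness; no global state
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")  # fixed line endings on every OS
    writer.writerow(["customer_key", "quantity", "unit_price"])
    for _ in range(n_rows):
        writer.writerow([rng.randint(1, 1000), rng.randint(1, 5),
                         f"{rng.uniform(5, 200):.2f}"])  # fixed float formatting
    return buf.getvalue().encode("utf-8")


def fingerprint(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# The kind of assertion a CI step might run on every push.
assert fingerprint(generate_csv(42)) == fingerprint(generate_csv(42))
```

Fixing line endings and number formatting matters as much as fixing the seed. It is also worth pinning the Python version, since the `random` module only guarantees that `random()` itself stays stable across releases.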

three steps

From zero to production-grade data in one workflow.

No accounts, no API keys, no cloud setup, no NDAs. Pure Python, pure CSV out, pure determinism.

⚙

STEP 01

Configure

Pick a market — US, GCC or EU. Set a date range. Decide how many customers, products, stores, and promotions you want. Every realism axis comes with a sensible default and an override.
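"A sensible default and an override" maps naturally onto a frozen dataclass whose fields all have defaults, with `dataclasses.replace` producing a modified copy. This is a hypothetical configuration object, not erp-synth's API. The 1,000-customer default and eleven-year range echo figures quoted on this page; the other defaults are invented for illustration.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class GeneratorConfig:
    """Hypothetical run configuration; field names are illustrative only."""
    market: str = "US"
    start_date: date = date(2015, 1, 1)
    end_date: date = date(2025, 12, 31)
    n_customers: int = 1000
    n_products: int = 200
    n_stores: int = 10
    n_promotions: int = 25
    seed: int = 42

    def __post_init__(self):
        # Validate on every construction, including copies made by replace().
        if self.market not in {"US", "GCC", "EU"}:
            raise ValueError(f"unknown market: {self.market}")
        if self.end_date < self.start_date:
            raise ValueError("end_date must not precede start_date")


defaults = GeneratorConfig()
# Override only what differs; everything else keeps its default.
gcc = replace(defaults, market="GCC", n_customers=5000)
```

Freezing the object means a configuration cannot change mid-run, which fits the reproducibility goal.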

⚡

STEP 02

Generate

Realistic, multi-year retail data fans out across nine fully relational tables in seconds. Streaming output keeps memory flat, so generating millions of rows on a laptop is routine.
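"Fully relational" implies that every foreign key in a child table resolves to a row in its parent table. A quick way to verify that after loading the CSVs is an orphan-key check like the sketch below. The table and column names are hypothetical, and the third sales row is deliberately broken to show what a failure looks like.

```python
from typing import Iterable


def orphan_keys(child_rows: Iterable[dict], fk: str, parent_keys: set) -> set:
    """Return foreign-key values in the child table that have no parent row."""
    return {row[fk] for row in child_rows if row[fk] not in parent_keys}


customers = [{"customer_key": 1}, {"customer_key": 2}]
sales = [
    {"sales_line_id": 10, "customer_key": 1},
    {"sales_line_id": 11, "customer_key": 2},
    {"sales_line_id": 12, "customer_key": 3},  # deliberately broken row
]
customer_keys = {c["customer_key"] for c in customers}
print(orphan_keys(sales, "customer_key", customer_keys))  # {3}
```

An empty result for every child/parent pair is the property the generator is meant to guarantee.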

↗

STEP 03

Analyze

Plug straight into pandas, Power BI, Tableau, dbt, DuckDB, your warehouse, or anything else that reads CSV. No vendor lock-in, no row caps. The data is yours to keep, and byte-reproducible from its seed.
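As a first analysis, the "sales pulse" figures (trailing-12-month revenue and year-over-year growth) can be computed from any table of dated line totals. The sketch below uses only the standard library and assumes rows of `(order_date, amount)` pairs; that layout is hypothetical, and in pandas the same result would typically come from a monthly resample.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


def monthly_revenue(rows) -> dict:
    """Sum amounts into (year, month) buckets."""
    totals = defaultdict(float)
    for order_date, amount in rows:
        totals[(order_date.year, order_date.month)] += amount
    return dict(totals)


def trailing_12m(monthly: dict, year: int, month: int) -> float:
    """Revenue for the 12 months ending at (year, month), inclusive."""
    total = 0.0
    for _ in range(12):
        total += monthly.get((year, month), 0.0)
        year, month = (year, month - 1) if month > 1 else (year - 1, 12)
    return total


def yoy_growth(monthly: dict, year: int, month: int) -> Optional[float]:
    """Trailing-12-month revenue versus the same window one year earlier."""
    prior = trailing_12m(monthly, year - 1, month)
    if prior == 0:
        return None  # growth is undefined without a prior-year baseline
    return trailing_12m(monthly, year, month) / prior - 1


# Toy data: 100 per month in 2023, 110 per month in 2024.
rows = [(date(y, m, 15), 100.0 if y == 2023 else 110.0)
        for y in (2023, 2024) for m in range(1, 13)]
monthly = monthly_revenue(rows)
growth = yoy_growth(monthly, 2024, 12)  # about 0.10, i.e. +10%
```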

→ Developers, jump to the technical quick-start

used for

Built for the work you actually do.

Whether you're prototyping a churn model, demoing a dashboard, teaching SQL, or staging a data pipeline — erp-synth gives you data that looks and behaves like the real thing, without an NDA.

BI & analytics dashboards · ML training & evaluation · Cohort & churn modeling · Data engineering portfolios · Pricing & promo experiments · dbt & warehouse staging · SQL teaching material · Synthetic-data benchmarks · Vendor demo datasets · RFM segmentation · CAC & LTV analysis · NPS & CSAT pipelines