Data Analytics Academy

Dataset — Olist Brazilian E-commerce

What it is

Real anonymized data from Olist, a Brazilian e-commerce marketplace. ~100,000 orders placed between 2016 and 2018, with customers, sellers, products, payments, and customer reviews.

Source: Kaggle (olistbr/brazilian-ecommerce)
License: CC BY-NC-SA 4.0 (non-commercial, share-alike)
Size: ~135 MB unzipped

Why we chose it

  • 8 related tables — makes Day 2 (SQL joins) meaningful instead of toy
  • ~100K orders — fits in Excel (well under the 1M row limit)
  • Free-text reviews — gives Day 5 (Claude Code) something genuinely AI-shaped to do
  • Real business problems — delivery delays, payment mix, seller performance, review drivers
  • Well-documented schema — easy to onboard students

Table overview

Table                               Rows (approx)   What it contains
olist_orders_dataset                99,441          One row per order: status, purchase/delivery timestamps
olist_order_items_dataset           112,650         One row per line item: product, seller, price, freight
olist_order_payments_dataset        103,886         Payment method, installments, value
olist_order_reviews_dataset         99,224          Review score (1–5) and free-text comments
olist_customers_dataset             99,441          Customer location (state, city, zip prefix)
olist_sellers_dataset               3,095           Seller location
olist_products_dataset              32,951          Product category, dimensions, weight
product_category_name_translation   71              Portuguese → English category names

The schema diagram is on the Kaggle page listed under Source above.
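
To make the relationship between tables concrete, here is a minimal sketch of joining orders to their line items. It runs against a tiny in-memory SQLite database with made-up rows, not the real data. The column names (order_id, order_status, order_item_id, price, freight_value) are assumed from the Kaggle schema and may differ from what your loaded database uses.

```python
import sqlite3

# Toy stand-in for two of the eight tables. Column names are an assumption
# based on the Kaggle schema; the rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE olist_orders_dataset (order_id TEXT PRIMARY KEY, order_status TEXT);
    CREATE TABLE olist_order_items_dataset (
        order_id TEXT, order_item_id INTEGER, price REAL, freight_value REAL
    );
    INSERT INTO olist_orders_dataset VALUES ('o1', 'delivered'), ('o2', 'shipped');
    INSERT INTO olist_order_items_dataset VALUES
        ('o1', 1, 50.0, 10.0),
        ('o1', 2, 25.0, 5.0),
        ('o2', 1, 100.0, 20.0);
""")

# One order can have several line items, so join on order_id and aggregate
# back to one row per order.
ORDER_TOTALS_SQL = """
    SELECT o.order_id,
           o.order_status,
           COUNT(*) AS n_items,
           SUM(i.price + i.freight_value) AS order_total
    FROM olist_orders_dataset AS o
    JOIN olist_order_items_dataset AS i ON i.order_id = o.order_id
    GROUP BY o.order_id, o.order_status
    ORDER BY o.order_id
"""

order_totals = conn.execute(ORDER_TOTALS_SQL).fetchall()
for row in order_totals:
    print(row)
```

The items table has more rows than the orders table (112,650 vs 99,441) because an order can contain several items, which is why the sketch aggregates after joining.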

How to get the data

Step 1 — download the CSVs

bash data/olist/download.sh

This downloads the Kaggle zip and unpacks all CSVs into data/olist/. The CSV files themselves are gitignored — every student downloads them locally.

If the kaggle CLI is not set up, the script prints manual download instructions.
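
If you want to confirm the download worked before moving on, a small check like the one below can help. It is only a sketch: it assumes each CSV is named after its table in the overview plus ".csv", and the helper name missing_csvs is made up for this example.

```python
from pathlib import Path

# Assumption: each CSV file is named after its table plus ".csv".
EXPECTED_CSVS = [
    "olist_orders_dataset.csv",
    "olist_order_items_dataset.csv",
    "olist_order_payments_dataset.csv",
    "olist_order_reviews_dataset.csv",
    "olist_customers_dataset.csv",
    "olist_sellers_dataset.csv",
    "olist_products_dataset.csv",
    "product_category_name_translation.csv",
]


def missing_csvs(data_dir, expected=EXPECTED_CSVS):
    """Return the expected CSV names that are not present in data_dir."""
    folder = Path(data_dir)
    return [name for name in expected if not (folder / name).is_file()]


if __name__ == "__main__":
    # Run from the repository root so the relative path resolves.
    missing = missing_csvs("data/olist")
    print("All CSVs present" if not missing else f"Missing: {missing}")
```

An empty list means every expected file was found. Anything listed as missing suggests re-running Step 1.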

Step 2 — load into SQLite (do this before Day 2)

Day 2 (SQL) uses SQLite. Build a single olist.db database file from the CSVs in one command:

cd data/olist
sqlite3 olist.db < load_into_sqlite.sql

You should see row counts print at the end (~99K orders, ~113K items, etc.). If you see “Error: no such file” for any CSV, re-run the download in Step 1.
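
If the counts scroll past too quickly, you can re-check them at any time. The sketch below lists every table in a SQLite database with its row count. The function name table_row_counts is hypothetical, and the demo uses a throwaway in-memory database because the table names created by load_into_sqlite.sql are not shown here.

```python
import sqlite3


def table_row_counts(conn):
    """Return {table_name: row_count} for every user table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        # Skip SQLite's own bookkeeping tables.
        if not row[0].startswith("sqlite_")
    ]
    # Names come straight from sqlite_master, so double-quoting them is enough.
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


# Demo on an in-memory database with one made-up table.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE toy_orders (order_id TEXT)")
demo.executemany("INSERT INTO toy_orders VALUES (?)", [("o1",), ("o2",)])
for table, n_rows in table_row_counts(demo).items():
    print(f"{table}: {n_rows:,}")
```

To run it on the real data, pass sqlite3.connect("data/olist/olist.db") instead of the demo connection and compare the numbers with the table overview.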

After this you have data/olist/olist.db — that’s the database used by every Day 2 exercise and the SQL capstone. Open it with:

  • The CLI: sqlite3 data/olist/olist.db
  • A GUI (recommended for beginners): DB Browser for SQLite — free, runs on Mac/Windows/Linux, looks like Excel for databases.