← Back to landing

CRM Project — Source Files Details

CRM Project — Source Files Details

Purpose

This document defines the input CSV files used by the CRM project:

These files are expected to be stored in the project raw input folder (landing area) and serve as the authoritative source for downstream calculations (segments, activity status, events, etc.).


1) Sales Transactions (CSV)

File naming convention

One file per year.

Examples

What it represents

Transactional sales records at invoice-line level (one row per invoice_id × product_id). This file family is the primary behavioral source used to derive consumable behavior, consumption, lifecycle, and event logic.

Grain

Columns

Column Description Format / Type Notes
invoice_id Business invoice identifier Text / Integer Not a primary key; duplicates may exist depending on source rules
customer_id Customer identifier Text / Integer Foreign key → Customers master (customer_id)
invoice_date Invoice date YYYY-MM-DD (ISO) Date only (no time)
product_id Product identifier Text / Integer Foreign key → Products master (product_id)
quantity Quantity sold Numeric Negative values represent returns and are excluded from CRM aggregates (see crm_calculation_logic.md §6.3).
revenue Net amount (revenue) Numeric Net revenue amount per invoice line. Retained but not used by CRM logic; CRM measures volume in units.
store_id Store identifier Text / Integer Optional, but retained

Notes


2) Customers Master (CSV)

File name (example)

customers_master.csv

What it represents

Customer reference data (stable attributes used for segmentation, contactability, and grouping).

Grain

Columns

Column Description Format / Type Notes
customer_id Customer identifier Text / Integer Primary key
customer_name Customer name Text
customer_group Customer group / cluster Text Used for grouping/analysis
city City Text
created_date Customer creation date YYYY-MM-DD (ISO)
email Email address Text Nullable
mobile_number Mobile phone number Text Nullable; store as text to preserve leading zeros/formatting
opt_email Marketing consent for email Boolean / 0-1 Nullable allowed; define default handling
opt_sms Marketing consent for SMS Boolean / 0-1 Nullable allowed; define default handling
opt_phone Marketing consent for phone calls Boolean / 0-1 Nullable allowed; define default handling

3) Products Master (CSV)

File name (example)

products_master.csv

What it represents

Product reference data required to classify transactions (e.g., consumable vs non-consumable via category rules) and compute consumption.

Grain

Columns

Column Description Format / Type Notes
product_id Product identifier Integer Primary key
product_name Product name Text
brand Brand Text
category Product category Text Allowed values: device, consumable, accessories, spare_parts. CRM logic operates on consumable; device is used only for the First Device Purchase Date KPI.
unit_size Volume / pack size per unit (units) Numeric Used to translate quantity into the units measure consumed by all CRM aggregates. Required for all consumable rows.

4) File Relationships

Keys