Module 2 · Lesson 2.14

Power Query Basics: Get & Transform Data

Edit on GitHub

If you've ever built an ETL pipeline in Python (pandas.read_csv → clean → join → export) or an SQL view that joins three source tables and renames columns, you already know the mental model behind Power Query. Power Query is Excel's built-in ETL engine: connect to a source, define a sequence of transformation steps, and reload — and every step is recorded so the next refresh re-runs the same pipeline against fresh data. Where Flash Fill (lesson 2.1) is a one-time text fix and VLOOKUP is a single-formula join, Power Query is the durable, repeatable layer that sits between raw data and your analysis.


Part 1: Why Power Query?

A Power Query workflow has three distinguishing properties:

  • Repeatable — Each transformation is recorded as a step. When the source data updates, click Refresh and every step re-runs in order. No manual re-cleaning.
  • Source-agnostic — The same transformation steps work whether the source is a CSV, a SQL Server table, a SharePoint list, a folder full of Excel files, or a web table. You write the cleanup once and swap the source freely.
  • Non-destructive — The original data file is never modified. Power Query reads it, applies steps in memory, and loads the result into a new Table or PivotTable in your workbook.
  • Scales beyond the grid — Power Query can process millions of rows during transformation, even if the final output you load is filtered down to fit Excel's 1,048,576-row limit.
NOTE
Power Query is the same engine that powers **Power BI's Get Data**. Skills you build in Excel transfer directly to Power BI, and `.pbix` files can import queries from `.xlsx` files without rewriting them.

Part 2: Entry Points — Where to Start a Query

All Power Query commands live on the Data tab, in the Get & Transform Data group.

Get Data (External Sources)

Data > Get Data opens a categorised dropdown:

  • From File — Excel Workbook, CSV/Text, XML, JSON, PDF, or an entire Folder of files combined into one table.
  • From Database — SQL Server, Oracle, MySQL, PostgreSQL, Access, and others.
  • From Online Services — SharePoint, OneDrive, Salesforce, Microsoft Dataverse, etc.
  • From Other Sources — Web (scrape an HTML table), ODBC, OLE DB, blank query, blank table.

From Table/Range

Data > From Table/Range loads the current selection into Power Query. If the selection is not yet an Excel Table, Excel prompts you to convert it (Ctrl + T) first.

Recent Sources / Existing Connections

Data > Recent Sources and Data > Existing Connections reuse previously defined connections without re-entering credentials.

Data tab
└── Get & Transform Data group
    ├── Get Data           (any external source)
    ├── From Text/CSV
    ├── From Web
    ├── From Table/Range   (current data)
    └── Recent Sources

Part 3: The Power Query Editor

Selecting any source opens the Power Query Editor — a separate window with its own ribbon. This is where the actual transformation work happens.

Layout

Pane Purpose
Queries pane (left) Lists every query in the workbook. Right-click for rename, duplicate, reference, group.
Data preview (centre) A grid showing the first ~1,000 rows after the currently selected step. Updates live as you add steps.
Query Settings (right) Two sections: Properties (query name) and Applied Steps — the ordered list of every transform applied.
Formula bar (top) Shows the M-language code for the currently selected step. Edit it directly to tweak parameters.

The Ribbon Tabs

  • Home — Refresh, Close & Load, Manage Columns, Reduce Rows (Keep/Remove), Sort, Group By, Merge Queries, Append Queries.
  • Transform — Operations that change existing columns: data types, splitting, replacing, formatting, pivot/unpivot, fill, statistics.
  • Add Column — Operations that create new columns: custom column, conditional column, index column, column from examples.
  • View — Toggle Formula Bar, Query Settings pane, and the Advanced Editor for raw M code.

Applied Steps — The Most Important Idea

Every action you take adds an entry to Applied Steps. Click any step to rewind the preview to that point. Click the gear icon next to a step to re-open its dialog and tweak parameters. Click the X to delete a step. Drag steps to reorder.

Applied Steps
1. Source                 (the original connection)
2. Promoted Headers
3. Changed Type
4. Removed Columns
5. Filtered Rows
6. Replaced Value
7. Added Custom Column
TIP
The Applied Steps list **is** your transformation pipeline. Treat it like commit history — small, well-named steps are easier to debug than one giant step. Right-click any step > **Rename** to give it a meaningful label like `Removed admin rows`.

Part 4: Common Transformations

Each transformation below is one click in the editor. Examples assume a small sales dataset.

Promote First Row to Headers

Home > Use First Row as Headers — If the file's first row contains column names that landed as a data row, this promotes them.

Before:               After:
Column1  Column2      Region   Amount
Region   Amount       North    1500
North    1500         South    2200

Change Data Type

Click the type icon left of the column header (123, ABC, calendar, etc.) and pick the correct type: Whole Number, Decimal Number, Currency, Date, Date/Time, Text, True/False.

IMPORTANT
Always set types **immediately after** Promoted Headers. Many later transforms (date math, sum aggregations) silently fail or produce wrong results if a numeric column is still typed as `Text`.

Remove Columns / Choose Columns

  • Home > Remove Columns — drop the currently selected column(s).
  • Home > Choose Columns — open a dialog where you tick exactly which columns to keep.

Prefer Choose Columns when the source schema may add columns later — keeping a known whitelist is more stable than blacklisting columns one by one.

Filter Rows

Click the filter dropdown on any column header (same icon as Excel's AutoFilter). The menu offers value pickers, text filters (Equals, Begins With, Contains), number filters (Greater Than, Between), and date filters (In the Previous, Year > Quarter > Month).

Filter: Region equals "North" and Amount > 1000

Replace Values

Home > Replace Values — Find/replace inside a single column. Supports a "Match entire cell contents" checkbox and (for advanced users) regex via M code.

Split Column

Home > Split Column:

  • By Delimiter — split John, Smith on , into two columns.
  • By Number of Characters — split a fixed-width string at character N.
  • By Positions — split at a list of character positions.
  • By Lowercase to Uppercase / Digit to Non-Digit — useful for splitting concatenated identifiers like ORD12345.

Trim and Clean Text

Transform > Format > Trim removes leading/trailing whitespace. Transform > Format > Clean removes non-printable control characters (\t, \r, etc.) — invaluable for data pasted out of PDFs.

Add Custom Column

Add Column > Custom Column — open a dialog and type an M expression that produces a new column.

New column name:  TotalWithTax
Formula:          = [Amount] * 1.0825

Group By

Home > Group By — equivalent to SQL's GROUP BY. Choose grouping columns and aggregations (Sum, Average, Min, Max, Count, Count Distinct, All Rows).

Group by: Region
Aggregations: Sum of Amount, Count of OrderId

Pivot Column / Unpivot Columns

  • Transform > Pivot Column — turn distinct values from one column into new column headers (wide format).
  • Transform > Unpivot Columns — turn columns into rows (long/tidy format). The opposite operation, and a daily lifesaver when source data has a column-per-month layout.
Unpivot:
Region   Jan   Feb   Mar         Region   Attribute   Value
North    100   120   140    →    North    Jan         100
South    200   220   210         North    Feb         120
                                 North    Mar         140
                                 South    Jan         200
                                 ...
TIP
Most Excel data lives in wide format ("one column per month"); most analytical tools want long format ("one row per observation"). **Unpivot Columns** is the bridge.

Part 5: Merge Queries (Joins)

Home > Merge Queries is Power Query's JOIN. Pick a left query, pick a right query, click the matching key column(s) on each side, choose a Join Kind, and click OK.

Join Kind SQL Equivalent Result
Left Outer (default) LEFT JOIN All left rows + matching right rows; nulls where right has no match
Right Outer RIGHT JOIN All right rows + matching left
Full Outer FULL OUTER JOIN All rows from both, nulls where no match
Inner INNER JOIN Only rows with a match on both sides
Left Anti LEFT JOIN ... WHERE right IS NULL Left rows that have no match on the right
Right Anti Inverse of Left Anti Right rows with no left match

After the merge, the joined query has a new column whose cells contain a nested table. Click the expand icon (double arrow) in the column header to pick which fields to bring through.

TIP
A **Left Anti** join is the cleanest way to find "customers in the master list who placed no orders this month" — no `COUNTIF`, no helper columns, no formulas. Just two queries and a join.
NOTE
**Merge Queries vs. VLOOKUP/XLOOKUP** — Merge is more powerful: multi-column keys, all six join types, and refresh-on-demand. Lookup formulas remain better when you need a single value inside a calculated column on existing Table data.

Part 6: Append Queries (Stack/Union)

Home > Append Queries is Power Query's UNION ALL. It stacks queries vertically.

  • Two Tables — append exactly two queries.
  • Three or more tables — append any number; Power Query matches columns by name, filling unmatched columns with null.

Use Append to combine monthly files that all share the same column layout (Sales_Jan, Sales_Feb, Sales_MarSalesAllYear). Better still, put the source files in a folder and use Get Data > From File > From Folder to combine and append in one operation.


Part 7: Close & Load Options

When the pipeline is ready, you ship the result back to the Excel grid via Home > Close & Load.

Close & Load (Default)

Loads the result as a new Excel Table on a new worksheet. The Table is connected to the query — refreshing the query rewrites the Table.

Close & Load To... (Custom)

Opens a dialog with three choices:

Load Mode What it does
Table New Excel Table on a chosen worksheet
PivotTable Report New PivotTable with the query as its source
PivotChart New PivotChart
Only Create Connection No grid output — the query is stored but produces no Table. Use this for staging queries that will be merged into others.

You can also tick Add this data to the Data Model to make the query available to Power Pivot and DAX measures.

IMPORTANT
**Only Create Connection** is the right choice for any query that exists purely to be merged or appended into another query. Loading every intermediate query as a Table clutters the workbook and wastes memory.

Part 8: Refreshing a Query

Once loaded, queries do not automatically rerun — they stay snapshot-static until you refresh.

  • Right-click the loaded Table > Refresh — refresh just this query.
  • Right-click the query in Queries & Connections pane > Refresh — same thing, from the side pane.
  • Data tab > Refresh All (Ctrl + Alt + F5) — refresh every query and every PivotTable in the workbook.

Open the Queries & Connections pane (Data > Queries & Connections) to see all queries with their last-refreshed timestamp and row counts.

To set a refresh schedule: right-click the query > Properties > tick Refresh every N minutes or Refresh data when opening the file.


Part 9: A Word About M (the Power Query Formula Language)

Behind every Applied Step is a line of code in Microsoft's M language (formal name: Power Query Formula Language). You can see it in the formula bar above the data preview, or open the full script via View > Advanced Editor.

let
    Source = Excel.Workbook(File.Contents("C:\Sales.xlsx"), null, true),
    Sheet1 = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
    PromotedHeaders = Table.PromoteHeaders(Sheet1, [PromoteAllScalars=true]),
    ChangedType = Table.TransformColumnTypes(PromotedHeaders, {{"Amount", type number}}),
    FilteredRows = Table.SelectRows(ChangedType, each [Region] = "North")
in
    FilteredRows

You almost never write M directly — the editor's clicks generate it for you. But knowing M exists is useful for two reasons: parameterising queries (e.g., reading a folder path from a cell), and understanding error messages that quote M function names like Table.SelectRows.


Part 10: Cross-References

  • Lesson 2.12 Excel Tables and Structured References — Power Query loads its results back into Excel Tables; From Table/Range is the entry point that turns existing Table data into a query input.
  • Lesson 2.7 Subtotals and TotalsGroup By in Power Query is the structured-data version of the Data > Subtotal menu feature.
  • Lesson 4.x PivotTables — A PivotTable built on a Power Query output stays in sync: refresh the query, then refresh the PivotTable, and new data flows through the entire chain.

Links