Technical brief

How WP Article Cleaner integrates with your WordPress site, what it touches, what it doesn't, and how the moving parts fit together.

1. Overview

WP Article Cleaner is delivered in two layers:

A read-only Python client that talks to a self-hosted WordPress site through its REST API. This is the layer that fetches your content and, eventually, applies approved changes back.
An AI auditing engine that consumes the fetched content and produces structured edit proposals. The engine is a frontier large language model wrapped in an editorial harness we maintain — its internals are intentionally left out of this document.

This brief covers the first layer end-to-end. It is also the layer you'd self-host if you wanted to run the workflow in a fully air-gapped environment.

2. Architecture

The client is intentionally a thin wrapper. There is no broker, no queue, no database. Every interaction is a direct HTTPS call to /wp-json/wp/v2/... on your site, authenticated with HTTP Basic Auth.

┌──────────────────────┐        HTTPS         ┌────────────────────────┐
│  AI auditing engine  │ ───────────────────▶ │  WordPress REST API    │
│  (proprietary)       │   /wp-json/wp/v2/... │  posts · pages · meta  │
└──────────┬───────────┘ ◀─────────────────── └────────────────────────┘
           │ uses                                       ▲
           ▼                                            │ Basic Auth
┌──────────────────────┐                                │ (App Password)
│  wp_cleaner Python   │ ───────────────────────────────┘
│  client library      │
└──────────────────────┘

The Python package wp_cleaner is what the AI engine calls into. It's also what you can call directly from your own scripts if you want to build dashboards, exports, or audits on top of the same primitives.

3. WordPress integration

The client targets the standard WordPress REST API surface. No plugin, no mu-plugin, no theme modifications, and no direct database access are required.

Endpoint	Purpose	Used by
`GET /wp/v2/posts`	Paginated post listing	`list_posts()`
`GET /wp/v2/posts/{id}`	Single post with raw block markup	`get_post()`

Both calls are issued with the context=edit query parameter. That's important: with context=view, the REST API returns post content as rendered HTML, and the Gutenberg block delimiters (HTML comments such as <!-- wp:paragraph -->) have already been stripped. With context=edit we get content.raw, the source of truth, which is what any sane editing pass needs to operate on. Note that context=edit requires an authenticated user with permission to edit the post.
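To make the difference concrete, here is a minimal sketch that scans a content.raw string for block delimiters. The sample payload and the helper name block_names are illustrative assumptions, not part of wp_cleaner, and the regex only covers the common opening-delimiter form; it is not a full block parser.

```python
import re

# Hypothetical sample of what a context=edit response might contain.
sample_post = {
    "content": {
        "raw": (
            "<!-- wp:heading -->\n<h2>Intro</h2>\n<!-- /wp:heading -->\n"
            "<!-- wp:paragraph -->\n<p>Hello.</p>\n<!-- /wp:paragraph -->"
        ),
        "rendered": "<h2>Intro</h2>\n<p>Hello.</p>",
    }
}

# Matches opening delimiters such as <!-- wp:paragraph --> or
# <!-- wp:image {"id":5} -->. Closing delimiters (<!-- /wp:... -->) do not match.
BLOCK_OPEN = re.compile(r"<!--\s+wp:([a-z0-9/-]+)[\s\S]*?-->")


def block_names(markup):
    """Return the names of the blocks opened in a piece of post markup."""
    return BLOCK_OPEN.findall(markup)


raw_blocks = block_names(sample_post["content"]["raw"])
rendered_blocks = block_names(sample_post["content"]["rendered"])
```

In this sample the raw field still carries both block names, while the rendered field carries none, which is why an editing pass that worked from rendered HTML could not write valid block markup back.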

Pagination

Listing endpoints in WordPress are page-based. The client walks them transparently using the response header X-WP-TotalPages:

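def paginated_get(self, path, params=None):
    """Yield every item from a paginated listing endpoint, one page at a time."""
    params = dict(params or {})  # copy so the caller's dict is never mutated
    params.setdefault("per_page", 100)  # 100 is the REST API's maximum page size
    page = 1
    while True:
        params["page"] = page
        resp = self._request(path, params=params)
        for item in resp.json():
            yield item
        # WordPress reports the total page count in a response header.
        total = int(resp.headers.get("X-WP-TotalPages", "1"))
        if page >= total:
            return
        page += 1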
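The loop above can be exercised without a live site. The sketch below is one way to do that, assuming a hypothetical FakeClient whose _request serves three fixed pages; FakeResponse and FakeClient are stand-ins invented for this illustration, and the loop is repeated only so the example runs on its own.

```python
class FakeResponse:
    """Stand-in for an HTTP response: a JSON body plus headers."""

    def __init__(self, items, total_pages):
        self._items = items
        self.headers = {"X-WP-TotalPages": str(total_pages)}

    def json(self):
        return self._items


class FakeClient:
    """Hypothetical stub that serves three fixed pages instead of calling WordPress."""

    PAGES = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}, {"id": 4}], 3: [{"id": 5}]}

    def __init__(self):
        self.requested_pages = []

    def _request(self, path, params=None):
        page = params["page"]
        self.requested_pages.append(page)
        return FakeResponse(self.PAGES[page], total_pages=len(self.PAGES))

    def paginated_get(self, path, params=None):
        # Same header-driven loop as in the brief.
        params = dict(params or {})
        params.setdefault("per_page", 100)
        page = 1
        while True:
            params["page"] = page
            resp = self._request(path, params=params)
            for item in resp.json():
                yield item
            total = int(resp.headers.get("X-WP-TotalPages", "1"))
            if page >= total:
                return
            page += 1


client = FakeClient()
ids = [post["id"] for post in client.paginated_get("/wp/v2/posts")]
```

Because the method is a generator, pages are only requested as the caller consumes items, and the walk stops as soon as the current page number reaches the reported total.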

4. Authentication

We use WordPress Application Passwords, a feature built into WordPress 5.6 and later. No third-party plugin is installed.

Application Passwords are scoped credentials a user can generate from Users → Profile → Application Passwords. They are:

Per-application. You can issue one for WP Article Cleaner specifically and revoke it without touching your real password.
HTTPS-only. The credential travels as HTTP Basic Auth over TLS. The client refuses non-HTTPS base URLs.
Visible exactly once. WordPress shows the generated string a single time; it is never retrievable afterwards.
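As a rough illustration of how such a credential travels on the wire, the sketch below builds the Basic Auth header from a username and an Application Password. The function name basic_auth_header and the sample credential are assumptions made for this example; the shipped client may structure this differently.

```python
import base64


def basic_auth_header(username, app_password):
    """Return an Authorization header for HTTP Basic Auth."""
    # WordPress displays the password in space-separated groups;
    # the spaces are cosmetic, so they are removed before encoding.
    compact = app_password.replace(" ", "")
    token = f"{username}:{compact}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(token).decode("ascii")}


# Made-up credential, for illustration only.
header = basic_auth_header("editor", "abcd efgh ijkl mnop qrst uvwx")
decoded = base64.b64decode(header["Authorization"].split(" ", 1)[1]).decode("utf-8")
```

Base64 is an encoding, not encryption, which is why the HTTPS-only rule matters: anyone who can read the header can recover the credential.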

Why not OAuth or JWT?

Both options exist, but they require either a server-side OAuth broker or installing a JWT plugin. Application Passwords are native, revocable, and have a smaller attack surface. We default to them and only revisit if a customer explicitly requires SSO.

5. Environment configuration

The client reads three environment variables from the host process. There is no .env file support and no credential file is ever written to disk by the tool.

Variable	Required	Notes
`WP_BASE_URL`	Yes	Must start with `https://`. Trailing slashes are stripped.
`WP_USERNAME`	Yes	WordPress login of the user that owns the App Password.
`WP_APP_PASSWORD`	Yes	Spaces in the displayed password are stripped automatically.

Set them in the OS environment of the machine running the workflow. On Linux/macOS, that typically means adding export lines to ~/.bashrc or ~/.zshrc; on Windows, using [Environment]::SetEnvironmentVariable(...) at the User scope.
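The validation rules in the table can be sketched as follows. This is a minimal illustration under stated assumptions, not the shipped load_config: the Config dataclass, the error wording, and the optional env parameter (added here so the function can be tried without touching the real environment) are all hypothetical.

```python
import os
from dataclasses import dataclass


class ConfigError(Exception):
    """Raised when required settings are missing or invalid."""


@dataclass(frozen=True)
class Config:
    base_url: str
    username: str
    app_password: str


REQUIRED = ("WP_BASE_URL", "WP_USERNAME", "WP_APP_PASSWORD")


def load_config(env=None):
    """Read and validate the three variables from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ConfigError("Missing environment variables: " + ", ".join(missing))
    base_url = env["WP_BASE_URL"].rstrip("/")  # trailing slashes are stripped
    if not base_url.startswith("https://"):
        raise ConfigError("WP_BASE_URL must start with https://")
    # Spaces in the displayed Application Password are cosmetic.
    app_password = env["WP_APP_PASSWORD"].replace(" ", "")
    return Config(base_url, env["WP_USERNAME"], app_password)


cfg = load_config({
    "WP_BASE_URL": "https://example.com/",
    "WP_USERNAME": "editor",
    "WP_APP_PASSWORD": "abcd efgh ijkl",
})
```

Failing fast at load time keeps a plain-HTTP URL or a missing variable from surfacing later as a confusing network error.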

6. The client library

The Python package exposes a small, deliberate surface:

from wp_cleaner import WordPressClient, load_config, list_posts, get_post

cfg = load_config()
client = WordPressClient(cfg.base_url, cfg.username, cfg.app_password)

# Walk every published article (paginated under the hood):
for post in list_posts(client, status="publish"):
    print(post["id"], post["title"]["rendered"])

# Fetch a single post — content.raw preserves Gutenberg block markup:
post = get_post(client, 123)
print(post["content"]["raw"])

That's the entire public API for Stage 1. Anything more — bulk export, content diffing, write-back — composes from those primitives.
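As one example of composing on top of those primitives, the sketch below writes posts to a JSON Lines file. The helper export_jsonl is hypothetical and not part of wp_cleaner; it accepts any iterable of post dicts, so the sample data here stands in for what list_posts would yield.

```python
import json
import os
import tempfile


def export_jsonl(posts, out_path):
    """Write one JSON object per line (id, title, raw content); return the count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as fh:
        for post in posts:
            record = {
                "id": post["id"],
                "title": post["title"]["rendered"],
                "raw": post.get("content", {}).get("raw", ""),
            }
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


# Made-up posts shaped like REST API responses.
sample_posts = [
    {"id": 1, "title": {"rendered": "First"}, "content": {"raw": "<p>One</p>"}},
    {"id": 2, "title": {"rendered": "Second"}, "content": {"raw": "<p>Two</p>"}},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "posts.jsonl")
    written = export_jsonl(sample_posts, path)
    with open(path, encoding="utf-8") as fh:
        lines = fh.read().splitlines()
```

Against a real site, the first argument would be list_posts(client, status="publish"); because that call is paginated lazily, the export never needs to hold the whole site in memory.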

7. Error handling

WordPress returns errors as JSON objects of the form {"code": "...", "message": "..."}. The client parses those and raises a typed WPAPIError with the original status code, error code, and human-readable message preserved:

try:
    post = get_post(client, 9999)
except WPAPIError as exc:
    print(exc.status)    # 404
    print(exc.code)      # 'rest_post_invalid_id'
    print(exc.message)   # 'Invalid post ID.'

Configuration errors (missing env vars, non-HTTPS URLs) raise ConfigError with a message that points to the README. The two are intentionally distinct exception types so callers can decide which to surface and which to treat as fatal.
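The mapping from a WordPress error body to a typed exception might look roughly like this. It is a sketch under assumptions: the helper error_from_response and the fallback code "unknown_error" are invented for illustration, and only the three attributes (status, code, message) come from the description above.

```python
import json


class WPAPIError(Exception):
    """Typed error carrying the HTTP status plus WordPress's code and message."""

    def __init__(self, status, code, message):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


def error_from_response(status, body_text):
    """Build a WPAPIError from an HTTP status and a raw response body."""
    try:
        payload = json.loads(body_text)
    except ValueError:
        payload = None
    if not isinstance(payload, dict):
        # Not a WordPress-style JSON error, for example an HTML page from a proxy.
        # "unknown_error" is a placeholder chosen for this sketch.
        return WPAPIError(status, "unknown_error", body_text[:200])
    return WPAPIError(
        status,
        payload.get("code", "unknown_error"),
        payload.get("message", ""),
    )


not_found = error_from_response(
    404, '{"code": "rest_post_invalid_id", "message": "Invalid post ID."}'
)
gateway = error_from_response(502, "<html>Bad Gateway</html>")
```

Handling the non-JSON case explicitly is worthwhile because errors from a CDN or reverse proxy in front of WordPress usually arrive as HTML rather than in the REST API's error format.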

8. Security model

HTTPS-only transport. The client refuses to start if WP_BASE_URL is not HTTPS. There is no way to disable this check.
No credential persistence by the tool. Credentials are only ever read from os.environ at runtime. Nothing is cached, logged, or written to disk.
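Read-only at Stage 1. The currently shipped client contains no code path that creates, updates, or deletes WordPress content, so nothing in this codebase can mutate your site. The credential itself is not read-only, however: an Application Password carries the full capabilities of its user, so a leaked credential could still be used to write through the REST API directly. That is why the scoping advice in the next point matters.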
Scoped credential. Application Passwords inherit the role of the WordPress user that issued them. We recommend issuing them under an Editor-role user, not under an Administrator account.
Auditable surface. The entire client is open source and short enough to read in one sitting. Every HTTP call the AI engine makes flows through this code.

9. Roadmap

Stage 2 introduces the write path and the safety scaffolding that comes with it:

Update support via POST /wp/v2/posts/{id} with diff preview and dry-run modes.
Local snapshot store. Every article gets a JSON snapshot written before it is mutated, enabling exact rollback.
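Page-builder detection. Posts authored in Elementor, Divi, or similar visual builders will be detected and refused. Their layout typically lives in custom post meta or builder-specific shortcodes rather than in standard block markup, so edits made through the REST content field could be invisible on the rendered page or could break the layout.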
Draft-only mode. Approved updates can land as drafts so a human reviews the rendered article inside WordPress before publishing.
Pages, taxonomies, and media. The same client surface, extended to the rest of the WordPress content model.
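To give a feel for the snapshot and diff-preview items, here is a minimal sketch using only the standard library. Nothing here is shipped code: the helpers write_snapshot and diff_preview, the file-naming scheme, and the sample post are all assumptions made for illustration.

```python
import difflib
import json
import os
import tempfile
import time


def write_snapshot(post, directory):
    """Save the full post payload as JSON before any mutation; return the file path."""
    os.makedirs(directory, exist_ok=True)
    name = f"post-{post['id']}-{int(time.time())}.json"  # hypothetical naming scheme
    path = os.path.join(directory, name)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(post, fh, ensure_ascii=False, indent=2)
    return path


def diff_preview(old_raw, new_raw):
    """Return a unified diff of the raw content for human review."""
    lines = difflib.unified_diff(
        old_raw.splitlines(),
        new_raw.splitlines(),
        fromfile="current",
        tofile="proposed",
        lineterm="",
    )
    return "\n".join(lines)


# Made-up post with a typo that a proposed edit corrects.
post = {
    "id": 123,
    "content": {"raw": "<!-- wp:paragraph -->\n<p>Helo world.</p>\n<!-- /wp:paragraph -->"},
}
proposed = post["content"]["raw"].replace("Helo", "Hello")

with tempfile.TemporaryDirectory() as tmp:
    snapshot_path = write_snapshot(post, tmp)
    with open(snapshot_path, encoding="utf-8") as fh:
        restored = json.load(fh)

preview = diff_preview(post["content"]["raw"], proposed)
```

Snapshotting the whole payload, rather than only the changed field, is what would make an exact rollback possible; a dry-run mode could simply print the preview and stop before any write.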

Questions about the integration, security, or self-hosting? Get in touch.