PS PrestaShop Intermediate

Thin Content Detector — Documentation

Automatic detection of thin content, duplicates and boilerplate on your PrestaShop catalogue with AI-powered enrichment suggestions. Installation, threshold configuration, AI providers, cron scan and troubleshooting.

Updated Module version 1.0.0

DataFirefly Thin Content Detector automatically scans your products, categories and CMS pages across all active languages in your shop. It detects three SEO-toxic patterns — content that’s too thin, duplicate descriptions, and pages dominated by boilerplate — and generates AI enrichment suggestions ready to paste. This guide covers installation, configuration, daily use, cron scheduling and troubleshooting.

Overview

Since the Helpful Content Update, Google has been actively demoting pages whose content is too short, too similar to other pages, or too dominated by repeated elements. On an e-commerce catalogue, that’s typically supplier-copied product sheets, categories with two generic sentences, or variants sharing 95% of their description. Invisible to the naked eye on 500 products — but cumulatively, it’s what stops your site from ranking.

The three detection types

  • Thin content — pages below the configurable word threshold. Three severity levels based on how far below the threshold (critical < 25%, warning 25–75%, notice 75–100%).
  • Duplicates — two-pass detection: SHA1 hash for exact duplicates (severity 3), then Jaccard similarity ≥ configurable threshold for near-duplicates (severity 2).
  • Template / content ratio — identifies tokens shared with sibling pages (same parent category) and calculates the percentage of unique tokens per page. A page with 200 words but 90% boilerplate is as toxic as a 30-word page.

Installation

  1. Upload the module ZIP via Modules > Module Manager > Upload a module.
  2. Click Install. The module creates two tables (ps_dfthincontent_issue and ps_dfthincontent_scan) and an admin tab under Catalog.
  3. Access the module via Catalog > Thin Content (DataFirefly).
Compatibility. PrestaShop 8.0 – 9.x, PHP 7.4 – 8.3, MySQL 5.6+ / MariaDB 10.3+. Multi-shop natively supported (the unique key for issues includes id_shop). No Composer dependency required.

Configuration

Click the Configuration button in the module’s toolbar. Three panels are available.

Detection thresholds

  • Product minimum words — default 150. Any product whose long + short description combined contains fewer than 150 words will be flagged.
  • Category minimum words — default 100.
  • CMS page minimum words — default 250.
  • Jaccard similarity threshold — default 85%. Above this, two pages are considered near-duplicates.
  • Minimum template ratio — default 30%. Below this, the page is considered too dominated by boilerplate.
Which threshold to pick? 150 words per product is a solid starting point for most shops. For textiles or consumables, you can drop to 100. For technical electronics or home goods, raise to 250. For Jaccard similarity, 85% catches real duplicates without flagging every legitimate variant; drop to 75% if you have many very close variants to differentiate.

Scan targets

  • Scan products (ON by default).
  • Scan categories (ON by default).
  • Scan CMS pages (ON by default).
  • Automatic rescan on save (OFF by default). When enabled, each save of a product, category or CMS page triggers a targeted re-test of that object only. You see in real time whether your rewrite is enough to clear the thresholds.

AI configuration

Enrichment suggestions use an OpenAI-compatible endpoint (chat completions). This includes a wide range of providers:

  • OpenAI — endpoint https://api.openai.com/v1/chat/completions, recommended model gpt-4o-mini (≈ €0.001 per suggestion).
  • Mistral AI — endpoint https://api.mistral.ai/v1/chat/completions, model mistral-small-latest.
  • Groq — endpoint https://api.groq.com/openai/v1/chat/completions, model llama-3.3-70b-versatile. Very fast.
  • Ollama locally — endpoint http://localhost:11434/v1/chat/completions, any downloaded model. Zero cost.
  • Anthropic via an OpenAI-compatible proxy.

Parameters to fill in:

  • Endpoint — full URL to /v1/chat/completions.
  • Model — model identifier at the provider.
  • API key — Bearer token. Stored encrypted via PrestaShop’s configuration system.
  • Max tokens — default 600. Sufficient for a standard enrichment suggestion.
The API key is not mandatory. Issue detection and tracking work without AI. Only enrichment suggestions require a configured endpoint. You can perfectly use the module in pure audit mode.

Usage — Dashboard

The dashboard is the module’s home page. It shows:

  • Three main counters — total open, fixed and ignored issues.
  • Breakdown by issue type — thin / duplicate / template.
  • Breakdown by object type — product / category / CMS page.
  • Current thresholds — reminder of configured values.
  • Last 5 scans — date, duration, number of objects analysed.
  • “Run full scan” button — triggers a synchronous AJAX scan. A modal shows progress and the summary at the end.

Run a scan

Click Run full scan. The scan walks through all active languages, applies the three analysers to enabled targets, persists detected issues into ps_dfthincontent_issue, and marks as fixed any issues that are no longer detected (for example if you’ve enriched a product page since the last scan).

On large catalogues (beyond 5,000 products), prefer the cron scan (see below). The synchronous scan remains usable but may exceed the default PHP time limit. The cron scan automatically lifts the set_time_limit(0) and memory_limit 512M limits.

Usage — Issues list

Accessible via View issues in the toolbar. Paginated display (50 per page) with advanced filters:

  • Status — open / fixed / ignored.
  • Issue type — thin / duplicate / template.
  • Object type — product / category / CMS.
  • Language — filter on one of the active languages.
  • Free search — on the object name.

Each row shows severity (red / orange / blue dot), issue type, object type with icon, name, language, word count, the relevant metric (% similarity or % uniqueness) and three action buttons:

  • AI suggestion — opens a modal with an HTML enrichment suggestion generated on demand (see next section).
  • Mark fixed — moves the issue to fixed status. It stays in history but no longer pollutes the counters.
  • Ignore — moves the issue to ignored. Useful for intentionally short pages (e.g. a legitimately short “Contact” CMS page).

CSV export

The Export CSV button downloads all issues matching the current filter. Export is streamed to output (500-row chunks) to handle large catalogues without memory saturation. UTF-8 encoding with BOM for direct opening in Excel. Semicolon delimiter.

AI suggestions

Click the AI button on any row. The module sends a request to the configured endpoint with a prompt built dynamically from the issue type and object type:

  • Thin product — enrich with USPs, materials, usage, origin, warranties.
  • Thin category — enrich with range USPs, buying advice, subcategory comparison.
  • Thin CMS — editorial development, context, examples.
  • Duplicate — differentiate the page by focusing on what makes it unique compared to its duplicates.
  • Template — remove boilerplate, add elements unique to this specific page.

The system message enforces a return in clean HTML: p, ul, li, h3 tags only. No markdown, no root tags. You can paste the result directly into TinyMCE’s description field without cleanup.

The suggestion is stored in DB. If you reopen the modal later, it displays instantly without a new API call.

Recommended workflow. Filter by critical severity, generate suggestions one by one, copy each suggestion into the corresponding page, save. If automatic rescan is enabled, the issue moves to fixed automatically as soon as the save clears the thresholds.

Cron — Scheduled scans

The module exposes a token-protected cron endpoint, ideal for nightly scans:

https://your-shop.com/modules/dfthincontent/cron.php?token=YOUR_TOKEN

The token is randomly generated at install time and shown in the configuration panel. Keep it confidential — it grants access to triggering a full scan.

Crontab example (daily scan at 4am)

0 4 * * * curl -s "https://your-shop.com/modules/dfthincontent/cron.php?token=YOUR_TOKEN" > /dev/null 2>&1

Cron scan characteristics

  • set_time_limit(0) — no PHP time limit.
  • memory_limit 512M — set automatically.
  • JSON return containing the number of objects analysed, the number of issues detected and the total duration.
  • Validation via hash_equals to resist timing attacks.
Regenerate the token. If you suspect a token leak, uninstall then reinstall the module — a new token will be generated. You can also directly change the DFTHIN_CRON_TOKEN value in the ps_configuration table.

Technical architecture

Table structure

  • ps_dfthincontent_issue — one record per detected issue. Unique key: (id_object, object_type, id_lang, id_shop, issue_type). Notable fields: severity (1-3), word_count, content_hash (SHA1), metric_value (% similarity or uniqueness), metric_data (JSON with detail), ai_suggestion, status, object_name, object_url.
  • ps_dfthincontent_scan — scan history. Start / end date, duration, items analysed by type, status.

Hooks used

  • actionAdminControllerSetMedia — loads CSS / JS and exposes the AJAX URL via Media::addJsDef.
  • actionProductUpdate — rescans the modified product if auto-rescan is enabled.
  • actionObjectCategoryUpdateAfter — same for categories.
  • actionObjectCmsUpdateAfter — same for CMS pages.

Performance limits

Duplicate detection is intrinsically O(n²) — every page is compared to every other page of the same type / language / shop. To avoid an explosion on very large catalogues, the module applies two protections:

  • Safety cap at 1,500 items per group (type + language + shop). Beyond that, duplicate detection is disabled for this group — a warning is logged.
  • Word-count pre-filtering — Jaccard similarity is only computed between items whose word count is within a ±50% window. This eliminates the vast majority of useless comparisons.

Troubleshooting

The scan doesn’t start

  1. Open the browser’s network console, click Run full scan, look at the AJAX request to action=scanFull.
  2. If the response is HTML instead of JSON, it’s a server-side PHP fatal — check the PrestaShop (var/logs/) and PHP logs.
  3. If the response is 404, check that the AdminDfThinContent controller is properly registered (ps_tab table).
  4. If the response is 403, the CSRF token has expired — refresh the page and try again.

AI suggestions return an error

  • Check that the API key is correct and active at your provider.
  • Check that the server can reach the endpoint URL (outbound firewall, DNS).
  • If using Ollama locally, check that the service is running (ollama serve) and the model is downloaded (ollama pull llama3.3).
  • Check the PrestaShop logs — the module logs cURL errors and non-200 HTTP codes there.

The cron returns 401 or 403

The token transmitted doesn’t match. Get the correct token from the configuration panel and replace it in your crontab. No space, no line break in the value.

Legitimate duplicates are flagged

This is the typical case of very close variants (sizes of the same model, colours). Three options:

  • Mark the issues as ignored one by one.
  • Raise the Jaccard similarity threshold to 95% or more.
  • Disable product scanning and keep only categories + CMS if your use case doesn’t require product scanning.

Uninstallation

Uninstall via Modules > Module Manager > Uninstall. The module cleanly removes both DB tables, the admin tab and all configuration keys. No residual traces.

Back up the CSV export before uninstallation if you want to keep the history of generated AI suggestions. Once tables are dropped, the suggestions are gone.

Resources

Was this page helpful?

Still stuck? Contact support