Thin Content Detector — Documentation
Automatic detection of thin content, duplicates and boilerplate on your PrestaShop catalogue with AI-powered enrichment suggestions. Installation, threshold configuration, AI providers, cron scan and troubleshooting.
DataFirefly Thin Content Detector automatically scans your products, categories and CMS pages across all active languages in your shop. It detects three SEO-toxic patterns — content that’s too thin, duplicate descriptions, and pages dominated by boilerplate — and generates AI enrichment suggestions ready to paste. This guide covers installation, configuration, daily use, cron scheduling and troubleshooting.
Overview
Since the Helpful Content Update, Google has been actively demoting pages whose content is too short, too similar to other pages, or too dominated by repeated elements. On an e-commerce catalogue, that’s typically supplier-copied product sheets, categories with two generic sentences, or variants sharing 95% of their description. Invisible to the naked eye on 500 products — but cumulatively, it’s what stops your site from ranking.
The three detection types
- Thin content — pages below the configurable word threshold. Three severity levels based on how far below the threshold (critical < 25%, warning 25–75%, notice 75–100%).
- Duplicates — two-pass detection: SHA1 hash for exact duplicates (severity 3), then Jaccard similarity ≥ configurable threshold for near-duplicates (severity 2).
- Template / content ratio — identifies tokens shared with sibling pages (same parent category) and calculates the percentage of unique tokens per page. A page with 200 words but 90% boilerplate is as toxic as a 30-word page.
Installation
- Upload the module ZIP via Modules > Module Manager > Upload a module.
- Click Install. The module creates two tables (
ps_dfthincontent_issueandps_dfthincontent_scan) and an admin tab under Catalog. - Access the module via Catalog > Thin Content (DataFirefly).
id_shop). No Composer dependency required.
Configuration
Click the Configuration button in the module’s toolbar. Three panels are available.
Detection thresholds
- Product minimum words — default 150. Any product whose long + short description combined contains fewer than 150 words will be flagged.
- Category minimum words — default 100.
- CMS page minimum words — default 250.
- Jaccard similarity threshold — default 85%. Above this, two pages are considered near-duplicates.
- Minimum template ratio — default 30%. Below this, the page is considered too dominated by boilerplate.
Scan targets
- Scan products (ON by default).
- Scan categories (ON by default).
- Scan CMS pages (ON by default).
- Automatic rescan on save (OFF by default). When enabled, each save of a product, category or CMS page triggers a targeted re-test of that object only. You see in real time whether your rewrite is enough to clear the thresholds.
AI configuration
Enrichment suggestions use an OpenAI-compatible endpoint (chat completions). This includes a wide range of providers:
- OpenAI — endpoint
https://api.openai.com/v1/chat/completions, recommended modelgpt-4o-mini(≈ €0.001 per suggestion). - Mistral AI — endpoint
https://api.mistral.ai/v1/chat/completions, modelmistral-small-latest. - Groq — endpoint
https://api.groq.com/openai/v1/chat/completions, modelllama-3.3-70b-versatile. Very fast. - Ollama locally — endpoint
http://localhost:11434/v1/chat/completions, any downloaded model. Zero cost. - Anthropic via an OpenAI-compatible proxy.
Parameters to fill in:
- Endpoint — full URL to
/v1/chat/completions. - Model — model identifier at the provider.
- API key — Bearer token. Stored encrypted via PrestaShop’s configuration system.
- Max tokens — default 600. Sufficient for a standard enrichment suggestion.
Usage — Dashboard
The dashboard is the module’s home page. It shows:
- Three main counters — total open, fixed and ignored issues.
- Breakdown by issue type — thin / duplicate / template.
- Breakdown by object type — product / category / CMS page.
- Current thresholds — reminder of configured values.
- Last 5 scans — date, duration, number of objects analysed.
- “Run full scan” button — triggers a synchronous AJAX scan. A modal shows progress and the summary at the end.
Run a scan
Click Run full scan. The scan walks through all active languages, applies the three analysers to enabled targets, persists detected issues into ps_dfthincontent_issue, and marks as fixed any issues that are no longer detected (for example if you’ve enriched a product page since the last scan).
set_time_limit(0) and memory_limit 512M limits.
Usage — Issues list
Accessible via View issues in the toolbar. Paginated display (50 per page) with advanced filters:
- Status — open / fixed / ignored.
- Issue type — thin / duplicate / template.
- Object type — product / category / CMS.
- Language — filter on one of the active languages.
- Free search — on the object name.
Each row shows severity (red / orange / blue dot), issue type, object type with icon, name, language, word count, the relevant metric (% similarity or % uniqueness) and three action buttons:
- AI suggestion — opens a modal with an HTML enrichment suggestion generated on demand (see next section).
- Mark fixed — moves the issue to
fixedstatus. It stays in history but no longer pollutes the counters. - Ignore — moves the issue to
ignored. Useful for intentionally short pages (e.g. a legitimately short “Contact” CMS page).
CSV export
The Export CSV button downloads all issues matching the current filter. Export is streamed to output (500-row chunks) to handle large catalogues without memory saturation. UTF-8 encoding with BOM for direct opening in Excel. Semicolon delimiter.
AI suggestions
Click the AI button on any row. The module sends a request to the configured endpoint with a prompt built dynamically from the issue type and object type:
- Thin product — enrich with USPs, materials, usage, origin, warranties.
- Thin category — enrich with range USPs, buying advice, subcategory comparison.
- Thin CMS — editorial development, context, examples.
- Duplicate — differentiate the page by focusing on what makes it unique compared to its duplicates.
- Template — remove boilerplate, add elements unique to this specific page.
The system message enforces a return in clean HTML: p, ul, li, h3 tags only. No markdown, no root tags. You can paste the result directly into TinyMCE’s description field without cleanup.
The suggestion is stored in DB. If you reopen the modal later, it displays instantly without a new API call.
Cron — Scheduled scans
The module exposes a token-protected cron endpoint, ideal for nightly scans:
https://your-shop.com/modules/dfthincontent/cron.php?token=YOUR_TOKEN
The token is randomly generated at install time and shown in the configuration panel. Keep it confidential — it grants access to triggering a full scan.
Crontab example (daily scan at 4am)
0 4 * * * curl -s "https://your-shop.com/modules/dfthincontent/cron.php?token=YOUR_TOKEN" > /dev/null 2>&1
Cron scan characteristics
set_time_limit(0)— no PHP time limit.memory_limit 512M— set automatically.- JSON return containing the number of objects analysed, the number of issues detected and the total duration.
- Validation via
hash_equalsto resist timing attacks.
DFTHIN_CRON_TOKEN value in the ps_configuration table.
Technical architecture
Table structure
ps_dfthincontent_issue— one record per detected issue. Unique key:(id_object, object_type, id_lang, id_shop, issue_type). Notable fields:severity(1-3),word_count,content_hash(SHA1),metric_value(% similarity or uniqueness),metric_data(JSON with detail),ai_suggestion,status,object_name,object_url.ps_dfthincontent_scan— scan history. Start / end date, duration, items analysed by type, status.
Hooks used
actionAdminControllerSetMedia— loads CSS / JS and exposes the AJAX URL viaMedia::addJsDef.actionProductUpdate— rescans the modified product if auto-rescan is enabled.actionObjectCategoryUpdateAfter— same for categories.actionObjectCmsUpdateAfter— same for CMS pages.
Performance limits
Duplicate detection is intrinsically O(n²) — every page is compared to every other page of the same type / language / shop. To avoid an explosion on very large catalogues, the module applies two protections:
- Safety cap at 1,500 items per group (type + language + shop). Beyond that, duplicate detection is disabled for this group — a warning is logged.
- Word-count pre-filtering — Jaccard similarity is only computed between items whose word count is within a ±50% window. This eliminates the vast majority of useless comparisons.
Troubleshooting
The scan doesn’t start
- Open the browser’s network console, click Run full scan, look at the AJAX request to
action=scanFull. - If the response is HTML instead of JSON, it’s a server-side PHP fatal — check the PrestaShop (
var/logs/) and PHP logs. - If the response is 404, check that the
AdminDfThinContentcontroller is properly registered (ps_tabtable). - If the response is 403, the CSRF token has expired — refresh the page and try again.
AI suggestions return an error
- Check that the API key is correct and active at your provider.
- Check that the server can reach the endpoint URL (outbound firewall, DNS).
- If using Ollama locally, check that the service is running (
ollama serve) and the model is downloaded (ollama pull llama3.3). - Check the PrestaShop logs — the module logs cURL errors and non-200 HTTP codes there.
The cron returns 401 or 403
The token transmitted doesn’t match. Get the correct token from the configuration panel and replace it in your crontab. No space, no line break in the value.
Legitimate duplicates are flagged
This is the typical case of very close variants (sizes of the same model, colours). Three options:
- Mark the issues as ignored one by one.
- Raise the Jaccard similarity threshold to 95% or more.
- Disable product scanning and keep only categories + CMS if your use case doesn’t require product scanning.
Uninstallation
Uninstall via Modules > Module Manager > Uninstall. The module cleanly removes both DB tables, the admin tab and all configuration keys. No residual traces.
Resources
- Product page: datafirefly.com/en/product/dfthincontent/
- Support: support@datafirefly.com