PrestaShop · Intermediate

AI Semantic Internal Linking — Complete guide

Install, configure and operate semantic internal linking with AI embeddings: indexing, suggestions, anchors, rollback and CLI worker.

Module version: 1.0.0

Overview

DataFirefly AI Semantic Internal Linking (dfaisemanticlinks) builds your PrestaShop store’s internal linking from vector embeddings. Each product, category and CMS page is turned into a vector by an AI provider (Mistral or OpenAI), vectors are compared by cosine similarity, and the module proposes contextual links whose anchors are extracted verbatim from the source text. You validate suggestions one by one or in bulk, and every inserted link can later be removed on its own thanks to its unique data-dfasl attribute.

The module makes no AI calls on the front office: similarities are pre-computed and validated links are written directly into the descriptions, so page rendering incurs no extra API call or latency.

Requirements

  • PrestaShop 8.0 → 9.x (PrestaShop 1.7 not supported)
  • PHP 8.1, 8.2 or 8.3
  • MySQL 5.7+ / MariaDB 10.3+
  • A Mistral API key (console.mistral.ai) or OpenAI API key (platform.openai.com)
  • CLI access (for cron) recommended for catalogs over 1,000 entities

Installation

  1. Back office → Modules → Module Manager → Upload a module.
  2. Upload dfaisemanticlinks.zip then click Install.
  3. The module creates 5 tables prefixed dfasl_ (embedding, queue, suggestion, inserted_link, job) and an AI Linking menu with 4 tabs: Dashboard, Suggestions, Inserted Links, Settings.

Uninstalling drops the 5 tables and deletes all DFASL_* configuration variables. Export your data beforehand if you want to keep it.

Configuration

1. Embedding provider

Settings tab, first block:

  • Provider: Mistral (mistral-embed, 1024 dimensions, default, EU hosting) or OpenAI (text-embedding-3-small, 1536 dimensions).
  • API key: paste the key of the selected provider.
  • Test API Connection: sends a test string and displays the number of dimensions returned. Always validate the key here before launching an indexation.

If you switch providers after an indexation, the vector dimensions change (1024 vs 1536) and old and new vectors can no longer be compared. The module prompts you to run Reindex All: old vectors are overwritten, but links already inserted stay in place.

2. Indexing

  • Indexed types: products, categories, CMS pages — each toggleable independently.
  • Minimum length (default 200 characters): content shorter than this after HTML cleanup is skipped.
  • Batch size (default 20): number of items sent per API request. One embedding call per batch.
  • Auto-reindexing (default on): every product, category or CMS change requeues the entity via hooks. Disable it temporarily during a large CSV import.
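The indexing settings above can be illustrated with a small Python sketch (the module itself is PHP). It assumes that “HTML cleanup” means stripping tags, decoding entities and collapsing whitespace, and that an item exactly at the minimum length is kept; `clean_text` and `prepare_batches` are hypothetical names.

```python
import html
import re

MIN_LENGTH = 200  # "Minimum length" setting (default 200 characters)
BATCH_SIZE = 20   # "Batch size" setting (default 20 items per API request)


def clean_text(raw_html: str) -> str:
    """Strip tags, decode entities and collapse whitespace (assumed cleanup rules)."""
    text = re.sub(r"<[^>]+>", " ", raw_html)  # replace every tag with a space
    text = html.unescape(text)                # &eacute; -> é, &nbsp; -> non-breaking space
    return re.sub(r"\s+", " ", text).strip()  # collapse all whitespace runs


def prepare_batches(items: dict) -> list:
    """Clean each item, skip short ones, and group the rest into API-sized batches."""
    kept = []
    for key, raw in items.items():
        text = clean_text(raw)
        if len(text) >= MIN_LENGTH:  # shorter than the minimum after cleanup: skipped
            kept.append((key, text))
    # One embedding request per batch of at most BATCH_SIZE items.
    return [kept[i:i + BATCH_SIZE] for i in range(0, len(kept), BATCH_SIZE)]
```

With 45 long descriptions and one short one, this yields batches of 20, 20 and 5 items, and the short description never reaches the API.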

3. Suggestions and insertion

  • Similarity threshold (default 0.78): pairs scoring below the threshold are ignored. Lower it to 0.72 for more suggestions, or raise it to 0.82 for stricter matching.
  • Max links per page (default 5): caps the number of links per source page to avoid SEO over-optimization.
  • Anchor strategy: optimized n-grams (default) or raw target title.
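A minimal Python sketch of how the threshold and the per-page cap could combine for one source page, assuming the cap keeps the highest-scoring targets (the module may instead enforce it at insertion time). `select_targets` and the `scores` mapping of target to similarity are hypothetical.

```python
SIMILARITY_THRESHOLD = 0.78  # default "Similarity threshold"
MAX_LINKS_PER_PAGE = 5       # default "Max links per page"


def select_targets(scores: dict,
                   threshold: float = SIMILARITY_THRESHOLD,
                   max_links: int = MAX_LINKS_PER_PAGE) -> list:
    """Keep targets scoring at or above the threshold, best first, capped at max_links."""
    eligible = [(target, s) for target, s in scores.items() if s >= threshold]
    eligible.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return [target for target, _ in eligible[:max_links]]
```

Lowering `threshold` to 0.72 only adds pairs to the candidate list; the per-page cap still decides how many survive.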

First indexation

  1. Dashboard tab → Reindex All: every active entity of the enabled types is queued, in all active languages.
  2. Click Process Batch as many times as needed (small catalogs), or launch the CLI worker (see below).
  3. Each batch: text extraction + cleanup, batch embedding call, vector storage, then suggestion computation via cosine similarity.
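The similarity step can be sketched in Python as follows. Storing each vector’s L2 norm once (as the architecture section describes) reduces each comparison to a dot product and a division. The dimension check mirrors why switching providers requires a full reindex; the function names are illustrative, not the module’s API.

```python
import math


def l2_norm(vec: list) -> float:
    """Euclidean length, computed once when the vector is stored."""
    return math.sqrt(sum(x * x for x in vec))


def cosine(a: list, norm_a: float, b: list, norm_b: float) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|), reusing the pre-computed norms."""
    if len(a) != len(b):
        # e.g. a 1024-d Mistral vector vs a 1536-d OpenAI vector: not comparable
        raise ValueError("dimension mismatch: reindex after switching providers")
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector has no direction
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)
```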

The dashboard continuously shows: total entities, active embeddings, pending suggestions, active links, and queue statuses (Pending, Processing, Done, Error).

Indicative cost: 1,000 products in 3 languages ≈ 1.5 million tokens ≈ EUR 0.15 (Mistral) or USD 0.03 (OpenAI). SHA-256 change detection ensures that later reindexes only pay for content that has actually changed.
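The cost figures work out to roughly 500 tokens per entity-language pair (1,000 × 3 × 500 = 1.5 million) and about 0.10 per million tokens for Mistral and 0.02 per million for OpenAI. These per-token rates are derived from the numbers above, not quoted from the providers’ price lists, so check current pricing before relying on them.

```python
def estimate_cost(entities: int, languages: int, tokens_per_text: int,
                  price_per_million: float) -> tuple:
    """Return (total_tokens, cost) for one full indexation run."""
    total_tokens = entities * languages * tokens_per_text
    return total_tokens, total_tokens / 1_000_000 * price_per_million


# Rates derived from the figures above (not official price lists).
tokens, mistral_eur = estimate_cost(1_000, 3, 500, 0.10)
_, openai_usd = estimate_cost(1_000, 3, 500, 0.02)
```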

Validating suggestions

Suggestions tab: a paginated table where each row shows the source, target, similarity score, proposed anchor and a context snippet, with Insert / Reject buttons.

Choosing the anchor

The anchor generator extracts the n-grams (2 to 6 words) of the target title that appear verbatim in the source body, and ranks them longest first. The dropdown lists all candidates; the Custom option opens a free-text field. The default anchor is the longest n-gram found, usually 3 or 4 words including the target’s main keywords.
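A Python sketch of the n-gram anchor search. It assumes whitespace tokenization of the title and case-insensitive, word-bounded matching against the cleaned source text; the real generator may normalize punctuation or accents differently. An empty result corresponds to the raw-title fallback described under Troubleshooting.

```python
import re


def anchor_candidates(target_title: str, source_text: str,
                      min_n: int = 2, max_n: int = 6) -> list:
    """N-grams of the target title found verbatim in the source, longest first."""
    words = target_title.split()
    source = re.sub(r"\s+", " ", source_text).lower()
    found = []
    for n in range(min(max_n, len(words)), min_n - 1, -1):  # longest n-grams first
        for i in range(len(words) - n + 1):
            gram = " ".join(words[i:i + n])
            # Word boundaries on both sides, so "red wine" is not found inside "tired wine".
            pattern = r"(?<!\w)" + re.escape(gram.lower()) + r"(?!\w)"
            if gram not in found and re.search(pattern, source):
                found.append(gram)
    return found
```

For the title “Navy Merino Sweater” and the source “A merino sweater for winter.”, the only candidate is “Merino Sweater”.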

Insertion

At insertion, the module links the first occurrence of the anchor that is not already inside an <a>, <code> or <pre> element (using PCRE (*SKIP)(*FAIL) patterns). If no free occurrence exists, a fallback paragraph with the class dfasl-related is appended at the end of the description. Each link receives a data-dfasl attribute holding a unique 36-character identifier.
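The insertion rule can be sketched in Python. Python’s standard `re` module has no `(*SKIP)(*FAIL)` verbs, so this sketch swaps in the equivalent alternation trick: protected spans (`<a>`, `<code>`, `<pre>` elements, plus any other tag so attribute text is never touched) are matched first and returned unchanged, and only the first unprotected anchor match is wrapped. Generating the 36-character identifier with `uuid4()` and the exact link and fallback markup are assumptions.

```python
import html
import re
import uuid

# Spans that must never receive a link: <a>, <code>, <pre> elements, plus any other
# tag on its own so that text inside attributes is never matched.
PROTECTED = r"<(a|code|pre)\b[^>]*>.*?</\1\s*>|<[^>]+>"


def insert_link(html_text: str, anchor: str, url: str) -> tuple:
    """Link the first unprotected occurrence of `anchor`; return (new_html, link_id)."""
    link_id = str(uuid.uuid4())  # 36 characters: 32 hex digits + 4 hyphens
    href = html.escape(url, quote=True)
    pattern = re.compile(
        PROTECTED + r"|(?<!\w)(" + re.escape(anchor) + r")(?!\w)",
        re.IGNORECASE | re.DOTALL,
    )
    linked = False

    def replace(m):
        nonlocal linked
        if m.group(2) is None or linked:  # protected span, or anchor already linked
            return m.group(0)
        linked = True
        # Keep the source's own casing of the anchor text.
        return f'<a href="{href}" data-dfasl="{link_id}">{m.group(2)}</a>'

    new_html = pattern.sub(replace, html_text)
    if not linked:  # no free occurrence: append the fallback paragraph
        new_html += (f'<p class="dfasl-related"><a href="{href}" '
                     f'data-dfasl="{link_id}">{html.escape(anchor)}</a></p>')
    return new_html, link_id
```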

Bulk actions

Tick several rows (the header checkbox selects them all), then click Insert Selection or Reject Selection. The table shows 50 rows per page.

Removing a link (rollback)

Inserted Links tab: a paginated list of active links showing source, target, anchor, date and employee. The Remove button deletes only the <a> tag whose data-dfasl attribute carries that link’s identifier: the anchor text stays intact, no other HTML element is touched, and the link is marked as removed in the database.
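Rollback can be sketched as unwrapping a single element. This Python version assumes the attribute is written exactly as `data-dfasl="<id>"` with double quotes; it only covers the HTML side (the database status update is not shown), and the module may use a different matching strategy.

```python
import re


def remove_link(html_text: str, link_id: str) -> str:
    """Unwrap the one <a> whose data-dfasl equals link_id, keeping its inner text."""
    pattern = re.compile(
        r'<a\b[^>]*\bdata-dfasl="' + re.escape(link_id) + r'"[^>]*>(.*?)</a\s*>',
        re.IGNORECASE | re.DOTALL,
    )
    # Replace the whole element by its content; every other tag is left untouched.
    return pattern.sub(r"\1", html_text, count=1)
```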

CLI worker and cron

For large catalogs, use the command-line worker:

php modules/dfaisemanticlinks/bin/analyze.php [options]

  • --shop=N: target a specific shop (multishop).
  • --enqueue-all: requeue all active entities before processing.
  • --loop: keep looping while pending items remain.
  • --max-batches=N: cap the number of batches per run (anti-runaway safety).
  • --sleep=N: pause in seconds between batches (API rate limits).

Recommended cron every 15 minutes:

*/15 * * * * php /path/to/prestashop/modules/dfaisemanticlinks/bin/analyze.php --loop --max-batches=50 --sleep=1

The worker automatically resets entries stuck in “Processing” for over 30 minutes (left behind by a crashed run), marks failed items with the API error message, and keeps processing the rest of the batch.
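The worker behavior described above (stuck-item reset, --loop, --max-batches, --sleep, per-item error capture) can be modeled with an in-memory queue. This is a Python sketch, not the module’s code: `QueueItem`, `run_worker` and the `embed_batch` callback (one API call per batch, returning an error message for each failed key) are hypothetical, and the assumption that a run without --loop processes a single batch is ours.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List

STUCK_AFTER = 30 * 60  # seconds in "processing" before an item counts as stuck


@dataclass
class QueueItem:
    key: str
    status: str = "pending"  # pending | processing | done | error
    started_at: float = 0.0
    error: str = ""


def reset_stuck(queue: List[QueueItem], now: float) -> int:
    """Return items left in 'processing' by a crashed run to 'pending'."""
    count = 0
    for item in queue:
        if item.status == "processing" and now - item.started_at > STUCK_AFTER:
            item.status = "pending"
            count += 1
    return count


def run_worker(queue: List[QueueItem],
               embed_batch: Callable[[List[str]], Dict[str, str]],
               batch_size: int = 20, loop: bool = True,
               max_batches: int = 50, sleep: float = 1.0) -> int:
    """Process pending items batch by batch; return the number of batches run."""
    reset_stuck(queue, time.time())
    batches = 0
    while batches < max_batches:  # --max-batches: anti-runaway cap
        batch = [item for item in queue if item.status == "pending"][:batch_size]
        if not batch:
            break  # nothing left to do
        for item in batch:
            item.status, item.started_at = "processing", time.time()
        errors = embed_batch([item.key for item in batch])  # one API call per batch
        for item in batch:
            if item.key in errors:  # record the API message, keep the rest of the batch
                item.status, item.error = "error", errors[item.key]
            else:
                item.status = "done"
        batches += 1
        if not loop:  # without --loop: a single batch (assumption)
            break
        time.sleep(sleep)  # --sleep: respect API rate limits
    return batches
```

With the recommended cron options (--loop --max-batches=50 --sleep=1), each run handles at most 50 batches, i.e. 1,000 items at the default batch size of 20.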

Auto-reindexing

The actionObjectProductUpdateAfter, actionObjectCategoryUpdateAfter and actionObjectCmsUpdateAfter hooks requeue the modified entity in all active languages. Deletion hooks purge the entity’s embeddings and suggestions. The SHA-256 content hash avoids any API call when the text itself hasn’t changed (e.g. a simple stock update).
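Change detection can be sketched with Python’s `hashlib`. Which text is hashed is an assumption here (the cleaned text that would be sent for embedding); the point is that a stock update leaves that text, and therefore its digest, unchanged, so no API call is made. The helper names are hypothetical.

```python
import hashlib
from typing import Optional


def content_hash(clean_text: str) -> str:
    """SHA-256 of the cleaned text, as a 64-character hex digest."""
    return hashlib.sha256(clean_text.encode("utf-8")).hexdigest()


def needs_embedding(clean_text: str, stored_hash: Optional[str]) -> bool:
    """Call the API only if the text differs from what was last embedded."""
    return stored_hash != content_hash(clean_text)
```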

Multi-shop and multilingual

Embeddings are scoped by the triplet (entity, language, shop). Suggestions never cross language or shop boundaries. Configuration (API key, threshold, indexed types) can differ per shop via PrestaShop’s standard multishop context selector.
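Scoping can be pictured as bucketing embeddings by (language, shop) before comparing them, so cross-language or cross-shop pairs are never even scored. This Python sketch uses PrestaShop-style id_lang / id_shop integers; the tuple layout and the `scoped_pairs` name are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def scoped_pairs(embeddings: List[Tuple[str, int, int]]) -> List[Tuple[str, str]]:
    """Candidate (source, target) pairs, formed only within one (language, shop) scope."""
    buckets: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for entity, id_lang, id_shop in embeddings:
        buckets[(id_lang, id_shop)].append(entity)
    pairs = []
    for entities in buckets.values():
        for src in entities:
            for dst in entities:
                if src != dst:  # never link a page to itself
                    pairs.append((src, dst))
    return pairs
```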

Troubleshooting

“API key not configured” or error on connection test

Check that the key matches the provider selected in the dropdown (a Mistral key won’t work with the OpenAI provider and vice versa) and that the account has credits left. Detailed API errors are logged in Advanced Parameters → Logs (PrestaShopLogger).

Items remain in “Processing” status

A worker was probably interrupted. Wait 30 minutes for the automatic reset, or click Clear Queue and then run Reindex All again.

Few or no suggestions generated

Three frequent causes: a similarity threshold that is too high (try 0.72), content that is too short (below the minimum length), or a catalog that is too homogeneous or too heterogeneous. Also check that the desired entity types are enabled in Settings.

The proposed anchor is the raw target title

That’s the fallback mode: no n-gram of the target title appears verbatim in the source body. Pick a custom anchor or enrich the source description.

Technical architecture

  • Strict PHP 8.1+, PSR-4 autoloading of the DataFirefly\AiSemanticLinks\ namespace from src/
  • Legacy ModuleAdminController admin controllers (stable PS8/PS9 compatibility)
  • Tiny in-house service container (independent from the Symfony container)
  • Vectors stored as little-endian packed float32 BLOBs, with a pre-computed L2 norm
  • 5 tables: dfasl_embedding, dfasl_queue, dfasl_suggestion, dfasl_inserted_link, dfasl_job
  • Unencrypted source code, ready to override
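The vector storage format can be reproduced with Python’s `struct` module: `<` selects little-endian and `f` a 4-byte float32, so a 1024-dimension Mistral vector takes 4,096 bytes and a 1536-dimension OpenAI vector 6,144 bytes. Computing the norm on the float32 values (rather than the original doubles) is an assumption of this sketch, and the function names are hypothetical.

```python
import math
import struct


def pack_vector(vec: list) -> tuple:
    """Little-endian float32 BLOB plus its L2 norm (computed on the stored float32 values)."""
    blob = struct.pack(f"<{len(vec)}f", *vec)
    values = struct.unpack(f"<{len(vec)}f", blob)
    return blob, math.sqrt(sum(x * x for x in values))


def unpack_vector(blob: bytes) -> list:
    """Inverse of pack_vector: 4 bytes per dimension."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))
```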