PrestaShop Artificial Intelligence

DataFirefly AI Semantic Internal Linking — Vector embeddings, cosine similarity and semi-automatic insertion with smart anchors for PrestaShop 8 & 9 (Mistral, OpenAI)

Semantic internal linking driven by AI embeddings for PrestaShop. Your content is vectorized (Mistral or OpenAI), compared by cosine similarity, and the module proposes the most relevant links with anchors extracted verbatim from the source text.

Most internal linking modules just do find-replace on keywords. The result is rigid, breaks existing links, and misses the majority of linking opportunities: all the ones where the exact keyword does not appear even though the meaning is very close. DataFirefly AI Semantic Internal Linking works differently: each product, category and CMS page is turned into an embedding vector (Mistral mistral-embed or OpenAI text-embedding-3-small, your choice). The module then compares vectors using cosine similarity and proposes the most semantically related content pairs, regardless of the vocabulary used. For each suggestion, an anchor generator extracts n-grams from the target title that appear verbatim in the source body, so the anchor is both SEO-optimized and natural in the text. Insertion is semi-automatic, with one-by-one or bulk validation and clean rollback via a unique marker. A persistent queue, a CLI worker for large catalogs and auto-reindexing on every change keep the index current. Multilingual, multi-shop, two switchable AI providers, no subscription.

PrestaShop 8.0 → 9.x · PHP 8.1+ · Mistral · OpenAI · Vector embeddings · Cosine similarity · Verbatim anchor extraction · Semi-automatic insertion · Rollback via unique marker · CLI worker · Auto-reindexing · Multilingual & multi-shop · No subscription
  • 30-day refund
  • 12 months updates
  • 24h support
www.datafirefly.com/en/
v1.0.0 · updated 2026-05-15
What it does

The short version.

01

Mistral or OpenAI vector embeddings

Each product, category and CMS page is turned into an embedding vector by the AI provider of your choice: Mistral mistral-embed (1024 dimensions, ~EUR 0.10 per million tokens) or OpenAI text-embedding-3-small (1536 dimensions, ~USD 0.02 per million tokens). Vectors are stored as packed float32 BLOBs with a pre-computed L2 norm, and cosine similarity is then computed in PHP, with no external dependency (no pgvector or Elasticsearch needed). Switch providers via a simple dropdown: the module detects the dimension change and prompts a one-click Reindex All.

02

Anchors extracted verbatim from source text

This is what sets the module apart. For each suggestion, the anchor generator extracts n-grams from the target title (2 to 6 words) that appear verbatim in the source body, and ranks them by length and relevance. The proposed anchor is therefore natural (it's already in your text, you're not forcing anything) and SEO-optimized (it contains the exact keywords of the target title). FR and EN stopwords built in. You stay in control: dropdown menu with all candidates, or free field to type a custom anchor.

03

Semi-automatic insertion with validation

Each suggestion enters the queue with Pending status. You validate or reject one at a time, or in bulk with multi-select and bulk action buttons. Insertion uses a regex that skips <a>, <code> and <pre> tags and already-linked HTML zones, so existing links are left untouched. Each inserted link receives a data-dfasl attribute carrying a UUID-like unique identifier, enabling surgical rollback: a link is removed from the Inserted Links tab by matching that identifier alone, without touching the rest of the HTML and without corrupting the description.

04

CLI worker + auto-reindexing via hooks

For catalogs over 1,000 entities, the back-office becomes slow. The module ships a CLI worker (bin/analyze.php) with --enqueue-all, --loop, --max-batches, --sleep options — suited for a cron running every 15 minutes to process the queue. On every product, category or CMS change, PrestaShop hooks place the affected entity in the queue in the background: the index stays current without intervention. The queue persists Pending, Processing, Done, Error statuses and automatically resets entries stuck for more than 30 minutes.

The long version

Everything you'd want to know before you install.

A detailed look at how DataFirefly AI Semantic Internal Linking works, why we built it the way we did, and the thinking behind the features above.

§ 01

Why semantic internal linking is superior to keyword-based linking

Classic internal linking modules work on keyword-to-URL rules. You enter "berber rug" and associate the URL of the berber-rug category. The engine then does a find-replace in the HTML of your articles, products or pages. This approach has two major limitations. It's rigid: it only triggers a link when the exact keyword appears, which excludes all pages where the topic is covered with different wording ("moroccan rug", "kilim", "traditional rug"). And it's blind to semantic context: the engine doesn't know whether the target page is actually relevant to the source content, it's just doing string matching. Semantic linking works differently: each piece of content (product, category, CMS page, blog post) is represented by a vector of 1024 or 1536 dimensions, depending on the provider, that encodes its meaning. Two contents are linked if they're close in this vector space, regardless of the words used. The module thus spots linking opportunities a keyword engine would never see, and avoids false positives where a keyword appears in an irrelevant context.

§ 02

AI embeddings: how it works in practice

On first indexing, the module iterates over all active entities of your shop for the enabled types (products, categories, CMS pages). For each entity, the textual content is extracted and cleaned: title, meta_title, meta_description, short description and long description, with HTML stripped. The cleaned text is then sent in batches to the configured AI provider (Mistral or OpenAI), which returns one embedding vector per item: a list of floats that represents the text's semantics. This vector is stored in the database as a little-endian packed float32 BLOB, with its L2 norm pre-computed to speed up later similarity calculations. Similarity between two contents is then computed in PHP as a normalized dot product (cosine similarity), an extremely fast operation once norms are pre-computed. On a catalog of 5,000 entities, computing all pairs in one language takes only a few seconds.
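The storage and comparison steps above can be sketched as follows. This is a minimal illustration in Python, not the module's PHP code: the function names are hypothetical, and the three-dimensional vectors stand in for real 1024- or 1536-dimension embeddings.

```python
import math
import struct


def pack_vector(values):
    """Pack floats as little-endian float32 bytes and return (blob, l2_norm)."""
    blob = struct.pack(f"<{len(values)}f", *values)
    # Compute the norm from the stored float32 values so it matches the blob.
    stored = struct.unpack(f"<{len(values)}f", blob)
    norm = math.sqrt(sum(x * x for x in stored))
    return blob, norm


def unpack_vector(blob):
    """Inverse of pack_vector: 4 bytes per float32 component."""
    return struct.unpack(f"<{len(blob) // 4}f", blob)


def cosine_similarity(blob_a, norm_a, blob_b, norm_b):
    """Dot product divided by the product of the pre-computed norms."""
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    a, b = unpack_vector(blob_a), unpack_vector(blob_b)
    if len(a) != len(b):
        raise ValueError("dimension mismatch: vectors come from different models")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


# Toy vectors (assumed values, for illustration only).
rug = pack_vector([0.5, 0.5, 0.0])
kilim = pack_vector([0.5, 0.25, 0.0])
lamp = pack_vector([0.0, 0.0, 1.0])
print(cosine_similarity(*rug, *kilim))  # close to 1: similar content
print(cosine_similarity(*rug, *lamp))   # 0: unrelated content
```

Because the norm is stored next to the blob, comparing two entities costs one dot product and one division, which is why no vector database is needed at this scale.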

§ 03

Why two providers rather than just one

Each provider has its sweet spot. Mistral mistral-embed is the recommended default: 1024 dimensions, very low latency, Europe hosting (EU sovereignty for sensitive shops), cost around 10 euro cents per million tokens, so well under one euro to index a multilingual catalog of several thousand entities. OpenAI text-embedding-3-small is the alternative: 1536 dimensions (richer vector space), excellent on non-European languages, cost around 2 US cents per million tokens (roughly five times cheaper than Mistral, exchange rate aside). The module unifies both providers behind a common interface: same return format, same batch mechanism, same error handling with PrestaShopLogger. You can switch from one provider to the other from the configuration dropdown: the module will detect that dimensions changed and prompt a reindex (one click on Reindex All).
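To put the pricing in perspective, here is a rough cost estimate as a Python sketch. The per-million-token prices are the ones quoted above. The catalog size, language count and average tokens per entity are assumptions chosen for illustration, so measure your own content before relying on the result.

```python
# Prices per million tokens, as quoted in the text above.
PRICE_PER_MILLION_TOKENS = {
    "mistral-embed": ("EUR", 0.10),
    "text-embedding-3-small": ("USD", 0.02),
}


def indexing_cost(model, entities, languages, avg_tokens_per_entity):
    """Return (currency, estimated cost) for one full indexing pass."""
    currency, price = PRICE_PER_MILLION_TOKENS[model]
    total_tokens = entities * languages * avg_tokens_per_entity
    return currency, total_tokens / 1_000_000 * price


# Assumed catalog: 3,000 entities, 3 languages, about 400 tokens each.
for model in PRICE_PER_MILLION_TOKENS:
    currency, cost = indexing_cost(model, 3000, 3, 400)
    print(f"{model}: about {cost:.2f} {currency}")
```

Under these assumptions a full pass is 3.6 million tokens, which stays well under one euro with either provider.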

§ 04

The anchor generator, the real centerpiece of the module

This is what sets the module apart from a raw suggestion module. For each (source, target) pair above the similarity threshold, the anchor generator runs the following algorithm: it extracts the target's title, splits it into n-grams from 2 to 6 words, removes stopwords (French and English), then looks for each of these n-grams verbatim in the source body. Found n-grams are ranked by decreasing length (longer ones are more discriminative and more SEO-optimized) and presented in the back-office dropdown. The default selected anchor is the longest one found, typically a 3 or 4-word anchor including the main keywords of the target title. If no n-gram of the target title appears in the source, the module offers the raw target title (fallback mode). You always keep control: editable dropdown, Custom option to type any anchor text. At insertion time, the module picks the first occurrence of the anchor text in the source body that's not already inside an <a>, <code> or <pre> tag, so it never breaks an existing link or re-links already-linked text.
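The algorithm described above can be sketched in a few lines of Python. This is an illustrative approximation, not the module's PHP implementation. The stopword set is a tiny sample, "verbatim" is approximated by a case-insensitive match that ignores punctuation, and the rule that an anchor may not start or end with a stopword is an assumption about how stopwords are applied.

```python
import re

# Tiny illustrative stopword sample (assumption); the module ships its own FR/EN lists.
STOPWORDS = {"the", "a", "an", "of", "for", "and", "in",
             "le", "la", "les", "de", "des", "et", "pour"}


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def candidate_anchors(target_title, source_body, min_words=2, max_words=6):
    """Return title n-grams found in the source body, longest first."""
    words = tokenize(target_title)
    body = " ".join(tokenize(source_body))
    found = []
    for size in range(min(max_words, len(words)), min_words - 1, -1):
        for start in range(len(words) - size + 1):
            gram = words[start:start + size]
            # Assumption: skip n-grams that begin or end with a stopword.
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            phrase = " ".join(gram)
            if phrase not in found and re.search(rf"\b{re.escape(phrase)}\b", body):
                found.append(phrase)
    # Fallback mode: offer the raw target title when nothing matches.
    return found or [target_title]


title = "Handwoven Berber Rug in Natural Wool"
body = "Our handwoven Berber rug pairs well with a kilim. Natural wool ages nicely."
print(candidate_anchors(title, body))
```

The first element of the returned list plays the role of the default anchor, and the rest would populate the dropdown.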

§ 05

Surgical rollback via unique marker

This is the feature that reassures every merchant who's cautious with their descriptions. Each inserted link receives an HTML data-dfasl attribute carrying a unique 36-character identifier generated randomly at insertion (UUID-like format). The identifier is also stored in the database in the dfasl_inserted_link table, along with the source entity, target, anchor, insertion date and the ID of the employee who validated. To remove a link, you go to the Inserted Links tab and click Remove next to the row in question: the module runs a regex matching exactly the <a> tag whose data-dfasl attribute carries this unique identifier, removes the tag while preserving the anchor text intact, and marks the link as removed in the database. No other tag in the description is touched, no manual link is at risk. On 500 links inserted by the module across 200 product pages, you can remove a single one in one click without touching any of the other 499.
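A minimal Python sketch of marker-based insertion and rollback is shown below. The regular expressions and function names are assumptions made for illustration, and the module's actual PHP patterns may differ. The sketch links the first occurrence of the anchor that lies outside any tag and outside existing <a>, <code> and <pre> zones, then removes exactly that link by its identifier.

```python
import html
import re
import uuid

# Zones that must not be touched: <a>/<code>/<pre> elements, then any other tag.
PROTECTED = re.compile(r"<(a|code|pre)\b[^>]*>.*?</\1\s*>|<[^>]+>",
                       re.IGNORECASE | re.DOTALL)


def insert_link(body, anchor, url):
    """Link the first safe occurrence of `anchor`; return (new_body, marker)."""
    marker = str(uuid.uuid4())  # 36-character identifier
    link = f'<a href="{html.escape(url, quote=True)}" data-dfasl="{marker}">{anchor}</a>'
    pos = 0
    for zone in PROTECTED.finditer(body):
        idx = body.find(anchor, pos, zone.start())
        if idx != -1:
            return body[:idx] + link + body[idx + len(anchor):], marker
        pos = zone.end()
    idx = body.find(anchor, pos)
    if idx == -1:
        return body, None  # no safe occurrence: leave the HTML untouched
    return body[:idx] + link + body[idx + len(anchor):], marker


def remove_link(body, marker):
    """Unwrap the single <a> carrying this marker, keeping its anchor text."""
    pattern = re.compile(
        r'<a\b[^>]*\bdata-dfasl="' + re.escape(marker) + r'"[^>]*>(.*?)</a>',
        re.IGNORECASE | re.DOTALL,
    )
    return pattern.sub(lambda m: m.group(1), body, count=1)


original = '<p>See our <a href="/rugs">berber rug</a> guide. A berber rug lasts decades.</p>'
linked, marker = insert_link(original, "berber rug", "/berber-rug")
print(linked)
print(remove_link(linked, marker) == original)
```

Since the removal pattern is keyed on one identifier, manual links and other module links in the same description are never candidates for the match.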

§ 06

CLI worker and processing strategy for large catalogs

On a catalog of a few dozen products, everything can be done from the back office: Reindex All then Process Batch is enough. Beyond a few thousand entities, the interface becomes slow and the user doesn't want to keep their browser open for hours. The module exposes a CLI worker (bin/analyze.php) that runs from the PHP command line with five options:
  • --shop targets a specific shop in a multishop environment.
  • --enqueue-all requeues all active entities before processing, useful for a full reindex after a provider or model change.
  • --loop keeps looping while items remain to process.
  • --max-batches limits the number of batches processed in a single run (anti-runaway safety).
  • --sleep inserts a pause between batches (useful to stay under API rate limits).

The typical command for a cron every 15 minutes is: php modules/dfaisemanticlinks/bin/analyze.php --loop --max-batches=50 --sleep=1. The worker automatically resets entries stuck in Processing status for more than 30 minutes (the case where a previous worker crashed), handles API errors by marking affected items as Error with the message, and continues processing healthy items in the batch.
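The worker behaviour described above (batch limit, pause, stuck-entry reset, per-item error isolation) can be sketched with an in-memory queue. This Python sketch is an assumption-laden illustration: the real worker is a PHP script backed by a database table, and names such as QueueItem, run_worker and batch_size are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Optional

STUCK_AFTER_SECONDS = 30 * 60  # 30-minute threshold from the text


@dataclass
class QueueItem:
    entity_id: int
    status: str = "pending"  # pending | processing | done | error
    started_at: Optional[float] = None
    error: Optional[str] = None


def reset_stuck(queue, now):
    """Put items left in 'processing' for over 30 minutes back to 'pending'."""
    for item in queue:
        if item.status == "processing" and now - item.started_at > STUCK_AFTER_SECONDS:
            item.status, item.started_at = "pending", None


def run_worker(queue, process, batch_size=20, loop=False, max_batches=50, sleep=0.0):
    """Process pending items in batches; a failing item never aborts its batch."""
    batches = 0
    while batches < max_batches:
        reset_stuck(queue, time.time())
        batch = [item for item in queue if item.status == "pending"][:batch_size]
        if not batch:
            break
        for item in batch:
            item.status, item.started_at = "processing", time.time()
            try:
                process(item.entity_id)
                item.status = "done"
            except Exception as exc:  # record the message and keep going
                item.status, item.error = "error", str(exc)
        batches += 1
        if not loop:
            break
        time.sleep(sleep)
    return batches
```

Calling run_worker(queue, process, loop=True, max_batches=50, sleep=1) mirrors the cron command quoted above: the run stops either when the queue is empty or when the batch limit is reached, whichever comes first.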

§ 07

Auto-reindexing and index freshness

An index that drifts away from the catalog loses its value. The module handles freshness via native PrestaShop hooks. On every change of a product, category or CMS page (hooks actionObjectProductUpdateAfter, actionObjectCategoryUpdateAfter, actionObjectCmsUpdateAfter), the affected entity is placed back in the queue with Pending status, in all active languages, and the next worker run processes it automatically. On deletion (hooks actionObjectProductDeleteAfter, actionObjectCategoryDeleteAfter, actionObjectCmsDeleteAfter), the associated embeddings and suggestions are purged in cascade. The module also stores a content hash (SHA-256 of the cleaned text): if an entity is requeued without its actual content having changed (e.g. because an employee only touched the stock), the indexing batch detects the unchanged hash and skips the API call, which saves tokens. Auto-reindexing can be toggled from Settings (DFASL_AUTO_INDEX option), which is useful for pausing it during a massive CSV import and resuming with a Reindex All when the import is done.
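The hash check can be illustrated with a short Python sketch. The cleaning function is a simplified stand-in for the module's own text extraction, and reindex, stored_hashes and embed are hypothetical names. Only the use of SHA-256 over the cleaned text comes from the description above.

```python
import hashlib
import html
import re


def clean_text(raw_html):
    """Strip tags, decode entities and collapse whitespace (simplified cleaner)."""
    text = re.sub(r"<[^>]+>", " ", raw_html)
    return re.sub(r"\s+", " ", html.unescape(text)).strip()


def content_hash(raw_html):
    """SHA-256 hex digest of the cleaned text."""
    return hashlib.sha256(clean_text(raw_html).encode("utf-8")).hexdigest()


def reindex(entity_key, raw_html, stored_hashes, embed):
    """Call `embed` only when the cleaned text changed; return True if called."""
    digest = content_hash(raw_html)
    if stored_hashes.get(entity_key) == digest:
        return False  # unchanged content: no API call, no tokens spent
    embed(clean_text(raw_html))
    stored_hashes[entity_key] = digest
    return True


hashes, api_calls = {}, []
key = ("product", 42, "fr", 1)  # hypothetical (type, id, language, shop) key
print(reindex(key, "<p>Berber  rug</p>", hashes, api_calls.append))       # first pass
print(reindex(key, "<p>Berber rug</p>", hashes, api_calls.append))        # markup-only change
print(reindex(key, "<p>Berber rug, wool</p>", hashes, api_calls.append))  # real change
```

Hashing the cleaned text rather than the raw HTML means that markup-only edits, like the second call here, do not trigger a new embedding request.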

§ 08

Native multi-shop and multilingual

The module is natively multi-shop and multilingual. Embeddings are scoped by the triplet (entity, language, shop) — the same product in two shops will have two independent embeddings if descriptions differ, and the same product in French and English will have two different embeddings even if the product sheet is the same. Suggestions never cross language boundaries: a French product will never get a link suggested to an English product (which would make no SEO sense). Shop boundaries are respected the same way. Configuration can differ per shop (API key, similarity threshold, indexed types, display hook) — useful when you have a B2B shop with technical content and a B2C shop with general-public content in the same infrastructure.
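The scoping rule can be sketched as a grouping step that runs before any similarity is computed. This Python illustration assumes embeddings are keyed by (entity, language, shop) tuples as described above. The function name and the sample keys are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(embedding_keys):
    """Yield (entity_a, entity_b, language, shop) for entities sharing a scope."""
    by_scope = defaultdict(list)
    for entity, language, shop in embedding_keys:
        by_scope[(language, shop)].append(entity)
    for (language, shop), entities in by_scope.items():
        for a, b in combinations(entities, 2):
            yield a, b, language, shop


keys = [
    ("product:1", "fr", 1),
    ("product:2", "fr", 1),
    ("product:1", "en", 1),
    ("cms:3", "en", 1),
    ("product:1", "fr", 2),
]
for pair in candidate_pairs(keys):
    print(pair)
```

Only pairs within the same language and shop are produced, so a French product is never compared with an English page or with content from another shop.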

§ 09

Typical use cases

  • Multilingual fashion shop with a catalog of 2,000 products: semantic linking spots pairs of visually or stylistically close products (e.g. two variations of a dress cut) that keyword rules would systematically miss.
  • B2B technical shop with dense descriptions: semantic linking connects products that share the same industrial use case without identical vocabulary.
  • E-commerce blog: each article can automatically reference the most semantically relevant products, categories and other articles, with anchors extracted verbatim from the article text (the opposite of mechanical find-replace).
  • Catalog overhaul: after a massive import or a reorganization, a Reindex All rebuilds the linking in a few minutes where a manual strategy would have taken weeks of editorial work.
  • Catalog with strong specialized vocabulary (organic cosmetics, medical equipment, technical products): the module detects semantic proximities that non-experts wouldn't see, letting the SEO team discover non-obvious linking opportunities.

§ 10

Internal architecture and PrestaShop 8 and 9 compatibility

The module is built in PHP 8.1+ with strict types, readonly properties and modern features (match, enums). Autoload is PSR-4 under namespace DataFirefly\AiSemanticLinks\ mapped to src/. Admin controllers use legacy ModuleAdminController (not Symfony Grid), a deliberate choice to guarantee stable compatibility between PrestaShop 8.0 and 9.x without having to maintain two code variants. A tiny in-house service container (ServiceContainer) wires repositories and business services, which isolates the module from Symfony container differences between PrestaShop versions and avoids a dependency that would break on every major update. Five SQL tables prefixed dfasl_: embedding (vectors and hashes), queue (work queue), suggestion (proposed pairs), inserted_link (active links), job (bulk operations). Uninstall cleanly drops the five tables and purges all DFASL_* configuration variables. Source code is delivered unencrypted and PSR-compliant, so you can override, audit, or extend it as you wish.