Hybrid Search for Magnolia DXP: When Search Understands Context
The Next Evolution of On-Site Search: How dev5310 combines on-prem search and LLM vectors to eliminate zero-result pages and radically improve the user experience.
Why Your Search Needs to Get Smarter
In a digital world where Google and ChatGPT shape user expectations, a simple keyword search on your corporate website is no longer sufficient. Users don’t just search for terms; they ask questions, describe problems, and expect solutions—often without knowing the exact technical vocabulary.
Classic search engines fail exactly at this point. A typo, a synonym that hasn't been manually maintained, or a complex query often leads to the dreaded result: "No results found." This is where dev5310 Hybrid Search comes in.
We have developed a search solution deeply integrated into Magnolia DXP that combines the best of two worlds: the speed and precision of proven on-prem search technology and the contextual understanding of modern artificial intelligence (LLM vectors).
What is Hybrid Search? The Technological Symbiosis
Hybrid Search is not just a buzzword. It describes the architecture of our engine, which uses two different indexing and search methods in parallel and intelligently fuses their results. This is the core statement and the heart of our solution: We don't just understand the words; we understand the meaning.
1. The On-prem Search Index (Lexical Search)
This is the foundation. OpenSearch (a fork of Elasticsearch) is the industry standard for high-performance full-text search.
- How it works: It searches for exact matches of words or parts of words.
- Strength: Extremely fast, perfect for exact keyword matches (e.g., article numbers, specific product names) and efficient filtering.
- Limit: It does not understand context. "Bank" can mean a river bank or a financial institution; to a purely lexical search, it is the same word.
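To make the lexical side concrete, here is a minimal sketch of an OpenSearch query body for this kind of keyword search. The index layout, field names ("title", "body"), and weighting are hypothetical, not taken from the dev5310 implementation:

```python
def lexical_query(term):
    """Build a hypothetical OpenSearch multi_match query for a keyword search.

    "fuzziness": "AUTO" lets the engine tolerate small typos, and the
    illustrative "title^2" weight prefers matches in the page title.
    """
    return {
        "query": {
            "multi_match": {
                "query": term,
                "fields": ["title^2", "body"],
                "fuzziness": "AUTO",
            }
        }
    }

q = lexical_query("checking acount")  # note the typo in the user input
print(q["query"]["multi_match"]["fuzziness"])  # AUTO
```

Fuzziness catches spelling slips, but it cannot bridge synonyms or intent; that is exactly where the vector index takes over.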
2. The Vector-Based Index (Semantic Search)
This is where the innovation comes into play. We use a connection to Large Language Models (LLMs) to transform content into mathematical vectors.
- How it works: Texts are translated into multidimensional vector spaces. Terms that are thematically similar are located close to each other in this space.
- Strength: The search understands context and intent.
- The Result: If a user types "How do I open an account?", the vector search finds the page for "Checking Account Application," even if the word "open" does not appear there. Semantic proximity is key.
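The notion of "semantic proximity" can be illustrated with cosine similarity, the standard measure of how close two embedding vectors are. The tiny 3-dimensional vectors below are toy values for illustration only; real embedding models produce hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (invented values, not real model output).
open_account  = [0.9, 0.1, 0.2]   # "How do I open an account?"
checking_app  = [0.8, 0.2, 0.3]   # "Checking Account Application"
press_release = [0.1, 0.9, 0.1]   # unrelated content

assert cosine_similarity(open_account, checking_app) > \
       cosine_similarity(open_account, press_release)
```

Because the query and the application page point in a similar direction in vector space, the page is found even though the word "open" never appears in it.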
The Synthesis: Why 1 + 1 = 3
The dev5310 Hybrid Search merges these two result streams. We use the precision of keywords and fill the gaps with the intelligence of vectors. The result is highly relevant search results, even with vague queries. No more empty result pages. Only direct connections to the question, the context, and the meaning.
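The exact scoring formula dev5310 uses to fuse the two result streams is not spelled out here, but a common technique for this kind of merge is Reciprocal Rank Fusion (RRF), sketched below with invented document IDs:

```python
def rrf_merge(lexical_ranking, semantic_ranking, k=60):
    """Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per hit.

    Documents that rank well in BOTH streams accumulate the highest score,
    which is the "1 + 1 = 3" effect of hybrid search.
    """
    scores = {}
    for ranking in (lexical_ranking, semantic_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical  = ["giro-account", "fees", "branches"]        # exact keyword hits
semantic = ["open-account-faq", "giro-account", "fees"]  # contextual hits
merged = rrf_merge(lexical, semantic)
print(merged[0])  # giro-account
```

"giro-account" wins because it scores in both streams; pages found by only one method still appear, just further down. The point stands even with vague queries: the merged list is never empty if either stream returns something.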
Feature Deep-Dive: Full Control in Magnolia DXP
Technology is only as good as its usability. That’s why we didn't design Hybrid Search as an external black box, but as a native tool integrated directly into Magnolia DXP. Content Managers and Editors retain full control over search behavior.
1. Blacklist Management: Brand Safety & Internal Data
Not everything that can be indexed should be found. There are sensitive terms, internal project names, or words that should not be associated with the brand in a specific context.
- Function: Managers can independently maintain a list of keywords explicitly excluded from the search (Stopwords/Blacklist).
- Flexibility: Data entry is designed for maximum usability.
- Multifield: Add individual terms one by one.
- Textfield (CSV): Copy & paste long, comma-separated lists from Excel or other sources.
- Benefit: This proactively prevents reputational damage and ensures that internal jargon does not appear on the public results page.
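The two entry modes described above boil down to one normalized term set at query time. A minimal sketch of that normalization and filtering step, with invented term lists (the real Magnolia dialog fields are not shown here):

```python
def parse_blacklist(multifield_terms, csv_text=""):
    """Merge single terms and a pasted comma-separated list into one set.

    Terms are trimmed and lower-cased so "Codename ", "codename" and
    "CODENAME" all collapse to the same blocked entry.
    """
    terms = {t.strip().lower() for t in multifield_terms if t.strip()}
    terms |= {t.strip().lower() for t in csv_text.split(",") if t.strip()}
    return terms

def is_blocked(query, blacklist):
    """True if any word of the query hits a blacklisted term."""
    return any(word.lower() in blacklist for word in query.split())

blacklist = parse_blacklist(["project-x"], "internal, draft , Codename")
assert is_blocked("codename launch plan", blacklist)
assert not is_blocked("open an account", blacklist)
```

Because blocking happens before the query ever reaches the index, blacklisted terms can never surface on the public results page.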
2. Terminology and Synonyms: The Bridge to Corporate Language
Although the LLM understands semantic relationships, there is specific corporate language or product names that must be explicitly controlled.
- The Scenario: A customer searches for "Bank account," but your product is called "Giro account." Or they search for "Cell phone," but you sell "Smartphones."
- The Solution: Editors can define synonym pairs. The system learns: If A is searched, also search for B.
- The Fallback Mechanism: This function is essential for reliability. Should the LLM (e.g., OpenAI or Azure OpenAI) be temporarily unreachable or fail during an update, the search reverts to these maintained synonyms. This keeps result accuracy high even in AI "offline mode."
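The "if A is searched, also search for B" rule can be sketched as a simple query expansion. The synonym pairs below are the examples from the scenario above; the data structure itself is illustrative, not the actual dev5310 storage format:

```python
# Editor-maintained synonym pairs (illustrative sample data).
SYNONYMS = {
    "bank account": ["giro account"],
    "cell phone": ["smartphone"],
}

def expand_with_synonyms(query):
    """Return the original query plus every synonym it triggers.

    In the fallback case (LLM unreachable), this expanded term list is
    what keeps the plain keyword search semantically useful.
    """
    terms = [query]
    for key, alternatives in SYNONYMS.items():
        if key in query.lower():
            terms.extend(alternatives)
    return terms

print(expand_with_synonyms("Bank account opening"))
# ['Bank account opening', 'giro account']
```

In normal operation the expanded terms feed both search streams; in AI offline mode they feed the keyword search alone, which is why maintained synonyms act as the safety net.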
3. Boosting and Weighting: Steering Traffic Purposefully
Search is not just finding; search is also marketing. With our boosting functions, Search Managers can actively steer traffic.
- Prioritization: You can determine that when the search term "Sustainability" is entered, the latest CSR report always appears at position #1, ahead of older blog articles that might be mathematically more relevant.
- Flexible Targeting:
- Internal: Select a target page conveniently via the Magnolia Page Chooser.
- External: Enter an external URL. This is ideal if you want to promote campaign pages on microsites or partner sites that are not in the same CMS tree.
- Strategic Value: This transforms search from a passive function into an active conversion channel.
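Conceptually, boosting is a pinning step applied on top of the organic ranking. A minimal sketch, with hypothetical rule data (the real configuration lives in Magnolia dialogs, not in code):

```python
# Hypothetical editor-maintained pinning rules: search term -> targets.
PINNED = {
    "sustainability": [
        {"target": "/reports/csr-2024", "type": "internal"},   # Page Chooser
        {"target": "https://campaign.example.com", "type": "external"},
    ]
}

def apply_boosting(query, organic_results):
    """Place pinned targets first, then the organic hits, deduplicated."""
    pinned = [rule["target"] for rule in PINNED.get(query.lower(), [])]
    return pinned + [hit for hit in organic_results if hit not in pinned]

results = apply_boosting("Sustainability", ["/blog/old-csr", "/reports/csr-2024"])
print(results[0])  # /reports/csr-2024
```

Note that the pinned CSR report jumps ahead of the blog article even if the blog article's relevance score were mathematically higher; that is the editorial override the feature provides.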
4. Magnolia Index Management: Automation for Ops
For Content Operations (Ops) teams, keeping the index current is crucial. No one wants to wait for a nightly cron job after publishing a press release.
We have created a new command and hooked it directly into the Publishing/Unpublishing chain of Magnolia.
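The effect of that command can be sketched as follows. The real implementation is a Magnolia command written in Java and wired into the publishing chain; the Python below is only a conceptual model with invented names:

```python
class SearchIndex:
    """Stand-in for the search index; the real one is OpenSearch."""

    def __init__(self):
        self.docs = {}

    def upsert(self, path, content):
        self.docs[path] = content      # (re)index the page immediately

    def delete(self, path):
        self.docs.pop(path, None)      # drop the page from the index

# Hooks fired by the publishing chain (conceptual, names hypothetical).
def on_publish(index, path, content):
    index.upsert(path, content)

def on_unpublish(index, path):
    index.delete(path)

idx = SearchIndex()
on_publish(idx, "/press/new-release", "We proudly announce ...")
assert "/press/new-release" in idx.docs
on_unpublish(idx, "/press/new-release")
assert "/press/new-release" not in idx.docs
```

The key property is that indexing is event-driven rather than schedule-driven: the moment an editor hits publish or unpublish, the index reflects it.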
Architecture & Integration
The technical implementation by dev5310 follows the principles of stability and scalability.
- Integration: The search is seamlessly integrated into Magnolia DXP. There are no noticeable disruptions for the editor.
- Request Proxy: Magnolia acts as an intelligent intermediary. The frontend sends search requests to Magnolia, which validates them, potentially enriches them (synonyms, blacklists), and then collects the results from the search backend.
- Data Sovereignty: Since control lies within the CMS, you have the assurance that your business logic (Who can see what? What gets boosted?) is always adhered to.
Our Experts: The Minds Behind Hybrid Search
At dev5310, we believe technology is only as good as the people who implement it. Our team combines strategic foresight with hard technical excellence.
Lennart Kapanke – Software Engineer (B.Sc. Computer Science)
Focus: Backend development, vector logic, and performance.
Lennart brings academic precision and a deep understanding of algorithms. With his background in Computer Science (B.Sc.), he ensures that the code doesn't just work, but is performant, secure, and maintainable. He developed the logic that merges on-prem search results and vector results through a custom scoring calculation, in milliseconds.
AI is powerful, but without solid software architecture, it is unpredictable. We built Hybrid Search to be robust: If the LLM 'hallucinates' or fails, our logic catches it. The code must be 'clean' so the results can be 'smart'. That is my standard for every line we write.
-- Lennart Kapanke, Software Engineer, dev5310
Martin Schmid, Senior Consultant & Integration Lead
Focus: Interface between business requirements and technical architecture.
Martin is the one who understands your requirements before you’ve even spoken them. As an experienced consultant, he knows that a search engine must not only function technically but also serve business goals. He handles the seamless integration of Hybrid Search into your existing Magnolia landscape and trains your Content Managers.
The best search is the one the user doesn't perceive as technology, but as a helpful assistant. My goal during integration is to ensure editors can intuitively control what the customer finds. With Hybrid Search, we finally close the gap between what the customer 'says' (types) and what they really 'mean'.
-- Martin Schmid, Technology Consultant
Use Cases: Hybrid Search in Practice
To illustrate the potential of dev5310 Hybrid Search, let's look at some concrete application scenarios.
FAQ: Frequently Asked Questions
Here we answer the most important questions regarding integration and usage.
Q: Do I need an external AI provider for vector search?
A: Hybrid Search uses LLM models (such as OpenAI or Azure OpenAI) for vectorization. We configure the interface for you. However, data sovereignty and control remain within your Magnolia system.
Q: Does AI slow down the search?
A: No. Vectorization happens during indexing (when the page is published). The actual search query runs against the prepared index and is blazing fast. The dev5310 Request Proxy is optimized for performance.
Q: What happens if the LLM fails?
A: We have built-in safety nets. If the vector index is temporarily unavailable, the system automatically falls back to the robust on-prem keyword search and the maintained synonyms. Your search will always function.
Q: Can I use the search for multiple languages?
A: Absolutely. Modern embedding models are often multilingual and handle cross-language matching well. A user can even search in English and find German content if the context matches. Magnolia controls the language-specific delivery.
Q: How complex is the integration into my existing Magnolia?
A: Since Martin specializes exactly in Magnolia as a consultant, the process is very efficient. We use existing Magnolia mechanisms (Observation, Commands) so we don't have to bend the core. Often, the basic integration is completed in just a few days.
Conclusion: Ready for the Search of the Future?
The dev5310 Hybrid Search is more than just a search bar. It is an intelligent tool that supports your content strategy, understands your users, and increases your conversion rates.
Stop relying on your customers accidentally typing the right word. Give them the technology that understands them.
Your Next Step
Want to see how Hybrid Search works with your own data?
Contact Martin and Lennart today for a no-obligation demo and an analysis of your current search index.