I spent a bunch of time playing with Apache Solr trying to get multiword synonyms to work. I have a very specific use case: the field I wanted to create synonyms for was a collection of tags applied to the document, and those tags could be multiword. I don’t believe this solution would work for an open text field, but for replacing one tag with another it appears to work perfectly. I had to play with the configuration quite a bit to get the replacement working for phrases of more than one word, so I’ve documented my solution below. Your mileage may vary.
I added the filter on the query side of the field definition. If you want to use the filter at index time, you’ll have to use a slightly different syntax in the synonym file. One thing to mention is the tokenizerFactory attribute. You’ll need to set this to “solr.KeywordTokenizerFactory” so that multiword entries in the synonym file are not split into separate words before the synonym replacement is done. The KeywordTokenizerFactory is weirdly named, as it doesn’t actually split anything: it treats the whole value as a single token.
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false" tokenizerFactory="solr.KeywordTokenizerFactory"/>
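For context, here is a rough sketch of how that filter might sit inside a complete field type. The field type name "tag_synonyms", the field name "tags", and the choice of KeywordTokenizerFactory plus a lowercase filter for the field itself are my assumptions for illustration, not something taken from a working schema, so adapt them to your own setup.

```xml
<!-- Hypothetical field type for a multiword tag field; the names are placeholders. -->
<fieldType name="tag_synonyms" class="solr.TextField" positionIncrementGap="100">
  <!-- Index side: keep each tag as a single token and lowercase it. -->
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Query side: same tokenization, then the synonym replacement from the post. -->
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false" tokenizerFactory="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical multivalued field using the type above, one value per tag. -->
<field name="tags" type="tag_synonyms" indexed="true" stored="true" multiValued="true"/>
```

Depending on the query parser you use, a multiword tag may need to be quoted in the query so that it reaches the analyzer as one value.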
For the synonyms.txt file:
# List of synonyms need to be defined as below if the filter is defined at query time
# Searching for "Replaced Term" will yield documents that contain "Test" or "Two Words". "Test" and "Two Words" will continue to work as well.
Test, Two Words => Replaced Term
Test => Replaced Term Two
# ....etc....
# List of synonyms need to be defined as below if the filter is defined at index time
# This is so that "Original Term" continues to work as a term
Original Term => Original Term, Test, Two Words
Hope this helps someone!