Working with Apache SOLR SynonymFilterFactory filter

I spent a bunch of time playing with Apache SOLR trying to get multiword synonyms to work. I have a very specific use case: the field I wanted to create synonyms for is a collection of tags applied to the document, and a tag can be more than one word. I don’t believe this solution would work for an open text field, but for replacing one tag with another it appears to work perfectly. I had to play with the configuration quite a bit to get the replacement working for phrases of more than one word, so I’ve documented my solution below. Your mileage may vary.

I added the filter on the query side of the field definition. If you want to use the filter at index time, you’ll have to use a slightly different syntax in the synonym file. One thing to mention is the tokenizerFactory attribute. You’ll need to set this to “solr.KeywordTokenizerFactory” to stop the filter from splitting each synonym entry into separate words before doing the replacement. The KeywordTokenizerFactory is weirdly named, as it doesn’t actually split anything: it treats the entire value as a single token.

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"
                tokenizerFactory="solr.KeywordTokenizerFactory"/>
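
For context, here is a minimal sketch of how the filter might sit inside a field type in schema.xml. The field type name tag_synonyms and the field name tags are hypothetical placeholders, and the sketch assumes each tag is indexed as a single whole value with the keyword tokenizer; adjust both to match your own schema.

```xml
<!-- Hypothetical field type: names are placeholders, not taken from a real schema -->
<fieldType name="tag_synonyms" class="solr.TextField">
  <!-- Index side: keep each tag as one token, no synonym handling -->
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <!-- Query side: keep the query value as one token, then apply the synonym mapping -->
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical multi-valued tag field using the type above -->
<field name="tags" type="tag_synonyms" indexed="true" stored="true" multiValued="true"/>
```

With a definition like this, a multiword tag most likely has to be quoted in the query (for example tags:"Two Words") so that the query parser hands the whole phrase to the analyzer instead of splitting it on whitespace first.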

For the synonyms.txt file:

# Synonyms need to be defined as below if the filter is applied at query time
# A query for "Test" or "Two Words" is rewritten to "Replaced Term", so it matches documents tagged "Replaced Term".  A query for "Replaced Term" itself is left unchanged.
Test, Two Words => Replaced Term
Test => Replaced Term Two
# ....etc....

# Synonyms need to be defined as below if the filter is applied at index time
# This is so that "Original Term" continues to work as a term
Original Term => Original Term, Test, Two Words
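
If you go the index-time route, the analyzers might be arranged roughly as in the sketch below. This is an assumption about how the pieces would fit together rather than a configuration I have verified; it simply moves the same filter from the query analyzer to the index analyzer inside your field type.

```xml
<!-- Hypothetical index-time variant: the synonym filter moves to the index analyzer -->
<analyzer type="index">
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"
          tokenizerFactory="solr.KeywordTokenizerFactory"/>
</analyzer>
<!-- Query side: no synonym filter, the query value is matched as a single token -->
<analyzer type="query">
  <tokenizer class="solr.KeywordTokenizerFactory"/>
</analyzer>
```

Keep in mind that index-time synonyms are baked into the index, so existing documents need to be reindexed whenever synonyms.txt changes.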

Hope this helps someone!

2 Responses to Working with Apache SOLR SynonymFilterFactory filter

  1. Alex says:

    Hi, I don’t get any results from this solution.
    Can you show lines from your schema.xml around this solution?
