November 12th

At our next meeting, Rich Harms of Source Allies will teach Lucene 101: Introduction to Apache Lucene.

Have you ever searched a website only to find the results useless? You searched for “fsu” and the first match returned is “does jsf suck - YES”. Shouldn’t it search by whole words? Shouldn’t matched words in titles or “H1” elements count for more than matches in the footer? Find out how Lucene not only searches your documents but also scores the results by the importance of each match.
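To make the two questions above concrete, here is a minimal sketch in plain Java. It is not Lucene's API and not Lucene's scoring formula: the class name, method names, field names, and weights are all hypothetical, chosen only to illustrate matching on whole words and weighting a title match above a footer match.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: a toy scorer, not Lucene and not its ranking model.
public class WholeWordScoring {

    // Split text into lowercase whole-word tokens (runs of letters and digits).
    public static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    // Add the field's weight once for every token that equals the search term.
    // Fields without an explicit weight count as 1.0 (an assumption of this sketch).
    public static double score(String term,
                               Map<String, String> fields,
                               Map<String, Double> weights) {
        String wanted = term.toLowerCase(Locale.ROOT);
        double total = 0.0;
        for (Map.Entry<String, String> field : fields.entrySet()) {
            double weight = weights.getOrDefault(field.getKey(), 1.0);
            for (String token : tokenize(field.getValue())) {
                if (token.equals(wanted)) {
                    total += weight;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical weights: a title hit counts six times as much as a footer hit.
        Map<String, Double> weights = Map.of("title", 3.0, "footer", 0.5);

        Map<String, String> jsfPage = Map.of("title", "Does JSF suck - YES", "footer", "");
        Map<String, String> titleHit = Map.of("title", "FSU campus guide", "footer", "Contact us");
        Map<String, String> footerHit = Map.of("title", "Campus guide", "footer", "FSU alumni office");

        System.out.println(score("fsu", jsfPage, weights));
        System.out.println(score("fsu", titleHit, weights));
        System.out.println(score("fsu", footerHit, weights));
    }
}
```

In this sketch the “does jsf suck” page scores zero because none of its whole-word tokens equals “fsu”, and the page with the term in its title outranks the page with the term only in its footer. Lucene's analyzers and scoring are considerably more sophisticated than this.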

Through a series of examples, we’ll take a look at how to store information in a Lucene index, how it structures documents and fields, common types of analyzers that may be applied to the contents of fields, how to deal with different types of data in indexes (strings and numeric data), query syntax, and an example of how to deal with non-English languages. We’ll also take a look at the data that gets stored in the indexes using the tool “Luke.”

If you’d like to follow along, the example code is available at: https://github.com/richharms/IntroToLucene