Full text search with Sphinx
- Sphinx installation
- Get the source
    wget http://www.sphinxsearch.com/downloads/sphinx-0.9.7.tar.gz
    tar xvzf sphinx-0.9.7.tar.gz
- Compile & install
    cd sphinx-0.9.7
    # $HOME is used instead of ~ because the shell does not expand a tilde inside --prefix=...
    ./configure --prefix=$HOME/sphinx
    make
    make install
- Add to PATH
Open your .bash_profile file and add this snippet:
    if [ -d ~/sphinx/bin ]; then
      PATH=~/sphinx/bin:"${PATH}"
    fi
Note: you need to reload the shell to apply the new setting.
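As a quick check after reloading the shell, the sketch below reports whether the Sphinx command-line tools resolve on PATH. It assumes the install step produced binaries named indexer and searchd; the function name check_sphinx_tools is only illustrative and not part of Sphinx.

```shell
#!/bin/sh
# Report where each Sphinx tool resolves on PATH, or that it is missing.
# check_sphinx_tools is a hypothetical helper name, not part of Sphinx.
check_sphinx_tools() {
  for tool in indexer searchd; do
    if location=$(command -v "$tool" 2>/dev/null); then
      echo "$tool: $location"
    else
      echo "$tool: not found on PATH"
    fi
  done
}

check_sphinx_tools
```

If either tool is reported as missing, the PATH snippet has probably not been loaded yet in the current shell.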
- Sphinx configuration
Configure sphinx for searching coffee houses by their name and company name.
Create a sphinx configuration file named sphinx.conf in the coffeehouse/config directory
Caution:
- You need to define an SQL query that will fetch all the data that goes into the full text index. The first column returned by this query must be a unique numeric identifier.
- You need to specify a port number on which your search daemon will listen. Pick a port that no one else uses. Suggested port numbering scheme for the student shared host: 20000 + user id (to find your user id, use the id command).
- You need to specify paths to your index and log files. Make sure they point to locations in your home directory.
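The suggested port scheme can be worked out directly in the shell. This is a minimal sketch that only applies the 20000 + user id rule from the list above; the range check is an added precaution, since TCP port numbers cannot exceed 65535.

```shell
#!/bin/sh
# Suggested scheme: 20000 + numeric user id (id -u prints the numeric id).
uid=$(id -u)
port=$((20000 + uid))

if [ "$port" -gt 65535 ]; then
  # Outside the valid TCP port range; the scheme does not fit this user id.
  echo "computed port $port is out of range, pick another one by hand"
else
  echo "suggested searchd port for user id $uid: $port"
fi
```

The printed number is the value to put in the port setting of the searchd section.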
Sample sphinx.conf file:
    #
    # Sphinx configuration file sample
    #

    #############################################################################
    ## data source definition
    #############################################################################

    source coffee_houses
    {
        # data source type
        # for now, known types are 'mysql', 'pgsql' and 'xmlpipe'
        # MUST be defined
        type = mysql

        #####################################################################

        # some straightforward parameters for 'mysql' source type
        sql_host = wierzba.wzks.uj.edu.pl
        sql_user = agnessa
        sql_pass = nndlhwra
        sql_db = baza_agnessa
        sql_port = 3306 # optional, default is 3306

        # pre-query, executed before the main fetch query
        # useful eg. to setup encoding or mark records
        # optional, default is empty
        #
        # sql_query_pre = SET CHARACTER_SET_RESULTS=cp1251
        sql_query_pre = SET CHARACTER_SET_RESULTS=latin2

        # main document fetch query
        #
        # you can specify up to 32 (formally SPH_MAX_FIELDS in sphinx.h) fields;
        # all of the fields which are not document_id or attributes (see below)
        # will be full-text indexed
        #
        # document_id MUST be the very first field
        # document_id MUST be positive (non-zero, non-negative)
        # document_id MUST fit into 32 bits
        # document_id MUST be unique
        #
        # mandatory
        sql_query = \
            SELECT coffee_houses.id, coffee_houses.name, companies.name \
            FROM coffee_houses LEFT JOIN companies ON (coffee_houses.company_id=companies.id)
    }

    #############################################################################
    ## index definition
    #############################################################################

    index coffee_houses
    {
        # which document source to index
        # at least one MUST be defined
        #
        # multiple sources MAY be specified; to do so, just add more
        # "source = NAME" lines. in this case, ALL the document IDs
        # in ALL the specified sources MUST be unique
        source = coffee_houses

        # this is path and index file name without extension
        #
        # indexer will append different extensions to this path to
        # generate names for both permanent and temporary index files
        #
        # .tmp* files are temporary and can be safely removed
        # if indexer fails to remove them automatically
        #
        # .sp* files are fulltext index data files. specifically,
        # .spa contains attribute values attached to each document id
        # .spd contains doclists and hitlists
        # .sph contains index header (schema and other settings)
        # .spi contains wordlists
        #
        # MUST be defined
        path = /home/epi/login/sphinx/var/data/coffee_houses
    }

    #############################################################################
    ## indexer settings
    #############################################################################

    indexer
    {
        # memory limit
        #
        # may be specified in bytes (no postfix), kilobytes (mem_limit=1000K)
        # or megabytes (mem_limit=10M)
        #
        # will grow if set unacceptably low
        # will warn if set too low and potentially hurting the performance
        #
        # optional, default is 32M
        mem_limit = 16M
    }

    #############################################################################
    ## searchd settings
    #############################################################################

    searchd
    {
        # port on which search daemon will listen
        port = 10480

        # log file
        # searchd run info is logged here
        log = /home/epi/login/sphinx/var/log/searchd.log

        # query log file
        # all the search queries are logged here
        query_log = /home/epi/login/sphinx/var/log/query.log

        # client read timeout, seconds
        read_timeout = 5

        # maximum amount of children to fork
        # useful to control server load
        max_children = 3

        # a file which will contain searchd process ID
        # used for different external automation scripts
        # MUST be present
        pid_file = /home/epi/login/sphinx/var/log/searchd.pid

        # maximum amount of matches this daemon would ever retrieve
        # from each index and serve to client
        #
        # this parameter affects per-client memory and CPU usage
        # (16+ bytes per match) in match sorting phase; so blindly raising
        # it to 1 million is definitely NOT recommended
        #
        # starting from 0.9.7, it can be decreased on the fly through
        # the corresponding API call; increasing is prohibited to protect
        # against malicious and/or malformed requests
        #
        # default is 1000 (just like with Google)
        max_matches = 1000
    }

    # --eof--
- Install acts_as_sphinx
    ./script/plugin install http://svn.datanoise.com/acts_as_sphinx
- Run the indexer & search daemon
How to index your data:
    rake sphinx:index
This will index the data as specified in sphinx.conf.
How to start the search daemon:
    rake sphinx:start
This will start a search daemon on the port specified in sphinx.conf.
How to stop the search daemon:
    rake sphinx:stop
Or just kill it (without -9).
How to reindex data:
    rake sphinx:rotate
Note: use this task when the daemon is running; otherwise use sphinx:index.
- Use sphinx in your application
- Edit and test the model
Add an acts_as_sphinx call to the CoffeeHouse model (app/models/coffee_house.rb):
    class CoffeeHouse < ActiveRecord::Base
      acts_as_sphinx :host => '127.0.0.1',
                     :port => the_port_your_searchd_is_running_on,
                     :index => 'coffee_houses'
      [...]
    end
Now in the command line try out the search (use a query that is likely to return results from your data set):
    script/console
    >CoffeeHouse.find_with_sphinx('starbucks')
Note: in case script/console is not working, the same effect can be obtained with:
    irb
    >load 'config/environment.rb'
In case irb is not working, it may not have been installed properly. A common case is when there's no executable called irb, but there is one called irb1.8. You may define an alias in .bash_profile to overcome this:
    alias irb='irb1.8'
- Full text search action in the controller
Add a new action in app/controllers/coffee_house_controller.rb:
    def full_text_search
      # fall back to the first page and an empty query when no parameters are given
      @page = params[:page] || 1
      @query = params[:query] || ''
      @coffee_houses = CoffeeHouse.find_with_sphinx(@query, :sphinx => {:limit => PER_PAGE, :page => @page})
      @coffee_house_pages = pages_for @coffee_houses.total, :page => @page
      render :partial => 'list'
    end
Note: make sure the search action (not full_text_search) looks the same as in "Ajax in Practice".
Add pagination helpers to app/controllers/application.rb:
    PER_PAGE = 10 unless defined? PER_PAGE

    def pages_for(size, options = {})
      default_options = {:per_page => PER_PAGE}
      options = default_options.merge(options)
      Paginator.new self, size, options[:per_page], (options[:page] || 1)
    end
- Search box in the view
Add a new search box in app/views/coffee_houses/list.rhtml:
    <div class="search-box">
      <% form_remote_tag :url => {:action => 'full_text_search'},
                         :update => 'coffee_houses',
                         :loading => "Element.show('fts-loader')",
                         :complete => "Element.hide('fts-loader')" do %>
        <%= text_field_tag 'query' %>
        <%= submit_tag 'Full text search' %>
      <% end %>
      <%= image_tag 'ajax-loader.gif', :id => 'fts-loader', :style => 'display:none' %>
    </div>
ajax-loader.gif
Adjust the stylesheet:
    .search-box {
      width: 300px;
      padding: 10px;
    }
Note: make sure _list.rhtml looks the same as in "Ajax in Practice".
- Index updates
Schedule a cron job to run the rotate task at a frequency suitable for your site.
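One possible shape for such a cron job is sketched below. The hourly schedule, the application location ($HOME/coffeehouse) and the Sphinx location ($HOME/sphinx/bin) are assumptions for illustration; adjust them to your account and to how often your data changes. The script only prints the crontab line so it can be reviewed before being installed.

```shell
#!/bin/sh
# Assumed locations (hypothetical, adjust to your own account).
APP_DIR="$HOME/coffeehouse"
SPHINX_BIN="$HOME/sphinx/bin"

# Crontab entry: minute 0 of every hour.
# cron runs jobs with a minimal PATH, so the Sphinx bin directory is
# prepended explicitly; \$PATH stays literal and is expanded by cron's shell.
CRON_LINE="0 * * * * cd $APP_DIR && PATH=$SPHINX_BIN:\$PATH rake sphinx:rotate"

echo "$CRON_LINE"
```

To install the entry, run crontab -e and paste the printed line. The job changes into the application directory first because the rake task has to run from there.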
- References