
Sphinx Search Beginner's Guide
Ebook, 528 pages

About this ebook

This book is a step-by-step guide for the absolute beginner. It covers everything from installation to configuration to get you started quickly. It has numerous code examples that readers can try on their own, learning as they go, and it includes two full-fledged example applications that readers can follow. The book focuses specifically on the search feature of web applications and is intended for developers who are new to Sphinx Search. All code examples use PHP, but the logic is the same for any other web scripting language.
Language: English
Release date: Mar 16, 2011
ISBN: 9781849512558

Sphinx Search Beginner's Guide - Abbas Ali

Table of Contents

Sphinx Search

Credits

About the Author

Acknowledgement

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why Subscribe?

Free Access for Packt account holders

Preface

What this book covers

Who this book is for

Conventions

Time for action - heading

What just happened?

Pop quiz - heading

Have a go hero - heading

Reader feedback

Customer support

Errata

Piracy

Questions

1. Setting Up Sphinx

What you need to know

Different ways of performing a search

Searching on a live database

Searching an index

Sphinx—a full-text search engine

Features

A brief history

License

Installation

System requirements

Sphinx on a Unix-based system

Time for action - installation on Linux

What just happened?

Options to the configure command

Known issues during installation

Sphinx on Windows

Time for action - installation on Windows

What just happened?

Sphinx on Mac OS X

Time for action - installation on a Mac

What just happened?

Other supported systems

Summary

2. Getting Started

Checking the installation

Full-text search

What is full-text search?

Traditional search

Time for action - normal search in MySQL

What just happened?

MySQL full-text search

Advantages of full-text search

When to use a full-text search?

Overview of Sphinx

Primary programs

Time for action - Sphinx in action

What just happened?

Data to be indexed

Creating the Sphinx configuration file

Searching the index

Have a go hero -

Why use Sphinx for full-text searching?

Summary

3. Indexing

What are indexes?

Indexes in Sphinx

Index attributes

Types of attributes

Multi-value attributes (MVA)

Data sources

How to define the data source?

SQL data sources

Creating Index using SQL data source (Blog)

Creating a simple index without any attributes

Time for action - creating database tables for a blog

What just happened?

Time for action - populate the database tables

What just happened?

Time for action - creating the Sphinx configuration file

What just happened?

The indexing workflow

Adding attributes to the index

Time for action - adding attributes to the index

What just happened?

Adding an MVA to the index

Time for action - adding an MVA to the index

What just happened?

Filtering without searching for a specific phrase

xmlpipe data source

xmlpipe2 data source

Indexing with schema defined in XML stream

Time for action - creating index (without attributes)

What just happened?

Time for action - add attributes to schema

What just happened?

Indexing with schema defined in configuration file

Time for action - create index with schema defined in configuration file

What just happened?

Summary

4. Searching

Client API implementations for Sphinx

Search using client API

Time for action - creating a basic search script

What just happened?

Matching modes

Time for action - searching with different matching modes

What just happened?

Boolean query syntax

Time for action - searching using Boolean query syntax

What just happened?

Extended query syntax

Time for action - searching with extended query syntax

What just happened?

Filtering full-text search results

Time for action - filtering the result set

What just happened?

Weighting search results

Time for action - weighting search results

What just happened?

Sorting modes

Grouping search results

Summary

5. Feed Search

The application

Tools and software used while creating this application

Database structure

Time for action - creating the MySQL database and tables

What just happened?

Basic setup

Time for action - setting up the feeds application

What just happened?

Add feed

Time for action - creating a form to add feeds

What just happened?

Saving the feed data

Time for action - adding code to save feed

What just happened?

Indexing the feeds

Time for action - create the index

What just happened?

Check for duplicate items

Time for action - adding code to avoid duplicate items

What just happened?

Index merging

Time for action - adding the delta index

What just happened?

Search form

Time for action - creating the search form

What just happened?

Perform the search query

Time for action - adding code to perform a search query

What just happened?

Applying filters

Time for action - adding code to filter the results

What just happened?

Time for action - showing search form prefilled with last submitted data

What just happened?

Re-indexing

Have a go hero - trying different search queries

Summary

6. Property Search

The application

Tools and software used while creating this application

Database structure

Time for action - creating the MySQL database and structure

What just happened?

Initial data

Time for action - populating the database

What just happened?

Basic setup

Time for action - setting up the application

What just happened?

Adding a property

Time for action - creating the form to add property

What just happened?

Indexing the properties

Time for action - creating the index

What just happened?

Simple search form

Time for action - creating the simple search form

What just happened?

Full-text search

Time for action - adding code to perform full-text search

What just happened?

Have a go hero - try setting different field weights

Advanced search

Time for action - creating the Advanced search form

What just happened?

Ranged filters

Time for action - adding ranged filters

What just happened?

Have a go hero - adding filter for amenities

Geo distance search

Time for action - creating the search form

What just happened?

Add geo anchor

Time for action - adding code to perform geo distance search

What just happened?

Have a go hero - adding the delta index using the index merging technique

Summary

7. Sphinx Configuration

Sphinx configuration file

Rules for creating the configuration file

Data source configuration

SQL related options

Connection options

sql_port

sql_sock

odbc_dsn

Options to fetch data (SQL data source)

sql_query_pre

sql_query_post

sql_query_post_index

sql_ranged_throttle

Configuration file using advanced options

Time for action - creating a configuration with advanced source options

What just happened?

MS SQL specific options

mssql_winauth

mssql_unicode

Index configuration

Distributed searching

Set up an index on multiple servers

Time for action - creating indexes for distributed searching

What just happened?

Set up the distributed index on the primary server

Time for action - adding distributed index configuration

What just happened?

agent_blackhole

agent_connect_timeout

agent_query_timeout

Distributed searching on single server

charset configuration

charset_type

charset_table

Data related options

stopwords

min_word_len

ignore_chars

html_strip

html_index_attrs

html_remove_elements

Word processing options

Morphology

Time for action - using morphology for stemming

What just happened?

morphology

min_stemming_len

Wordforms

Search daemon configuration

listen

log

query_log

read_timeout

client_timeout

max_children

pid_file

max_matches

seamless_rotate

Indexer configuration

mem_limit

max_iops

max_iosize

max_xmlpipe2_field

Summary

8. What Next?

SphinxQL

SphinxQL in action

Time for action - querying Sphinx using MySQL CLI

What just happened?

SELECT

Column list clause

FROM clause

WHERE clause

GROUP BY clause

ORDER BY clause

LIMIT clause

OPTION clause

SHOW WARNINGS

SHOW STATUS

SHOW META

Use case scenarios

Popular websites using Sphinx

Summary

Index

Sphinx Search

Beginner's Guide


Copyright © 2011 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing and its dealers and distributors, will be held liable for any damages caused or alleged to be caused, directly or indirectly, by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: March 2011

Production Reference: 1100311

Published by Packt Publishing Ltd.

32 Lincoln Road

Olton

Birmingham, B27 6PA, UK.

ISBN 978-1-849512-54-1

www.packtpub.com

Cover Image by Asher Wishkerman (a.wishkerman@mpic.de)

Credits

Author

Abbas Ali

Reviewers

Paul Grinberg

Kevin Horn

Acquisition Editor

Eleanor Duffy

Development Editor

Roger D'souza

Technical Editor

Aaron Rosario

Indexers

Tejal Daruwale

Monica Ajmera Mehta

Editorial Team Leader

Aanchal Kumar

Project Team Leader

Priya Mukherji

Project Coordinator

Sneha Harkut

Proofreader

Jonathan Russell

Graphics

Nilesh Mohite

Production Coordinator

Melwyn D'sa

Cover Work

Melwyn D'sa

About the Author

Abbas Ali has over six years of experience in PHP development and is a Zend Certified PHP 5 Engineer. A Mechanical Engineer by education, Abbas turned to software development just after finishing his engineering degree. He is a member of the core development team for the Coppermine Photo Gallery, an open source project, which is one of the most popular photo gallery applications in the world.

Fascinated with both machines and knowledge, Abbas is always learning new programming techniques. He got acquainted with Sphinx in 2009 and has been using it in most of his commercial projects ever since. He loves open source and believes in contributing back to the community.

Abbas is married to Tasneem and has a cute little daughter, Munira. He has lived in Nagpur (India) since his birth and is in no rush to move to any other city in the world. In his free time he loves to watch movies and television. He is also an amateur photographer and cricketer.

Abbas is currently working as Chief Operating Officer and Technical Manager at SANIsoft Technologies Private Limited, Nagpur, India. The company specializes in development of large, high performance, and scalable PHP applications.

For feedback and suggestions, you can contact Abbas at:

Web : http://www.abbasali.net/contact/

Twitter: @_abbas

Acknowledgement

My wife Tasneem and sweet little daughter Munira were patient throughout my writing adventure, and I want to thank them for giving me tremendous support and quiet space to work at home. I would also like to thank my mother for her moral support.

My inspiration was Dr. Tarique Sani, CTO of SANIsoft, who is my employer, mentor, and guru. I would like to thank him for his support and exchange of technical know-how. I would also like to thank my colleagues at SANIsoft who encouraged me in my endeavor.

I would also like to thank all the reviewers and editors who worked patiently with me. A special thanks to Aaron Rosario who worked sleepless nights during the final editing phase.

Richard Phillips of Utilitas Knowledge Management Limited, London, introduced me to Sphinx while I was working on one of his projects in 2009. He deserves special thanks and acknowledgment.

Last, but not least, I would like to thank my brother, who has been an inspiration all my life.

About the Reviewers

Paul Grinberg is an electrical engineer with a focus on embedded firmware design. As part of his work he has utilized many techniques that traditionally fall outside of his field, including a number of scripting languages. While learning PHP, Paul started contributing to the MediaWiki project by writing a number of extensions. One of those extensions was the Sphinx Search extension to improve the search capability of the MediaWiki engine.

I would like to thank Svemir Brkic, who is the co-author of the Sphinx Search extension for MediaWiki. I would also like to thank my wife for her understanding, flexibility, and support for my hobbies.

Kevin Horn has a B.S. in Mechanical Engineering from Texas A&M University and has been creating web applications since 1998, when he accidentally became a web developer after running out of money for college. He's worked under almost every job title in the IT field, though he always seems to come back to programming. Despite working with a number of different languages, there's no doubt that his favorite is Python, as he will tell anyone who will listen (and some who won't).

Kevin lives in North Texas with his wife, two sons, and a couple of canine interlopers.

Kevin currently works as a semi-freelance programmer both through his own company and others. In his not-so-copious free time, he works on various open source Python projects, reads a truly ridiculous amount of fiction, and tries to figure out how to raise his offspring properly.

Thanks to the Packt team for making the process of reviewing my first book pretty darn painless. I'd also like to thank my wife, kids, and friends for putting up with me staring at the computer screen, when they'd much rather I be doing something else.

www.PacktPub.com

Support files, eBooks, discount offers, and more

You might want to visit www.PacktPub.com for support files and downloads related to your book.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

http://PacktLib.PacktPub.com

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read, and search across Packt's entire library of books.

Why Subscribe?

Fully searchable across every book published by Packt

Copy & paste, print and bookmark content

On demand and accessible via web browser

Free Access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.

Preface

This book will serve as a guide to everything that you need to know about running a Sphinx Search engine. In today's world, search is an integral part of any application; a reliable search engine like Sphinx Search can be the difference between running a successful business and an unsuccessful one. What good is being on the web if no one knows you are there?
