一.安装配置
安装参见: https://www.ttlsa.com/html/1236.html文章源自运维生存时间-https://www.ttlsa.com/sphinx/sphinx-search-engine-2/
二.索引说明文章源自运维生存时间-https://www.ttlsa.com/sphinx/sphinx-search-engine-2/
在查询语句中选择许多列包含在索引中。文档ID(documnet ID)必须位于第一列,并且是一个唯一的无符号的正整数。在这种情况下表中的id字段将被视为创建的索引。文章源自运维生存时间-https://www.ttlsa.com/sphinx/sphinx-search-engine-2/
文档ID在SQL查询语句中是第一个字段,它必须是一个唯一的非零无符号的非负整数。如果使用多个索引来源,所有来源的文档ID必须唯一。文章源自运维生存时间-https://www.ttlsa.com/sphinx/sphinx-search-engine-2/
除文档ID字段外,所有查询选择字段,默认情况下都视为创建全文索引字段。文章源自运维生存时间-https://www.ttlsa.com/sphinx/sphinx-search-engine-2/
如果想一个或多个字段作为索引属性,那么可以使用sql_arrt_*选项来定义。文章源自运维生存时间-https://www.ttlsa.com/sphinx/sphinx-search-engine-2/
下面的选项可用于不同类型的属性声明:文章源自运维生存时间-https://www.ttlsa.com/sphinx/sphinx-search-engine-2/
sql_attr_unit: 无符号整数属性(32位)文章源自运维生存时间-https://www.ttlsa.com/sphinx/sphinx-search-engine-2/
sql_attr_bool: 布尔属性文章源自运维生存时间-https://www.ttlsa.com/sphinx/sphinx-search-engine-2/
sql_attr_bigint: 符号整数属性(64位)文章源自运维生存时间-https://www.ttlsa.com/sphinx/sphinx-search-engine-2/
sql_attr_timestamp: unix时间戳属性文章源自运维生存时间-https://www.ttlsa.com/sphinx/sphinx-search-engine-2/
sql_attr_str2ordinal: 序串号属性文章源自运维生存时间-https://www.ttlsa.com/sphinx/sphinx-search-engine-2/
sql_attr_float: 浮点属性文章源自运维生存时间-https://www.ttlsa.com/sphinx/sphinx-search-engine-2/
sql_attr_multi: 多值属性(MVA)文章源自运维生存时间-https://www.ttlsa.com/sphinx/sphinx-search-engine-2/
sql_attr_multi格式如下:文章源自运维生存时间-https://www.ttlsa.com/sphinx/sphinx-search-engine-2/
sql_attr_multi = ATTR-TYPE ATTR-NAME 'from' SOURCE-TYPE [;QUERY] [;RANGE-QUERY]文章源自运维生存时间-https://www.ttlsa.com/sphinx/sphinx-search-engine-2/
选项参数如下:文章源自运维生存时间-https://www.ttlsa.com/sphinx/sphinx-search-engine-2/
ATTR-TYPE: uint或timestamp文章源自运维生存时间-https://www.ttlsa.com/sphinx/sphinx-search-engine-2/
ATTR-NAME: 属性名称文章源自运维生存时间-https://www.ttlsa.com/sphinx/sphinx-search-engine-2/
SOURCE-TYPE: field或query或ranged-query文章源自运维生存时间-https://www.ttlsa.com/sphinx/sphinx-search-engine-2/
QUERY: 通过一个SQL语句查询获取所有文档ID,属性值对
RANGE-QUERY: 通过SQL语句查询来获取最小和最大的ID值,类似于sql_query_range
例如:参见https://www.ttlsa.com/html/1346.html
sql_query = SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content FROM documents
sql_attr_uint = group_id
sql_attr_timestamp = date_added
id作为文档ID,group_id作为一个无符号整数属性, date_added作为时间戳属性。 而剩下的字段titel, content将作为全文索引字段。
当对索引进行搜索时,搜索字词匹配的title和content字段,而这两个属性可用于排序和筛选。
sql_query_info是可选的。主要用于在命令行界面(CLI)下搜索查询时,显示索引中匹配的属性。
如果加上sql_query_info = SELECT * FROM documents WHERE id=$id ($id为查询到的文档ID)
# search test displaying matches: 1. document=1, weight=2578, date_added=Fri Jun 15 15:58:41 2012 id=1 group_id=1 group_id2=5 date_added=2012-06-15 15:58:41 title=test one content=this is my test document number one. also checking search within phrases. 2. document=2, weight=1557, date_added=Fri Jun 15 15:58:41 2012 id=2 group_id=1 group_id2=6 date_added=2012-06-15 15:58:41 title=test two content=this is my test document number two words: 1. ‘test’: 2 documents, 3 hits
当把sql_query_info注释掉,查询结果如下:
# search test displaying matches: 1. document=1, weight=2578, date_added=Fri Jun 15 15:58:41 2012 2. document=2, weight=1557, date_added=Fri Jun 15 15:58:41 2012 words: 1. 'test': 2 documents, 3 hits
三.blog系统下使用
1.新建相关表
mysql> CREATE DATABASE ttlsa_com ;
Query OK, 1 row affected (0.00 sec)
#帖子表
mysql> CREATE TABLE `ttlsa_com`.`posts` ( -> `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY , -> `title` VARCHAR( 255 ) NOT NULL , -> `content` TEXT NOT NULL , -> `author_id` INT UNSIGNED NOT NULL , -> `publish_date` DATETIME NOT NULL -> ) ENGINE = INNODB; Query OK, 0 rows affected (0.08 sec)
#作者表,每个post属于一个作者,一个作者可以有多个post
mysql> CREATE TABLE `ttlsa_com`.`authors` ( -> `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY , -> `name` VARCHAR( 50 ) NOT NULL -> ) ENGINE = INNODB; Query OK, 0 rows affected (0.06 sec)
#类别表,一个post可以属于多个类别,一个类别可以有多个post
mysql> CREATE TABLE `ttlsa_com`.`categories` ( -> `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY , -> `name` VARCHAR( 50 ) NOT NULL -> ) ENGINE = INNODB; Query OK, 0 rows affected (0.10 sec)
#posts表和categories表关联表
mysql> CREATE TABLE `ttlsa_com`.`posts_categories` ( -> `post_id` INT UNSIGNED NOT NULL , -> `category_id` INT UNSIGNED NOT NULL , -> PRIMARY KEY ( `post_id` , `category_id` ) -> ) ENGINE = INNODB; Query OK, 0 rows affected (0.04 sec) mysql> INSERT INTO `posts` (`id`, `title`, `content`, `author_id`, `publish_date`) VALUES (1, 'Electronics For You', 'EFY- Electronics For You is a magazine for people with a passion for Electronics and Technology. Since the first issue in 1969, EFY has delivered the best in Product Reviews, Hardware and Software comparisons, Technical News, Analysis, Electronics news, about Products, Components, Computer Hardware,Power Supply, Industry Automation, Circuit Designing provided by electronicsforu.com.', 2, '2010-08-02 10:29:28'), (2, 'What is PHP?', 'PHP Hypertext Preprocessor (the name is a recursive acronym) is a widely used, general-purpose scripting language that was originally designed for web development to produce dynamic web pages.', 3, '2010-03-09 10:31:01'),(3, 'Nintendo', 'Games that are easy to play and fun for anyone. Nintendo are one of them major players in gaming world. They also develop computer games these days.', 4, '2010-01-05 10:39:21'),(4, 'Sony PlayStation - Full of life', 'Sony Playstation is one of the leading gaming console of modern times. They are fun to play and people of all age groups enjoy it.', 1, '2010-08-17 10:48:23'),(5, 'Namespaces in PHP 5.3', 'One of the most significant and welcome features added in PHP 5.3 was that of namespaces. While this has been around in other programming languages, namespaces have finally found their place starting with PHP 5.3.', 2, '2010-04-19 10:50:11'),(6, 'Leadership Skills', 'Leadership skill is the key to success in any field, be it software industry, automobile industry or any other business.', 2, '2009-02-09 10:55:32'),(7, 'Ruby on Rails', 'RoR is a rapid web application development framework. It was one of the first framework for developing web applications.', 4, '2010-08-13 13:44:32'),(8, 'Sphinx search engine', 'Sphinx was created by Andrew Aksyonoff and it can be used along with any programming language.', 1, '2009-04-13 13:46:11'); Query OK, 8 rows affected (0.01 sec) Records: 8 Duplicates: 0 Warnings: 0 mysql> INSERT INTO `authors` (`id`, `name`) VALUES -> (1, 'Amit Badkas'), -> (2, 'Aditya Mooley'), -> (3, 'Rita Chouhan'), -> (4, 'Dr.Tarique Sani'); Query OK, 4 rows affected (0.00 sec) Records: 4 Duplicates: 0 Warnings: 0 mysql> INSERT INTO `categories` (`id`, `name`) VALUES -> (1, 'Programming'), -> (2, 'Games'), -> (3, 'Electronics'), -> (4, 'PHP'), -> (5, 'Search'), -> (6, 'Misc'); Query OK, 6 rows affected (0.00 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> INSERT INTO `posts_categories` (`post_id`, `category_id`) VALUES -> (1, 1), -> (1, 2), -> (1, 3), -> (2, 1), -> (2, 4), -> (3, 2), -> (3, 3), -> (4, 2), -> (4, 3), -> (5, 1), -> (5, 4), -> (6, 6), -> (7, 1), -> (8, 1), -> (8, 5); Query OK, 15 rows affected (0.00 sec) Records: 15 Duplicates: 0 Warnings: 0
2.sphinx.conf配置如下:
# vi sphinx.conf source blog { type = mysql sql_host = localhost sql_user = root sql_pass = sql_db = ttlsa_com sql_query = \ SELECT id, title, content, UNIX_TIMESTAMP(publish_date) \ AS publish_date, author_id FROM posts sql_attr_uint = author_id sql_attr_multi = uint category_id from query; SELECT post_id, category_id FROM posts_categories sql_attr_timestamp = publish_date sql_query_info = SELECT id, title FROM posts WHERE ID=$id } index posts { source = blog path = /data/sphinx/ docinfo = extern charset_type = sbcs } indexer { mem_limit = 32M } searchd { port = 9312 log = /var/log/searchd.log query_log = /var/log/query.log read_timeout = 5 max_children = 30 pid_file = /var/log/searchd.pid max_matches = 1000 seamless_rotate = 1 preopen_indexes = 0 unlink_old = 1 }
#创建索引
# indexer --all collected 8 docs, 0.0 MB collected 15 attr values sorted 0.0 Mvalues, 100.0% done sorted 0.0 Mhits, 100.0% done total 8 docs, 1543 bytes total 0.033 sec, 46475 bytes/sec, 240.96 docs/sec total 35 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg total 11 writes, 0.000 sec, 0.6 kb/call avg, 0.0 msec/call avg
#使用sphinx搜索php并按照时间升序排序
# search php --rsort=date index 'posts': query 'php ': returned 2 matches of 2 total in 0.000 sec displaying matches: 1. document=2, weight=2678, publish_date=Tue Mar 9 10:31:01 2010, author_id=3, category_id=(1,4) id=2 title=What is PHP? 2. document=5, weight=2703, publish_date=Mon Apr 19 10:50:11 2010, author_id=2, category_id=(1,4) id=5 title=Namespaces in PHP 5.3 words: 1. 'php': 2 documents, 5 hits
#使用mysql搜索php
mysql> select * from posts where title like "%php%" or content like "%php%" order by publish_date\G *************************** 1. row *************************** id: 2 title: What is PHP? content: PHP Hypertext Preprocessor (the name is a recursive acronym) is a widely used, general-purpose scripting language that was originally designed for web development to produce dynamic web pages. author_id: 3 publish_date: 2010-03-09 10:31:01 *************************** 2. row *************************** id: 5 title: Namespaces in PHP 5.3 content: One of the most significant and welcome features added in PHP 5.3 was that of namespaces. While this has been around in other programming languages, namespaces have finally found their place starting with PHP 5.3. author_id: 2 publish_date: 2010-04-19 10:50:11 2 rows in set (0.02 sec)
#使用sphinx php api搜索
# vi search.php
<?php require_once "sphinxapi.php"; $search = new SphinxClient; $search->setServer("localhost", 9312); $search->setMatchMode(SPH_MATCH_ANY); $search->SetArrayResult ( true ); $search->setMaxQueryTime(3); $search->SetSortMode(SPH_SORT_ATTR_ASC,"publish_date"); print_r($search->query("php")); ?>
# php search.php
Array ( [error] => [warning] => [status] => 0 [fields] => Array ( [0] => title [1] => content ) [attrs] => Array ( [publish_date] => 2 [author_id] => 1 [category_id] => 1073741825 ) [matches] => Array ( [0] => Array ( [id] => 2 [weight] => 2 [attrs] => Array ( [publish_date] => 1268101861 [author_id] => 3 [category_id] => Array ( [0] => 1 [1] => 4 ) ) ) [1] => Array ( [id] => 5 [weight] => 2 [attrs] => Array ( [publish_date] => 1271645411 [author_id] => 2 [category_id] => Array ( [0] => 1 [1] => 4 ) ) ) ) [total] => 2 [total_found] => 2 [time] => 0.000 [words] => Array ( [php] => Array (=> 2 [hits] => 5 ) ) )
#搜索包含php且作者是'Rita Chouhan'
# search --rsort=date --filter author_id 3 php index 'posts': query 'php ': returned 1 matches of 1 total in 0.000 sec displaying matches: 1. document=2, weight=2678, publish_date=Tue Mar 9 10:31:01 2010, author_id=3, category_id=(1,4) id=2 title=What is PHP? words: 1. 'php': 2 documents, 5 hits
#使用sphinx php api搜索
# vi search.php
<?php require_once "sphinxapi.php"; $search = new SphinxClient; $search->setServer("localhost", 9312); $search->setMatchMode(SPH_MATCH_ANY); $search->SetArrayResult ( true ); $search->setMaxQueryTime(3); $search->SetFilter("author_id",array(3)); print_r($search->query("php")); ?>
# php search.php
Array ( [error] => [warning] => [status] => 0 [fields] => Array ( [0] => title [1] => content ) [attrs] => Array ( [publish_date] => 2 [author_id] => 1 [category_id] => 1073741825 ) [matches] => Array ( [0] => Array ( [id] => 2 [weight] => 2 [attrs] => Array ( [publish_date] => 1268101861 [author_id] => 3 [category_id] => Array ( [0] => 1 [1] => 4 ) ) ) ) [total] => 1 [total_found] => 1 [time] => 0.000 [words] => Array ( [php] => Array (=> 2 [hits] => 5 ) ) )
#搜索包含sphinx且类型属于5的所有内容(这里将使用到MVA)
# search --filter category_id 5 sphinx
index 'posts': query 'sphinx ': returned 1 matches of 1 total in 0.000 sec displaying matches: 1. document=8, weight=2795, publish_date=Mon Apr 13 13:46:11 2009, author_id=1, category_id=(1,5) id=8 title=Sphinx search engine words: 1. 'sphinx': 1 documents, 2 hits
通过mysql数据库查询:
mysql> select p.* from posts as p, posts_categories as pc where p.title like "%sphinx%" and p.id=pc.post_id and pc.category_id=5\G *************************** 1. row *************************** id: 8 title: Sphinx search engine content: Sphinx was created by Andrew Aksyonoff and it can be used along with any programming language. author_id: 1 publish_date: 2009-04-13 13:46:11 1 row in set (0.01 sec)
如需转载请注明出处:https://www.ttlsa.com/html/1354.html
评论