Sphinx full-text search in practice (part 1)

By 默北

1. Installation and configuration

For installation, see: https://www.ttlsa.com/html/1236.html

Configuration:

# vi sphinx.conf

# the source block defines the data source

source src1
{
type = mysql
sql_host = localhost
sql_user = root
sql_pass =
sql_db = sphinx
sql_port = 3306 # optional, default is 3306
sql_query = \
SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
FROM documents
sql_attr_uint = group_id
sql_attr_timestamp = date_added
sql_query_info = SELECT * FROM documents WHERE id=$id
}

# the index block sets the index file path and related options

index test1
{
source = src1
path = /data/sphinx/test1 # a filename prefix, not a directory: files become test1.sph, test1.spd, ...
docinfo = extern
charset_type = sbcs
}

# the indexer block sets indexer options, such as the memory limit

indexer
{
mem_limit = 32M
}

# the searchd block sets options for the searchd daemon that answers queries

searchd
{
port = 9312
log = /var/log/searchd.log
query_log = /var/log/query.log
read_timeout = 5
max_children = 30
pid_file = /var/log/searchd.pid
max_matches = 1000
seamless_rotate = 1
preopen_indexes = 0
unlink_old = 1
}
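To make the source/index split above concrete, here is a minimal PHP sketch (not part of Sphinx) of how one row returned by sql_query becomes a document. It assumes the documented rules: the first column is the document ID, columns declared with sql_attr_uint / sql_attr_timestamp become attributes, and every other column is indexed as a full-text field. The helper splitRow() is hypothetical.

```php
<?php
// Hypothetical helper: mimics how source src1 maps one sql_query row to a
// Sphinx document. Assumed rules: first column = document ID, columns listed
// in sql_attr_* = attributes, all remaining columns = full-text fields.
function splitRow(array $row, array $attrNames): array
{
    $id = (int) array_shift($row);          // first column is the document ID
    $attrs = [];
    $fields = [];
    foreach ($row as $column => $value) {
        if (in_array($column, $attrNames, true)) {
            $attrs[$column] = (int) $value;      // stored as integer attributes
        } else {
            $fields[$column] = (string) $value;  // tokenized into the full-text index
        }
    }
    return ['id' => $id, 'attrs' => $attrs, 'fields' => $fields];
}

// Row 1 as sql_query returns it (date_added already converted by UNIX_TIMESTAMP).
$row = [
    'id'         => 1,
    'group_id'   => 1,
    'date_added' => 1339747121,
    'title'      => 'test one',
    'content'    => 'this is my test document number one. also checking search within phrases.',
];

$doc = splitRow($row, ['group_id', 'date_added']);
print_r($doc);
```

With the sample row this yields attributes group_id and date_added and full-text fields title and content, the same split the PHP API reports in section 5.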

2. Create the tables

mysql> CREATE TABLE documents
-> (
-> id INTEGER PRIMARY KEY NOT NULL AUTO_INCREMENT,
-> group_id INTEGER NOT NULL,
-> group_id2 INTEGER NOT NULL,
-> date_added DATETIME NOT NULL,
-> title VARCHAR(255) NOT NULL,
-> content TEXT NOT NULL
-> );
Query OK, 0 rows affected (0.00 sec)
mysql> REPLACE INTO documents ( id, group_id, group_id2, date_added, title, content ) VALUES
-> ( 1, 1, 5, NOW(), 'test one', 'this is my test document number one. also checking search within phrases.' ),
-> ( 2, 1, 6, NOW(), 'test two', 'this is my test document number two' ),
-> ( 3, 2, 7, NOW(), 'another doc', 'this is another group' ),
-> ( 4, 2, 8, NOW(), 'doc number four', 'this is to test groups' );
Query OK, 4 rows affected (0.02 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> CREATE TABLE tags
-> (
-> docid INTEGER NOT NULL,
-> tagid INTEGER NOT NULL,
-> UNIQUE(docid,tagid)
-> );
Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO tags VALUES
-> (1,1), (1,3), (1,5), (1,7),
-> (2,6), (2,4), (2,2),
-> (3,15),
-> (4,7), (4,40);
Query OK, 10 rows affected (0.02 sec)
Records: 10 Duplicates: 0 Warnings: 0

3. Build the index

# indexer --all
using config file '/etc/sphinxsearch/sphinx.conf'...
indexing index 'test1'...
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 193 bytes
total 0.072 sec, 2644 bytes/sec, 54.80 docs/sec
total 3 reads, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
total 9 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
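A quick sanity check on the byte count: adding up the lengths of the title and content columns of the four sample rows gives exactly the 193 bytes indexer reported, which suggests the figure covers full-text field data only (that interpretation is an assumption). The sketch below is just that arithmetic, with the rows copied from the REPLACE statement in section 2.

```php
<?php
// Sum the byte length of the full-text fields (title, content) of the
// sample rows. Assumption: indexer's "total N bytes" counts only these.
$rows = [
    ['test one', 'this is my test document number one. also checking search within phrases.'],
    ['test two', 'this is my test document number two'],
    ['another doc', 'this is another group'],
    ['doc number four', 'this is to test groups'],
];

$totalBytes = 0;
foreach ($rows as [$title, $content]) {
    $totalBytes += strlen($title) + strlen($content); // strlen() counts bytes
}

echo "total " . count($rows) . " docs, $totalBytes bytes\n";
```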


4. Test

# search test
using config file '/etc/sphinxsearch/sphinx.conf'...
index 'test1': query 'test ': returned 3 matches of 3 total in 0.022 sec

displaying matches:
1. document=1, weight=2421, group_id=1, date_added=Fri Jun 15 15:58:41 2012
    id=1
    group_id=1
    group_id2=5
    date_added=2012-06-15 15:58:41
    title=test one
    content=this is my test document number one. also checking search within phrases.
2. document=2, weight=2421, group_id=1, date_added=Fri Jun 15 15:58:41 2012
    id=2
    group_id=1
    group_id2=6
    date_added=2012-06-15 15:58:41
    title=test two
    content=this is my test document number two
3. document=4, weight=1442, group_id=2, date_added=Fri Jun 15 15:58:41 2012
    id=4
    group_id=2
    group_id2=8
    date_added=2012-06-15 15:58:41
    title=doc number four
    content=this is to test groups

words:
1. 'test': 3 documents, 5 hits
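The words line can be reproduced by hand. The sketch below counts occurrences of test in the title and content of each sample row, assuming a simplified tokenizer (lowercase, split on anything that is not a letter or digit) in place of Sphinx's real sbcs tokenizer; tokenize() is a hypothetical helper. It finds 2 hits in documents 1 and 2 and 1 hit in document 4, which is consistent with the first two matches having the higher weight (2421 vs. 1442).

```php
<?php
// Hypothetical simplified tokenizer: lowercase, split on non-alphanumerics.
function tokenize(string $text): array
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Sample rows from section 2: document ID => [title, content].
$docs = [
    1 => ['test one', 'this is my test document number one. also checking search within phrases.'],
    2 => ['test two', 'this is my test document number two'],
    3 => ['another doc', 'this is another group'],
    4 => ['doc number four', 'this is to test groups'],
];

$keyword = 'test';
$hitsPerDoc = [];
foreach ($docs as $id => $fields) {
    $hits = 0;
    foreach ($fields as $text) {
        // count every token equal to the keyword in this field
        $hits += count(array_keys(tokenize($text), $keyword, true));
    }
    if ($hits > 0) {
        $hitsPerDoc[$id] = $hits;  // keep only documents that matched
    }
}

printf("'%s': %d documents, %d hits\n", $keyword, count($hitsPerDoc), array_sum($hitsPerDoc));
print_r($hitsPerDoc);
```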

 

5. Test with the Sphinx PHP API

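1. Copy sphinxapi.php (it is in the api directory of the source tree) to a location your PHP code can include it from; where exactly depends on your environment.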

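# vi search.php

<?php
require_once "sphinxapi.php";

$search = new SphinxClient();
$search->SetServer("localhost", 9312);  // searchd host and port (port 9312 from the searchd block)
$search->SetMatchMode(SPH_MATCH_ANY);   // match documents that contain any of the query words
$search->SetArrayResult(true);          // return matches as a plain list instead of an array keyed by document ID
$search->SetMaxQueryTime(3000);         // time limit in milliseconds (3000 ms = 3 s)

$result = $search->Query("test");       // "test" is the word to search for
if ($result === false) {
    echo "Query failed: " . $search->GetLastError() . "\n";
} else {
    print_r($result);
}
?>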

The result:

Array
(
    [error] =>
    [warning] =>
    [status] => 0
    [fields] => Array
        (
            [0] => title
            [1] => content
        )

    [attrs] => Array
        (
            [group_id] => 1
            [date_added] => 2
        )

    [matches] => Array // the matching documents
        (
            [0] => Array
                (
                    [id] => 1
                    [weight] => 2
                    [attrs] => Array
                        (
                            [group_id] => 1
                            [date_added] => 1339747121
                        )

                )

            [1] => Array
                (
                    [id] => 2
                    [weight] => 2
                    [attrs] => Array
                        (
                            [group_id] => 1
                            [date_added] => 1339747121
                        )

                )

            [2] => Array
                (
                    [id] => 4
                    [weight] => 1
                    [attrs] => Array
                        (
                            [group_id] => 2
                            [date_added] => 1339747121
                        )

                )

        )

    [total] => 3
    [total_found] => 3
    [time] => 0.000
    [words] => Array
        (
            [test] => Array
                (
                    [docs] => 3
                    [hits] => 5
                )

        )

)

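Notes:

[docs] => 3 means three documents matched, and [hits] => 5 means the word was hit 5 times in total. The matching database IDs are 1, 2 and 4, and the query took [time] => 0.000 seconds. Each match carries only the document ID, its weight and the attribute values (group_id, date_added); the title and content text is not returned.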

Querying the database directly for the same IDs gives:

mysql> select *, unix_timestamp(date_added) as ut from documents where id in (1,2,4);
+----+----------+-----------+---------------------+-----------------+---------------------------------------------------------------------------+------------+
| id | group_id | group_id2 | date_added          | title           | content                                                                   | ut         |
+----+----------+-----------+---------------------+-----------------+---------------------------------------------------------------------------+------------+
|  1 |        1 |         5 | 2012-06-15 15:58:41 | test one        | this is my test document number one. also checking search within phrases. | 1339747121 |
|  2 |        1 |         6 | 2012-06-15 15:58:41 | test two        | this is my test document number two                                       | 1339747121 |
|  4 |        2 |         8 | 2012-06-15 15:58:41 | doc number four | this is to test groups                                                    | 1339747121 |
+----+----------+-----------+---------------------+-----------------+---------------------------------------------------------------------------+------------+
3 rows in set (0.00 sec)
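The ut column matches the date_added value 1339747121 returned by the API, and the top-level [attrs] entry maps each attribute name to a numeric type code. A minimal PHP sketch for decoding both, assuming the codes follow the constants in sphinxapi.php (SPH_ATTR_INTEGER = 1, SPH_ATTR_TIMESTAMP = 2) and that the MySQL server ran at UTC+8, with Asia/Shanghai used as a stand-in zone. attrTypeName() and formatAttr() are hypothetical helpers.

```php
<?php
// Hypothetical: map a Sphinx attribute type code to a name.
// Assumed to mirror SPH_ATTR_INTEGER (1) and SPH_ATTR_TIMESTAMP (2).
function attrTypeName(int $typeCode): string
{
    $names = [1 => 'integer', 2 => 'timestamp'];
    return $names[$typeCode] ?? 'unknown';
}

// Hypothetical: render an attribute value according to its type code.
function formatAttr(int $typeCode, int $value, DateTimeZone $tz): string
{
    if (attrTypeName($typeCode) === 'timestamp') {
        // '@<seconds>' builds a UTC DateTime; convert it to the server's zone.
        return (new DateTime('@' . $value))->setTimezone($tz)->format('Y-m-d H:i:s');
    }
    return (string) $value;
}

$attrTypes  = ['group_id' => 1, 'date_added' => 2];          // top-level [attrs]
$matchAttrs = ['group_id' => 1, 'date_added' => 1339747121]; // [attrs] of match id=1
$tz = new DateTimeZone('Asia/Shanghai');                      // assumed UTC+8

foreach ($matchAttrs as $name => $value) {
    $code = $attrTypes[$name];
    echo "$name (" . attrTypeName($code) . ") = " . formatAttr($code, $value, $tz) . "\n";
}
```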

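So for large amounts of data, queries can be made faster by letting Sphinx find the matching IDs first and then fetching the rows from the database by ID: Sphinx answers the full-text part from its index, and the database only has to do primary-key lookups.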
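A sketch of that two-step lookup in PHP. buildRowQuery() is a hypothetical helper: it takes the array returned by Query() with SetArrayResult(true), collects the IDs and builds a parameterized MySQL statement. ORDER BY FIELD(id, ...) is used here as one way to keep Sphinx's ranking order, since IN (...) alone returns rows in no guaranteed order.

```php
<?php
// Hypothetical helper: turn a Sphinx result (SetArrayResult(true)) into a
// parameterized MySQL query that fetches the full rows in Sphinx's order.
function buildRowQuery(array $result): ?array
{
    if (empty($result['matches'])) {
        return null; // nothing matched, so skip the database round trip
    }
    $ids = array_map('intval', array_column($result['matches'], 'id'));
    $placeholders = implode(',', array_fill(0, count($ids), '?'));
    $sql = "SELECT * FROM documents WHERE id IN ($placeholders) "
         . "ORDER BY FIELD(id, $placeholders)"; // MySQL FIELD() preserves the ID order
    return [$sql, array_merge($ids, $ids)];    // IDs bound twice: IN (...) and FIELD(...)
}

// A trimmed copy of the API result from section 5.
$result = ['matches' => [
    ['id' => 1, 'weight' => 2],
    ['id' => 2, 'weight' => 2],
    ['id' => 4, 'weight' => 1],
]];

[$sql, $params] = buildRowQuery($result);
echo $sql, "\n";
print_r($params);
```

The statement and parameters can then be handed to PDO (prepare($sql), then execute($params)); connection setup is left out of the sketch.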

Please credit the source when reposting: https://www.ttlsa.com/html/1346.html

Published on 15/06/2012 17:44:51. Please keep this link when reposting: https://www.ttlsa.com/sphinx/sphinx-search-engines/