mongodb full text search(fts:全文搜素)是在版本2.4新加的特性。在以前的版本,是通过精确匹配和正则表达式来查询,这效率是很低的。全文索引,能够从大量的文本中搜索出所需的内容,内置多国语言和分词方法。不支持宇宙第一语言---中文。
全文索引会导致mongodb写入性能下降,因为所有字符串都要拆分,存储到不同地方。
全文索引是一种技术,并大量的使用。如搜索引擎,站内搜索等等。常使用到的全文搜索有:
lucene
sphinx
redis-search
riak search
不过,对中文的搜索不尽如人意。文章源自运维生存时间-https://www.ttlsa.com/mongodb/mongodb-full-text-search-mongodb-ttlsa-tutorial-series/
mongodb用的是开源的snowball分词器,参见http://snowball.tartarus.org/texts/stemmersoverview.html
mongodb 全文索引支持的语言有:
danish
dutch
english
finnish
french
german
hungarian
italian
norwegian
portuguese
romanian
russian
spanish
swedish
turkish文章源自运维生存时间-https://www.ttlsa.com/mongodb/mongodb-full-text-search-mongodb-ttlsa-tutorial-series/
如果希望使用其他语言,需要在创建索引时指定要使用的语言。默认是支持英文的。文章源自运维生存时间-https://www.ttlsa.com/mongodb/mongodb-full-text-search-mongodb-ttlsa-tutorial-series/
> db.de.ensureIndex( {txt: "text"}, {default_language: "german"} )
1. 启用全文索引文章源自运维生存时间-https://www.ttlsa.com/mongodb/mongodb-full-text-search-mongodb-ttlsa-tutorial-series/
> db.adminCommand( { setParameter : 1, textSearchEnabled : true } ) { "was" : false, "ok" : 1 }
默认是关闭的。否则会报错"err" : "text search not enabled"文章源自运维生存时间-https://www.ttlsa.com/mongodb/mongodb-full-text-search-mongodb-ttlsa-tutorial-series/
2. 插入测试数据
下面是插入一些《加州旅馆》这首二十世纪非常著名的流行音乐。文章源自运维生存时间-https://www.ttlsa.com/mongodb/mongodb-full-text-search-mongodb-ttlsa-tutorial-series/
> db.ttlsa_com.insert({"song":"1. Hotel California", "lyrics": "On a dark desert highway, cool wind in my hair. Warm smell of colitas, rising up through the air."}) > db.ttlsa_com.insert({"song":"2. Hotel California", "lyrics": "Up ahead in the distance, I saw a shimmering light. My head grew heavy and my sight grew dim."}) > db.ttlsa_com.insert({"song":"3. Hotel California", "lyrics": "Such a lovely place, Such a lovely face."}) > db.ttlsa_com.insert({"song":"4. Hotel California", "lyrics": "Some dance to remember, some dance to forget."}) > db.ttlsa_com.insert({"song":"5. Hotel California", "lyrics": "Welcome to the Hotel California"}) > db.ttlsa_com.insert({"song":"加州旅馆", "lyrics": "Welcome to the Hotel California"})
3. 创建全文索引文章源自运维生存时间-https://www.ttlsa.com/mongodb/mongodb-full-text-search-mongodb-ttlsa-tutorial-series/
> db.ttlsa_com.ensureIndex({"song":"text", "lyrics":"text"}) 或者 > db.ttlsa_com.ensureIndex({"$**": "text"}) $**表示在所有的字符串字段上创建一个全文索引。 也可以指定权重 > db.ttlsa_com.ensureIndex({"song":"text"},{"weights":{"song": 2, "$**": 3}})
4. 查看索引信息文章源自运维生存时间-https://www.ttlsa.com/mongodb/mongodb-full-text-search-mongodb-ttlsa-tutorial-series/
> db.system.indexes.find().toArray() [ { "v" : 1, "key" : { "_id" : 1 }, "ns" : "testindex.ttlsa_com", "name" : "_id_" }, { "v" : 1, "key" : { "_fts" : "text", "_ftsx" : 1 }, "ns" : "testindex.ttlsa_com", "name" : "song_text_lyrics_text", "weights" : { "lyrics" : 1, "song" : 1 }, "default_language" : "english", "language_override" : "language", "textIndexVersion" : 1 } ]
5. 查询
全文索引是通过文章源自运维生存时间-https://www.ttlsa.com/mongodb/mongodb-full-text-search-mongodb-ttlsa-tutorial-series/
db.collection.runCommand( "text", { search: <string>, filter: <document>, project: <document>, limit: <number>, language: <string> } )
来查询的,非通过find()命令来实现的。文章源自运维生存时间-https://www.ttlsa.com/mongodb/mongodb-full-text-search-mongodb-ttlsa-tutorial-series/
> db.ttlsa_com.runCommand("text",{search:"Welcome"}) { "queryDebugString" : "welcom||||||", "language" : "english", "results" : [ { "score" : 0.6666666666666666, "obj" : { "_id" : ObjectId("5203366a1e234f712039f7ef"), "song" : "5. Hotel California", "lyrics" : "Welcome to the Hotel California" } }, { "score" : 0.6666666666666666, "obj" : { "_id" : ObjectId("5203366b1e234f712039f7f0"), "song" : "加州旅馆", "lyrics" : "Welcome to the Hotel California" } } ], "stats" : { "nscanned" : 2, "nscannedObjects" : 0, "n" : 2, "nfound" : 2, "timeMicros" : 86 }, "ok" : 1 }
取反:在要搜索的单词前面加上“-”,排除包含该单词的记录。文章源自运维生存时间-https://www.ttlsa.com/mongodb/mongodb-full-text-search-mongodb-ttlsa-tutorial-series/
> db.ttlsa_com.runCommand("text",{search:"hotel -California"}) { "queryDebugString" : "hotel||california||||", "language" : "english", "results" : [ ], "stats" : { "nscanned" : 6, "nscannedObjects" : 0, "n" : 0, "nfound" : 0, "timeMicros" : 172 }, "ok" : 1 }
分词效果不是很理想。如文章源自运维生存时间-https://www.ttlsa.com/mongodb/mongodb-full-text-search-mongodb-ttlsa-tutorial-series/
> db.ttlsa_com.runCommand("text",{search:"the"}) { "queryDebugString" : "||||||", "language" : "english", "results" : [ ], "stats" : { "nscanned" : 0, "nscannedObjects" : 0, "n" : 0, "nfound" : 0, "timeMicros" : 35 }, "ok" : 1 }
就搜索不到的。文章源自运维生存时间-https://www.ttlsa.com/mongodb/mongodb-full-text-search-mongodb-ttlsa-tutorial-series/
上面说到了,mongodb是不支持中文搜索的。
要用到复杂的搜索功能,还是用sphinx。sphinx内容可以参见:https://www.ttlsa.com/?s=sphinx文章源自运维生存时间-https://www.ttlsa.com/mongodb/mongodb-full-text-search-mongodb-ttlsa-tutorial-series/
如需转载请注明出处:mongodb 全文搜索https://www.ttlsa.com/html/2230.html文章源自运维生存时间-https://www.ttlsa.com/mongodb/mongodb-full-text-search-mongodb-ttlsa-tutorial-series/ 文章源自运维生存时间-https://www.ttlsa.com/mongodb/mongodb-full-text-search-mongodb-ttlsa-tutorial-series/
来自外部的引用