Install Java and Elasticsearch first; the steps are covered in earlier articles.
1. Install Maven
    # wget -c http://mirror.cc.columbia.edu/pub/software/apache/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz
    # tar zxvf apache-maven-3.3.3-bin.tar.gz -C /usr/local/
    # cd /usr/local/
    # ln -s apache-maven-3.3.3 maven
    # vi /etc/profile
    export M2_HOME=/usr/local/maven
    export PATH=${M2_HOME}/bin:${PATH}
    # . /etc/profile
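A quick sanity check after sourcing /etc/profile can catch a mismatch between M2_HOME and PATH before the build step. The sketch below is only an illustration: the helper names `path_contains` and `check_maven_env` are made up for this article and are not part of Maven.

```shell
# path_contains DIR: succeed if DIR is one of the colon-separated PATH entries.
path_contains() {
  case ":${PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# check_maven_env (hypothetical helper): report whether M2_HOME is set
# and whether its bin/ directory is on PATH.
check_maven_env() {
  if [ -z "${M2_HOME:-}" ]; then
    echo "M2_HOME is not set"
    return 1
  fi
  if ! path_contains "${M2_HOME}/bin"; then
    echo "${M2_HOME}/bin is not on PATH"
    return 1
  fi
  echo "Maven environment looks consistent: ${M2_HOME}"
}
```

Typical use would be `check_maven_env && mvn -version`, so that Maven is only invoked once the environment looks right.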
2. Install elasticsearch-analysis-ik
    # wget -c https://github.com/medcl/elasticsearch-analysis-ik/archive/master.zip
    # unzip master.zip
    # cd elasticsearch-analysis-ik-master/
    # cp -Rf config/* /usr/local/elasticsearch/config/
    # mvn package
    # /usr/local/elasticsearch/bin/plugin --install analysis-ik --url file:///alidata/download/elasticsearch-analysis-ik-master/target/releases/elasticsearch-analysis-ik-1.4.1.zip
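The zip name passed to --url contains a version number (1.4.1 here), and a build of master may produce a different one. As a sketch, the file:// URL can be derived from whatever `mvn package` actually produced instead of being typed by hand. The function name `release_zip_url` is hypothetical; the target/releases layout is taken from the command above.

```shell
# release_zip_url BUILD_DIR (hypothetical helper): print a file:// URL for the
# plugin zip found under BUILD_DIR/target/releases, or fail if there is none.
release_zip_url() {
  for zip in "$1"/target/releases/elasticsearch-analysis-ik-*.zip; do
    if [ -f "$zip" ]; then
      # Resolve to an absolute path so the URL works from any directory.
      dir=$(cd "$(dirname "$zip")" && pwd)
      printf 'file://%s/%s\n' "$dir" "$(basename "$zip")"
      return 0
    fi
  done
  # An unmatched glob leaves the literal pattern in $zip, so we end up here.
  echo "no plugin zip found under $1/target/releases" >&2
  return 1
}
```

From inside elasticsearch-analysis-ik-master/ the install line would then read `/usr/local/elasticsearch/bin/plugin --install analysis-ik --url "$(release_zip_url .)"`. If several zips exist, the first match is used.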
3. Edit elasticsearch.yml
Append the following to the end of the file:
    index:
      analysis:
        analyzer:
          ik:
            alias: [ik_analyzer]
            type: org.elasticsearch.index.analysis.IkAnalyzerProvider
          ik_max_word:
            type: ik
            use_smart: false
          ik_smart:
            type: ik
            use_smart: true
    index.analysis.analyzer.default.type: ik
4. Restart Elasticsearch
    # /usr/local/elasticsearch/bin/elasticsearch -d
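With -d the command returns right away while the node is still starting, so curl calls issued immediately afterwards can fail with a connection error. A small retry loop is one way to wait for the HTTP port; `wait_for` below is a generic helper written for this article, not an Elasticsearch tool, and the retry count and delay are arbitrary.

```shell
# wait_for RETRIES DELAY CMD...: run CMD until it succeeds, at most RETRIES
# times, sleeping DELAY seconds after each failed attempt.
wait_for() {
  retries=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$retries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_for 30 1 curl -sf http://localhost:9200/ && echo "Elasticsearch is up"` polls once per second for up to about 30 seconds.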
5. Test IK tokenization
Create an index named index:
    # curl -XPUT http://localhost:9200/index
Create a mapping for the index:
    # curl -XPOST http://localhost:9200/index/fulltext/_mapping -d'
    {
        "fulltext": {
            "_all": {
                "analyzer": "ik"
            },
            "properties": {
                "content": {
                    "type" : "string",
                    "boost" : 8.0,
                    "term_vector" : "with_positions_offsets",
                    "analyzer" : "ik",
                    "include_in_all" : true
                }
            }
        }
    }'
Run a test:
    # curl 'http://localhost:9200/index/_analyze?analyzer=ik&pretty=true' -d '
    {
    "text":"www.ttlsa.com 教程太好了受益匪浅,望ttlsa越来越好"
    }'
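The result is shown below. The first token, "text", is the JSON key itself: judging by the offsets, this Elasticsearch version analyzes the whole request body as raw text.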
    {
      "tokens" : [
        { "token" : "text", "start_offset" : 4, "end_offset" : 8, "type" : "ENGLISH", "position" : 1 },
        { "token" : "www.ttlsa.com", "start_offset" : 11, "end_offset" : 24, "type" : "LETTER", "position" : 2 },
        { "token" : "www", "start_offset" : 11, "end_offset" : 14, "type" : "ENGLISH", "position" : 3 },
        { "token" : "ttlsa", "start_offset" : 15, "end_offset" : 20, "type" : "ENGLISH", "position" : 4 },
        { "token" : "com", "start_offset" : 21, "end_offset" : 24, "type" : "ENGLISH", "position" : 5 },
        { "token" : "教程", "start_offset" : 25, "end_offset" : 27, "type" : "CN_WORD", "position" : 6 },
        { "token" : "太好了", "start_offset" : 27, "end_offset" : 30, "type" : "CN_WORD", "position" : 7 },
        { "token" : "太好", "start_offset" : 27, "end_offset" : 29, "type" : "CN_WORD", "position" : 8 },
        { "token" : "好了", "start_offset" : 28, "end_offset" : 30, "type" : "CN_WORD", "position" : 9 },
        { "token" : "受益匪浅", "start_offset" : 30, "end_offset" : 34, "type" : "CN_WORD", "position" : 10 },
        { "token" : "受益", "start_offset" : 30, "end_offset" : 32, "type" : "CN_WORD", "position" : 11 },
        { "token" : "匪", "start_offset" : 32, "end_offset" : 33, "type" : "CN_CHAR", "position" : 12 },
        { "token" : "浅", "start_offset" : 33, "end_offset" : 34, "type" : "CN_CHAR", "position" : 13 },
        { "token" : "望", "start_offset" : 35, "end_offset" : 36, "type" : "CN_CHAR", "position" : 14 },
        { "token" : "ttlsa", "start_offset" : 36, "end_offset" : 41, "type" : "ENGLISH", "position" : 15 },
        { "token" : "越来越好", "start_offset" : 41, "end_offset" : 45, "type" : "CN_WORD", "position" : 16 },
        { "token" : "越来越", "start_offset" : 41, "end_offset" : 44, "type" : "CN_WORD", "position" : 17 },
        { "token" : "越来", "start_offset" : 41, "end_offset" : 43, "type" : "CN_WORD", "position" : 18 },
        { "token" : "越好", "start_offset" : 43, "end_offset" : 45, "type" : "CN_WORD", "position" : 19 }
      ]
    }
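When comparing analyzers it is often easier to look at the bare token list than at the full JSON. The sketch below pulls the "token" values out of an _analyze response with grep and sed. `extract_tokens` is a made-up helper name, the pattern assumes token text contains no escaped double quotes, and the sample is just the first three entries of the response above with the other fields dropped.

```shell
# extract_tokens (hypothetical helper): read an _analyze response on stdin
# and print one token per line. Assumes tokens contain no escaped quotes.
extract_tokens() {
  grep -o '"token" *: *"[^"]*"' | sed 's/^"token" *: *"//; s/"$//'
}

# Sample input: the first three tokens of the response, other fields omitted.
sample='{ "tokens" : [ { "token" : "text" }, { "token" : "www.ttlsa.com" }, { "token" : "www" } ] }'
printf '%s\n' "$sample" | extract_tokens
```

This prints text, www.ttlsa.com and www on separate lines. Against a running node, the same filter could be fed from curl, for example `curl -s 'http://localhost:9200/index/_analyze?analyzer=ik_smart' -d '教程太好了' | extract_tokens`, to see how the ik_smart analyzer configured in step 3 splits the same text.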

13/01/2016 5:37 PM (first comment)
How do I turn tokenization off? I don't want the text to be tokenized.