Installing the IK Chinese analyzer for Elasticsearch

By sunny, filed under Big Data

Java and Elasticsearch must be installed first; see the earlier articles for those steps.

1. Install Maven

# wget -c http://mirror.cc.columbia.edu/pub/software/apache/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz
# tar zxvf apache-maven-3.3.3-bin.tar.gz -C /usr/local/
# cd /usr/local/
# ln -s apache-maven-3.3.3 maven
# vi /etc/profile
export M2_HOME=/usr/local/maven
export PATH=${M2_HOME}/bin:${PATH}
# . /etc/profile
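
As a quick sanity check after editing /etc/profile, a small shell function can confirm that M2_HOME really contains an executable mvn. This is only a sketch: the name check_maven_home is made up for illustration and is not part of Maven.

```shell
# Hypothetical helper (not part of Maven): verify that a Maven home
# directory contains an executable bin/mvn.
# Usage: check_maven_home [maven-home]   (defaults to $M2_HOME)
check_maven_home() {
    maven_home="${1:-${M2_HOME:-}}"
    if [ -x "$maven_home/bin/mvn" ]; then
        echo "ok: $maven_home/bin/mvn"
    else
        echo "missing: $maven_home/bin/mvn" >&2
        return 1
    fi
}
```

Calling check_maven_home with no argument checks the current $M2_HOME. If it reports ok, running mvn -version should print the installed Maven version.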

2. Install elasticsearch-analysis-ik

# wget -c https://github.com/medcl/elasticsearch-analysis-ik/archive/master.zip
# unzip master.zip 
# cd elasticsearch-analysis-ik-master/
# cp -Rf config/ik /usr/local/elasticsearch/config/
# mvn package
# /usr/local/elasticsearch/bin/plugin --install analysis-ik --url file:///alidata/download/elasticsearch-analysis-ik-master/target/releases/elasticsearch-analysis-ik-1.4.1.zip
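
The zip name produced by mvn package depends on the plugin version in the checked-out source, so the hard-coded 1.4.1 path above may not match your build. A minimal sketch that looks up whatever zip was built and prints it as a file:// URL; ik_plugin_url is a hypothetical helper name, and the target/releases location is taken from the command above.

```shell
# Hypothetical helper: print a file:// URL for the plugin zip built by
# `mvn package`. Assumes the zip lands in <source-dir>/target/releases/.
# Usage: ik_plugin_url <source-dir>
ik_plugin_url() {
    src_dir="$1"
    for zip in "$src_dir"/target/releases/elasticsearch-analysis-ik-*.zip; do
        if [ -f "$zip" ]; then
            # Resolve to an absolute path, since a file:// URL needs one.
            abs_dir=$(cd "$(dirname "$zip")" && pwd)
            echo "file://$abs_dir/$(basename "$zip")"
            return 0
        fi
    done
    echo "no plugin zip found under $src_dir/target/releases" >&2
    return 1
}
```

From inside the source directory, the output could then be passed to the installer as --url "$(ik_plugin_url .)". If several zips are present, the first match is used.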

3. Edit elasticsearch.yml

Append the following to the end of the file:

index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
        type: ik
        use_smart: false
      ik_smart:
        type: ik
        use_smart: true
index.analysis.analyzer.default.type: ik

4. Restart Elasticsearch

# /usr/local/elasticsearch/bin/elasticsearch -d
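
Because -d starts the node in the background, the HTTP port may not be ready the moment the command returns. A small polling loop can wait for it before the test requests are sent. This is a sketch only: wait_for_es is a hypothetical name, and the default URL and retry count are assumptions to adjust for your node.

```shell
# Hypothetical helper: poll Elasticsearch until it answers HTTP requests.
# Usage: wait_for_es [url] [max-tries]
wait_for_es() {
    url="${1:-http://localhost:9200}"
    tries="${2:-30}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -s silences progress output, -f makes curl fail on HTTP errors.
        if curl -sf "$url" >/dev/null 2>&1; then
            echo "elasticsearch is up at $url"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "elasticsearch did not respond at $url" >&2
    return 1
}
```

The function waits one second between attempts and returns a non-zero status if the node never answers, so it can be chained as wait_for_es && curl ... in a script.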

5. Test the IK analyzer

Create an index named index:

# curl -XPUT http://localhost:9200/index

Create a mapping for the index:

# curl -XPOST http://localhost:9200/index/fulltext/_mapping -d'
{
    "fulltext": {
        "_all": {
            "analyzer": "ik"
        },
        "properties": {
            "content": {
                "type" : "string",
                "boost" : 8.0,
                "term_vector" : "with_positions_offsets",
                "analyzer" : "ik",
                "include_in_all" : true
            }
        }
    }
}'

Run a test analysis:

# curl 'http://localhost:9200/index/_analyze?analyzer=ik&pretty=true' -d '
{
"text":"www.ttlsa.com 教程太好了受益匪浅,望ttlsa越来越好"
}'

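The result is shown below. Note that the first token, text, comes from the JSON key in the request: the whole request body is analyzed as raw text here, which is also why the offsets start at 4 rather than 0.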

{
  "tokens" : [ {
    "token" : "text",
    "start_offset" : 4,
    "end_offset" : 8,
    "type" : "ENGLISH",
    "position" : 1
  }, {
    "token" : "www.ttlsa.com",
    "start_offset" : 11,
    "end_offset" : 24,
    "type" : "LETTER",
    "position" : 2
  }, {
    "token" : "www",
    "start_offset" : 11,
    "end_offset" : 14,
    "type" : "ENGLISH",
    "position" : 3
  }, {
    "token" : "ttlsa",
    "start_offset" : 15,
    "end_offset" : 20,
    "type" : "ENGLISH",
    "position" : 4
  }, {
    "token" : "com",
    "start_offset" : 21,
    "end_offset" : 24,
    "type" : "ENGLISH",
    "position" : 5
  }, {
    "token" : "教程",
    "start_offset" : 25,
    "end_offset" : 27,
    "type" : "CN_WORD",
    "position" : 6
  }, {
    "token" : "太好了",
    "start_offset" : 27,
    "end_offset" : 30,
    "type" : "CN_WORD",
    "position" : 7
  }, {
    "token" : "太好",
    "start_offset" : 27,
    "end_offset" : 29,
    "type" : "CN_WORD",
    "position" : 8
  }, {
    "token" : "好了",
    "start_offset" : 28,
    "end_offset" : 30,
    "type" : "CN_WORD",
    "position" : 9
  }, {
    "token" : "受益匪浅",
    "start_offset" : 30,
    "end_offset" : 34,
    "type" : "CN_WORD",
    "position" : 10
  }, {
    "token" : "受益",
    "start_offset" : 30,
    "end_offset" : 32,
    "type" : "CN_WORD",
    "position" : 11
  }, {
    "token" : "匪",
    "start_offset" : 32,
    "end_offset" : 33,
    "type" : "CN_CHAR",
    "position" : 12
  }, {
    "token" : "浅",
    "start_offset" : 33,
    "end_offset" : 34,
    "type" : "CN_CHAR",
    "position" : 13
  }, {
    "token" : "望",
    "start_offset" : 35,
    "end_offset" : 36,
    "type" : "CN_CHAR",
    "position" : 14
  }, {
    "token" : "ttlsa",
    "start_offset" : 36,
    "end_offset" : 41,
    "type" : "ENGLISH",
    "position" : 15
  }, {
    "token" : "越来越好",
    "start_offset" : 41,
    "end_offset" : 45,
    "type" : "CN_WORD",
    "position" : 16
  }, {
    "token" : "越来越",
    "start_offset" : 41,
    "end_offset" : 44,
    "type" : "CN_WORD",
    "position" : 17
  }, {
    "token" : "越来",
    "start_offset" : 41,
    "end_offset" : 43,
    "type" : "CN_WORD",
    "position" : 18
  }, {
    "token" : "越好",
    "start_offset" : 43,
    "end_offset" : 45,
    "type" : "CN_WORD",
    "position" : 19
  } ]
}
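
To skim a long response like the one above, the token and type fields can be pulled into two columns. A minimal sketch, assuming the pretty-printed layout shown above (one field per line) and tokens that contain no double quotes. The name list_tokens is hypothetical, and this is not a general JSON parser.

```shell
# Hypothetical helper: read a pretty-printed _analyze response on stdin
# and print one "token<TAB>type" line per token.
list_tokens() {
    # Split each line on double quotes; the value is then the 4th field.
    awk -F'"' '
        /"token" :/ { tok = $4 }
        /"type" :/  { print tok "\t" $4 }
    '
}
```

It can be appended to the test command with a pipe, for example curl '...' -d '...' | list_tokens.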
Posted by sunny on 23/10/2015 01:05:24
When reposting, please keep the link to this article: https://www.ttlsa.com/bigdata/elasticsearch-analysis-ik-chinese/
Comments (1)
    叮里个咚: How do I turn off tokenization? I don't want the text to be tokenized.