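This sample targets Elasticsearch 6.x.
In 7.x the mapping type was dropped from the API, but a type still exists internally with the default name _doc.
Sample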
PUT paper
{
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 2
  },
  "mappings": {
    "pap": {
      "properties": {
        "linkCount": {
          "fielddata": true,
          "store": true,
          "type": "text"
        },
        "pubDate": {
          "copy_to": "fullcontent",
          "store": true,
          "type": "text"
        },
        "publish_date": {
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis",
          "ignore_malformed": true,
          "type": "date"
        },
        "source": {
          "copy_to": "fullcontent",
          "fielddata": true,
          "store": true,
          "type": "text"
        },
        "summary": {
          "copy_to": "fullcontent",
          "store": true,
          "type": "text",
          "similarity": "BM25",
          "index_options": "offsets",
          "analyzer": "ik_max_word"
        },
        "title": {
          "copy_to": "fullcontent",
          "store": true,
          "type": "text",
          "similarity": "BM25",
          "index_options": "offsets",
          "analyzer": "ik_max_word"
        },
        "url": {
          "type": "text"
        },
        "viewCount": {
          "fielddata": true,
          "store": true,
          "type": "text"
        },
        "year": {
          "fielddata": true,
          "store": true,
          "type": "text"
        },
        "columnName": {
          "copy_to": "fullcontent",
          "store": true,
          "type": "text",
          "similarity": "BM25",
          "index_options": "offsets"
        },
        "doi": {
          "copy_to": "fullcontent",
          "store": true,
          "type": "text"
        },
        "downloadCount": {
          "fielddata": true,
          "store": true,
          "type": "text"
        },
        "enTitle": {
          "copy_to": "fullcontent",
          "store": true,
          "type": "text",
          "similarity": "BM25",
          "analyzer": "ik_max_word"
        },
        "id": {
          "store": true,
          "type": "keyword"
        },
        "journal": {
          "copy_to": "fullcontent",
          "store": true,
          "type": "text",
          "similarity": "BM25",
          "index_options": "offsets",
          "analyzer": "ik_max_word"
        },
        "keyWords": {
          "copy_to": "fullcontent",
          "store": true,
          "type": "keyword"
        },
        "authors": {
          "properties": {
            "author": {
              "copy_to": "fullcontent",
              "type": "keyword"
            },
            "doctorId": {
              "store": true,
              "type": "keyword"
            },
            "hospitalName": {
              "copy_to": "fullcontent",
              "store": true,
              "type": "text",
              "similarity": "BM25",
              "index_options": "offsets",
              "analyzer": "ik_max_word"
            },
            "doctor_hcoid": {
              "store": true,
              "type": "keyword"
            },
            "doctor_hcpid": {
              "store": true,
              "type": "keyword"
            },
            "institution": {
              "copy_to": "fullcontent",
              "store": true,
              "type": "text",
              "similarity": "BM25",
              "index_options": "offsets",
              "analyzer": "ik_max_word"
            },
            "url": {
              "type": "text"
            }
          }
        },
        "fullcontent": {
          "copy_to": "fullcontent",
          "store": true,
          "type": "text",
          "similarity": "BM25",
          "index_options": "offsets",
          "analyzer": "ik_max_word"
        }
      }
    }
  }
}
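To make the 6.x versus 7.x note concrete, here is a minimal Python sketch that strips the type level (pap in the sample above) from a create-index body, which is the shape a 7.x typeless request expects. The helper name to_typeless is made up for this illustration, and it assumes the body holds at most one mapping type.

```python
import json


def to_typeless(body: dict) -> dict:
    """Return a copy of a 6.x create-index body without the mapping type level.

    Hypothetical helper for illustration; assumes at most one mapping type.
    """
    mappings = body.get("mappings", {})
    # An empty or already typeless mapping needs no conversion.
    if not mappings or "properties" in mappings:
        return dict(body)
    if len(mappings) != 1:
        raise ValueError("expected exactly one mapping type")
    # Unpack the single (type name, type mapping) pair, e.g. ("pap", {...}).
    ((_type_name, type_mapping),) = mappings.items()
    converted = dict(body)
    converted["mappings"] = type_mapping
    return converted


body_6x = {
    "settings": {"number_of_replicas": 0, "number_of_shards": 2},
    "mappings": {
        "pap": {"properties": {"id": {"store": True, "type": "keyword"}}}
    },
}
body_7x = to_typeless(body_6x)
print(json.dumps(body_7x, indent=2))
```

The field definitions themselves are unchanged; only the wrapper named after the type goes away.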
settings:
"number_of_replicas": 0, // number of replica copies kept for each primary shard
"number_of_shards": 2 // number of primary shards
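As a quick sanity check on these two settings, the total number of shard copies the cluster has to hold is the primary count times one plus the replica count. A small sketch (the function name is only illustrative):

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Primaries plus all replica copies for one index."""
    return number_of_shards * (1 + number_of_replicas)


# The sample index: 2 primaries, no replicas.
print(total_shard_copies(2, 0))
# The same index with one replica per primary.
print(total_shard_copies(2, 1))
```

With 0 replicas there is no redundancy, which is fine for a single-node test setup but means losing a node loses data.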
type:
Overview of field types
Category        Subcategory       Types
Core types      String            text, keyword
                Integer           integer, long, short, byte
                Floating point    double, float, half_float, scaled_float
                Boolean           boolean
                Date              date
                Range             range
                Binary            binary
Complex types   Array             array
                Object            object
                Nested            nested
Geo types       Geo point         geo_point
                Geo shape         geo_shape
Special types   IP                ip
                Completion        completion
                Token count       token_count
                Attachment        attachment
                Percolator        percolator
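type:text
Example:
"summary": {
  "type": "text",
  "analyzer": "ik_max_word"
}
This mapping declares summary as an analyzed string field. The ik_max_word analyzer segments text at the finest granularity and emits every word it can find, including overlapping ones. For the coarsest segmentation, with no overlapping tokens, the mapping should be:
"summary": {
  "type": "text",
  "analyzer": "ik_smart"
}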
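The difference between fine-grained and coarse-grained segmentation is easier to see on a toy example. The sketch below is not the IK plugin's algorithm; it is a simplified stand-in with a tiny hand-written dictionary (an assumption for this illustration) that emits every dictionary word for the fine mode and uses greedy longest-match for the coarse mode.

```python
# Tiny hypothetical dictionary; the real IK plugin ships a far larger one.
DICTIONARY = {
    "中华人民共和国", "中华人民", "中华", "华人",
    "人民共和国", "人民", "共和国", "共和", "国歌",
}


def fine_grained(text, dictionary):
    """Emit every dictionary word found at any position (overlaps allowed)."""
    tokens = []
    for start in range(len(text)):
        for end in range(len(text), start, -1):
            if text[start:end] in dictionary:
                tokens.append(text[start:end])
    return tokens


def coarse_grained(text, dictionary):
    """Greedy forward longest match: one token per position, no overlaps."""
    tokens = []
    pos = 0
    while pos < len(text):
        for end in range(len(text), pos, -1):
            if text[pos:end] in dictionary:
                tokens.append(text[pos:end])
                pos = end
                break
        else:
            # No dictionary word starts here; fall back to a single character.
            tokens.append(text[pos])
            pos += 1
    return tokens


text = "中华人民共和国国歌"
print(fine_grained(text, DICTIONARY))
print(coarse_grained(text, DICTIONARY))
```

Fine-grained output gives more terms to match against at search time, at the cost of a larger index; coarse-grained output is smaller and closer to how a reader would split the phrase.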
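type:keyword
Example:
"cnName": {
  "store": true,
  "type": "keyword"
}
This mapping declares cnName as a string field of type keyword. The value is not analyzed; it is indexed whole as a single term, which suits exact-match lookups and supports group-by.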
Fields that need group-by: "fielddata": true
"linkCount": {
  "fielddata": true,
  "store": true,
  "type": "text"
}
When a field needs group-by and its type is text, fielddata must be set to true.
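Group-by in Elasticsearch is expressed as a terms aggregation. Below is a minimal sketch of such a request body, built as a Python dict that would be sent to the index's _search endpoint. The helper name group_by_body and the aggregation name by_value are hypothetical.

```python
import json


def group_by_body(field: str, top_n: int = 10) -> dict:
    """Build a terms aggregation that counts documents per distinct value."""
    return {
        "size": 0,  # skip the hits, return only the buckets
        "aggs": {
            "by_value": {"terms": {"field": field, "size": top_n}}
        },
    }


print(json.dumps(group_by_body("linkCount"), indent=2))
```

Note that fielddata on a text field is built in heap memory, so when a value never needs full-text search, a keyword field is usually the cheaper way to get the same buckets.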
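"store": true
Elasticsearch stores a separate copy of the field's value in addition to _source, so the field can be fetched on its own without loading the whole source document.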
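Stored fields are requested explicitly with the stored_fields search parameter. A minimal sketch of a search body that does so, assuming the fullcontent, title and summary fields from the sample mapping; the helper name is made up for this illustration.

```python
import json


def stored_fields_search(query_text: str, fields: list) -> dict:
    """Match against fullcontent and return only the listed stored fields."""
    return {
        "query": {"match": {"fullcontent": query_text}},
        "stored_fields": list(fields),
    }


print(json.dumps(stored_fields_search("firefox", ["title", "summary"]), indent=2))
```

Only fields mapped with "store": true can be returned this way; the rest still have to be read from _source.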
Date format
"publish_date": {
  "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis",
  "ignore_malformed": true,
  "type": "date"
}
ignore_malformed: true or false, default false. Set it to true to skip values that match none of the formats instead of rejecting the whole document.
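The format string is a list of alternatives separated by ||, tried in order. The Python sketch below imitates that behaviour for the three formats above. It is only an approximation (Python's strptime is more lenient about zero padding than Elasticsearch), and the function name is hypothetical.

```python
from datetime import datetime, timezone


def parse_publish_date(value, ignore_malformed=True):
    """Try 'yyyy-MM-dd HH:mm:ss', then 'yyyy-MM-dd', then epoch_millis."""
    text = str(value)
    for pattern in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            # Dates without a time zone are treated as UTC.
            return datetime.strptime(text, pattern).replace(tzinfo=timezone.utc)
        except ValueError:
            pass
    try:
        return datetime.fromtimestamp(int(text) / 1000, tz=timezone.utc)
    except (ValueError, OverflowError, OSError):
        if ignore_malformed:
            return None  # the field is skipped, the document is kept
        raise ValueError(f"failed to parse date field [{value}]")


print(parse_publish_date("2020-01-15 08:30:00"))
print(parse_publish_date("2020-01-15"))
print(parse_publish_date(1579077000000))
print(parse_publish_date("not a date"))
```

With ignore_malformed set to false, the last call raises instead of returning None, which mirrors Elasticsearch rejecting the document.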
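Similarity model in Elasticsearch: "similarity": "BM25"
Example:
"summary": {
  "copy_to": "fullcontent",
  "store": true,
  "type": "text",
  "similarity": "BM25",
  "index_options": "offsets",
  "analyzer": "ik_max_word"
}
"similarity": "BM25" in this mapping keeps a search term that occurs very often in the field from dominating the score. BM25 has been the default similarity since Elasticsearch 5.0, so setting it here only makes the choice explicit.
For example, suppose we search for fire fox and get back two articles, doc1 and doc2, where doc1 scores 15 and doc2 scores 10. doc1 may well be a very long article about fires, while doc2 is a tutorial on using the Firefox browser. We clearly prefer the latter, and this is where the similarity model in the mapping comes in.
For the theory behind BM25, recommended reading: https://www.elastic.co/guide/cn/elasticsearch/guide/current/pluggable-similarites.html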
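To see why a long article about fires stops winning, here is a sketch of the term-frequency part of the classic BM25 formula, with the commonly used defaults k1 = 1.2 and b = 0.75. The document lengths and frequencies are invented for illustration, and the IDF factor is left out because it is the same for both documents.

```python
def bm25_tf(freq, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Term-frequency component of BM25 for one term in one field."""
    # Longer-than-average fields get a larger norm, which lowers the score.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return freq * (k1 + 1) / (freq + norm)


AVG_LEN = 500  # hypothetical average field length in terms

long_doc = bm25_tf(freq=5, doc_len=5000, avg_doc_len=AVG_LEN)   # long article
short_doc = bm25_tf(freq=2, doc_len=50, avg_doc_len=AVG_LEN)    # short tutorial
print(long_doc, short_doc)

# Saturation: however often the term repeats, the value stays below k1 + 1.
print(bm25_tf(freq=1000, doc_len=AVG_LEN, avg_doc_len=AVG_LEN))
```

The short document scores higher despite fewer occurrences, because its matches make up a larger share of a much shorter field, and repeated occurrences in the long document add less and less.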
Compiled by 极客之音. Link to this article: https://www.bmabk.com/index.php/post/94924.html