Creating an Elasticsearch mapping: field and index options explained

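This example targets Elasticsearch 6.x.
In 7.x, mapping types were removed from the APIs, but a type still exists internally with the default name _doc.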

Example

PUT paper
{
	"settings": {
		"number_of_replicas": 0,
		"number_of_shards": 2
	},
	"mappings": {
		"pap": {
			"properties": {
				"linkCount": {
					"fielddata": true,
					"store": true,
					"type": "text"
				},
				"pubDate": {
					"copy_to": "fullcontent",
					"store": true,
					"type": "text"
				},
				"publish_date": {
					"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis",
					"ignore_malformed": true,
					"type": "date"
				},
				"source": {
					"copy_to": "fullcontent",
					"fielddata": true,
					"store": true,
					"type": "text"
				},
				"summary": {
					"copy_to": "fullcontent",
					"store": true,
					"type": "text",
					"similarity": "BM25",
					"index_options": "offsets",
					"analyzer": "ik_max_word"
				},
				"title": {
					"copy_to": "fullcontent",
					"store": true,
					"type": "text",
					"similarity": "BM25",
					"index_options": "offsets",
					"analyzer": "ik_max_word"
				},
				"url": {
					"type": "text"
				},
				"viewCount": {
					"fielddata": true,
					"store": true,
					"type": "text"
				},
				"year": {
					"fielddata": true,
					"store": true,
					"type": "text"
				},
				"columnName": {
					"copy_to": "fullcontent",
					"store": true,
					"type": "text",
					"similarity": "BM25",
					"index_options": "offsets"
				},
				"doi": {
					"copy_to": "fullcontent",
					"store": true,
					"type": "text"
				},
				"downloadCount": {
					"fielddata": true,
					"store": true,
					"type": "text"
				},
				"enTitle": {
					"copy_to": "fullcontent",
					"store": true,
					"type": "text",
					"similarity": "BM25",
					"analyzer": "ik_max_word"
				},
				"id": {
					"store": true,
					"type": "keyword"
				},
				"journal": {
					"copy_to": "fullcontent",
					"store": true,
					"type": "text",
					"similarity": "BM25",
					"index_options": "offsets",
					"analyzer": "ik_max_word"
				},
				"keyWords": {
					"copy_to": "fullcontent",
					"store": true,
					"type": "keyword"
				},
				"authors": {
					"properties": {
						"author": {
							"copy_to": "fullcontent",
							"type": "keyword"
						},
						"doctorId": {
							"store": true,
							"type": "keyword"
						},
						"hospitalName": {
							"copy_to": "fullcontent",
							"store": true,
							"type": "text",
							"similarity": "BM25",
							"index_options": "offsets",
							"analyzer": "ik_max_word"
						},
						"doctor_hcoid": {
							"store": true,
							"type": "keyword"
						},
						"doctor_hcpid": {
							"store": true,
							"type": "keyword"
						},
						"institution": {
							"copy_to": "fullcontent",
							"store": true,
							"type": "text",
							"similarity": "BM25",
							"index_options": "offsets",
							"analyzer": "ik_max_word"
						},
						"url": {
							"type": "text"
						}
					}
				},
				"fullcontent": {
					"store": true,
					"type": "text",
					"similarity": "BM25",
					"index_options": "offsets",
					"analyzer": "ik_max_word"
				}
			}
		}
	}
}
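
As a rough illustration of the version note at the top, the sketch below builds the same create-index body in the 6.x layout (properties nested under a type name) and in the 7.x typeless layout. The helper name create_index_body and the two-field props dictionary are hypothetical and exist only for this illustration; the body would still have to be sent with PUT paper by whatever client you use.

```python
import json


def create_index_body(properties, type_name=None, shards=2, replicas=0):
    """Build a create-index request body.

    type_name given   -> 6.x layout: mappings.<type_name>.properties
    type_name is None -> 7.x layout: mappings.properties (typeless)
    """
    mapping = {"properties": properties}
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        "mappings": {type_name: mapping} if type_name else mapping,
    }


# A trimmed-down version of the field list, for illustration only.
props = {
    "id": {"type": "keyword"},
    "title": {"type": "text", "analyzer": "ik_max_word"},
}

body_6x = create_index_body(props, type_name="pap")  # 6.x: type "pap"
body_7x = create_index_body(props)                   # 7.x: no type level

print(json.dumps(body_7x, indent=2))
```

The only structural difference is the extra "pap" level under mappings; settings and the field definitions stay the same.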

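settings:

"number_of_replicas": 0,		// number of replica copies of each primary shard (can be changed later)
"number_of_shards": 2		// number of primary shards (fixed when the index is created)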

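type:

Overview of field types
Category	Subcategory	Types
Core types	String	text, keyword
Core types	Integer	integer, long, short, byte
Core types	Floating point	double, float, half_float, scaled_float
Core types	Boolean	boolean
Core types	Date	date
Core types	Range	integer_range, long_range, float_range, double_range, date_range
Core types	Binary	binary
Complex types	Array	no dedicated type; any field can hold several values of the same type
Complex types	Object	object
Complex types	Nested	nested
Geo types	Geo point	geo_point
Geo types	Geo shape	geo_shape
Specialized types	IP	ip
Specialized types	Completion (suggester)	completion
Specialized types	Token count	token_count
Specialized types	Attachment	attachment (provided by the mapper-attachments plugin in older versions, not a built-in type)
Specialized types	Percolator	percolator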

type:text

Example:

"summary": {
	"type": "text",
	"analyzer": "ik_max_word"
}

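This mapping declares summary as an analyzed string field using the ik_max_word analyzer, which segments the text at the finest granularity and emits every word it can find, so recall is high. For coarser segmentation that yields fewer, longer terms, use ik_smart instead: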

"summary": {
	"type": "text",
	"analyzer": "ik_smart"
}
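
One way to see the difference between the two analyzers is to run the same text through the _analyze API with each of them. The sketch below only builds the request bodies; the helper name analyze_request and the sample sentence are assumptions made for this illustration, and the IK plugin must be installed on the cluster for the requests to succeed.

```python
import json


def analyze_request(text, analyzer):
    """Body for POST /_analyze: tokenize `text` with the named analyzer."""
    return {"analyzer": analyzer, "text": text}


# Hypothetical sample sentence; any Chinese text works.
sample = "中华人民共和国国歌"

for name in ("ik_max_word", "ik_smart"):
    print("POST /_analyze")
    print(json.dumps(analyze_request(sample, name), ensure_ascii=False))
```

Comparing the token lists returned by the two requests shows how much finer ik_max_word splits the same input.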

type:keyword

Example:

"cnName": {
	"store": true,
	"type": "keyword"
},

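This mapping declares cnName as a keyword string: the whole value is indexed as a single term without analysis, which suits exact-match lookups, sorting, and aggregations (group by).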
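
In Elasticsearch, "group by" on a keyword field is expressed as a terms aggregation. The following is a minimal sketch of such a search body; the helper name terms_agg_body and the aggregation name by_value are hypothetical, and the body is meant to be sent to the index's _search endpoint.

```python
import json


def terms_agg_body(field, size=10, name="by_value"):
    """Search body that groups documents by the exact values of `field`.

    "size": 0 at the top level suppresses the hits, so only the buckets
    (value plus document count) come back.
    """
    return {
        "size": 0,
        "aggs": {name: {"terms": {"field": field, "size": size}}},
    }


body = terms_agg_body("cnName", size=5)
print(json.dumps(body, indent=2))
```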

Fields that need group by: "fielddata": true

"linkCount": {
	"fielddata": true,
	"store": true,
	"type": "text"
}

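When a field is of type text and must support aggregations (group by), fielddata has to be set to true. Fielddata is held on the JVM heap, and the buckets are built from the analyzed terms rather than the original values, so a keyword field is usually the better fit for grouping.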
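
A common alternative to enabling fielddata is a multi-field: the field stays text for full-text search and gains a keyword sub-field for grouping. The sketch below assumes a sub-field named raw (a hypothetical name chosen for this illustration) and uses the source field from the sample mapping.

```python
def text_with_keyword(analyzer=None, subfield="raw"):
    """Mapping for a text field that also indexes an unanalyzed keyword copy."""
    mapping = {
        "type": "text",
        "fields": {subfield: {"type": "keyword"}},
    }
    if analyzer:
        mapping["analyzer"] = analyzer
    return mapping


# Full-text queries target "source"; aggregations target "source.raw".
source_mapping = text_with_keyword(analyzer="ik_max_word")
group_by_source = {
    "size": 0,
    "aggs": {"by_source": {"terms": {"field": "source.raw"}}},
}
print(source_mapping)
print(group_by_source)
```

With this layout the aggregation buckets are the original values instead of individual analyzed terms, and no fielddata has to be loaded onto the heap.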

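"store": true

Elasticsearch stores the field's value separately from _source, so it can be returned through stored_fields without loading and parsing the whole _source document. The default is false, because the original value is already available in _source.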
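
To read stored values back, a search can list them under stored_fields and switch _source off. This is a minimal sketch; the helper name stored_fields_search is hypothetical, and it assumes the queried fields were mapped with "store": true as in the sample mapping.

```python
import json


def stored_fields_search(query_text, fields, search_field="fullcontent"):
    """Search body that returns only the listed stored fields, not _source."""
    return {
        "_source": False,               # do not load the _source document
        "stored_fields": list(fields),  # fields mapped with "store": true
        "query": {"match": {search_field: query_text}},
    }


body = stored_fields_search("firefox", ["title", "summary"])
print(json.dumps(body, indent=2))
```

This mainly pays off when _source is large and only a few small fields are needed in the result list.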

Date format

"publish_date": {
	"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis",
	"ignore_malformed": true,
	"type": "date"
}

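ignore_malformed: true or false; the default is false, in which case a malformed value causes the whole document to be rejected. Set it to true to skip the malformed value (the field is not indexed for that document) while the rest of the document is still indexed.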
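
The behaviour of the format chain and of ignore_malformed can be imitated in a few lines of plain Python. This is only a sketch of the idea, not Elasticsearch's own parser: the function name parse_publish_date is hypothetical, Python's strptime is more lenient than the Elasticsearch patterns, and all values are treated as UTC.

```python
from datetime import datetime, timezone

# Python equivalents of "yyyy-MM-dd HH:mm:ss" and "yyyy-MM-dd".
FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_publish_date(value, ignore_malformed=False):
    """Try each format in order, then epoch_millis.

    Returns None for a malformed value when ignore_malformed is True,
    otherwise raises ValueError (the document would be rejected).
    """
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(str(value), fmt)
            return parsed.replace(tzinfo=timezone.utc)
        except ValueError:
            pass
    try:
        # epoch_millis: milliseconds since 1970-01-01T00:00:00Z
        return datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc)
    except (ValueError, TypeError, OverflowError, OSError):
        if ignore_malformed:
            return None
        raise ValueError(f"malformed date: {value!r}")


print(parse_publish_date("2020-01-01 12:30:00"))
print(parse_publish_date("not a date", ignore_malformed=True))
```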

Similarity model in Elasticsearch: "similarity": "BM25"

Example:

"summary": {
	"copy_to": "fullcontent",
	"store": true,
	"type": "text",
	"similarity": "BM25",
	"index_options": "offsets",
	"analyzer": "ik_max_word"
},

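In this mapping, "similarity": "BM25" keeps a very high frequency of the search term in the field from dominating the score: BM25 saturates the contribution of term frequency and normalizes it by field length. BM25 has been the default similarity since Elasticsearch 5.0, so setting it explicitly mainly documents the choice.
For example, suppose a search for fire fox returns two documents, doc1 with a score of 15 and doc2 with a score of 10. doc1 may be a very long article about fires, while doc2 is a tutorial on using the Firefox browser. We clearly prefer the latter, and this is where the similarity model in the mapping matters.
For the theory behind BM25, see https://www.elastic.co/guide/cn/elasticsearch/guide/current/pluggable-similarites.html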
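
The effect on the fire fox example can be sketched with the classic BM25 formula for a single term. The code below is an illustration under stated assumptions, not Lucene's implementation: the function name bm25_term_score and all document statistics are made up for this example, and k1 = 1.2, b = 0.75 are the commonly used defaults.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq,
                    k1=1.2, b=0.75):
    """Classic BM25 score of one term in one document.

    tf          : occurrences of the term in the field
    doc_len     : length of the field in terms
    avg_doc_len : average field length across the index
    n_docs      : number of documents in the index
    doc_freq    : number of documents containing the term
    """
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Long fields enlarge this term, which lowers the score.
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)


# Hypothetical statistics: a long article about fires, a short tutorial.
common = dict(avg_doc_len=500, n_docs=10_000, doc_freq=100)
long_article = bm25_term_score(tf=20, doc_len=5000, **common)
short_tutorial = bm25_term_score(tf=3, doc_len=200, **common)

print(short_tutorial > long_article)
```

Even though the long article mentions the term far more often, length normalization and term-frequency saturation let the short, focused document score higher under these assumed numbers.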

Compiled by 极客之音. Link to this article: https://www.bmabk.com/index.php/post/94924.html