Creating an Elasticsearch mapping: field index settings explained


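The examples below target Elasticsearch 6.x.
In 7.x, mapping types were removed from index definitions: the mapping is written without a type name, and the only type that remains is the default, _doc.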
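To make the 6.x/7.x difference concrete, here is a minimal Python sketch, assuming the index-creation body is held as a plain dict. The helper strip_mapping_type is hypothetical (not part of any client library): it lifts the field definitions out from under a single type name (such as pap in the example below) into the typeless form that 7.x expects, and it only handles a body with exactly one type.

```python
def strip_mapping_type(body: dict) -> dict:
    """Hypothetical helper: turn a 6.x index-creation body, whose field
    definitions sit under a single type name, into the 7.x typeless form."""
    mappings = body.get("mappings", {})
    if "properties" in mappings:
        # Already typeless: "properties" sits directly under "mappings".
        return body
    if len(mappings) != 1:
        raise ValueError("expected exactly one mapping type")
    ((_type_name, type_mapping),) = mappings.items()
    new_body = dict(body)  # shallow copy; "settings" is shared, not copied
    new_body["mappings"] = type_mapping
    return new_body


body_6x = {
    "settings": {"number_of_replicas": 0, "number_of_shards": 2},
    "mappings": {"pap": {"properties": {"id": {"type": "keyword", "store": True}}}},
}
body_7x = strip_mapping_type(body_6x)
print(body_7x["mappings"])  # {'properties': {'id': {'type': 'keyword', 'store': True}}}
```

The converted body would then be sent with the same PUT paper request; only the extra nesting level disappears.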

Example

PUT paper
{
	"settings": {
		"number_of_replicas": 0,
		"number_of_shards": 2
	},
	"mappings": {
		"pap": {
			"properties": {
				"linkCount": {
					"fielddata": true,
					"store": true,
					"type": "text"
				},
				"pubDate": {
					"copy_to": "fullcontent",
					"store": true,
					"type": "text"
				},
				"publish_date": {
					"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis",
					"ignore_malformed": true,
					"type": "date"
				},
				"source": {
					"copy_to": "fullcontent",
					"fielddata": true,
					"store": true,
					"type": "text"
				},
				"summary": {
					"copy_to": "fullcontent",
					"store": true,
					"type": "text",
					"similarity": "BM25",
					"index_options": "offsets",
					"analyzer": "ik_max_word"
				},
				"title": {
					"copy_to": "fullcontent",
					"store": true,
					"type": "text",
					"similarity": "BM25",
					"index_options": "offsets",
					"analyzer": "ik_max_word"
				},
				"url": {
					"type": "text"
				},
				"viewCount": {
					"fielddata": true,
					"store": true,
					"type": "text"
				},
				"year": {
					"fielddata": true,
					"store": true,
					"type": "text"
				},
				"columnName": {
					"copy_to": "fullcontent",
					"store": true,
					"type": "text",
					"similarity": "BM25",
					"index_options": "offsets"
				},
				"doi": {
					"copy_to": "fullcontent",
					"store": true,
					"type": "text"
				},
				"downloadCount": {
					"fielddata": true,
					"store": true,
					"type": "text"
				},
				"enTitle": {
					"copy_to": "fullcontent",
					"store": true,
					"type": "text",
					"similarity": "BM25",
					"analyzer": "ik_max_word"
				},
				"id": {
					"store": true,
					"type": "keyword"
				},
				"journal": {
					"copy_to": "fullcontent",
					"store": true,
					"type": "text",
					"similarity": "BM25",
					"index_options": "offsets",
					"analyzer": "ik_max_word"
				},
				"keyWords": {
					"copy_to": "fullcontent",
					"store": true,
					"type": "keyword"
				},
				"authors": {
					"properties": {
						"author": {
							"copy_to": "fullcontent",
							"type": "keyword"
						},
						"doctorId": {
							"store": true,
							"type": "keyword"
						},
						"hospitalName": {
							"copy_to": "fullcontent",
							"store": true,
							"type": "text",
							"similarity": "BM25",
							"index_options": "offsets",
							"analyzer": "ik_max_word"
						},
						"doctor_hcoid": {
							"store": true,
							"type": "keyword"
						},
						"doctor_hcpid": {
							"store": true,
							"type": "keyword"
						},
						"institution": {
							"copy_to": "fullcontent",
							"store": true,
							"type": "text",
							"similarity": "BM25",
							"index_options": "offsets",
							"analyzer": "ik_max_word"
						},
						"url": {
							"type": "text"
						}
					}
				},
				"fullcontent": {
					"store": true,
					"type": "text",
					"similarity": "BM25",
					"index_options": "offsets",
					"analyzer": "ik_max_word"
				}
			}
		}
	}
}
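Many fields above use copy_to to gather their values into fullcontent. A mistake in a copy_to target is easy to miss in a mapping this long, so here is a small lint sketch in Python, assuming the "properties" object is loaded as a dict. The function names are hypothetical. It walks object fields such as authors recursively and flags a field that copies into itself, or a target that is not explicitly mapped (Elasticsearch would then create that target through dynamic mapping, possibly with a type you did not intend).

```python
def collect_fields(properties: dict, prefix: str = ""):
    """Yield (full_path, field_spec) for every field, including sub-fields
    of object fields such as "authors.author"."""
    for name, spec in properties.items():
        path = prefix + name
        yield path, spec
        if "properties" in spec:  # object field: recurse into its sub-fields
            yield from collect_fields(spec["properties"], path + ".")


def find_copy_to_problems(properties: dict) -> list:
    """Hypothetical lint: report self-copies and unmapped copy_to targets."""
    fields = dict(collect_fields(properties))
    problems = []
    for path, spec in fields.items():
        targets = spec.get("copy_to", [])
        if isinstance(targets, str):  # copy_to accepts a string or a list
            targets = [targets]
        for target in targets:
            if target == path:
                problems.append(f"{path}: copies into itself")
            elif target not in fields:
                problems.append(f"{path}: target {target!r} is not explicitly mapped")
    return problems


props = {
    "title": {"type": "text", "copy_to": "fullcontent"},
    "authors": {"properties": {"author": {"type": "keyword", "copy_to": "fullcontent"}}},
    "fullcontent": {"type": "text", "copy_to": "fullcontent"},
}
print(find_copy_to_problems(props))  # ['fullcontent: copies into itself']
```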

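settings:

"number_of_replicas": 0,		// number of replica copies of each primary shard
"number_of_shards": 2		// number of primary shards (fixed when the index is created)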
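The two settings combine multiplicatively. A short Python sketch of the arithmetic (the function names are hypothetical): each primary shard gets number_of_replicas extra copies, and because Elasticsearch never allocates two copies of the same shard to one node, assigning every copy needs at least 1 + number_of_replicas nodes.

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    # Every primary shard has number_of_replicas additional replica copies.
    return number_of_shards * (1 + number_of_replicas)


def min_nodes_for_green(number_of_replicas: int) -> int:
    # Copies of the same shard must live on different nodes.
    return 1 + number_of_replicas


print(total_shard_copies(2, 0), min_nodes_for_green(0))  # 2 1
print(total_shard_copies(2, 1), min_nodes_for_green(1))  # 4 2
```

With the settings above (2 shards, 0 replicas) the index has 2 shard copies in total and can be fully allocated on a single node, at the cost of having no redundancy.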

type:

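Field type overview
Category	Subcategory	Types
Core	String	text, keyword
Core	Integer	integer, long, short, byte
Core	Floating point	double, float, half_float, scaled_float
Core	Boolean	boolean
Core	Date	date
Core	Range	integer_range, float_range, long_range, double_range, date_range, ip_range
Core	Binary	binary
Complex	Array	array (no dedicated type: any field can hold multiple values)
Complex	Object	object
Complex	Nested	nested
Geo	Geo point	geo_point
Geo	Geo shape	geo_shape
Specialized	IP	ip
Specialized	Completion (suggester)	completion
Specialized	Token count	token_count
Specialized	Attachment	attachment (provided by a plugin, not built in)
Specialized	Percolator	percolator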

type:text

Example:

"summary": {
	"type": "text",
	"analyzer": "ik_max_word"
}

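This mapping makes summary a full-text (text) field analyzed with ik_max_word. ik_max_word segments text at the finest granularity: it emits every word combination it finds in its dictionary, so tokens may overlap. For coarser segmentation with fewer, non-overlapping tokens, use ik_smart instead. (Both analyzers come from the IK Analysis plugin, elasticsearch-analysis-ik, which must be installed on every node.) The mapping then becomes: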

"summary": {
	"type": "text",
	"analyzer": "ik_smart"
}
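The difference between the two analyzers can be illustrated with a toy Python sketch. This is not IK's actual algorithm or dictionary: it assumes a tiny hand-made dictionary, models "finest-grained" as emitting every dictionary word at every position, and models "smart" as forward maximum matching (longest dictionary word first, no overlaps, unknown characters as single tokens). Real IK output differs in details, for example in how it handles characters not covered by longer words.

```python
# Toy dictionary for illustration only; IK ships a far larger one.
DICTIONARY = {"中华人民共和国", "中华人民", "中华", "华人", "人民共和国",
              "人民", "共和国", "共和", "国歌"}


def max_word_segment(text: str, dictionary=DICTIONARY) -> list:
    """Finest-grained: every dictionary word at every start position,
    longest first; tokens may overlap."""
    tokens = []
    for start in range(len(text)):
        for end in range(len(text), start, -1):
            if text[start:end] in dictionary:
                tokens.append(text[start:end])
    return tokens


def smart_segment(text: str, dictionary=DICTIONARY) -> list:
    """Coarsest-grained: forward maximum matching without overlaps;
    characters not covered by the dictionary become single-character tokens."""
    tokens, start = [], 0
    while start < len(text):
        for end in range(len(text), start, -1):
            if text[start:end] in dictionary or end == start + 1:
                tokens.append(text[start:end])
                start = end
                break
    return tokens


print(smart_segment("中华人民共和国国歌"))     # ['中华人民共和国', '国歌']
print(max_word_segment("中华人民共和国国歌"))  # 9 overlapping tokens
```

More tokens (ik_max_word) means more queries can match a document, at the cost of a larger index; fewer tokens (ik_smart) gives more precise matches.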

type:keyword

Example:

"cnName": {
	"store": true,
	"type": "keyword"
},

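This mapping makes cnName a string field of type keyword. A keyword value is not analyzed: the whole value is indexed as a single term, so it matches only exactly. This makes it suitable for exact-match lookups, and it supports aggregations (group by) and sorting.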
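A rough Python sketch of why keyword suits exact matching. The analyzer here is a stand-in assumption (lowercase, split on non-word characters), not Elasticsearch's standard or IK analyzer, and the sample value is invented. A term query looks up its term without analyzing it, so against a keyword field it only matches the complete original value.

```python
import re


def analyze_text(value: str) -> list:
    # Stand-in for a text-field analyzer: lowercase, split on non-word characters.
    return [t for t in re.split(r"\W+", value.lower()) if t]


def analyze_keyword(value: str) -> list:
    # keyword fields are not analyzed: the whole value is one term.
    return [value]


def term_matches(query_term: str, indexed_terms: list) -> bool:
    # A term query compares the query term with the indexed terms as-is.
    return query_term in indexed_terms


name = "Peking Union Medical College"
print(analyze_text(name))     # ['peking', 'union', 'medical', 'college']
print(analyze_keyword(name))  # ['Peking Union Medical College']
print(term_matches("union", analyze_keyword(name)))  # False
print(term_matches(name, analyze_keyword(name)))     # True
```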

Aggregating (group by) on a text field: "fielddata": true

"linkCount": {
	"fielddata": true,
	"store": true,
	"type": "text"
}

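When a field of type text needs to be grouped (aggregated), fielddata must be set to true. Keep in mind that fielddata is built in JVM heap memory, which is why it is disabled by default, and that the aggregation buckets are the analyzed tokens rather than the original values. For counts such as linkCount, a numeric type (integer or long) or a keyword field is usually the better choice for aggregation.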
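The "buckets are tokens" caveat is easy to overlook, so here is a toy Python comparison, assuming a stand-in analyzer and invented journal names. A terms aggregation counts documents per term: on a keyword field the terms are whole values, while on a text field with fielddata they are the individual tokens, each counted at most once per document.

```python
import re
from collections import Counter


def analyze(value: str) -> list:
    # Stand-in analyzer: lowercase, split on non-word characters.
    return [t for t in re.split(r"\W+", value.lower()) if t]


journals = ["Chinese Medical Journal", "Chinese Journal of Surgery",
            "Chinese Medical Journal"]

# keyword field: one bucket per exact value
keyword_buckets = Counter(journals)

# text field + fielddata: one bucket per token; dict.fromkeys dedupes tokens
# within a document so each document counts once per term
text_buckets = Counter(t for j in journals for t in dict.fromkeys(analyze(j)))

print(keyword_buckets["Chinese Medical Journal"])      # 2
print(text_buckets["chinese"], text_buckets["medical"])  # 3 2
```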

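"store": true

By default, Elasticsearch already keeps the entire original document in _source. Setting "store": true additionally saves this field's value as a separate stored field, so it can be retrieved on its own (through stored_fields) without loading and parsing the whole _source.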
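As a sketch of how stored fields are read back, here is a Python helper (hypothetical name) that builds a search request body using the stored_fields and _source search parameters, so that only the requested stored fields are returned instead of the full _source. The query text is an invented example; the body would be sent to GET paper/_search with whatever client you use.

```python
def stored_fields_query(fields: list, query: dict) -> dict:
    """Hypothetical helper: search body returning only the listed stored
    fields (fields mapped with "store": true) and skipping _source."""
    return {
        "query": query,
        "stored_fields": list(fields),
        "_source": False,
    }


body = stored_fields_query(["title", "summary"], {"match": {"title": "elasticsearch"}})
print(body)
```

In each hit, the values then appear under "fields" (as arrays) rather than under "_source".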

Date format

"publish_date": {
	"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis",
	"ignore_malformed": true,
	"type": "date"
}

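format lists several patterns separated by ||; they are tried in order until one matches. ignore_malformed: true or false, default false. With false, a value that matches none of the formats causes the whole document to be rejected; with true, the malformed value is ignored (not indexed) and the rest of the document is still indexed.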
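A Python sketch of these semantics for the publish_date mapping, assuming all values are UTC and translating the Java-style patterns by hand (yyyy-MM-dd HH:mm:ss to %Y-%m-%d %H:%M:%S, yyyy-MM-dd to %Y-%m-%d). It is not Elasticsearch's parser; it only mirrors the "try each format in order, then ignore or reject" behavior and returns epoch milliseconds, the form in which Elasticsearch represents dates internally.

```python
from datetime import datetime, timezone

# Hand-translated equivalents of the first two patterns in the mapping.
PATTERNS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d"]


def parse_publish_date(value, ignore_malformed: bool = True):
    """Try "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis" in order.
    Returns epoch milliseconds (UTC), or None for a malformed value when
    ignore_malformed is True."""
    text = str(value)
    for pattern in PATTERNS:
        try:
            dt = datetime.strptime(text, pattern).replace(tzinfo=timezone.utc)
            return int(dt.timestamp() * 1000)
        except ValueError:
            pass  # fall through to the next format
    if text.lstrip("-").isdigit():  # epoch_millis
        return int(text)
    if ignore_malformed:
        return None  # field skipped; the rest of the document is still indexed
    raise ValueError(f"failed to parse date field [{text}]")


print(parse_publish_date("2020-01-02 00:00:00"))  # 1577923200000
print(parse_publish_date("2020-01-02"))           # 1577923200000
print(parse_publish_date("Jan 2, 2020"))          # None
```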

Similarity model in Elasticsearch: "similarity": "BM25"

Example:

"summary": {
	"copy_to": "fullcontent",
	"store": true,
	"type": "text",
	"similarity": "BM25",
	"index_options": "offsets",
	"analyzer": "ik_max_word"
},

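"similarity": "BM25" in this mapping selects the BM25 scoring model for the field. Compared with classic TF/IDF, BM25 saturates term frequency, so a term repeated many times stops adding much to the score, and it normalizes for field length, so a match in a short field counts for more than the same match in a long one. Together these keep a search term that occurs very often in a field from dominating the score.
For example, suppose we search for fire fox and two documents come back: doc1 with a score of 15 and doc2 with a score of 10. doc1 may be a long article about a fire, while doc2 is a tutorial on using the Firefox browser. We clearly expect the latter to rank higher, and this is the situation a similarity model is meant to address. Note that BM25 has been the default similarity since Elasticsearch 5.0, so in 6.x this setting only makes the default explicit.
For the theory behind BM25, see: https://www.elastic.co/guide/cn/elasticsearch/guide/current/pluggable-similarites.html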
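The two effects can be checked numerically with a Python sketch of the textbook BM25 formula, using Elasticsearch's default parameters k1 = 1.2 and b = 0.75 and the idf form used by Lucene. Lucene's internal implementation may differ from this in constant factors that do not change the ranking. All document counts and lengths below are invented for illustration.

```python
import math


def bm25_idf(doc_count: int, docs_with_term: int) -> float:
    # idf as in Lucene's BM25: ln(1 + (N - n + 0.5) / (n + 0.5))
    return math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))


def bm25_term_score(tf, doc_len, avg_doc_len, idf, k1=1.2, b=0.75):
    # b scales length normalization: longer-than-average fields get a larger denominator.
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # k1 controls saturation: the score approaches idf * (k1 + 1) as tf grows.
    return idf * tf * (k1 + 1) / (tf + length_norm)


idf = bm25_idf(doc_count=1000, docs_with_term=10)
# A long field mentioning the term 20 times vs. a short field mentioning it 3 times.
long_doc = bm25_term_score(tf=20, doc_len=2000, avg_doc_len=500, idf=idf)
short_doc = bm25_term_score(tf=3, doc_len=100, avg_doc_len=500, idf=idf)
print(long_doc < short_doc)  # True
```

Despite nearly seven times as many occurrences, the long field scores slightly lower, because saturation caps the benefit of repetition and length normalization penalizes the longer field.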


Compiled by 极客之音 (bmabk.com). Original link: https://www.bmabk.com/index.php/post/94924.html

Posted by: 小半
