Elasticsearch download link:
https://www.elastic.co/cn/downloads/elasticsearch
Elasticsearch features:
- Elasticsearch is a distributed document store.
- Instead of storing information as rows of columnar data, Elasticsearch stores complex data structures that have been serialized as JSON documents.
- When a document is stored, it is indexed and fully searchable in near real time, within 1 second.
- An index can be thought of as an optimized collection of documents, and each document is a collection of fields.
- Text fields are stored in inverted indices, and numeric and geo fields are stored in BKD trees.
- When a cluster has multiple Elasticsearch nodes, stored documents are distributed across the cluster and can be accessed immediately from any node.
How does Elasticsearch make full-text search so fast?
The most important points:
- Elasticsearch uses a data structure called an inverted index, which supports very fast full-text searches.
- An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in.
- The fields that make up each document in an index are the key-value pairs that contain your data.
- By default, Elasticsearch indexes all data in every field, and each indexed field has a dedicated, optimized data structure.
For example:
Data type        Data structure
text/keyword     inverted index
numeric/geo      BKD tree
Relational databases compared with Elasticsearch:
Relational DB    Elasticsearch
database         index (indices)
table            type (removed in 8.0)
row              document
column           field
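The inverted index described above can be illustrated with a small Python sketch. This is a toy model, not how Elasticsearch or Lucene implement it: the tokenizer is a plain lowercase whitespace split, and the sample documents and the names `build_inverted_index` and `search` are assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each unique lowercase word to the sorted IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "quick brown fox",
    2: "lazy brown dog",
    3: "quick dog",
}
inverted = build_inverted_index(docs)


def search(term):
    """Look up one term: a single dictionary access instead of a scan over every document."""
    return inverted.get(term.lower(), [])
```

The point of the structure is that a term lookup cost does not grow with the number of documents scanned; the work is done once, at indexing time.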
What's new in Elasticsearch 7.14:
- Cross-cluster EQL search
- Async SQL search
- Transforms: support for top metrics
- Anomaly detection: reset job API
- New match_only_text field type
- More memory-efficient composite aggregations
- New migrate to data tiers routing API
- New terms enum API
- Automatic database updates for the GeoIP processor
Removed features
Starting in version 7.14, Beats central management has been removed. If you’re currently using Beats central management, we recommend that you start using Fleet instead. For more information, refer to the Fleet documentation.
Elasticsearch operations
Add data
- Add a single document
Note: documents sent to a data stream must have an @timestamp field.
POST logs-my_app-default/_doc
{
  "@timestamp": "2021-10-20T14:13:10.000Z",
  "event": {
    "original": "192.0.2.42 - - [10/Sep/2021:14:13 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736"
  }
}
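The same request can be prepared from Python. The sketch below only builds the HTTP request with the standard library and does not send it; the base URL `http://localhost:9200` and the helper name `build_index_request` are assumptions, and a real deployment would likely also need authentication.

```python
import json
import urllib.request


def build_index_request(base_url, data_stream, doc):
    """Build (but do not send) a POST <data_stream>/_doc request."""
    if "@timestamp" not in doc:
        raise ValueError("documents sent to a data stream need an @timestamp field")
    return urllib.request.Request(
        url=f"{base_url}/{data_stream}/_doc",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


doc = {
    "@timestamp": "2021-10-20T14:13:10.000Z",
    "event": {
        # json.dumps escapes the inner double quotes of the log line.
        "original": '192.0.2.42 - - [10/Sep/2021:14:13 +0000] "GET /images/bg.jpg HTTP/1.0" 200 24736'
    },
}
# Assumed local, unsecured cluster address.
request = build_index_request("http://localhost:9200", "logs-my_app-default", doc)
# urllib.request.urlopen(request) would send it to a running cluster.
```

Serializing with `json.dumps` avoids the hand-escaping of quotes that the raw console request needs.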
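- Add multiple documents
Use the _bulk API to add multiple documents in a single request.
The request body is newline-delimited JSON: each line must end in a newline character (\n), including the last line. Each document is preceded by a {"create":{}} action line.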
PUT logs-my_app-default/_bulk
{"create":{}}
{ "@timestamp": "2021-09-20T14:24:10.000Z", "event": { "original": "192.0.2.242 - - [07/May/2020:16:24:32 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0" } }
{"create":{}}
{ "@timestamp": "2099-05-08T16:25:42.000Z", "event": { "original": "192.0.2.255 - - [08/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" } }
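The newline rule for _bulk bodies is easy to get wrong when the body is assembled by hand. The following Python sketch shows one way to build such a body; the helper name `to_bulk_body` is an assumption, and the function only produces the string without sending anything.

```python
import json


def to_bulk_body(docs):
    """Serialize docs as NDJSON with a {"create": {}} action line before each document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"create": {}}))
        lines.append(json.dumps(doc))
    # Every line, including the last one, must end with a newline character.
    return "".join(line + "\n" for line in lines)


docs = [
    {
        "@timestamp": "2021-09-20T14:24:10.000Z",
        "event": {"original": '192.0.2.242 - - [07/May/2020:16:24:32 -0500] "GET /images/hm_nbg.jpg HTTP/1.0" 304 0'},
    },
    {
        "@timestamp": "2099-05-08T16:25:42.000Z",
        "event": {"original": '192.0.2.255 - - [08/May/2099:16:25:42 +0000] "GET /favicon.ico HTTP/1.0" 200 3638'},
    },
]
body = to_bulk_body(docs)
```

Because `json.dumps` never emits a raw newline inside a string, each action and each document stays on exactly one line.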
Search
- Search the data (sorted by @timestamp in descending order)
GET logs-my_app-default/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "@timestamp": "desc"
    }
  ]
}
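- Retrieve specific fields
Use the fields parameter to specify which fields to return; setting _source to false omits the full source document from each hit.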
GET logs-my_app-default/_search
{
  "query": {
    "match_all": {}
  },
  "fields": [
    "@timestamp"
  ],
  "_source": false,
  "sort": [
    {
      "@timestamp": "desc"
    }
  ]
}
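When reading the result of a search that uses fields, note that each requested field comes back as a list of values, even when there is only one. The Python sketch below shows how a client might unpack that; the `response` dictionary is a hypothetical, trimmed-down body written for illustration, not output captured from a cluster, and `field_values` is an assumed helper name.

```python
def field_values(response, field):
    """Collect the first value of `field` from every hit that has it."""
    values = []
    for hit in response["hits"]["hits"]:
        found = hit.get("fields", {}).get(field, [])
        if found:
            values.append(found[0])
    return values


# Hypothetical response shape, reduced to the parts used here.
response = {
    "hits": {
        "hits": [
            {"fields": {"@timestamp": ["2099-05-08T16:25:42.000Z"]}},
            {"fields": {"@timestamp": ["2021-09-20T14:24:10.000Z"]}},
        ]
    }
}
timestamps = field_values(response, "@timestamp")
```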
- Search a date range
GET logs-my_app-default/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2021-05-05",
        "lte": "2099-05-08"
      }
    }
  },
  "fields": [
    "@timestamp"
  ],
  "_source": false,
  "sort": [
    {
      "@timestamp": "desc"
    }
  ]
}
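- Search relative to the current date, for example from the previous day
Use date math expressions such as now-1d/d and now/d, where /d rounds to the day.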
GET logs-my_app-default/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1d/d",
        "lte": "now/d"
      }
    }
  },
  "fields": [
    "@timestamp"
  ],
  "_source": false,
  "sort": [
    {
      "@timestamp": "desc"
    }
  ]
}
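To make the effect of the rounding concrete, the Python sketch below computes the bounds that the two expressions in the request above resolve to. It assumes UTC and relies on the documented rounding behavior of range queries: with gte the rounded value is the start of the unit, and with lte it is the last millisecond of the unit. The function name `day_range` and the fixed `now` value are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def day_range(now):
    """Return the (gte, lte) bounds for "now-1d/d" and "now/d", rounded in UTC."""
    start_of_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    # gte with now-1d/d: subtract one day, then round down to the start of that day.
    gte = start_of_today - timedelta(days=1)
    # lte with now/d: round up to the last millisecond of the current day.
    lte = start_of_today + timedelta(days=1) - timedelta(milliseconds=1)
    return gte, lte


# A fixed "now" so the result is reproducible.
now = datetime(2021, 10, 20, 14, 13, 10, tzinfo=timezone.utc)
gte, lte = day_range(now)
```

So the query covers all of yesterday and all of today, not just the trailing 24 hours.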
- Extract fields from unstructured content
You can extract runtime fields from unstructured content, such as log messages, at search time.
For example, the following search uses a grok pattern in a runtime field script to extract the source IP address from event.original:
GET logs-my_app-default/_search
{
  "runtime_mappings": {
    "source.ip": {
      "type": "ip",
      "script": """
        String sourceip = grok('%{IPORHOST:sourceip}.*').extract(doc["event.original"].value)?.sourceip;
        if (sourceip != null) emit(sourceip);
      """
    }
  },
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now",
        "lte": "2099-05-08"
      }
    }
  },
  "fields": [
    "@timestamp",
    "source.ip"
  ],
  "_source": false,
  "sort": [
    {
      "@timestamp": "desc"
    }
  ]
}
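What the runtime field script does can be approximated outside Elasticsearch with a regular expression. The Python sketch below is a simplified stand-in for the grok pattern, not an equivalent: it only recognizes dotted IPv4 addresses, whereas IPORHOST also accepts IPv6 addresses and host names. The names `SOURCE_IP` and `extract_source_ip` are assumptions.

```python
import re

# Simplified stand-in for %{IPORHOST:sourceip}: the first dotted IPv4 address in the line.
SOURCE_IP = re.compile(r"\b(?P<sourceip>\d{1,3}(?:\.\d{1,3}){3})\b")


def extract_source_ip(original):
    """Return the first IPv4-looking token, or None when there is no match."""
    match = SOURCE_IP.search(original)
    # Mirrors the null check in the script: emit only when something was extracted.
    return match.group("sourceip") if match else None


line = '192.0.2.255 - - [08/May/2099:16:25:42 +0000] "GET /favicon.ico HTTP/1.0" 200 3638'
source_ip = extract_source_ip(line)
```

As in the script, a line that does not match yields no value instead of raising an error.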
- Combine queries
Use a bool query to combine multiple queries.
GET logs-my_app-default/_search
{
  "runtime_mappings": {
    "source.ip": {
      "type": "ip",
      "script": """
        String sourceip = grok('%{IPORHOST:sourceip}.*').extract(doc[ "event.original" ].value)?.sourceip;
        if (sourceip != null) emit(sourceip);
      """
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "@timestamp": {
              "gte": "now",
              "lte": "2099-05-08"
            }
          }
        },
        {
          "range": {
            "source.ip": {
              "gte": "192.0.2.0",
              "lte": "192.0.2.240"
            }
          }
        }
      ]
    }
  },
  "fields": [
    "@timestamp",
    "source.ip"
  ],
  "_source": false,
  "sort": [
    {
      "@timestamp": "desc"
    }
  ]
}
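The clauses listed under filter in a bool query must all match for a document to be returned. The Python sketch below models that AND semantics on in-memory data; the sample documents, the fixed date bounds (used in place of "now"), and the names `in_range` and `bool_filter` are assumptions for illustration.

```python
from datetime import datetime, timezone
from ipaddress import ip_address


def in_range(field, gte, lte):
    """Build a predicate that checks gte <= doc[field] <= lte."""
    return lambda doc: gte <= doc[field] <= lte


def bool_filter(docs, clauses):
    """Keep only the documents that satisfy every clause."""
    return [doc for doc in docs if all(clause(doc) for clause in clauses)]


utc = timezone.utc
# Hypothetical documents with an already-extracted source.ip value.
docs = [
    {"@timestamp": datetime(2099, 5, 6, tzinfo=utc), "source.ip": ip_address("192.0.2.42")},
    {"@timestamp": datetime(2099, 5, 7, tzinfo=utc), "source.ip": ip_address("192.0.2.255")},
    {"@timestamp": datetime(2021, 9, 20, tzinfo=utc), "source.ip": ip_address("192.0.2.100")},
]
clauses = [
    in_range("@timestamp", datetime(2099, 5, 5, tzinfo=utc), datetime(2099, 5, 8, tzinfo=utc)),
    in_range("source.ip", ip_address("192.0.2.0"), ip_address("192.0.2.240")),
]
hits = bool_filter(docs, clauses)
```

Only the first document passes: the second falls outside the IP range and the third outside the date range.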
- Aggregate data
Note: the aggregation only runs on documents that match the query.
The following search uses an aggregation to compute average_response_size from the http.response.body.bytes runtime field.
GET logs-my_app-default/_search
{
  "runtime_mappings": {
    "http.response.body.bytes": {
      "type": "long",
      "script": """
        String bytes = grok('%{COMMONAPACHELOG}').extract(doc[ "event.original" ].value)?.bytes;
        if (bytes != null) emit(Integer.parseInt(bytes));
      """
    }
  },
  "aggs": {
    "average_response_size": {
      "avg": {
        "field": "http.response.body.bytes"
      }
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "@timestamp": {
              "gte": "now",
              "lte": "2099-05-09"
            }
          }
        }
      ]
    }
  },
  "fields": [
    "@timestamp",
    "http.response.body.bytes"
  ],
  "_source": false,
  "sort": [
    {
      "@timestamp": "desc"
    }
  ]
}
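The computation behind the avg aggregation can be sketched in Python. This is an approximation under stated assumptions: the regular expression is a simplified stand-in for COMMONAPACHELOG that captures only the trailing status code and byte count, the two log lines are the ones from the bulk request, and no query filter is applied, so both lines are averaged.

```python
import re

# Simplified stand-in for %{COMMONAPACHELOG}: only the trailing status and byte count.
STATUS_AND_BYTES = re.compile(r'" (?P<status>\d{3}) (?P<bytes>\d+)\s*$')


def response_bytes(original):
    """Return the response size in bytes, or None when the line does not match."""
    match = STATUS_AND_BYTES.search(original)
    return int(match.group("bytes")) if match else None


def average_response_size(originals):
    """Average the extracted sizes, skipping lines with no extracted value."""
    sizes = [size for size in map(response_bytes, originals) if size is not None]
    return sum(sizes) / len(sizes) if sizes else None


logs = [
    '192.0.2.242 - - [07/May/2020:16:24:32 -0500] "GET /images/hm_nbg.jpg HTTP/1.0" 304 0',
    '192.0.2.255 - - [08/May/2099:16:25:42 +0000] "GET /favicon.ico HTTP/1.0" 200 3638',
]
average = average_response_size(logs)
```

In the real request the range filter runs first, so the average is taken only over the documents whose @timestamp falls inside the queried range.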
- Delete the data stream
DELETE _data_stream/logs-my_app-default
More search operations
Common search options
Official Elasticsearch Java API documentation
Official site: Elasticsearch
Java API: ES Java API
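Originally published on the WeChat official account 码上遇见你 under the title "Elastic Search: the first getting-started walkthrough of the latest version".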
Compiled by 极客之音. Link to this article: https://www.bmabk.com/index.php/post/78749.html