Prometheus
-
安装prometheus-operator
wget https://github.com/prometheus-operator/prometheus-operator/releases/download/v0.64.0/bundle.yaml
kubectl create -f bundle.yaml
-
创建示例应用
apiVersion: apps/v1
kind: Deployment
metadata:
name: example-app
spec:
replicas: 3
selector:
matchLabels:
app: example-app
template:
metadata:
labels:
app: example-app
spec:
containers:
- name: example-app
image: fabxc/instrumented_app
ports:
- name: web
containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
labels:
app: example-app
name: example-app
spec:
ports:
- name: 8080-8080
port: 8080
protocol: TCP
targetPort: 8080
selector:
app: example-app
type: NodePort
-
创建Service和Pod监控对象
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: example-app
labels:
team: frontend
spec:
selector:
matchLabels:
app: example-app
endpoints:
- port: 8080-8080
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
name: example-app
labels:
team: frontend
spec:
selector:
matchLabels:
app: example-app
podMetricsEndpoints:
- port: web
-
部署Prometheus和监控示例应用的Service和Pod
如果在K8S集群上激活了RBAC授权,则必须先创建RBAC规则,并且提前获得Prometheus服务帐户。
接下来创建服务帐户和所需的集群角色和集群角色绑定:
apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: prometheus
rules:
- apiGroups: [""]
resources:
- nodes
- nodes/metrics
- services
- endpoints
- pods
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources:
- configmaps
verbs: ["get"]
- apiGroups:
- networking.k8s.io
resources:
- ingresses
verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: prometheus
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: prometheus
subjects:
- kind: ServiceAccount
name: prometheus
namespace: default
-
创建Prometheus对象,并定义选择监控指定标签的ServiceAccount和PodMonitor,最后暴露Prometheus
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
name: prometheus
spec:
serviceAccountName: prometheus
replicas: 3
serviceMonitorSelector:
matchLabels:
team: frontend
podMonitorSelector:
matchLabels:
team: frontend
resources:
requests:
memory: 400Mi
enableAdminAPI: false # 如果要开启管理API可设置为true
---
apiVersion: v1
kind: Service
metadata:
name: prometheus
spec:
type: NodePort
ports:
- name: web
nodePort: 30900
port: 9090
protocol: TCP
targetPort: web
selector:
prometheus: prometheus
Alertmanager
对于警报组件,Prometheus Operator引入了2个资源对象:
-
Alertmanager资源对象,它的作用是允许用户以声明的方式描述警报管理器群集。 -
AlertmanagerConfig资源对象,它的作用是允许用户以声明方式描述警报管理器配置。
先准备好警报管理器的配置,也就是创建AlertmanagerConfig资源对象。接着部署有3个副本的警报管理器集群,并使用该警报管理器的配置,最后暴露警报管理器
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
name: config-example
labels:
alertmanagerConfig: example
spec:
route:
groupBy: ['...']
groupWait: 1s
groupInterval: 1s
repeatInterval: 1000d
receiver: 'webhook'
receivers:
- name: 'webhook'
webhookConfigs:
- url: 'http://192.168.11.254:5001/webhook' # 接收到的警报还要往这个API发送,接收告警的API请见下面的webhook.py代码
sendResolved: true
---
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
name: example
spec:
replicas: 3
alertmanagerConfiguration: # 此处为全局配置
name: config-example
---
apiVersion: v1
kind: Service
metadata:
name: alertmanager-example
spec:
type: NodePort
ports:
- name: web
nodePort: 30903
port: 9093
protocol: TCP
targetPort: web
selector:
alertmanager: example
接收警报消息的py代码:
import json
from flask import Flask, request
app = Flask(__name__)
@app.route('/webhook', methods=['POST'])
def send():
try:
data = json.loads(request.data)
print(data)
except Exception as e:
print(e)
return 'finish ok ...'
if __name__ == '__main__':
app.run(debug=False, host='0.0.0.0', port=5001)
接收到的警报消息之所以要推给另外的接口,其目的是还可以对警报消息做进一步的处理。然后再往其它地方推送警报,比如钉钉、邮件等等。
Prometheus和Alertmanager整合
在之前创建Prometheus对象的yaml文件的基础上,拿过来改造:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
name: prometheus
spec:
serviceAccountName: prometheus
replicas: 3
alerting: # 主要是改造这里,此处与警报管理器整合
alertmanagers:
- namespace: default
name: alertmanager-example
port: web
serviceMonitorSelector:
matchLabels:
team: frontend
podMonitorSelector:
matchLabels:
team: frontend
resources:
requests:
memory: 400Mi
enableAdminAPI: false
---
apiVersion: v1
kind: Service
metadata:
name: prometheus
spec:
type: NodePort
ports:
- name: web
nodePort: 30900
port: 9090
protocol: TCP
targetPort: web
selector:
prometheus: prometheus
整合后,在Prometheus的页面中 Status > Runtime & Build Information 下会看到已经发现到了3个警报管理器实例

如果没有成功发现,请检查配置是否正确。
Rule
规则发现机制:
-
默认情况下,Prometheus资源仅发现在同一命名空间中的规则 -
默认情况下,如果spec.ruleSelector字段为nil,则表示不匹配任何规则 -
要发现所有命名空间中的规则,可以给ruleNamespaceSelector字段传空字典,例如ruleNamespaceSelector: {} -
若要从与特定标签匹配的所有命名空间中发现规则,可以使用matchLabels
-
创建警报规则
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
creationTimestamp: null
labels:
prometheus: example
role: alert-rules
name: prometheus-example-rules
spec:
groups:
- name: ./example.rules
rules:
- alert: ExampleAlert
expr: vector(1)
出于演示的目的,这条规则始终会触发警报。方便验证是否正常工作。
-
开始部署Prometheus规则
之前已经将Prometheus和Alertmanageryaml进行整合,接下来,继续在这个yaml的基础上改造。
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
name: prometheus
spec:
serviceAccountName: prometheus
replicas: 3
alerting:
alertmanagers:
- namespace: default
name: alertmanager-example
port: web
serviceMonitorSelector:
matchLabels:
team: frontend
podMonitorSelector:
matchLabels:
team: frontend
resources:
requests:
memory: 400Mi
enableAdminAPI: false
ruleSelector:
matchLabels:
role: alert-rules
prometheus: example
ruleNamespaceSelector: {} # 要为PrometheusRules发现选择的命名空间。如果未指定,则仅使用与普罗米修斯对象所在的命名空间相同的命名空间。
---
apiVersion: v1
kind: Service
metadata:
name: prometheus
spec:
type: NodePort
ports:
- name: web
nodePort: 30900
port: 9090
protocol: TCP
targetPort: web
selector:
prometheus: prometheus
在Prometheus的页面中 Status > Rules 下会看到这条示例规则

在Alertmanager的页面中,也有了警报消息,说明警报组件已经接收到了Prometheus发送过来的警报,不过貌似时区有点不对。

综合测试
测试步骤:
-
修改或添加警报规则 -
观察Prometheus是否能发现到新的规则 -
警报触发后观察Alertmanager能否正常接收到警报 -
警报触发后观察Alertmanager能否正常推送到其它接口
测试结果截图:



原文始发于微信公众号(不背锅运维):云原生监控:Prometheus Operator,一文带你打通全流程:监控、规则、警报。
版权声明:本文内容由互联网用户自发贡献,该文观点仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 举报,一经查实,本站将立刻删除。
文章由极客之音整理,本文链接:https://www.bmabk.com/index.php/post/149550.html