Contents

I. Deploying Prometheus (192.168.28.5)
II. Deploying Exporters (192.168.28.10)
III. Deploying Grafana for Visualization
IV. Summary

I. Deploying Prometheus (192.168.28.5)

1. Environment preparation

Server role	IP address	Components
Prometheus server	192.168.28.5	Prometheus (prometheus-2.37.0.linux-amd64.tar.gz)
Agent server	192.168.28.10	node_exporter, mysqld_exporter
Grafana server	192.168.28.100	Grafana

systemctl stop firewalld
systemctl disable firewalld
setenforce 0

vim /etc/selinux/config
SELINUX=disabled

vim /etc/resolv.conf
nameserver 114.114.114.114

ntpdate ntp1.aliyun.com 		#synchronize the system clock
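The SELinux edit made in vim above can also be scripted, which helps when the same preparation has to be repeated on all three servers. The following is a minimal sketch; the function name set_selinux_mode is made up for illustration, and it assumes GNU sed (for the -i option) as found on CentOS.

```shell
# set_selinux_mode FILE MODE: rewrite the SELINUX= line of FILE in place.
# Only the SELINUX= key is touched; SELINUXTYPE= and comment lines are left alone.
# (Hypothetical helper name; assumes GNU sed for the -i option.)
set_selinux_mode() {
    sed -i "s/^SELINUX=.*/SELINUX=$2/" "$1"
}
```

On a real host it would be called as `set_selinux_mode /etc/selinux/config disabled`. The setting in the file only takes effect after a reboot, which is why `setenforce 0` is still run for the current session.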

2. Deploying Prometheus

Prometheus download page:

https://prometheus.io/download/

2.1 Upload prometheus-2.37.0.linux-amd64.tar.gz to /opt and extract it

[root@prometheus ~]# cd /opt
[root@prometheus opt]# ls
cni  containerd  rh
[root@prometheus opt]# rz -E
rz waiting to receive.
[root@prometheus opt]# ls
cni  containerd  prometheus-2.37.0.linux-amd64.tar.gz  rh
[root@prometheus opt]# tar xf prometheus-2.37.0.linux-amd64.tar.gz 
[root@prometheus opt]# mv prometheus-2.37.0.linux-amd64 /usr/local/prometheus
[root@prometheus opt]# cd /usr/local/prometheus
[root@prometheus prometheus]# ls
console_libraries  consoles  LICENSE  NOTICE  prometheus  prometheus.yml  promtool

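2.2 Edit the configuration file

cat /usr/local/prometheus/prometheus.yml | grep -v "^#"
global:					#global Prometheus settings, such as the scrape interval and the scrape timeout
  scrape_interval: 15s			#how often metrics are scraped from the targets; the default is 1m
  evaluation_interval: 15s 		#how often recording and alerting rules are evaluated; the default is 1m
  # scrape_timeout is set to the global default (10s).
  scrape_timeout: 10s			#how long a single scrape may take before it times out; the default is 10s
 
alerting:				#Alertmanager instances to send alerts to; supports static configuration and dynamic service discovery
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
 
rule_files:				#paths of the rule files to load; glob patterns are allowed
  # - "first_rules.yml"
  # - "second_rules.yml"
 
scrape_configs:			#defines the targets that time series are scraped from
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"		#a job groups a set of monitored instances under one name; targets come from static configuration (static_configs) or dynamic service discovery (*_sd_configs)
 
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
 
    static_configs:				#static target configuration: always scrape the targets listed here
      - targets: ["localhost:9090"] #the Prometheus server scrapes its own metrics endpoint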

2.3 Create a systemd unit file and enable start on boot

[root@prometheus prometheus]# vim /usr/lib/systemd/system/prometheus.service
 
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io
After=network.target
 
[Service]
Type=simple
ExecStart=/usr/local/prometheus/prometheus \
--config.file=/usr/local/prometheus/prometheus.yml \
--storage.tsdb.path=/usr/local/prometheus/data/ \
--storage.tsdb.retention.time=15d \
--web.enable-lifecycle
  
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
 
[Install]
WantedBy=multi-user.target

2.4 Start Prometheus and verify it in the browser

systemctl daemon-reload
systemctl start prometheus
systemctl enable prometheus
 
netstat -natp | grep :9090
ss -natp | grep 9090
 
Open http://192.168.28.5:9090 in a browser to reach the Prometheus web UI.
Click Status -> Targets. By default only the Prometheus server itself is listed; a target state of UP means Prometheus is scraping it successfully.
Open http://192.168.28.5:9090/metrics to see the metrics that Prometheus exposes about itself.
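The /metrics page is plain text, one sample per line, so it can be inspected from the command line as well as in the browser. The sketch below shows one way to pull a single metric out of that text. The helper name metric_value and the sample values are made up for illustration, and it assumes that lines carry no trailing timestamp, which is the usual case for output served by Prometheus and its exporters.

```shell
# metric_value NAME: read Prometheus text-format metrics on stdin and
# print the value of every sample whose metric name is exactly NAME.
# (Hypothetical helper; assumes no timestamp follows the value.)
metric_value() {
    awk -v name="$1" '
        /^#/ { next }                       # skip HELP and TYPE comment lines
        {
            metric = $1
            sub(/\{.*/, "", metric)         # drop the label part in braces
            if (metric == name) print $NF   # the last field is the value
        }
    '
}

# Sample input in the exposition format (the values are made up).
sample='# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 31
prometheus_http_requests_total{code="200",handler="/metrics"} 12'

printf '%s\n' "$sample" | metric_value go_goroutines
```

Against the live server the same helper could be fed from curl, for example `curl -s http://192.168.28.5:9090/metrics | metric_value go_goroutines`.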

II. Deploying Exporters (192.168.28.10)

1. Monitoring a remote Linux host (192.168.28.10)

Install the node_exporter component on the remote Linux host (the monitored agent).

Download page:

https://prometheus.io/download/

1.1 Upload node_exporter-1.3.1.linux-amd64.tar.gz to /opt and extract it

cd /opt/
tar xf node_exporter-1.3.1.linux-amd64.tar.gz
cd node_exporter-1.3.1.linux-amd64/
mv node_exporter /usr/local/bin #put the binary on the PATH
node_exporter --help		#list the available command-line options

1.2 Create a systemd unit file and enable start on boot

Run node_exporter as a systemd service so that it can be controlled with systemctl.

vim /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target
 
[Service]
Type=simple
ExecStart=/usr/local/bin/node_exporter \
--collector.ntp \
--collector.mountstats \
--collector.systemd \
--collector.tcpstat
 
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
 
[Install]
WantedBy=multi-user.target

1.3 Start node_exporter

systemctl daemon-reload
systemctl start node_exporter
systemctl enable node_exporter
 
netstat -natp | grep :9100
 
Open http://192.168.28.10:9100/metrics in a browser to see the metrics collected by Node Exporter.

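1.4 Edit the configuration file on the Prometheus server

Back on the Prometheus server (192.168.28.5), add a scrape job for the monitored host to the configuration file.

vim /usr/local/prometheus/prometheus.yml
 #add a static target under scrape_configs so that the agent node is scraped
  - job_name: 'agent' #job name that identifies the monitored host
    static_configs:
    - targets: ['192.168.28.10:9100'] #IP address and port of the monitored host

Restart the service after changing the configuration file.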

systemctl restart prometheus.service
systemctl status prometheus.service
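Because the unit file in 2.3 starts Prometheus with --web.enable-lifecycle, the configuration can also be reloaded over HTTP instead of restarting the service. The sketch below validates the file with promtool (shipped in the same tarball) before asking the server to reload. The function name check_and_reload and the two variables are hypothetical, and the default paths simply follow this walkthrough.

```shell
# Paths and URL follow this walkthrough; override them in the environment if yours differ.
PROMTOOL="${PROMTOOL:-/usr/local/prometheus/promtool}"
PROM_URL="${PROM_URL:-http://192.168.28.5:9090}"

# check_and_reload FILE: validate FILE with promtool, then ask the running
# server to re-read its configuration. Returns non-zero if either step fails.
# (Hypothetical helper name.)
check_and_reload() {
    if ! "$PROMTOOL" check config "$1"; then
        echo "configuration is invalid, not reloading" >&2
        return 1
    fi
    # /-/reload is only served when Prometheus runs with --web.enable-lifecycle
    curl -fsS -X POST "$PROM_URL/-/reload"
}
```

It would be called as `check_and_reload /usr/local/prometheus/prometheus.yml`. Checking first means a typo in the YAML is reported before the running server is touched.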

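1.5 Check the Prometheus server

Back in the web UI, click Status -> Targets. One more target is now listed, and the agent is being monitored successfully.

Note: node_exporter can also be installed on the Prometheus server itself and monitored in the same way.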

2. Monitoring a remote MySQL server

Install the mysqld_exporter component on the monitored host (agent).
Download page:

https://prometheus.io/download/

2.1 Download the mysqld_exporter component

wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.14.0/mysqld_exporter-0.14.0.linux-amd64.tar.gz

2.2 Install the mysqld_exporter component

 tar xf mysqld_exporter-0.14.0.linux-amd64.tar.gz -C /usr/local
 mv /usr/local/mysqld_exporter-0.14.0.linux-amd64/ /usr/local/mysqld_exporter #rename to a shorter path
 ls /usr/local/mysqld_exporter/
LICENSE  mysqld_exporter  NOTICE

2.3 Install MariaDB and grant privileges

yum install mariadb\* -y

systemctl start mariadb
systemctl enable mariadb
 
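#open the database shell
mysql
 
The grant is for host 192.168.28.10 because the Prometheus server never connects to MariaDB directly. Prometheus scrapes mysqld_exporter, and mysqld_exporter queries MariaDB, so the host in the grant is the address that mysqld_exporter connects from.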
grant select,replication client,process ON *.* to 'mysql_monitor'@'192.168.28.10' identified by '123456';
 
flush privileges;

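2.4 Create a MariaDB client configuration file

Create a MariaDB client configuration file that holds the user name and password mysqld_exporter connects with. They must match the account granted above.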
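The original listing for this file is not shown, so the following is only a sketch of what it might look like. It assumes the usual [client] section that MySQL and MariaDB clients read, and it adds a host entry so that the connection arrives from 192.168.28.10 and therefore matches the grant in 2.3 (a connection through the local socket would be seen as localhost instead). The function name write_my_cnf and the file location in the comment are hypothetical.

```shell
# write_my_cnf FILE USER PASSWORD HOST: write a MySQL/MariaDB client
# option file and make it readable by its owner only.
# (Hypothetical helper name; the [client] layout is an assumption.)
write_my_cnf() {
    cat > "$1" <<EOF
[client]
user=$2
password=$3
host=$4
EOF
    chmod 600 "$1"
}

# Example call with a hypothetical location and the credentials from 2.3:
# write_my_cnf /usr/local/mysqld_exporter/.my.cnf mysql_monitor 123456 192.168.28.10
```

Restricting the file to mode 600 keeps the monitoring password away from other local users.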

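2.5 Create a systemd unit file and start mysqld_exporter

[root@agent ~]# vim /usr/lib/systemd/system/mysqld_exporter.service
 
[Unit]
Description=mysqld_exporter
Documentation=https://prometheus.io/
After=network.target
 
[Service]
Type=simple
#--config.my-cnf points at the client configuration file from step 2.4; adjust the path to wherever you saved it
ExecStart=/usr/local/mysqld_exporter/mysqld_exporter \
--config.my-cnf=/usr/local/mysqld_exporter/.my.cnf
 
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
 
[Install]
WantedBy=multi-user.target

systemctl daemon-reload
systemctl start mysqld_exporter
systemctl enable mysqld_exporter
 
netstat -natp | grep :9104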

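2.6 Edit the configuration file on the Prometheus server

Back on the Prometheus server, add a scrape job for the monitored MariaDB instance to the configuration file.

vim /usr/local/prometheus/prometheus.yml
  - job_name: 'agent_mariadb'
    static_configs:
    - targets: ['192.168.28.10:9104']

Restart the service after changing the configuration file.

systemctl restart prometheus.service

2.7 Check the Prometheus server

Back in the web UI, click Status -> Targets. The MariaDB target is now listed.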

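III. Deploying Grafana for Visualization

Grafana is an open-source metrics analytics and visualization tool. It queries and analyzes collected data, displays it visually, and can also send alerts.

Download page:

https://grafana.com/grafana/download/

1. Download and install Grafana (192.168.28.100)

#Installing with yum resolves dependencies automatically. Here the package was uploaded to /opt beforehand.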
yum install -y /opt/grafana-7.4.0-1.x86_64.rpm
or
rpm -ivh /opt/grafana-7.4.0-1.x86_64.rpm
 
systemctl start grafana-server
systemctl enable grafana-server
 
netstat -natp | grep :3000
 
Open http://192.168.28.100:3000 in a browser to reach the login page, then sign in with the default user admin and password admin.

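2. Configure the data source

Next, add the Prometheus server as a data source in Grafana so that Grafana can query the data Prometheus has collected. The later steps assume the data source is named prometheus_data and points at http://192.168.28.5:9090.

3. Import the built-in dashboards

Click the prometheus_data data source and open its Dashboards tab.

4. Build a panel from the data source

Give the panel a name and click Save.

The new panel then appears on the dashboard.

Note: when a query returns several series, append braces to the metric name. The label conditions inside the braces restrict the result to the series that match, for example those of the host currently being monitored.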
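To make the note about braces concrete, here is a small sketch that builds such a selector. It assumes the node_exporter metric node_load1 (the 1-minute load average) and the instance label that Prometheus attaches to every scraped target; the helper name selector is made up for illustration.

```shell
# selector METRIC LABEL VALUE: build a PromQL selector that keeps only
# the series of METRIC whose LABEL equals VALUE.
# (Hypothetical helper name.)
selector() {
    printf '%s{%s="%s"}\n' "$1" "$2" "$3"
}

# The instance label holds the target address configured in prometheus.yml.
selector node_load1 instance 192.168.28.10:9100
# prints: node_load1{instance="192.168.28.10:9100"}
```

The resulting expression can be typed into the Grafana query field, or sent to the Prometheus HTTP API with `curl -sG http://192.168.28.5:9090/api/v1/query --data-urlencode "query=$(selector node_load1 instance 192.168.28.10:9100)"`.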

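5. Import a Grafana dashboard

Open https://grafana.com/grafana/dashboards in a browser, search for node exporter, pick a suitable dashboard, and click Copy ID or Download JSON.
 
In Grafana, go to + Create -> Import, enter the dashboard ID or upload the JSON file, and click Load to import the dashboard.

6. Displaying MySQL monitoring data in Grafana

On the Grafana server, adjust the configuration file, then download and install a MySQL monitoring dashboard. It ships as JSON files, which can be thought of as monitoring templates prepared by developers.

Import the JSON files through the Grafana web UI.

Opening the UI in Firefox on the Grafana server itself makes the upload easier.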



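After clicking Import, Grafana reports that the Prometheus data source cannot be found. The JSON files look for a data source named Prometheus by default, but the data source created earlier is named prometheus_data.

To fix this, rename the prometheus_data data source to Prometheus (note the capital P), then refresh the page and the data appears.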
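An alternative to renaming the data source is to rewrite the name inside the dashboard JSON before importing it. The sketch below assumes the file refers to the data source through plain `"datasource": "Prometheus"` fields, which may not hold for every dashboard, and that neither name contains a slash or other characters special to sed. The function name retarget_datasource and the JSON fragment are made up for illustration.

```shell
# retarget_datasource OLD NEW: read dashboard JSON on stdin and rewrite
# every "datasource": "OLD" field to "datasource": "NEW".
# (Hypothetical helper; OLD and NEW must not contain sed metacharacters.)
retarget_datasource() {
    sed "s/\"datasource\": *\"$1\"/\"datasource\": \"$2\"/g"
}

# Tiny made-up dashboard fragment, used only to show the effect.
printf '%s\n' '{"panels": [{"datasource": "Prometheus", "title": "QPS"}]}' \
    | retarget_datasource Prometheus prometheus_data
# prints: {"panels": [{"datasource": "prometheus_data", "title": "QPS"}]}
```

For a downloaded file this would be run as `retarget_datasource Prometheus prometheus_data < dashboard.json > dashboard_fixed.json`, where both file names are placeholders.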

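7. Alerting with Grafana and OneAlert

Alerting in Prometheus itself requires the Alertmanager component, and the alert rules have to be written by hand, which is not very friendly for operations staff. This guide therefore uses Grafana with OneAlert. Note: before setting up alerting, check once more that the clocks on all machines are synchronized.

Go to http://www.onealert.com/, register an account, and sign in to the management console.

7.1 Configure the Webhook URL in Grafana

1. In Grafana, create a Notification channel and choose Webhook as its type.
2. Enabling Send on all alerts and Include image is recommended for a better experience with Cloud Alert.
3. Paste the Webhook URL generated in the first step into the Url field under Webhook settings.
URL format:
http://api.aiops.com/alert/api/event/grafana/v1/7a2eb59ab2d24483847b17e74bd9b255/
 
4. Set Http Method to POST.
5. Click Send Test, then Save.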

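Add the notification channel in Grafana.

7.2 Test the CPU load alert

Now set up an alert rule to test with. This example uses the CPU load panel added earlier.

After saving, the alert can be tested. If the CPU load on the agent has not reached 0.3, try a threshold of 0.1, or run some programs on the agent to raise its load.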
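One simple way to raise the load on the agent for this test is to run a few busy loops for a limited time. The sketch below is only an illustration; the function name burn_cpu is made up, and it assumes the timeout command from coreutils is available.

```shell
# burn_cpu WORKERS SECONDS: start WORKERS busy loops in the background,
# each stopped by timeout after SECONDS, then wait for all of them.
# (Hypothetical helper name; relies on coreutils timeout.)
burn_cpu() {
    workers=$1
    seconds=$2
    i=0
    while [ "$i" -lt "$workers" ]; do
        timeout "$seconds" sh -c 'while :; do :; done' &
        i=$((i + 1))
    done
    wait    # returns once every busy loop has been stopped
}
```

Running `burn_cpu 2 120` on the agent keeps two cores busy for two minutes, which should be enough to push the 1-minute load average past a threshold such as 0.3.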

The resulting alert e-mail:

After the problem has been fixed, just click Close and the incident is treated as resolved.