How to Implement Operations Alerting on a Prometheus and Grafana Monitoring Platform

This article explains how to add operations alerting to a monitoring platform built on Prometheus and Grafana, working through a practical example step by step.

Alerting options

Grafana

Recent versions of Grafana include built-in alerting that can be configured directly on a dashboard panel. In my experience, though, it is not very flexible: it does not support template variables, and many downloaded dashboards cannot use it. For that reason we use Alertmanager instead of Grafana alerting.

Alertmanager

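Unlike Grafana's graphical interface, Alertmanager is driven by a configuration file. The configuration is somewhat more tedious, but it is powerful and flexible. The steps below set up alert notifications one at a time.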

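Alert types

Alertmanager alerting here mainly uses two receiver types:

Email receiver, configured under email_configs.
Webhook receiver, configured under webhook_configs. It sends an HTTP POST to the configured URL with a body in the following format.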

 {
   "version": "2",
   "status": "<resolved|firing>",
   "alerts": [{
     "labels": <object>,
     "annotations": <object>,
     "startsAt": "<rfc3339>",
     "endsAt": "<rfc3339>"
   }]
 }
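
A webhook endpoint only has to accept this POST body and read the fields shown above. The following is a minimal sketch using only the Python standard library, not a production receiver. The names summarize_webhook, AlertHandler and serve, and the port 5001, are illustrative assumptions. It also assumes each alert carries an alertname label and a summary annotation, as in the rule defined later in this article.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_webhook(body):
    """Turn a webhook payload (bytes or str) into one text line per alert."""
    payload = json.loads(body)
    status = payload.get("status", "unknown")
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        name = labels.get("alertname", "unknown")
        summary = annotations.get("summary", "")
        lines.append("[{}] {}: {}".format(status, name, summary))
    return lines


class AlertHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: prints each received alert and answers 200."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        for line in summarize_webhook(self.rfile.read(length)):
            print(line)
        self.send_response(200)
        self.end_headers()


def serve(port=5001):
    """Listen on all interfaces; the port is an arbitrary example value."""
    HTTPServer(("", port), AlertHandler).serve_forever()
```

Calling serve() starts the listener; the url of a webhook receiver in alertmanager.yml would then point at this host and port.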

This walkthrough mainly uses email for alerting.

Implementation steps

Download
Download the latest Alertmanager release from GitHub, upload it to the server, and unpack it:
tar -zxvf alertmanager-0.19.0.linux-amd64.tar.gz
Configure Alertmanager

vi alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'mail.163.com:25' # SMTP server host and port
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]' # mailbox account
  smtp_auth_password: 'xxxxxx' # mailbox password
  smtp_require_tls: false
route:
  group_by: ['alertname']
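  group_wait: 10s  # how long to wait before sending the first notification for a new group of alerts
  group_interval: 10s # how long to wait before notifying about new alerts added to a group that was already notified
  repeat_interval: 1h # how often to resend a notification for alerts that are still firing; for email, do not set this too low, or the SMTP server may reject the messages as too frequent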
  receiver: 'email'
receivers:
  - name: 'email'
    email_configs:
    - to: '[email protected]'

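After editing, validate the file with ./amtool check-config alertmanager.yml.

Once the check passes, start Alertmanager with nohup ./alertmanager &. (For the first start you may want to run it in the foreground without nohup, so the logs are easy to inspect.)

We defined only one route, which means every alert Prometheus sends to Alertmanager is delivered through the receiver named email. In practice, alerts of different severities are usually handled differently, so additional child routes can be defined under route. See the Alertmanager documentation for the full routing rules.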

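Configure Prometheus
Create a rules folder under the Prometheus installation directory to hold all alerting rule files, then point prometheus.yml at Alertmanager and at the rule files: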

alerting:
  alertmanagers:
  - static_configs:
    - targets: ['192.168.249.131:9093']

rule_files:
  - rules/*.yml

In the rules folder, create an alerting rule file named service_down.yml that sends an email when a server goes offline.
groups:
- name: ServiceStatus
  rules:
  - alert: ServiceStatusAlert
    expr: up == 0
    for: 2m
    labels:
      team: node
    annotations:
      summary: "Instance {{ $labels.instance }} has been down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 2 minutes."
      value: "{{ $value }}"
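Configuration details
alert: the name of the alerting rule.
expr: the trigger condition, written as a PromQL expression. It is evaluated to determine whether any time series satisfy the condition.
for: an optional evaluation wait time. The alert is sent only after the trigger condition has held continuously for this long. While waiting, a newly triggered alert is in the PENDING state; once the period has passed it becomes FIRING.
labels: custom labels, letting you attach an additional set of labels to the alert.
annotations: a set of additional information, such as text describing the alert in detail. Annotations are sent to Alertmanager as parameters along with the alert when it fires.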
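
The PENDING and FIRING behaviour of for can be illustrated with a small simulation. This is a simplified model for intuition only, not Prometheus's actual implementation. The function name alert_states and the sample timestamps are hypothetical.

```python
def alert_states(samples, for_seconds):
    """Return the alert state after each rule evaluation.

    samples is a list of (timestamp_seconds, matched) pairs, where matched
    says whether expr returned a result at that evaluation.
    """
    states = []
    active_since = None  # time at which the condition first became true
    for timestamp, matched in samples:
        if not matched:
            # The condition cleared, so the waiting period starts over.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        if timestamp - active_since >= for_seconds:
            states.append("firing")
        else:
            states.append("pending")
    return states


# Evaluations every 60 s; the target goes down at t=60 and recovers at t=240.
history = [(0, False), (60, True), (120, True), (180, True), (240, False)]
print(alert_states(history, for_seconds=120))
```

With for: 2m, the alert stays pending for the first two failing evaluations and only fires once the condition has held for the full 120 seconds.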
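After configuring, restart Prometheus and open the Prometheus web UI to check the alert configuration.
Test
Stop node_exporter; after about 2 minutes the alert email should arrive.
Alertmanager notification content can also be rendered with templates, so you can use a nicer layout if you are interested.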
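
To exercise the email path without stopping a real exporter, a test alert can also be pushed straight to Alertmanager's /api/v2/alerts endpoint. The sketch below is one possible way to do that with the standard library. The function names build_test_alert and send_alerts are hypothetical, the labels simply mirror the rule above, and the default address is the Alertmanager target from the prometheus.yml fragment in this article.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(instance, minutes=5):
    """Build a one-element alert list that expires after the given minutes."""
    now = datetime.now(timezone.utc)
    return [{
        "labels": {
            "alertname": "ServiceStatusAlert",
            "instance": instance,
            "team": "node",
        },
        "annotations": {"summary": "Instance {} has been down".format(instance)},
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def send_alerts(alerts, base_url="http://192.168.249.131:9093"):
    """POST the alerts to Alertmanager and return the HTTP status code."""
    request = urllib.request.Request(
        base_url + "/api/v2/alerts",
        data=json.dumps(alerts).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling send_alerts(build_test_alert("server1")) against a running Alertmanager should route the alert to the email receiver like any alert coming from Prometheus.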
More
Some useful node exporter query expressions
CPU usage (percent)
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
Memory used (bytes)
node_memory_MemTotal_bytes - node_memory_MemFree_bytes - node_memory_Cached_bytes - node_memory_Buffers_bytes - node_memory_Slab_bytes
Memory usage (percent)
((node_memory_MemTotal_bytes - node_memory_MemFree_bytes - node_memory_Cached_bytes - node_memory_Buffers_bytes - node_memory_Slab_bytes)/node_memory_MemTotal_bytes) * 100
Memory usage of server1 (percent)
((node_memory_MemTotal_bytes{instance="server1"} - node_memory_MemAvailable_bytes{instance="server1"})/node_memory_MemTotal_bytes{instance="server1"}) * 100
Disk usage of server2 (percent)
((node_filesystem_size_bytes{fstype=~"xfs|ext4",instance="server2"} - node_filesystem_free_bytes{fstype=~"xfs|ext4",instance="server2"}) / node_filesystem_size_bytes{fstype=~"xfs|ext4",instance="server2"}) * 100
Uptime (seconds)
time() - node_boot_time_seconds
Uptime of server1 (seconds)
time() - node_boot_time_seconds{instance="server1"}
Network transmit rate (bytes/sec)
irate(node_network_transmit_bytes_total{device!~"lo|bond[0-9]|cbr[0-9]|veth.*"}[5m]) > 0
Network transmit rate of server1 (bytes/sec)
irate(node_network_transmit_bytes_total{instance="server1", device!~"lo|bond[0-9]|cbr[0-9]|veth.*"}[5m]) > 0
Network receive rate (bytes/sec)
irate(node_network_receive_bytes_total{device!~"lo|bond[0-9]|cbr[0-9]|veth.*"}[5m]) > 0
Network receive rate of server1 (bytes/sec)
irate(node_network_receive_bytes_total{instance="server1", device!~"lo|bond[0-9]|cbr[0-9]|veth.*"}[5m]) > 0
Disk read rate (bytes/sec)
irate(node_disk_read_bytes_total{device=~"sd.*"}[5m])
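
Before putting any of these expressions into an alerting rule, it can help to run them against the Prometheus HTTP API and look at the values. The following is a rough sketch using the /api/v1/query endpoint. The helper names query_url, parse_vector and instant_query are hypothetical, and the Prometheus base URL passed in is an assumption to adjust for your server.

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url, expr):
    """Build the instant-query URL for a PromQL expression."""
    params = urllib.parse.urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def parse_vector(response):
    """Map each series' instance label to its current value."""
    if response.get("status") != "success":
        raise ValueError("query failed: {}".format(response.get("error")))
    values = {}
    for series in response["data"]["result"]:
        instance = series["metric"].get("instance", "")
        # Each value is a [timestamp, "number-as-string"] pair.
        values[instance] = float(series["value"][1])
    return values


def instant_query(base_url, expr):
    """Run the expression against Prometheus and return instance -> value."""
    with urllib.request.urlopen(query_url(base_url, expr), timeout=10) as resp:
        return parse_vector(json.load(resp))
```

For example, instant_query("http://localhost:9090", "up == 0") would list the instances that the ServiceStatusAlert rule currently matches, assuming Prometheus listens on that address.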
