How to fix a K8s service not being added as a Prometheus target
I want my Prometheus server to scrape metrics from a pod.
I followed these steps:
- Created a pod with a Deployment: kubectl apply -f sample-app.deploy.yaml
- Exposed it as a Service: kubectl apply -f sample-app.service.yaml
- Deployed the Prometheus server: helm upgrade -i prometheus prometheus-community/prometheus -f prometheus-values.yaml
- Created a ServiceMonitor to add a target for Prometheus: kubectl apply -f service-monitor.yaml

All the pods are running, but when I open the Prometheus dashboard I do not see the sample-app service as a Prometheus target under Status > Targets in the dashboard UI.
I have verified the following:
- I can see sample-app when I run kubectl get servicemonitors
- I can see the sample app exposing metrics in Prometheus format under /metrics
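One way to double-check the /metrics endpoint from outside the cluster is to port-forward the Service and curl it. This is a sketch under assumptions taken from the question (Service named sample-app in namespace prom, serving on port 8080); adjust the names and port to your setup.

```shell
# Forward the sample-app Service to a local port (names/ports are
# assumptions from the question).
kubectl -n prom port-forward svc/sample-app 8080:8080 &

# Fetch the metrics endpoint; Prometheus exposition format looks like
# lines of `metric_name{label="value"} 42`.
curl -s http://localhost:8080/metrics | head
```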
At this point I debugged further: I exec'd into the Prometheus pod with kubectl exec -it pod/prometheus-server-65b759cb95-dxmkm -c prometheus-server sh and saw that the Prometheus config (/etc/config/prometheus.yml) did not have sample-app as one of the jobs, so I ran kubectl edit cm prometheus-server -o yaml and added:

- job_name: sample-app
  static_configs:
  - targets:
    - sample-app:8080

leaving all other fields (e.g. scrape_interval, scrape_timeout) at their defaults.
I can see the same change reflected in /etc/config/prometheus.yml, but the Prometheus dashboard still does not show sample-app as a target under Status > Targets.
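One thing worth checking after editing the ConfigMap: the kubelet can take a minute or more to sync the change into the pod's mounted volume, and the configmap-reload sidecar then triggers the reload. Since the server runs with --web.enable-lifecycle, the reload can also be triggered by hand; a sketch, assuming the release lives in namespace prom:

```shell
# Port-forward the Prometheus server and POST to its lifecycle endpoint
# (enabled by the --web.enable-lifecycle flag in the Deployment).
# Normally the configmap-reload sidecar does this automatically once the
# mounted ConfigMap file changes.
kubectl -n prom port-forward deploy/prometheus-server 9090:9090 &
curl -X POST http://localhost:9090/-/reload
```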
Here are the YAMLs for prometheus-server and the ServiceMonitor.
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    autopilot.gke.io/resource-adjustment: '{"input":{"containers":[{"name":"prometheus-server-configmap-reload"},{"name":"prometheus-server"}]},"output":{"containers":[{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"requests":{"cpu":"500m","name":"prometheus-server-configmap-reload"},{"limits":{"cpu":"500m","name":"prometheus-server"}]},"modified":true}'
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: prometheus
    meta.helm.sh/release-namespace: prom
  creationTimestamp: "2021-06-24T10:42:31Z"
  generation: 1
  labels:
    app: prometheus
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-14.2.1
    component: server
    heritage: Helm
    release: prometheus
  name: prometheus-server
  namespace: prom
  resourceVersion: "6983855"
  selfLink: /apis/apps/v1/namespaces/prom/deployments/prometheus-server
  uid: <some-uid>
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: prometheus
      component: server
      release: prometheus
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: prometheus
        chart: prometheus-14.2.1
        component: server
        heritage: Helm
        release: prometheus
    spec:
      containers:
      - args:
        - --volume-dir=/etc/config
        - --webhook-url=http://127.0.0.1:9090/-/reload
        image: jimmidyson/configmap-reload:v0.5.0
        imagePullPolicy: IfNotPresent
        name: prometheus-server-configmap-reload
        resources:
          limits:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
          requests:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
        securityContext:
          capabilities:
            drop:
            - NET_RAW
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/config
          name: config-volume
          readOnly: true
      - args:
        - --storage.tsdb.retention.time=15d
        - --config.file=/etc/config/prometheus.yml
        - --storage.tsdb.path=/data
        - --web.console.libraries=/etc/prometheus/console_libraries
        - --web.console.templates=/etc/prometheus/consoles
        - --web.enable-lifecycle
        image: quay.io/prometheus/prometheus:v2.26.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /-/healthy
            port: 9090
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 10
        name: prometheus-server
        ports:
        - containerPort: 9090
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /-/ready
            port: 9090
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 4
        resources:
          limits:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
          requests:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
        securityContext:
          capabilities:
            drop:
            - NET_RAW
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/config
          name: config-volume
        - mountPath: /data
          name: storage-volume
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 65534
        runAsGroup: 65534
        runAsNonRoot: true
        runAsUser: 65534
        seccompProfile:
          type: RuntimeDefault
      serviceAccount: prometheus-server
      serviceAccountName: prometheus-server
      terminationGracePeriodSeconds: 300
      volumes:
      - configMap:
          defaultMode: 420
          name: prometheus-server
        name: config-volume
      - name: storage-volume
        persistentVolumeClaim:
          claimName: prometheus-server
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2021-06-24T10:43:25Z"
    lastUpdateTime: "2021-06-24T10:43:25Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2021-06-24T10:42:31Z"
    lastUpdateTime: "2021-06-24T10:43:25Z"
    message: ReplicaSet "prometheus-server-65b759cb95" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
The YAML for the ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"monitoring.coreos.com/v1","kind":"ServiceMonitor","metadata":{"annotations":{},"creationTimestamp":"2021-06-24T07:55:58Z","generation":1,"labels":{"app":"sample-app","release":"prometheus"},"name":"sample-app","namespace":"prom","resourceVersion":"6884573","selfLink":"/apis/monitoring.coreos.com/v1/namespaces/prom/servicemonitors/sample-app","uid":"34644b62-eb4f-4ab1-b9df-b22811e40b4c"},"spec":{"endpoints":[{"port":"http"}],"selector":{"matchLabels":{"app":"sample-app","release":"prometheus"}}}}
  creationTimestamp: "2021-06-24T07:55:58Z"
  generation: 2
  labels:
    app: sample-app
    release: prometheus
  name: sample-app
  namespace: prom
  resourceVersion: "6904642"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/prom/servicemonitors/sample-app
  uid: <some-uid>
spec:
  endpoints:
  - port: http
  selector:
    matchLabels:
      app: sample-app
      release: prometheus
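For a ServiceMonitor-based setup to work once an operator is present, the Service must carry the labels the selector matches and expose a named port matching endpoints[].port. A hypothetical sample-app Service that would match the ServiceMonitor above (the pod selector and port numbers are assumptions; adjust to your deployment):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: sample-app
  namespace: prom
  labels:
    app: sample-app        # must match the ServiceMonitor's matchLabels
    release: prometheus
spec:
  selector:
    app: sample-app        # assumed pod label from the deployment
  ports:
  - name: http             # name must match endpoints[].port ("http")
    port: 8080
    targetPort: 8080
```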
Solution

You need to use the prometheus-community/kube-prometheus-stack chart, which includes the Prometheus Operator, in order to have Prometheus's configuration updated automatically based on ServiceMonitor resources.

The prometheus-community/prometheus chart you are using does not include the Prometheus Operator, which is the component that watches the Kubernetes API for ServiceMonitor resources and updates the Prometheus server's configuration accordingly.

It appears the necessary CustomResourceDefinitions (CRDs) are installed in your cluster, otherwise you would not have been able to create a ServiceMonitor resource at all. They are not included in the prometheus-community/prometheus chart, so they were presumably added to your cluster at some earlier point.
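A minimal sketch of switching to the operator-based chart (the release name and namespace are assumptions matching the question; the old release should be removed first so the two charts' resources do not clash):

```shell
# Remove the operator-less release, then install kube-prometheus-stack,
# which bundles the Prometheus Operator that acts on ServiceMonitors.
helm -n prom uninstall prometheus
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm -n prom install prometheus prometheus-community/kube-prometheus-stack
```

Note that by default the stack's Prometheus only selects ServiceMonitors carrying the Helm release label (here release: prometheus), which the ServiceMonitor above already has.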