Unable to add a K8s service as a Prometheus target


I want my Prometheus server to scrape metrics from a pod.

I followed these steps:

  1. Created a pod using a deployment: kubectl apply -f sample-app.deploy.yaml
  2. Exposed it using kubectl apply -f sample-app.service.yaml
  3. Deployed the Prometheus server using helm upgrade -i prometheus prometheus-community/prometheus -f prometheus-values.yaml
  4. Created a ServiceMonitor using kubectl apply -f service-monitor.yaml to add a target for Prometheus.

All pods are running, but when I open the Prometheus dashboard, I don't see the sample-app service as a Prometheus target under Status > Targets in the dashboard UI.

I have verified the following:

  1. I can see sample-app when I run kubectl get servicemonitors
  2. I can see that sample-app exposes metrics in Prometheus format under /metrics

At this point I debugged further: I entered the Prometheus pod using kubectl exec -it pod/prometheus-server-65b759cb95-dxmkm -c prometheus-server sh and saw that the Prometheus configuration (/etc/config/prometheus.yml) did not have sample-app as one of its jobs, so I edited the ConfigMap using kubectl edit cm prometheus-server -o yaml and added:

    - job_name: sample-app
      static_configs:
      - targets:
        - sample-app:8080

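This assumes that all other fields, such as the scrape interval and scrape_timeout, stay at their defaults.

I can see the same change reflected in /etc/config/prometheus.yml, but the Prometheus dashboard still doesn't show sample-app as a target under Status > Targets.

Below is the YAML for prometheus-server and for the service monitor.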

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    autopilot.gke.io/resource-adjustment: '{"input":{"containers":[{"name":"prometheus-server-configmap-reload"},{"name":"prometheus-server"}]},"output":{"containers":[{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"requests":{"cpu":"500m","name":"prometheus-server-configmap-reload"},{"limits":{"cpu":"500m","name":"prometheus-server"}]},"modified":true}'
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: prometheus
    meta.helm.sh/release-namespace: prom
  creationTimestamp: "2021-06-24T10:42:31Z"
  generation: 1
  labels:
    app: prometheus
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-14.2.1
    component: server
    heritage: Helm
    release: prometheus
  name: prometheus-server
  namespace: prom
  resourceVersion: "6983855"
  selfLink: /apis/apps/v1/namespaces/prom/deployments/prometheus-server
  uid: <some-uid>
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: prometheus
      component: server
      release: prometheus
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: prometheus
        chart: prometheus-14.2.1
        component: server
        heritage: Helm
        release: prometheus
    spec:
      containers:
      - args:
        - --volume-dir=/etc/config
        - --webhook-url=http://127.0.0.1:9090/-/reload
        image: jimmidyson/configmap-reload:v0.5.0
        imagePullPolicy: IfNotPresent
        name: prometheus-server-configmap-reload
        resources:
          limits:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
          requests:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
        securityContext:
          capabilities:
            drop:
            - NET_RAW
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/config
          name: config-volume
          readOnly: true
      - args:
        - --storage.tsdb.retention.time=15d
        - --config.file=/etc/config/prometheus.yml
        - --storage.tsdb.path=/data
        - --web.console.libraries=/etc/prometheus/console_libraries
        - --web.console.templates=/etc/prometheus/consoles
        - --web.enable-lifecycle
        image: quay.io/prometheus/prometheus:v2.26.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /-/healthy
            port: 9090
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 10
        name: prometheus-server
        ports:
        - containerPort: 9090
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /-/ready
            port: 9090
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 4
        resources:
          limits:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
          requests:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
        securityContext:
          capabilities:
            drop:
            - NET_RAW
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/config
          name: config-volume
        - mountPath: /data
          name: storage-volume
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 65534
        runAsGroup: 65534
        runAsNonRoot: true
        runAsUser: 65534
        seccompProfile:
          type: RuntimeDefault
      serviceAccount: prometheus-server
      serviceAccountName: prometheus-server
      terminationGracePeriodSeconds: 300
      volumes:
      - configMap:
          defaultMode: 420
          name: prometheus-server
        name: config-volume
      - name: storage-volume
        persistentVolumeClaim:
          claimName: prometheus-server
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2021-06-24T10:43:25Z"
    lastUpdateTime: "2021-06-24T10:43:25Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2021-06-24T10:42:31Z"
    lastUpdateTime: "2021-06-24T10:43:25Z"
    message: ReplicaSet "prometheus-server-65b759cb95" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

YAML for the service monitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"monitoring.coreos.com/v1","kind":"ServiceMonitor","metadata":{"annotations":{},"creationTimestamp":"2021-06-24T07:55:58Z","generation":1,"labels":{"app":"sample-app","release":"prometheus"},"name":"sample-app","namespace":"prom","resourceVersion":"6884573","selfLink":"/apis/monitoring.coreos.com/v1/namespaces/prom/servicemonitors/sample-app","uid":"34644b62-eb4f-4ab1-b9df-b22811e40b4c"},"spec":{"endpoints":[{"port":"http"}],"selector":{"matchLabels":{"app":"sample-app","release":"prometheus"}}}}
  creationTimestamp: "2021-06-24T07:55:58Z"
  generation: 2
  labels:
    app: sample-app
    release: prometheus
  name: sample-app
  namespace: prom
  resourceVersion: "6904642"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/prom/servicemonitors/sample-app
  uid: <some-uid>
spec:
  endpoints:
  - port: http
  selector:
    matchLabels:
      app: sample-app
      release: prometheus 

Solution

You need to use the prometheus-community/kube-prometheus-stack chart, which includes the Prometheus Operator, so that the Prometheus configuration is updated automatically based on ServiceMonitor resources.
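As a rough sketch of what that switch could look like, the values file below is meant for the kube-prometheus-stack chart. The file name kube-prometheus-values.yaml is hypothetical, and the setting shown is optional: by default the chart's Prometheus only selects ServiceMonitors labelled with the Helm release name, which the ServiceMonitor in the question (release: prometheus) would already satisfy if the release is again called prometheus.

```yaml
# kube-prometheus-values.yaml (hypothetical file name)
# Values for the prometheus-community/kube-prometheus-stack chart.
prometheus:
  prometheusSpec:
    # When true (the chart default), only ServiceMonitors labelled
    # release: <helm release name> are selected. Setting it to false
    # makes the operator-managed Prometheus select ServiceMonitors
    # regardless of that label.
    serviceMonitorSelectorNilUsesHelmValues: false
```

The file would be passed with -f to the same helm upgrade -i command used in the question, with the chart name swapped. Installing a different chart under an existing release name may not upgrade cleanly, so uninstalling the old release first is probably the safer route.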

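The prometheus-community/prometheus chart you used does not include the Prometheus Operator. The operator is the component that watches the Kubernetes API for ServiceMonitor resources and updates the Prometheus server's configuration accordingly.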
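If you stay on the standalone chart instead, scrape jobs have to be declared by hand, since nothing there translates ServiceMonitors into configuration. A minimal sketch, assuming the chart's extraScrapeConfigs value and assuming the sample-app Service lives in the same namespace as Prometheus and listens on port 8080 as in the question, would go into prometheus-values.yaml:

```yaml
# prometheus-values.yaml for the prometheus-community/prometheus chart.
# extraScrapeConfigs must be a string, hence the block scalar (|).
extraScrapeConfigs: |
  - job_name: sample-app
    # scrape_interval and scrape_timeout fall back to the global defaults
    static_configs:
      - targets:
          # Assumes the Service is in the same namespace as Prometheus;
          # otherwise use sample-app.<namespace>.svc:8080
          - sample-app:8080
```

Declaring the job in the values file rather than through kubectl edit keeps it from being overwritten the next time helm upgrade re-renders the ConfigMap.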

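It seems the necessary CustomResourceDefinitions (CRDs) are already installed in your cluster, otherwise you would not have been able to create a ServiceMonitor resource. They are not included in the prometheus-community/prometheus chart, so they were probably added to your cluster earlier.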
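With the operator in place, the ServiceMonitor from the question only produces a target if a Service matches it. sample-app.service.yaml is not shown in the question, so the manifest below is an assumption about what it would need to contain: both labels from spec.selector.matchLabels, a port whose name equals the endpoint's port (http), and the same namespace as the ServiceMonitor, because that ServiceMonitor sets no namespaceSelector.

```yaml
# Hypothetical Service that the ServiceMonitor in the question would select.
apiVersion: v1
kind: Service
metadata:
  name: sample-app
  namespace: prom            # same namespace as the ServiceMonitor
  labels:
    app: sample-app          # must match spec.selector.matchLabels
    release: prometheus      # must match spec.selector.matchLabels
spec:
  selector:
    app: sample-app          # assumed pod label from the deployment
  ports:
  - name: http               # ServiceMonitor endpoints refer to the port by name
    port: 8080               # taken from the sample-app:8080 target in the question
    targetPort: 8080         # assumed container port
```

If the existing Service lacks the release label or names its port differently, the ServiceMonitor would likely be accepted but yield no targets.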
