https://www.confluent.io/blog/monitor-kafka-clusters-with-prometheus-grafana-and-confluent/
This document describes how to set up the monitoring stack using Ansible as much as possible. Prometheus and Grafana have been tested to work in an air-gapped (no internet access) environment, providing the binaries from the official sites and using the playbooks from GitHub user [0x0I](https://github.com/O1ahmad). The Prometheus Node Exporter playbook, provided by [Cloud Alchemy](https://github.com/cloudalchemy), has not been tested in an air-gapped environment; that does not mean it could not work.
Prometheus scrapes metrics from an HTTP endpoint that needs to be exposed on the target hosts. This endpoint is exposed by the jmxexporter Java agent, which can be enabled through CP-Ansible by adding the following configuration to the inventory host file:
```yaml
#### Monitoring Configuration ####
jmxexporter_enabled: true
jmxexporter_url_remote: false
jmxexporter_jar_url: ~/jmx/jmx_prometheus_javaagent-0.17.2.jar
```

The jmx-exporter jar file can be downloaded from Maven Central:

```shell
wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.17.2/jmx_prometheus_javaagent-0.17.2.jar
```

The configuration for the exporter is already provided by cp-ansible and is also available in jmx-monitoring-stacks, under shared-assets/jmx-exporter.
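For reference, a jmx-exporter configuration file is a small YAML document of regex rules that map JMX MBeans to Prometheus metric names. A minimal sketch of its shape (illustrative only; use the rule sets shipped with cp-ansible / jmx-monitoring-stacks, which are far more complete):

```yaml
# Illustrative jmx-exporter config; the real rule set ships with cp-ansible.
lowercaseOutputName: true
rules:
  # e.g. kafka.server<type=ReplicaManager, name=LeaderCount><>Value
  #   -> kafka_server_replicamanager_leadercount
  - pattern: kafka.server<type=(.+), name=(.+)><>Value
    name: kafka_server_$1_$2
    type: GAUGE
```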
Each service can be configured with a different port for the exporter endpoint; these are the default values per component:
- ZooKeeper: 8079
- Brokers: 8080
- Schema Registry: 8078
- Connect: 8077
- KSQL: 8076
After running the CP-Ansible playbook you can test the endpoint using:

```shell
curl http://<component-host>:<component-port>
```

Use the jmx-monitoring-stacks playbook (under jmxexporter-prometheus-grafana/cp-ansible) to create the scrape-target information for Prometheus.
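The default ports above can also be swept in one loop. The hostname below is a placeholder to substitute with your own; in a real deployment each component usually runs on different hosts:

```shell
# Hypothetical host name; in practice each component runs on its own host(s).
host=broker-0
# Default exporter ports from the list above.
for entry in zookeeper:8079 broker:8080 schema-registry:8078 connect:8077 ksql:8076; do
  port=${entry##*:}
  # Print the URL to probe; pipe to "xargs -n1 curl -sf -o /dev/null" for a live check.
  echo "http://${host}:${port}/"
done
```

This only builds the URL list; the actual reachability check still depends on the exporter being enabled on each host.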
From the cp-ansible directory mentioned above:

```shell
ansible-playbook -i inventory.yml prometheus-config.yml -e env=environment-name
```

For example:

```shell
ansible-playbook -i ~/inventories/sasl-rbac-env1.yml prometheus-config.yml -e env=primary -e node_exporter_enabled=true
ansible-playbook -i ~/inventories/sasl-rbac-env2.yml prometheus-config.yml -e env=dr -e node_exporter_enabled=true
```

The files generated in the examples above contain the scrape_configs for Prometheus and must be copied into the Prometheus playbook configuration in the next step.
The next playbooks, which install Prometheus and Grafana, depend on a role that prepares systemd processes. It can be installed as:

```shell
ansible-galaxy role install 0x0i.systemd
```

Or, for an air-gapped Ansible host, use a two-step process:

- Download and package the role on a host with access to GitHub:

```shell
git clone https://github.com/0x0I/ansible-role-systemd
tar -czvf 0x0i.systemd ansible-role-systemd
```

- Copy the file 0x0i.systemd to the Ansible controller host and install the role:

```shell
ansible-galaxy role install 0x0i.systemd
```

The binaries for Prometheus are available here: https://prometheus.io/download/#prometheus
The default execution of the playbook downloads the binary from the above URL. For air-gapped environments, it is assumed the archive has been downloaded to a ~/downloads folder (adjust the inventory file below).
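A sketch of staging the archive on an internet-connected host. The version matches the inventory below; the GitHub release URL pattern is an assumption to verify against the Prometheus download page:

```shell
PROM_VERSION=2.37.6   # matches prometheus_archive_name in the inventory below
ARCHIVE="prometheus-${PROM_VERSION}.linux-amd64.tar.gz"
URL="https://github.com/prometheus/prometheus/releases/download/v${PROM_VERSION}/${ARCHIVE}"
# On the connected host (uncomment to actually download), then ship
# ~/downloads/${ARCHIVE} to the air-gapped controller:
# wget -P ~/downloads "${URL}"
echo "${URL}"
```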
The playbooks for this step are provided by https://github.com/0x0I/ansible-role-prometheus

For the latest version of the playbook, download and package it from GitHub:

```shell
git clone https://github.com/0x0I/ansible-role-prometheus.git
tar -czvf 0x0i.prometheus ansible-role-prometheus
```

Install using ansible-galaxy:

```shell
ansible-galaxy role install 0x0i.prometheus
```

There is a wrapper playbook to launch the install role:
## File: install-prometheus.yml

```yaml
---
- name: Installing Prometheus on hosted machine
  hosts: prometheus
  gather_facts: true
  tasks:
    - name: Create temp dir for binary
      file:
        path: "/tmp/prometheus"
        state: directory
        mode: "0755"   # directories need the execute bit to be traversable
    - name: Copy prometheus binary
      copy:
        src: "{{ prometheus_local_binary }}"
        dest: "{{ prometheus_remote_binary }}"
        mode: "0644"
    - import_role:
        name: 0x0i.prometheus
      vars:
        archive_url: "file://{{ prometheus_remote_binary }}"
        archive_checksum: ''
```

This inventory file will contain the scrape-target configuration prepared in the "Prepare Scrapping targets for the environment" section:
```yaml
# prometheus-inventory.yml
---
all:
  vars:
    ansible_connection: ssh
    ansible_user: dfederico
    ansible_become: true
    ansible_ssh_private_key_file: ~/.ssh/id_rsa
    ansible_python_interpreter: /usr/bin/python3
    ansible_ssh_common_args: '-o StrictHostKeyChecking=no'
    prometheus_archive_name: prometheus-2.37.6.linux-amd64
    prometheus_local_binary: "~/downloads/{{ prometheus_archive_name }}.tar.gz"
    prometheus_remote_binary: "/tmp/prometheus/{{ prometheus_archive_name }}.tar.gz"
    prometheus_config:
      scrape_configs:
        - job_name: "zookeeper"
          static_configs:
            - targets:
                - "dfederico-demo-zk-0:8079"
              labels:
                env: "primary"
            - targets:
                - "dfederico-demo-dr-zk-0:8079"
              labels:
                env: "dr"
          relabel_configs:
            - source_labels: [__address__]
              target_label: instance
              regex: '([^:]+)(:[0-9]+)?'
              replacement: '${1}'
        - job_name: "kafka-broker"
          static_configs:
            - targets:
                - "dfederico-demo-broker-0:8080"
                - "dfederico-demo-broker-1:8080"
                - "dfederico-demo-broker-2:8080"
              labels:
                env: "primary"
            - targets:
                - "dfederico-demo-dr-broker-0:8080"
                - "dfederico-demo-dr-broker-1:8080"
                - "dfederico-demo-dr-broker-2:8080"
              labels:
                env: "dr"
          relabel_configs:
            - source_labels: [__address__]
              target_label: instance
              regex: '([^:]+)(:[0-9]+)?'
              replacement: '${1}'
        - job_name: "schema-registry"
          static_configs:
            - targets:
                - "dfederico-demo-sr-0:8078"
              labels:
                env: "primary"
            - targets:
                - "dfederico-demo-dr-sr-0:8078"
              labels:
                env: "dr"
          relabel_configs:
            - source_labels: [__address__]
              target_label: instance
              regex: '([^:]+)(:[0-9]+)?'
              replacement: '${1}'
        - job_name: "kafka-connect"
          static_configs:
            - targets:
                - "dfederico-demo-connect-0:8077"
              labels:
                env: "primary"
            - targets:
                - "dfederico-demo-dr-connect-0:8077"
              labels:
                env: "dr"
          relabel_configs:
            - source_labels: [__address__]
              target_label: instance
              regex: '([^:]+)(:[0-9]+)?'
              replacement: '${1}'
prometheus:
  hosts:
    dfederico-demo-extra-0:
```
The examples above set up two environments: the job names match the services as they will be used in Grafana, and the different targets are each labelled with an env tag.
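The relabel_configs entries exist to strip the port from `__address__` so the instance label is just the hostname. The effect of that regex and replacement can be sketched with sed, using one of the example targets:

```shell
# Same pattern as the relabel rule '([^:]+)(:[0-9]+)?' with replacement '${1}':
# keep only the hostname, drop the optional :port suffix.
echo "dfederico-demo-broker-0:8080" | sed -E 's/^([^:]+)(:[0-9]+)?$/\1/'
# -> dfederico-demo-broker-0
```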
```shell
ansible-playbook -i prometheus-inventory.yml install-prometheus.yml
```

On any or all hosts (or using Ansible), check that the prometheus process is running:

```shell
sudo systemctl status prometheus
ansible -i prometheus-inventory.yml prometheus -m shell -a "systemctl status prometheus.service"
```

You can check systemd events with journalctl or similar:

```shell
journalctl -f -u prometheus.service
```

Other commands:

```shell
systemctl cat prometheus.service
```

Default config file: /etc/prometheus/prometheus.yml

Query the default / or /status endpoint (port 9090), and open a web browser to the /targets endpoint on port 9090 to confirm all the scraping works.
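Target health can also be checked without a browser through Prometheus's HTTP API (in practice: `curl -s http://<prometheus-host>:9090/api/v1/targets`). The JSON below is a trimmed sample of the response shape; every scrape target should report "up":

```shell
# Trimmed sample of an /api/v1/targets response; a healthy setup reports
# "health":"up" for every target in data.activeTargets.
sample='{"status":"success","data":{"activeTargets":[{"labels":{"job":"kafka-broker","env":"primary"},"health":"up"}]}}'
echo "$sample" | grep -c '"health":"up"'
# -> 1 (count of healthy targets in the sample)
```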
The binaries for Grafana are available here: https://grafana.com/grafana/download/9.5.3?edition=oss

The default execution of the playbook downloads the binary from the above URL. For air-gapped environments, it is assumed the archive has been downloaded to a ~/downloads folder (adjust the inventory file below).
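A sketch of staging the Grafana archive on a connected host. The version matches the inventory below; the dl.grafana.com URL pattern is an assumption to verify against the download page:

```shell
GRAFANA_VERSION=9.4.3   # matches grafana_archive_name in the inventory below
ARCHIVE="grafana-${GRAFANA_VERSION}.linux-amd64.tar.gz"
URL="https://dl.grafana.com/oss/release/${ARCHIVE}"
# On the connected host (uncomment to actually download), then ship
# ~/downloads/${ARCHIVE} to the air-gapped controller:
# wget -P ~/downloads "${URL}"
echo "${URL}"
```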
The playbooks for this step are provided by https://github.com/0x0I/ansible-role-grafana

For the latest version of the playbook, download and package it from GitHub:

```shell
git clone https://github.com/0x0I/ansible-role-grafana
tar -czvf 0x0i.grafana ansible-role-grafana
```

Install using ansible-galaxy:

```shell
ansible-galaxy role install 0x0i.grafana
```

There is a wrapper playbook to launch the install role:
## File: install-grafana.yml

```yaml
---
- name: Installing Grafana on hosted machine
  hosts: grafana
  gather_facts: true
  tasks:
    - name: Create temp dir for binary
      file:
        path: "/tmp/grafana"
        state: directory
        mode: "0755"   # directories need the execute bit to be traversable
    - name: Copy grafana binary
      copy:
        src: "{{ grafana_local_binary }}"
        dest: "{{ grafana_remote_binary }}"
        mode: "0644"
    - import_role:
        name: 0x0i.grafana
      vars:
        archive_url: "file://{{ grafana_remote_binary }}"
        archive_checksum: ''
```

For example:

```shell
ansible-playbook -i ~/inventories/gcp-sandbox/sasl-rbac-env1.yml install-grafana.yml
```

(Optional) This inventory can be merged with prometheus-inventory.yml:
```yaml
# grafana-inventory.yml
---
all:
  vars:
    ansible_connection: ssh
    ansible_user: dfederico
    ansible_become: true
    ansible_ssh_private_key_file: ~/.ssh/id_rsa
    ansible_python_interpreter: /usr/bin/python3
    ansible_ssh_common_args: '-o StrictHostKeyChecking=no'
    grafana_archive_name: grafana-9.4.3.linux-amd64
    grafana_local_binary: "~/downloads/{{ grafana_archive_name }}.tar.gz"
    grafana_remote_binary: "/tmp/grafana/{{ grafana_archive_name }}.tar.gz"
    grafana_config:
      # section [security]
      security:
        admin_user: admin
        admin_password: admin-secret
grafana:
  hosts:
    dfederico-demo-extra-0:
```

Note: change the admin user and password above; they will be used to access the web GUI.
```shell
ansible-playbook -i grafana-inventory.yml install-grafana.yml
```

On any or all hosts (or using Ansible), check that the grafana process is running:

```shell
sudo systemctl status grafana.service
ansible -i grafana-inventory.yml grafana -m shell -a "systemctl status grafana.service"
```

You can check systemd events with journalctl or similar:

```shell
journalctl -f -u grafana.service
```

Other commands:

```shell
systemctl cat grafana.service
```

Open the Grafana UI on port 3000 using a web browser and authenticate with the configured user.

Create a Prometheus data source, usually http://localhost:9090; connectivity is tested when saving the configuration.
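Instead of clicking through the UI, the data source can also be provisioned from a file that Grafana reads at startup. A minimal sketch, assuming Prometheus runs on the same host (path per Grafana's provisioning layout):

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yml (illustrative)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
```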
Import each dashboard from the jmx-monitoring-stacks repository on ConfluentInc's GitHub, under the jmxexporter-prometheus-grafana/assets/grafana/provisioning/dashboards folder.
This is an optional component for hardware and OS metrics exposed by *NIX kernels, since most environments already have a monitoring agent for node resources (CPU, memory, disk, etc.). Note: the playbook provided by Cloud Alchemy has not been tested in an air-gapped environment.

First, install the playbooks from Cloud Alchemy (recommended from https://github.com/prometheus/node_exporter ):

```shell
ansible-galaxy install cloudalchemy.node_exporter
```

On an air-gapped environment, clone the repository from GitHub on a host with internet access, tar the folder, and ship it to the air-gapped Ansible controller:

```shell
git clone https://github.com/cloudalchemy/ansible-node-exporter.git
tar -czvf cloudalchemy.node_exporter ansible-node-exporter
```

The above creates a compressed file named cloudalchemy.node_exporter, as expected by the role name. On the Ansible controller host, install the package with ansible-galaxy:

```shell
ansible-galaxy role install cloudalchemy.node_exporter
```

This installs the role (usually at ~/.ansible/roles); you can check the installed roles using:

```shell
ansible-galaxy role list
```

## File: node-exporter-install.yml
```yaml
- hosts: all
  pre_tasks:
    - name: Create node_exporter cert dir
      file:
        path: "/etc/node_exporter"
        state: directory
        owner: node-exp
        group: node-exp
    - name: Copy certificate
      copy:
        src: ~/inventories/ssl/generated/server-demo.pem
        dest: /etc/node_exporter/tls.pem
        mode: "0640"
        owner: "node-exp"
        group: "node-exp"
    - name: Copy certificate-Key
      copy:
        src: ~/inventories/ssl/generated/server-demo-key.pem
        dest: /etc/node_exporter/tls.key
        mode: "0640"
        owner: "node-exp"
        group: "node-exp"
  roles:
    - cloudalchemy.node_exporter
  vars:
    node_exporter_tls_server_config:
      cert_file: /etc/node_exporter/tls.pem
      key_file: /etc/node_exporter/tls.key
```

NOTE: This needs a "base" run to create the user first; on a second run it will copy the files and re-configure.
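For reference, the TLS settings above end up in node_exporter's web configuration file, which the role manages. Its shape follows the exporter-toolkit web config format (illustrative; the role generates the actual file and flag wiring):

```yaml
# Illustrative web config passed to node_exporter (exporter-toolkit format)
tls_server_config:
  cert_file: /etc/node_exporter/tls.pem
  key_file: /etc/node_exporter/tls.key
```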
```shell
ansible-playbook -i ~/inventories/host-env1.yml node-exporter-install.yml
```

On any or all hosts (or using Ansible), check that the node_exporter process is running:

```shell
sudo systemctl status node_exporter.service
ansible -i ~/inventories/sasl-rbac-env1.yml all -m shell -a "systemctl status node_exporter.service"
```

You can check systemd events with journalctl or similar:

```shell
journalctl -f -u node_exporter.service
```

Other commands:

```shell
systemctl cat node_exporter.service
```

Query node_exporter's default /metrics endpoint (port 9100):

```shell
curl -k https://broker-1:9100/metrics
curl --cacert ~/inventories/ssl/generated/CAcert.pem https://broker-1:9100/metrics
```


