Hello selfhosters.

We all have bare-metal servers, VPSes, containers and other things running. Some of them may be exposed openly to the internet, which is populated by autonomous malicious actors, and some may reside on a closed-off network since they contain sensitive data.

And there are a lot of solutions for monitoring your servers, since none of us want our resources to be part of a botnet, or mine bitcoins for APTs, or simply have confidential data fall into the wrong hands.

Some of the tools I’ve looked at for this task are check_mk, netmonitor and monit: all of these monitor metrics such as CPU, RAM and network activity. Other tools such as Snort or Falco are designed specifically to detect suspicious activity. And there are also solutions that are cobbled together, like fail2ban actions combined with Pushover to get notified of intrusion attempts.

So my question to you is: how do you monitor your servers, and with what tools? I need some inspiration on what tooling to settle on to be able to detect unwanted external activity on my resources.

  • Strit · 9 months ago

    I’m pretty old school, but as I only have 1 server, I just use ssh, df, du and top.

    • Deebster · 9 months ago

      Not even htop? That is old school.

      • beta_tester · 9 months ago

        Not even btop? That’s middle school.

        • Samsy · 9 months ago

          Not even bottom? That’s elementary school.

  • Avid Amoeba · 9 months ago (edited 9 months ago)

    Prometheus.

    It’s open source, it’s easy to set up, its agents are available for nearly anything including OpenWrt, and it can serve the simplest use case of “is it down” as well as much more complicated ones that stem from its ability to collect data over time.

    Personally I’m monitoring:

    • Is it up?
    • Is the storage array healthy?
    • Are the services I care about running?

    I used to run it ephemerally, wiping data on restart. I recently started persisting its data so I can see data over the longer run.
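
    For the “is it up?” case, a rough sketch of how you could ask Prometheus which scrape targets are down, using the instant-query endpoint of its HTTP API and the built-in `up` metric. The address in `PROMETHEUS_URL` and the function names are assumptions for illustration, not part of the setup described above.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus listens here; adjust to your own setup.
PROMETHEUS_URL = "http://localhost:9090"


def down_targets(response: dict) -> list[str]:
    """Return the instance labels whose `up` sample is 0 in an instant-query response."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query failed")
    down = []
    for sample in response["data"]["result"]:
        # Each sample looks like {"metric": {labels}, "value": [timestamp, "value as string"]}
        _timestamp, value = sample["value"]
        if float(value) == 0:
            down.append(sample["metric"].get("instance", "<unknown>"))
    return sorted(down)


def query_up(base_url: str = PROMETHEUS_URL) -> dict:
    """Run the instant query `up` against /api/v1/query and return the decoded JSON."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": "up"})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

    Calling `down_targets(query_up())` from a cron job or systemd timer would give a list of instances to alert on. For anything beyond a quick check, Prometheus alerting rules are probably the better fit.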

    • surewhynotlem · 9 months ago

      What do you use to see the data? Prometheus itself is easy to set up, but getting to the data seemed complicated.

      • Avid Amoeba · 9 months ago (edited 9 months ago)

        The Prometheus built-in web UI. I find it pretty simple.

      • lud · 9 months ago

        You can use grafana to visualise the data.

        Grafana isn’t too hard to use.

  • MystikIncarnate · 9 months ago

    I’m a network guy, so everything in my labs uses SNMP because it works with everything. Things that don’t support SNMP are usually replaced and yeeted off the nearest bridge.

    For that I use librenms. Simple, open source, and I find it easy to use, for the most part. I put it on a different system than what I’m monitoring because if it shares fate with everything else, it’s not going to be very useful or give me any alerts if there’s a full outage of my main homelab cluster.

    Of course, access to it from the internet is forbidden, and any SNMP is filtered by my firewall. Nothing really gets through for it, so I’m unconcerned about it becoming a target. For the rest of my systems, security mostly relies on a small set of reverse proxies and firewall rules.

    I use a couple of VPN systems to access the servers remotely, all running on odd ports (if they need port forwards at all). I have multiple to provide redundancy to my remote access, so if one VPN isn’t working due to a crash or something, I have others that should get me some measure of access.

  • its_me_gb · 9 months ago (edited 9 months ago)

    Prometheus for metrics

    Loki for logs

    Grafana for dashboards.

    I use node exporter for host metrics (Proxmox/VMs/SFFs/RaspPis/Router) and a number of other *exporters:

    • exportarr
    • plex-exporter
    • unifi-exporter
    • bitcoin node exporter

    I use the OpenTelemetry collector to collect some of the above metrics, rather than Prometheus itself, as well as docker logs and other log files before shipping them to Prometheus/Loki.

    Oh, I also scrape metrics from my Traefik containers using OTEL as well.

    • namelivia · 9 months ago

      What does having OpenTelemetry improve? I have a setup similar to yours but data goes from Prometheus to Grafana and I never thought I would need anything else.

      • its_me_gb · 9 months ago

        Not a whole lot, to be honest. But I work with OpenTelemetry every day for my day job, so it was a little exercise for me.

        Though, OTEL does have some advantages in that it is a vendor-agnostic collection tool, allowing you to use multiple different collection methods and switch out your backend easily if you wish.

  • drkt · 9 months ago

    Sometimes I just sit and stare at my apache access logs because I’m bored

    GoAccess is pretty nice for a broad overview of Apache logs, also.

    For other services I generally just look at them every now and then and if something looks off I investigate. I found a cryptominer on my network once because it was spamming DNS and that shows up in DNS logs.
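
    A minimal sketch of spotting that kind of DNS spam automatically: count queries per client and list the ones above a threshold. It assumes dnsmasq-style query log lines (`query[A] name from client`); other resolvers log differently, and the function name and threshold are made up for illustration.

```python
import re
from collections import Counter

# Assumption: dnsmasq-style lines, e.g.
#   Jan 10 14:55:01 dnsmasq[412]: query[A] pool.example.net from 192.168.1.50
QUERY_LINE = re.compile(r"query\[[^\]]+\] (?P<name>\S+) from (?P<client>\S+)")


def noisy_clients(lines, threshold):
    """Return (client, query_count) pairs with at least `threshold` queries, busiest first."""
    counts = Counter()
    for line in lines:
        match = QUERY_LINE.search(line)
        if match:
            counts[match.group("client")] += 1
    return [(client, n) for client, n in counts.most_common() if n >= threshold]
```

    Feeding it the lines of the resolver log for the last hour and picking a threshold well above normal traffic should make a chatty miner stand out.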

    • Bakkoda · 9 months ago

      I used to use some logging script made in Go where you could filter your logs and they would update in real time. It was great for catching stuck processes: leave it running on a different desktop, mousewheel over to it (I miss openbox so so much) and check my logs. I just have nothing facing outwards now so I ignore everything.

    • Big P · 9 months ago

      What’s crowded? I am having trouble searching for it because of its name

      • Archy · 9 months ago

        crowdsec, pretty sure what’s meant

  • vegetaaaaaaa · 9 months ago (edited 9 months ago)

    Netdata (agent only/not the cloud-based features), and a bunch of scanners running from cron/systemd timers, rsyslog for logs (and graylog for larger setups)

    My base ansible role for monitoring.

    Since your question is also related to securing your setup, inspect and harden the configuration of all running services and the OS itself. Here is my common ansible role for basic stuff. Find (preferably official) hardening guides for your distribution and implement hardening guidelines such as DISA STIG, CIS benchmarks, ANSSI guides, etc.

  • JonnyJaap · 9 months ago

    I used zabbix at some point, but I never looked at the data so I stopped. Zabbix shows all kinds of stuff.

    I have cockpit on my bare metal that has some stats, and netdata on my firewall. I do not track any of my VMs (except with vnstat, which runs on every device).

  • loudwhisper · 9 months ago

    I run Prometheus on a separate cluster, so I plug my servers with node_exporter and scrape metrics. I then alert with grafana. To be honest, the setup is heavier (resource usage-wise) than I would like for my use case, but it’s what I am used to, and scales well to multiple machines.

  • MrMcGasion · 9 months ago

    I’ve dabbled with some monitoring tools in the past, but never really stuck with anything proper for very long. I usually notice issues myself. I self-host my own custom new-tab page that I use across all my devices and between that, Nextcloud clients, and my home-assistant reverse proxy on the same vps, when I do have unexpected downtime, I usually notice within a few minutes.

    Other than that I run fail2ban, and have my vps configured to send me a text message/notification whenever someone successfully logs in to a shell via ssh, just in case.

    Based on the logs over the years, most bots that try to log in use usernames like admin or root. I have root login disabled for ssh, and the one account that can be used over ssh has a non-obvious username that would also have to be guessed before an attacker could even try passwords. On top of that, fail2ban does a good job of blocking IPs that fail after a few tries.
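
    If you want to check that claim against your own logs, here is a small sketch that tallies which usernames bots are trying. It matches the usual OpenSSH “Failed password for …” lines; the function name is made up, and where the log lives (for example `/var/log/auth.log` on Debian-style systems) is an assumption that depends on your distribution.

```python
import re
from collections import Counter

# Matches sshd lines such as:
#   Failed password for invalid user admin from 203.0.113.5 port 4711 ssh2
#   Failed password for root from 203.0.113.5 port 4712 ssh2
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) from (?P<ip>\S+)"
)


def count_failed_users(lines):
    """Count failed SSH password attempts per attempted username."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group("user")] += 1
    return counts
```

    Passing it an open log file and calling `.most_common(10)` on the result shows the ten most-tried usernames.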

    If I used containers, I would probably want a way to monitor them, but I personally dislike containers (for myself, I’m not here to “yuck” anyone’s “yum”) and deliberately avoid them.

  • johntash · 9 months ago

    UptimeKuma is great. I use it for the simple “are my services up?” check, and it is what I pay most attention to.

    I still use zabbix for finer grained monitors though like checking raid status, smartctl, disk space, temperatures, etc.

    I’ve been trying out librenms with more custom snmp checks too and am considering going that route instead of zabbix in the future

  • Decronym · 9 months ago (edited 9 months ago)

    Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I’ve seen in this thread:

    Fewer Letters   More Letters
    DNS             Domain Name Service/System
    SSL             Secure Sockets Layer, for transparent encryption
    VPN             Virtual Private Network
    VPS             Virtual Private Server (opposed to shared hosting)

    3 acronyms in this thread; the most compressed thread commented on today has 9 acronyms.

    [Thread #421 for this sub, first seen 10th Jan 2024, 14:55]

  • TheGreenGolem · 9 months ago

    It cannot notify you, you have to check it manually, but: I use DaRemote on my phone to periodically check my bare metal.

  • taladar · 9 months ago

    Icinga2 works reasonably well for us. It is easy to write new checks as small shell scripts (or any other binary that can print output and set an exit status code).
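
    To give an idea of how small such a check can be, here is a sketch of a disk-usage check written in Python rather than shell. It follows the common Nagios/Icinga plugin convention of one status line on stdout plus exit code 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The 80%/90% thresholds and the function names are arbitrary choices for illustration.

```python
import shutil

# Exit codes follow the Nagios/Icinga plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_disk(used_percent: float, warn: float = 80.0, crit: float = 90.0):
    """Map a disk usage percentage to (exit_code, status_line)."""
    if used_percent >= crit:
        return CRITICAL, f"DISK CRITICAL - {used_percent:.1f}% used"
    if used_percent >= warn:
        return WARNING, f"DISK WARNING - {used_percent:.1f}% used"
    return OK, f"DISK OK - {used_percent:.1f}% used"


def main(path: str = "/") -> int:
    """Print the status line for `path` and return the plugin exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Anything that prevents the measurement is reported as UNKNOWN.
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    code, message = check_disk(100.0 * usage.used / usage.total)
    print(message)
    return code
```

    To use it as an actual plugin, the script would end with `sys.exit(main())` (after `import sys`) so the return value becomes the process exit status that Icinga2 reads.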