Originally asked in #lemmy:matrix.org
1 The Idea
I’ve been thinking about writing a website to monitor Lemmy instances, much in the same vein as lemmy-status.org, to help people like me, who are interested in the operational health of their favourite servers, have a better understanding of patterns and be notified when things go wrong.
I thought I’d share my thoughts w/ you and ask for your feedback before going down any potential rabbit hole.
1.1 Public-facing monitoring solution external to a cluster
I don’t wish to add any more complexity to a Lemmy setup. Rather I’m thinking about a solution which is totally unknown to a Lemmy server AND is publicly available.
I’m sure one could get quite a decent monitoring solution which is internal to the cluster using Prometheus+Grafana but that is not the aim of this.
1.2 A set of key endpoints
In the past there’ve been situations where a particular server’s web UI would be a 404 or 503 while the mobile clients kept happily working.
I’d like to query a server for the following major functionalities and record the round-trip time (RTT) of each:
- web/mobile home feed
- web/mobile create post/comment
- web/mobile search
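A rough sketch of what a single probe could look like, using only the Python standard library. The endpoint paths in `ENDPOINTS` are my assumption about what a Lemmy server exposes (they are not confirmed anywhere in this post), and `probe`/`ProbeResult` are hypothetical names. Creating a post or comment needs an authenticated account, so it is left out here.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    url: str
    status: Optional[int]  # HTTP status code, or None if no response arrived
    rtt_seconds: float     # wall-clock time spent on the whole request
    ok: bool               # True only for 2xx responses


def probe(url: str, timeout: float = 10.0,
          opener: Callable = urllib.request.urlopen) -> ProbeResult:
    """Issue one GET request and time it."""
    started = time.monotonic()
    status = None
    try:
        with opener(url, timeout=timeout) as response:
            response.read()            # include body transfer in the timing
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code              # e.g. 404 or 503: the server did answer
    except (urllib.error.URLError, OSError):
        status = None                  # DNS failure, refused connection, timeout
    rtt = time.monotonic() - started
    ok = status is not None and 200 <= status < 300
    return ProbeResult(url, status, rtt, ok)


# Assumed paths, for illustration only; verify against the real API first.
ENDPOINTS = {
    "web home feed": "/",
    "api home feed": "/api/v3/post/list",
    "api search": "/api/v3/search?q=lemmy",
}
```

Probing the web UI and the API separately is what would catch the "web UI is a 503 while mobile clients keep working" situation. The `opener` parameter only exists so the function can be exercised without touching a real server.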
1.3 Presenting stats visually via graphs
I’d like to be able to look at the results in a visual way, preferably as graphs.
1.4 History
I think it’d be quite cool (and helpful?) to retain the history of monitoring data for a certain period of time to be able to run some basic, meaningful queries over the rates.
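To make the retention idea concrete, here is a minimal sketch using SQLite from the Python standard library. The table layout, the 30-day window and the function names are all assumptions of mine, not a settled design.

```python
import sqlite3
import time

RETENTION_SECONDS = 30 * 24 * 3600  # assumed 30-day retention window


def open_store(path=":memory:"):
    """Open (or create) the sample store."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS samples (
               ts REAL NOT NULL,
               instance TEXT NOT NULL,
               endpoint TEXT NOT NULL,
               ok INTEGER NOT NULL,
               rtt_seconds REAL NOT NULL)"""
    )
    return db


def record(db, instance, endpoint, ok, rtt_seconds, ts=None):
    """Store one sample and prune everything older than the window."""
    now = time.time() if ts is None else ts
    db.execute(
        "INSERT INTO samples VALUES (?, ?, ?, ?, ?)",
        (now, instance, endpoint, int(ok), rtt_seconds),
    )
    db.execute("DELETE FROM samples WHERE ts < ?", (now - RETENTION_SECONDS,))
    db.commit()


def summary(db, instance, since_ts):
    """Availability (share of successful probes) and mean RTT since a time."""
    count, availability, mean_rtt = db.execute(
        "SELECT COUNT(*), AVG(ok), AVG(rtt_seconds) "
        "FROM samples WHERE instance = ? AND ts >= ?",
        (instance, since_ts),
    ).fetchone()
    return {"samples": count, "availability": availability, "mean_rtt": mean_rtt}
```

Pruning on every insert keeps the sketch short; a real deployment would more likely prune on a timer.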
1.5 Notification
I’d like to be able to receive some sort of a notification when my favourite instance becomes slow or becomes unavailable and when it comes back online or goes back to “normal.”
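One way the notification logic could work, sketched in Python: classify each sample as normal, slow or down, and only notify once the new state has been seen a few times in a row, so a single blip doesn’t trigger an alert. The 2-second threshold, the three confirmations and the `Notifier` name are placeholders I picked for illustration.

```python
from dataclasses import dataclass
from typing import Optional

SLOW_THRESHOLD_SECONDS = 2.0  # assumed cut-off for "slow"
CONFIRMATIONS = 3             # assumed number of consecutive samples needed


def classify(ok: bool, rtt_seconds: float) -> str:
    if not ok:
        return "down"
    return "slow" if rtt_seconds > SLOW_THRESHOLD_SECONDS else "normal"


@dataclass
class Notifier:
    state: str = "normal"       # last confirmed state
    _candidate: str = "normal"  # state currently being confirmed
    _streak: int = 0            # consecutive samples seen for the candidate

    def observe(self, ok: bool, rtt_seconds: float) -> Optional[str]:
        """Feed one sample; return a message only on a confirmed transition."""
        seen = classify(ok, rtt_seconds)
        if seen == self.state:
            # Back to the confirmed state: forget any pending change.
            self._candidate, self._streak = seen, 0
            return None
        if seen == self._candidate:
            self._streak += 1
        else:
            self._candidate, self._streak = seen, 1
        if self._streak >= CONFIRMATIONS:
            previous, self.state = self.state, seen
            self._streak = 0
            return f"{previous} -> {seen}"
        return None
```

The returned string would then be handed to whatever delivery channel ends up being used (e-mail, Matrix, and so on), which is deliberately out of scope here.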
2 Questions
❓ Are you folks aware if someone has already done something similar?
❓ I’m not very familiar w/ Rust (I wrote only a couple of small toy projects w/ it.) Where can I find a list of API endpoints a Lemmy server publicly exposes?
❓ If there’s no such list, which endpoints do you think would work in my case?
I stopped using my preferred instance because I couldn’t tell if it was having problems or it was my Internet. This would be very useful for people like me to sanity check things.
There does exist something similar to this: https://lemmy-status.org. It will eventually have an automatic list, but it is not implemented yet. They are currently adding instances manually. The owner is @[email protected], one of our infra people at lemmy.world. The website is not connected to lemmy.world by any means btw.
lemmy-status.org knows my instance (lemmy.mindoki.com) but when I search for it and select it, it just shows global fediverse data :-/
Thanks. Yes, lemmy-status.org was where I got the initial idea 💯
“automatic list”
For the website I’m thinking about, I’d rather keep it exclusively opt-in. I don’t wish to add any extra load since most of the instances are running off of enthusiasts’ pockets.
Oh sorry, didn’t see that 😅
“much in the same vein as lemmy-status.org”
I was also thinking that an opt-in or something similar would be nice, as overloading small hobby-project Raspberry Pis with a large monitoring website wouldn’t be that nice…
Even if you ping it once a minute it won’t even be noticeable IMO. When you surf (through) your Lemmy instance there is a lot of traffic going on.
I imagine the ping would be for uptime? Or would you repeatedly scan a lot of stuff? Then just do it rarely.
I still haven’t made up my mind as to what is a good interval. But I think I’ll take a per-endpoint approach, hitting more expensive ones less frequently.
So far I can only think of 4-5 endpoints/URLs that I should hit in every iteration as outlined in the post above.
- web/mobile home feed
- web/mobile create post/comment
- web/mobile search
I think those will cover most of the use cases.
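The per-endpoint approach could be sketched with a small priority queue, where each endpoint is re-queued at its own interval. The interval values in `INTERVALS` are made up for illustration, since I haven’t settled on real ones, and `schedule` is a hypothetical helper that only plans the order of probes rather than running them.

```python
import heapq

# Assumed intervals in seconds: cheap endpoints often, expensive ones rarely.
INTERVALS = {
    "home feed": 60,
    "search": 300,
    "create post/comment": 900,
}


def schedule(intervals, horizon_seconds):
    """Return (due_time, endpoint) pairs falling before the horizon, in order.

    Every interval must be positive, otherwise the loop would never advance.
    """
    queue = [(0, name) for name in intervals]  # everything is due at t=0
    heapq.heapify(queue)
    plan = []
    while queue and queue[0][0] < horizon_seconds:
        due, name = heapq.heappop(queue)
        plan.append((due, name))
        heapq.heappush(queue, (due + intervals[name], name))
    return plan
```

In a real loop the monitor would sleep until the next `due_time` and run the probe instead of appending to a plan, but the ordering logic stays the same.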
Thanks all for the input 🙏
I did a quick experiment w/ the APIs and I think I have identified the ones I’d need. Obviously, it is all open source (GPLv3) and available on GitHub: lemmy-clerk
As the next step, I’m going to expose that data to Prometheus for scraping.