Categories
DevOps Learning

Uptime monitoring with uptime kuma

After migrating this blog to self-hosted, I thought about how to monitor its uptime and get notified if it ever went down. I’ve been using UptimeRobot for the longest time and it has served me well for at least six years. It’s time to self-host this as well.

I chanced upon uptime-kuma, a popular open source self-hosted monitoring tool. Setup was extremely easy with docker-compose and I got it up and running in minutes. I set up another simple Telegram bot, and now I have a pretty robust notification system if any of my sites goes down.

Anyone can now see if I’ve been doing a good job at keeping my sites up at https://uptime.lordofgeeks.com.


And if anything goes down….

when I accidentally removed the DNS entry for uptime

It is so easy to set up that there’s no reason for anyone to not monitor their sites. My favourite part is that it actually looks pretty good!

The docker-compose.yml example that the repo gives is everything you need, though your mileage will certainly vary once you take into account your own networking configurations.
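For reference, a minimal compose file looks roughly like this; the container name, volume path and published port below are just illustrative defaults (the upstream image serves its web UI on 3001), so adjust them to your own setup:

services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    container_name: uptime-kuma
    restart: unless-stopped
    ports:
      - "3001:3001"                    # web UI; front it with a reverse proxy or tunnel for public access
    volumes:
      - ./uptime-kuma-data:/app/data   # persists monitors, history and notification settings

Once it’s up, the Telegram notification just needs a bot token and chat ID pasted into the notification settings in the UI.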

Give it a try and have fun monitoring your own sites!

Categories
Deployment DevOps Docker Optimization

Migrating this blog to self-hosted again

4 years ago I migrated this blog to Hostinger from a self-hosted Docker instance. With the 48-month plan ending in 4 days, I went back to self-hosted once again.

Why?

Mainly cost 💸. The renewal price is more than 230% of what I paid last time: it’s the difference between paying $44.16 vs $104.62 (after discounts) for 48 months of hosting. For something that I barely use, and that barely gets any traffic, there’s little to no incentive for me to pay ~$2.18 USD/month for this blog.

                 Current    Renewed
48 months (usd)  $44.16     $104.62
Monthly (usd)    $0.92      $2.18

“Well surely self-hosted can’t be free right?”

You’re right, it isn’t “free” per se, but because I have a home lab server running anyway, I might as well use the spare capacity to host the blog. (again, the home lab is something I should write about, hopefully next week)

It took me about an hour to fully migrate over; it was a smooth process with a tiny bit of pain (self-inflicted carelessness).

The home lab is a mini PC with a measly Intel N100 CPU, 16GB of RAM and a 500GB SSD. I was shocked to find out how many services it can host comfortably; it has completely changed my view on what’s possible with these small machines.

This blog is hosted on Docker, as expected. But it’s a Docker container, inside an Ubuntu VM, inside a Proxmox host. The idle stats are pretty decent, with the two containers consuming about 1GB of RAM.

NAME            CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O      
wordpress       0.01%     355.5MiB / 7.752GiB   4.48%     4.12GB / 288MB    75.8MB / 2.25GB
wordpress-db    0.71%     533.6MiB / 7.752GiB   6.72%     78.8MB / 3.6GB    3.45MB / 1.87GB
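For the curious, the two containers above come from a fairly standard compose file. A sketch along these lines would reproduce the setup; the image tags, credentials and volume names here are placeholders rather than my actual configuration:

services:
  wordpress:
    image: wordpress:latest
    restart: unless-stopped
    depends_on:
      - wordpress-db
    environment:
      WORDPRESS_DB_HOST: wordpress-db
      WORDPRESS_DB_NAME: wordpress
      WORDPRESS_DB_USER: wordpress
      WORDPRESS_DB_PASSWORD: changeme      # placeholder
    volumes:
      - wp-data:/var/www/html              # themes, plugins and uploads

  wordpress-db:
    image: mariadb:10
    restart: unless-stopped
    environment:
      MYSQL_DATABASE: wordpress
      MYSQL_USER: wordpress
      MYSQL_PASSWORD: changeme             # placeholder
      MYSQL_RANDOM_ROOT_PASSWORD: "1"
    volumes:
      - db-data:/var/lib/mysql

volumes:
  wp-data:
  db-data:

Note that nothing needs a published port if the only way in is through a tunnel or reverse proxy on the same Docker network (more on that below).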

To make this work, we employ the usual caching strategies and pre-load the pages to make sure that they are already cached on the server and ready to go.

  • WordPress: Some kind of caching plugin, e.g. WP-Optimize + Jetpack
  • CDN: Cloudflare

While setting this up I also found out that there is Redis object caching for WordPress, but it seems like it’s only useful if the site makes a lot of database reads. Based on my gut feel, I doubt mine does, so I’m omitting Redis until the day this setup can’t cope anymore.
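If that day ever comes, wiring it in should be a small change: a Redis service next to the others, plus the host handed to WordPress. A hypothetical sketch, e.g. as a docker-compose.override.yml next to the main file (the service name is arbitrary; WP_REDIS_HOST is the constant the Redis Object Cache plugin reads):

services:
  redis:                                   # hypothetical addition, not in my current setup
    image: redis:7-alpine
    restart: unless-stopped

  wordpress:
    environment:
      # WORDPRESS_CONFIG_EXTRA is appended to wp-config.php by the official image
      WORDPRESS_CONFIG_EXTRA: |
        define('WP_REDIS_HOST', 'redis');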

All things considered, pretty good performance!

lighthouse report from chrome: 99 performance

Of course, I’ve no idea how this will perform under load, but given that there’s barely any dynamic content on this site, it’s unlikely that this setup will buckle under any typical load.

The blog is exposed to the internet via Cloudflare Tunnel, which saves me the usual hassle of securing the connection to my server with origin certificates.

illustration from: https://blog.cloudflare.com/getting-cloudflare-tunnels-to-connect-to-the-cloudflare-network-with-quic
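For anyone wanting to replicate this, the connector itself is just one more container in the stack; a rough sketch (the token comes from the Zero Trust dashboard, and the public hostname to service mapping is configured there too, so nothing here is specific to my setup):

  cloudflared:
    image: cloudflare/cloudflared:latest
    restart: unless-stopped
    command: tunnel --no-autoupdate run
    environment:
      TUNNEL_TOKEN: ${TUNNEL_TOKEN}        # keep the token in an .env file, never in the compose file
    # no published ports: it dials out to Cloudflare and proxies requests to services
    # on the same Docker network (e.g. http://wordpress:80)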

It’s secure and easy to set up; I’d recommend it to anyone who wants to host public services. There is one major caveat: Cloudflare would be able to see all traffic between your origin server and Cloudflare, so you have to trust Cloudflare. Honestly, it’s kind of inevitable that you have to place your trust in someone or something. Given their track record of transparency when there’s downtime or when shit hits the fan, they’ve earned my trust.

While I was aiming for zero downtime, there was unfortunately about 10 minutes of it.

the importance of uptime monitoring, which I’m planning to self-host in the near future

I had my new site up and running, and it was a simple DNS cutover. Unfortunately, I forgot to take DNS propagation time into account, and clients that still had the old record cached ended up unable to reach the site. To be honest I still don’t understand why it failed, because they should have kept seeing the old site and seamlessly switched over once the new DNS record kicked in. Let me know in the comments if you have any ideas!

Summary 📖

Thanks to the beauty of virtualisation, I’ve saved myself $104.62 USD over 4 years. If this mini PC server lasts anywhere near as long as that, it will have paid for itself plus interest (including the other services that it’s hosting).

Now, on to figuring out an automated backup solution…

Categories
Deployment DevOps Docker Learning Productivity

Miniflux: self-hosted RSS reader

In an attempt to stay more updated with what’s happening online, I recently started following the top stories on Hackernews via the Telegram channel. But I very quickly realized that checking news via Telegram is just not part of my routine.

What about RSS readers? I remember using Google Reader donkey’s years ago before it was abruptly shut down, and I never did get back to RSS readers after that; probably something to do with the trauma of suddenly losing all my news feeds without a good alternative.

In my search for something that just “works”, Dickson hooked me up again with another recommendation that does exactly what I ask for: works.

TL;DR: it’s a very simple and opinionated RSS reader that has a self-hosted option.

Setup

It was so simple that I got the Docker container up and running on my Synology NAS within minutes with just a docker-compose.yml file; the docs cover all the configurable parameters.
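For anyone who wants to try it, something along these lines is all it takes; the usernames, passwords and published port below are placeholders, and the Miniflux docs list every other environment variable you can set:

services:
  miniflux:
    image: miniflux/miniflux:latest
    restart: unless-stopped
    ports:
      - "8080:8080"                        # Miniflux web UI
    depends_on:
      - miniflux-db
    environment:
      DATABASE_URL: postgres://miniflux:changeme@miniflux-db/miniflux?sslmode=disable
      RUN_MIGRATIONS: "1"                  # apply schema migrations on start
      CREATE_ADMIN: "1"                    # create the admin account below on first run
      ADMIN_USERNAME: admin                # placeholder
      ADMIN_PASSWORD: changeme             # placeholder

  miniflux-db:
    image: postgres:15
    restart: unless-stopped
    environment:
      POSTGRES_USER: miniflux
      POSTGRES_PASSWORD: changeme          # placeholder
    volumes:
      - miniflux-db:/var/lib/postgresql/data

volumes:
  miniflux-db: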

Categories
DevOps Learning Weekly

Weekly: AWS DevOps Certified

At this point I’m not sure whether to call this a weekly any more, because I’m just haphazardly writing on a roughly weekly basis, but damn it, I’m going to keep this going.

I am pleased to say that I have finally passed my AWS DevOps Engineer – Professional certification! It was a lot of hard work; it was honestly harder than I expected, because most of the questions were situational and very AWS-specific in terms of CI/CD. I took this because I thought it would be easier than the Solutions Architect – Professional. But man, I was wrong.

This also means that I would probably be looking to pick up the CSAP cert when I have the time for it, perhaps at the end of the year.

It has been a long time since I’ve studied so hard for something, and it was really helpful not just for the exam: I realized that there were a lot of tools/services I could have used for my current team that we weren’t using yet. I think we are very capable of designing functional services, but there’s still a gap in change management and having full visibility over everything. I’m planning to apply some of the things I’ve learnt in my team, because it brings us one step closer to having DevOps as a culture.

Categories
DevOps Learning Weekly

Weekly: Microsoft Azure

Took an online introductory course (Udemy) on Microsoft Azure AZ-900 because, lo and behold, my team has chosen the Azure platform for our translation services (will write more about this next time).

As someone who has been 99.99% working on the AWS platform and Linux systems in general, Azure feels pretty foreign because most of the concepts seem to tie into the Windows systems more so than anything else.

  • Access control? Active Directory
  • RBAC? Active Directory
  • Networking? Virtual networks
  • Pricing? Subscriptions
  • Compliance? Almost everything under the roof

The main difference I find between AWS and Azure is this: AWS is a loose collection of services that are “grouped” through networking, while Azure is a logical collection of services that are “grouped” into “folders” of resources.

Categories
Deployment DevOps Learning Weekly

Weekly: Migration

The past week has been extremely exciting and nerve-wracking. My team has finally completed the migration from on-premise to the cloud. It’s the first time that I’ve done anything like this, and I’m blessed to have someone senior to lead us through the migration period.

PS: I wrote this but forgot to post it, so this actually happened 2-3 weeks ago.

I’m a part of the MyCareersFutureSG team, so our users are the working population of Singapore, and we host hundreds of thousands of job postings, so there were definitely some challenges in migrating the data.

It’s the first time that I’ve handled such huge amounts of data when migrating across platforms, and the validation and verification process is really scary, especially when we couldn’t get the two checksums to match. It’s also the first time that I’ve done multiple Kubernetes cluster base image upgrade rollovers. There were multiple occasions where we were scared that the cluster would completely crash, but it managed to survive the transition.

Let me sum up the things I’ve learnt over the migration.

  • When faced with large amounts of data, divide and conquer. Split the data into smaller subsets so that you have enough resources to compute.
  • When rolling nodes, having two separate auto scaling groups allows you to test the new image before rolling every single node.
  • If you want to tweak the ASG itself, detach all the nodes first so that you have an “unmanaged” cluster; then no matter what you do to the existing ASG, your cluster will at least stay up.
  • When your database tells you that the checksums don’t match, make sure that when you dump the data it’s in the right collation or encoding format.
  • Point your error pages at a static provider like S3, because if you point them at some live resource, there’s a chance a misconfiguration will show an ugly 503 message (something that happened briefly for us).
  • Data under 100GB is somewhat reasonable to migrate over the internet these days.
  • Running checksum hashes on thousands and thousands of files is quite computationally and memory intensive; provision enough resources for it.

Overall, the migration went over quite well and we completed it ahead of time. Of course, the testing afterwards is where we found bugs that we had never found before, because it’s the first time in years that so many eyes were on the system at the same time.

The smoothness is also thanks to the team, who carefully planned the steps required to migrate the data over and set up streaming backups to the new infrastructure, so that half of the data was already in place and we just needed to verify that the streamed data was bit-perfect.

Since it’s been a couple of weeks since this happened, I realize that I am lucky to be blessed with the opportunity to do something like this. I’ve just caught up with my friends, and most of the time their job scopes don’t really allow them to do something that far out of scope. Which, depending on your stage of life, could be viewed as a pro or a con. I’m definitely viewing this 4-day migration effort over a public holiday weekend as a positive, because it’s something not everyone gets to experience so early in their career!

Categories
Development DevOps Weekly

Weekly: building CICD pipelines

The past week has been spent trying to build a centralized Gitlab CICD repository for all services to bootstrap and standardize on.

I’m happy to announce that it has been open sourced! https://gitlab.com/mycf.sg/central-cicd

What’s a centralized CI? It’s basically a template repository for CI pipelines. In this case, it’s for Gitlab because I’m familiar with it and it’s what I’m working with day in, day out.
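As a rough illustration of how a downstream service would consume it, Gitlab’s include keyword pulls the shared template in, and local jobs extend what it defines. The template path, tag and hidden job name below are made up for the example:

# .gitlab-ci.yml in a downstream service
include:
  - project: 'mycf.sg/central-cicd'
    ref: 'v1.0.3'                          # pin to a released tag of the central repo (illustrative)
    file: '/templates/docker.yml'          # hypothetical template path

build-image:
  extends: .docker-build                   # hypothetical hidden job defined in the template
  variables:
    IMAGE_NAME: my-service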

This idea started with my previous project team, but it is slowly maturing as I figure out the various cases where it might be used/useful and tweak it accordingly. What it has currently is more of an MVP and a POC that it can be used across various projects on Gitlab. You can tell because the versioning currently only supports patch bumps and not minor/major bumps; it has something to do with how my current team does versioning, but it’s at the top of my list of things to improve.

Currently there are 4 repositories relying on the CCI, 2 of which are external but still within my control. Features will be added to it incrementally, and I hope this could really be something that helps people reduce the time and complexity of building pipelines.

Categories
DevOps Keyboard Learning Weekly

Weekly: AWS and Keyboards

As I am helping another team part-time to set up some infra on AWS, I felt my fundamental AWS knowledge being tested all over again. I’ve gotten so used to doing the more “tricky/complex” things that, when starting fresh, I got tripped up by some basic setup.

  • An internet-facing ELB must have a public subnet associated (see the sketch after this list)
  • As long as each AZ has a public subnet associated, the ELB will be able to route to that AZ
  • Public subnets must route through an IGW; a NAT doesn’t count
  • A NAT instance must be created in a subnet that has an IGW route
  • The ELB does not need to be in the same subnet as the Target Group to route to it
  • The ELB needs at least a /27 subnet
  • The ELB reserves 8 IPs in the subnet for scaling
  • NLBs do not load balance cross-zone by default
  • ALBs load balance cross-zone by default
  • The smallest subnet in AWS is /28
  • OpenVPN Access Server needs an EIP
  • OpenVPN Access Server needs to be set up through SSH first
While I wasn’t the one who set up the bulk of the networking, I wasn’t able to quickly pinpoint the exact reason why I couldn’t get connectivity for the VPN I was setting up. It just proves that there are some fundamental concepts I need to brush up on.

In happier news, I finally bought/received the lube for my future keyboard. Over the weekend I decided to try lubing my current Filco TKL keyboard without disassembly to see how it works/feels.

Categories
DevOps Learning Thoughts Weekly

Weekly: It has been a week?

The past week has been pretty hectic, switching between dev and ops roles. Helping out with other projects till 2-3am every day has really taken its toll, and I feel old.

Unsurprisingly, I haven’t been able to really work on any of my own projects but I did learn something interesting that I wish to write about.

I’ve recently been facing an issue with a Gitlab CI pipeline, where I want to run integration/regression tests on the latest Docker build. However, since each image is meant to be production-ready, it runs as a non-root user, which restricts what the user can do when the container starts. Here’s why this problem has caused me such a headache.

Beware, below is really more of a rant about the troubles I faced.