Categories
DevOps Learning Weekly

Weekly: AWS DevOps Certified

At this point I’m not sure whether to call this a weekly any more, because I’m only writing roughly once a week, but damn it, I’m going to keep it going.

I am pleased to say that I have finally passed my AWS DevOps Engineer – Professional certification! It was a lot of hard work, honestly harder than I expected, because most of the questions were situational and very AWS-specific in terms of CICD. To be honest, I took this one because I thought it would be easier than the Solutions Architect – Professional. But man, was I wrong.

This also means I’ll probably be looking to pick up the Solutions Architect – Professional (CSAP) cert when I have the time, perhaps at the end of the year.

It has been a long time since I’ve studied so hard for something, and it was really helpful beyond the exam: I realized there were a lot of tools and services my current team could be using but isn’t yet. I think we are very capable at designing functional services, but we still have gaps in change management and in having full visibility over everything. I’m planning to apply some of what I’ve learnt in my team, because it brings us one step closer to having DevOps as a culture.

Categories
DevOps Learning Weekly

Weekly: Microsoft Azure

Took an online introductory course (Udemy) on Microsoft Azure AZ-900 because, lo and behold, my team has chosen the Azure platform for our translation services (will write more about this next time).

As someone who has worked 99.99% on AWS and Linux systems in general, I find Azure pretty foreign, because most of its concepts seem to tie into Windows systems more than anything else.

  • Access control? Active Directory
  • RBAC? Active Directory
  • Networking? Virtual networks
  • Pricing? Subscriptions
  • Compliance? Almost everything under one roof

The main difference I find between AWS and Azure is that AWS is a loose collection of services “grouped” through networking, whereas Azure is a logical collection of services “grouped” into “folders” of resources (resource groups).

Categories
Deployment DevOps Learning Weekly

Weekly: Migration

The past week has been extremely exciting and nerve-wracking. My team has finally completed the migration from on-premises to the cloud. It’s the first time I’ve done anything like this, and I’m blessed to have someone senior to lead us through the migration period.

PS: I wrote this but forgot to post it, so this actually happened 2-3 weeks ago.

I’m part of the MyCareersFutureSG team, so our users are the working population of Singapore, and we host hundreds of thousands of job postings, so there were definitely some challenges in migrating the data.

It’s the first time I’ve handled such huge amounts of data in a cross-platform migration, and the validation and verification process was really scary, especially when we couldn’t get the two checksums to match. It’s also the first time I’ve rolled Kubernetes cluster nodes over to upgraded base images multiple times. There were several occasions where we were scared the cluster would completely crash, but it managed to survive the transition.

Let me sum up the things I’ve learnt over the migration.

  • When faced with large amounts of data, divide and conquer. Split the data into smaller subsets so that you have enough resources to process each one.
  • When rolling nodes, having two separate Auto Scaling groups (ASGs) lets you test the new image before rolling every single node.
  • If you want to tweak the ASG itself, detach all the nodes first so that you have an “unmanaged” cluster; then no matter what you do to the existing ASG, at least your cluster will stay up.
  • When your database tells you that the checksums don’t match, make sure that when you dump the data, it’s in the right collation or encoding format.
  • Point your error pages at a static provider like S3, because if you point them at a live resource, there’s a chance that a misconfiguration will show an ugly 503 message (something that happened briefly for us).
  • Less than 100GB of data is reasonable to migrate over the internet these days.
  • Running checksum hashes over thousands and thousands of files is CPU- and memory-intensive, so provision enough resources for it.
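For the ASG tip above, here’s a rough sketch of detaching every node with boto3 before touching the group. It assumes `client` is a boto3 Auto Scaling client (passed in so it can be stubbed for testing) and that DetachInstances accepts at most 20 instance IDs per call; the function name `detach_all_nodes` is made up for illustration. `ShouldDecrementDesiredCapacity=True` is what stops the ASG from launching replacements for the instances you detach.

```python
from typing import Any, List


def detach_all_nodes(client: Any, asg_name: str, batch_size: int = 20) -> List[str]:
    """Detach every instance from an Auto Scaling group.

    The EC2 instances (and the Kubernetes nodes running on them) keep
    running, but the ASG no longer manages them, so changes to the group
    cannot terminate them. `client` is assumed to be
    boto3.client("autoscaling") or a compatible stub.
    """
    resp = client.describe_auto_scaling_groups(AutoScalingGroupNames=[asg_name])
    groups = resp["AutoScalingGroups"]
    if not groups:
        raise ValueError(f"Auto Scaling group {asg_name!r} not found")
    instance_ids = [inst["InstanceId"] for inst in groups[0]["Instances"]]

    # Assumed per-call limit of 20 IDs, so detach in slices.
    for start in range(0, len(instance_ids), batch_size):
        client.detach_instances(
            InstanceIds=instance_ids[start:start + batch_size],
            AutoScalingGroupName=asg_name,
            # Lower desired capacity so the ASG doesn't launch replacement nodes.
            ShouldDecrementDesiredCapacity=True,
        )
    return instance_ids
```

Usage would look something like `detach_all_nodes(boto3.client("autoscaling"), "k8s-nodes-old")`, where the group name is hypothetical. Once the nodes are detached, you can edit or replace the group without the cluster going down with it.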

Overall, the migration went quite well and we completed it ahead of schedule. Of course, the testing afterwards is where we found bugs we had never found before, because it was the first time in years that so many eyes were on the system at the same time.

The smoothness is also thanks to the team, who carefully planned the steps required to migrate the data and set up streaming backups to the new infrastructure, so that half of the data was already in place and we just needed to verify that the streamed data was bit perfect.
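For the “bit perfect” check, here’s a minimal Python sketch of comparing both sides with checksum manifests, applying the divide-and-conquer and resource tips from the list above: each file is hashed in 1 MiB chunks so memory use stays flat, and the file list is processed in batches. The function names, chunk size, batch size and worker count are my own assumptions for illustration, not what our team actually ran.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, Iterator, List

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time instead of loading whole files


def sha256_file(path: Path) -> str:
    """Hash a file by streaming it in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def batches(items: List[Path], size: int) -> Iterator[List[Path]]:
    """Divide and conquer: yield fixed-size slices of the file list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_manifest(root: Path, batch_size: int = 1000, workers: int = 4) -> Dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-256 digest."""
    files = sorted(p for p in root.rglob("*") if p.is_file())
    manifest: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in batches(files, batch_size):
            # pool.map preserves input order, so zip pairs each path with its digest.
            for path, digest in zip(batch, pool.map(sha256_file, batch)):
                manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def diff_manifests(source: Dict[str, str], target: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files missing on either side and files whose digests differ."""
    return {
        "missing_in_target": sorted(source.keys() - target.keys()),
        "extra_in_target": sorted(target.keys() - source.keys()),
        "mismatched": sorted(k for k in source.keys() & target.keys() if source[k] != target[k]),
    }
```

An empty report on all three keys means the two sides match byte for byte. If the same text was dumped as UTF-8 on one side and Latin-1 on the other, the file lands under `mismatched` even though it reads identically, which is exactly the encoding/collation trap from the list.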

It’s been a couple of weeks since this happened, and I realize how lucky I am to have had the opportunity to do something like this. I’ve just caught up with my friends, and most of the time their job scopes don’t really allow them to do anything that far outside their usual work. Which… depending on your stage of life, could be viewed as a pro or a con. I’m definitely viewing this 4-day migration effort over a public holiday weekend as a positive, because it’s something not everyone gets to experience so early in their career!

Categories
Development DevOps Weekly

Weekly: building CICD pipelines

The past week has been spent building a centralized GitLab CICD repository for all services to bootstrap from and standardize on.

I’m happy to announce that it has been open sourced! https://gitlab.com/mycf.sg/central-cicd

What’s a centralized CI? It’s basically a template repository for CI pipelines. In this case, it’s for GitLab, because I’m familiar with it and it’s what I work with day in, day out.

This idea started with my previous project team, but it is slowly maturing as I figure out the various cases where it might be useful and tweak it accordingly. What exists now is more of an MVP, a proof of concept that it can be used across various projects on GitLab. You can tell because versioning currently only supports patch bumps, not minor/major bumps. That has to do with how my current team does versioning, but it’s at the top of my list of things to improve.
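Supporting minor/major bumps doesn’t need much logic on its own. Below is a hypothetical Python sketch (not what the repository currently does) that bumps a MAJOR.MINOR.PATCH tag, keeping an optional `v` prefix. The part to bump is picked from a made-up `[major]`/`[minor]` marker in the commit message and defaults to patch.

```python
import re

SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def bump(version: str, part: str = "patch") -> str:
    """Bump a MAJOR.MINOR.PATCH version string, keeping an optional 'v' prefix."""
    match = SEMVER.match(version)
    if not match:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(g) for g in match.groups())
    if part == "major":
        major, minor, patch = major + 1, 0, 0   # a major bump resets minor and patch
    elif part == "minor":
        minor, patch = minor + 1, 0             # a minor bump resets patch
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown part: {part!r}")
    prefix = "v" if version.startswith("v") else ""
    return f"{prefix}{major}.{minor}.{patch}"


def part_from_message(message: str) -> str:
    """Hypothetical convention: '[major]' or '[minor]' in the commit message picks the bump."""
    lowered = message.lower()
    if "[major]" in lowered:
        return "major"
    if "[minor]" in lowered:
        return "minor"
    return "patch"
```

In a GitLab job, the message could come from the predefined `CI_COMMIT_MESSAGE` variable, e.g. `bump(latest_tag, part_from_message(os.environ["CI_COMMIT_MESSAGE"]))`, where `latest_tag` is whatever the pipeline reads from the repository’s tags.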

Currently there are 4 repositories relying on the central CI (CCI), 2 of which are external but still within my control. I’ll add features incrementally, and I hope it can really help people reduce the time and complexity of building pipelines.

Categories
DevOps Keyboard Learning Weekly

Weekly: AWS and Keyboards

As I’m helping another team part-time to set up some infra on AWS, I felt my fundamental AWS knowledge being tested all over again. I’ve gotten so used to doing the more “tricky/complex” things that when starting from scratch, I got tripped up by some basic setup.

  • An internet-facing ELB must have public subnets associated
  • As long as each AZ has a public subnet associated, the ELB will be able to route to that AZ
  • A public subnet must route to an internet gateway (IGW); a route through a NAT doesn’t count
  • A NAT instance must be created in a subnet that routes to an IGW
  • The ELB does not need to be in the same subnet as the targets in its Target Group to route to them
  • The ELB’s subnets need at least a /27 CIDR
  • The ELB needs at least 8 free IPs in each subnet so it can scale
  • NLB does not load balance cross-zone by default
  • ALB load balances cross-zone by default
  • The smallest subnet in AWS is a /28
  • OpenVPN Access Server needs an Elastic IP (EIP)
  • OpenVPN Access Server needs to be set up through SSH first
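The subnet-size rules are easy to sanity-check with Python’s `ipaddress` module. This sketch assumes AWS’s documented reservation of 5 addresses in every subnet (the first four and the last) and encodes the /27 and 8-free-IP load balancer rules from the list; the function names are made up for illustration.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # network, VPC router, DNS, future use, and the last address
ELB_MIN_FREE_IPS = 8


def usable_ips(cidr: str) -> int:
    """Addresses left for your own resources after AWS takes its 5 per subnet."""
    net = ipaddress.ip_network(cidr)
    if not 16 <= net.prefixlen <= 28:
        raise ValueError("AWS IPv4 subnets must be between /16 and /28")
    return net.num_addresses - AWS_RESERVED_PER_SUBNET


def can_host_elb(cidr: str, ips_in_use: int = 0) -> bool:
    """Check the /27-or-larger rule and the 8-free-IP rule for a load balancer subnet."""
    net = ipaddress.ip_network(cidr)
    return net.prefixlen <= 27 and usable_ips(cidr) - ips_in_use >= ELB_MIN_FREE_IPS
```

So a /28 leaves only 11 usable addresses and is ruled out by the /27 rule anyway, while a /27 leaves 27, which is enough as long as fewer than 20 of them are already taken.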

While I wasn’t the one who set up the bulk of the networking, I wasn’t able to quickly pinpoint why the VPN I was setting up had no connectivity. It just proves that there are some fundamental concepts I need to brush up on.
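One check that might narrow this kind of problem down is confirming which route table a subnet actually uses. Here’s a rough Python sketch, assuming the route tables come from EC2 `describe_route_tables` (e.g. via boto3) for a single VPC and use its `Associations`/`Routes` keys. A subnet without an explicit association falls back to the VPC’s main route table, and here it only counts as public if its default route goes to an `igw-` gateway. The function names are hypothetical.

```python
from typing import Any, Dict, List


def route_table_for(subnet_id: str, route_tables: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the table explicitly associated with the subnet, else the VPC's main table."""
    main = None
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("SubnetId") == subnet_id:
                return table  # an explicit association wins over the main table
            if assoc.get("Main"):
                main = table
    if main is None:
        raise LookupError(f"no route table found for {subnet_id}")
    return main


def is_public_subnet(subnet_id: str, route_tables: List[Dict[str, Any]]) -> bool:
    """Public means the default route targets an internet gateway; a NAT route does not count."""
    table = route_table_for(subnet_id, route_tables)
    return any(
        route.get("DestinationCidrBlock") == "0.0.0.0/0"
        and route.get("GatewayId", "").startswith("igw-")
        for route in table.get("Routes", [])
    )
```

If the subnet holding the VPN server or a NAT instance comes back as not public, that alone would explain the missing connectivity.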

On happier news, I finally bought and received the lube for my future keyboard. Over the weekend I decided to try lubing my current Filco TKL keyboard without disassembling it to see how it works and feels.

Categories
DevOps Learning Thoughts Weekly

Weekly: It has been a week?

The past week has been pretty hectic, switching between dev and ops roles. Helping out with other projects until 2-3am every day has really taken its toll, and I feel old.

Unsurprisingly, I haven’t really been able to work on any of my own projects, but I did learn something interesting that I want to write about.

I recently faced an issue in a GitLab CI pipeline where I wanted to run integration/regression tests against the latest Docker build. However, since each image is meant to be production-ready, it runs as a non-root user, which restricts what the user can do when the container starts. Here’s why this problem caused me such a headache.

Beware, below is really more of a rant about the troubles I faced.