Introduction
About Me
Hi! I’m 88888888_kota, a 2024 new graduate who joined Money Forward as an engineer in April 2024, and I currently work as a Site Reliability Engineer (SRE). Now that a year has passed since I joined, I’d like to look back on it.
About My Team
My SRE team is primarily responsible for the availability and reliability of the following B2B products:
- Money Forward Cloud Accounting
- Money Forward Cloud Tax Filing
- Money Forward Cloud Invoicing
- Money Forward Cloud Partner
- STREAMED
Because each product is built as a set of microservices, we’re responsible for 10+ services in total.
The team currently has 5 members from diverse backgrounds (Taiwan: 1, Vietnam: 2, Indonesia: 1, Japan: 1).
Below, I’ll describe the most memorable projects. For the first six months (placement through November), please refer to my 6-month retrospective.
Work Overview
Here are the major projects I worked on this year. If you’re interested in SRE work, this section should be particularly helpful.
Kubernetes Custom Controller Development
Overview
- Improved credential management for securely accessing AWS resources from on-premise Kubernetes clusters. Specifically, I combined EKS Pod Identity with IAM Roles Anywhere to provide secure credential delivery.
Period
- May 2024 – August 2024
Details
- I handled the entire development solo.
- During the Proof of Concept (PoC) phase, I used kubebuilder to build a minimal implementation and verify feasibility. After team discussions and several rounds of iteration, we deployed it to production. For the production version, I forked EKS Pod Identity and customized it rather than building everything from scratch, which improved quality.
Key Points
- Reading the source code of aws-sdk-go-v2 and EKS Pod Identity was challenging, but two realizations helped me get through it:
  - aws-sdk-go-v2 is largely generated by Smithy, so once I recognized the generated patterns, the code became much easier to follow.
  - For EKS Pod Identity, understanding how Kubernetes Admission Controllers work was the key to unlocking the source code.
- This improvement eliminated the need for manual credential rotation, significantly reducing operational overhead.
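The admission-controller side can be sketched in the same way. A mutating admission webhook receives a Pod and responds with an RFC 6902 JSON Patch. The sketch below builds that patch so that every container gets the two credential variables. It makes several simplifications:

- It uses simplified, hypothetical stand-ins for the Kubernetes Pod types instead of `k8s.io/api`.
- The values are the ones I believe EKS Pod Identity injects.
- It omits the projected service account token volume that a real webhook also mounts.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified, hypothetical stand-ins for the corev1 Pod types.
type EnvVar struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

type Container struct {
	Name string   `json:"name"`
	Env  []EnvVar `json:"env,omitempty"`
}

// PatchOp is one RFC 6902 JSON Patch operation, the format a mutating
// admission webhook returns to the API server.
type PatchOp struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value any    `json:"value"`
}

// credentialEnv lists the variables read by the SDK's container credentials provider.
var credentialEnv = []EnvVar{
	{Name: "AWS_CONTAINER_CREDENTIALS_FULL_URI", Value: "http://169.254.170.23/v1/credentials"},
	{Name: "AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE", Value: "/var/run/secrets/pods.eks.amazonaws.com/serviceaccount/eks-pod-identity-token"},
}

// hasEnv reports whether the container already defines the named variable.
func hasEnv(c Container, name string) bool {
	for _, e := range c.Env {
		if e.Name == name {
			return true
		}
	}
	return false
}

// buildEnvPatch returns patch operations that add the credential variables
// to every container that does not already define them.
func buildEnvPatch(containers []Container) []PatchOp {
	var ops []PatchOp
	for i, c := range containers {
		var missing []EnvVar
		for _, e := range credentialEnv {
			if !hasEnv(c, e.Name) {
				missing = append(missing, e)
			}
		}
		if len(missing) == 0 {
			continue
		}
		if len(c.Env) == 0 {
			// No env array yet: create it with a single "add".
			ops = append(ops, PatchOp{Op: "add", Path: fmt.Sprintf("/spec/containers/%d/env", i), Value: missing})
			continue
		}
		for _, e := range missing {
			// "/-" appends to the end of an existing array (RFC 6902).
			ops = append(ops, PatchOp{Op: "add", Path: fmt.Sprintf("/spec/containers/%d/env/-", i), Value: e})
		}
	}
	return ops
}

func main() {
	pod := []Container{{Name: "app"}, {Name: "sidecar", Env: []EnvVar{{Name: "LOG_LEVEL", Value: "info"}}}}
	patch, err := json.MarshalIndent(buildEnvPatch(pod), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```

Skipping containers that already define the variables keeps the patch idempotent. This matters because the API server may invoke a mutating webhook more than once for the same object.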
Tech Stack
- ArgoCD, AWS, Docker, GitHub Actions, Go, Kubernetes, Terraform
Side Note
Before my team placement, I had a 1-on-1 with my mentor about the kind of work I wanted to do, and they matched my tasks to my career goals:
- Mentor: “What career path are you envisioning?”
- Me: “I want to shift from backend toward infrastructure. I want to work with Kubernetes and AWS. And I want to go deep technically!”
- Mentor: “Then these tasks might be a good fit.”
That’s how my very first “job” as a working adult ended up being Kubernetes Custom Controller development.
On-Premise Kubernetes to EKS Migration
Overview
- Participated in a project to migrate a service from on-premise Kubernetes to EKS.
Period
- September 2024 – November 2024
Details
- As a project member, I handled various tasks including building CI/CD pipelines with GitHub Actions and CircleCI, conducting network testing between dependent microservices, and improving monitoring.
Key Points
- Mapping out service dependencies was tough.
  - The architecture diagram had gaps, so I interviewed developers directly. This hands-on approach ultimately got the job done.
- This was a cross-border project with high communication overhead, but it was a valuable experience.
Tech Stack
- ArgoCD, AWS, CircleCI, Datadog, GitHub Actions, Kubernetes, Ruby, Terraform
CI/CD Optimization
Overview
- Optimized the Terraform CI/CD pipeline, cutting its cost by more than 90% and its execution time by more than 80%.
Period
- November 2024 – December 2024
Details
- We manage our services’ AWS resources with Terraform. Nearly all services’ Terraform code lives in a single repository, with a separate Terraform workspace per service, and each workspace may run a different Terraform version.
- Before the optimization, changing the Terraform code of one workspace triggered lint and formatting checks across all workspaces. After the optimization, lint and fmt run only on the affected workspace.
Key Points
- I was fortunate to take on a high-impact task that nobody had gotten around to.
- I discovered the issue through a casual Slack conversation and took the initiative to solve it.
- It was an ideal flow of “discover the problem → solve it efficiently → create significant impact.”
Tech Stack
- AWS, GitHub Actions, Terraform
Aurora MySQL v2 (MySQL 5.7) to v3 (MySQL 8.0) Upgrade
Overview
- Performed an Aurora MySQL v2 (MySQL 5.7) → v3 (MySQL 8.0) upgrade for a service.
- Covered all environments (dev, staging, production) using Blue/Green deployment.
Period
- December 2024 – January 2025
Key Points
- Amazon RDS Extended Support for Aurora MySQL version 2 was incurring a cost of $0.12 per vCPU-hr.
- We had a one-month window to finish the upgrade before the busy season. Missing it would have pushed the upgrade past the busy season, and Extended Support charges would have accumulated significantly in the meantime.
- An unexpected error occurred during the first upgrade attempt:
  - In a 2-instance Aurora cluster, a failover swapped the Writer and Reader. The application then tried to write to the Reader (the former Writer), causing errors.
- After we investigated the root cause, the second attempt succeeded, and we completed the upgrade across all environments within the planned one-month timeframe.
Tech Stack
- AWS, Aurora MySQL, Terraform
Interrupt Tasks
Overview
- Beyond the projects above, we handle daily interrupt tasks (PR reviews, requests from other teams, etc.) on a rotation basis.
Period
- May 2024 – Present
Details
- Triaging interrupt tasks by priority and handling them as they come in.
Key Points
- The variety of requests was initially overwhelming, and I often felt anxious about my lack of technical depth.
- With support from team members, I gradually expanded what I could handle.
- After one year, the range of requests I can confidently handle has grown dramatically compared to six months ago.
Tech Stack
- Ansible, ArgoCD, AWS, CircleCI, Datadog, Docker, GitHub Actions, Go, Kafka, Kubernetes, Linux, MySQL, Ruby, Terraform






