Ride the Wave
I posted this picture to a Slack channel at work a few weeks ago:
It got attention (as in, Slack reactions) from several people in the company, including the CEO. My boss, picking up on the momentum, suggested I talk about the effort at the next team presentation to management. I was surprised: I post a lot on Slack and rarely get that kind of response.
What is the image of? The time to load a key page on our website. In theory, an improvement of this magnitude should have a meaningful impact on the business's bottom line. So how did I accomplish this sizeable feat?
Engineering initiatives
My company follows a fairly standard prioritization process. Product managers maintain a ticket queue, from which engineers pull their next task or project.
What makes my company a little different is the autonomy given to engineers. We're encouraged to exercise our judgment on tickets in the queue. If the effort is under-estimated and the payoff over-estimated, a ticket can be passed back to the product managers for reconsideration.
Engineers are also in charge of the more technical projects: think typical maintenance tasks like routine package upgrades. The lines blur for technical initiatives that may have a business payoff but don't really fall within our product managers' realm.
One good example is site reliability and performance. The only time a product manager has expressed interest in these topics is when there's a problem, i.e. the website is crashing or is so slow that it harms our operations.
Exploratory coding
I like to tinker and poke around in the code base. Usually, I'm on the hunt for security vulnerabilities. On this occasion I was investigating the website performance of a particular page. I became intrigued by the slow performance, and eagerly tore the code apart, looking for the culprit.
This type of work is important, yet under-appreciated and under-discussed. Engineers need time to let their minds wander, to figure out how that one system works, to prove to themselves that the bug fix from last year still works, and to try to prove their coworker wrong about that argument in the pull request (hey, not all motivation is pure at heart!).
I diagnosed an obvious performance regression on this page: the caching implementation was broken, and the page was also doing unnecessary data processing. The fix took me about an hour, plus another hour to double-check my code.
Socializing wins
My fix was released at the end of the week. The next week I checked to see whether it had made a meaningful impact. I should note that one coworker who tested the fix claimed there was no performance improvement on his machine: a classic eye-twitch-inducing comment!
To verify the site performance, I had to jump through hoops to download production logs, and I even installed new software to generate the graph. I'll come back to this.
As I mature in my career, I am putting more effort into the social aspect of my work. As an engineer, it's easy for me to respond to a problem by writing code and creating a pull request. It's not easy for me to go through the trouble of proposing an idea to the team, building consensus, delegating, and then following through and verifying results. It's much easier to create a pull request, even if it never gets merged.
I took this small opportunity to post about the performance win in a public Slack channel. I expected some kudos from my boss and maybe my fellow engineers. Catching the CEO's eye was welcome, and it opened up another opportunity!
Ride the Wave
My team has had its current logging setup for years, since before I joined. Its functionality is limited, and it isn't available to all engineers or to the product managers. Doing anything remotely interesting requires going through its HTTP API. That is how I generated the graph above: I wrote a script to download the logs and convert them to a friendlier CSV format, uploaded the CSV to my local Apache Superset instance, and was then able to plot web metrics over time. Talk about barriers to entry!
I believe in setting people up for success. When it's this challenging to analyze production performance metrics, engineers just aren't going to do it. So I seized on the CEO's and my boss's interest in performance and advocated for integrating a better logging setup. The new tooling would be available to everyone on the team and would make it easy to explore and investigate site performance, along with the other metrics we log. The cost is only negligibly higher than that of the current product. Win-win!
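The log-to-CSV step of that script might look roughly like the minimal Python sketch below. It is only an illustration under stated assumptions. The post doesn't describe the logging API or its schema, so this assumes, hypothetically, that the API returns JSON-lines records with `timestamp`, `path`, and `duration_ms` fields. The download step is left out, and the sample records are made up.

```python
import csv
import io
import json
from typing import Iterable

# Hypothetical column names; the real log schema is not described in the post.
FIELDS = ["timestamp", "path", "duration_ms"]


def logs_to_csv(json_lines: Iterable[str]) -> str:
    """Convert JSON-lines log records into CSV text with a header row.

    Blank lines are skipped, fields not listed in FIELDS are dropped,
    and listed fields missing from a record are written as empty cells.
    """
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=FIELDS,
        extrasaction="ignore",  # silently drop keys not in FIELDS
        lineterminator="\n",
    )
    writer.writeheader()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        writer.writerow(json.loads(line))
    return out.getvalue()


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        '{"timestamp": "2024-01-01T00:00:00Z", "path": "/key-page", "duration_ms": 1500, "status": 200}',
        "",
        '{"timestamp": "2024-01-08T00:00:00Z", "path": "/key-page", "duration_ms": 300}',
    ]
    print(logs_to_csv(sample), end="")
```

Fixing the columns up front with `csv.DictWriter` means records with extra or missing fields still produce a rectangular file. That tends to make importing the CSV into a tool like Superset less fussy.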
I am pleased to say that work has begun on integrating the new logging tooling. I am proud of myself for identifying and seizing the opportunity to make this improvement for my team and the business. I did not anticipate working on logging tooling improvements when I was investigating a performance regression, nor when I was showcasing the results.
When the wave comes, ride it and see where it takes you.