Chapter 04

"That story about a lightweight video player"

This story took place at a large online video streaming provider.

It probably does not surprise you, but it turns out that hosting millions of videos online for billions of people is hard work. Every megabyte that you can shave off can matter, especially in areas of the world where the internet connection is poor. There is a lot of work involved in just keeping the platform alive, let alone making big platform changes. That is why a fair amount of planning took place whenever a new feature was needed.

At first, when the platform wasn't that popular yet, the focus of the engineers was to be able to handle good quality videos for those with a solid internet connection. Especially back in the early days, this was the target audience to focus on because you needed these speeds in order to properly benefit from the service. Not just that, but internet speeds were getting faster in most places, so part of the strategy was to grow along with those rising speeds.

Eventually though, this userbase became saturated and growth started to stagnate. The group had to figure out a new way to attract users, and at that time the only way to grow more was to invest in regions with low bandwidth. That meant a new project had to get underway, to see if the video page could be made more lightweight.

Before starting on the effort, folks got together and started talking about expectations. A lot of time and energy needed to be invested, so management started thinking about success criteria. They settled on the average page load time. If the page could load quicker, then that would be an indicator of a win. On the other hand, if page load times were to go up, then the project should halt and a new approach to the problem had to be considered.

To understand the scale of the effort, it helps to appreciate the different kinds of skillsets that were involved here. To pull this off, you needed folks who understood compression algorithms for video, but also experts in mobile networking, embedded hardware, browser implementations, mobile device programming and web technology. This wasn't a small group; it honestly took a large chunk of the entire platform team. It was a big bet, as it pretty much halted many other scaling efforts. Part of the reason was that the problem here wasn't just that these new regions had low internet speeds, but also that users in these regions would typically have low-power hardware that couldn't handle a lot of compute.

A few months passed and eventually the team was ready to do the first major rollout. They had already run some standard tests on small samples, but to measure the real effect they needed to roll it out to entire countries. They crossed their fingers, hit the launch button and started tracking.

This is where something counterintuitive took place. It turned out that the average page load time went down initially, but started skyrocketing two weeks later! It triggered a small panic among the engineers because it seemed like there must have been a mistake in the code. However, no update had been made in those two weeks. Maybe it was the way that they were tracking their users?

It made a lot of people nervous because they had placed a big bet on the project. Thankfully, while the team had a success metric, nobody's bonus was dependent on it. They could just take the time to look at the big picture to see what was happening instead of getting a managerial rugpull.

So what was happening here?

When you make a page load significantly faster, especially if you are in a region that could not handle the bandwidth before, then you will initially speed things up for your existing users. But, as a consequence, these people might tell their friends. It is not hard to see that it could then start to attract folks to your service who otherwise would not have been able to use it. That means that you can expect new users with even worse hardware than you were used to before.

All in all, this is an amazing thing! You would attract first time users on a massive scale by doing this. But as a result, the average page load time can go up! By having a platform be friendly to low-powered hardware, you might also get more users with that hardware.
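
To make the arithmetic behind this concrete, here is a small sketch in Python. All the numbers in it are made up for illustration; they are assumptions and not figures from the actual rollout. It only shows how an average can get worse even though nobody's individual experience got worse.

```python
# Hypothetical numbers, chosen only to illustrate the effect.
# Each group is a pair: (number of users, page load time in seconds).

def average_load_time(groups):
    """Return the user-weighted average page load time in seconds."""
    total_users = sum(users for users, _ in groups)
    total_seconds = sum(users * seconds for users, seconds in groups)
    return total_seconds / total_users

# Before the rollout: only the existing users, on the heavier page.
before = [(1000, 8.0)]

# Right after the rollout: the same users, now on the lighter page.
after_launch = [(1000, 4.0)]

# Two weeks later: the existing users are still at 4 seconds, but they are
# joined by newcomers on weaker hardware who could not use the old page at all.
two_weeks_later = [(1000, 4.0), (3000, 12.0)]

print(average_load_time(before))           # 8.0
print(average_load_time(after_launch))     # 4.0
print(average_load_time(two_weeks_later))  # 10.0
```

In this made-up scenario every existing user still loads the page twice as fast as before, and the newcomers went from not being able to use the service to being able to use it. The average still ends up higher than where it started, purely because the mix of users changed.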

The world, it turns out, is a moving target. Especially when you make an improvement.


Appendix

This was a story about a page load, but it is also a cautionary tale.

What would have happened if the metric was strictly tied to team success? Would the effort have been yanked? Would they have flipped the switch back to the slower page? Would they have had the patience to figure out what was really going on? Or would they have moved people and resources to other efforts?

It is hard to say these things in hindsight, but you might be able to imagine how things might have gone wrong if the managerial style had been overly metric-driven. They might've missed the bigger picture and as a result they would've left a lot of users out of their ecosystem.