So You Want to Make a Change: Patching & Configuration in Live Games

Imagine you and I recently released the first version of our new game. What a herculean effort that was! There was endless QA, long packaging steps, not to mention the arcane rules of our store of choice. We want to keep expanding and improving our game, but it was so much work to ship once; how are we going to do that all the time!? Having tools that make live changes faster and easier will be crucial, and there isn't a one-size-fits-all solution. Let's discuss our options.

Binary Patching - The Old Standby

If we’re making a PC game for Steam circa 2010, we’d make live changes pretty infrequently. Usually to A) patch a critical bug, B) fix egregious imbalances, or C) release an expansion pack/DLC. While not particularly expedient, it’s clear how it is done on Steam, and it looks a lot like what we just did for release. We rebuild our game, upload it to Steam, push publish, and wait for Valve to release it. At Gamebreaking we also require a rigorous and somewhat time-consuming QA pass on these changes before release. That's because this method allows us to update any file in the entire game including the most dangerous ones: the binary files or code. Because of that, we call this method “binary patching.” All these requirements make the change-to-live time on a binary patch a day or so, and most studios only do this every week or two.

The number of reasons for changing a live game has multiplied since 2010. There are time-based events, skins, and A/B testing, just to name a few. The frequency with which we make live changes has increased dramatically as well. These features gave rise to the growing practice of live ops in games. Live changes are a key component of a live ops strategy.

Live Changes - A Modern Approach

With this new demand should come a new way of thinking about live changes. Some changes are small and safe, like changing the damage value for a sword or the price of a hat. Some changes are large and dangerous, like changing the way physics works in our game. Sometimes our changes need to happen at a specific time, like giving gifts on Christmas morning. Others could wait a week, like fixing an off-center UI element. The philosophy that we adopt must adapt to these circumstances:

Safe, small things should change easily and frequently, while dangerous, large things change slowly and endure more rigorous review.

Our tools and processes must support that, but most teams rely solely on tools the platform provides. Generally, this leaves us right back in 2010, shoehorning changes of all shapes and sizes into the slowest, most disruptive, and most dangerous process. With limited resources, we may not have a choice. If we want to have a truly living game though, we'll want to change it more frequently, more safely, and more on our own terms than Apple or Sony support through their systems.

Thankfully, since 2010, we’ve come up with new ways to deliver changes to production! Here are the three categories of live changes I think every developer should be familiar with and the terms we use to describe them at Gamebreaking.

Ideal Patch Categorization

Content Patching

Aside from delivering an entirely new version of the game, many live games have a smaller patch that happens right as you open the game. You’ve probably seen this when opening your favorite mobile game. Many mobile games have daily or weekly content updates that download when you open the game. These content patches were originally used to circumvent lengthy human reviews in Apple’s App Store. However, neither Apple nor Google allows downloading and executing code outside your store-delivered app bundle (see section 2.5.2 of the App Store Review Guidelines). As a result, content patches add content without changing code, hence the name.

Startup-time content patches are useful outside of mobile too. Content patches can safely add new skins or enable in-game events. Another benefit to content patching is preventing users from data-mining content from your game ahead of a timed release. This is especially useful for surprise events, whose content could be leaked by enterprising fans like League of Legends fan site surrenderat20. Content patching often requires a small bundling process (like Unity’s AssetBundles) and perhaps an upload to a CDN or online file store.
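To make the startup-time check concrete, here is a minimal sketch of how a client might decide which bundles to download. It assumes a manifest that maps bundle names to SHA-256 hashes; the manifest shape, the bundle names, and the function names are all hypothetical and not taken from Unity or any particular CDN.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Hex digest used to identify one version of a bundle's contents."""
    return hashlib.sha256(data).hexdigest()


def bundles_to_download(local: Dict[str, str], remote: Dict[str, str]) -> List[str]:
    """Names of bundles that are missing locally or whose hash differs.

    Both arguments are hypothetical manifests: {bundle_name: sha256_hex}.
    """
    return sorted(
        name for name, remote_hash in remote.items()
        if local.get(name) != remote_hash
    )


def verify_bundle(data: bytes, expected_hash: str) -> bool:
    """Reject a download whose contents do not match the manifest."""
    return sha256_hex(data) == expected_hash


local_manifest = {"skins_base": "aaa", "events": "bbb"}
remote_manifest = {"skins_base": "aaa", "events": "ccc", "winter_skins": "ddd"}
print(bundles_to_download(local_manifest, remote_manifest))
```

Because only changed or new bundles are fetched, surprise content never has to sit on the player's machine before its release.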

Configuration “Patching” - Live Configuration

Savvy developers will also have a subset of their data that can be updated at a moment's notice without notifying their platform of choice, perhaps without their players even noticing. Most commonly, this includes store prices and item availability, but it can also change XP rewards, enable new features, or nerf an overpowered weapon. Unity’s Remote Config is an example of configuration patching (though I cannot speak to its suitability). Importantly, config changes utilize code and content that previously went through rigorous testing in either binary or content patching and are on a player’s machine already. They are also small, independent, and reversible. These changes are not so much “patching” as “reconfiguring”, so they’re often called live configurations.
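One way to picture "small, independent, and reversible" is a set of tested defaults that ship with the game plus a layer of remote overrides. The sketch below is an assumption about how such a layer could work, not a description of Unity's Remote Config; the key names and values are made up.

```python
# Defaults ship with the game and went through the normal test process.
DEFAULTS = {"sword_damage": 10, "hat_price": 250, "double_xp_enabled": False}


def apply_overrides(defaults: dict, overrides: dict) -> dict:
    """Layer remote overrides on top of shipped defaults.

    Unknown keys and values of the wrong type are ignored, so a bad
    override falls back to the tested default instead of breaking the game.
    """
    config = dict(defaults)
    for key, value in overrides.items():
        if key not in defaults:
            continue  # the shipped code has no use for this key
        if type(value) is not type(defaults[key]):
            continue  # wrong type: keep the tested default
        config[key] = value
    return config


live = apply_overrides(DEFAULTS, {"sword_damage": 8, "hat_price": "free"})
print(live)
```

Reverting is just a matter of removing the override: with an empty override set, the game is back on its shipped defaults.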

In this category, we can do some pretty exciting things, like enable a live event at 12:27pm on Wednesday, January 1st. We can disable a bugged item or character to prevent crashes on live. We can run A/B tests and incremental rollouts as quickly or slowly as we’d like. You can see how this speed could become tempting, and we are tempted by it, but we don’t give in.
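As a rough illustration of the timed event and the incremental rollout, the sketch below gates an event on a UTC time window and assigns each player a stable bucket from 0 to 99 by hashing. The year, the feature name, and both function names are arbitrary assumptions for the example.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def event_is_live(now: datetime, start: datetime, end: datetime) -> bool:
    """True while now is inside the half-open window [start, end)."""
    return start <= now < end


def in_rollout(player_id: str, feature: str, percent: int) -> bool:
    """Deterministically place a player in the first `percent` of 100 buckets.

    Hashing feature and player together keeps a player's bucket stable for
    one feature while varying it across features.
    """
    digest = hashlib.sha256(f"{feature}:{player_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


# Hypothetical event starting at 12:27pm UTC on January 1st and running one day.
start = datetime(2020, 1, 1, 12, 27, tzinfo=timezone.utc)
end = start + timedelta(days=1)
print(event_is_live(start + timedelta(hours=1), start, end))
```

Since a player's bucket never changes, raising the percentage only adds players to the rollout; nobody who already has the feature loses it.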

Every Change is not a Config Change

Once we support multiple types of live changes, we tend to categorize changes incorrectly and break things. Like, a lot. Most frequently, we decide that because we can make changes quickly via our live configuration or content patching systems, we should change everything that way.

If every change is frictionless, we feel like they are all equally safe, but that isn’t true. Code is dangerous and is tightly coupled with both data and other code. Appropriate content and config changes do not share those dangerous properties. Below are some changes whose categories may feel unclear and tend to cause the most pain.

Scripts

Scripts are code. Breaking our scripts is a game-breaking change. Just because the files can be delivered independently like content, or they look like pretty diagrams (looking at you, Unreal Blueprints), doesn’t mean scripts meet the safety requirements of either a config or a content patch. Script changes belong in our binary patch process with all the associated testing procedures.
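The routing rule running through this article can be written down as a small function. This is a deliberately simplified sketch: the Change fields and the category strings are hypothetical, and a real process would involve human judgment rather than five booleans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    touches_code: bool  # includes scripts and visual scripting
    small: bool
    independent: bool
    reversible: bool
    adds_files: bool    # needs assets that are not yet on the player's machine


def patch_category(change: Change) -> str:
    """Route a change to the slowest process its riskiest property demands."""
    if change.touches_code:
        return "binary"
    if (change.small and change.independent and change.reversible
            and not change.adds_files):
        return "config"
    return "content"


script_fix = Change(touches_code=True, small=True, independent=True,
                    reversible=True, adds_files=False)
sword_nerf = Change(touches_code=False, small=True, independent=True,
                    reversible=True, adds_files=False)
new_skin = Change(touches_code=False, small=True, independent=True,
                  reversible=True, adds_files=True)
print(patch_category(script_fix), patch_category(sword_nerf), patch_category(new_skin))
```

Note that the code check comes first: a script change stays a binary patch no matter how small or conveniently packaged it is.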

Asset Linkage

This is a tough one. Asset linkages connect one asset to another. They specify which animations belong to which models, or which sound effects play when an ability is used. We might connect them via Unreal Blueprint variables (see below), in the Unity Editor, or just through a string in a text file.

IMAGE - Assets linked via UE4 Blueprints

Importantly, where this data lives does not determine what type of change to classify it as. Just because asset linkages are stored as a string in a text file doesn’t make them a config change. By definition, asset linkages couple data together, and so they fail the independent clause of our config-change litmus test (small, independent, and reversible). Asset linkages are most often considered content changes. However, many asset linkages, especially if there is potential to reference missing content, can cause the game to crash or halt player progress. This kind of data doesn’t meet the safety criteria of a content change.

I highly recommend using type-safe and null-safe mechanisms to specify asset linkages. Unreal Blueprint variables like the one above (or their Unity equivalents) are a good first step, but both allow null/empty values. If we want to deliver these as a content patch, either rigorous testing or, better yet, additional (automated) verifications should be put in place to ensure required fields cannot be left empty.
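An automated verification of this kind can be quite small. The sketch below assumes linkages are stored as a mapping from ability name to asset paths and that we can list every asset in the build; the data layout, field names, and asset paths are invented for illustration.

```python
from typing import Dict, Iterable, List, Set


def find_broken_links(
    abilities: Dict[str, Dict[str, str]],
    known_assets: Set[str],
    required_fields: Iterable[str],
) -> List[str]:
    """Return human-readable problems; an empty list means safe to ship."""
    required = tuple(required_fields)
    problems = []
    for ability_name, links in sorted(abilities.items()):
        for field in required:
            asset = links.get(field)
            if not asset:
                # Covers both a missing field and an empty string.
                problems.append(f"{ability_name}.{field} is empty")
            elif asset not in known_assets:
                problems.append(
                    f"{ability_name}.{field} references missing asset '{asset}'"
                )
    return problems


abilities = {
    "fireball": {"animation": "anim/cast_fire", "sound": "sfx/whoosh"},
    "heal": {"animation": "", "sound": "sfx/chime_v2"},
}
known_assets = {"anim/cast_fire", "sfx/whoosh", "sfx/chime"}
for problem in find_broken_links(abilities, known_assets, ("animation", "sound")):
    print(problem)
```

Run as a gate in the content patch pipeline, a check like this turns a potential live crash into a failed build.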

Feature Toggles

Feature toggles or feature flags are configurations that enable or disable functionality in our game. They’re often the backbone of time-based events, incremental rollouts, and A/B testing. Feature toggles are a powerful poster child for our configuration category of patching (as they should be). That’s because feature toggles appear to allow drastic changes to a live product in very little time. That last statement should give you pause. Indeed, misuse of feature toggles is the cause of many issues, even in mature companies. Feature toggles often fail when they are no longer independent. When feature toggles rely on each other, a misconfiguration can result in a broken product instead of a different player experience. If developers aren’t careful, feature toggles can also break backward or forward compatibility. The strongest practice against both of these failure types is to minimize the number of feature toggles that are supported at any time. Define a plan for when to remove old feature toggles.
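Both failure modes can be caught with a simple audit before a toggle configuration goes live. The sketch below assumes each toggle records a planned removal date and the toggles it depends on; the Toggle class, its fields, and the toggle names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class Toggle:
    name: str
    enabled: bool
    remove_by: date                 # planned date to delete the toggle from code
    requires: Tuple[str, ...] = ()  # toggles that must be on for this one to work


def audit_toggles(toggles: List[Toggle], today: date) -> List[str]:
    """List overdue toggles and dependency misconfigurations."""
    by_name = {t.name: t for t in toggles}
    problems = []
    for t in toggles:
        if today > t.remove_by:
            problems.append(f"{t.name}: past removal date {t.remove_by.isoformat()}")
        for dep in t.requires:
            if dep not in by_name:
                problems.append(f"{t.name}: requires unknown toggle {dep}")
            elif t.enabled and not by_name[dep].enabled:
                problems.append(f"{t.name}: enabled but requires disabled toggle {dep}")
    return problems


toggles = [
    Toggle("new_store", enabled=False, remove_by=date(2020, 1, 1)),
    Toggle("store_sale", enabled=True, remove_by=date(2030, 1, 1),
           requires=("new_store", "coupons")),
]
for problem in audit_toggles(toggles, today=date(2025, 6, 1)):
    print(problem)
```

An audit like this does not make dependent toggles safe; it only makes the coupling visible, which is a good prompt to remove the dependency or the toggle.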

Final Thoughts

New tools and middleware to support games as a service are coming online every day, making it easier to run a game. They arrive faster than most teams can keep track of, much less evaluate, and not all of those tools are equally suited to the task at hand. This article provides a structured framework for assessing needs around patching and for critically evaluating what tools to build or buy and when to use them.

If you’re interested in these problems and many others at the intersection of games and backend services, Gamebreaking Studios is hiring experienced software engineers. Feel free to send a resume to: [email protected]