Networking Is Hard (Part 2) [JoshJers' Ramblings]

2009-05-30

Networking Is Hard (Part 2)

In my previous post, I weighed the advantages and disadvantages of my game vs. a standard FPS with regard to networking. After doing so, I came up with an initial network design.

The biggest issue, personally, was dealing with the causality of the network game. Each player gets information about the other with a delay, so neither player is ever seeing exactly the same set of circumstances (perfectionism and network lag don’t mix very well). I think Shawn Hargreaves describes it best in one of his networking presentations: he says to treat each player’s machine as a parallel world. They’re not exact, but the idea is to try to make them look as close as possible. It doesn’t matter if things don’t happen exactly the same, but you don’t ever want the following conversation to occur:

Player 1: “Wow, can you believe I killed that giant enemy at the last second? That was amazing!”

Player 2: “…What giant enemy?”

Procyon’s Initial Design

The design I started with was a hybrid of both the peer-to-peer lockstep and client-server models.

Diagram of the network system showing two systems, A and B, where each has a client and a server that talk to each other, then the servers of the two machines act as one coordinated server using network communication

Basically, within each system there is a client/server pair. In addition, each server runs in lock-step with the other, so that they both always tick with the exact same player inputs for a given frame, keeping the two machines’ servers perfectly in sync. Since the only data being sent across the network is player inputs, bandwidth use was ridiculously low. And since nothing ever has to round-trip from the client to the server and back, the system still gets the effectively-halved-ping that a pure lockstep setup gets. Also, because the client and the server are running on the same machine, communications between the two (mostly, the server notifying the client of events that happened based on the other player’s input) becomes very trivial – the server can just call callbacks to the client, and no sort of RPC mechanism is necessary.

Ignoring the clients for a moment, the servers are a very traditional lockstep system. Each normal tick, the player input is polled and then sent across the network. Once the server receives the input from the remote system for the next frame, it ticks forward. As you can imagine, each machine’s server is a bit lagged, because it has to wait for inputs across the network. That’s where the client comes in.

Each system also runs a client, which runs ahead of the server. This client always ticks every (in Procyon’s case) 60th of a second, processing the local player’s inputs right away, and running prediction on the remote player based on the last set of inputs and the last known position that the server knows about. That way, the local player’s ship always reacts right away to player inputs (on the client, which is represented on-screen).

Essentially:

The current local player’s input is polled and sent to the server.
The input is also sent immediately across the network to the other machine.
The client ticks using this input – the position and actions of the remote player are predicted based on the last-known input and position from the server.
If the server has the remote input for the next frame that it has to do (which is always an eariler frame than the client), it also ticks (no prediction necessary on the server, as it has up-to-date information about both players for the given frame it’s simulating). Again, there is a server on both machines, but they’ll both end up with the exact same simulation.

(Mostly-) Deterministic Enemies

Here’s where one of the advantages of the game comes into play. In my last post, I mentioned that enemies are deterministic in their movements. This is actually a key part of the networking. What it means is, there is no prediction required for simulating enemy behaviors ahead of the server. For instance, say that the client is currently 5 frames ahead of the server. While the client is ticking frame 450, the server is only on 445 (because it hasn’t received network input for frame 446 from the other machine yet). Even though the server hasn’t simulated enemy movements yet, their movements are very strictly defined, so the client doesn’t have to guess – it knows exactly where an enemy is on frame 450. Consequently, the client running ahead of the server is not a problem with regards to enemy positions – when you see an enemy in a specific place on the screen, you know that’s exactly where it will be on the server when the server finally simulates that same frame.

Unless the other player kills it first.

That’s right – there is exactly one case in which an enemy’s behavior is non-deterministic, and it’s most-easily described as follows: your local client is currently simulating frame 450, the server has simulated frame 445. You see a large cannon ship start to charge up its beam weapon for a massive attack. However, the server gets remote player input from frame 446 (the next frame it has to simulate), and when simulating it, realizes that the other player dealt the killing shot to the ship. Suddenly, the client’s view of that enemy is wrong – it should have died four frames prior.

In essence, the only way that a client’s view of an enemy is ever wrong is if the other player has killed it in the past, and the server hasn’t caught up. This is a very important property: the only time, ever, that an enemy is not where you see it as is when it’s not anywhere at all (because it already died). However, this is where it starts to get tricky.

Most of the time, it’s not a big issue. The client kills the enemy as soon as the server tells the client that the enemy should be dead, and it just dies a few frames too late. But when the enemy has fired bullets (or any other type of weapon), suddenly there’s a problem. What has to happen in that case is that the client has to remove the bullets that shouldn’t really have been spawned. With reasonable pings, the bullets will generally be close enough to the enemy’s death explosion that they won’t really be noticeable, they’ll blink up in the middle of the explosion and disappear by the time it’s done. With larger lag times, though, bullets for dying enemies might spontaneously disappear. Ah, lag. How networked games love you so.

The real problem, though, is when, using the frame numbers above, you crash your ship into an enemy on tick 450 that actually died on tick 446. What then? You crashed into something that shouldn’t even have been there! After much internal debate, I decided there’s a really simple way to arbitrate this: on your screen, you crashed, so you still get the consequences. Even though you hit something (either a should-have-been-dead enemy or a bullet spawned from such an enemy) that shouldn’t have even been there, you still ran into something on your screen, and still pay the price. So, in addition to player inputs that get sent to the server (and across the network), the client also sends a flag that says “I died!”

As an added bonus, the server no longer needs to calculate collisions against either player – when a player dies, the client signals it (This plays into another advantage of my game – in a competetive multiplayer game, you would never, ever trust a client machine to tell you whether or not its player got hit by something).

Random Number Generation

One issue has to do with random number generation. Obviously, when connecting the two machines across the network, the machines have to agree on a random number seed so that random numbers generated on each are the same. In fact, not only do the machines on either side of the network have to have the same random number seeds, but the client and server also have to start with the same seeds, or they’ll get out of sync (a machine getting out of sync with itself is always funny).

However, what happens when an enemy (Smallship 5) shoots a bullet in a random direction, and then dies “in the past” based on a server correction? On the server(s), this random number generation never happened. On the client, however, a bullet was shot, and a random number was created. With a single random number generator, that means all of the random numbers from that point on on the client are going to be incorrect. Thankfully, there’s an easy way around that.

Give every enemy its own random number generator!

I found a very lightweight (and very high-quality) random number generator: Marsaglia’s mutliply-with-carry generator (or, if you prefer, the infinitely more difficult-to-read Wikipedia version). This generator only requires two unsigned integers for each generator to do its magic, so there wasn’t much overhead to hand one of these off to each enemy. So, the initial random number seed is decided upon before the level begins to load. As the level begins to load, and enemies are created, a new generator is created for each entity, using random seeds generated from the original generator. This way, each enemy gets its own generator, and if an enemy ever generates any extra random numbers before it dies, it doesn’t affect any of the other enemies at all! Problem solved.

Remote Player Bullets

Again, pretend the client is ticking frame 450. When the server reaches 445, it notices that the remote player fired a bullet. So it notifies the client: “Hey, 5 frames ago, the other player fired a shot!.”

The client knows that it needs to spawn a new remote player bullet.
It also knows exactly where that bullet will be at the current tick (since it was spawned at tick 445, but it’s now tick 450, it can tick the bullet forward 5 frames).
It does not, however, know that the bullet will actually survive until tick 450. It’s possible that anywhere from tick 446 until 449, that it might hit something.

Even though it knows exactly where the bullet should be, it doesn’t display it there. It actually starts the bullet in front of its current estimate of where the remote player is (the predicted position), and interpolates it into the correct spot. That way, remote player’s bullets don’t seem to appear way out in front of the remote player, they still start right where they seem like they should.

The client knows exactly where the bullet should be, but not that it’s actually there (because it might have died on, say, tick 448). So while it does collision detection against enemies, if the remote bullet hits an enemy, it doesn’t actually do any damage on the client – it just removes the bullet. If it turns out it dealt damage after all, the server will eventually send a correction to the client.

However, eventually, the server reaches frame 450 (when the client first learned about the bullet). If the bullet is still alive on the server, then we know that it never hit anything from frame 445 until then, so it was a live bullet when the client found out about it. Also, if it’s still alive on the client (it didn’t collide with anything while the server caught up to frame 450), that means the client knows that the bullet is still alive.

Now that it knows that it has a bullet that is guaranteed to still be alive, and is at the exact position that it’s supposed to be, the bullet can flip from being treated as a remote bullet to being treated exactly like a locally-fired bullet. Basically, this bullet now will deal damage and act exactly like a bullet fired by this machine’s own player! It’s one less thing that will act different on the other machines, further strengthening the illusion that both players are seeing the same thing.

Floating Points Are Sharper Than Expected

However, there was one big issue with this entire design: floating point errors. Due to minute differences in the way floating point numbers are calculated on different systems (due to different CPUs, code optimizations, quantum entanglement, etc), the collision code on each system acted slightly differently. Consequently, the two servers running in lockstep weren’t actually in sync. This was causing all sorts of issues – slight timing differences on enemy deaths caused discrepencies in scores and energy amounts…and the real kicker is that it was possible that two objects would collide on one system and never collide at all on the other simulation (a grazing collision on one might not trigger as a collision on the other system). Added bonus when an enemy died on one server just after shooting a ton of bullets, but just before shooting them on the other server. Suddenly, one player’s screen would be full of enemy bullets, and the other’s would be clear as the summer sky.

This became a real problem, and it took a while to solve it (again, perfectionism and networking don’t mix)…and next time, I’ll present the solution.