How I Made Something That Should Have Killed Any Server
Space Station 14
If you don’t know what SS14 (Space Station 14) is, let me explain it to you.
In layman’s terms, it’s an open-source role-playing game where around 85 people try to survive on a space station while doing their jobs. Ever heard of Among Us? It’s basically that, but 1000 times more complicated.
The “impostors” of SS14 are called “antagonists”, and there’s a ton of them to make each round chaotic. Changelings (from The Thing), Traitors, Cultists, Wizards, Space Dragons, Demons, Revolutionaries and the list goes on.
Now, the unique thing about SS14 is that each server runs its own codebase, forked from one central vanilla codebase. This lets servers ship their own unique content and keep their changes isolated without modding the game.
Programming in SS14
SS14 is possibly one of the easiest games to add new content to.
It uses a custom engine named Robust Toolbox, which makes use of the ECS pattern with a heavy event bus based approach.
All programming is done in pure C#, and anything relating to data (e.g. creating entities, custom prototypes) is done through YAML.
Now, don’t do this, kids
My first contribution in SS14 was in a codebase named GoobStation (this is important).
Someone made an issue to add a specific recipe to a machine in the game, so all I did was edit the YAML to allow that.
It was an 11-line YAML PR that started my journey.
My next PR, in another codebase named Einstein-Engines (dead as of now), added a new antagonist with 23 abilities, unique mechanics and progression. It was about 8k lines.
Suffice it to say, it did not turn out well and the code was insanely bad (it had to be rewritten and it’s still bad for reasons I will explain later), but I did learn a lot from it.
The destroyer of servers
The antagonist I added was called “Shadowling”. It originated from SS13 (Space Station 13), but was discontinued there due to no one wanting to maintain it and having balance issues. My own version is a complete remake and port of that.
Now, this antagonist falls under the group of “conversion” antagonists, basically antagonists that rely on converting players to their cause (via some ability or action).
However, it had some mechanics that made it stand out from the others.
- Shadowlings get hurt from lights, but heal in the darkness
- They are usually solo (other conversion antagonists usually start in groups of 2-3)
- They are unable to fire guns or wear anything except radios
- They get new abilities based on how many converted players they have
- They are aliens!
Those points alone make them one of the hardest and weakest antagonists to play in the game, with a high reward (which was what I originally intended).
Their end-goal is to convert 20 people and ascend into a higher being that is unkillable by any means (not even godmode can save you from their final form).
Light-detection and the effects it had on the server
Okay, everything seemed normal so far. But, wait… This antagonist requires us to detect lights.
And the lighting system in SS14 is client-side, but this is an important game mechanic, so we don’t want to validate the lights on the client. The server has to do it.
Oh, server is die, thank you forever.
Yes, this one thing made this antagonist IMPOSSIBLE to implement. But don’t you worry, there’s always a way!
Now, before implementing this system, we have to first look if anyone has made something similar.
Turns out they did! Time to use it in our project… I’m kidding. All roads lead to the same exact destination; it’s gonna be expensive no matter what you do, there’s no magic “optimized light-detection” button. And that’s fine.
Sometimes, optimization reaches a point where you must make gameplay changes in order to make it work flawlessly.
Time to implement our own method then.
I wish I paid attention in physics class
Note: this is the first iteration. It has since been updated by a maintainer who helped port it to the GoobStation codebase.
public override void Update(float frameTime)
{
    var query = EntityQueryEnumerator<LightDetectionComponent>();
    while (query.MoveNext(out var uid, out var comp))
    {
        // Skip dead entities
        if (_mobStateSystem.IsDead(uid))
            continue;
        if (_timing.CurTime < comp.NextUpdate)
            continue;
        comp.NextUpdate += comp.UpdateInterval;
        DetectLight(uid, comp);
    }
}
This is a basic update method in SS14. We want to skip as many entities as we can before we reach the expensive operations.
Firstly, we run a query on every entity with the LightDetectionComponent so we can access it.
Before we call the DetectLight() method, we must first establish some rules.
The first rule is that dead entities should not run this method, since the player isn’t playing anymore (although they can be revived).
So, we call _mobStateSystem.IsDead(uid) and skip the entity with continue if it’s true.
After that, we do a time check. We don’t want to run this system every frame, so we implement a specific time to run the expensive method every X seconds. This improves performance by A LOT. It’s the most important optimization.
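Here’s a rough, engine-free sketch of that timer gate. It’s only an illustration: IntervalGate and ShouldRun are hypothetical names I made up for this example, and the current time is passed in as a plain TimeSpan instead of being read from _timing.CurTime.

```csharp
using System;

// Hypothetical helper: lets expensive work run at most once per interval.
public sealed class IntervalGate
{
    private readonly TimeSpan _interval;
    private TimeSpan _nextUpdate = TimeSpan.Zero;

    public IntervalGate(TimeSpan interval)
    {
        _interval = interval;
    }

    // Returns true when the interval has elapsed, and schedules the next run.
    public bool ShouldRun(TimeSpan now)
    {
        if (now < _nextUpdate)
            return false;

        // Scheduling from "now" avoids a burst of catch-up runs after a stall.
        _nextUpdate = now + _interval;
        return true;
    }
}
```

With a one-second interval, a call at 0s runs, a call at 0.5s is skipped, and a call at 1s runs again.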
Now, after all the checks are done, we run the mighty DetectLight() on our entity.
private void DetectLight(EntityUid uid, LightDetectionComponent comp)
{
    var xform = EntityManager.GetComponent<TransformComponent>(uid);
    var worldPos = _transformSystem.GetWorldPosition(uid);
    // We want to avoid this expensive operation if the user has not moved
    if ((comp.LastKnownPosition - worldPos).LengthSquared() < 0.01f)
        return;
    ...
}
Let’s take it one step at a time and address the problems.
var xform = EntityManager.GetComponent<TransformComponent>(uid);
This is wrong and can potentially crash the game if the entity doesn’t have a TransformComponent (it always does, but it’s important to play it safe). Usually, a better way to get the transform of an entity is through modern methods like Transform(uid). It’s good to be up to date with an engine’s methods.
Next up, there’s another “if” check that lets us see whether we have moved at all before doing the operations. A lot of the time you may not move, whether because you’re AFK, talking to someone, or trying to evaluate a situation. So this is a useful and simple optimization.
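The movement check can be sketched on its own like this. It’s a minimal example under my own assumptions: MovementGate and HasMoved are hypothetical names, and positions are plain System.Numerics vectors rather than engine types. The 0.01f threshold is the one from the snippet above.

```csharp
using System.Numerics;

// Hypothetical helper: remembers the last processed position and reports
// whether the entity has moved far enough to justify re-checking lights.
public sealed class MovementGate
{
    // Squared threshold, so no square root is needed per check.
    private const float MinMoveSquared = 0.01f;
    private Vector2? _lastPosition;

    public bool HasMoved(Vector2 current)
    {
        if (_lastPosition.HasValue &&
            (current - _lastPosition.Value).LengthSquared() < MinMoveSquared)
            return false;

        // Only remember positions we actually processed.
        _lastPosition = current;
        return true;
    }
}
```

One trade-off to keep in mind: an entity that stands still will not notice a nearby light being switched on or off until it moves again.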
And now… The disgusting piece of code:
comp.IsOnLight = false;
var query = EntityQueryEnumerator<PointLightComponent>();
while (query.MoveNext(out var point, out var pointLight))
{
    if (!pointLight.Enabled)
        continue;
    var lightPos = _transformSystem.GetWorldPosition(point);
    var distance = (lightPos - worldPos).Length();
    if (distance <= 0.01f) // So the debug stops crashing
        continue;
    if (distance > pointLight.Radius)
        continue;
    var direction = (worldPos - lightPos).Normalized();
    var ray = new CollisionRay(lightPos, direction, (int)CollisionGroup.Opaque);
    var rayResults = _physicsSystem.IntersectRay(
        xform.MapID,
        ray,
        distance,
        point); // todo: remove this once slings get night vision action
    var hasBeenBlocked = false;
    foreach (var result in rayResults)
    {
        if (result.HitEntity != uid)
        {
            hasBeenBlocked = true;
            break;
        }
    }
    if (!hasBeenBlocked)
    {
        comp.IsOnLight = true;
        return;
    }
}
As you can see, we query ALL the entities with PointLightComponent. That’s wrong on so many levels. Imagine yourself counting all lights on earth, instead of counting the lights in your room.
There are already methods for fast lookups of nearby entities that don’t carry the huge overhead of an EntityQueryEnumerator over every light! That was sadly a beginner mistake.
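To show why a nearby lookup beats scanning everything, here’s a generic uniform-grid sketch. To be clear, this is not the engine’s lookup system and I’m not claiming it works this way internally. LightGrid, Add and Query are hypothetical names, and lights are reduced to bare positions.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical uniform grid: buckets light positions by cell so a query
// only visits the cells overlapping the search radius.
public sealed class LightGrid
{
    private readonly float _cellSize;
    private readonly Dictionary<(int X, int Y), List<Vector2>> _cells = new();

    public LightGrid(float cellSize)
    {
        _cellSize = cellSize;
    }

    private (int X, int Y) CellOf(Vector2 pos) =>
        ((int)MathF.Floor(pos.X / _cellSize), (int)MathF.Floor(pos.Y / _cellSize));

    public void Add(Vector2 lightPos)
    {
        var key = CellOf(lightPos);
        if (!_cells.TryGetValue(key, out var bucket))
            _cells[key] = bucket = new List<Vector2>();
        bucket.Add(lightPos);
    }

    // Returns every stored light within "range" of "center".
    public List<Vector2> Query(Vector2 center, float range)
    {
        var found = new List<Vector2>();
        var min = CellOf(center - new Vector2(range));
        var max = CellOf(center + new Vector2(range));

        for (var x = min.X; x <= max.X; x++)
        {
            for (var y = min.Y; y <= max.Y; y++)
            {
                if (!_cells.TryGetValue((x, y), out var bucket))
                    continue;

                foreach (var light in bucket)
                {
                    if (Vector2.DistanceSquared(light, center) <= range * range)
                        found.Add(light);
                }
            }
        }
        return found;
    }
}
```

Since every light has its own radius, the query range in a setup like this would have to be at least the largest light radius, with the per-light radius check done afterwards.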
Either way, let’s continue.
We check if the light is on. If it’s not, this light can’t be illuminating us, so we skip it.
if (!pointLight.Enabled)
    continue;
After that, we check if we are within the light’s radius (I did mention that there are systems that can do that, but wise me didn’t think to use them).
if (distance > pointLight.Radius)
    continue;
And finally, we get to the expensive physics method.
var direction = (worldPos - lightPos).Normalized();
var ray = new CollisionRay(lightPos, direction, (int)CollisionGroup.Opaque);
var rayResults = _physicsSystem.IntersectRay(
    xform.MapID,
    ray,
    distance,
    point);
This is the reason for any kind of lag. Casting rays is insanely expensive, and imagine casting them for an insane number of lights…
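If you’re wondering what “is anything blocking the line from the light to me” boils down to, here’s a simplified 2D sketch. It swaps the engine’s physics ray cast for a plain segment-versus-box slab test, so treat it as an illustration of the idea and nothing more. Wall, LineOfSight, IsBlocked and IsLit are all hypothetical names.

```csharp
using System;
using System.Numerics;

// Hypothetical axis-aligned wall, standing in for an opaque fixture.
public readonly record struct Wall(Vector2 Min, Vector2 Max);

public static class LineOfSight
{
    // True when the straight segment from "light" to "target" crosses "wall".
    public static bool IsBlocked(Vector2 light, Vector2 target, Wall wall)
    {
        var delta = target - light;
        float tMin = 0f, tMax = 1f;

        // Slab test: clip the segment parameter t against each axis in turn.
        if (!ClipAxis(light.X, delta.X, wall.Min.X, wall.Max.X, ref tMin, ref tMax))
            return false;
        return ClipAxis(light.Y, delta.Y, wall.Min.Y, wall.Max.Y, ref tMin, ref tMax);
    }

    private static bool ClipAxis(float start, float delta, float min, float max,
        ref float tMin, ref float tMax)
    {
        // Segment runs parallel to this slab: it overlaps only if it starts inside.
        if (MathF.Abs(delta) < 1e-6f)
            return start >= min && start <= max;

        var t1 = (min - start) / delta;
        var t2 = (max - start) / delta;
        if (t1 > t2)
            (t1, t2) = (t2, t1);

        tMin = MathF.Max(tMin, t1);
        tMax = MathF.Min(tMax, t2);
        return tMin <= tMax;
    }

    // A light reaches the target if it is in range and no wall is in the way.
    public static bool IsLit(Vector2 light, float radius, Vector2 target, Wall[] walls)
    {
        if (Vector2.Distance(light, target) > radius)
            return false;

        foreach (var wall in walls)
        {
            if (IsBlocked(light, target, wall))
                return false;
        }
        return true;
    }
}
```

Even in this toy version the cost is one test per wall per light, which is why the cheap range check comes first.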
There was one time where I forgot to add some checks like the timer check, and I was wondering why everything was delayed by 10 minutes when hosting a dev environment. Fun times.
Small optimizations matter
Putting the LightDetectionComponent on an entity shouldn’t lag the server at all now, especially if it’s just one entity. However, it isn’t just one entity…
The people who get converted by Shadowlings have a special ability that makes them invisible in the darkness once activated.
Attaching the component to them permanently is a dumb move.
Rather, what we want to do is attach it once the ability is activated and remove it when the ability’s duration ends.
Small optimizations like that can greatly improve performance without anyone noticing.
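The idea is simply to keep the set of checked entities as small as possible. Here’s a standalone sketch of it, assuming entities are plain integer IDs. ActiveAbilityTracker and its methods are hypothetical names, and in the real game this bookkeeping would be done by adding and removing the component instead.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tracker: only entities with the ability currently active are
// kept in the set that the expensive light check iterates over.
public sealed class ActiveAbilityTracker
{
    private readonly Dictionary<int, TimeSpan> _expiresAt = new();

    public int TrackedCount => _expiresAt.Count;

    // Called when the ability starts: begin tracking the entity.
    public void Activate(int entityId, TimeSpan now, TimeSpan duration)
    {
        _expiresAt[entityId] = now + duration;
    }

    // Called on each update: stop tracking entities whose ability has ended.
    public void RemoveExpired(TimeSpan now)
    {
        var expired = new List<int>();
        foreach (var (id, end) in _expiresAt)
        {
            if (now >= end)
                expired.Add(id);
        }

        foreach (var id in expired)
            _expiresAt.Remove(id);
    }
}
```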
The new version
Now, I’m gonna showcase the new system, which was written by a GoobStation maintainer (Rouden), and try to explain it. I respect this person a lot because he knows a lot and helped me port this antagonist to a better codebase.
This is the new Update method:
public override void Update(float frameTime)
{
    // Not time for the next update yet
    if (_nextUpdate > _timing.CurTime)
        return;
    _nextUpdate = _timing.CurTime + TimeSpan.FromSeconds(UpdateFrequency);
    _job.UpdateEnts.Clear();
    var query = EntityQueryEnumerator<LightDetectionComponent, TransformComponent>();
    while (query.MoveNext(out var uid, out var comp, out var xform))
    {
        _job.UpdateEnts.Add((uid, comp, xform));
    }
    _parallel.ProcessNow(_job, _job.UpdateEnts.Count);
}
The first thing we notice is that the timer check now happens before we query for LightDetectionComponent, so on most frames we return without touching any entity.
The second thing we notice is that there’s a _job variable. I will explain this later.
The third thing we notice, after the query, is that there’s a _parallel variable that calls an engine method, ProcessNow(_job, _job.UpdateEnts.Count).
Do you see where this is going? Yes, threading.
Let’s take a look at _job, which is of type HandleLightJob and is pre-allocated in the system:
private record struct HandleLightJob() : IParallelRobustJob
{
    public readonly int BatchSize => 16;
    public readonly List<Entity<LightDetectionComponent, TransformComponent>> UpdateEnts = [];
    public required LightDetectionSystem LightSys;
    public required SharedTransformSystem XformSys;
    public required SharedPhysicsSystem PhysicsSys;
    public required EntityLookupSystem LookupSys;
    public void Execute(int index) {...}
}
HandleLightJob implements IParallelRobustJob, which allows us to split the workload across multiple cores of the server’s CPU.
That means this code will not run on one core like usual and possibly “stall”. This is a huge performance boost, and allows us to manage more entities with this component. However, that doesn’t mean it isn’t still expensive! It’s just less expensive, and in a playable state.
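To give a feel for what “split into batches across cores” means, here’s a sketch that uses .NET’s own Parallel.For. This is a swapped-in technique and not the engine’s parallel manager. How the engine actually schedules batches isn’t shown here, and BatchedJob and Run are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for a batched parallel job: execute(index) handles
// one item, and items are handed to worker threads in fixed-size batches.
public static class BatchedJob
{
    public static void Run(int count, int batchSize, Action<int> execute)
    {
        var batches = (count + batchSize - 1) / batchSize; // round up

        Parallel.For(0, batches, batch =>
        {
            var start = batch * batchSize;
            var end = Math.Min(start + batchSize, count);

            // Each batch processes its own contiguous slice of indices.
            for (var i = start; i < end; i++)
                execute(i);
        });
    }
}
```

This only stays safe if each index touches its own data. In a setup like this, anything shared between indices would need to be read-only or otherwise thread-safe.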
We are not expecting the antagonist to convert 85 people. In most cases, we are expecting at most 30 people with this component, if the Shadowling does win.
The rest of the code in Execute(int index) is theoretically the same as my first iteration’s DetectLight(), so I won’t explain it again.
So far, there have been no lag issues reported about this, which is fascinating.
My first iteration was alright lag-wise (I asked some server owners) in mid population servers (think 40 players out of 85), but this one can handle a full server way better.
Lessons learned
- Don’t make ambitious PRs when you are starting out in open source development. Learn the codebase, ask around and start with smaller mechanics!
- Make use of your CPU cores when dealing with a lot of data. It can help when you are trying to optimize the un-optimizable.
- Small and simple optimizations matter
- Be willing to learn from your mistakes and improve! Nobody knows everything from the start.
If you’re interested in how it looks and plays in-game, you can watch this video: