Saturday, April 04, 2009

Long Standing Memory Leak is Fixed!

OMG.

The memory leak. You've all seen it. You'll be playing a long game of RealmSpeak with 3-4 characters, and the game gets slower, and slower. Without warning, the game will ultimately crash with an "Out of Memory" error, and that's that. Sure, you can reload your game, and it's good again for a few game weeks, but what a pain! Also, if you've ever tried to play with more than about 6 characters, the game is nigh-impossible to stand, as everything is sooooo slowww. This memory leak has been a royal pain in the !@#$ since the inception of RealmSpeak. I've never been able to nail down the problem, and it's been driving me NUTS!!!

Warning: Java tech speak ahead!

On a whim this evening, I decided to set up a small simulation of my GameServer and GameClient, and have them communicate indefinitely in a loop, and watch what happens to memory in a controlled environment. Memory gets eaten up quickly, and my loop ultimately dies in an "Out of Memory" error. Great! I've reproduced the problem in a smaller (non-RealmSpeak) application, which will serve as an excellent test subject.
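For the curious, here is a minimal sketch of that kind of loop. It is not the actual RealmSpeak test code: Message, DiscardingStream, and run are made-up names, only the sending side is shown, and a stream that throws everything away stands in for the real connection. It sends fresh objects through a single ObjectOutputStream and samples the used heap along the way.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class LeakLoopSketch {
    /** Hypothetical stand-in for a transfer object such as GameObjectChange. */
    static class Message implements Serializable {
        private static final long serialVersionUID = 1L;
        final byte[] payload = new byte[1024];
    }

    /** Discards every byte; an assumed stand-in for the network connection. */
    static class DiscardingStream extends OutputStream {
        @Override
        public void write(int b) {
            // intentionally empty
        }
    }

    /** Sends `total` fresh messages, sampling used heap (bytes) every `sampleEvery` sends. */
    static long[] run(int total, int sampleEvery) {
        long[] samples = new long[total / sampleEvery];
        Runtime rt = Runtime.getRuntime();
        try (ObjectOutputStream out = new ObjectOutputStream(new DiscardingStream())) {
            for (int i = 1; i <= total; i++) {
                out.writeObject(new Message());
                out.flush();
                if (i % sampleEvery == 0) {
                    samples[i / sampleEvery - 1] = rt.totalMemory() - rt.freeMemory();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return samples;
    }
}
```

Calling something like LeakLoopSketch.run(20000, 5000) and printing the samples gives a rough picture of heap use over time. The numbers are noisy because the GC runs whenever it likes, so a profiler is still the better tool for a real diagnosis.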

After some toying around with different freeware memory profilers, I found one that gave me an "Object Allocation" graph. This basically tells you which objects are being created, how many, and how long lived. From this graph, it was quickly obvious that my primary data transfer object (GameObjectChange) was piling up in memory, and outliving many other objects. Hmmm, that shouldn't be happening, should it?

Looking at the code, I made sure that there were no static collection objects (a common memory leak), and that each collection object went out of scope at some point in its life, so that I could be sure that the objects contained therein were cleaned up by the Java Garbage Collector (GC). Okay, so that all looked good, but that was no surprise. I had looked at that like a thousand times before!

Next, I added a finalize() method to my GameObjectChange object, with a message to the console to let me know when the object was GC'd. I ran my test app, and...whoa... FINALIZE IS NEVER CALLED. That means that the GC had determined that every single one of these objects was still being referenced somewhere, and was passing them up for collection! That's definitely NOT good, and I realized that I was going to HAVE to find the source of those references if I was ever to fix this problem.
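A side note on the technique: finalize() has been deprecated since Java 9, so here is a sketch of the same kind of check done with a WeakReference instead. This is a swapped-in approach, not what I actually wrote, and LeakProbe, Change, and the method names are invented for the example. A weak reference does not keep its object alive, so if it still returns the object after a GC request, something else must be holding on to it.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

final class LeakProbe {
    /** Hypothetical stand-in for GameObjectChange. */
    static class Change {
        final byte[] payload = new byte[1024];
    }

    /** True if the referent is still there after asking for a GC. */
    static boolean survivesGc(WeakReference<?> probe) {
        System.gc(); // only a request; the JVM is free to ignore it
        return probe.get() != null;
    }

    /** Simulates a hidden reference: a list keeps the object alive. */
    static boolean heldObjectSurvives() {
        List<Change> holder = new ArrayList<>();
        Change change = new Change();
        holder.add(change);
        WeakReference<Change> probe = new WeakReference<>(change);
        change = null; // drop our own reference; the list still has one
        boolean survived = survivesGc(probe);
        // Use the holder after the check so it stays reachable throughout.
        return survived && holder.size() == 1;
    }
}
```

The check is one-sided. An object that survives while it is strongly referenced is guaranteed, but an object that survives after every reference is dropped may only mean the GC has not gotten around to it yet.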

Did you know that there really is no good way to do this? I vaguely remember a tool that would help me here, but I'll be damned if I can remember what it was. Anyway, I ultimately had to resort to looking at every place that used one of these objects and asking whether it was holding a reference or not. My first pass revealed nothing, and I thought I was going to fail yet again at solving this problem.

Then, for some reason, I was looking at the Java docs for the ObjectOutputStream (the stream I use to transfer data), when I discovered something I did not know: this stream actually maintains a reference to every object that passes through it, until the stream is either closed or reset! Since close() was out of the question (that would kill the connection), I immediately tried adding a reset() call after the normal flush() call in every case. Lo and behold, the objects started getting GC'd, and the finalize() method I wrote started spitting out messages to the console!
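Here is a small sketch of the write, flush, reset pattern, with a way to see the stream's memory at work that does not depend on GC timing. The names (ResetDemo, Change, bytesForTwoSends) are made up, and a ByteArrayOutputStream stands in for the socket stream. It sends the same object twice: without reset(), the second send is only a tiny back-reference, which shows the stream still remembers the object; with reset(), the stream forgets it and writes it out in full again.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class ResetDemo {
    /** Hypothetical stand-in for GameObjectChange. */
    static class Change implements Serializable {
        private static final long serialVersionUID = 1L;
        final byte[] payload = new byte[1024];
    }

    /** Sends one object twice and returns the total number of bytes written. */
    static int bytesForTwoSends(boolean resetAfterFlush) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(sink);
            Change change = new Change();
            for (int i = 0; i < 2; i++) {
                out.writeObject(change);
                out.flush();
                if (resetAfterFlush) {
                    out.reset(); // stream drops its references to written objects
                }
            }
            out.flush(); // push out any pending reset marker
            return sink.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The byte counts also show the price of the fix: after a reset(), the stream has to resend objects and their class descriptions in full, so each message gets a little bigger. For a long-lived connection that seems like a fair trade for not running out of memory.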

Excited, I ran the test application.... and memory is STABLE. It creeps up briefly as new objects are created (normal), and then stabilizes at a reasonable size, and changes no more! Wow!

The next test is RealmSpeak itself. I fired up a game, and added eight characters, and played for a game week. The game rolls along quickly, with no sign of the sluggishness seen in previous versions. In fact, the performance has clearly improved overall. I want to play a longer game when I have more time, but I'm pretty convinced the leak is sealed.

Bug fixed!