Knights version 21 released!

A new version of Knights has been released!

This time there are two major new features:

  1. There is a brand new Tutorial level. This should make it a lot easier for newbies to learn how to play. Old timers might also like to play through the Tutorial, as there are a couple of interesting challenges in there 🙂
  2. Lua support has been added. This allows players to make their own modifications (mods) for the game. Already some players have created mods, adding things like spiders, living armours, and a hilarious (imho) new wand type. To see what people are doing, check out the forums.

As usual, head to the download page for downloads and detailed release notes.

P.S. Due to a technical issue the official Knights game server is currently offline. I’ll fix this later today.

About Knights

Knights is a multiplayer dungeon bashing game. Players must explore randomly generated dungeons and compete to solve various quests. For more information please visit http://www.knightsgame.org.uk/.

Notes on Lua/C++ error handling

This is a technical post about how to handle Lua errors in C++ code.

Background

Recently I added support for Lua scripting in Knights. During this process I found out that there are several “gotchas” in terms of how to correctly handle Lua errors in your C++ code. I’m writing this blog post (1) as a way of documenting the issues I found, and (2) because it might be helpful to other programmers who run into similar issues.

I will present this as a series of problems that can occur, and possible solutions to each.

Problem 1: C++ destructors may not called when doing a “longjmp”

The scenario:

Your C++ code calls a Lua API function. Something goes wrong and a Lua error is generated. This causes Lua to do a “longjmp” through your code. Your C++ code contains at least one local variable with a non-trivial destructor.

The problem:

The problem in this scenario is that “longjmp” does not guarantee that destructors will be called. Clearly, this is bad news if you were relying on the destructor to clean up memory or other resources.

Example code:

int MyFunction(lua_State *lua)
{
   std::vector<int> v(10);
   // ...
   lua_call(lua, 0, 0);  // call some lua function
   // ...
   return 0;
}
void Foo(lua_State *lua)
{
   lua_pushcfunction(lua, &MyFunction);
   int result = lua_pcall(lua, 0, 0, 0);
   // ...
}

Discussion:

Imagine that the lua_call above (in MyFunction) results in a Lua error being raised.

At first sight this appears perfectly OK. Lua will handle the error by doing a longjmp. As per the specification of lua_pcall, the stack will be unwound back to the lua_pcall statement in Foo. An error code will be returned into “result”, and execution will continue, with an error message sitting on top of the Lua stack.

The problem is that, on some compilers at least, longjmp() does not call C++ destructors as it unwinds the stack. On such a compiler, the destructor for vector “v” would not be called, and we would have a memory leak.

Indeed, the C++ standard says this:

The function signature longjmp(jmp_buf jbuf, int val) has more restricted behavior in this International Standard. A setjmp/longjmp call pair has undefined behavior if replacing the setjmp and longjmp by catch and throw would invoke any non-trivial destructors for any automatic objects.

(Note: I have seen different wording for this in different places; the above comes from a Stack Overflow answer).

If you think about it, this means that most C++ programs that use longjmp would have undefined behaviour. Considering that Lua uses longjmp ubiquitously for error handling, this is not ideal 🙂

The solution:

The easiest solution is to compile Lua as C++. (In Visual Studio, use option /TP; in GCC, compile the code with g++ instead of gcc.) When Lua is compiled as C++, it uses try/catch instead of setjmp/longjmp for error handling. Problem solved.

Of course, compiling Lua as C++ might not always be practical. For example, we might be working with an external Lua library that has already been compiled as C, or we might want to disable C++ exception handling for performance reasons. In these cases, the only real solution is “don’t do that”; in other words, don’t create stack objects in such a way that destructors would need to be called if there was a Lua error. Of course, this requires some careful programming; if you are going down this route, it would be worth reading this mailing list thread where the issue and some possible solutions are discussed.

What does Knights do?

In Knights I opted to compile Lua as C++. This is the simplest solution, and the overheads of exception handling have not proved to be a problem in practice (remember that Knights is only a simple 2D game with low CPU and RAM requirements in any case).

Problem 2: C++ exceptions should not propagate through Lua

The scenario:

We have written a C++ function and made it available to Lua (via lua_pushcfunction or lua_pushcclosure). Our C++ function can throw exceptions.

The problem:

If Lua calls our C++ function then there is the risk that an exception might be thrown and propagate up through the Lua source code. This is a problem because Lua is written in C, and therefore cannot be assumed to be exception safe in the C++ sense. Although we cannot know for sure that throwing an exception through Lua will be unsafe, we should err on the side of caution and assume that it is unsafe unless proven otherwise.

Example code:

int MyFunction(lua_State *lua)
{
   throw std::runtime_error("Boo!");
}
void Test(lua_State *lua)
{
   lua_pushcfunction(lua, &MyFunction);
   lua_pcall(lua, 0, 0, 0);
}

This code will cause a std::runtime_error to propagate up through the Lua implementation, which is a bad idea for the reasons mentioned above.

The solution:

There is only really one solution, which is to ensure that C++ functions exposed to Lua do not throw exceptions (i.e., they should have the nothrow guarantee).

The simplest way to ensure that is to enclose each function in a try/catch block, e.g.

int MyFunction(lua_State *lua)
{
   // This function is to be exposed to Lua
   try {
      // insert code here
   } catch (std::exception &e) {
      luaL_error(lua, "C++ exception thrown: %s", e.what());
   } catch (...) {
      luaL_error(lua, "C++ exception thrown");
   }
}

This code converts any C++ exception thrown into a Lua error. (If the exception is derived from std::exception, we construct a more useful Lua error message by calling std::exception::what(); otherwise, we just use a generic error message.)

What does Knights do?

One objection to the above is that it would be tedious to write those try/catch clauses in each and every function that we wish to expose to Lua. In Knights, I addressed this by defining a wrapper function “PushCClosure” that is a replacement for lua_pushcclosure. This automates the task of wrapping the enclosed C++ function in a try/catch block. Those interested in the implementation of PushCClosure may wish to refer to its source code.

Problem 3: Avoiding Lua “panics”

The scenario:

C++ calls a Lua API function, and the Lua API function raises a Lua error. Unfortunately, the C++ code did not call Lua in the so-called “protected mode” and therefore, Lua “panics”.

The default action after a panic is for Lua to abort the program. Clearly, this is undesirable in production code.

Example:

The following code illustrates the problem. The code is supposed to read the contents of the global variable “EXAMPLE” and return it as a std::string.

std::string Example_Bad(lua_State *lua)
{
   lua_getglobal(lua, "EXAMPLE");           // push value of EXAMPLE onto the stack
   const char *c = lua_tostring(lua, -1);   // read stack top (as char *)
   std::string result = c ? c : "";         // store to a std::string
   lua_pop(lua, 1);                         // restore the stack
   return result;
}

Discussion:

The problem with the above code is that the calls to either lua_getglobal or lua_tostring might raise an error. For example, a maliciously-minded user might write the following:

local m = {}
function m.__index(tbl, key)
   error("Boo!")
end
setmetatable(_G, m)

Then, the call to lua_getglobal will execute the __index metamethod, which in turn raises an error. The C++ code does not handle the error, and the program aborts.

The solution:

Unfortunately, there is no way to directly check for errors from a Lua API function. Instead, you are supposed to call the Lua API in “protected mode” if you care about error handling. (This often traps beginners, who just write Lua API calls without thinking about what happens when there is an error.)

Calling Lua in “protected mode” is, unfortunately, somewhat tedious. You have to wrap your use of the Lua API inside a lua_pcall, as follows:

std::string Example2_Good(lua_State *lua)
{
   // Create a dummy struct which will be used for the pcall.
   struct Pcall {
      std::string result;  // Will hold the result.

      static int run(lua_State *lua) {
         // Read the pointer to the Pcall struct.
         Pcall *p = static_cast<Pcall*>(lua_touserdata(lua, 1));

         // Get the global "EXAMPLE" and store it in the Pcall struct.
         lua_getglobal(lua, "EXAMPLE");
         const char *c = lua_tostring(lua, -1);
         if (c) p->result = c;

         // Done.
         // (Note there is no need to pop the stack, 
         // Lua will do that for us when we leave the function.)
         return 0;
      }
   };

   Pcall p;
   lua_pushcfunction(lua, &Pcall::run);
   lua_pushlightuserdata(lua, &p);
   int result = lua_pcall(lua, 1, 0, 0);

   if (result != LUA_OK) {
      // Handle error here (e.g., throw C++ exception).
      // Don't forget to pop error message off the Lua stack.
   } 

   // Operation was successful. Return the result.
   return p.result;
}

As you can see, the general idea is to create a local struct, with a static “run” method, and then lua_pcall that method. A pointer to the struct is passed as the sole Lua argument; this allows you to read from and/or write to the struct from within the “run” method.

Importantly, neither lua_pushcfunction nor lua_pushlightuserdata can raise Lua errors, therefore they are safe to call outside of the pcall wrapper.

Note: Panic Functions

An alternative strategy would be to install a panic function, using lua_atpanic. However, it’s not really recommended to use panic functions for regular, everyday error handling. (For one thing, one is never quite sure what is left behind on the Lua stack after a Lua panic.) Instead it is better to use the lua_pcall method as described above.

What does Knights do?

In Knights I don’t currently wrap Lua API calls in a lua_pcall. (The exception is when I call Lua functions, where I do generally use lua_pcall instead of lua_call.) Therefore, if any top-level API calls trigger a Lua error, then a Lua panic occurs.

In Knights I have set up a panic function that throws a C++ exception. When this exception is caught, Knights assumes that the lua_State is unusable and closes it as soon as possible. The server then resets itself and execution continues.

While this method works, it is not particularly pleasant because any clients connected to the server do not receive any explanation of what happened. Their connections are just immediately closed when the Lua panic is handled.

Some time I would like to modify Knights to wrap all uses of the Lua API inside lua_pcall wrappers, as described above. This would ensure clean handling of all Lua errors. (The panic function would be kept, but it would only be there as a “backup” in case I ever forgot to put a lua_pcall wrapper in some place where it was needed.)

Further Reading

The Lua wiki has a page on Lua/C++ error handling:
http://lua-users.org/wiki/ErrorHandlingBetweenLuaAndCplusplus

Here is another blog post on this topic:
http://julipedia.meroh.net/2011/01/error-handling-in-lua.html