One of the main axioms of secure programming is a rather simple principle: never trust user input. It might seem obvious, but it becomes even more critical when that input maps to something in the code, whether it is used to generate a piece of code or is simply a class or function name. This is precisely what happens with deserialization vulnerabilities. When a payload is deserialized, a class is loaded, instantiated, and its values are set. This may seem harmless, but quite a few factors cause these problems to escalate, including:
- Some classes execute code in their constructor, even though this is known to be bad practice. Such classes appear even in most standard libraries.
- Destructors can also contain exploitable code.
- Setters can also execute code, and the deserialization process may need to call them.
- Simply loading the class can lead to code execution. For instance, importing a module in Python executes more code than just the class definition, and PHP's autoloading mechanisms may also cause additional files to be included.
- Magic methods are often called before and after serialization and deserialization.
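Python's pickle illustrates the magic-method point concretely: the `__reduce__` hook lets a payload embed an arbitrary callable that the deserializer will run. A deliberately harmless sketch:

```python
import pickle

# pickle's __reduce__ magic method tells the deserializer how to rebuild
# an object: a callable plus its arguments. That callable runs inside
# pickle.loads(), before application code ever sees the object.
class Innocent:
    def __reduce__(self):
        # A real attacker would return something like (os.system, ("...",));
        # a harmless print keeps this demo side-effect free.
        return (print, ("code executed during deserialization",))

payload = pickle.dumps(Innocent())
pickle.loads(payload)  # triggers the callable embedded in the payload
```

The application never asked for `print` to be called; the payload alone decided that, which is exactly why these hooks are so dangerous on untrusted input.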
- What is most surprising is that the list of vulnerable libraries and languages just keeps growing.
Java was found to be vulnerable in 2006, PHP in 2009, Ruby on Rails in 2013 – and the list goes on. The recommendation that followed these revelations was to avoid the standard library’s implementations that cannot be fixed and to instead use safe formats like XML or JSON for data exchange.
Alvaro Muñoz and Oleksandr Mirosh found the exact same issue in over a dozen implementations, in both .NET and Java. The affected formats include JSON, XML, and even binary formats. It turns out the problem lies not so much with the serialization format as with the concept of serialize-anything implementations. Using JSON-based serialization alone does not make the mechanism safe. All the developers who followed the recommendation of switching to JSON serialization, but used a serialize-anything implementation, ended up not fixing the problem at all.
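To see why the wire format is beside the point, consider this hypothetical serialize-anything decoder in Python (the function, the gadget class, and the payload are all invented for the sketch). The format is plain JSON, yet the payload still decides which class gets loaded:

```python
import importlib
import json

# Hypothetical "serialize-anything" JSON decoder: the payload names the
# class to instantiate, much like many real .NET and Java serializers do.
# The wire format is harmless JSON, yet the attacker still chooses which
# module is imported and which fields are populated.
def naive_decode(blob: str):
    data = json.loads(blob)
    module = importlib.import_module(data["module"])  # attacker-chosen import
    cls = getattr(module, data["class"])              # attacker-chosen class
    obj = cls.__new__(cls)                            # skips the constructor...
    vars(obj).update(data["fields"])                  # ...but sets any field
    return obj

# argparse.Namespace stands in for an exploitable gadget class; note that
# importing the attacker-chosen module alone can already run top-level code.
payload = '{"module": "argparse", "class": "Namespace", "fields": {"host": "evil.example"}}'
obj = naive_decode(payload)
```

The vulnerability lives entirely in the "instantiate whatever the payload names" step; swapping JSON for XML or a binary encoding changes nothing.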
The presentation also included a demonstration against DotNetNuke, a popular .NET CMS, which deserialized data from a cookie: a single payload was enough to deploy a remote shell. The flaw has since been reported and fixed. The exploit used the PullFile utility provided by the CMS, which, when invoked by the deserializer, pulled a file over HTTP and wrote it to the specified location on disk. Had the class not performed this action in its constructor, the functionality could not have been triggered.
Depending on the implementation language, various features can mitigate those risks. By using reflection, for instance, a serializer can avoid calling constructors and setters. Destructors, however, are impossible to avoid, and magic methods are features of the serialization libraries themselves, so they cannot be skipped either. A php-serializer example shows the best of intentions: reflection is used to build the object, yet __wakeup() is still called, so a silly proof of concept that places code in __wakeup() would be enough to exploit it.
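The same pattern exists outside PHP. As a Python sketch (the class is invented for illustration), pickle rebuilds objects without calling `__init__`, yet it still invokes the `__setstate__` magic method on every load, much like `__wakeup()`:

```python
import pickle

class Widget:
    def __init__(self):
        # The constructor is NOT called when the object is rebuilt
        # from a pickle stream.
        print("__init__ called")
        self.touched = False

    def __setstate__(self, state):
        # ...but this magic method IS called on every load, much like
        # PHP's __wakeup(): any code here runs on attacker-supplied data.
        print("__setstate__ called")
        self.__dict__.update(state)
        self.touched = True

data = pickle.dumps(Widget())  # prints "__init__ called" once, at construction
restored = pickle.loads(data)  # prints only "__setstate__ called"
```

Bypassing the constructor therefore closes one door while the magic-method door stays wide open.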
Implementations can add some mitigation, such as whitelisting the classes that may be deserialized. If you use any of those libraries to handle user input, make sure you at least take a whitelist approach. If your library of choice does not support one, switch to one that does. The safest route is most likely a library that enforces a schema, ensuring that no unintended data leaks in. As soon as the serialized content refers to a class in the code, there is a risk.
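In Python, the standard library documentation itself shows a whitelist pattern for pickle: override `Unpickler.find_class` so that only known-safe classes resolve. A minimal sketch:

```python
import io
import pickle
from collections import OrderedDict

# Whitelist of (module, class) pairs that the unpickler may resolve.
ALLOWED = {("collections", "OrderedDict")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global the stream references; anything not
        # explicitly whitelisted is rejected before it can be loaded.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"{module}.{name} is not whitelisted")

def safe_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()

class Sneaky:
    def __reduce__(self):
        # Tries to smuggle a callable into the stream.
        return (print, ("should never run",))

ok = safe_loads(pickle.dumps(OrderedDict(a=1)))  # whitelisted class: loads fine
try:
    safe_loads(pickle.dumps(Sneaky()))           # builtins.print: rejected
except pickle.UnpicklingError:
    rejected = True
```

This blocks gadget classes from ever being resolved, but a schema-enforcing library remains preferable, since a whitelist is only as safe as the classes on it.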
If you use those libraries to persist data in a data store, keep in mind that this can escalate vulnerabilities such as SQL injection into remote code execution. On that topic, how safe is your cache? What about your message queue?