Michal Kohutek

Serialising C++ classes while writing as little code as possible

A C++ library for saving objects into files, usable with as little code as possible

Suppose you are writing a GUI program, for example in Qt, and you don’t feel like setting its working folder and other settings that change a lot every time you open it. The solution is obvious. Save some of the settings on disk. Doing this requires making memory dumps (which is platform dependent and difficult to debug and edit manually) or writing a parser (which takes some time). I have faced this problem several times and usually kept copying the solution into new programs. Then I decided I had enough and that it was time to write a small library for that, one that would be as convenient to use as possible.

The design

The most convenient way of doing it would be to declare the class as serialisable and have something check for the presence of members and generate save and load methods appropriately. This could be done in Python by checking the contents of classes and serialising everything that looks serialisable. However, Python voids encapsulation to allow this, which is a bit too much of a sacrifice, plus its performance is significantly inferior even to many other interpreted languages. In C, this could be done using some crazy variadic macro whose incorrect use could generate error messages so cryptic that if they were written in Elvish, they would be equally understandable. Furthermore, it might actually be more convenient to control which members are serialised (some are better not saved) and how they are named (for example, with more explanatory names in the saved file, as those long names will not clutter the source code).

The obvious solution is to have one method to serialise it and another method to deserialise it, each calling methods that write each member type to a string or read it from a string. These writing and reading methods can be provided by a parent class. But this approach requires writing the name of every member and its serialised name once in the saving method and once in the loading method, leading to literally Writing Everything Twice (WET), an ostentatious violation of the DRY principle. In addition to requiring more code, nothing checks that the serialised name is written without typos in both places, which is a source of hard-to-find errors.

The requirement is thus to write only that the class is serialisable and to list the members that should be serialised and their names inside the saved file. Something like this:

class Preferences : public Serialisable {
  std::string folder = "";
  int steps = 100;
  void saveOrLoad() {
    serialise("folder opened at last file opening", folder);
    serialise("number of computation steps", steps);
  }
};

This would also make it possible to declare a macro that takes a single argument serving as both the attribute name and the name in the file.
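
Such a macro could look roughly like the sketch below. It is only an illustration: SYNC_MEMBER is a hypothetical name, and the small Recorder class is a stand-in for the real parent class that merely records the keys it receives, so that the example compiles on its own.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the serialisable parent: it only records the
// keys it was asked to serialise, so the effect of the macro is visible.
struct Recorder {
	std::vector<std::string> keys;
	template<typename T>
	void serialise(const std::string& key, T&) {
		keys.push_back(key);
	}
};

// Hypothetical macro: the preprocessor's # operator turns the member name
// into a string literal, so the name is typed only once.
#define SYNC_MEMBER(member) serialise(#member, member)

struct Preferences : Recorder {
	std::string folder = "";
	int steps = 100;
	void saveOrLoad() {
		SYNC_MEMBER(folder); // expands to serialise("folder", folder)
		SYNC_MEMBER(steps);  // expands to serialise("steps", steps)
	}
};
```

The price of the shortcut is that renaming a member silently changes the key in the saved file, so older files would no longer load that value.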

How to implement it?

Because dependencies are still a big nuisance in C++17, single file header-only libraries are still the most convenient kind of small library. That implies that it should preferably depend only on standard libraries and nothing else.

I was considering INI as the format for the saved data, but I ended up choosing JSON because it allows block nesting (which is quite impractical in INI) and doesn’t write a lot of unnecessary stuff like XML does. Including a JSON library would add a dependency that would render the library less convenient, so I wrote a short one myself and made it part of the library. It didn’t take much time.

Having the same function save or load different types of variables can be done using overloading, which allows several functions to share the same name and be selected at compile time according to the types of their arguments.
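
As a reminder of how overloading picks a function at compile time, here is a small sketch. The describe() functions are made up for this illustration and are not part of the library.

```cpp
#include <cassert>
#include <string>

// Three functions share one name; the compiler selects one from the
// static type of the argument, so nothing is decided at run time.
inline std::string describe(const std::string&) { return "string"; }
inline std::string describe(bool) { return "bool"; }
inline std::string describe(int) { return "int"; }
```

One pitfall worth knowing: a bare string literal is an array of const char that decays to a pointer, and the pointer-to-bool conversion is preferred over the conversion to std::string, so describe("x") would pick the bool overload. Wrapping the literal as std::string("x") selects the intended one.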

Having one function to both load and save is a bit tricky. There is only one function and the functions it calls are fixed at compile time. However, there is one ancient tool for deciding what to do at run time. It’s called if. The parent has a variable set differently by its save() and load() methods, and the unified serialisation methods check it to decide whether they are saving or loading. The place to serialise to is also a member of the parent, because otherwise it would have to be passed into every call of every method. Note for Java/C# developers: C++ allows multiple inheritance, so there is no difference between an interface and a parent class, and inheriting implemented methods and attributes from one class does not prevent us from inheriting more of them from something else. In fact, it’s a very convenient way to avoid writing repetitive code.

class Serialisable {
	mutable std::shared_ptr<JSON> preferencesJson_;
	mutable bool preferencesSaving_;
public:
	virtual void saveOrLoad() = 0;

	inline void save(const std::string& fileName) const {
		preferencesJson_ = std::make_shared<JSONobject>();
		preferencesSaving_ = true;
		const_cast<Serialisable*>(this)->saveOrLoad();
		preferencesJson_->writeToFile(fileName);
		preferencesJson_.reset();
	}

	inline void load(const std::string& fileName) {
		preferencesJson_ = parseJSON(fileName);
		if (preferencesJson_->type() == JSONtype::NIL) {
			preferencesJson_.reset();
			return;
		}
		preferencesSaving_ = false;
		saveOrLoad();
		preferencesJson_.reset();
	}

	// ...
};
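
To try the idea without the JSON part, the mechanism can be reduced to the following sketch. It makes several simplifying assumptions: a std::map stands in for the JSON tree, only strings are supported, and the names MiniSerialisable, MiniPreferences, saveTo() and loadFrom() are invented for this example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified stand-in for the parent class: a std::map replaces the JSON
// tree and only strings are supported.
class MiniSerialisable {
	std::map<std::string, std::string>* storage_ = nullptr;
	bool saving_ = false;
protected:
	// One call per member; the flag decides the direction at run time.
	void serialise(const std::string& key, std::string& value) {
		if (saving_) {
			(*storage_)[key] = value;
		} else {
			auto found = storage_->find(key);
			if (found != storage_->end()) value = found->second;
		}
	}
public:
	virtual ~MiniSerialisable() = default;
	virtual void saveOrLoad() = 0;

	void saveTo(std::map<std::string, std::string>& target) {
		storage_ = &target;
		saving_ = true;
		saveOrLoad();
		storage_ = nullptr;
	}
	void loadFrom(std::map<std::string, std::string>& source) {
		storage_ = &source;
		saving_ = false;
		saveOrLoad();
		storage_ = nullptr;
	}
};

struct MiniPreferences : MiniSerialisable {
	std::string folder = "";
	void saveOrLoad() override {
		serialise("folder opened at last file opening", folder);
	}
};
```

Saving one MiniPreferences object into a map and loading another one from the same map should leave both with the same folder, and the key is written in exactly one place.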

An alternative approach could be to add a template parameter to the method and supply it as an argument to the serialise() methods, causing the compiler to actually create two methods from it. It would be faster because the parent would not add members and it would avoid a virtual function call, but it would require more code, and saving and loading are not supposed to be executed too often anyway. Furthermore, it would be impossible to implement the save() and load() functions as methods; they would have to be free functions taking the objects as arguments.
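
For comparison, the templated alternative might look something like this. It is a sketch of the general technique rather than anything the library does: Saver, Loader, Storage and WindowSettings are hypothetical names, and a std::map again stands in for the JSON tree.

```cpp
#include <cassert>
#include <map>
#include <string>

using Storage = std::map<std::string, std::string>;

// Two hypothetical archive types with the same call interface; which one
// is used is a template argument, so the choice is made at compile time.
struct Saver {
	Storage& storage;
	void operator()(const std::string& key, std::string& value) {
		storage[key] = value;
	}
};
struct Loader {
	const Storage& storage;
	void operator()(const std::string& key, std::string& value) {
		auto found = storage.find(key);
		if (found != storage.end()) value = found->second;
	}
};

struct WindowSettings {
	std::string folder = "";
	// The compiler generates one version of this method per archive type.
	template<typename Archive>
	void saveOrLoad(Archive& archive) {
		archive("working_folder", folder);
	}
};

// Without a common parent, save() and load() are free functions.
template<typename T>
void save(T& object, Storage& storage) {
	Saver saver{storage};
	object.saveOrLoad(saver);
}
template<typename T>
void load(T& object, const Storage& storage) {
	Loader loader{storage};
	object.saveOrLoad(loader);
}
```

The class needs no parent and no virtual call here, but every serialisable class has to repeat the template boilerplate in its saveOrLoad() declaration.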

This may look like a violation of the Single Responsibility Principle, because these serialise() methods do two things. They serialise and they deserialise. However, we can look at these methods as logical pairs of methods that are selected at run time. The rationale behind the Single Responsibility Principle is that we might need one half of the functionality later and we would have to write extra code only to create something that does only that half. These two functionalities are callable independently anyway, so this principle isn’t really violated.

The single element serialisation method

Serialisation of strings is trivial:

inline void serialise(const std::string& key, std::string& value) {
	if (preferencesSaving_) {
		preferencesJson_->getObject()[key] = std::make_shared<JSONstring>(value);
	} else {
		auto found = preferencesJson_->getObject().find(key);
		if (found != preferencesJson_->getObject().end()) {
			value = found->second->getString();
		}
	}
}

Bools can be serialised in pretty much the same way. But numbers can’t, because there are more types of them and if the value is returned through a reference, the type has to be an exact match. This can be solved using a template. SFINAE (Substitution Failure Is Not An Error) can be used to enable the method only if the type is a number, by causing the derivation of the return type to fail if the type isn’t a number, so that the compiler chooses from the other methods with that name. It can also be used to prevent the bool type from being accepted as an argument. The method’s header thus looks like this:

template<typename T>
	typename std::enable_if<std::is_arithmetic<T>::value
			&& !std::is_same<T, bool>::value, void>::type
	serialise(const std::string& key, T& value) {
		// Basically the same code as above
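
The effect of this header can be checked in isolation with a sketch like the following. The kindOf() functions are hypothetical and exist only to show which overload gets selected; they use the same enable_if condition as the header above.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Enabled only for numeric types; for bool and for non-numeric types the
// return type cannot be formed, so this template drops out of overload
// resolution instead of causing an error.
template<typename T>
typename std::enable_if<std::is_arithmetic<T>::value
		&& !std::is_same<T, bool>::value, std::string>::type
kindOf(T&) {
	return "number";
}

// Ordinary overloads for the types handled separately.
inline std::string kindOf(bool&) { return "bool"; }
inline std::string kindOf(std::string&) { return "string"; }
```

Called with an int or a double variable, kindOf() resolves to the template, while bool and std::string variables go to their own overloads.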

The same is useful for member classes as well. We can serialise them if they inherit from the parent class that makes them serialisable. The only larger difference is that they can’t be assigned; they have to be created and have their serialisation methods called. The way to check whether a class inherits from another has been standardised since C++11:

	template<typename T>
	typename std::enable_if<std::is_base_of<Serialisable, T>::value, void>::type
	serialise(const std::string& key, T& value) {
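
What std::is_base_of accepts and rejects can be verified at compile time. In this sketch, the empty Serialisable struct is only a stand-in for the real parent class, and Window and Unrelated are hypothetical example types.

```cpp
#include <type_traits>

// Stand-in for the real parent class, only for this illustration.
struct Serialisable { virtual ~Serialisable() = default; };
struct Window : Serialisable {};
struct Unrelated {};

static_assert(std::is_base_of<Serialisable, Window>::value,
		"a derived class passes the check");
static_assert(!std::is_base_of<Serialisable, Unrelated>::value,
		"an unrelated class is rejected");
static_assert(!std::is_base_of<Serialisable, int>::value,
		"non-class types are rejected");
static_assert(std::is_base_of<Serialisable, Serialisable>::value,
		"a class counts as its own base");
```

Because numbers fail this check and serialisable classes fail the arithmetic check, the two templated overloads never compete for the same type.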

Now, there will also be vectors of serialisable objects and vectors of pointers to serialisable objects. In the case of vectors of objects, it’s quite trivial: all that has to be done is to repeat the same in a loop. But now, how about the vector of pointers? It will be useful to allow putting objects of multiple child classes of the same parent into the vector.

If there were just raw pointers, it would be trivial. But in modern C++, smart pointers are a must and raw pointers are still useful. It could be solved using concepts, but they are part of C++20, which is not widely available yet and even C++14 support is not everywhere. Everything needed so far required only C++11. So we can use some kind of duck typing. If using * before it makes it return a type that is serialisable and a pointer to it can initialise this type, it has all we need of a pointer or a smart pointer (raw pointers would not pass checking for the -> operator, so it’s not used in the implementation). Because the * operator is not always a method, we need a decltype, declval pair.

The result looks like this:

template<typename T>
typename std::enable_if<std::is_base_of<Serialisable, typename std::remove_reference<decltype(*std::declval<T>())>::type>::value
		&& std::is_constructible<T, typename std::remove_reference<decltype(*std::declval<T>())>::type*>::value, void>::type
serialise(const std::string& key, std::vector<T>& value) {
	if (preferencesSaving_) {
		auto making = std::make_shared<JSONarray>();
		for (unsigned int i = 0; i < value.size(); i++) {
			auto innerMaking = std::make_shared<JSONobject>();
			(*value[i]).preferencesSaving_ = true;
			(*value[i]).preferencesJson_ = innerMaking;
			(*value[i]).saveOrLoad();
			(*value[i]).preferencesJson_.reset();
			making->getVector().push_back(innerMaking);
		}
		preferencesJson_->getObject()[key] = making;
	} else {
		value.clear();
		auto found = preferencesJson_->getObject().find(key);
		if (found != preferencesJson_->getObject().end()) {
			for (unsigned int i = 0; i < found->second->getVector().size(); i++) {
				value.emplace_back(new typename std::remove_reference<decltype(*std::declval<T>())>::type());
				T& filled = value.back();
				(*filled).preferencesSaving_ = false;
				(*filled).preferencesJson_ = found->second->getVector()[i];
				(*filled).saveOrLoad();
				(*filled).preferencesJson_.reset();
			}
		}
	}
}
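
The long enable_if condition is easier to follow when pulled out into a trait of its own. The sketch below does that under some assumptions: Pointee and IsSerialisablePointer are hypothetical helper names, and the empty Serialisable struct is a stand-in for the real parent class.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

// Stand-ins for the real classes, only for this illustration.
struct Serialisable { virtual ~Serialisable() = default; };
struct Shape : Serialisable {};

// Type obtained by applying * to a T, with the reference stripped.
template<typename T>
using Pointee = typename std::remove_reference<decltype(*std::declval<T>())>::type;

// Fallback, chosen when T cannot be dereferenced or fails the checks.
template<typename T, typename = void>
struct IsSerialisablePointer : std::false_type {};

// Same condition as in the method header above: dereferencing gives a
// serialisable type and T can be initialised from a raw pointer to it.
template<typename T>
struct IsSerialisablePointer<T, typename std::enable_if<
		std::is_base_of<Serialisable, Pointee<T>>::value
		&& std::is_constructible<T, Pointee<T>*>::value>::type>
	: std::true_type {};

static_assert(IsSerialisablePointer<Shape*>::value, "raw pointer");
static_assert(IsSerialisablePointer<std::unique_ptr<Shape>>::value, "unique_ptr");
static_assert(IsSerialisablePointer<std::shared_ptr<Shape>>::value, "shared_ptr");
static_assert(!IsSerialisablePointer<int*>::value, "pointer to a non-serialisable type");
static_assert(!IsSerialisablePointer<Shape>::value, "not a pointer at all");
```

Raw pointers and both standard smart pointers pass, which is the point of testing for behaviour rather than for a specific pointer type.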

Finally, the library needed a name. I called it QuickPreferences. I named the class QuickPreferences instead of Serialisable to avoid possible identifier conflicts (they can be dealt with using namespaces, but why cause them in the first place?). I also renamed the unified serialising and deserialising function to synch(), because it’s shorter and I found it more descriptive. The source code can be found here.

Conclusion

Using some variables for run-time decisions instead of multiple functions, some virtual functions and some methods with the same names, I have created a library for saving preferences with as little code as possible. It allows saving C++ data structures into JSON in a controllable way, with as little code as this:

#include "quick_preferences.hpp"

struct Settings : public QuickPreferences {
	std::string folder = "";
	std::string fileNames = "data";
	int lastIndex = 9;
	virtual void saveOrLoad() {
		synch("working_folder", folder);
		synch("working_file_names", fileNames);
		synch("last_working_file_index", lastIndex);
	}
};

10 thoughts on “Serialising C++ classes while writing as little code as possible”

  1. We did fairly similar to this for network synchronization and save games in Crysis back in the day!

  2. “…they can be dealt with using namespaces, but why to cause them at first place?…”

    *but why ~to~ cause* the “to” is extraneous
