Michal Kohutek

Writing a JSON library in two and a half hours, in C++

Do you think writing code in C++ takes long? Here’s the story of how I wrote a fully featured JSON library in C++ in two and a half hours, when using the existing ones was inconvenient.

I wanted to make a small library for comfortable serialization and deserialization of classes. I needed some format to save the data in, and I chose JSON. There are many nice JSON libraries, but each of them means an additional dependency (contributing to dependency hell) that would also be much larger than the library itself. So I chose to write a JSON library of my own. Because it was meant to stay hidden inside my library, it did not need a particularly convenient interface.

The analysis

The fundamental property of JSON is that, unlike XML, it is typed. The types are string, number, bool, object, array and null. Objects contain any number of other values (string, number, object, etc.), each named by a string. Arrays contain any number of values of any type, without names. This creates a tree structure that is usually visualized by indentation.

Tree structures usually have their nodes represented by classes that may contain references to other instances of that class. Because these nodes can have different content, creating an abstract class that all nodes representing the JSON types would inherit from looks like a good idea.

The way to store values of types string, number and bool is clear. Objects can be stored as string-indexed unordered maps of pointers to the abstract class. Arrays can be vectors of such pointers. Null does not need to store anything.

Base structure

To avoid code duplication, the abstract class is not really an interface. To avoid the need to downcast, it implements placeholder versions of all access methods that throw exceptions. It also needs a way to report its actual type: when the structure of the input is not fixed, the types have to be inspected, and without such a method that could only be done with try/catch blocks. Because the null type has nothing to access, the parent class itself can serve as the null type.

Using these ideas, I produced this code:

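#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Base class of all nodes; on its own it represents the null type
struct JSON {
	// Nodes are handled through pointers to the base class
	virtual ~JSON() = default;
	virtual JSONtype type() {
		return JSONtype::NIL;
	}
	// Placeholder accessors: each child overrides only the one matching its type
	virtual std::string& getString() {
		throw(std::runtime_error("String value is not really string"));
	}
	virtual double& getDouble() {
		throw(std::runtime_error("Double value is not really double"));
	}
	virtual bool& getBool() {
		throw(std::runtime_error("Bool value is not really bool"));
	}
	virtual std::vector<std::shared_ptr<JSON>>& getVector() {
		throw(std::runtime_error("Array value is not really array"));
	}
	virtual std::unordered_map<std::string, std::shared_ptr<JSON>>& getObject() {
		throw(std::runtime_error("Object value is not really an object"));
	}
};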

Note: JSONtype is an enum class with one value for each type. The null type had to be named NIL because NULL is a preprocessor macro (the legacy way of writing nullptr) and so cannot be used as an enumerator name.
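The enum itself is not shown in this article. A minimal sketch of what it might look like follows; only NIL and OBJECT appear in the listings here, so the remaining enumerator names and the typeName() helper are assumptions made for illustration.

```cpp
#include <string>

// Hypothetical definition: only NIL and OBJECT are taken from the article,
// the other enumerator names are assumed.
enum class JSONtype {
	NIL,
	STRING,
	NUMBER,
	BOOL,
	ARRAY,
	OBJECT
};

// Hypothetical helper that turns a type into text, e.g. for error messages.
inline std::string typeName(JSONtype type) {
	switch (type) {
		case JSONtype::NIL: return "null";
		case JSONtype::STRING: return "string";
		case JSONtype::NUMBER: return "number";
		case JSONtype::BOOL: return "bool";
		case JSONtype::ARRAY: return "array";
		case JSONtype::OBJECT: return "object";
	}
	return "unknown";
}
```

Using an enum class rather than a plain enum keeps the short names (STRING, ARRAY and so on) out of the surrounding namespace.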

The reference-returning access functions could be replaced by methods that perform operations on the data when the real type is correct. That would not fit every use case, though. It would also make the code less obvious and need a lot of code (the purpose was to write little code!) that would take a long time to write.

Now, I had a basic structure for keeping the tree in memory. The clock showed about 30 minutes since the start.

Child classes

I am not listing them all because they are very much alike. They override only those of the parent’s methods that they need, so they take little code. The least trivial one is the object, so I am including it as an example:

struct JSONobject : public JSON {
	std::unordered_map<std::string, std::shared_ptr<JSON>> contents_;
	JSONobject() {}

	JSONtype type() override {
		return JSONtype::OBJECT;
	}
	std::unordered_map<std::string, std::shared_ptr<JSON>>& getObject() override {
		return contents_;
	}
};
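For completeness, here is a sketch of how the remaining child classes might look. It follows the same pattern as JSONobject; the member names, the constructor signatures and the enumerator names are assumptions, and the base class is trimmed to the accessors this example needs.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

enum class JSONtype { NIL, STRING, NUMBER, BOOL, ARRAY, OBJECT }; // names assumed

// Trimmed copy of the base class (getObject() is left out for brevity)
struct JSON {
	virtual ~JSON() = default;
	virtual JSONtype type() { return JSONtype::NIL; }
	virtual std::string& getString() {
		throw std::runtime_error("String value is not really string");
	}
	virtual double& getDouble() {
		throw std::runtime_error("Double value is not really double");
	}
	virtual bool& getBool() {
		throw std::runtime_error("Bool value is not really bool");
	}
	virtual std::vector<std::shared_ptr<JSON>>& getVector() {
		throw std::runtime_error("Array value is not really array");
	}
};

struct JSONstring : public JSON {
	std::string contents_;
	explicit JSONstring(const std::string& from = "") : contents_(from) {}
	JSONtype type() override { return JSONtype::STRING; }
	std::string& getString() override { return contents_; }
};

struct JSONdouble : public JSON {
	double value_;
	explicit JSONdouble(double from = 0) : value_(from) {}
	JSONtype type() override { return JSONtype::NUMBER; }
	double& getDouble() override { return value_; }
};

struct JSONbool : public JSON {
	bool value_;
	explicit JSONbool(bool from = false) : value_(from) {}
	JSONtype type() override { return JSONtype::BOOL; }
	bool& getBool() override { return value_; }
};

struct JSONarray : public JSON {
	std::vector<std::shared_ptr<JSON>> contents_;
	JSONtype type() override { return JSONtype::ARRAY; }
	std::vector<std::shared_ptr<JSON>>& getVector() override { return contents_; }
};
```

Calling an accessor that a class does not override, such as getString() on a JSONdouble, falls through to the parent and throws.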

Now, all of the JSON data could be stored in memory and manipulated there. Here is an example of usage:

JSONobject testJson;
testJson.getObject()["file"] = std::make_shared<JSONstring>("test.json");
testJson.getObject()["number"] = std::make_shared<JSONdouble>(9);
testJson.getObject()["makes_sense"] = std::make_shared<JSONbool>(false);
std::shared_ptr<JSONarray> array = std::make_shared<JSONarray>();
for (int i = 0; i < 3; i++) {
	std::shared_ptr<JSONobject> obj = std::make_shared<JSONobject>();
	obj->getObject()["index"] = std::make_shared<JSONdouble>(i);
	std::shared_ptr<JSONobject> obj2 = std::make_shared<JSONobject>();
	obj->getObject()["contents"] = obj2;
	obj2->getObject()["empty"] = std::make_shared<JSONobject>();
	array->getVector().push_back(obj);
}

While not super practical, it serves its purpose.
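Reading values back is where type() pays off, because a value of unknown type can be checked before it is accessed. The following is a sketch only: the classes are trimmed stand-ins for the ones above, and numberOr() is a hypothetical helper that is not part of the library.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>

enum class JSONtype { NIL, STRING, NUMBER, BOOL, ARRAY, OBJECT }; // names assumed

// Trimmed stand-ins for the article's classes, just enough for this example
struct JSON {
	virtual ~JSON() = default;
	virtual JSONtype type() { return JSONtype::NIL; }
	virtual double& getDouble() {
		throw std::runtime_error("Double value is not really double");
	}
	virtual std::unordered_map<std::string, std::shared_ptr<JSON>>& getObject() {
		throw std::runtime_error("Object value is not really an object");
	}
};

struct JSONdouble : public JSON {
	double value_;
	explicit JSONdouble(double from = 0) : value_(from) {}
	JSONtype type() override { return JSONtype::NUMBER; }
	double& getDouble() override { return value_; }
};

struct JSONobject : public JSON {
	std::unordered_map<std::string, std::shared_ptr<JSON>> contents_;
	JSONtype type() override { return JSONtype::OBJECT; }
	std::unordered_map<std::string, std::shared_ptr<JSON>>& getObject() override {
		return contents_;
	}
};

// Hypothetical helper: returns a numeric member of an object, or the fallback
// when the node is not an object, the member is missing, or it is not a number.
// The type is inspected up front, so no try/catch is needed.
double numberOr(JSON& node, const std::string& key, double fallback) {
	if (node.type() != JSONtype::OBJECT) return fallback;
	auto& members = node.getObject();
	auto found = members.find(key);
	if (found == members.end()) return fallback;
	if (found->second->type() != JSONtype::NUMBER) return fallback;
	return found->second->getDouble();
}
```

With the testJson object built above, numberOr(testJson, "number", -1) would return 9, while asking for "file" would return the fallback because that member is a string.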

Saving

Now, each class needs a write() method that saves its contents into a stream passed as an argument. In the classes representing objects and arrays, this method calls the write() methods of the contained nodes.

Some operations were repeated, so the parent class got two additional non-virtual methods, writeString() (for escaping newlines and double quotes) and indent() (for indentation), both taking the stream as an argument. The indent() method also takes the indentation depth as an argument. To let the call stack keep track of how deep the structures have to be indented, an argument giving the depth was added to the write() method.

To make the whole thing capable of saving into files, an additional non-virtual method, writeToFile(), was added to the parent.

Here is one of the less trivial write() functions, the one for arrays:

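void write(std::ostream& out, int depth = 0) {
	out.put('[');
	if (contents_.empty()) {
		out.put(']');
		return;
	}
	bool first = true;
	for (auto& it : contents_) {
		// JSON requires a comma between consecutive elements
		if (!first) out.put(',');
		first = false;
		out.put('\n');
		// Elements sit one level deeper than the brackets
		// (assuming indent(out, n) writes n levels of indentation)
		indent(out, depth + 1);
		it->write(out, depth + 1);
	}
	// The closing bracket lines up with the line holding the opening one
	out.put('\n');
	indent(out, depth);
	out.put(']');
}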

At this point, I could try to write the JSON data into a file. I had to deal with a load of compilation errors, but they were quite obvious. It ran on the first try, but the formatting was terrible: new lines were placed chaotically and the messy indentation made the output less readable rather than more. After a short debugging session, the indentation was correct.

At this point, the library could be used to save data, but the data had to be read by some other application. The time elapsed was one and a half hours.

Loading

From the point of view of software design, this part was trivial: one function, parseJSON(), that takes a stream and returns a JSON object. The parser can either keep the state of what it is reading in a variable, or call itself recursively and keep the state on the stack. I chose the second option.

The function first identifies the type it is reading (which can be done using the first character), then parses it and returns the created object. It leaves the stream positioned immediately after the last character of the value it has read. This allows the part that parses objects to use recursion to read their contents and then return correctly when it finds the closing brace.

It needed two auxiliary functions: readString() to read strings and correctly deal with escaped characters, and readWhitespace() to skip any number of tabs, spaces, newlines and commas (which allows parsing of ugly and, in some cases, also incorrect input) and return the first character after them. To avoid polluting the namespace, they were implemented as lambdas inside the function.

std::shared_ptr<JSON> parseJSON(std::istream& in) {
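	auto readString = [&in] () -> std::string {
		std::string collected;
		int letter = in.get();
		while (letter != '"') {
			// Stop at the end of input instead of looping forever
			if (letter == EOF) throw(std::runtime_error("JSON parser found an unterminated string"));
			if (letter == '\\') {
				// Read the escaped character once, then decide what it means;
				// other escape sequences are not supported and are skipped
				int escaped = in.get();
				if (escaped == '"') collected.push_back('"');
				else if (escaped == 'n') collected.push_back('\n');
				else if (escaped == '\\') collected.push_back('\\');
			} else {
				collected.push_back(static_cast<char>(letter));
			}
			letter = in.get();
		}
		return collected;
	};
	auto readWhitespace = [&in] () -> char {
		char letter;
		do {
			letter = in.get();
		} while (letter == ' ' || letter == '\t' || letter == '\n' || letter == '\r' || letter == ',');
		return letter;
	};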

	char letter = readWhitespace();
	if (letter == 0 || letter == EOF) return std::make_shared<JSON>();
	else if (letter == '"') {
		return std::make_shared<JSONstring>(readString());
	}
	else if (letter == 't') {
		if (in.get() == 'r' && in.get() == 'u' && in.get() == 'e')
			return std::make_shared<JSONbool>(true);
		else
			throw(std::runtime_error("JSON parser found misspelled bool 'true'"));
	}
	else if (letter == 'f') {
		if (in.get() == 'a' && in.get() == 'l' && in.get() == 's' && in.get() == 'e')
			return std::make_shared<JSONbool>(false);
		else
			throw(std::runtime_error("JSON parser found misspelled bool 'false'"));
	}
	else if (letter == 'n') {
		if (in.get() == 'u' && in.get() == 'l' && in.get() == 'l')
			return std::make_shared<JSON>();
		else
			throw(std::runtime_error("JSON parser found misspelled 'null'"));
	}
	else if (letter == '-' || (letter >= '0' && letter <= '9')) {
		// Collect every character that can belong to a number. A comma is
		// not among them: it separates values and is skipped as whitespace.
		std::string asString;
		do {
			asString.push_back(letter);
			letter = in.get();
		} while (letter == '-' || letter == '+' || letter == 'E' || letter == 'e' || letter == '.' || (letter >= '0' && letter <= '9'));
		// Put back the first character that is not part of the number
		if (in) in.unget();
		std::stringstream parsing(asString);
		double number = 0;
		parsing >> number;
		return std::make_shared<JSONdouble>(number);
	}
	else if (letter == '{') {
		auto retval = std::make_shared<JSONobject>();
		do {
			letter = readWhitespace();
			if (letter == '"') {
				std::string name = readString();
				letter = readWhitespace();
				if (letter != ':') throw(std::runtime_error("JSON parser expected an additional ':' somewhere"));
				retval->getObject()[name] = parseJSON(in);
			} else break;
		} while (letter != '}');
		return retval;
	}
	else if (letter == '[') {
		auto retval = std::make_shared<JSONarray>();
		while (true) {
			letter = readWhitespace();
			// Stop at the closing bracket or at the end of input
			if (letter == ']' || !in) break;
			// Anything else starts a value of any type: put its first
			// character back and let the recursive call parse it
			in.unget();
			retval->getVector().push_back(parseJSON(in));
		}
		return retval;
	} else {
		// std::string is needed here: adding a char to a string literal
		// would do pointer arithmetic instead of concatenation
		throw(std::runtime_error(std::string("JSON parser found unexpected character ") + letter));
	}
	return std::make_shared<JSON>();
}

Note: Numbers are read through a stream, the same way they are written, because in some locales the decimal point is replaced by a decimal comma, which is very annoying.
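A different way to deal with the locale problem, not the one used in the code above, is to pin both directions to the classic "C" locale so that the decimal separator is always a point. The sketch below uses only standard library facilities; the function names numberToString() and stringToNumber() are made up for illustration.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Formats a number using the classic "C" locale, regardless of the
// global locale, so the decimal separator is always '.'.
inline std::string numberToString(double value) {
	std::ostringstream out;
	out.imbue(std::locale::classic());
	out << value;
	return out.str();
}

// Parses a number using the classic "C" locale; returns 0 if the text
// does not start with a number.
inline double stringToNumber(const std::string& text) {
	std::istringstream in(text);
	in.imbue(std::locale::classic());
	double value = 0;
	in >> value;
	return value;
}
```

This keeps the output valid JSON even when the program runs under a locale that uses a decimal comma, at the cost of two extra imbue() calls. Note that the default stream precision of six significant digits still applies to the written text.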

When I double checked the code, fixed compilation errors and tried to run the result, it crashed with a segfault. Valgrind revealed it was a null pointer dereference of a variable that shouldn’t have been null. After a short debugging session, it turned out that the parser returned a null pointer when the file was not found, instead of an object representing the null type (I probably decided in the middle of writing that it should always return something). After correcting that, it worked flawlessly.

At this point, the library was fully featured, albeit not 100% standard compliant (because it accepts incorrect input with missing or extra commas) and without a super practical interface. That didn’t matter in this case. The time elapsed was two and a half hours.

Conclusion

This way, I made a 260-line, fully featured JSON manipulation tool for the small library I was developing. A detailed description of the library along with links to the entire source code can be found here. Of course, it could be faster (using unions) or more practical (overloading square brackets for accessing contained values), but such improvements usually need much more code, which goes against the purpose.

It was also a nice challenge, and it has shown that the idea that writing programs in C++ takes a long time is a myth.
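To give an idea of the square bracket variant mentioned above, here is a rough sketch. It is not part of the library: operator[] is a hypothetical addition to the parent class, and the classes are trimmed stand-ins with assumed member names.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Trimmed stand-in for the parent class (type() and other accessors omitted)
struct JSON {
	virtual ~JSON() = default;
	virtual double& getDouble() {
		throw std::runtime_error("Double value is not really double");
	}
	virtual std::unordered_map<std::string, std::shared_ptr<JSON>>& getObject() {
		throw std::runtime_error("Object value is not really an object");
	}

	// Hypothetical addition: node["key"] looks up a member of an object node.
	// Throws if the node is not an object or if the member does not exist.
	JSON& operator[](const std::string& key) {
		auto& members = getObject();
		auto found = members.find(key);
		if (found == members.end())
			throw std::runtime_error("No member named " + key);
		return *found->second;
	}
};

struct JSONdouble : public JSON {
	double value_;
	explicit JSONdouble(double from = 0) : value_(from) {}
	double& getDouble() override { return value_; }
};

struct JSONobject : public JSON {
	std::unordered_map<std::string, std::shared_ptr<JSON>> contents_;
	std::unordered_map<std::string, std::shared_ptr<JSON>>& getObject() override {
		return contents_;
	}
};
```

With this, a nested value can be reached as root["contents"]["index"].getDouble() instead of chaining getObject() calls. Even this small convenience needs decisions about missing keys and const access, which is where the extra code starts to pile up.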
