this is part of a series of posts i wrote for my software engineering course at the university of texas.
a significant portion of the first big group project in my software engineering course was deciding on a data format and schema. we were required to use xml, and though we weren’t required to agree on a schema, we will have to import each other’s data later in the course, so we made our future lives much easier by developing and adhering to a shared schema.
json is designed to be lightweight and easily parsed. it assumes several implicit types, including strings, numbers, booleans, null, arrays, and objects.
while this may seem limiting, it turns out to strike a good balance between simplicity and power, and it is gaining (has gained) popularity in arenas including configuration, archiving, and most prevalently apis.
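to make the implicit typing concrete, here is a minimal sketch using python's standard `json` module. the record and its field names are made up for illustration; the point is only that the parser hands back typed values without being told what to expect.

```python
import json

# hypothetical record, invented for this example
text = """
{
  "title": "intro to data formats",
  "views": 42,
  "rating": 4.5,
  "draft": false,
  "editor": null,
  "tags": ["json", "yaml"],
  "author": {"name": "sam"}
}
"""

record = json.loads(text)

# each value arrives as a native python type; no schema or casting needed
for key, value in record.items():
    print(key, type(value).__name__)
```

every value comes back as the matching python type (string, number, boolean, none, list, or dict), which is most of what a typical api payload needs.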
yaml aims to be human readable above all. it is pleasant to write and read, and it includes implicit types like json (though it has many more).
unfortunately all of this convenience and syntactic flexibility isn’t free. yaml is very difficult to parse – compare the lines of code in the pyyaml versus simplejson python libraries. this also makes it difficult to standardize implementations and eradicate security issues.
while its complexity makes it undesirable for something like an api (where you cannot tolerate ambiguity), its sweet spot is configuration files and things like internationalization string bundles where you just want something readable.
xml is extensively standardized, available in nearly every language (often several times over), and very well understood by the industry. it is the most expressive of the formats presented here, which explains its applications to not only the aforementioned tasks, but also more intricate problems such as user interface definition (see .xaml, .xib, glade, and arguably web apps written in xhtml).
xml is also the most verbose of the formats. if json is a gentleman’s handshake, then xml is a 100-page legal document. while json and yaml have implicit types, xml leaves that function up to the application, which is less convenient, but more extensible.
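a small sketch of what "leaving types up to the application" looks like in practice, using python's standard `xml.etree.ElementTree`. the element names and the conversion rules are assumptions made up for this example, not part of any real schema.

```python
import xml.etree.ElementTree as ET

# hypothetical document, invented for this example
text = """
<post>
  <title>intro to data formats</title>
  <views>42</views>
  <draft>false</draft>
</post>
"""

root = ET.fromstring(text)

# the parser returns every element's content as a plain string
raw_views = root.findtext("views")
raw_draft = root.findtext("draft")

# the application decides what each string means
views = int(raw_views)
draft = raw_draft == "true"

print(type(raw_views).__name__, type(views).__name__, draft)
```

the conversion step is extra work compared to json, but it is also where an application can plug in whatever types it wants (dates, currencies, enums) instead of being limited to a fixed set.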