(Is what for which they cried out, in the long dark night of IT bogosity. Then a clarifying light came, and a voice replied… “This is the way, this is the fresh light of day, this is how K&R would do it…”)
May I, as a seasoned (if not veteran) coder, please propose a sensible one – a standard for the storage of contacts – that we might promote into the limelight… and ever after be fondly remembered (if not loudly celebrated) by hackers everywhere, for having done so?
Plain, Simple, “key: value” text files, like apt and dpkg lists, email and http headers, with records delimited by a blank line. This is the best file format for nearly everything. Not CSV, not SEXPS, not gzipped XML (spit), not binary bitblob monstrosities. Simple records of key value pairs.
We use this format for HTTP and Email headers, and for your favourite Linux Distros’ package lists, etc., etc., because it’s very good: it’s human readable/writable, and it’s ultra simple. Even a mouse can understand it, with a little coaching. It’s still sadly underused.
Look, something like this:
Name: Fred Nurk
Address: 1234 Numpy St Worthingthwaite UK 123545
Phone: 012345678
Phone: 564387347
Email: firstname.lastname@example.org
Email: email@example.com

Name: Emily Jane Nurk
Address: 256 Bitwise St Camelhide Australia 3243
Is it simple? It is simplest.
Is it good? It is good.
Comprehensive too? Verily So!
I’m sure you all get the idea. If not, I can specify the simplest and preferable variant in meticulous detail, within 1/4 page of less flowery / befuddled prose. You can even store binary values in it (if you don’t mind the long lines)! A double blank line may mark the EOF, which can be useful.
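To make the idea concrete, here is a minimal parsing sketch in Python. It is only one possible reading of the format, not a specification: the names parse_records and values are made up for this example, and it assumes one "key: value" pair per line, repeated keys allowed, a single blank line between records, and a double blank line meaning "stop reading".

```python
def parse_records(text):
    """Parse blank-line-delimited "key: value" records.

    Returns a list of records. Each record is a list of (key, value)
    pairs, so repeated keys (two Phone lines, say) and their order survive.
    """
    records = []   # finished records
    current = []   # pairs of the record being read
    blanks = 0     # consecutive blank lines seen so far
    for line in text.splitlines():
        if not line.strip():
            blanks += 1
            if current:
                records.append(current)
                current = []
            if blanks == 2:
                # Assumed convention: a double blank line marks the EOF.
                break
            continue
        blanks = 0
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {line!r}")
        current.append((key.strip(), value.strip()))
    if current:
        records.append(current)
    return records


def values(record, key):
    """Return every value stored under key, in file order."""
    return [v for k, v in record if k == key]
```

Feed it the address example above and values(record, "Phone") hands back both of Fred's phone numbers as a list. Keeping pairs in a list rather than a dict is a deliberate choice here, since a dict would silently drop repeated keys.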
You can put all the data from a whole RDB in one of these files: just add a “table: foo” line, or as I prefer, a “type: foo” line, to each record. It’s in fact more flexible than a set of fixed-width tables, more like an LDAP directory. If you’re worried about the disk space used by repeated key names and “: ” tokens, compress the file. You’ll find it turns out much, much smaller than a normal binary database, probably even when you compress that too.
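A rough sketch of the “type: foo” idea, in Python. The helper names parse and tables are invented for illustration, and to keep it short this version stores each record as a dict, so it assumes no repeated keys within a record.

```python
from collections import defaultdict


def parse(text):
    """Split text on blank lines; each record becomes a dict.

    Simplification for this sketch: repeated keys are not supported,
    a later value would overwrite an earlier one.
    """
    records = []
    for chunk in text.strip().split("\n\n"):
        record = {}
        for line in chunk.splitlines():
            key, _, value = line.partition(":")
            record[key.strip()] = value.strip()
        records.append(record)
    return records


def tables(records):
    """Group records into "tables" by their type key."""
    by_type = defaultdict(list)
    for record in records:
        row = dict(record)                       # copy, leave the input alone
        by_type[row.pop("type", "untyped")].append(row)
    return dict(by_type)


SAMPLE = """\
type: person
name: Fred Nurk
city: Worthingthwaite

type: city
name: Worthingthwaite
country: UK

type: person
name: Emily Jane Nurk
"""
```

Calling tables(parse(SAMPLE)) gives one list of rows for "person" and one for "city". Note that the two person rows do not have the same columns, which is the flexibility a fixed-width table would not give you.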
A longer (615K) example file, formatted by me, loads in 0.13 sec on my slow netbook: http://sam.nipl.net/code/talaga/dict. It may not use any ‘advanced’ features, such as repeated keys or multi-line values.
To use ANY other file format for most any sort of records is bare-naked insanity, for this is the file format of the Gods, simplicity itself: it’s exactly what any sensible human would write on paper for an address (well, we might skip the Keys since we have more brain). It’s trivial for computer programs to parse and generate it (10 lines of C?), it’s completely generic, and – thank the Gods especially for this – it’s not XML. Your IRL dead-tree address book looks like this too.
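Generating the format really is about as short as claimed. Here is a sketch in Python rather than C; format_records is a made-up name, and the checks assume the simple variant where keys contain no colon and values fit on one line.

```python
def format_records(records):
    """Serialise records to "key: value" text.

    Each record is a list of (key, value) string pairs. Records are
    separated by one blank line.
    """
    chunks = []
    for record in records:
        if not record:
            raise ValueError("an empty record would look like an extra blank line")
        lines = []
        for key, value in record:
            # Assumption of this sketch: single-line values only.
            if ":" in key or "\n" in key or "\n" in value:
                raise ValueError(f"cannot store {key!r}: {value!r} on one line")
            lines.append(f"{key}: {value}")
        chunks.append("\n".join(lines) + "\n")
    # Joining with a newline puts exactly one blank line between records.
    return "\n".join(chunks)
```

Repeated keys need no special handling on the way out: a record with two ("Phone", ...) pairs simply produces two Phone lines.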
For indexing, if needed, we can easily enough write a ‘textmap’ program that, for a file “addresses.txt”, will create its index, “addresses.ix”. The index could also be a text file, or a binary file, but with fixed-width fields. The well-esteemed Postfix does similar for its text databases with the ‘postmap’ program.
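What such a ‘textmap’ might do, sketched in Python. Nothing here is an existing tool: build_index, load_index and read_record are hypothetical names, the index layout (one "value, TAB, byte offset" line per record) is just one plain-text option, and indexing on the Name key is an assumption for the example.

```python
def build_index(db_path, ix_path, key="Name"):
    """Write one 'value<TAB>offset' line for each record that has key.

    The offset is the byte position where the record starts in db_path.
    """
    prefix = (key + ": ").encode("utf-8")
    with open(db_path, "rb") as db, open(ix_path, "w", encoding="utf-8") as ix:
        offset = 0            # byte offset of the line being read
        record_start = None   # where the current record began, if inside one
        for line in db:
            if line.strip():
                if record_start is None:
                    record_start = offset
                if line.startswith(prefix):
                    value = line[len(prefix):].strip().decode("utf-8")
                    ix.write(f"{value}\t{record_start}\n")
            else:
                record_start = None   # a blank line ends the record
            offset += len(line)


def load_index(ix_path):
    """Read the index file back into a dict of value -> byte offset."""
    index = {}
    with open(ix_path, encoding="utf-8") as ix:
        for entry in ix:
            value, _, offset = entry.rstrip("\n").rpartition("\t")
            index[value] = int(offset)
    return index


def read_record(db_path, offset):
    """Seek straight to offset and return that record's lines."""
    lines = []
    with open(db_path, "rb") as db:
        db.seek(offset)
        for line in db:
            if not line.strip():
                break
            lines.append(line.decode("utf-8").rstrip("\n"))
    return lines
```

Usage would go something like: build_index("addresses.txt", "addresses.ix") once, then read_record("addresses.txt", load_index("addresses.ix")["Emily Jane Nurk"]) to fetch one record without scanning the whole file. Files are opened in binary mode so the offsets are true byte positions.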
If you are concerned about complete index rebuilds, please don’t be, at least not for this application. Indeed, this can also be solved by using padding within the main database text file, but we lose some simplicity then.
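For the curious, one way the padding trick could look, sketched in Python. This is my guess at a workable scheme rather than a worked-out design: every record gets a fixed-size slot, the slack is filled with trailing spaces on the record's last line, and pad_record and rewrite_in_place are hypothetical names. A reader of the file would need to strip trailing spaces from values.

```python
def pad_record(text, slot):
    """Return the record as exactly slot bytes, blank separator line included.

    Slack is filled with spaces at the end of the last line (an
    assumption of this sketch).
    """
    body = text.strip("\n").encode("utf-8")
    room = slot - len(body) - 2   # 2 = final newline + blank separator line
    if room < 0:
        raise ValueError("record does not fit in its slot")
    return body + b" " * room + b"\n\n"


def rewrite_in_place(path, offset, slot, new_text):
    """Overwrite the record at offset without moving any other record."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(pad_record(new_text, slot))
```

Because an edited record still occupies the same slot, the byte offsets of every other record stay put and an existing index remains valid. The cost is the one mentioned above: the file is no longer quite so pleasant to edit by hand.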
Go forth, spread the word, join the revolution, baptise those confused hackers in the name of that great god Simplicity. Thx for reading anyway. Comments here, as usual, are published only at my whim, so be polite, smart and unscathing… and they’ll likely reach the WWW. If not (polite, …), there’s no’ a chance (… WWW). This is my personal blog (somewhat like a home). It’s not some flame-festfully bogus unmoderated web forum, email list, IRC channel, or netnews group. Be kind to me.
Have a nice day. 🙂