Splat.cx: a long winded history lesson

So after writing a few posts in February the only feedback I received was it doesn’t render well on a phone. Nothing about the content or the opinions expressed, just that. Oh well. That jab in the guts lead me to revisit migrating to a newer codebase and more importantly, a codebase someone else maintains. As that will more easily enable deploying a new theme, which was the primary complaint.

This feels like something I’ve done before, so lets cast an eye back into the history books.

dictionary page showing history

Part 1: Archaeology of a code base

So here we go for a history lesson. I’ve moved between a fair few content systems over the years. Some home grown, some from elsewhere and even that pattern seems cyclical.

From	To	Name	Notes
Jan 2000	Oct 2000	Manually edited	Manual html and basic php templater
Oct 2000	Nov 2001	SIPS	Polls, user accounts, comments etc
Nov 2001	(at least) Apr 2005	SIPS majorly refactored	User accounts and comments removed, added lists, story trees in 2004
(not earlier than) Apr 2005	Oct 2009	bBlog	(export tool from SIPS(refactor) to sql targets wrong schema)
Oct 2009	May 2011	Wordpress	Tags not in use before wordpress
May 2011	Apr 2022	StaticBlog	My C# based tool
Apr 2016		Hugo	Initial migration attempt, not completed
Jun 2017		Hugo	Another migration attempt, not completed
Apr 2022	Future	Hugo	Another migration attempted and cut over

SIPS (Simple Internet Publishing System)

It was a php based blog tool along the line of what the slash engine could do, except it stored data in flat text files. Last release was in 2002 but I had majorly refactored/rewritten it before that release (referenced in a post 11 Nov 2001).

The rewrite ripped out user accounts and comments, added ordered lists, and later on (Jan 2004) added support for a alt/parallel blog (story tree). That idea never continued on to future versions.

The code includes an exporter script which output sql insert lines. This showed up in Aug 2004 so a migration was being worked on at the time, last alt/parallel post was Sep 2004. I haven’t isolated the cutover date precisely but I know it was after April 2005.

bBlog

bBlog was also a php based blog too, which used smarty templating and mysql for data storage. It had a plugin capability and was fairly full featured. Not much remains from this era in my backup archives.

There’s a gap on archive.org also for this entire time, so there was probably an overly aggressive robots.txt present. The SIPS exporter produced sql inserts but column names and order didn’t match any version of bBlog I could get today. It’s possible it was exported to b2/cafepress which is a direct predecessor of both bBlog and wordpress or I just manually massaged the data in.

An sql dump from Oct 2009 indicates bBlog 0.7.6 was in use at the time, which was the final release (back in 2005 however). The same backup contains an sql dump from wordpress with the same posts present and prior to this point in time posts were not tagged, so I’m fairly certain of the end date for bBlog. A post drawing a line in the sand marked this transition, 24 Oct 2009.

Wordpress

Wordpress was the in platform at the time. Lots of themes and plugins was the idea, but the reality of wordpress is you’re a honeypot for all the malware of the world. You had to keep on top of that, and auto updates couldn’t be relied on either. This lead me to move away from a dynamic site entirely as the maintenance it required was more time consuming than I wanted, so it didn’t justify the dynamic features. I can remember two themes in use at this time, but haven’t found any archives containing the complete site code. Around this time (2011) I decided to move to a static file based solution, and ended up writing my own.

StaticBlog (private)

This home grown tool, uses markdown for file storage on disk (with metadata headers per file) and a C# command line app to generate all the files. Templates are compiled at runtime and executed. One advanced feature of this codebase was the tag cloud. The data was imported from wordpress’s mysql schema via a little tool I knocked up (also in C#) which spat out the text files with various wordpress-ism’s formatted out and the metadata headers at the top. This codebase saw a few themes over the years, the original pixel perfect of the previous wordpress theme - which was done intentionally so no one would notice the change. Later (2013) a bootstrap based theme was built from scratch and that is essentially what is present right through into 2022.

This tool ran in mono for cross platform support, and was self contained (exe and a few dlls in a folder). The biggest issue with it was the templates. It was written to be like a T4 template (so dotnet native), but compiled at runtime. The exact library was TemplateMaschine. This was fine (and fast) when they worked, but any syntax error was a total pain to track down - and worse, any parser errors necessitated step through debugging to isolate the issue (I found a new one just last month when banging away at a new theme).

Based on the converter scripts I found in my source tree, I had two separate initial attempts at migrating to hugo, both quite a while ago though, so the itch has been there for ages.

Hugo

In 2016 and 2017 I looked at migrating to hugo as it ticked many boxes of what I wanted in a static blog tool - single binary with no dependencies. That ruled out tools written in python/ruby or any other similar language which is an abomination of package garbage on a modern OS. I wrote new converters for each attempt to translate the flavours of markdown and dump files on disk, however I didn’t swap over either time because of niggling issues in the tool or theme. It just couldn’t do what I wanted - and having not learned go at that stage I wasn’t going to dive in and extend it.

The tool migration was not without issues - needing to upgrade the current layout and not fully grasping the apparently very complex templating environment in use I opted to go and find one that I liked instead. Then I only had to bend it to suit exactly what I wanted in the site. In doing this I came across a handful of bugs (either still open or auto-closed as stale even though they’re still present). Fortunately I was able to resolve or work around these issues by editing the template source. This is slightly streamlined by hugo’s ability to inherit/stack themes so you can use someone else’s theme and then layer on top a subset for your personalisation of the theme. Doing this with hugo modules makes it even more streamlined.

Part 2: Post formatting clean up

So the C# code lasted until today when I’ve actually cut over to hugo. In the years between when the C# tool was written and now the style of markdown in use has evolved substantially, which made a lot of what the old code parsed actually not work in a modern (strict) markdown parser. The one major thing I especially wanted to maintain was the ability to keep post images and text together (or very close), rather than having them in parallel structure a few levels deep - to me it mattered that the content was not split. The content is king, and it’s layout on disk should make sense to the author - and in my opinion - not necessarily relate to the published layout in any way (because the rendering engine is responsible for and controls the finished layout).

Hugo can do this with page bundles - however the compromise is your post has to be a folder and it’s filename has to be index.md. To migrate I had to do some translation on the markdown dialect too. So for this third attempt I wrote another converter from scratch (but did find the 2016 and 2017 ones after it was working) - which was a good exercise as the methodology I used changed over the years. The bulk of the markdown format changes I was able to clean up with some regular expressions and some content aware parsing (where changes implied formatting intent). The headers were a simple translation just to turn the old key=value pairs into yaml. All that remained at the end was a bunch of one off corrections to be made so I just did those manually. It became apparent too that the previous migrations were not perfect either, so I fixed up a bunch of mangled html inside markdown and half translated bits which caused rendering errors. I did translate all of the posts, not just what is visible on the site today, and the bulk of these major errors were in the much older posts.

Fixing up these posts meant skimming or reading them. It meant checking included images showed up and captions were correct. It meant changing previously ambiguous/custom markdown syntax into commonmark’s more defined syntax. One thing I didn’t want to do was to reformat posts. Most of the limitations I faced were due to old markdown allowing html pass through for language deficiencies, and now having html pass through disabled (by default) I had to make some adjustments. A quick braindump of these were:

Superscript. Mainly used for footnotes, so they were updated to footnote syntax. Other superscript use I went to more traditional markup as the standard isn’t supported. New standard is to wrap in caret’s x^2^ would render as x^2^ (x squared) if it worked.
Subscript. Used in fewer places and to indicate number base, so really needs math markup. Reverted to more basic markup.
Strikeout. Fortunately wrap in double tilde is now wide spread for strikeout. ~~example~~ renders as ~~example~~
Underline. Would you believe this isn’t defined? Traditionally markdown uses single and double asterisk for emphasis, single is italics, double is bold. Alternatively you were allowed to use underscore the same way. The Markdown parser I used in C# varied this, single underscore meant underline. Double was still bold. So I had to fix them up. These days using underscore triggers a markdown lint warning.
Code fences have improved a lot, so html inlined pre blocks had to be removed. But I also had inconsistent use of block tabbed indents for preformatted text. Sometimes without blank lines before or after. Bunch of combinations to clean up but easy enough in the converter.
Emphasis on a link. Do you put the *’s inside or outside the []’s. #OCDthings

A more complete list of the markdown features supported by hugo is here.

Part 3: Post content clean up

Old posts are a window to a previous simpler time.

Probably a more interesting thought than the codebase or markdown syntax changes, aren’t you glad you skimmed over all that down here?

The simplest statement would be the posts varied in quality, size and frequency. But the real eye opening was the posts content. Ranging from late night gibberish during uni assignment cram time, though some deep and meaningful but probably misinterpreted or misunderstood feelings, a fair amount of semi technical challenges (with some MacGyverisms) and too many one liners about film or tv. If I had to select a word to sum it up, it would be crap.

At times there were huge gaps between posts too, when interest just wasn’t there or was just simply too busy; For a long time the longest quiet stretch was Jun 2007 through Oct 2009 (28 months), but that’s now been blown away by the latest 55 month gap from Jul 2017 to Feb 2022. But rewinding from there makes it worse - Jul 2017 was a burst of 7 posts in a week following a 14 month gap from a single post in May 2016, which itself followed a 22 month gap. That sequence was Jul 2014 to Feb 2022 and included 8 posts in the 91 months (7 years 7 months). Hopefully I can improve on that posting average without a terrible drop off in quality. Outside of those times it was common for a post gap to be from a few weeks to 5-10 months. I’m not the posting every day type (and when I tried that in Apr 2001 the quality and subject matter died faster than, umm perhaps this download of q3test over dialup)

q3test download failing

A few years back I did a great purge and hid the least professional posts, which hopefully improved the average - what you’re seeing now is the bit above the water on this iceberg. There’s 30 visible posts today, and 246 that didn’t make the cut.

Part 4: How deep does this rabbit hole go?

To come up with this garbage I actually put a boat load of time into figuring out when various cutovers occurred. I even considered building a git repo showing the evolution over time from all of the backups. Similar to (and inspired by) the ancient unix history repo. I didn’t go quite that far (yet) but did manage to locate 29 snapshots of the site from Aug 2000 through to Feb 2005. From there through Feb 2011 the site was dynamic and needed the database dump also to be useful - of those I found only a single backup containing both wordpress and bblog dumps from 2009. This gap I can only attribute to me moving from haphazard and disorganised backups to more structured backups. But also between moving hosting providers.

For a static file based site, the disorganized random zip of a folder and stashed somewhere survived really well. The only snag was some of them lost dot files in the top level folder. These archives also preserved the file timestamps really well. However they were random and disorganised, so there was huge gaps in their dates and they were not located in a single well organised place (data management is hard). With these dumps I can also derive which upstream codebase was still in use and any custom modifications or bug fixes I might have done.

The database driven dynamic sites were hosted on a more professional platform, where I had cron’d up backup jobs shipping tarballs and sql dumps off box regularly, with basic retention rules. They were backup and not archive so not kept long term (but also because there were paying customers at the time the size was much bigger). When that hosting platform shutdown and I moved servers again those backups aged off. Eventually the final backups from there were dumped, possibly to save disk space and all that remains is a single snapshot from 2012 which still had the files from 2009 in it. So although disk space is a solved problem, it can be an expensive problem, and is no replacement for archives.

Looking forward perhaps I should pay closer attention to things like archive.org to ensure snapshots are saved semi-regularly, and that I keep a regular archive tarball in a sensible place. Since the static blog C# tool was in use, I can re-generate the site with any number of the posts, in any of the themes. I’ve carefully archived that tool, all the posts, it’s source repo and some notes into a sensible place just in case. The content itself was not kept in a repository though, so didn’t have version history attached (1 post per file sort of doesn’t need it.)

With hugo I’m also keeping the site content in a git repo, so as long as I can keep some kind of archive for it maybe in 10 years time there won’t be a gaping black hole of nothing in this decade. Oh but snafu, using hugo modules for the theme include means it depends on an external repo. An alternative there was git-submodule which has the same limitation. Another hope is with a more modern markup language the posts appearance can be better formatted for a more professional look, who knows, maybe I’ll post something worth saving one day.

No new mail alert

Or perhaps not. This did distract me from writing about all the things that have changed over the past few years. Oh well, there will be another time for that. 🤷‍♂️