My personal website has pushed the development and removal of a decent amount of software.
Back in my very young youth, I used to hand-write HTML, copying and pasting a template and inserting content where needed. This has its obvious pitfalls, which I discovered early on when I had more than two or three web pages which needed common changes made to them. I subsequently discovered PHP and used it to include("header.php"); and so on. This solved all my problems until I realised all my pages had the same <title>. This led me to use my mad PHP skillz to write, for every page, a simple script which set the title and content file-name for that page, then included a common script. This suited me for a while on the small scale.
Closer to now
While it was functional and worked, it had its pitfalls. I found myself telling my main script that the content file for page x was content/x.html for pages foo, bar, baz and qux. An obvious pattern had emerged. Did I really have to keep telling the computer about this pattern?
It was about this time that I was starting to see a couple of websites dropping file extensions and turning everything into a "folder". I looked into it and ended up writing a bit of PHP which would take a look at the request URI, choose the appropriate HTML file to deliver, tack a header and footer onto it, and send it off. I also configured a web server to redirect all requests to this script. I wrote in some simple stuff to allow me to replace various macros like <!--GEN_TIME--> with meaningful values generated at run-time. After tidying it up, I named it Yarmukle and used it on my personal site to give it a flex. After it settled in and I fixed up the new issues, I actually used it on a production site. It was pretty lightweight and did the job. The only downside was that, for every page load, it had to generate the page on the fly from the header, content and footer. For anything but very high-volume sites, though, this is alright.
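The request-to-page flow described above can be sketched roughly as follows. This is an illustration in Python rather than the original PHP (which isn't shown here), and the names resolve_content and render, the .html suffix rule and the "index" fallback are all assumptions made for the example.

```python
import time
from pathlib import Path
from typing import Optional, Tuple


def resolve_content(request_uri: str, content_dir: Path) -> Optional[Path]:
    """Map a folder-style URI such as /foo/ onto content_dir/foo.html."""
    # Drop any query string and the surrounding slashes: "/foo/?x=1" -> "foo".
    slug = request_uri.split("?", 1)[0].strip("/") or "index"
    # Refuse empty, "." and ".." segments so a request cannot leave content_dir.
    if any(part in ("", ".", "..") for part in slug.split("/")):
        return None
    candidate = content_dir / (slug + ".html")
    return candidate if candidate.is_file() else None


def render(request_uri: str, content_dir: Path,
           header: str, footer: str) -> Tuple[int, str]:
    """Return (status, body): header + content + footer with macros expanded."""
    started = time.perf_counter()
    page = resolve_content(request_uri, content_dir)
    if page is None:
        return 404, "Not found"
    body = header + page.read_text(encoding="utf-8") + footer
    # Macros are plain HTML comments swapped for values computed at run-time.
    macros = {"<!--GEN_TIME-->": f"{time.perf_counter() - started:.6f}"}
    for macro, value in macros.items():
        body = body.replace(macro, value)
    return 200, body
```

Every request repeats the file read and the string assembly, which is the per-page-load cost mentioned above.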
I subsequently learned about Markdown and wrote a parser for it. It had its quirks, e.g. it only supported the [foo](bar) style of linking and inserting images, whereas true Markdown supports being lazy and telling the parser the src elsewhere in the file. At one point, I even had a parser tacked on which could parse my own LaTeX-inspired mathematics markup language into HTML+CSS. Upon reflection, it actually looked pretty sexy, but I really had no business implementing such a thing. I also added various non-standard aids on top of the parser which would help make my web pages more typographically correct, such as converting various quotes into left and right quotes. Since Markdown is a language taken from how people write on plain-text media, such as e-mails, I found myself wanting to take strings like "(C)" or "(TM)" and turn them into their correct symbols. I used to do this by typing the HTML entities directly, but that sucks for readability and plain-text-ness. Naturally, I wrote this in too.
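As a rough illustration of those typographic aids, here is a small Python sketch (not the original parser code). The function name typographise and the rule that a quote opens when it follows whitespace or an opening bracket are assumptions made for the example.

```python
import re

# Plain-text stand-ins and the symbols they become (copyright, trade mark).
SYMBOLS = {"(C)": "\u00a9", "(TM)": "\u2122"}


def typographise(text: str) -> str:
    """Swap plain-text stand-ins for symbols and straight quotes for curly ones."""
    for plain, symbol in SYMBOLS.items():
        text = text.replace(plain, symbol)
    # A double quote at the start, or after whitespace or an opening bracket,
    # is treated as an opening quote; any double quote left over is closing.
    text = re.sub(r'(^|[\s(\[{])"', "\\1\u201c", text)
    text = text.replace('"', "\u201d")
    # Same rule for single quotes, so apostrophes come out as closing quotes.
    text = re.sub(r"(^|[\s(\[{\u201c])'", "\\1\u2018", text)
    text = text.replace("'", "\u2019")
    return text
```

A pass like this is only safe on plain prose. Run over finished HTML, it would also rewrite the quotes around attribute values, which is one reason to do it inside the parser rather than afterwards.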
Now that my pages were slowing down (still well fast though), I decided that it might be a good idea to write a caching system into this growing monster. By this point, the "framework" (now named Yardle) was already using PHP's output buffering to capture the output (for the macros). So I wrote in some code which checked the cache dir first and, if it found no cached version of a page or the cached version had expired, re-generated it. I don't remember the specifics, but I saw a speed-up from something like 0.0001 of a second per page load to one one-thousandth of that in my cute little GEN_TIME macro thingy.
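The check-the-cache-first logic can be sketched like this, again in Python rather than the original PHP. The name cached_page, the flat key-to-filename scheme and the use of file modification time for expiry are assumptions for the example, not details from Yardle.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def cached_page(key: str, cache_dir: Path, max_age_seconds: float,
                generate: Callable[[], str],
                now: Optional[float] = None) -> str:
    """Return the cached HTML for key, regenerating it if missing or expired."""
    now = time.time() if now is None else now
    cache_file = cache_dir / (key + ".html")  # key is assumed to contain no slashes
    if cache_file.is_file():
        age = now - cache_file.stat().st_mtime
        if age <= max_age_seconds:
            # Fresh enough: serve the stored copy and skip generation entirely.
            return cache_file.read_text(encoding="utf-8")
    # Missing or expired: build the page and store it for the next request.
    html = generate()
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(html, encoding="utf-8")
    return html
```

The generate callback stands in for the expensive part (header, content, footer, macros, Markdown), so a cache hit costs only a stat and a file read.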
Reflecting on it now, that piece of software grew into a medium-sized monster… Is there an opposite to rose-tinted glasses?
Closer-er to now
Sometime in 2015, I slashed out most of Yardle and cut it back to a tiny PHP script which looked at the URL, checked whether it pointed to a meaningful file and, if so, dumped it to the client, otherwise dumping the custom 404 page. It also applied redirections for me, issuing an HTTP 301 redirection where appropriate. I knew that I could perform these redirects (faster, probably) in the web server configuration, but I favoured this method for two reasons: firstly, the order of rules doesn't matter, and secondly, it's portable between web servers. This changed in 2018 after some reconsideration.
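A minimal sketch of that cut-down behaviour, in Python rather than the original PHP. The function name respond, the exact-match redirect table and the index.html fallback for directories are assumptions. An exact-match table is one way the order of rules can stop mattering, though the original script may have worked differently.

```python
from pathlib import Path
from typing import Dict, Tuple

Response = Tuple[int, Dict[str, str], bytes]


def respond(path: str, doc_root: Path, redirects: Dict[str, str]) -> Response:
    """Serve a redirect, a file under doc_root, or a 404, in that order."""
    # Exact-match lookup: each old path has at most one entry, so rule order
    # cannot change the outcome.
    if path in redirects:
        return 301, {"Location": redirects[path]}, b""
    root = doc_root.resolve()
    target = (doc_root / path.lstrip("/")).resolve()
    if target.is_dir():
        target = target / "index.html"
    # Only serve files that really live under the document root.
    if root in target.parents and target.is_file():
        return 200, {}, target.read_bytes()
    return 404, {}, b"custom 404 page"
```

Because the table is plain data, the same mapping could be carried between web servers unchanged, which matches the portability argument above.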
This actual site
By 2015, I had grown up and matured as a programmer some more. Naturally, I leaned towards using Perl where I could.
Currently, I use Perl with Text::Markdown to do the heavy lifting. I also implemented a small handful of the useful bits of the extensions I added onto Markdown. For example, vanilla Markdown doesn't implement super- or sub-script. As an aside, this stuff is really cool for stuff like:
Sorry, I'm just playing now. I'll get back to the point.
So there's a Perl-driven Markdown→HTML stage which generates full, static HTML pages pulled from template HTML and Markdown content. There's a makefile wrapped around this script so that when I make, only the pages which need to be generated over again actually are. Then there's a git repository somewhere which I push to, and which runs make and all that by itself when I update it. This means that all I need to clone and work on is a copy of the site's assets (CSS, images and what have you), the Markdown content and the template HTML. This makes for the smoothest maintenance I've ever experienced.
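The "only regenerate what needs it" rule that make applies can be illustrated with a short Python sketch. The site's actual makefile isn't shown, so the directory layout, the .md and .html suffixes and the names needs_rebuild and stale_pages are all hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def needs_rebuild(target: Path, prerequisites: Iterable[Path]) -> bool:
    """make's rule: rebuild if the target is missing or older than any input."""
    if not target.exists():
        return True
    built = target.stat().st_mtime
    return any(p.stat().st_mtime > built for p in prerequisites)


def stale_pages(content_dir: Path, output_dir: Path,
                template: Path) -> List[Tuple[Path, Path]]:
    """List (source, target) pairs whose HTML is out of date."""
    stale = []
    for source in sorted(content_dir.rglob("*.md")):
        target = output_dir / source.relative_to(content_dir).with_suffix(".html")
        # Each page depends on its own Markdown source and the shared template.
        if needs_rebuild(target, [source, template]):
            stale.append((source, target))
    return stale
```

Editing one Markdown file marks one page stale, while touching the template marks every page stale, which is the behaviour you would want from the real build.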
I still use a similar concept to what I was trying to emulate initially with Yardle, where I get to have pretty URLs like /programming/pi-thagoras/ while a slightly less tidy structure (file extensions, etc.) works in the background. I had intended to rewrite the PHP of Yardle in Perl, but never thought it worthy of my time. As of 2018, what is worthy of my time is removing the dependency on any CGI whatsoever. Redirection-wise, I now have everything programmed in the web server config.
I still aim to switch to a smarter Markdown parser so that I can generate TOCs (tables of contents) and other neat stuff. It would be interesting to look at an extensible Markdown parser so that I could re-implement things like smart quotes and escape characters properly.