Jamie Lawrence

Moving from Wordpress to Middleman

I’ve been unhappy running Wordpress for a few years, particularly the effort required to keep the server, the Wordpress instances, plugins, and themes updated. Of course, you don’t have to do this, and most of the internet doesn’t, which is why we have botnets.

I’m perfectly capable of running my own servers. Or rather, I know enough to do it to an acceptable standard, and know more than enough to realise that unless running servers is my full-time focus, I should not be doing it. It’s boring, tedious, and requires attention. I have better things to do with my life.

I’m also not too happy with Wordpress itself: every theme looks the same and doing anything custom requires more knowledge of PHP and Wordpress than I want in my brain.

Moving to Static

I’ve built a few sites recently with middleman and I’ve really enjoyed it. I can dump html files in the source directory and they’re done. It’s all ruby so it’s familiar and hackable (for me). It uses a lot of Rails-esque helpers and conventions so there’s not too much of a mental shift. I also played around with Hugo for a few weeks, and liked it, but I still couldn’t wrap my head around all the different “things” (templates, themes, types, sections, content, taxonomies, archetypes, etc). It’s mind-boggling. It’s also written in Go and whilst I like Go, I don’t want to learn it.

There are some disadvantages to middleman, most notably that it’s pretty slow (a full build for my 900-post blog takes about 5 minutes). But even in the worst case, I know I’ll have a directory of fairly standard Markdown posts I can import elsewhere.

My process was… less than smooth and only funny in hindsight.

Export the Wordpress posts to markdown!

I installed the Jekyll Exporter since it’ll export Markdown files (even using the Wordpress markdown content if you’ve been using Jetpack’s markdown feature).

Except I needed PHP 5.5 so I upgraded PHP

Except that Ubuntu wouldn’t update PHP because of apt-related bullshit

Lots of twiddling.

Fuck this. Restart the server. Nope.

Not playing this game: let’s upgrade Ubuntu from 14.04 to 16.04!

Nope. Try again? No… and the install has hung. Yep, connection dropped this should be…

yeah, the server is now banjaxed.

Reboot in rescue mode. Mount the disk. Fire up a new DigitalOcean droplet. Rsync the files over to the new server. Fart around trying to get MySQL running again. More farting to get MySQL to really run instead of almost-running in an it-can-see-the-tables-but-not-the-contents-sort-of-way. Reconstruct the nginx configuration. Point a new subdomain at the server. Faff, faff, faff. Blog index loads. Posts don’t. Fuck it anyway, the admin panel works and that’s all I need: disable all the plugins, revert to the official TwentyFifteen theme.

Right, now I finally can install Jekyll Exporter.

Run the export from the command line and it seems to be… no, that didn’t work.

Install php-zip package. Re-run the export. Tons of warnings. Long pause.

And, finally, a zip file!

Create middleman site

I’m using my own template for this which is configured for hosting on Amazon S3 and served by Cloudfront

Copy in the markdown files and rename them to *.html.md
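If you’d rather script the rename than do it by hand, a minimal Ruby sketch might look like this. The function name and the directory in the usage comment are made up for illustration; they are not part of middleman or the exporter.

```ruby
require "fileutils"

# Rename every "*.md" file directly inside dir to "*.html.md".
# Files that already end in ".html.md" are left alone.
# Returns the list of new paths.
def rename_to_html_md(dir)
  Dir.glob(File.join(dir, "*.md")).sort.filter_map do |path|
    next if path.end_with?(".html.md")

    target = path.sub(/\.md\z/, ".html.md")
    FileUtils.mv(path, target)
    target
  end
end

# Usage (hypothetical directory name):
#   rename_to_html_md("source/posts")
```

Skipping files that already have the double extension means it is safe to run more than once.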

Rename the posts

Then began my little adventure in sed which is perfect for changing bits of text in tons of similar files.

Remove the id attribute

Middleman reacts badly to the id attributes in the markdown frontmatter. Let’s remove them since they’re superfluous

sed -i "" -Ee '/^id: [0-9]+$/ d' *

Remove non-ASCII characters from title

Middleman also chokes on non-ASCII characters in the titles since the title is used to create the output file names. My main culprit over the years was the em-dash. Let’s find all those characters and fix them manually

pcregrep --color='auto' -n "^title: .*[\x80-\xFF].*$" *
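If pcregrep isn’t installed, the same check can be sketched in a few lines of Ruby. This is only a rough equivalent: the function name is invented for this example, and it assumes the files are UTF-8 with the title on a single line starting with "title: ".

```ruby
# Return [line_number, line] pairs for every "title: " line
# that contains at least one non-ASCII character.
def non_ascii_titles(text)
  text.each_line.with_index(1).select do |line, _number|
    line.start_with?("title: ") && !line.ascii_only?
  end.map { |line, number| [number, line.chomp] }
end

# Usage: report offending titles for each file given on the command line.
ARGV.each do |path|
  non_ascii_titles(File.read(path, encoding: "UTF-8")).each do |number, line|
    puts "#{path}:#{number}:#{line}"
  end
end
```

Like the pcregrep command, this only finds the titles; the fixes are still manual.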

Rewrite all the image paths from the wordpress/wp-content/uploads folder to the images folder and use relative urls

sed -i "" "s?/wordpress/wp-content/uploads?/images?g" *
sed -i "" "s?http://jamie.ideasasylum.com/images?/images?g" *
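The order of those two substitutions matters: the first one turns the uploads path into /images, and only then does the second one have an absolute http://…/images URL to strip. Here is the same two-step rewrite as a small Ruby sketch (the function name is made up for this example):

```ruby
# Rewrite WordPress upload URLs to root-relative /images paths.
# Step 1 must run before step 2, because step 2 matches the
# output of step 1 for absolute URLs.
def rewrite_image_paths(text)
  text
    .gsub("/wordpress/wp-content/uploads", "/images")
    .gsub("http://jamie.ideasasylum.com/images", "/images")
end

puts rewrite_image_paths(
  "![cat](http://jamie.ideasasylum.com/wordpress/wp-content/uploads/2016/12/cat.jpg)"
)
# prints: ![cat](/images/2016/12/cat.jpg)
```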

The feature images were added to our markdown frontmatter as the image attribute. We can add the post images to the middleman template:

image_tag article.data.image, class: "img-responsive" if article.data.image

Convert wordpress caption shortcodes to simple markdown image links

sed -i "" -E 's?\[caption.+\]<a href="(.+)" .*><img(.*)/></a>(.+)\[/caption\]?\![\3](\1)?' *
sed -i "" -E 's?\[caption.*<a.*href="/([^ ]+)".*</a>(.*)\[/caption\]?\![\2](\1)?' *
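Those regexes are hard to read, so here is a worked example of the first conversion as a Ruby sketch. The pattern below is a stricter variant written for this illustration, not a character-for-character port of the sed expression, and the sample shortcode is invented.

```ruby
# Matches [caption ...]<a href="URL" ...><img ... /></a> Caption text[/caption]
# Capture 1 is the link target, capture 2 is the caption text.
CAPTION = %r{\[caption[^\]]*\]<a href="([^"]+)"[^>]*><img[^>]*/></a>\s*(.*?)\[/caption\]}

# Replace each caption shortcode with a plain markdown image link.
def convert_captions(text)
  text.gsub(CAPTION) { "![#{$2.strip}](#{$1})" }
end

sample = '[caption id="attachment_12" align="aligncenter" width="300"]' \
         '<a href="/images/2016/12/cat.jpg" rel="attachment">' \
         '<img src="/images/2016/12/cat-300x200.jpg" alt="cat" width="300" height="200" /></a>' \
         ' My cat[/caption]'

puts convert_captions(sample)
# prints: ![My cat](/images/2016/12/cat.jpg)
```

Note that the markdown image points at the full-size file from the link, not the resized thumbnail from the img tag, which is also what the sed version does.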

Fix the title attributes

Wrap title attributes in double quotes because single quotes break anything with an apostrophe in it

sed -i "" -E "s/title: '(.+)'/title: \"\1\"/" *

Use Wordpress categories as our tags

I have a total mess of tags and they’re pretty irrelevant these days. Let’s remove the tags list

sed -i "" '/tags:/,/---/ {
  /tags:/d
  /---/ !{
    /  -/d
  }
}' *

…and rename the categories list as “tags”

sed -i "" 's/categories:/tags:/' *
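The two sed steps above can also be sketched as one pass in Ruby, which may be easier to follow than the nested sed block. This is a line-based sketch with an invented function name; like the sed version, it assumes list items are written one per line as "- item" and it does not parse the YAML.

```ruby
# Drop the "tags:" list and rename "categories:" to "tags:".
# Works line by line: after a "tags:" line, every following
# "- item" line is dropped until some other line appears.
def categories_to_tags(post)
  in_old_tags = false
  post.each_line.filter_map do |line|
    if line.start_with?("tags:")
      in_old_tags = true
      next
    end
    next if in_old_tags && line.match?(/\A\s*-\s/)

    in_old_tags = false
    line.sub(/\Acategories:/, "tags:")
  end.join
end

sample = <<~POST
  ---
  title: Hello
  categories:
    - Ruby
  tags:
    - old
    - stale
  ---
  Body text
POST

puts categories_to_tags(sample)
```

For the sample post, the old tags list disappears and the single category "Ruby" ends up under "tags:".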

We could have created a custom collection in middleman but I found there are some problems paginating that collection and it’s not as functional as the built-in tags functionality.

Syntax highlighting

I have a fair bit of code in some posts so I need a way of rendering it. This did the job for me

gem "middleman-syntax"
gem "redcarpet"

Configure how we want to parse the markdown:

set :markdown_engine, :redcarpet
set :markdown, fenced_code_blocks: true, smartypants: true, autolink: true, footnotes: true

and then add a Pygments theme.

I’d gotten into the habit of just pasting in a link to a tweet and letting Wordpress render the rich tweet card. I reproduced this functionality in javascript instead of messing with markdown rendering.

Let’s deploy!

The first small problem:

https://twitter.com/ideasasylum/status/814510537509863424

but the results are pretty great!

https://twitter.com/ideasasylum/status/814581929160871936

Let’s talk about Cloudfront

Cloudfront is pretty good, if only because you can use it to get a free SSL cert. But it’s a royal pain in the arse if you want any real control over which files are being served, which are cached, etc. You can use asset hashing but you’ll still want to invalidate the html files so the latest versions are served.

I’ve been using middleman-cloudfront on other sites to selectively invalidate the Cloudfront distribution, but my blog is much bigger. It has ~2000 html files and Cloudfront limits the number of invalidation URLs to 1000 per request. Each invalidation request can take 10-15 minutes, and middleman-cloudfront blocks until each one completes. So deploying the blog takes about 30 minutes. Nope!
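To make the arithmetic concrete, here is a small Ruby sketch of the batching. It makes no real API calls and is not how middleman-cloudfront is implemented; the names are invented, and it assumes the requests run strictly one after another.

```ruby
# Limit taken from the paragraph above: 1000 URLs per invalidation request.
MAX_PATHS_PER_REQUEST = 1000

# Split the list of changed paths into request-sized batches.
def invalidation_batches(paths, limit = MAX_PATHS_PER_REQUEST)
  paths.each_slice(limit).to_a
end

# Total wait if each request blocks until the previous one completes.
def estimated_minutes(batch_count, minutes_per_request)
  batch_count * minutes_per_request
end

paths = (1..2000).map { |i| "/posts/#{i}.html" } # stand-in for ~2000 html files
batches = invalidation_batches(paths)

puts batches.size                        # prints: 2
puts estimated_minutes(batches.size, 10) # prints: 20
puts estimated_minutes(batches.size, 15) # prints: 30
```

Two sequential requests at 10-15 minutes each gives 20-30 minutes, which is where the half-hour deploy comes from.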

Oh, and did I mention that Cloudfront charges for each invalidation? Yeah, I spent $25 on invalidations just testing the blog deployment.

Cloudfront invalidations can really add up!

So let’s replace Cloudfront with another content-distribution network that a) serves files from s3, b) provides a free SSL cert, c) has an API to easily and quickly invalidate files. KeyCDN fitted the bill and the costs are on a per-use basis and quite reasonable.

And it doesn’t charge for invalidations!

A few hours of click, tap, tap, tap-tap, tapity-tap-tap and I had a working version of a middleman extension which invalidates KeyCDN zones (largely based on middleman-cloudfront).

Summary

I’ve skipped over a lot of the general middleman configuration, and how I created the theme (generic bootstrap), but this catalogues my little Christmas adventure.

Am I happy with the change? Well, this is my first post so we’ll see. I love the site performance from serving static files through a CDN. I love that I can hack the layout and rendering without having to delve into the ugly guts of Wordpress. I freaking love that I can ditch the “look after servers” item from my todo list.

I can see myself getting frustrated by the build time but I also think this could be drastically reduced with a slight modification of site structure and some code optimisation.