After diving through hard drives trying to recover old blog posts, I was left with a bit of a problem. Jekyll was putting my blog posts at lovely little URLs like
Those links are lovely: you don’t have to worry about URL collisions when you datestamp stuff. The problem is that many of the posts I had recovered were from different URLs. This post, under the previous incarnation of this site, would have probably lived at cheerskevin.com/fixing-broken-links.html. The version before that? cheerskevin.com/blog/fixing-broken-links/. If you tried to visit previously shared out posts, you’d get a lovely 404 page.
Jekyll does allow you to override where posts are placed, but I figured it was better to move forward, and get about setting up redirects.
The Nginx configuration for redirects is pretty darned easy. Here’s an example:
rewrite ^/no-place-like-home.html$ /2017/11/10/stuff.html permanent;
We specify a regular expression (in this case, we’re lazy and don’t even bother escaping the period, but it’s fine) for Nginx to match against. If it encounters a request that matches that pattern (/no-place-like-home.html), it’ll send a 301 (Moved Permanently) redirect to /2017/11/10/stuff.html. It sends a 301 instead of a 302 thanks to our permanent flag.
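If it helps to see the match-then-redirect idea outside of Nginx, here’s a minimal JavaScript sketch of it. This only imitates the behaviour for illustration: `rules` and `resolveRedirect` are names I made up, and real Nginx rewrite handling does a good deal more than this.

```javascript
// Hypothetical stand-in for a list of Nginx rewrite rules.
// The period is left unescaped on purpose, just like in the config line above.
const rules = [
  { pattern: /^\/no-place-like-home.html$/, target: "/2017/11/10/stuff.html" },
];

// Return the redirect the first matching rule would produce, or null if none match.
function resolveRedirect(path) {
  for (const { pattern, target } of rules) {
    if (pattern.test(path)) {
      return { status: 301, location: target };
    }
  }
  return null;
}

console.log(resolveRedirect("/no-place-like-home.html"));
// The unescaped "." matches any single character, so this path redirects too:
console.log(resolveRedirect("/no-place-like-homeXhtml"));
// Anything else falls through untouched:
console.log(resolveRedirect("/some-other-page.html"));
```

The second call is the cost of the lazy unescaped period: the pattern is a little more permissive than it looks, which is harmless here since nobody links to a path like that.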
So all we’ve gotta do is snag all the old URLs that were shared out, and add a rewrite rule to our Nginx config for each of ‘em.
Fetching the old blog posts
Most of my blog posts, I submitted to my subreddit. So in the interest of saving time, I simply pulled stuff from there. It’s possible I’ve missed a few, or that I may have shared something on Twitter with or without a trailing slash. Ah well. Close enough is better than nothing at all.
Thanks to the internet, and my keen sense of laziness, I stumbled across crisbeto/subreddit-downloader. Nice and simple tool. Pulled down the repository, and a quick npm install && npm subreddit had me off to the races. I told it to snag stuff from /r/cheerskevin, and got this lovely links.txt:
No Place Like Home: https://cheerskevin.com/no-place-like-home.html
Unburying the Lede: https://cheerskevin.com/unburying-the-lede.html
Lets Start Reacting: https://cheerskevin.com/lets-start-reacting.html
The HTTPS certificate on sudosongs.com expired about two months ago...: https://sudosongs.com/
I Love React: https://cheerskevin.com/i-love-react.html
What is this rubbish?: https://cheerskevin.com/what-is-this-rubbish
Need some help with KoS: https://www.reddit.com/r/CheersKevin/comments/5snbrv/need_some_help_with_kos/
I quickly went through and removed the links that weren’t blog posts, and then it was down to vim to delete everything up to (and including) the first : character, replace “https://cheerskevin.com” with “^”, append “$” to the end of each line, and prefix the rewrite keyword:
rewrite ^/no-place-like-home.html$
rewrite ^/unburying-the-lede.html$
rewrite ^/lets-start-reacting.html$
rewrite ^/i-love-react.html$
rewrite ^/what-is-this-rubbish$
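For anyone who’d rather script that step than do it in vim, here’s a rough JavaScript sketch of the same transformation. It’s not what I actually ran: `ORIGIN` and `toRewritePrefixes` are hypothetical names, and instead of cutting at the first : character it looks for the site’s origin in each line, which also takes care of dropping the non-blog links.

```javascript
// Assumed site origin; lines that point anywhere else are skipped.
const ORIGIN = "https://cheerskevin.com";

// Turn "Title: https://cheerskevin.com/path" lines into "rewrite ^/path$" prefixes.
function toRewritePrefixes(linksText) {
  return linksText
    .split("\n")
    .map((line) => {
      const start = line.indexOf(ORIGIN + "/");
      if (start === -1) return null; // not a link to a blog post
      const path = line.slice(start + ORIGIN.length).trim();
      return path === "/" ? null : `rewrite ^${path}$`;
    })
    .filter((rule) => rule !== null);
}

// A few lines in the same shape as links.txt.
const sampleLinks = [
  "No Place Like Home: https://cheerskevin.com/no-place-like-home.html",
  "The HTTPS certificate on sudosongs.com expired about two months ago...: https://sudosongs.com/",
  "What is this rubbish?: https://cheerskevin.com/what-is-this-rubbish",
].join("\n");

console.log(toRewritePrefixes(sampleLinks).join("\n"));
```

Searching for the origin rather than the first colon sidesteps titles that contain a colon of their own, which the vim approach would need a second look at.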
Now all we need to do is to find the new, correct URLs…
JS to the rescue
I initially thought we’d have to do some Bash magic to look at the publish date in our Markdown posts, but there’s a much easier solution. CheersKevin.com doesn’t have pagination right now. That means every post link is right there on our homepage. Pop open the web inspector, and we can run the following in the console:
Array.from(document.getElementsByClassName('post-link')).map(a => a.href).join("\n")
First, we grab every element on the page with a “post-link” class. Then, because a collection of DOM nodes isn’t actually an Array, we make it one with Array.from. We use map to pull out the href of each link, and finally join everything with a newline character to make it easy to copy and paste.
https://cheerskevin.com/2017/08/07/the-automation-fallacy.html
https://cheerskevin.com/2017/07/08/starting-over-again.html
https://cheerskevin.com/2017/04/10/no-place-like-home.html
https://cheerskevin.com/2017/04/08/unburying-the-lede.html
https://cheerskevin.com/2017/04/01/lets-start-reacting.html
Same old, same old vim to the rescue. Strip off the “https://cheerskevin.com”, add a “ permanent;” to the very end, and we’ve got nicely formatted second-halves to our redirect rules.
/2017/08/07/the-automation-fallacy.html permanent;
/2017/07/08/starting-over-again.html permanent;
/2017/04/10/no-place-like-home.html permanent;
/2017/04/08/unburying-the-lede.html permanent;
/2017/04/01/lets-start-reacting.html permanent;
Now all that was left to do was to pair the old rules with the new. Some posts didn’t have links, but most did. So it’s just a matter of merging those that overlapped. In the small example I used for the post, that results in these three rules:
rewrite ^/no-place-like-home.html$ /2017/04/10/no-place-like-home.html permanent;
rewrite ^/unburying-the-lede.html$ /2017/04/08/unburying-the-lede.html permanent;
rewrite ^/lets-start-reacting.html$ /2017/04/01/lets-start-reacting.html permanent;
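I did the pairing by eye, but the same merge could be sketched in JavaScript. This version leans on an assumption that happens to hold for the example posts: the slug (the last bit of the path, minus any .html) is the same in the old and new URLs. `slugOf` and `pairRedirects` are hypothetical helpers, not part of any tool mentioned here.

```javascript
// Pull the slug out of a path:
// "/2017/04/10/no-place-like-home.html" -> "no-place-like-home"
function slugOf(path) {
  const lastSegment = path.split("/").filter(Boolean).pop() || "";
  return lastSegment.replace(/\.html$/, "");
}

// Build one rewrite rule for every old path whose slug also shows up in a new path.
// Assumes slugs did not change between the old site and the new one.
function pairRedirects(oldPaths, newPaths) {
  const newBySlug = new Map(newPaths.map((p) => [slugOf(p), p]));
  return oldPaths
    .filter((oldPath) => newBySlug.has(slugOf(oldPath)))
    .map((oldPath) => `rewrite ^${oldPath}$ ${newBySlug.get(slugOf(oldPath))} permanent;`);
}

const oldPaths = [
  "/no-place-like-home.html",
  "/unburying-the-lede.html",
  "/lets-start-reacting.html",
  "/i-love-react.html",
  "/what-is-this-rubbish",
];

const newPaths = [
  "/2017/08/07/the-automation-fallacy.html",
  "/2017/07/08/starting-over-again.html",
  "/2017/04/10/no-place-like-home.html",
  "/2017/04/08/unburying-the-lede.html",
  "/2017/04/01/lets-start-reacting.html",
];

console.log(pairRedirects(oldPaths, newPaths).join("\n"));
```

Old paths with no matching new post are simply left out, which mirrors what happened by hand: only the overlapping posts got a rule.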
Vim: the unsung hero
Honestly, this sort of work is trivial, and I could have written out each of the ~50 redirect rules by hand in a pinch. But being able to use vim to quickly add stuff to the end of each line, or delete everything up to the third / character… it makes the process so nice and simple.
I can absolutely imagine myself reaching for a scripting language at some point in my past just to do stuff like this — reformat data into exactly the string that I want. But being able to work in an editor that makes those transformations intuitive (and easy to reverse if I screw it up) is a much better solution.
Anyway, the gist here is: the inbound links should all be working again (even if some of the content is a bit outdated).