The LiveJournal to WordPress migration

Thought I’d outline a bit of what I did to get all my posts and tags migrated from LiveJournal to WordPress 2.7.1. Note that this information will be obsolete soon enough: the latest WordPress trunk has much better LJ import support. It’ll even pull in your ‘Current Music’ and ‘Current Mood’ fields, which I couldn’t do. :-(

Some background first. LiveJournal only lets you export your blog posts one month at a time, and you can feed those files to the WordPress LiveJournal importer. I’ve been blogging there since December 2003, so going month by month was definitely not an option. Some digging around eventually brought me to ljdump. This is a really nifty tool, even if you just want to back up all your posts. It dumps your data into a large set of XML files, which you can collate with the convertdump.py script into a single file for the WordPress LiveJournal importer.
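To make "collating" concrete, here is a minimal Python sketch of the idea: merge many small XML documents under one root element. This is not convertdump.py's actual logic. The wrapper name `livejournal`, the `*.xml` glob, the output name `all-posts.xml`, and the assumption that each dump file's root element is one complete entry are all hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def collate(xml_texts, root_tag="livejournal"):
    """Merge several XML documents under one new root element.

    Assumption: each document's root element is one complete entry, and
    `root_tag` (hypothetical) is the wrapper name the importer expects.
    """
    merged = ET.Element(root_tag)
    for text in xml_texts:
        # fromstring accepts str or bytes and raises ET.ParseError on bad XML.
        merged.append(ET.fromstring(text))
    return merged


def main(dump_dir, out_file="all-posts.xml"):
    # Read the dump files in a stable (sorted) order before merging.
    paths = sorted(Path(dump_dir).glob("*.xml"))
    texts = [p.read_bytes() for p in paths]
    ET.ElementTree(collate(texts)).write(
        out_file, encoding="utf-8", xml_declaration=True
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Control characters such as the stray ASCII character 4 mentioned in the next paragraph are not allowed in XML 1.0, so `ET.fromstring` raises `ParseError` on files containing them; that is one reason to clean the dump before collating.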

There was one hiccup here: a lot of the XML files corresponding to my earlier posts (at least) had an extraneous ASCII character 4 (EOT) at the end of some lines. A simple sed loop stripped them out before I ran convertdump.py, and things were back on track (sed ftw!):

    for i in /*xml; do sed -i -e s:$'\004':: "$i"; done

I used the script to make one big XML file with all my posts, fed it to the LJ importer, and all my posts were in.
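A rough Python equivalent of that sed loop might look like the sketch below; it is handy where in-place `sed -i` behaves differently (BSD/macOS sed treats the argument after `-i` as a backup suffix). It is an illustration rather than what was actually run: the directory argument and the `*.xml` pattern are assumptions, and unlike the sed command, which drops only the first 0x04 on each line, it removes every occurrence.

```python
import sys
from pathlib import Path

EOT = b"\x04"  # ASCII character 4 (End of Transmission)


def strip_eot(data: bytes) -> bytes:
    """Remove every ASCII 4 byte (the sed loop removes only the first per line)."""
    return data.replace(EOT, b"")


def clean_dir(dump_dir: str, pattern: str = "*.xml") -> int:
    """Rewrite matching files in place; return how many files changed."""
    changed = 0
    for path in sorted(Path(dump_dir).glob(pattern)):
        original = path.read_bytes()
        cleaned = strip_eot(original)
        if cleaned != original:
            path.write_bytes(cleaned)
            changed += 1
    return changed


if __name__ == "__main__" and len(sys.argv) > 1:
    print(clean_dir(sys.argv[1]), "file(s) cleaned")
```

Working on raw bytes avoids having to guess the files' text encoding.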

But my tags, unfortunately, were not. ljdump happily pulls the tags from LiveJournal, but the importer just ignores them. I found a sort-of patch to fix this, but it seems quite antiquated. Based on it and on the WordPress importer (the importer that lets WordPress read another WordPress blog’s exported output), I wrote my own patch, against WordPress 2.7.1, to import LJ tags. To use it, just cd into your blog directory and run patch -p0 < wp-livejournal-import-tags.patch.
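The core job of such a patch is turning each entry's tag data into individual WordPress tags. The patch itself is PHP and is not reproduced here; the following is only a Python sketch of that parsing step. It assumes each entry carries its tags as one comma-separated string (LiveJournal's `taglist` property) inside a `<taglist>` element; that element name is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_taglist(raw):
    """Split a comma-separated tag string into clean, de-duplicated tags.

    Assumption: tags are separated by commas and may carry stray whitespace.
    Empty pieces are dropped and first-seen order is preserved.
    """
    tags = []
    for part in (raw or "").split(","):
        tag = part.strip()
        if tag and tag not in tags:
            tags.append(tag)
    return tags


def tags_for_entry(entry_xml):
    """Return the tags of one entry; the <taglist> element name is hypothetical."""
    entry = ET.fromstring(entry_xml)
    # findtext returns None when no <taglist> exists anywhere in the entry.
    return parse_taglist(entry.findtext(".//taglist"))
```

Entries without tags simply yield an empty list, so untagged posts pass through unchanged.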

That’s it. I deleted all the previously imported posts (doing that in one shot requires a plugin), imported the big XML file again, and voilà!

Trivial as the patch was, it was great to see how easy the WordPress code is to hack on. There’s more to come in the days ahead, and I hope it stays this easy. :D

Update: Just noticed that the imported comments are not threaded. This kind of blows, because there have been some really long threads on some posts. I guess I’ll wait till the new WordPress goes stable and do a re-import. (file under #suckage)
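Threading itself is conceptually simple: each comment records the ID of the comment it replies to (WordPress keeps this in each comment's `comment_parent` field), and the import only has to preserve that link. Here is a minimal Python sketch of rebuilding the nested threads from a flat list, assuming each comment is a dict with hypothetical `id` and `parent_id` keys, where `parent_id` is `None` for top-level comments.

```python
from collections import defaultdict


def build_threads(comments):
    """Group flat comments into nested threads.

    `comments` is a list of dicts with hypothetical keys "id" and
    "parent_id" (None for top-level). Returns the top-level comments,
    each with a "replies" list attached, recursively.
    """
    children = defaultdict(list)
    for c in comments:
        children[c["parent_id"]].append(c)

    def attach(node):
        node["replies"] = [attach(child) for child in children[node["id"]]]
        return node

    return [attach(c) for c in children[None]]


def render(threads, depth=0):
    """Flatten threads back into indented lines, two spaces per level."""
    lines = []
    for c in threads:
        lines.append("  " * depth + str(c["id"]))
        lines.extend(render(c["replies"], depth + 1))
    return lines
```

This sketch modifies the input dicts in place and silently drops replies whose parent is missing from the list; a real importer would need a policy for such orphans.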

5 Comments

  1. Guess I will just wait till the new WP importer is in place.

  2. Author of convertdump.py here:

    Just wanted to point out that I found a good procedure and set of patches to WP that enable threaded comments. I followed them on a test set of data and it seemed to work flawlessly. I haven’t tried them since I wrote convertdump.py (I still haven’t actually pulled the trigger on switching to WP from LJ yet), but I see no reason why they wouldn’t work.

    http://rusty-halo.com/wordpress/?p=2256
    http://rusty-halo.com/wordpress/?p=2535

    The second article specifically deals with threaded comments, but I found both articles useful.

    Also, be sure to note that there is no way for convertdump or ljdump to handle “lj-embed” tags. LJ’s XML-RPC protocol doesn’t provide a mechanism to request that these tags be expanded to include the embedded content.

    So if you ever posted any YouTube videos or the like to your blog, you should be sure to copy them over manually. I’m going to be adding a feature to convertdump which will flag any entries which contain lj-embed tags to make it easier for you to do this. I’ve also toyed with creating a third tool which will use something like the Mechanize library to crawl LJ’s site to get the content of the lj-embed tags that way. I think that would be more useful to users who have quite a bit of embedded content. It probably runs afoul of LJ’s policies, though. :)

    • Thanks for the pointer! Might be worth pulling in everything once again. :-)

      Fortunately, I didn’t post too much in the way of embedded content. This does blow on LJ’s part, though, and reinforces my decision to leave them. The screen scraper might be borderline with respect to LJ’s policy, but they can hardly blame us if they make exporting our own content so hard.

      I was looking at the new importer (http://svn.automattic.com/wordpress/trunk/wp-admin/import/livejournal.php) to see if they’re doing any magic for lj-embed, but it only seems to handle lj-cut and lj-user.

  3. I actually did the one-month-at-a-time migration. It doesn’t take that long; it’s just monotonous.

    • I guess the whole reason I put this much effort into it is that I was annoyed with LJ for making the process so hard! I’m quite sure it wouldn’t take much effort for them to support exporting your entire history. If it’s time-consuming and they’re worried about DoS attacks, they can always rate-limit the service.
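The lj-embed flagging feature the convertdump.py author mentions in the second comment could be as small as a scan of each entry's body. This is a hypothetical Python sketch, not convertdump.py's code: it assumes entries are available as `(itemid, body)` pairs and that the tags appear literally as `<lj-embed ...>` in the stored text.

```python
import re

# Matches an opening or self-closing <lj-embed ...> tag, case-insensitively.
# Closing </lj-embed> tags start with "</" and are deliberately not counted.
LJ_EMBED = re.compile(r"<lj-embed\b[^>]*>", re.IGNORECASE)


def flag_embeds(entries):
    """Return (itemid, count) for every entry whose body contains lj-embed tags.

    `entries` is an iterable of (itemid, body) pairs; this shape is an assumption.
    """
    flagged = []
    for itemid, body in entries:
        count = len(LJ_EMBED.findall(body or ""))
        if count:
            flagged.append((itemid, count))
    return flagged
```

The resulting list is a to-do list of posts whose embedded videos need to be copied over by hand.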
