Lessons Learned: Natively Compiling Tidy HTML for Heroku

Recently I was working on a project and wanted to use Tidy to clean up some HTML output. I added the tidy_ffi gem to my project and voila, it worked! Or, to be more specific, it worked locally.

Once I pushed to Heroku, I started running into trouble: libtidy.so, the dynamically linkable native library that tidy_ffi depends upon, wasn't found. Uh oh.

Getting My Hands Dirty To Get Tidy

Before yesterday I knew about Heroku buildpacks in theory. I also knew that I really didn't want to have to use one to solve this problem. My first clue came in the form of a Stack Overflow post in which someone used Tidy on a Bamboo-stack application but was having trouble migrating it. Aha! Surely this will solve my problem.

So I rolled up my sleeves to do the kind of low-level work I usually try to avoid (while still trying to avoid as much of it as possible). I used heroku run bash to shell into a fresh Bamboo app that I created and then used SCP to copy out the libtidy.so file I found there. I added it to my repo, followed the instructions on the Stack Overflow post, and pushed. And the app came crashing down.

Sha Sha

As it turns out, since the post was authored, Bamboo and Cedar have diverged in their precise Linux installations. The versions are the same, but the git SHAs are different. C'est la vie. Now we turn to more complex solutions.

I knew that I would need to compile Tidy myself, but how? As it turns out, Heroku offers a tool called vulcan that allows you to create a build server in the cloud and compile binaries that are compatible with Heroku (because they're compiled on Heroku). After a few hiccups, I had my build server up and running, but now I needed to build from source!

Tidy is an old project. I mean, it's a really old project. It uses CVS as its versioning system. Unable to check out using CVS as per the project's instructions (I'm still not sure why this didn't work, but it probably has something to do with the fact that it was CVS), I instead downloaded a tarball from browsing the CVS repo on SourceForge.

Once I had the source, it was time to build it. Tidy doesn't have a typical structure for a buildable library, but after some experimentation I finally figured out the necessary incantations:

vulcan build -v \
  -s ~/Downloads/tidy \
  -p /tmp/tidy \
  -c "sh build/gnuauto/setup.sh && ./configure --prefix=/tmp/tidy --with-shared && make && make install"

Some notes about what's going on here:

  • -s ~/Downloads/tidy points to the directory into which I downloaded Tidy's source.
  • -p /tmp/tidy sets the prefix on the Heroku filesystem. Since Heroku apps can only write to /tmp, this needed to be inside /tmp.
  • -c "..." is the command that runs on the build server. I used the same prefix when configuring so that make install writes to the right directory. --with-shared gets Tidy to compile the .so files and not just .a files.
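
The easy thing to get wrong here is that the prefix appears twice: once for vulcan's -p flag and once in ./configure. As a minimal sketch, the same command can be assembled in Ruby from a single prefix value so the two never drift apart. The helper name vulcan_build_args and the /path/to/tidy placeholder are my own inventions for illustration, not part of vulcan; the helper only builds the argument list and runs nothing.

```ruby
require 'shellwords'

# Hypothetical helper (not part of vulcan): builds the argument list for the
# "vulcan build" command shown above from one shared prefix value.
def vulcan_build_args(source_dir:, prefix:)
  # The same prefix is handed to ./configure so "make install" writes into
  # the directory that vulcan packages up.
  build_steps = [
    'sh build/gnuauto/setup.sh',
    "./configure --prefix=#{Shellwords.escape(prefix)} --with-shared",
    'make',
    'make install'
  ].join(' && ')

  ['vulcan', 'build', '-v', '-s', source_dir, '-p', prefix, '-c', build_steps]
end

# '/path/to/tidy' is a placeholder for wherever the Tidy source was unpacked.
args = vulcan_build_args(source_dir: '/path/to/tidy', prefix: '/tmp/tidy')
puts args.last # the build command passed to -c
```

Assuming the vulcan CLI is installed, passing the list to system(*args) should run the same build as the command above without any extra shell quoting.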

Once the build finished, vulcan downloaded a tarball containing the files I needed. Woohoo! I pulled libtidy.so out of it, added it to my repo as lib/native/libtidy.so, and I was ready to rock and roll!

Getting Up and Running

Further experimentation and frustration ensued as I tried to get everything just right, but here's the Ruby code that finally got things working:

require 'tidy_ffi'
if ENV['RACK_ENV'] == 'production'
  TidyFFI.library_path = "/app/lib/native/libtidy.so"
  require 'tidy_ffi/interface'
  require 'tidy_ffi/lib_tidy'
end

Here we set the library path manually and throw in some extra requires that didn't autoload properly for some reason in production. After all that work my app was able to Tidy HTML like a champ!
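
One fragile spot in that snippet is the hard-coded path: if the file is missing, the failure only shows up when Tidy is first used. A minimal sketch of a guard, assuming the same /app/lib/native/libtidy.so layout, could look like the following. The helper name bundled_tidy_path and its app_root parameter are hypothetical and not part of tidy_ffi.

```ruby
# Hypothetical helper (not part of tidy_ffi): returns the path of the bundled
# libtidy.so only when running in production and the file actually exists.
# Returns nil otherwise, so the caller can fall back to the system library.
def bundled_tidy_path(env: ENV['RACK_ENV'], app_root: '/app')
  return nil unless env == 'production'

  path = File.join(app_root, 'lib', 'native', 'libtidy.so')
  File.file?(path) ? path : nil
end

path = bundled_tidy_path
puts(path ? "Using bundled Tidy at #{path}" : 'Using the system Tidy library')
```

With this in place, the condition in the snippet above could become if (path = bundled_tidy_path), followed by TidyFFI.library_path = path and the two extra requires.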

So Long, Comfort Zone

While I'm comfortable plumbing the depths of any Ruby application, I'm not actually well versed in solving problems like this. It was a chance to step outside my comfort zone and figure something out through trial, error, patience, and frustration. Knowing a little about how vulcan works will make me more confident the next time I need a native library that isn't available by default on Heroku.

If you want to use Tidy on Heroku, you don't have to go through quite the same ordeal that I did because you can download the Heroku-compatible libtidy.so file directly! Just add it to your repo, link it using the Ruby above, and have fun tidying up!