Fun with Ruby: Get All Nancy Drew on Chrome

I use the Chrome history tab when I forget about something I've looked up in the past. I initially assumed the data would be stored in a CSV or XML file, and figured I could do some string munging for kicks and giggles. To my delight, when I looked in the "Application Support" directory for Chrome, I found several data-rich sqlite databases ready for mining. With a few Ruby tricks, I found some cool data. All the code this article covers is available in the chrome_spy project.

With chrome_spy, you can answer questions like these:

  • what have I searched for recently?
  • which sites do I visit most frequently?
  • what urls do I type into the address bar frequently?
  • what have I downloaded and how big are the files?

And that's only scratching the surface. Let's dive straight into the technical details.

Since the data is stored in sqlite, I decided to use trusty old ActiveRecord to wrap around the tables:

    require 'sqlite3'
    require 'active_record'

    ActiveRecord::Base.establish_connection({
      :adapter => "sqlite3",
      :database => File.expand_path("~/Library/Application Support/Google/Chrome/Default/History")
    })
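
One practical wrinkle, as far as I can tell: Chrome keeps a lock on the History file while the browser is running, so queries against the live file can fail with a "database is locked" error. A minimal sketch of a workaround is to copy the file somewhere temporary and connect to the copy instead. The helper name snapshot_history is made up for this example and is not part of chrome_spy:

```ruby
require 'fileutils'
require 'tmpdir'

# Copy a (possibly locked) History database into a scratch directory and
# return the path of the copy. `snapshot_history` is a hypothetical helper.
def snapshot_history(source, dir = Dir.mktmpdir('chrome_spy'))
  copy = File.join(dir, File.basename(source))
  FileUtils.cp(source, copy)
  copy
end

# Usage (same path as the connection example):
#   db = snapshot_history(File.expand_path(
#     "~/Library/Application Support/Google/Chrome/Default/History"))
#   ActiveRecord::Base.establish_connection(:adapter => "sqlite3", :database => db)
```

The copy is a point-in-time snapshot, so anything browsed after it was taken won't show up until you take another one.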

I've been using Chrome for a while now, so there were multiple archived 'History' databases (e.g. History Index 2010-07), but the most recent database is named 'History'. Rather than inspecting the schema of each table by hand, I used ActiveRecord's SchemaDumper module, which generates a very readable schema dump:

    # dump writes the schema to STDOUT by default, so no puts is needed
    ActiveRecord::SchemaDumper.dump

From here, it's pretty straightforward to map the tables to ActiveRecord models. For each 'create_table' declaration, I declared a new model. For example, to define a model for urls:

    class Url < ActiveRecord::Base
    end

At this point, you should be able to query for Urls through the models:

    > Url.first
     => #<Url ...>
    > Url.where(:title => "Google").count
     => 21 
    > Url.first.last_visit_time
     => 12940646404770210 

Everything was looking peachy up until the last_visit_time. At first I thought it was an epoch timestamp, or a JS timestamp. After looking at a few other timestamps, I noticed that it's 17 digits long rather than the usual 10 digits. These 17-digit values count microseconds since January 1, 1601 (UTC) rather than seconds since 1970, so simply chopping off the extra digits gives a date that looks plausible but is a few weeks off. The frustrating part was that some fields used epoch timestamps, but other fields would use this 17-digit timestamp. I wrote a little helper module to clean up the typecast from these columns:

    module TimestampAccessors
      # Seconds between 1601-01-01 and 1970-01-01 (both UTC)
      WEBKIT_EPOCH_OFFSET = 11_644_473_600

      def timestamp_accessors(*attributes)
        attributes.each do |attr|
          name = attr.to_s

          # Some columns hold epoch seconds (about 10 digits); others hold
          # microseconds since 1601-01-01 (17 digits). Anything above 10**11
          # is treated as the 17-digit kind and converted to epoch seconds.
          define_method(name) {
            raw = read_attribute(name).to_i
            raw = raw / 1_000_000 - WEBKIT_EPOCH_OFFSET if raw > 10**11
            Time.at(raw)
          }

          # Note: this always writes epoch seconds, whatever the column used
          define_method(name+'=') { |t|
            write_attribute(name, t.to_i)
          }
        end
      end
    end
    ActiveRecord::Base.extend(TimestampAccessors)

Then for every table that has timestamp columns, I can declare them in the model. For example:

    class Url < ActiveRecord::Base
      timestamp_accessors :last_visit_time
    end

Let's retry that same query from before:

    > Url.first.last_visit_time
     => Thu Jan 27 16:00:04 -0800 2011

That's much better.
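
For queries that filter on time, the conversion also has to run in the other direction. The sketch below assumes the 17-digit columns hold microseconds since 1601-01-01 UTC, which is the convention Chromium uses for its history timestamps. The WebkitTime module and its method names are mine, not chrome_spy's:

```ruby
module WebkitTime
  # Seconds between 1601-01-01 and 1970-01-01 (both UTC).
  EPOCH_OFFSET = 11_644_473_600

  # Convert a 17-digit Chrome timestamp to a Ruby Time (whole seconds).
  def self.to_time(microseconds)
    Time.at(microseconds / 1_000_000 - EPOCH_OFFSET)
  end

  # Convert a Ruby Time to the 17-digit form, e.g. for a WHERE clause.
  def self.from_time(time)
    (time.to_i + EPOCH_OFFSET) * 1_000_000
  end
end

# Example: everything visited in the last seven days (assumes the Url model)
#   cutoff = WebkitTime.from_time(Time.now - 7 * 24 * 60 * 60)
#   Url.where('last_visit_time > ?', cutoff)
```

Comparing against the raw column keeps the filtering in sqlite instead of loading every row and converting it in Ruby.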

Some table names and column names don't follow Rails conventions, so there's a little extra work to specify some table names and associations. For example, the SegmentUsage model is backed by the 'segment_usage' table rather than 'segment_usages':

    class SegmentUsage < ActiveRecord::Base
      # set_table_name was removed in later ActiveRecord releases
      self.table_name = "segment_usage"
    end

Another example is when Visit uses 'url' as a foreign key to Url rather than 'url_id':

    class Visit < ActiveRecord::Base
      belongs_to :url, :foreign_key => 'url'
    end

With just these thin wrappers, we can easily come up with queries to answer the questions at the beginning of this article. To find the most recent searches, we can do:

    > ChromeSpy.recent_searches
     => ["intridea", "thegist", "dimsum icons", ...]

The definition for this method is just a simple ActiveRecord query:

    def recent_searches
      KeywordSearchTerm.includes('url').order('urls.last_visit_time desc').map(&:term)
    end

At this point, you should take a break and share your search history with a friend. Context-free search terms can be hilarious and embarrassing. A few that my girlfriend could not stop laughing at:

  • super meat boy
  • exercise mix tapes
  • huntsville bed intruder
  • factory girl
  • haml (she thought I repeatedly misspelled 'ham')

To answer the other questions at the beginning of the article:

Which sites do I visit most frequently?

    ChromeSpy.most_frequent_sites

What urls do I type into the address bar frequently?

    ChromeSpy.most_frequently_typed_addresses

What have I downloaded and how big are the files?

    ChromeSpy.recent_downloads
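
The bodies of these three methods aren't shown here. As a rough illustration only (not the chrome_spy implementation), the plain-Ruby sketch below shows the kind of ranking each question implies, run over hand-made rows instead of a real database. The visit_count and typed_count fields mirror columns in Chrome's urls table; the Row struct, the sample rows, and the human_size helper are all invented for the example:

```ruby
# Hand-made stand-ins for rows from Chrome's urls table (sample data only).
Row = Struct.new(:url, :visit_count, :typed_count)

ROWS = [
  Row.new('http://example.com/',     42, 3),
  Row.new('http://example.org/docs', 17, 9),
  Row.new('http://example.net/blog',  5, 0)
]

# Most visited first.
def most_frequent(rows, limit = 10)
  rows.sort_by { |r| -r.visit_count }.first(limit)
end

# Most often typed into the address bar first; never-typed rows are dropped.
def most_frequently_typed(rows, limit = 10)
  rows.select { |r| r.typed_count > 0 }
      .sort_by { |r| -r.typed_count }
      .first(limit)
end

# Render a byte count for a download listing, stepping up one unit per 1024.
def human_size(bytes)
  units = %w[B KB MB GB TB]
  size = bytes.to_f
  unit = units.shift
  while size >= 1024 && !units.empty?
    size /= 1024
    unit = units.shift
  end
  format('%.1f %s', size, unit)
end
```

Against the real database, the same ordering would more naturally be an ORDER BY in the query, in the style of recent_searches.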

While ChromeSpy may not be the most useful example, it shows how ActiveRecord can be applied outside of Rails. A similar thought process can be reused for other problems where quick data manipulation or reporting is needed. Whether those reports are useful or just plain silly is entirely up to you. Now go forth and find some funny recent searches you've done!