Summarize - A Ruby C Binding for Open Text Summarizer

By Sean Soper December 3, 2010 in ruby, tldr

Despite not signing up for a Rails Rumble team this year, I nevertheless followed the results closely. One project that caught my interest early was Jeremy McAnally's tldr.it. I have always been fascinated by machine parsing of human language, so the technology that powered it, Open Text Summarizer, was a real draw for me. Reading through its source code, I realized this would be a perfect opportunity to combine my resurrected C knowledge with Ruby.

There's a lot going on under the hood, but a quick peek shows that the library first loads a stemming dictionary for your language of choice. Parsing a document against the loaded stemming rules produces an OtsArticle, a struct defined by the library that tracks the document's statistics, such as term frequencies and word scores. The parsed result is then fed to a highlighter, which returns only a portion of the text according to a ratio passed in as an integer percentage from 0 to 100.
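To make that pipeline a little more concrete, here's a rough pure-Ruby sketch of ratio-based extractive summarization. It is not Open Text Summarizer's algorithm and it doesn't call the gem. The stop-word list, the crude suffix-stripping "stemmer", the scoring rule (a sentence scores the sum of its words' document-wide frequencies) and the names `naive_stem`, `content_words` and `naive_summarize` are all assumptions made for illustration.

```ruby
# A naive extractive summarizer, sketched for illustration only.
# NOT Open Text Summarizer's algorithm: the stop words, the suffix-stripping
# "stemmer" and the scoring rule are simplifying assumptions.

STOP_WORDS = %w[a an and the of to in is it that for on with as was].freeze

# Crude stand-in for a real stemming dictionary: drop one trailing "s".
def naive_stem(word)
  word.sub(/s\z/, "")
end

# Lowercase the words, drop stop words, then stem what remains.
def content_words(text)
  text.scan(/[A-Za-z']+/)
      .map(&:downcase)
      .reject { |w| STOP_WORDS.include?(w) }
      .map { |w| naive_stem(w) }
end

# ratio: integer percentage (0..100) of sentences to keep.
def naive_summarize(text, ratio)
  raise ArgumentError, "ratio must be between 0 and 100" unless (0..100).cover?(ratio)

  sentences = text.scan(/[^.!?]+[.!?]/).map(&:strip)

  # Term frequency across the whole document.
  freq = Hash.new(0)
  content_words(text).each { |w| freq[w] += 1 }

  # Score each sentence by summing its words' frequencies; remember its position.
  scored = sentences.each_with_index.map do |sentence, i|
    [content_words(sentence).sum { |w| freq[w] }, i]
  end

  # Keep the top-scoring sentences (ties go to the earlier one), then restore
  # the original order so the summary still reads naturally.
  keep = (sentences.size * ratio / 100.0).ceil
  chosen = scored.sort_by { |score, i| [-score, i] }.first(keep).map(&:last).sort
  chosen.map { |i| sentences[i] }.join(" ")
end

doc = "Ruby is fun. Ruby gems wrap C libraries. Cats sleep. " \
      "C libraries are fast and Ruby gems are handy."
puts naive_summarize(doc, 50)
# prints "Ruby gems wrap C libraries. C libraries are fast and Ruby gems are handy."
```

Putting the surviving sentences back in their original order is a deliberate choice: a summary reshuffled by score would be much harder to read.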

The source is on GitHub, and installation is a breeze provided you are on a POSIX-compliant system with glib-2.0 and libxml-2.0 installed and properly configured.

gem install summarize

For the sake of convenience, I've made the summarize method available as a public instance method on both String and File.
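Exposing one method on two unrelated classes is easy to picture in plain Ruby. The sketch below is hypothetical: the gem itself defines its methods in C, and the `NaiveSummarizable` module, the `ratio:` keyword and the keep-the-leading-sentences stand-in are assumptions that only show the shape of the idea, not the gem's actual API.

```ruby
# Hypothetical sketch: one shared module mixed into both String and File.
# The real gem implements summarize in C; the module name, the ratio: keyword
# and the "keep the leading sentences" stand-in are all assumptions.
module NaiveSummarizable
  # Stand-in for real summarization: keep the leading `ratio` percent of sentences.
  def summarize(ratio: 25)
    sentences = summarizable_text.scan(/[^.!?]+[.!?]/).map(&:strip)
    keep = (sentences.size * ratio / 100.0).ceil
    sentences.first(keep).join(" ")
  end
end

class String
  include NaiveSummarizable

  private

  # A String summarizes itself.
  def summarizable_text
    self
  end
end

class File
  include NaiveSummarizable

  private

  # A File summarizes the full contents at its path.
  def summarizable_text
    File.read(path)
  end
end

puts "One. Two. Three. Four.".summarize(ratio: 50) # prints "One. Two."
```

A shared module keeps both entry points in sync: a File simply hands its contents to the same code path a String already uses.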

Soon this gem will replace some of the more complex inner workings of tldr.it. Feel free to contact me with any feedback you might have.

Sean Soper

A Ruby enthusiast for over five years, Sean brings a decade of software development experience in industries ranging from defense to health care. In addition, he is a registered iPhone developer with several apps to his name. An avid learner, Sean is constantly seeking out interesting languages and solving computer science-related problems.

About Us

Intridea is based in Washington, D.C. Most of us live in the DC-MD-VA metro area, though we also have team members in California, Colorado, Kansas, Maine, Minnesota, Missouri, New Hampshire, New York, Pennsylvania, Wisconsin and Wyoming.

Interested in working with us, or have a question?
Feel free to contact us anytime.

© 2013 Intridea, Inc. All Rights Reserved.

Contact Us

DC Office
1020 16th Street NW
7th Floor
Washington, DC 20036
Phone
1-888-968-IDEA (4332)
Email
info@intridea.com
Fax
1-202-280-1472
Twitter
@intridea
