Do you read more about politics or economics? What are the top websites you recently visited to read articles? How do you plan your casual reading?
My team (Katie Zhu, Basil Huang, and Amelia Kaufman) and I wondered about these questions and more at the start of Knight Lab's "Collaborative Innovation and Journalism in Technology" course in the spring of 2013. The initial idea we riffed on was managing personal news consumption. We got to pick our approach and tools and were told to go wild... and we built an app, which Knight Lab graciously maintains!
Slimformation is a tool that helps you track your reading habits, set goals, and see how well you meet them. There's a short video on Vimeo where I give a demo. Also, Katie (@ktzhu) wrote about this project for the Knight Lab blog and describes the motivations and details very well.
I thought I'd speak to the engineering challenges we faced.
Our weapons of choice: Chaplin, Brunch, Chrome's extension API, some cowboy CoffeeScript, a dash of Clojure (news article categorization! we rolled our own!), and not a small amount of grit.
The extension is configured through its manifest.json file. We have two HTML files of concern: the background page and the popup. The extension communicates with Chrome's extension API and newscat (our categorizer) in a process that runs in the context of the background page. Here's the background view and template, and the popup views and templates, for reference.
We have the familiar concepts of controllers, models, and routes at play. As always, the routes help uncover the overall structure of the app: just follow them as you would in a Rails-like app.
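To make the Rails comparison concrete, here is a minimal, stand-alone sketch of how a route table of "controller#action" strings resolves a URL to a controller action. It is not Chaplin's router; the paths and controller names (`home`, `goals`, `stats`) are hypothetical and chosen only for illustration.

```typescript
// Hypothetical route table in the "controller#action" style that Chaplin
// (like Rails) uses. Paths and controller names are illustrative only.
type Target = { controller: string; action: string };

const routes: Array<[string, string]> = [
  ["", "home#index"],
  ["goals", "goals#index"],
  ["stats/:category", "stats#show"],
];

// Match a URL path against a pattern; ":name" segments capture params.
function matchPath(pattern: string, path: string): Record<string, string> | null {
  const pSegs = pattern.split("/").filter((s) => s.length > 0);
  const uSegs = path.split("/").filter((s) => s.length > 0);
  if (pSegs.length !== uSegs.length) return null;
  const params: Record<string, string> = {};
  for (let i = 0; i < pSegs.length; i++) {
    if (pSegs[i].startsWith(":")) {
      params[pSegs[i].slice(1)] = decodeURIComponent(uSegs[i]);
    } else if (pSegs[i] !== uSegs[i]) {
      return null;
    }
  }
  return params;
}

// Resolve a path to the first matching controller action, in declaration order.
function resolve(path: string): { target: Target; params: Record<string, string> } | null {
  for (const [pattern, spec] of routes) {
    const params = matchPath(pattern, path);
    if (params !== null) {
      const [controller, action] = spec.split("#");
      return { target: { controller, action }, params };
    }
  }
  return null;
}
```

For example, `resolve("stats/Sports")` skips the first two patterns and yields the `stats#show` action with `{ category: "Sports" }` as its params, so reading the table top to bottom tells you which screens exist and which controller owns each one.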
A custom concept I introduced is services, building on existing patterns in Chaplin. Here are all the services we have. Look at the news categorization service, for instance: it describes the function it will call whenever add:PageVisit is published. The other services work the same way. Anywhere in the code you can publish and subscribe to events and run code in response. This was absolutely essential for keeping track of all our distributed resources and keeping things clean.
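Here is a minimal TypeScript sketch of that pattern: a mediator that services subscribe to, and a hypothetical categorization service that reacts to `add:PageVisit`. The `Mediator` class is a stand-in for the mediator Chaplin provides, not its actual implementation, and the `PageVisit` shape, the injected `categorize` function (standing in for a call to newscat), and the follow-up `categorized:PageVisit` event are all assumptions.

```typescript
// Minimal publish/subscribe mediator: a stand-in for Chaplin's mediator,
// not its real implementation.
type Handler = (...args: unknown[]) => void;

class Mediator {
  private channels = new Map<string, Handler[]>();

  subscribe(event: string, handler: Handler): void {
    const list = this.channels.get(event) ?? [];
    list.push(handler);
    this.channels.set(event, list);
  }

  unsubscribe(event: string, handler: Handler): void {
    const list = this.channels.get(event);
    if (!list) return;
    this.channels.set(event, list.filter((h) => h !== handler));
  }

  publish(event: string, ...args: unknown[]): void {
    // Iterate over a snapshot so subscriptions changed during dispatch
    // only take effect on the next publish.
    for (const handler of [...(this.channels.get(event) ?? [])]) {
      handler(...args);
    }
  }
}

// Hypothetical shape of a page visit record.
interface PageVisit {
  url: string;
  title: string;
  category?: string;
}

// Hypothetical service: categorize every page visit as soon as it is published.
class NewsCategorizationService {
  private mediator: Mediator;
  private categorize: (visit: PageVisit) => string;

  constructor(mediator: Mediator, categorize: (visit: PageVisit) => string) {
    this.mediator = mediator;
    this.categorize = categorize; // stand-in for a call to newscat
    mediator.subscribe("add:PageVisit", (visit) => this.onPageVisit(visit as PageVisit));
  }

  private onPageVisit(visit: PageVisit): void {
    visit.category = this.categorize(visit);
    // Announce the result so other services can react without a direct reference.
    this.mediator.publish("categorized:PageVisit", visit);
  }
}
```

Publishing `add:PageVisit` from anywhere in the app triggers the service, and anything listening for the follow-up event picks up the labelled visit, so no component needs a direct reference to another.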
- The standard choice for categorizing news articles was AlchemyAPI, but the free plan makes you jump through hoops and sacrifice a lot. Forget that noise! I built a news article categorizer for us in Clojure, using the OpenNLP Java library (the same one I used for Sentimental). It's right here. I used Reddit to train the categorizer, grabbing the top articles from /r/Business and so on for ["Politics", "Business", "Science", "Technology", "Entertainment", "Sports"]. And that's about it: a few easy-to-write functions and I was done, and I could host the whole thing on Heroku and call it a freaking day.
- The publisher/subscriber model in Chaplin is very nice.
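newscat used OpenNLP's document categorizer. To illustrate the train-then-categorize shape of such a tool, here is a sketch that swaps in a different, simpler technique: a multinomial naive Bayes classifier with add-one (Laplace) smoothing, written in TypeScript. The class and function names and the training examples are hypothetical; in the real project the labelled examples came from Reddit's top articles for each category.

```typescript
// Multinomial naive Bayes with add-one (Laplace) smoothing.
// A swapped-in technique for illustration; newscat itself used OpenNLP.
type Example = { label: string; text: string };

// Lowercase the text and split it into word tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
}

class NaiveBayesCategorizer {
  private docCounts = new Map<string, number>(); // documents per label
  private wordCounts = new Map<string, Map<string, number>>(); // word counts per label
  private totalWords = new Map<string, number>(); // total tokens per label
  private vocab = new Set<string>();
  private totalDocs = 0;

  train(examples: Example[]): void {
    for (const { label, text } of examples) {
      this.totalDocs++;
      this.docCounts.set(label, (this.docCounts.get(label) ?? 0) + 1);
      const counts = this.wordCounts.get(label) ?? new Map<string, number>();
      for (const word of tokenize(text)) {
        counts.set(word, (counts.get(word) ?? 0) + 1);
        this.totalWords.set(label, (this.totalWords.get(label) ?? 0) + 1);
        this.vocab.add(word);
      }
      this.wordCounts.set(label, counts);
    }
  }

  // Return the label with the highest log-posterior for the given text.
  categorize(text: string): string {
    let best = "";
    let bestScore = -Infinity;
    const v = this.vocab.size;
    for (const [label, docs] of this.docCounts) {
      const counts = this.wordCounts.get(label)!;
      const total = this.totalWords.get(label) ?? 0;
      let score = Math.log(docs / this.totalDocs); // log prior
      for (const word of tokenize(text)) {
        if (!this.vocab.has(word)) continue; // ignore words never seen in training
        score += Math.log(((counts.get(word) ?? 0) + 1) / (total + v));
      }
      if (score > bestScore) {
        bestScore = score;
        best = label;
      }
    }
    return best;
  }
}
```

Training on a handful of labelled headlines and calling `categorize` on new text returns the most probable label. With Reddit-sourced examples for "Politics", "Business", "Science", "Technology", "Entertainment", and "Sports", the interface would stay the same, only the training data grows.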
You can visit the full open source project here and Knight Lab's version of one of the repos here.