Please join us from 2:00 to 3:00 pm ET, Wednesday, January 16 for a live discussion with Scott Klein. Submit your questions in advance by clicking 'make a comment' or emailing firstname.lastname@example.org. All questions are moderated, so please be polite.
Hi folks! We'll be kicking off at 2:00pm EST. If you have a question, just click 'Make a comment' above and send it our way. We'll be posting them for Scott to answer.
Scott has joined the chat and will begin answering questions momentarily. To submit yours, select ‘Make a comment’ or email email@example.com.
Hi everybody! I'm Scott Klein. I'm the Editor of News Applications at ProPublica (I've been promoted, actually, now I'm "Senior" Editor of News Applications, which probably means as much as you think it does).
Scott, our first question is from me and it builds off of the interview I linked earlier. Back in 2011, you said we were still in ‘the infancy of news apps.’ A few years later, how has the landscape changed?
While Scott’s typing, why not take a moment to send us your questions for him? Just click 'make a comment' above and send it our way.
Every year seems to bring new changes. This past year we really leveled up in how we use some of the more sophisticated techniques from computer science and math, like Natural Language Processing, Machine Learning, Decision Trees and some other "Big Data" analytics techniques.
There have also been big leaps in adoption of technologies like Google Fusion Tables, which let newsrooms without dedicated developers try out mapping and other kinds of sophisticated interactives.
Lots of new stuff is still coming out -- and I bet in 2015 we'll look back at what we did in 2012 as relatively rudimentary.
It's pretty much the same as on other desks. It's a mix of our own enterprise and great ideas brought in from around the newsroom. One difference (and it's pretty slight) is that we spend as much time as we can making sure the data to build our app is actually available and in good enough shape to build an app out of. I have to admit we need to get better at that part. We've gotten into projects thinking the data was readily available, but in actuality cleaning and fixing the data ended up taking months.
We're lucky at ProPublica to have few incumbent beats we've got to cover, which does leave us time to take up really ambitious stuff.
It's got what I like to see in a news application -- a way to see the "far" view (in other words, the big national picture) and the "near" view (how the big national phenomenon relates to me personally). It's a really hard story to get. The data is really dispersed. To be honest I'd pitched this to ProPublica and we've been gathering string on it. I'm totally envious, but the Times did a great, great job on it.
Sure. The quick upshot, btw, is that I founded DocumentCloud with Aron Pilhofer at the NY Times and my colleague Eric Umansky here at ProPublica. So: before ProPublica actually started publishing we did a barnstorming lunch tour of lots of newsrooms in NY and DC. We met with Aron Pilhofer, who was then just starting to publish projects out of his new INT team at the Times.
He told us about a project they were doing to help people search and display then-candidate Hillary Clinton's travel schedule (from when she was First Lady). We looked at the project and loved it and asked Aron if he thought he might give us the code, as we thought documents were going to be a big part of what we'd do and therefore a nice document reader was key.
Out of those conversations we realized that their document reader was more than just applicable to ProPublica and that everybody could use it to "level up" how they present documents, and further than that, we could use it as a "trojan horse" to get newsrooms to share their documents much more. The Knight Foundation very generously supported the idea and the rest is history.
There are! I'm actually no longer in charge of DocumentCloud -- we handed it off to IRE to give it a safe long-term home -- so for details you should reach out to IRE directly. Probably a good address to start with is firstname.lastname@example.org. Real humans check that, I know for a fact.
But public annotations are an active and ongoing effort...
It's a funny thing -- news organizations don't normally have to deal with the idea of durable objects. The typical news story is old news the next day. A news application can stay relevant almost indefinitely. We get more traffic on Dollars for Docs almost every day than on any other story. Our dialysis data, for instance, gets updated every year and our nursing homes data gets updated every month.
So to answer your question, there are essentially three states to the post-publishing lifecycle of an app. The first is an app that's hard to update but wildly popular or really such a great public service that we feel duty-bound to keep it up to date. We spend the time it takes to keep those up to date. The second is an app that's easy to update and still newsworthy, so we build ourselves easy tools to update them with a minimum of human effort. The third is an app that's not that newsworthy anymore, so we put up a note that says when we stopped updating it -- but we still have to be ready to make any data corrections that come in. For instance, our app about homes with Chinese Drywall still gets people writing in to say they've mitigated the problem and asking us to take their home off the list, which we try to be responsive to.
A brief detour: We apply all the normal rules of the newsroom to our work. So: We have real bylines that are right under the headline, and we correct on the same page just like other reporters here do. And yes, that means if we make a mistake that affects 10,000 pages, our correction runs on 10,000 pages. And it probably feels 10,000 times as crummy.
The CAR world has really been working on this problem of how to make sure you're right for decades. And the answer is not hard to predict: check your work. Ask people who are smarter than you to double-check your understanding of a given data set or technique. Then ask people who aren't you to spot check your numbers to make sure you don't have a bug in your code that is putting Wisconsin's results in Minnesota, etc. Jennifer LaFleur, our CAR director, taught us to take statistically valid samples of our data and check them all against the original sources, and to take subtotals and make sure they add up in both the source material and in our apps.
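The two checks Scott describes -- pull a statistically valid random sample to verify by hand against original sources, and compare subtotals against the source material -- can be sketched in a few lines of Python. This is a minimal illustration with made-up field names and figures, not ProPublica's actual tooling:

```python
import random

# Hypothetical cleaned data set (in practice, loaded from a CSV).
rows = [
    {"state": "WI", "amount": 120.0},
    {"state": "WI", "amount": 80.0},
    {"state": "MN", "amount": 50.0},
    {"state": "MN", "amount": 30.0},
]

# 1. Draw a random sample of rows to check by hand against the
#    original source documents. (A real project would size the
#    sample for statistical validity; k=2 is just for the demo.)
random.seed(42)  # reproducible sample for the example
sample = random.sample(rows, k=2)

# 2. Compute per-state subtotals and compare them to the totals
#    published in the source material.
subtotals = {}
for row in rows:
    subtotals[row["state"]] = subtotals.get(row["state"], 0.0) + row["amount"]

# Made-up "official" figures standing in for the source's own totals.
source_totals = {"WI": 200.0, "MN": 80.0}
assert subtotals == source_totals, "subtotals don't match the source!"
```

A bug like the Wisconsin-results-in-Minnesota mixup Scott mentions would make the subtotal assertion fail immediately, which is exactly the point of the check.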
A quick programming note: The window for questions is closing soon, so please submit yours if you haven't already.
Even if you're not going to write code that gets put in production, there are few beats that won't be affected by the influx of data. If you want to be self-reliant in covering your beat, it's a good bet that knowing how to code will help. It doesn't matter if that's Ruby or Python or something like R, an open source stats package. You want to be able to take a data set from a source and munge it yourself, and you want to be able to scrape a website. So, you should consider learning to code today -- start with a project you want to do and then take the basic tutorials for each language (start with Python and Ruby but there are others) and stick with the one that you like best and that matches your brain waves best. You could start by scraping a website (that will teach you a ton) or by playing with a big data set from data.gov. Look for stories! It's not about the code, it's about your story.
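The scraping exercise Scott suggests boils down to two steps: fetch a page, then parse structure out of its HTML. Here is a toy sketch of the parsing half using only Python's standard library, run on an inline HTML snippet instead of a live page (a real scraper would first fetch the page, e.g. with `urllib.request`; the table contents here are invented):

```python
from html.parser import HTMLParser

# A tiny stand-in for a page you might scrape.
HTML = """
<table>
  <tr><td>Wisconsin</td><td>42</td></tr>
  <tr><td>Minnesota</td><td>17</td></tr>
</table>
"""

class TableParser(HTMLParser):
    """Collect each <tr> as a list of its <td> cell texts."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows
        self.current = None   # cells of the row being parsed
        self.in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.current = []
        elif tag == "td":
            self.in_td = True

    def handle_endtag(self, tag):
        if tag == "tr" and self.current is not None:
            self.rows.append(self.current)
            self.current = None
        elif tag == "td":
            self.in_td = False

    def handle_data(self, data):
        if self.in_td:
            self.current.append(data.strip())

parser = TableParser()
parser.feed(HTML)
print(parser.rows)  # [['Wisconsin', '42'], ['Minnesota', '17']]
```

Once the rows are in plain Python lists, the munging Scott describes -- cleaning values, totaling columns, joining against another data set -- is ordinary code, which is where the stories start to surface.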