A new tool for online verification: Google's 'Search by Image'

Google have launched a 'Search by Image' service which allows you to find images by uploading, dragging over, or pasting the URL of an existing image.

The service should be particularly useful to journalists seeking to verify or debunk images they're not sure about.

(For examples where it may have been useful, look no further than this week's Gay Syrian Blogger story, as well as the 'dead' Osama Bin Laden images that so many news outlets fell for.)

TinEye, a website and Firefox plugin, does the same thing - but it will be interesting to see if Google's service is more or less powerful (let me know how you get on with it). Find it here:

http://www.google.com/insidesearch/searchbyimage.html

Video here: http://www.youtube.com/watch?v=t99BfDnBZcI

Secure technically doesn't mean secure legally

The EFF have an interesting investigation into WSJ and Al-Jazeera 'leaks' sites and terms and conditions which suggest users' anonymity is anything but protected: 

"Despite promising anonymity, security and confidentiality, AJTU can “share personally identifiable information in response to a law enforcement agency’s request, or where we believe it is necessary.” SafeHouse’s terms of service reserve the right “to disclose any information about you to law enforcement authorities” without notice, then goes even further, reserving the right to disclose information to any "requesting third party,” not only to comply with the law but also to “protect the property or rights of Dow Jones or any affiliated companies” or to "safeguard the interests of others.” As one commentator put it bluntly, this is “insanely broad.” Neither SafeHouse or AJTU bother telling users how they determine when they'll disclose information, or who's in charge of the decision."

How I hacked my journalism workflow (#jcarn)

I've been meaning to write a post for some time breaking down all the habits and hacks I've acquired over the years - so this month's Carnival of Journalism question on 'Hacking your journalism workflow' gave me the perfect nudge.

 

Picking those habits apart is akin to an act of archaeology. What might on the surface look very complicated is simply the accumulation of small acts over several years. Those acts range from the habits themselves to creating simple shortcuts and automated systems, and learning from experience. So that's how I've broken it down: 

 

1. Shortcuts

 

Shortcuts are such a basic part of my way of working that it's easy to forget they're there: bookmarks in the browser bar, for example. Or using the Chrome browser because its address bar also acts as a search bar for previous pages. 

 

I realise I use Twitter lists as a shortcut of sorts - to zoom in on particular groups of people I'm interested in at a particular time, such as experts in a particular area, or a group of people I'm working with. Likewise, I use folders in Google Reader to periodically check on a particular field - such as data journalism - or group - such as UK journalists.

 

Getting more specific, when it comes to data journalism tasks I rely on a whole range of tools and shortcuts for cleaning and interrogating datasets: the =TRANSPOSE formula, for example, will swap a spreadsheet's rows and columns; =VLOOKUP will copy across data from matching cells; and the free tool Google Refine will quickly identify similar entries (which may have been misspelled).
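For anyone more comfortable in code than in a spreadsheet, the three operations above can be sketched in Python. This is a rough illustration, not how the spreadsheet functions or Google Refine are actually implemented: the function names, the sample council data and the simplified 'fingerprint' matching are all assumptions made for the example. Note that this kind of matching catches differences in case, punctuation and word order; genuine misspellings would need fuzzier matching.

```python
import re
from collections import defaultdict


def transpose(rows):
    """Swap rows and columns, like =TRANSPOSE (assumes equal-length rows)."""
    return [list(column) for column in zip(*rows)]


def vlookup(key, table, col_index):
    """Exact-match lookup, like =VLOOKUP(key, table, col_index, FALSE).

    Returns the value in column col_index (1-based) of the first row
    whose first cell equals key, or None if there is no match.
    """
    for row in table:
        if row[0] == key:
            return row[col_index - 1]
    return None


def fingerprint(value):
    """Normalise a string: lowercase, drop punctuation, sort unique words."""
    words = re.sub(r"[^\w\s]", "", value.lower()).split()
    return " ".join(sorted(set(words)))


def cluster_similar(values):
    """Group entries that share a fingerprint (simplified key matching)."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical sample data, invented for illustration
SPENDING = [["council", "spend"], ["Birmingham", 120], ["Leeds", 95]]
NAMES = [
    "Birmingham City Council",
    "birmingham city council.",
    "Council, Birmingham City",
    "Leeds City Council",
]

if __name__ == "__main__":
    print(transpose(SPENDING))
    print(vlookup("Leeds", SPENDING, 2))
    print(cluster_similar(NAMES))
```

The clustering step only suggests candidates: you would still check each group by eye before merging entries.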

 

On my desktop I rely on plugins for Firefox and Chrome such as Firebug (check a page's HTML), OutWit Hub (scrape a page), TinEye (check if an image has been used elsewhere), ErrorZilla (check for cached and older versions of a webpage), and Easy YouTube Downloader (download YouTube videos). Links to these and other useful plugins can be found at http://delicious.com/paulb/firefox 

 

But the most frequently used shortcuts are the bookmarklets that are installed on my mobile phone browser - 'Read Later' (Instapaper); 'Bookmark on Delicious'; 'Tweet with Echofon'; 'save on Springpad' or Evernote; and 'Blog on Tumblr'. These are made even more powerful through automation.

 

2. Automation

 

RSS can be a hugely useful technology when it comes to saving time and automating processes - and Delicious is the king of useful RSS feeds in this respect.

 

If I want to tweet a useful link as well as bookmark it, for example, I simply add the tag 't' - the RSS feed for which is automatically tweeted to my account by Twitterfeed. If I want to tweet it using the @helpmeinvestig8 account I add the tag 'hmitwt'. Webpages which I think might be useful to students on the MA in Television and Interactive Content I tag 'tvi' - this not only sends them to the @bcumedia_matvic account but also to an email newsletter that students receive (I use Feedburner for this). If I wanted to I could set up a Tumblr blog to automatically pull items from the RSS feed for a particular tag, too. And all of this is triggered by one click, and one tag.
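The one-tag, many-destinations idea can be sketched in Python as follows. It only illustrates the routing logic, not how Delicious, Twitterfeed or Feedburner work internally. The sample feed, the example.com links, the destination labels and the assumption that tags arrive as `<category>` elements in the RSS are all invented for the example; only the tags 't', 'hmitwt' and 'tvi' come from the setup described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical bookmarks feed: each <category> is assumed to be one tag
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>bookmarks</title>
<item><title>Data viz guide</title><link>http://example.com/a</link>
<category>t</category><category>dataviz</category></item>
<item><title>TV formats report</title><link>http://example.com/b</link>
<category>tvi</category></item>
<item><title>FOI tips</title><link>http://example.com/c</link>
<category>hmitwt</category><category>t</category></item>
</channel></rss>"""

# Tag -> destinations. The destination labels are made up for this sketch.
ROUTES = {
    "t": ["twitter:main"],
    "hmitwt": ["twitter:@helpmeinvestig8"],
    "tvi": ["twitter:@bcumedia_matvic", "newsletter:tvi"],
}


def route_bookmarks(feed_xml, routes):
    """Return {destination: [links]} for every tagged item in the feed."""
    queue = {}
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        link = item.findtext("link")
        tags = {category.text for category in item.findall("category")}
        for tag in sorted(tags):
            # Tags with no route (e.g. 'dataviz') are simply ignored
            for destination in routes.get(tag, []):
                queue.setdefault(destination, []).append(link)
    return queue


if __name__ == "__main__":
    for destination, links in route_bookmarks(SAMPLE_FEED, ROUTES).items():
        print(destination, links)
```

Adding a new outlet is then just a matter of adding a tag to the routing table, which is what makes the single click so powerful.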

 

The process works the other way: Packrati.us will bookmark any link you tweet in your Delicious account. And Trunk.ly automatically archives both your Delicious bookmarks and tweeted links, providing a backup search engine.

 

IFTTT (IF This Then That) is a new service which promises some amazing possibilities for automating processes between (currently) 32 different services, including Delicious, Google Reader, stock performances, times and dates, emails, phone calls and any RSS feed. I've been using it to bookmark anything I share on Google Reader, but I'm on the lookout for other uses.

 

For other tasks the Firefox plugin iMacros can automate web-based actions so you don't have to repeat them, while Automator on the Mac will do the same for computer-based actions. For links to these and IFTTT see http://www.delicious.com/paulb/automation+tools

 

3. Habits

 

For all the above it is ultimately up to you to set balls in motion, and here I think establishing habits is key. In particular, bookmarking is one habit that I find saves me more time than anything else. 

 

Every morning I check my RSS feeds and bookmark items I think may be useful in future. Bookmarking and tagging them builds a resource that I can look to whenever I need to solve a problem, help someone, or write something quickly. So if I decide to write something on data visualisation, I already have an archive of pre-filtered material to refer to. If I need data on health, I already have several health datasets that I've bookmarked and tagged. And if I have a Yahoo! Pipes-related problem, I can check my bookmarks first.

 

Delicious is the main place that I do this - but it's no longer the only one. My Tumblr blog is essentially a place where I bookmark multimedia and quotes - so if I need some multimedia or a choice quote, that's where I look first.

 

And blogging itself is a great habit to have: it makes me remember things better, provides a space where I can re-find them, and helps me (or others) identify gaps.

 

4. Discipline

 

The final journalism hack is my most recent one - and I think something that more and more online journalists are learning too as they hit information fatigue. It's self-discipline. 

 

With so many sources of information, so many things to tweet, blog and bookmark, it's easy to lose a morning in following links, tweets and feeds, and replying to emails. Having a clear idea of what you need to achieve on a particular day, and sometimes switching off other signals in order to complete it, is a hard skill to build - but an important one. 

 

And so I try to only check email three times per day (start, midday and end). At the end of the day emails that require more time to respond go into my 'Starred items', and I check those and respond if I can first thing the next day. 

 

I set limits on the time I spend checking RSS feeds, and on the number of blog posts I write.

 

I email longer webpages, reports and documents to my Kindle address to be read when I'm travelling.

 

I use the Springpad app to create 'To Do' items that I schedule for future days, taking them out of my head so I can focus on the here and now. And at the start of every day I go through these so that nothing is missed. 

 

Then, I make time to switch off, to remove the phone from my hand, the laptop from my desk (it is set to switch itself off at a particular time every night), and sleep.

FAQ: Mobile Reporting

Another FAQ:

What good examples of mobile reporting have you seen?

It's hard to say because the fact that it's mobile is not always very visible - but @documentally's work is always interesting. The Telegraph's use of Twitter and Audioboo during its coverage of the royal wedding was well planned, and Paul Lewis at the Guardian uses mobile technology well during his coverage of protests and other events. Generally the reporting of these events - in the UK and in the Arab Spring stories - includes lots of good examples.

Could it become a genuine niche in journalism or just offer an alternative?

Neither really - I just think it's a tool of the job that's particularly useful when you're covering a moving event where you don't have time or resources to drive a big truck there.

Do you think more newspapers and print outlets will embrace the possibilities to use mobile technology to "broadcast"?

Very much so - especially as 3G and wifi coverage expands, mobile phones become more powerful, the distribution infrastructure improves (Twitter etc.) and more journalists see how it can be done.
But broadcast is the wrong word when you're publishing from a situation where a thousand others are doing the same. It needs to be plugged into that.

Do you think the competition that mobile reporting could offer could ever seriously rival traditional broadcast technology?

It already is. The story almost always takes priority over production considerations. We've seen that time and again from the July 7 bombing images to the Arab Spring footage. We'll settle for poor production values as long as we get the story - but we won't settle for a poor story, however beautifully produced.

Have you seen any good examples of how media orgs are encouraging their staff to adopt mobile reporting techniques?

Trinity Mirror bought a truckload of N97s and N98s and laptops for its reporters a couple years back, and encouraged them to go out, and various news organisations are giving reporters iPhones and similar kit - but that's just kit. Trinity Mirror also invested in training, which is useful, and you can see journalists are able to use the kit well when they need to - but as long as the time and staffing pressures remain, few journalists will have the time to get out of the office.

What are the main limitations that are holding back this sector - are they technological, training related or all in the mind?

Time and staff, and the cultural habits of working to print and broadcast deadlines rather than reporting live from the scene.

What advice would you give to individual journalists thinking of embracing the opportunities mobile reporting offers?

Start simple - Twitter is a good way to get started, from simple text alerts to tweeting images, audio and video. Once you're comfortable with tweeting from a phone, find easy ways to share images, then find a video app like Twitcaster and an audio app like Audioboo. Then it all comes down to being able to spot opportunities on the move.

[cached] Newsgathering IS production IS distribution (Model for a 21st century newsroom pt.1 cont.)

How news is produced in a print- or broadcast-only news operation

[Originally published Feb 9 2009]

Above is an image representing how journalism has traditionally been done:

  1. You went and gathered your information
  2. You put it all together in an attractive package: the article, the broadcast package
  3. And someone else took that to the readers or viewers

That linear process is pretty much redundant online.

See the diagram below. I’ve found myself drawing this so often recently that I thought I should put it online and save some ink.

Newsgathering, production and distribution are often the same thing in an online environment

The point is clear. Thanks to networked technologies – and RSS in particular – there is no reason why newsgathering cannot also be news production, or news distribution. For example:

  • You bookmark something on Delicious (newsgathering). That is published on Delicious, your blog, Twitter, and/or your news website (see Jemima Kiss’s PDA Newsbucket), and distributed via RSS which can be embedded anywhere.
  • You ask a question on Twitter (newsgathering). That is published on Twitter, and distributed via RSS – perhaps as a widget on your blog or Facebook.
  • You film some raw material on your mobile phone using Qik. It’s published on Qik, with an update posted to Twitter too. The video feed is embedded on your blog or news site, and once again RSS distributes it anywhere you or someone else wants.

I could go on, but here are the implications:

1) A web-savvy journalist or news operation will seek to make as much of their activity visible in this way as possible, adding value to what they do and providing numerous access points for users. It’s for this reason I’m a massive fan of social bookmarking (it also makes it very easy to find things you read previously).

2) Journalism is becoming less polished, more iterative and more networked. Broadcast and print do the ‘finished version’ pretty well – online, we’re often happy with raw information, with the emphasis on ‘raw’.

3) As I’ve said before, the journalist (along with their readers) is now the distributor. You cannot leave that job to someone else. The more active, visible and social you are online, the better for your work both commercially and editorially.

Any thoughts? More examples?

[cached] Model for the 21st century newsroom pt.6: new journalists for new information flows

new journalists for new information

Information is changing. The news industry was born in a time of information scarcity – and any understanding of the laws of supply and demand will tell you that that made information valuable.

But the past 30 years have seen the erosion of that scarcity. Not only have the barriers to publishing, broadcast and distribution been lowered by desktop publishing, satellite and digital technologies, and the web – but a booming PR industry has grown up to provide these news organisations with ‘cheap’ news.

Information is changing. Increasingly, we are not seeking information out – instead, it finds us. The scarcity is not in information, but in our time to wade through it, make meaning of it, and act on it.

Information is changing, and so journalists must too. In the previous parts of this series I’ve looked at how the news process could change in a multiplatform environment; how to involve the former audience; what can now happen after a story is published; journalists and readers as distributors; and new media business models. In this part I want to look at personnel – and how we might move from a generic hierarchy of ‘reporters’, ‘subs’ and ‘editors’ to a more horizontal structure of roles based on information types.

Quality versus quantity

The strategy of many news organisations so far has been to simply require existing journalists and editors to do more – to make videos and podcasts, take photos and write blogs; to scour social networks and forums and video sites; to encourage user generated content and audience participation. Some have created new positions for community editors, Flash developers and even ‘Data Delivery Editors’, but those positions are still relatively rare – and the skillsets to do those jobs, even rarer.

I’ve identified 6 journalist roles based on 3 core types of information that I see journalists dealing with in a networked environment. Perhaps you can suggest other roles – or other types of information: This is by no means a complete list.

The 3 types of information:

  • Feeds (RSS) – not just from news sites and blogs, but anywhere. This post on Passive Aggressive Newsgathering has more.
  • Social networks – online and offline. You might have called them ‘contacts’ before, but the online element puts things on a different scale and footing. And here’s why: contacts should now be as likely to seek you out, as vice versa.
  • Databases – publicly available, accessed through processes such as Freedom of Information requests, and built in-house.

The 6 new journalist roles:

The Aggregator-Sub

In the traditional newsroom, the sub sat between the journalist’s content and the reader. In the 21st century newsroom, this is inverted. In a world of information overload, those subbing skills take on a new role to collect feeds together (aggregating), identify the useful and relevant stuff (filtering), publish it (bookmark-blogging), identify legal issues and verify where necessary.

In other words, what many bloggers have been doing for years in providing a ‘pre-filtered web’ by highlighting the good stuff in their RSS feeds – and for this reason, the Aggregator-Sub may be an existing blogger employed part time or paid a syndication fee (presumably with some training in areas of concern such as law and house style).

The Aggregator-Sub could also perform an important role in the newsroom, highlighting useful leads for other journalists to pursue, or building widgets that present selected aggregations of feeds. A good example is Jemima Kiss’s Newsbucket.

The Mobile Journalist (MoJo)

As news organisations cut budgets and focused on efficiencies, reporters found it harder and harder to justify time outside the office, becoming increasingly reliant on public relations and official sources in their pursuit of regular, reliable copy.

Ironically, one of the most positive developments of networked technologies is to enable journalists to leave the office while still being connected via mobile phone and 3G/wifi-enabled laptop.

The MoJo, then, is permanently ‘on the road’, Twittering as they go, streaming live video from their phone and posting raw audio from the field. They have a brief to dig out the people and stories that are offline – and give them an online presence. Reuters have experimented with this, as have Gannett, and Trinity Mirror are investing in N96s and wifi laptops for their Midlands reporters. As Chuck Myron says:

“It’s a smarter way of doing business. I’m in the field where stories are happening instead of sitting at my desk, waiting for a phone to ring. I don’t miss important calls, either, since I’ve got a cell phone that’s always in my pocket and not ringing away at my desk while I’m out of earshot at the copier. Technology has made people more mobile, and journalism has to react.”

The Data Miner

The investigative journalist of the 21st century is someone who can work with databases and spreadsheets, picking out interesting patterns, pushing the powerful for data, and having an understanding of the vagaries of statistics. Adrian Holovaty’s ChicagoCrime.org is the godfather of the form, while the New York Times recently launched its own Visualisation Lab. More recent examples include Stephen Grey, Heather Brooke, Louise Acford, and Dominic Casciani.

For an idea of the job spec, here is what the Chicago Tribune was asking of applicants, and here is what the Roanoke Times expected the person to do. For examples of database journalism in action, see my Delicious bookmarks on the topic.

The Multimedia Producer

For all the quality versus quantity arguments, there is nothing inherently wrong with some journalists becoming jacks of all trades (after all, that’s what they have had to be editorially). An understanding of how a story or issue can be explored on a range of media makes a significant difference in how you come up with story ideas and gather information.

The Multimedia Producer has this understanding, and most likely technical skills across audio, video and image production, blogging, using databases, mapping and mashups. They may not do all the work themselves – for example, working with Flash developers on database-driven interactives, or asking a MoJo to get a particular piece of video – but they can see the possibilities.

Here’s a job description from the Roanoke Times (again); another at The Day; and here’s an interview with Regina McCombs of the Star Tribune about her Multimedia Producer role.

The Networked Specialist

This is the specialist reporter for the 21st century: now it’s not just about knowing their subject area, and the big names, but also being visibly networked in that environment, blogging, vlogging, bookmarking and commenting across their specialist parts of the blogosphere.

The successful blogs – Mashable, TechCrunch, Daily Kos, Boing Boing, TPM – are past masters at this: not just reporting on what’s happening, but engaging, passing on, and acting as a crossroads of traffic.

Community Editor

I said earlier that the online element puts community contacts on a different scale and footing. Sources become collaborators, co-writers and distributors, and the Community Editor’s role is to manage that, building communities, helping start or fuel conversations, preventing them turning nasty, supporting users, inviting guidance and help, and assisting them in certain projects.

There are plenty of journalists performing a community editor role, including Shane Richmond at the Telegraph, Joanna Geary at the Birmingham Post and Mail and Andrew Rogers, head of UGC at Reed Business Information. I’ve been conducting a series of interviews asking community editors for their top three lessons.

The obligatory conceptual diagram

new journalists for new information

As you can see, the different roles relate to expertise in different types of information. Databases are used particularly by the Data Miner and the Multimedia Producer; feeds by all except the Data Miner (it’s not essential to what they do but could be fed into it, for example a Google Spreadsheet has an RSS feed); and social networks are important in the work of the Community Editor, Networked Specialist and MoJo.

But as always, this is a work in progress. What unusual jobs have you come across as news orgs move to new media? How is information changing, and how does that affect journalists’ roles?

News distribution in a new media world (A model for the 21st century newsroom pt4) [cached]

The fourth post of the Model for the 21st Century Newsroom looks at how distribution is changing from a push/pull model to a tripartite, push-pull-pass, one.

In the 20th century, commercial distribution of news was relatively straightforward: if you worked in print, you published a newspaper or magazine at a particular time, it was transported to outlets, and people picked it up (or it was delivered). If you worked in broadcast, you broadcast it at a particular time, and people watched or listened.

Simple.

In the 21st century, the picture is a little more complicated.

It’s widely recognised that we are all journalists now, and anyone can be a publisher. But less widely publicised is the fact that, at the same time, and for the same reasons, everyone is a paperboy now.

Perhaps that’s because it’s not quite so glamorous.

Nevertheless, it’s a crucial factor in news production to consider. Whereas traditional news publishing and Fordist production processes separated journalism (newsgathering, newswriting, editing), publishing (printing) and distribution (transportation, postage, broadcasting), those areas are blurred in a new media world because we can perform all three functions with the same action. As an online journalist I can gather and write up information, publish it and distribute it, sometimes with a single click.

This creates two problems:

  • Firstly, an online journalist is not typically trained in distribution. An understanding – or at least, exploration – of distribution, therefore, is needed – because everything they do as a journalist, online, is actually an act of distribution.
  • Secondly, the news organisation typically does not devote the same resources to online distribution as it does to physical distribution – and when it does, it does so in an uneven manner. Organisational distribution (tactical, intentional) should be different to journalistic acts of distribution (incidental).

The distribution model for the 21st century newsroom, then, seeks to explicitly identify the range of distribution networks, before looking at how those affect the other two parts of the production chain: journalism and publishing.

The diagram below illustrates how new media combines the distribution models of print and broadcast – ‘picking up’ and ‘tuning in’ – and adds a third: ‘passing on’. We can cutely abbreviate this to the helpful mnemonic ‘Pull-Push-Pass’. I’m going to deal with examples of each of the three – their strengths and weaknesses – in turn.

[Diagram: distributed journalism]

Picking up

The printed newspaper is not the only format that can be distributed by people picking it up. An obvious equivalent online is the PDF newspaper, typically executed as nothing more than a download of the newspaper - although some notable examples have tried to do more interesting things with the technology, such as regularly updating them throughout the day, or producing editions for a specific area, such as finance, and incorporating multimedia.

The PDF newspaper appears to combine some advantages of print (portability and embedded ads) with some of those of new media (passing on printing costs to the consumer; the ability to update regularly). But generally it underexploits too many other advantages – and this may be why so few people use them, and so many papers have dropped them. There are better ways to spend your money.

Online, it is not your entire newspaper edition that you are distributing, it is each individual page - from today’s edition, and from every edition in your archive (the long tail). Therefore, every internal link is a piece of distribution, and it should be part of standard practice (or your CMS) to ensure that readers are given links to related stories within your site when they are reading an article (and failing that, outside your site. Don’t worry, they’ll come back).

Likewise, any good distribution strategy relies on being where your readers are (failing to do so is one of the reasons why traditional newspaper distribution has been struggling for some time), so email newsletters, while they now seem old-fashioned, remain a useful part of any distribution strategy – the more specific, the better. Mobile updates should be even more specific, and relevant to the reader – or they’ll unsubscribe.

RSS is similar to email and mobile updates in many ways – and there are services which will convert an RSS feed to email, which has enormous potential for generating ultra-niche personalised email newsletters. But RSS shouldn’t be seen as a replacement for email: it is a different technology with different users and different uses.

The more RSS feeds being offered, the better. At a basic level, there should be RSS feeds for every traditional section of the paper – sport, business news, etc. Drilling down, the reader is likely to want RSS feeds for specific areas, e.g. a particular football team, or weather in their postcode. Or particular stories – the latest on Big Brother, or Madeleine McCann.

There should be a feed for every journalist on the paper, and needless to say, every column, blog, podcast and video channel. You might also consider RSS feeds for any search results. And we should be thinking beyond the news – jobs, for instance, and dating, are obvious candidates. Once you’ve published those RSS feeds, others can do interesting things with them – which I deal with below under ‘Passing on’.
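To make the 'feed for everything' idea concrete, here is a minimal Python sketch that builds one RSS 2.0 feed per section, or per journalist, from the same list of stories. It is an illustration under stated assumptions, not a description of any real CMS: the story fields (`headline`, `url`, `section`, `author`), the function names and the example.com addresses are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical stories, as a CMS might hold them
STORIES = [
    {"headline": "City win derby", "url": "http://example.com/1",
     "section": "sport", "author": "A. Reporter"},
    {"headline": "Factory to close", "url": "http://example.com/2",
     "section": "business", "author": "B. Writer"},
    {"headline": "Transfer rumours", "url": "http://example.com/3",
     "section": "sport", "author": "B. Writer"},
]


def build_feed(title, site_url, stories):
    """Build a bare-bones RSS 2.0 document as a string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires a title, link and description for the channel
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = f"Latest stories: {title}"
    for story in stories:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = story["headline"]
        ET.SubElement(item, "link").text = story["url"]
    return ET.tostring(rss, encoding="unicode")


def feeds_by(stories, field, site_url):
    """Return {value: feed_xml}, one feed per distinct value of `field`."""
    groups = defaultdict(list)
    for story in stories:
        groups[story[field]].append(story)
    return {value: build_feed(value, site_url, group)
            for value, group in groups.items()}


if __name__ == "__main__":
    section_feeds = feeds_by(STORIES, "section", "http://example.com")
    author_feeds = feeds_by(STORIES, "author", "http://example.com")
    print(sorted(section_feeds), sorted(author_feeds))
```

Because the grouping field is just a parameter, the same few lines would cover sections, journalists, football teams or postcodes, provided the stories are tagged with that information in the first place.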

Finally, streaming video deserves a mention. It isn’t the best way of distributing content – embeddable video, also dealt with below under ‘Passing on’, is much better if you are selling advertising on the video itself – and probably even if you are selling advertising around it (how many times have you clicked through to YouTube from an embedded video on another site?). Bandwidth is less of a reason to stream, while copyright is not a problem with the Flash-based technology that generally constitutes embedded video. So why do we stream? If it’s too big to download, it’s probably too long to watch comfortably online. So split it up. Unless it’s live, make it downloadable, or embeddable.

Tuning in

New media adds relatively few new ways to ‘tune in’, but there is one obvious one – the homepage – and another less so: the live chat.

The homepage as we knew it is dying, but it still serves a purpose: an at-a-glance overview of content. A place for grazing. Most readers are search-driven and will enter your site through a link to a specific page, but a significant minority will search for the newspaper itself, or click on a link, or a bookmark, to the homepage. When they do, they are ‘tuning in’ to your headlines, your featured stories.

But they could be tuning into a whole range of things. It might be dynamically constructed rather than edited six times a day. It could be personalised like the Amazon ‘Page You Made’, or the Facebook widget. It could be aggregated from a gazillion feeds (but do we really need to do that when RSS readers already do it so much better?). Or do we keep it as a statement of what the editors think are the most compelling pieces of content on that day, for what that’s worth?

Live chats promised much, but haven’t delivered, perhaps because they don’t tap into the asynchronous nature of the web, because they ask too much of readers to ‘be there, now’, because the technology or the audience wasn’t fast or big enough. But perhaps the likes of Second Life will resurrect it. It’s one thing to pose questions to your idols on a text interface – meeting them in virtual person appears to be more attractive.

Passing on

While ‘picking up’ and ‘tuning in’ have been central to the early development of news distribution on the web, ‘passing on’ has become central to its development in the web 2.0, social platform stage. And passing on has the potential to become the primary distribution method of the coming decades.

Of course, people always passed on newspapers, or told friends about a story they just heard on the radio, but digital replicability and networked technologies make the process easier, quicker and – crucially – more measurable for advertisers.

Imagine if every newspaper article had a perforated border and a freepost envelope so you could post it to anyone you wanted. Now keep imagining – you’re going to need lots of ideas…

Social distribution strategies

Some strategies to tackle social distribution are technical: using embeddable video rather than streaming or Flash, for instance, enables people to put your content on their website, blog, or Facebook page.

Setting up your website to ‘ping’ any pages it links to (linkback) means they will be aware of your existence, and may comment on what you’ve said, driving more content back to your site (be prepared: not all comments will be positive).
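As a rough illustration of what such a 'ping' looks like, here is a Python sketch using Pingback, one of several linkback protocols (the choice of Pingback is my assumption for the example; Trackback and others work differently). A Pingback is an XML-RPC call named `pingback.ping` that carries two addresses: your page, and the page you linked to. The function name and example.com URLs below are hypothetical, and the sketch only builds the message rather than sending it.

```python
import xmlrpc.client


def build_pingback_request(source_url, target_url):
    """Return the XML-RPC payload for a pingback.ping call.

    source_url is the page containing the link (yours);
    target_url is the page being linked to (theirs).
    """
    return xmlrpc.client.dumps((source_url, target_url),
                               methodname="pingback.ping")


# Actually sending the ping would first mean finding the target site's
# pingback endpoint (advertised in an X-Pingback HTTP header or a
# <link rel="pingback"> element) and then calling, for example:
#   xmlrpc.client.ServerProxy(endpoint).pingback.ping(source_url, target_url)

if __name__ == "__main__":
    print(build_pingback_request("http://example.com/my-story",
                                 "http://example.org/their-post"))
```

In practice most blogging platforms and many content management systems handle this for you; the point is simply that the notification is a small, automatable message.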

Creating widgets that people can include on their blogs or social network pages that publish your content or that say something about their identity as a member of your reader community provides a further opportunity for reaching new audiences, or expanding your brand.

Including a ‘Digg this’ or ‘BlogThis’ button, or an ‘Email to a friend’ field helps automate and facilitate social distribution via blogs, social bookmarking services, and email (there are a number of off-the-shelf solutions for this, such as MediaFed’s widget, or WordPress plugins such as ShareThis and Gregarious).

Other strategies are cultural: the new distribution landscape needs journalists to engage in communities outside the newspaper by commenting on blogs and forums – not only generating return traffic, but also goodwill and trust.

Making sure your pages are ‘Dugg’ in-house, as Trinity Mirror did, or ‘seeding’ content with influential bloggers ahead of publication, as the Economist did, also demonstrates engagement.

One way to encourage these cultural changes is to recognise the distribution work financially: Gawker Media, for instance, are introducing a bonus system based on how many visits a piece gets. This is not a particularly accurate measure of ‘distribution’, nor of effort made to distribute something (it conflates it with journalistic effort, which isn’t necessarily a bad thing), but it’s an interesting attempt.

Another cultural change would be to share page view and sharing data with staff.

Some strategies are both cultural and technical: linking externally, for example, will be dependent on the organisation’s content management system (CMS) as well as the journalist themselves. Even without linkback, this will make site owners aware of incoming traffic and your own site.

Search engine optimisation (SEO) relies on training journalists and editors in the art of the search engine-friendly headline, but also on systems that generate meaningful URLs, heading tags and metatags, linkbacks, and ‘clean’ code. All will contribute to a healthy ranking on search engines – but remember it’s inbound links (translation: being part of the conversation) that really make the difference.
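As an illustration of what ‘meaningful URLs’ means in practice, here is a rough sketch of how a system might turn a headline into a readable URL slug. The function name and the eight-word limit are my own assumptions, not taken from any particular CMS.

```python
import re
import unicodedata


def headline_to_slug(headline: str, max_words: int = 8) -> str:
    """Turn a headline into a lowercase, hyphen-separated URL slug."""
    # Strip accents and drop any characters that have no ASCII equivalent.
    ascii_text = (
        unicodedata.normalize("NFKD", headline)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Remove apostrophes so "society's" becomes "societys", not "society-s".
    ascii_text = ascii_text.replace("'", "")
    # Keep runs of letters and digits; everything else acts as a separator.
    words = re.findall(r"[a-z0-9]+", ascii_text.lower())
    return "-".join(words[:max_words])


print(headline_to_slug("7 ways to get data out of PDFs"))
```

A URL ending in 7-ways-to-get-data-out-of-pdfs tells both readers and search engines more than one ending in ?p=1234.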

Crowdsourcing and citizen journalism initiatives require both technological and editorial support, but again have massive potential to generate goodwill and engagement with the publisher.

Likewise, mashups require a culture of openness and collaboration with people outside the organisation, as well as technical availability and openness of RSS, APIs, etc. But the potential benefits in terms of new services, applications, and readers, are enormous.


Implications for journalism and publishing

At this point it’s worth highlighting that in social distribution the journalism itself becomes ever more important, and the newspaper or channel less. I’ll repeat: it is not your entire newspaper edition that you are distributing, it is each individual page, i.e. the story.

In a glass-half-empty world, this could mean more panda videos, but a canny broadcaster or publisher will soon realise that, on that front, they cannot compete with YouTube.

In a glass-half-full world, however, it could mean investigative journalism, crowdsourcing, engagement, transparency. Plus niche publishing, community and utility.

In a nutshell, we are moving from a need for ‘news that sells’ to ‘news that moves’: useful news, distinctive news, specific news, news that we’re involved in.

For that reason, the potential of games for storytelling, for multimedia interactives, and for customisation and personalisation, becomes commercially important.

If there’s a story on the election in every paper, what can you do to bring in visitors? If every paper carries a match report, what makes yours distinctive? In a world of infinite information, where’s your ‘wow’ factor to get people talking?

There are further important implications for commercial publishers. Whereas an ad in a newspaper is viewed by whoever picks it up – whether they paid or not – online, content can, and will, be separated from advertising. Some of that content will have to be treated as part of marketing. Other parts will have to look at ways to incorporate advertising – as has happened with RSS feeds. And for others, it may not be advertising at all that makes the money. For this reason business models will need to change – something I deal with in the final part of this model.

As always, this is a work in progress. Please add your comments, analysis, examples, corrections and caveats and I’ll try to address them.

7 ways to get data out of PDFs

A frequent obstacle in data journalism is when the information you want to analyse is locked away in a PDF. Here are 7 ways to tackle that problem, in order of speed:

1) For simple PDFs: Google Docs' conversion facility

Google Docs recently added a feature that allows you to convert a PDF to a 'Google document' when you upload it. It's pretty powerful, and about the simplest way you can extract information.

It does not work, however, if the PDF was generated by scanning - in other words if it is an image, rather than a document that has been converted to PDF.

2) For scanned documents and pulling out key players: Document Cloud

Document Cloud is a tool for journalists to convert PDFs to text. It will also add 'semantic' information along the way, such as what organisations, people and 'entities' such as dates and locations are mentioned within it, and there are some useful features that allow you to present documents for others to comment on. 

The good news is that it works very well with scanned documents, using Optical Character Recognition (OCR). The bad news is that you need to ask permission to use it, so if you don't work as a professional journalist you may not be able to use it. Still, there's no harm in asking.

3) For scanned documents: The Data Science Toolkit

The Data Science Toolkit allows you to do lots of clever things, including converting PDFs using OCR with the File2Text converter. Upload your document, and you're away. Also works on other document formats, and PNGs, TIFFs and JPEGs.

4) For stripping out tables: PDF2XL

If you're willing to shell out around £70 then PDF2XL is recommended as a useful piece of software for stripping out tables from PDFs into Excel files.

5) For automating the process: Scrape from PDF to XML using Scraperwiki

Scraperwiki is a collaborative website for scraping all sorts of hard-to-find information into some sort of useful format, so it's no surprise that PDFs are a common problem there. They have a template scraper for converting PDF documents to XML (a more structured format) - if you can understand a little bit of programming then you can try to adapt it to your own purposes.
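To give a flavour of what adapting such a scraper involves, here is a minimal sketch of the step after conversion: turning the XML back into rows. It assumes the XML resembles what the pdftohtml tool produces with its -xml option, where every fragment of text carries 'top' and 'left' position attributes. The sample XML and its figures are invented for illustration, and this is not Scraperwiki's own template.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample in the assumed format: one <text> element per fragment,
# positioned by its 'top' (vertical) and 'left' (horizontal) attributes.
SAMPLE_XML = """<pdf2xml>
  <page number="1">
    <text top="100" left="50">Council</text>
    <text top="100" left="300">Spend</text>
    <text top="120" left="50">Birmingham</text>
    <text top="120" left="300">1200</text>
    <text top="140" left="50">Leeds</text>
    <text top="140" left="300">950</text>
  </page>
</pdf2xml>"""


def xml_to_rows(xml_string):
    """Group text fragments sharing a 'top' value into rows, ordered left to right."""
    root = ET.fromstring(xml_string)
    rows = []
    for page in root.iter("page"):
        lines = defaultdict(list)
        for node in page.iter("text"):
            content = "".join(node.itertext()).strip()
            if content:
                lines[int(node.get("top"))].append((int(node.get("left")), content))
        for top in sorted(lines):
            rows.append([content for _, content in sorted(lines[top])])
    return rows


rows = xml_to_rows(SAMPLE_XML)
for row in rows:
    print(row)
```

Real PDFs are messier: fragments on the same visual line can differ by a pixel or two in their 'top' value, so you may need to round or cluster the positions before grouping.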

6) If it's held by a public body and you have time: a well-written FOI request

Do you need all the data in the PDF or just some? Is that data available elsewhere? Try an advanced search using a phrase from the data in quotes and adding filetype:xls to see if you can find the spreadsheet it comes from. Or submit an FOI request for the data, stipulating that it be provided in spreadsheet or CSV (comma separated values) format. If the PDF was supplied in response to an FOI request in the first place, go back and ask for the information in one of those formats.

It's a good idea to also ask how the information is stored, including any software used, as you can check with the software vendor how easily the information can be extracted and bat away any excuses the body may come back at you with.
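The advanced search suggested in this step can be sketched as a few lines of code, which is handy if you want to try several phrases in a row. The function name is my own, the example phrase is made up, and the sketch assumes Google's standard search URL; it only builds the link and doesn't fetch any results.

```python
from urllib.parse import urlencode


def spreadsheet_search_url(phrase: str, filetype: str = "xls") -> str:
    """Build a search URL for an exact phrase, restricted to one file type."""
    query = f'"{phrase}" filetype:{filetype}'
    return "https://www.google.com/search?" + urlencode({"q": query})


# Made-up phrase of the kind you might copy from a column heading in the PDF.
url = spreadsheet_search_url("total net expenditure")
print(url)
```

Swapping the second argument for "csv" looks for CSV files instead.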

7) Add your own here

There must be others - tell me your own tips.