Feed aggregator
Do Road Improvements *Really* Create Jobs?
Lib Dems in Government have allocated £300,000 to fund the M20 Junctions 6 to 7 improvement, Maidstone, helping to reduce journey times and create 10,400 new jobs. Really? 10,400 new jobs?
In Critiquing Data Stories: Working LibDems Job Creation Data Map with OpenRefine I had a little poke around some of the data that was used to power a map on a Lib Dems’ website, A Million Jobs:
Liberal Democrats have helped businesses create over 1 million new private sector jobs. Click on the map below to find out what we’ve done where you live.
And then there was the map…
One thing we might take away from this as an assumption is that the markers correspond to locations or environs where jobs were created, and that by adding up the number of jobs created at those locations, we would get to a number over a million.
Whilst I was poking through the data that powers the map, I started to think this might be an unwarranted assumption. I also started to wonder about how the “a million jobs” figure was actually calculated?
Using a recipe described in the Critiquing Data Stories post, I pulled out marker descriptions containing the phrase “helping to reduce journey” along with the number of jobs created (?!) associated with those claims, where a number was specified.
Claims were along the lines of:
Summary: Lib Dems in Government have allocated £2,600,000 to fund the A38 Markeaton improvements , helping to reduce journey times and create 12,300 new jobs. The project will also help build 3,300 new homes.
Note that as well as claims about jobs, we can also pull out claims about homes.
If we use OpenRefine’s Custom Tabular Exporter to upload the data to a Google spreadsheet (here) we can use the Google Spreadsheet-as-a-database query tool (as described in Asking Questions of Data – Garment Factories Data Expedition) to sum the total number of jobs “created” by road improvements (from the OpenRefine treatment, I had observed the rows were all distinct – the count of each text facet was 1).
The sum of jobs “created”? 468,184. A corresponding sum for the number of homes gives 203,976.
Looking at the refrain through the descriptions, we also notice that the claim is along the lines of: “Lib Dems in Government have allocated £X to fund [road improvement] helping to reduce journey times and create Y new jobs. The project will also help build Z new homes.” Has allocated. So it’s not been spent yet? [T]o create Y new jobs. So they haven’t been created yet? And if those jobs are the result of other schemes made possible by road improvements, will the numbers be double counted? [W]ill also help build. So the homes haven’t been built yet, but may well be being claimed as achievements elsewhere?
Note that the numbers I calculated are lower bounds, based on scheme descriptions that contained the specified search phrase (“helping to reduce journey”) and a number of jobs specified according to the pattern detected by the following Jython regular expression:
import re
tmp=value
tmp=re.sub(r'.* creat(e|ing) ([0-9,\.]*) new jobs.*',r'\2',tmp)
if value==tmp:tmp=''
tmp=tmp.replace(',','')
return tmp
In addition, the housing numbers were extracted only from rows where a number of jobs was identified by that regular expression, and where they were described in a way that could be extracted using the following Jython regular expression: re.sub(r'.* The project will also help build ([0-9,\.]*) new homes.*',r'\1',tmp)
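As a rough illustration of the recipe outside OpenRefine, here is a minimal standalone Python sketch that applies the same filter phrase and similar patterns to the two descriptions quoted in these posts, then sums the results. The function names are my own, and the patterns assume whole-number figures with optional comma separators; it is a sketch of the approach rather than the exact OpenRefine treatment.

```python
import re

# The two road-improvement descriptions quoted in these posts.
descriptions = [
    "Summary: Lib Dems in Government have allocated £2,600,000 to fund the A38 "
    "Markeaton improvements , helping to reduce journey times and create 12,300 "
    "new jobs. The project will also help build 3,300 new homes.",
    "Summary: Lib Dems in Government have allocated £300,000 to fund the M20 "
    "Junctions 6 to 7 improvement, Maidstone , helping to reduce journey times "
    "and create 10,400 new jobs. The project will also help build 8,400 new homes.",
]

# Assumption: figures are whole numbers, optionally with comma separators.
JOBS = re.compile(r" creat(?:e|ing) ([0-9][0-9,]*) new jobs")
HOMES = re.compile(r" The project will also help build ([0-9][0-9,]*) new homes")


def extract_count(pattern, text):
    """Return the matched figure as an int, or None if the pattern is absent."""
    match = pattern.search(text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def totals(rows):
    """Sum jobs and homes over rows containing the road-improvement phrase."""
    jobs = homes = 0
    for text in rows:
        if "helping to reduce journey" not in text:
            continue
        n_jobs = extract_count(JOBS, text)
        if n_jobs is None:
            continue  # homes are only counted where a jobs figure was found
        jobs += n_jobs
        homes += extract_count(HOMES, text) or 0
    return jobs, homes


print(totals(descriptions))
```

Rows that lack the phrase, or that state the jobs figure in some other form, are simply skipped, which is why totals produced this way are lower bounds.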
PS I’m reading The Smartest Guys in the Room at the moment, learning about the double counting and accounting creativity employed by Enron, and how confusing publicly reported figures often went unchallenged…
It also makes me wonder about phrases like “up to” providing numbers that are then used when calculating totals?
So there’s another phrase to look for, maybe? have agreed a new ‘City Deal’ with …
Critiquing Data Stories: Working LibDems Job Creation Data Map with OpenRefine
As well as creating data stories, should the role of a data journalist be to critique data stories put out by governments, companies, and political parties?
Via a tweet yesterday I saw a link to a data powered map from the Lib Dems (A Million Jobs), which claimed to illustrate how, through a variety of schemes, they had contributed to the creation of a million private sector jobs across the UK. Markers presumably identify where the jobs were created, and a text description pop up provides information about the corresponding scheme or initiative.
If we view source on the page, we can see where the map – and maybe the data being used to power it, comes from…
Ah ha – it’s an embedded map from a Google Fusion Table…
We can view the table itself by grabbing the key – 1whG2X7lpAT5_nfAfuRPUc146f0RVOpETXOwB8sQ – and popping it into a standard URL (grabbed from viewing another Fusion Table within Fusion Tables itself) of the form:
https://www.google.com/fusiontables/DataSource?docid=1whG2X7lpAT5_nfAfuRPUc146f0RVOpETXOwB8sQ
The description data is curtailed, but we can see the full description on the card view:
Unfortunately, downloads of the data have been disabled, but with a tiny bit of thought we can easily come up with a tractable, if crude, way of getting the data… You may be able to work out how when you see what it looks like when I load it into OpenRefine.
This repeating pattern of rows is one that we might often encounter in data sets pulled from reports or things like PDF documents. To be able to usefully work with this data, it would be far easier if it was arranged by column, with the groups-of-three row records arranged instead as a single row spread across three columns.
Looking through the OpenRefine column tools menu, we find a transpose tool that looks as if it may help with that:
And as if by magic, we have recreated a workable table:-)
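The same reshaping can be sketched in a few lines of Python. This is only an illustration of the idea, not what OpenRefine does internally; the function name and the sample cell values are made up, and the meaning of each of the three fields is a placeholder.

```python
def rows_to_records(cells, width=3):
    """Group a flat list of cells into records of `width` fields each."""
    if len(cells) % width != 0:
        raise ValueError("cell count is not a multiple of the record width")
    return [tuple(cells[i:i + width]) for i in range(0, len(cells), width)]


# Hypothetical cell values standing in for the repeating groups of three rows.
flat = [
    "record 1, field 1", "record 1, field 2", "record 1, field 3",
    "record 2, field 1", "record 2, field 2", "record 2, field 3",
]

for record in rows_to_records(flat):
    print(record)
```

The length check matters in practice: if a single row is missing or duplicated in the pasted data, every later record would otherwise be silently shifted out of alignment.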
If we generate a text facet on the descriptions, we can look to see how many markers map onto the same description (presumably, the same scheme?).
If we peer a bit more closely, we see that some of the numbers relating to job site locations as referred to in the description don’t seem to tally with the number of markers? So what do the markers represent, and how do they relate to the descriptions? And furthermore – what do the actual postcodes relate to? And where are the links to formal descriptions of the schemes referred to?
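A text facet is essentially a count of rows per distinct value. A minimal Python equivalent, using made-up placeholder descriptions rather than the real data, might look like this:

```python
from collections import Counter


def text_facet(values):
    """Count how many rows share each distinct value, most common first."""
    return Counter(values).most_common()


# Hypothetical placeholder descriptions, one per marker row.
marker_descriptions = ["Scheme A", "Scheme B", "Scheme A", "Scheme C", "Scheme A"]

for text, count in text_facet(marker_descriptions):
    print(count, text)
```

Comparing each count with the number of job sites mentioned in the corresponding description is one way to check whether markers and descriptions actually tally.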
What this “example” of data journalistic practice by the Lib Dems shows is how it can generate a whole wealth of additional questions, both from a critical reading of just the data itself (for example, trying to match mentions of job locations with the number of markers on the map or rows referring to that scheme in the table), as well as questions that lead on from the data – where can we find more details about the local cycling and green travel scheme that was awarded £590,000, for example?
Using similar text processing techniques to those described in Analysing UK Lobbying Data Using OpenRefine, we can also start trying to pull out some more detail from the data. For example, by observation we notice that the phrase Summary: Lib Dems in Government have given a £ starts many of the descriptions:
Using a regular expression, we can pull out the amounts that are referred to in this way and create a new column containing these values:
import re
tmp=value
tmp = re.sub(r'Summary: Lib Dems in Government have given a £([0-9,\.]*).*', r'\1', tmp)
if value==tmp: tmp=''
tmp = tmp.replace(',','')
return tmp
Note that there may be other text conventions describing amounts awarded that we could also try to extract as part of this column creation.
If we cast these values to a number:
we can then use a numeric facet to help us explore the amounts.
In this case, we notice that there weren’t that many distinct factors containing the text construction we parsed, so we may need to do a little more work there to see what else we can extract. For example:
- Summary: Lib Dems in Government have secured a £73,000 grant for …
- Summary: Lib Dems in Government have secured a share of a £23,000,000 grant for … – we might not want to pull this into a “full value” column if they only got a share of the grant?
- Summary: Lib Dems in Government have given local business AJ Woods Engineering Ltd a £850,000 grant …
- Summary: Lib Dems in Government have given £982,000 to …
Here’s an improved regular expression for parsing out some more of these amounts:
import re
tmp=value
tmp=re.sub(r'Summary: Lib Dems in Government have given (a )?£([0-9,\.]*).*',r'\2',tmp)
tmp=re.sub(r'Summary: Lib Dems in Government have secured a £([0-9,\.]*).*',r'\1',tmp)
tmp=re.sub(r'Summary: Lib Dems in Government have given ([^a]).* a £([0-9,\.]*) grant.*',r'\2',tmp)
if value==tmp:tmp=''
tmp=tmp.replace(',','')
return tmp
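For anyone wanting to try these patterns outside OpenRefine, here is a minimal standalone Python sketch of the same idea, checked against the example phrasings listed above. The function and variable names are my own, and the patterns assume whole-number amounts with optional comma separators.

```python
import re

PREFIX = r"Summary: Lib Dems in Government have "

# One pattern per phrasing; the first one that matches wins.
PATTERNS = [
    re.compile(PREFIX + r"given (?:a )?£([0-9][0-9,]*)"),
    re.compile(PREFIX + r"secured a £([0-9][0-9,]*)"),
    re.compile(PREFIX + r"given [^a].*? a £([0-9][0-9,]*) grant"),
]


def extract_amount(description):
    """Return the amount in pounds as an int, or None if no pattern matches."""
    for pattern in PATTERNS:
        match = pattern.match(description)
        if match:
            return int(match.group(1).replace(",", ""))
    return None


examples = [
    "Summary: Lib Dems in Government have secured a £73,000 grant for …",
    "Summary: Lib Dems in Government have secured a share of a £23,000,000 grant for …",
    "Summary: Lib Dems in Government have given local business AJ Woods Engineering Ltd a £850,000 grant …",
    "Summary: Lib Dems in Government have given £982,000 to …",
]

for text in examples:
    print(extract_amount(text), "<-", text)
```

Note that the “share of a” phrasing deliberately falls through to None here, so that a partial share of a larger grant is not pulled into a “full value” column.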
So now we can start to identify some of the bigger grants…
More to add? eg around:
- ...have secured a £150,000 grant...
- Summary: Lib Dems have given a £1,571,000 grant...
- Summary: Lib Dems in Government are giving £10,000,000 to... (though maybe this should go in an ‘are giving’ column, rather than ‘have given’, cf. “will give” also…?)
- Here’s another for a ‘possible spend’ column? Summary: Lib Dems in Government have allocated £300,000 to...
Note: once you start poking around at these descriptions, you find a wealth of things like: “Summary: Lib Dems in Government have allocated £300,000 to fund the M20 Junctions 6 to 7 improvement, Maidstone , helping to reduce journey times and create 10,400 new jobs. The project will also help build 8,400 new homes.” Leading to ask the question: how many of the “one million jobs” arise from improvements to road junctions…?
In order to address this question, we might start to have a go at pulling out the number of jobs that it is claimed various schemes will create, as this column generator starts to explore:
import re
tmp=value
tmp = re.sub(r'.* creat(e|ing) ([0-9,\.]*) jobs.*', r'\2', tmp)
if value==tmp:tmp=''
tmp=tmp.replace(',','')
return tmp
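One thing to watch with patterns like this is the exact wording between the number and the word “jobs”. The M20 description quoted above says “create 10,400 new jobs”, which a pattern expecting the number to be followed directly by “ jobs” would not match. The Python sketch below compares a strict pattern with a looser one that treats “new ” as optional; the names are my own and the second sample sentence is hypothetical.

```python
import re

# Strict: the number must be followed directly by " jobs".
STRICT = re.compile(r" creat(?:e|ing) ([0-9][0-9,]*) jobs")
# Loose: allows an optional "new " between the number and "jobs".
LOOSE = re.compile(r" creat(?:e|ing) ([0-9][0-9,]*) (?:new )?jobs")


def jobs_claimed(text, pattern=LOOSE):
    """Return the claimed number of jobs as an int, or None if not found."""
    match = pattern.search(text)
    return int(match.group(1).replace(",", "")) if match else None


m20 = (
    "Summary: Lib Dems in Government have allocated £300,000 to fund the M20 "
    "Junctions 6 to 7 improvement, Maidstone , helping to reduce journey times "
    "and create 10,400 new jobs. The project will also help build 8,400 new homes."
)
# Hypothetical sentence using the shorter wording.
other = "Summary: a scheme supporting local firms, creating 250 jobs."

print(jobs_claimed(m20, STRICT), jobs_claimed(m20, LOOSE))
print(jobs_claimed(other, STRICT), jobs_claimed(other, LOOSE))
```

Running both variants over the full set of descriptions and comparing the match counts is a quick way to see how many claims each wording picks up.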
If we start to think analytically about the text, we start to see there may be other structures we can attack… For example:
- £23,000,000 grant for local business ADS Group. … – here we might be able to pull out what an amount was awarded for, or to whom it was given.
- £950,000 to local business/project A45 Northampton to Daventry Development Link – Interim Solution A45/A5 Weedon Crossroad Improvements to improve local infastructure, creating jobs and growth – here we not only have the recipient but also the reason for the grant
But that’s for another day…
If you want to play with the data yourself, you can find it here.
Recent Robotics Reviews on OpenLearn…
A few years ago I worked on an OU robotics course ambitiously titled “Robotics and the Meaning of Life” (the working title had been “Joy, Fun, Robotics”), elements of which have been woven into a new OU course Technologies in practice (hmm, thinks – would folk be interested in a course on data in practice?)
As well as providing a general introduction to robotics technology, the course reviewed a range of social, political and ethical issues that might impact on a society in which mobile, intelligent, autonomous machines were part of our everyday experience. As part of our current co-pro series of the BBC World Service Click radio programme, we’ve been exploring some of the issues associated with recent developments in robotic vehicles. This has also provided an opportunity for me to start scouting around some of the emerging laws that are being considered with a view to regulating the operation – and behaviour – of autonomous intelligent robots. So here’s a quick round up of some of the related articles that I’ve recently posted to OpenLearn…
- A dark future for warehousing? – robots are playing an increasingly important role in the logistics industry, with robot workers increasingly finding a role in warehouses. This post reviews several different ways in which robots can work with – and instead of – human workers in today’s modern warehouses.
- Robot cars, part 1: Parking the future for now – the DARPA robot vehicle challenges demonstrated how autonomous robot vehicles could cope with off-road and urban driving conditions, leading in part to the development of things like the Google autonomous car that is currently being tested on public roads in several US states. Whilst the mass availability of such vehicles is still only a remote possibility for a variety of reasons (from cost and safety issues, to legal and ethical considerations), autonomous driving in certain limited situations is now possible. In this post, we look at one such situation, disliked by many a driver – parking – and see how our cars may soon be managing that aspect of driving on our behalf.
- Robot cars, part 2: Convoys of the near future – along with the fiddliness of parking, the monotony of stop-start traffic jams and convoy style motorway driving provide another environment in which autopilot systems may be able to improve not only the driving experience, but also road safety. In this post, I review some recent demonstrations in autonomous driver support systems suited to these particular road conditions.
- Naughty robot: Where’s your human operator? – a wealth of regulations at international, national and even regional (state) level cover the operation of our public highways and public airspace. But when the robots start taking control of their own actions and decision-making in these spaces, do we need further regulation to limit the behaviour of robots as distinct from humans? And when it comes to allowing autonomous robots to bear arms, is that a situation we are comfortable with? In this post, I review some of the emerging laws that are developing around not only the testing and use of autonomous robot cars on our public highways, but also in consideration of autonomous flying vehicles – drones – in both domestic and military settings. In part, this sets up the question – will there be one law for humans and another for robots?
Hear the latest episode of Click radio here: #BBCClickRadio, or keep track of the OU supported special editions via OpenLearn: OU on the BBC: Click – A Route 66 of the future
Sony Wins E3, and Possibly the Next Generation of Gaming
As the PS3, Xbox 360, and Nintendo Wii reached the end of their life cycles, the big three video game companies were certainly scrambling to find what innovation would win the next generation of consoles. And who could have guessed that it might be, well, not really changing anything? While the Wii U was released in November last year, it has suffered from a small library of games, leaving Microsoft and Sony gearing up for a showdown this June at E3, one of the largest industry expositions, at which they were both expected to reveal their new consoles. (This technically makes the 8th generation of consoles, but we already just started saying “next-gen” last generation so I’m not sure where that leaves us.)

After a lackluster announcement in February in which Sony told us almost nothing about the Playstation 4 – they didn’t even show the console itself – Microsoft seemed poised for the kill. But they decided to pre-empt E3, announcing the Xbox One in a press conference in late May, and were immediately panned by critics. While there are some neat features, like integration with Microsoft Smartglass, many fans balked at the new Xbox’s supposed restrictions on used games and game trading (there may or may not be fees associated), zealous digital rights management requiring the Xbox One to connect to the internet every 24 hours, a high price point at $499, and no apparent innovations except for a built in Kinect, which would always be on – leading some to voice privacy concerns over having an HD camera with a direct link to Microsoft in their living rooms. All of which left the internets calling the thing the “Xbone” and wondering whether Microsoft took one step forward or 359 backwards. Then, yesterday at E3, Sony does this:

That’s one of what fans are calling Sony’s “FU Microsoft” slides. Apparently put together at the last minute in PowerPoint, this part of Sony’s press conference won cheers and thunderous applause – for announcing no more than features that everyone had already had in the last generation. Meanwhile, Sony released a video on YouTube called the “Official Playstation Used Game Instructional Video” bashing Microsoft’s restrictions by showing how to share games on Playstation: hand the game to somebody. Plus, they talked about a few cool features like full integration with the PS Vita and the ability for indie developers to self-publish. And perhaps the best part? While featuring basically the same hardware as the Xbox One (minus the Kinect), the PS4 will be exactly $100 less at $399.
Meanwhile, Microsoft brought a press conference to E3 that was “all about the games”. Now, if you’ve just announced a new console and you make your press conference “all about the games” something is clearly amiss. Announcing a new Halo (surprise, Master Chief is back… again) and a slew of other titles, they avoided talking about the new console – probably scared that they’d find some other way to alienate fans. Redditor lolmycat summed up his feelings about Microsoft and the Chief:

So the videogame press and nerds everywhere are hailing Sony as the winners of E3 and the leaders going into the next generation, mostly because they didn’t do anything but wait for their competition to shoot itself in the foot. Microsoft took away features from the Xbox One that gamers expect, Sony didn’t, and priced their console cheaper. More features + cheaper price is a pretty simple marketing win. To recap, as redditor Shadow8P put it: Microsoft has Halo, while Sony has offline play, used games, better hardware and a lower price. The final score? Xbox: one, Playstation: four.
Prism, Communications Metadata and Traffic Analysis
From the glimpses I’ve seen of it over the last few days, the news appears to have been dominated with talk about a US government surveillance operation referred to as “Prism”. I don’t really have much idea what Prism is, or does, nor do I suspect do most of the folk who’ve been wittering on about it. It partly reminded me of Glimmerglass, but there again, I don’t really know what that tech does…; it also made me ponder the extent to which, if there are surveillance taps built in to various systems, they can be co-opted and subverted. As a code word, however, Prism sounds like it could be suitably sinister, although perhaps not quite at the level of “SPECTRE” or “Quantum”, so it’s a great opportunity for the press to play at spooks.
One thing I have noticed is that the reporting has also started referring to the notion of metadata. For example, the Guardian/Observer mention it thus (Boundless Informant: the NSA’s secret tool to track global surveillance data):
The focus of the internal NSA tool is on counting and categorizing the records of communications, known as metadata, rather than the content of an email or instant message.
In the case of email, this could include sender and recipient information, as well as the message timestamp, and maybe data about the size of the email, whether there were any attachments, and so on. For web transactions, the time you viewed a page and the address of that page would count as metadata about that transaction.
One thing I haven’t seen mention of is the signals intelligence (SIGINT) technique known as traffic analysis. In an article on The Origination and Evolution of Radio Traffic Analysis: World War II, a definition of “traffic analysis” from another report is presented as follows:
Traffic analysis comprises the study of enemy communications for the purpose of gathering information of military value without recourse to cryptanalysis of the text of intercepted messages. From such studies a certain amount of special intelligence of a tactical and strategical nature with regard to the enemy order of battle, direction of movements, massing of troops, probable intention, withdrawals, etc., can be derived. In addition … a large amount of technical intelligence valuable to the intercept and cryptanalytic functions of the Signal Security Service is obtained. In general, the technical information obtained from such studies, when applied to global intercept and cryptanalytic problems, must be derived from a global analysis of traffic. For the proper functioning of units collecting data upon which such studies will be based, their administrative control also must parallel the administrative direction of global intercept and cryptanalytic functions.
The local commander can obtain considerable benefit from the results of traffic analysis as regards special tactical and strategical intelligence derived therefrom, because such special intelligence is based primarily upon enemy communications in close proximity to his sphere of activity …
While it is not so far reaching in consequence as that which might be obtained from a successful cryptanalytic study of a high grade enemy cryptographic system, the results may sometimes be available instantaneously, and are subject only to proper interpretation on the part of the local staff and prompt coordination of the pertinent data by the central agency.
The focus of traffic analysis is, therefore, an analysis of the metadata associated with a set of communications, rather than an analysis of the actual content of those communications. Traffic analysis (and social network analysis) is one of the reasons why it may be useful in intelligence terms to collect metadata around communications.
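To make that concrete, here is a toy Python sketch of the sort of thing that can be derived from metadata alone. The records are entirely made up (a timestamp, a sender and a recipient, with no message content), and the two functions are illustrative names of my own rather than any standard traffic analysis tooling.

```python
from collections import Counter

# Hypothetical metadata records: (timestamp, sender, recipient). No content.
records = [
    ("2013-06-10T09:00", "alice", "bob"),
    ("2013-06-10T09:05", "alice", "carol"),
    ("2013-06-10T09:07", "bob", "alice"),
    ("2013-06-10T10:30", "alice", "bob"),
    ("2013-06-10T11:15", "dave", "alice"),
]


def link_counts(rows):
    """Count messages per directed (sender, recipient) pair."""
    return Counter((sender, recipient) for _, sender, recipient in rows)


def contact_counts(rows):
    """Count distinct correspondents per person, ignoring direction."""
    contacts = {}
    for _, sender, recipient in rows:
        contacts.setdefault(sender, set()).add(recipient)
        contacts.setdefault(recipient, set()).add(sender)
    return {person: len(others) for person, others in contacts.items()}


print(link_counts(records).most_common(1))
print(contact_counts(records))
```

Even in this tiny example, one party stands out as the hub of the network without a single message having been read.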
For some worked examples around traffic analysis, see for example:
- Traffic Analysis of Anonymity Systems – includes a review of how folk might still be able to tell what web pages you’re visiting even if you use an anonymising proxy;
- Exploration of Communication Networks from the Enron Email Corpus
- Some introductory ideas about Inferring Social Network Structure using Mobile Phone Data and a more worked up example: Forensic Analysis of Phone Call Networks
And so on…
Mind Controlled Flying Robots
Yeah, you read that right. Researchers at the University of Minnesota have unveiled their latest project: a non-invasive system which allows a user to pilot a small, commercially available UAV using their thoughts. A student researcher pilots the quadrocopter through seeming telekinesis in the video that the team released earlier today:
The team at the University of Minnesota, which published their findings from this project in The Journal of Neural Engineering, have been working on computer-neural interfaces for some time and previously developed a system for identifying what patterns of brain activity correspond to certain imaginary movements, like imagining making a fist with your right hand. An EEG (electroencephalogram) cap can detect and identify that thought. The next step was to develop a simple program to translate that to computer input – at first, this meant moving the paddle in a version of “Pong” up and down. Now, they’ve developed a system that can be used to pilot a drone in 3D space. While of course there’s exciting potential for this technology to be used to help amputees and wheelchair-bound persons, there’s tons of other interesting applications for drone technology, like, you know, delivering pizza.
Spring 2013 Student Projects
Here are some of the Media Projects done this past Spring.
Beautifully Fragile, a film by Jane Luceno
A Hero’s Best Friend (Odyssey 17.290-304), reading by Lucy McInerney
http://blogs.dickinson.edu/homer/files/2…
Part of an ongoing project created during the class Greek 112: Introduction to Greek Poetry, taught by Christopher Francese, that consists of a passage from Homer’s Iliad discussed, translated into English, and then recited in Greek.
Global Economy
Michael Fratantuono’s class create mini video lectures on current global economy topics.
- The Keystone XL Pipeline, by Brooke Watson, Christine Gannon, Mike Hughes, and Eleonora Vaccori
- Qatar 2030 Vision, by Rogelio Cerezo, Abby Glascott, Chloe (Ruijiao) Ma, Danette Moore
- Megacities: A New Perspective, by Steven Haynes, Mike Adams, and Mike DeVivo
Digital Imaging
Final Projects for Todd Arsenault’s Digital Imaging course
- Kexin Shu
- Kalie Garrett
Plan X: A Strange Fusion of Cyberwarfare & Gaming
Hearing that the Pentagon is teaming up with DARPA to develop a well defined and polished cyber-warfare platform may not be surprising, but the idea that they’re comparing the graphical interface for this platform to World of Warcraft or Angry Birds should at the very least throw you for a loop. But that’s exactly what the new Plan X that DARPA is working on intends to do: it will blend cyberwarfare and easy to use gaming interfaces to create a platform that a non tech-savvy general could still use to carry out an advanced cyberattack. Underneath the GUI is a complex and well-coded system that should more efficiently enable the U.S. to map and deal out cyber-attacks when they deem appropriate against the networks of hostile groups, but the actual interface that the commanding officer will use may remind you more of a Starcraft II interface than some complex Matrix-looking code screen. For more details, check out this article or Google Plan X to learn more about this wacky new project.
http://www.cbsnews.com/8301-205_162-57586495/darpas-plan-x-looks-to-make-an-app-for-cyberwarfare/
So what is a data journalist exactly? A view from the job ads…
A quick snapshot of how the data journalism scene is evolving at the moment based on job ads over the last few months…
Via mediauk, I’m not sure when this post for a Junior Data Journalist, Trinity Mirror Regionals (Manchester) was advertised (maybe it was for its new digital journalism unit?)? Here’s what they were looking for:
Trinity Mirror is seeking to recruit a junior data journalist to join its new data journalism unit.
Based in Manchester, the successful applicant will join a small team committed to using data to produce compelling and original content for its website and print products.
You will be expected to combine a high degree of technical skill – in terms of finding, interrogating and visualising data – with more traditional journalistic skills, like recognising stories and producing content that is genuinely useful to consumers.
Reporting to the head of data journalism, the successful candidate will be expected to help create and develop data-based packages, solve problems, find and ‘scrape’ key sources of data, and assist with the production of regular data bulletins flagging up news opportunities to editors and heads of content across the group.
You need to have bags of ideas, be as comfortable with sport as you are with news, know the tools to source and turn data into essential information for our readers and have a strong eye for detail.
This is a unique opportunity for a creative, motivated and highly-skilled individual to join an ambitious project from its start.
News International were also recruiting a data journalist earlier this year, but I can’t find a copy of the actual ad.
From March, £23k-26k pa was on offer for a “Data Journalist” role that involved:
Identification of industry trends using quantitative-based research methods
Breaking news stories using digital research databases as a starting point
Researching & Analysing commercially valuable data for features, reports and events
Maintaining the Insolvency Today Market Intelligence Database (MID)
Mastering search functions and navigation of public databases such as London Gazette, Companies House, HM Court Listings, FSA Register, etc.
Using data trends as a basis for news stories and then using qualitative methods to structure stories and features.
Researching and producing content for the Insolvency cluster of products. (eg. Insolvency Today, Insolvency News, Insolvency BlackBook, Insolvency & Rescue Awards, etc.)
Identifying new data sources and trends, relevant to the Insolvency cluster.
Taking news stories published from rival sources and creating ‘follow up’ and analysis pieces using fresh data.
Occasional reporting from the High Court.
Liaising with the sales, events and marketing teams to share relevant ideas.
Sharing critical information with, and supporting sister editorial teams in the Credit and Payroll clusters.
Attending industry events to build contacts, network and represent the company.
On the other hand, a current rather clueless looking ad is offering £40k-60k for a “Data Journalist/Creative Data Engineer”:
Data Journalist/Creative Data Engineer is required by a leading digital media company based in Central London. This role is going to be working alongside a team of data modellers/statistical engineers and “bringing data to life”; your role will specifically be looking over data and converting it from “technical jargon” to creative, well written articles and white papers. This role is going to be pivotal for my client and has great scope for career progression.
To be considered for this role, you will ideally be a Data Journalist at the moment in the digital media space. You will have a genuine interest in the digital media industry and will have more than likely produced a white paper in the past or articles for publications such as AdAge previously. You will have a creative mind and will feel confident taking information from data and creating creative and persuasive written articles. Whilst this is not a technical role by anymeans it would definitely be of benefit if you had some basic technical knowledge with data mining or statistical modelling tools.
Here’s what the Associated Press were looking for from “Newsperson/Interactive Data Journalist”:
The ideal candidate will have experience with database management, data analysis and Web application development. (We use Ruby for most of our server-side coding, but we’re more interested in how you’ve solved problems with code than in the syntax you used to solve them.) Experience with the full lifecycle of a data project is vital, as the data journalist will be involved at every stage: discovering data resources, helping craft public records requests, managing data import and validation, designing queries and working with reporters and interactive designers to produce investigative stories and interactive graphics that engage readers while maintaining AP’s standards of accuracy and integrity.
Experience doing client-side development is a great advantage, as is knowledge of data visualization and UI design. If you have an interest in DevOps, mapping solutions or advanced statistical and machine learning techniques, we will want to hear about that, too. And if you have shared your knowledge through technical training or mentorship, those skills will be an important asset.
Most importantly, we’re looking for someone who wants to be part of a team, who can collaborate and communicate with people of varying technical levels. And the one absolute requirement is intellectual curiosity: if you like to pick up new technologies for fun and aren’t afraid to throw yourself into research to become the instant in-house expert on a topic, then you’re our kind of candidate.
And a post that’s still open at the time of writing – “Interactive Data Journalist” with the FT:
The Financial Times is seeking an experienced data journalist to join its Interactive News team, a growing group of journalists, designers and developers who work at the heart of the FT newsroom to develop innovative forms of online storytelling. This position is based at our office in London.
You will have significant experience in obtaining, processing and presenting data in the context of news and features reporting. You have encyclopedic knowledge of the current best practices in data journalism, news apps, and interactive data visualisation.
Wrangling data is an everyday part of this job, so you are a bit of a ninja in Excel, SQL, Open Refine or a statistics package like Stata or R. You are conversant in HTML and CSS. In addition, you will be able to give examples of other tools, languages or technologies you have applied to editing multimedia, organising data, or presenting maps and statistics online.
More important than your current skillset, however, is a proven ability to solve problems independently and to constantly update your skills in a fast-evolving field.
While you will primarily coordinate the production of interactive data visualisations, you will be an all-round online journalist willing and able to fulfil other roles, including podcast production, writing and editing blog posts, and posting to social media.
We believe in building people’s careers by rotating them into different jobs every few years so you will also be someone who specifically wants to work for the FT and is interested in (or prepared to become interested in) the things that interest us.
So does that make it any clearer what a data journalist is or does?!
PS you might also find this relevant: Tow Center for Digital Journalism report on Post Industrial Journalism: Adapting to the Present
Pottering Around Council Websites – via Google
Over the last few weeks, I’ve started pondering what sort of data sets might be “almost available” on local council websites, along with the extent to which we might be able to use these datasets to support transparency goals, such as generating signals about the extent of cuts to local council services, or developing data driven local services, such as pub finders;-)
So for example, by chance I came across a page on my local council website detailing property the council is selling off:
Surplus to requirements, eh? I wonder how much property has gone up for sale or lease across other councils over the last year or so, and what sorts of services they used to house along with whether those services have been replaced with alternatives, in any meaningful sense?
As a starter for ten, here’s a search to try out on your favourite web search engine:
"property for sale" intitle:council site:gov.uk
This won’t search across all council websites, but it’ll have a stab at ones that are hosted on the .gov.uk domain. For more thoughts on searching council websites by proxy, see Aggregated Local Government Verticals Based on LocalGov Service IDs.
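If you want to generate this sort of query for a whole list of phrases, a minimal sketch along the following lines might do. The function name `council_search_url` and the choice of Google's search URL as the base are assumptions of mine for illustration; any search engine that supports the `intitle:` and `site:` operators should work in much the same way.

```python
from urllib.parse import urlencode

def council_search_url(phrase, base="https://www.google.com/search"):
    """Build a search URL limited to pages with 'council' in the title on .gov.uk.

    `base` is an assumed search endpoint; swap in your preferred engine.
    """
    query = f"{phrase} intitle:council site:gov.uk"
    # urlencode takes care of escaping the quotes, spaces and colons
    return f"{base}?{urlencode({'q': query})}"

# The quoted phrase is passed through as an exact-match search term
url = council_search_url('"property for sale"')
print(url)
```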
And here’s an example of the sort of local news story that might result… @thisissurrey Surrey County Council makes £68million by selling off land and public buildings
Another area of the IW council website that was new to me was the list of public license registers:
So for example, I can look up establishments with more than a few gaming machines:
No lat/long data, but there are addresses and postcodes, which means weCanHaz maps easily enough…
public register licenses site:gov.uk intitle:council
Looking for reputable suppliers is something I often turn to the parish magazine for (it’s a proxy for trust…), as well as the local Chamber of Commerce members list. But it seems as if this is also something the trading standards aspect of the council may be able to help out on… in the island’s case, there’s a Buy With Confidence register, for example.
“Trader register” seems to be the phrase to go for?
trader register site:gov.uk intitle:council
For food establishments, whilst the IW council participates in the Food Standards Agency’s ‘Food Hygiene Rating Scheme’, it doesn’t seem to pull any of that data into an access point on the council website? (I think my scraper of the FSA site may have rotted too? Food Standards Agency scraper.)
As well as the statutory disclosure of major spend items, the council also publishes details of local contracts – again, if we’re looking to track evidence about cuts, a log of contracts that don’t get renewed might be interesting over an extended period?
Whilst the council webpages don’t make it easy for you to see all the extant contracts, another scraper can help….
For the holidaymakers, in part, the council produces a table of Beach Water Quality measures, though not on a map as far as I can tell (which reminds me of an old, old map hack mashup….!)? I suspect some of the beaches may be designated public places (no booze…), but at the last time of looking I couldn’t find any data identifying the extent of such areas on the island, let alone any shapefiles of the same…? I’m not sure if there’s data around showing when and which part of the beaches allow dogs on them, either?
"designated public places" site:gov.uk intitle:council
In terms of advertising local events, the council maintain a major events calendar, although I couldn’t spot an iCal feed so I can’t easily subscribe to it in my own calendar…
If you need to find somewhere to park, the council does publish lists of car parks – sort of:
Ooh – my mistake – they do a Google Map too [on which I also spy a KML link]…
As well as accommodation for holiday folk, the island has its fair share of care homes. Quality inspections, it seems, aren’t a council thing – data for that is handled by the Care Quality Commission.
The Isle of Wight Council doesn’t publish FOI disclosure logs as a matter of course, though some other councils do, along with responses:
(foi OR freedom information) +"disclosure log" site:gov.uk intitle:council
And why are FOI disclosure logs interesting? Well for one thing, they allow us to take the FOI Route to Real (Fake) Open Data.
Okay – that’s enough for now, methinks…
New 3D Scanner
Announcing the newest addition to the Media Center: our new NextEngine 3D scanner! To complement the Makerbot 3D printer, we now have the capability to produce high definition 3D meshes of small objects within around two hours. The NextEngine software also allows us to export in the .stl format – a format that can be printed on the 3D printer – so in due time we should be able to scan an object and then immediately start turning out plastic copies. I like to think that it brings us just one step closer to having Star Trek replicators.
After running a few calibration and test runs, we decided that our first victim for scanning and subsequent replication would be this miniature Buddha figurine. The scanner uses the combination of a camera and an array of lasers to scan objects, meaning that the easiest objects to scan aren’t too dark, light or shiny, and of course finer details and textures are harder to pick up. Ignoring that advice completely, we went ahead and scanned the Buddha figure.
Scans take about an hour to two hours to complete depending on the detail of the scan – for the Buddha, I used two 360° scans, one at a 0° tilt and one at around a 20° positive tilt to get some of the details on top of Buddha’s hands and arms. Each 360° scan family consists of six to sixteen rotations – for this one I used twelve. Once the scans are complete, the software patches them together into a single 3D model, but sometimes it needs a little manual adjustment to get it just perfect.
After some toying with the scans on the NextEngine software we went ahead and printed a copy of Buddha on the Makerbot! Now I would draw your attention to the surprising level of detail on Buddha 2.0’s upper body, and not the fact that his lower half is slightly completely mutilated. Then again, we learned the importance of ensuring that there are no holes in the 3D mesh, or the Makerbot kind of freaks out. Now, we think we’ve figured out a method for getting a 3D scan that is watertight and should produce printings that aren’t bisected.
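For the curious, one common way to test whether a triangle mesh has holes is to check that every edge is shared by exactly two triangles. The sketch below illustrates that general idea only; it is not a description of what the NextEngine or Makerbot software does internally, and the function name `is_watertight` and the toy tetrahedron are made up for the example.

```python
from collections import Counter

def is_watertight(triangles):
    """Return True if every edge is shared by exactly two triangles.

    `triangles` is a list of 3-tuples of vertex indices. An edge that
    appears only once sits on the rim of a hole.
    """
    edge_counts = Counter()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            # frozenset ignores edge direction, so (u, v) and (v, u) match
            edge_counts[frozenset((u, v))] += 1
    return bool(edge_counts) and all(n == 2 for n in edge_counts.values())

# Toy example: a closed tetrahedron, then the same shape with one face removed
tetrahedron = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
open_shell = tetrahedron[:-1]

print(is_watertight(tetrahedron))
print(is_watertight(open_shell))
```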
Audio Mixers
The Running Man ProFX12 is a complete audio mixer station that is still relatively portable for easy setup and fairly simple to use. In addition to the standard audio connections, it also comes with built in USB I/O, allowing for easy recording or streaming music from a laptop.
We also have the Zoom R24 available, which is smaller and more portable than the Running Man ProFX12. This model also comes complete with an SD card which can be recorded to.
Pondering Bibliographic Coupling and Co-citation Analyses in the Context of Company Directorships
Over the last month or so, I’ve made a start reading through Mark Newman’s Networks: An Introduction, trying (though I’m not sure how successfully!) to bring an element of discipline to my otherwise osmotically acquired understanding of the techniques employed by various network analysis tools.
One distinction that made a lot of sense to me came from the domain of bibliometrics, specifically between the notions of bibliographic coupling and co-citation.
Co-citation
The idea of co-citation will be familiar to many – when one article cites a set of other articles, those other articles are “co-cited” by the first. When the same articles are co-cited by lots of other articles, we may have reason to believe that they are somehow related in a meaningful way.
In graph terms, we might also represent this as a simpler graph within which edges between two articles indicate that they have been co-cited by documents within a particular corpus, with the weight of each edge representing the number of documents within that corpus that have co-cited them.
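As a minimal sketch of how those co-citation edge weights might be counted, assume a toy corpus held as a dictionary from each citing document to the set of works it cites. The document and work labels, and the function name `cocitation_weights`, are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: citing document -> set of works it cites
citations = {
    "doc1": {"A", "B", "C"},
    "doc2": {"A", "B"},
    "doc3": {"B", "C"},
}

def cocitation_weights(citations):
    """Count, for each pair of cited works, how many documents cite both."""
    weights = Counter()
    for cited in citations.values():
        # every pair of works in one reference list is co-cited once
        for pair in combinations(sorted(cited), 2):
            weights[pair] += 1
    return weights

cocitation = cocitation_weights(citations)
print(cocitation)
```

Here A and B end up with an edge of weight 2 because both doc1 and doc2 cite the pair.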
Bibliographic coupling
Bibliographic coupling is actually an earlier notion, describing the extent to which two works are related by virtue of them both referencing the same other work.
Again, in graph terms, we might think of a simpler undirected network in which edges between two articles act as an indicator that they have cited or referenced the same work, with the weight of the edge representing the number of works that both of them cite.
A comparison of co-citation and bibliographic coupling networks shows one to be “retrospective” and the other to be “forward looking”. The edges in a bibliographic coupling network can be generated directly from the reference lists of a corpus of articles, and to this extent bibliographic coupling looks to the past. In a co-citation network, the edges that connect two articles can only be generated when a future published article cites them both.
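A bibliographic coupling weight can be sketched in much the same way, this time comparing the reference lists of pairs of citing articles. Again, the paper labels and the function name `coupling_weights` are assumptions made for the example.

```python
from itertools import combinations

# Hypothetical toy corpus: article -> set of works in its reference list
references = {
    "paper1": {"X", "Y", "Z"},
    "paper2": {"Y", "Z"},
    "paper3": {"Z", "W"},
}

def coupling_weights(references):
    """For each pair of articles, count the works that both of them cite."""
    weights = {}
    for a, b in combinations(sorted(references), 2):
        shared = references[a] & references[b]
        if shared:  # only keep pairs with at least one shared reference
            weights[(a, b)] = len(shared)
    return weights

coupling = coupling_weights(references)
print(coupling)
```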
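In matrix terms, and as far as I understand the standard formulation, both measures fall out of the adjacency matrix of the directed citation network. The symbols below are my own labelling: $A_{ij} = 1$ if work $j$ cites work $i$ and $0$ otherwise, $C$ holds the co-citation weights and $B$ the bibliographic coupling weights.

```latex
% Co-citation: the number of works k that cite both i and j
C_{ij} = \sum_{k} A_{ik} A_{jk}, \qquad C = A A^{\mathsf{T}}

% Bibliographic coupling: the number of works k cited by both i and j
B_{ij} = \sum_{k} A_{ki} A_{kj}, \qquad B = A^{\mathsf{T}} A
```

The two are mirror images of each other: one follows the citation edges forwards to find common citers, the other follows them backwards to find common references.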
Co-citation, Bibliographic Coupling and Company Director Networks
For some time I’ve been tinkering with the notion of co-director networks, using OpenCorporates data as a data source (eg Mapping Corporate Networks With OpenCorporates). What I’ve tended to focus on are networks built up from active companies and their current directors, looking to see which companies are currently connected by virtue of currently sharing the same directors. On the to do list are timelines showing the companies that a particular director has been associated with, and when, as well as directorial appointments and terminations within a particular company.
In both co-citation and bibliographic coupling analyses, the nodes are the same type of thing (that is, works that are cited, such as articles). A work cites a work. (Note: does author co-citation analysis rely on mappings from works to cited authors, or citing authors to cited authors?) In company-director networks, we have a bipartite representation, with directors and companies representing the two types of node and where edges connect companies and directors but not companies and companies or directors and directors; unless a company is a director, but we generally fudge the labelling there.
If we treat “companies that retain directors” as “articles that cite other articles”:
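- under a “co-citation” style view, we generate links between directors of the same companies (the directors are “co-cited” by each company that retains them);
- under a “bibliographic coupling” style view, we generate links between companies that share common directors (the companies “reference” the same directors).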
I’ve been doing this anyway, but the bibliographic coupling/co-citation distinction may help me tighten it up a little, as well as improving ways of calculating and analysing these networks by reusing analyses described by the bibliometricians?
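Both views are one-mode projections of the bipartite company-director graph, and a rough sketch of each is given below. The company and director names are invented, and the function names `company_links` and `director_links` are just labels for this example rather than anything from OpenCorporates.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: company -> set of its current directors
boards = {
    "Acme Ltd": {"Ann", "Bob"},
    "Bolt plc": {"Bob", "Cat"},
    "Core Ltd": {"Ann", "Bob", "Cat"},
}

def company_links(boards):
    """Company-company edges, weighted by the number of shared directors."""
    weights = {}
    for a, b in combinations(sorted(boards), 2):
        shared = boards[a] & boards[b]
        if shared:
            weights[(a, b)] = len(shared)
    return weights

def director_links(boards):
    """Director-director edges, weighted by the number of shared companies."""
    weights = Counter()
    for directors in boards.values():
        for pair in combinations(sorted(directors), 2):
            weights[pair] += 1
    return weights

companies = company_links(boards)
directors = director_links(boards)
print(companies)
print(directors)
```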
Pondering the “future vs. past” distinction, the following also comes to mind:
- at the moment, I am generating networks based on current directors of active companies;
- could we construct a dynamic (temporal?) hypergraph from hyperedges that connect all the directors associated with a particular company at a particular time? If so, what could we do with this graph?! (As an aside, it’s probably worth noting that I know absolutely nothing about hypergraphs!)
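As a very rough sketch of the hyperedge idea (and bearing in mind the caveat about knowing nothing about hypergraphs), we could treat the set of directors in post at a company on a given date as one hyperedge, and take snapshots at different dates. The appointment records, the tuple layout and the function name `hyperedges_at` are all assumptions for illustration.

```python
from datetime import date

# Hypothetical records: (director, company, appointed, resigned or None)
appointments = [
    ("Ann", "Acme Ltd", date(2010, 1, 1), date(2012, 6, 30)),
    ("Bob", "Acme Ltd", date(2011, 3, 1), None),
    ("Cat", "Acme Ltd", date(2012, 9, 1), None),
    ("Bob", "Bolt plc", date(2009, 5, 1), date(2011, 2, 1)),
]

def hyperedges_at(appointments, when):
    """Return {company: frozenset of directors in post on the date `when`}."""
    edges = {}
    for director, company, start, end in appointments:
        # None as the end date means the director is still in post
        if start <= when and (end is None or when <= end):
            edges.setdefault(company, set()).add(director)
    return {company: frozenset(members) for company, members in edges.items()}

snapshot_2011 = hyperedges_at(appointments, date(2011, 6, 1))
snapshot_2013 = hyperedges_at(appointments, date(2013, 1, 1))
print(snapshot_2011)
print(snapshot_2013)
```

Comparing snapshots taken at successive dates would then show how each company's hyperedge gains and loses members over time.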
I’ve also started wondering about ‘director pathways’ in which we define directors as nodes (where all we require is that a person was a director of a company at some time) and directed “citation” edges. These edges would go from one director to other director nodes under the condition that the “citing” director was appointed to a particular company within a particular time period t1..t2 before the appointment to the same company of a “cited” director. If one director follows another director into more than one company, we increase the weight of the edge accordingly. (We could maybe also explore modes in which edge weights represent the amount of time that two directors are in the same company together.)
The aim is… probably pointless and not that interesting. Unless it is… The sort of questions this approach would allow us to ask would be along the lines of: are there groups of directors whose directorial appointments follow similar trajectories through companies; or are there groups of directors who appear to move from one company to another along with each other?
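A minimal sketch of the director pathways idea might look like the following. It takes the description literally, drawing an edge from the earlier-appointed (“citing”) director to the later-appointed (“cited”) one whenever the gap between their appointments to the same company falls between t1 and t2; flipping the direction would be a one-line change. The records, the default window of one day to one year, and the function name `pathway_edges` are all assumptions.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical records: (director, company, date appointed)
appointments = [
    ("Ann", "Acme Ltd", date(2010, 1, 1)),
    ("Bob", "Acme Ltd", date(2010, 7, 1)),
    ("Ann", "Bolt plc", date(2012, 1, 1)),
    ("Bob", "Bolt plc", date(2012, 3, 1)),
    ("Cat", "Bolt plc", date(2015, 1, 1)),
]

def pathway_edges(appointments, t1=timedelta(days=1), t2=timedelta(days=365)):
    """Weight (earlier, later) by the number of companies in which `later`
    was appointed between t1 and t2 after `earlier`."""
    by_company = {}
    for director, company, appointed in appointments:
        by_company.setdefault(company, []).append((director, appointed))

    weights = Counter()
    for members in by_company.values():
        for earlier, d_early in members:
            for later, d_late in members:
                if earlier != later and t1 <= d_late - d_early <= t2:
                    weights[(earlier, later)] += 1
    return weights

edges = pathway_edges(appointments)
print(edges)
```

In this toy data Bob follows Ann into two companies within a year, so that edge gets weight 2, whereas Cat arrives at Bolt plc too late to be linked to either of them.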
Green Screen
We also have green screens available which can be used to provide a green background for your video or photography shots in order to facilitate digital effects.
Webcams
Webcams which can be connected to computers via USB are available both for checkout and installed in the various workstations at the Media Center. These cameras are easily configurable and useful for video conferencing.

TurningPoint Response System
The TurningPoint response system is a wireless system used to collect responses from your audience during a class or a presentation. Once connected to a computer via USB, the system’s included software can be used to display a question or poll. Audience members then respond on their remotes and the answers are recorded and can be displayed in real time. We have a total of around two hundred remotes and the system can be checked out with any number of remotes.
