<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
>
<channel>
<title>Paul Downey</title>
<atom:link href="http://blog.whatfettle.com/feed/" rel="self" type="application/rss+xml" />
<link>http://blog.whatfettle.com</link>
<description>Whatfettle, marras?</description>
<lastBuildDate>Wed, 02 Jan 2019 13:41:49 +0000</lastBuildDate>
<language>en-US</language>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<generator>https://wordpress.org/?v=6.2.6</generator>
<item>
<title>Scope creep</title>
<link>http://blog.whatfettle.com/2015/10/13/scope-creep/</link>
<dc:creator><![CDATA[Paul Downey]]></dc:creator>
<pubDate>Tue, 13 Oct 2015 08:05:53 +0000</pubDate>
<category><![CDATA[Uncategorized]]></category>
<guid isPermaLink="false">http://blog.whatfettle.com/?p=1295</guid>
<description><![CDATA[“The problem with agile is scope creep”, he said “We run a tight ship here. We’re on a budget, spending public money and can’t afford for things to slip. We need a detailed plan and a fixed contract to hold our suppliers’ feet to the fire when they deviate from it. We have to deliver […]]]></description>
<content:encoded><![CDATA[<p>“The problem with agile is scope creep”, he said</p>
<p>“We run a tight ship here. We’re on a budget, spending public money and can’t afford for things to slip. We need a detailed plan and a fixed contract to hold our suppliers’ feet to the fire when they deviate from it. We have to deliver all of the features, and on-time!”</p>
<p>I’m used to people challenging agile as if it’s unconventional. As if it’s a new, untried, untested thing. The present is here, it’s just not evenly distributed.</p>
<p>But somehow on this day I wasn’t prepared for this challenge. I was flummoxed. </p>
<p>Maybe he did run a tight ship and just hadn’t seen the horrors I’d seen: the service delivered feature complete even though most of the features weren’t needed, the service so complicated it was unusable, feature complete lest the supplier invoked penalty clauses. The system which cost too much to change, which needlessly instructed people to post their passports to an office, needing operational staff to post them back unopened. Systems procured with a fixed 10 year contract, and which were already obsolete before they were completed. </p>
<p>Maybe he just hadn’t seen agile in action, how well it works even in <a href="https://gds.blog.gov.uk/2015/10/09/how-to-be-agile-in-a-non-agile-environment/">non-agile environments</a>.</p>
<p>Then on the train back it hit me. <a href="https://en.wikipedia.org/wiki/L%27esprit_de_l%27escalier">L'esprit de chemin de fer</a>:</p>
<p>The whole point of agile is scope creep!</p>
<p>Any worthwhile thing you work on is bound to be a <a href="https://en.wikipedia.org/wiki/Wicked_problem">wicked problem</a>: building changes how you think, changes how your users think, changes the world around it.</p>
<p>Anyone can envisage something, plan it out, get someone else to make it, but that’s missing a trick; software is now so trivial to make you can <a href="http://blog.memespring.co.uk/2014/11/10/product-land-1/">make lots of things</a> and learn from them. </p>
<p>Anyone can now make a thing. But making the right thing, that’s the trick.</p>
<p>Start small, discover what your users really need, learn from what you build, and iterate as you learn. Those features? Chances are your users ain’t gonna need ’em.</p>
<p>Agile: make it up as you go along.</p>
<p>Waterfall: make it all up before you start, live with the consequences.</p>
]]></content:encoded>
</item>
<item>
<title>Note to self: write more</title>
<link>http://blog.whatfettle.com/2015/10/11/write-more/</link>
<dc:creator><![CDATA[Paul Downey]]></dc:creator>
<pubDate>Sun, 11 Oct 2015 18:26:45 +0000</pubDate>
<category><![CDATA[Uncategorized]]></category>
<guid isPermaLink="false">http://blog.whatfettle.com/?p=1268</guid>
<description><![CDATA[This is me kicking myself up the backside. You need to write more and here's why. You haven't written anything here for almost a year. You had a go at kick-starting this blog but gave yourself a ridiculously hard task which became hard to juggle when the day-job got super-interesting. You also lost heart because […]]]></description>
<content:encoded><![CDATA[<p><a href="https://www.flickr.com/photos/psd/22079841012/" title="MOAR BLOGGING!"><img decoding="async" loading="lazy" src="https://farm6.staticflickr.com/5826/22079841012_59444e8d6a_h.jpg" width="1440" height="1600" alt="MOAR BLOGGING!"></a></p>
<p>This is me kicking myself up the backside.</p>
<p>You need to write more and here's why.</p>
<p>You haven't written anything here for almost a year. You had a go at <a href="http://blog.whatfettle.com/2014/10/13/one-csv-thirty-stories/">kick-starting</a> this blog but gave yourself a ridiculously hard task which became hard to juggle when the day-job got <a href="https://gds.blog.gov.uk/2015/09/01/registers-authoritative-lists-you-can-trust/">super-interesting</a>.</p>
<p>You also lost heart because a couple of people had a little wobble about some of the stuff you said. That's a shame because you had far more positive than negative comments and you thrive on good feedback (don't we all).</p>
<p><a href="https://www.flickr.com/photos/psd/413601734/" title="Intelligence (v) Drive"><img decoding="async" loading="lazy" src="https://farm1.staticflickr.com/173/413601734_b98b682e06.jpg" width="406" height="375" alt="Intelligence (v) Drive"></a></p>
<p>You are incredibly lazy but are not being as smart as you like to think. You like meeting people and are happy to explain an idea over and over but then suddenly get bored and need to hide. Rather than turn yourself into one of the "<a href="https://en.wikipedia.org/wiki/Fahrenheit_451_(film)">Book People</a>" you need to write stuff down.</p>
<p><a href="https://www.flickr.com/photos/psd/13992905267" title="Characters for an epic team"><img decoding="async" loading="lazy" src="https://farm3.staticflickr.com/2937/13992905267_06633bb20c_h.jpg" width="1600" height="1129" alt="Characters for an epic team"></a></p>
<p>You work in government and that can make things hard. You were a little spooked by <a href="https://en.wikipedia.org/wiki/Purdah_(pre-election_period)">purdah</a> and other <a href="http://www.telegraph.co.uk/news/politics/conservative/11479858/Civil-servants-to-be-sacked-if-they-talk-to-journalists-without-government-approval.html">noises-off</a>. You know the hardest, most frustrating job in government is communications, but you are surrounded by people who not only can write, but do write all day professionally. Don't waste this brilliant opportunity to learn from them; take all the help you can get.</p>
<p><a href="https://twitter.com/davegray/status/558337267815948289" title="CULT (v) CULTURE from Dave Gray Gamestorming"><img decoding="async" src="http://40.media.tumblr.com/c80e076ee7c9ec8c8a06df8dcbbd2cef/tumblr_nkg2xnuxfx1qz4bcto1_500.png"></a></p>
<p>Your friend and mentor <a href="http://confusedofcalcutta.com/">JP</a> turned you onto the <a href="http://www.cluetrain.com/">Cluetrain Manifesto</a> and you swallowed it whole. You know it's telling human stories using a human voice that makes the difference. Make sure you get your voice out there. Write!</p>
<p><a data-flickr-embed="true" href="https://www.flickr.com/photos/psd/8451589322/in/photolist-dSQB4h" title="The Unit of Delivery is The Team"><img decoding="async" loading="lazy" src="https://farm9.staticflickr.com/8374/8451589322_e9f612cf5b.jpg" width="500" height="329" alt="The Unit of Delivery is The Team"></a></p>
<p>There are times at work when it helps for everybody to sing from the same song-sheet to avoid cacophony, but you should still find ways of giving others the space to tell the story you want to get out there using their own voice, and relax when it isn't exactly what you'd have said. You don't have to do it all yourself but you should be leading by example. So write!</p>
<p><a href="https://www.flickr.com/photos/psd/21470746114/" title="make-things open it makes them better"><img decoding="async" loading="lazy" src="https://farm1.staticflickr.com/700/21470746114_1fdedb1435_h.jpg" width="1600" height="1136" alt="10-make-things-open-it-makes-them-better"></a></p>
<p>The work you do depends on discourse. Writing helps you engage with other people. <a href="https://en.wikipedia.org/wiki/Kite-flying_(politics)">Fly kites</a>. Learn from <a href="https://gds.blog.gov.uk/2015/08/26/gov-wheres-my-stuff/">others</a> and get stuff out there sooner. Ask better questions sooner rather than holding off until you have all the answers.</p>
<p><iframe src="https://www.youtube.com/embed/nMqxNPsfN50" frameborder="0" allowfullscreen=""></iframe></p>
<p>Being cloaked can sometimes be cool, but it's not safe. You work in government and are spending public money, so have a duty to be more transparent. You have to be cautious about what you say and write but not saying anything at all isn't an option. Saying nothing is a great way to make people worry.</p>
<p>You might think you are safely locked away in your shed making Chitty-Chitty Bang-Bang, but from the outside it's easy to imagine you're cooking up The Pie Machine from Chicken Run.</p>
<p>You know it's much simpler if you're open as you go along, so do that rather than bottling stuff up and then giving yourself the hard work of needing a big reveal.</p>
<p><iframe frameborder="0" src="https://www.bbc.co.uk/programmes/p034nn2q/player"></iframe></p>
<p>You read a lot and like the idea of writing. You know the best way to get better at writing is to write, so write!</p>
<p>Above all take it easy. Don't set yourself impossible goals and then kill yourself trying. Don't beat yourself up when it fails, just start writing again and let's see how it goes.</p>
]]></content:encoded>
</item>
<item>
<title>One CSV, thirty stories: 21. Mistakes were made</title>
<link>http://blog.whatfettle.com/2014/12/02/mistakes-were-made/</link>
<dc:creator><![CDATA[Paul Downey]]></dc:creator>
<pubDate>Tue, 02 Dec 2014 06:55:04 +0000</pubDate>
<category><![CDATA[Uncategorized]]></category>
<guid isPermaLink="false">http://blog.whatfettle.com/?p=1243</guid>
<description><![CDATA[This is day 21 of "One CSV, 30 stories":http://blog.whatfettle.com/2014/10/13/one-csv-thirty-stories/ a series of articles exploring "price paid data":https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads from the Land Registry found on GOV.UK. The code for this and the other articles is available as open source from "GitHub":https://github.com/psd/price-paid-data I now know quite a few people who work for the Land Registry, regularly hang out […]]]></description>
<content:encoded><![CDATA[<p><em>This is day 21 of "One CSV, 30 stories":http://blog.whatfettle.com/2014/10/13/one-csv-thirty-stories/ a series of articles exploring "price paid data":https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads from the Land Registry found on GOV.UK. The code for this and the other articles is available as open source from "GitHub":https://github.com/psd/price-paid-data</em></p>
<p>I now know quite a few people who work for the Land Registry, regularly hang out with some of the digital team, follow "@LandRegGov":https://twitter.com/LandRegGov on Twitter, and here I am writing these thirty love-notes to their open data. I also have something of an ego. So it was a little disappointing to have to hear third-hand about a recent Land Registry blog post:</p>
<p>bq. "Price Paid Data – Improving data quality":http://blog.landregistry.gov.uk/price-paid-data-improving-data-quality/ </p>
<p>That's a fairly jaunty title I thought, and entirely relevant to my interests!</p>
<p>bq. While we strive to release data of the highest quality we know sometimes that we could do better. This month we are improving our Price Paid Dataset by removing historic transactions that were added in error.</p>
<p>My heart sank.</p>
<p>bq. Recently a customer reported some duplicate entries in our 2003 and 2004 Price Paid Dataset. After investigation we found there had been an internal error with a process used to cancel applications. Price paid entries were not removed when they should have been. That process changed early in 2005. We’ve now corrected the data and will be removing around 48,000 transactions from a dataset that contains over 19 million. There were approximately 18,000 duplicates in 2003, 30,000 in 2004 and less than 100 from 2005.</p>
<p>Less than 0.25% of records over twenty years doesn't sound too bad, but if 30,000 of the 1,261,448 transactions in 2004 are duplicates, that implies a 2.4% increase in volume and means roughly 1 in every 40 transactions is one which wouldn't be present in other years. That could be quite bad news for these posts.</p>
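<p>A quick way to sanity-check that arithmetic is to redo it in a few lines of Python. This is only an illustrative sketch: the figures are the ones quoted in this post, the 19,325,571 total is the row count reported for pp-complete.csv further down, and the constant and function names are made up for the example.</p>

```python
# Figures quoted in the post; the names here are illustrative only.
REMOVED_TOTAL = 48_000            # "around 48,000 transactions"
COMPLETE_FILE_ROWS = 19_325_571   # row count reported for pp-complete.csv
DUPLICATES_2004 = 30_000          # "30,000 in 2004"
TRANSACTIONS_2004 = 1_261_448     # 2004 volume quoted in the post


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of the whole dataset that is being removed.
overall_pct = percent(REMOVED_TOTAL, COMPLETE_FILE_ROWS)
# Duplicates as a share of the published 2004 rows.
share_2004_pct = percent(DUPLICATES_2004, TRANSACTIONS_2004)
# Duplicates as an increase over the de-duplicated 2004 volume.
increase_2004_pct = percent(DUPLICATES_2004, TRANSACTIONS_2004 - DUPLICATES_2004)
# "1 in every N" form of the same ratio.
one_in_n = TRANSACTIONS_2004 / DUPLICATES_2004

print(f"removed overall: {overall_pct:.3f}%")
print(f"2004 duplicates as a share of published rows: {share_2004_pct:.2f}%")
print(f"2004 increase over the de-duplicated volume: {increase_2004_pct:.2f}%")
print(f"roughly 1 in {one_in_n:.0f} transactions in 2004")
```

<p>Measured either against the published 2004 volume or against the volume with the duplicates removed, the figure rounds to about 2.4%, and the ratio comes out nearer 1 in 42 than 1 in 40, which is close enough for the point being made.</p>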
<p>bq. The invalid entries will be removed from each version of the yearly files that we publish through GOV.UK and from the single complete file of all Price Paid transactions. The change will also be applied to the open data used by Price Paid Report Builder in the same month.</p>
<p>OK, so that's cool.</p>
<p>bq. We will be publishing a file on GOV.UK that contains details of all the invalid entries on 28 November 2014. The file will be in the same form as the monthly update, which can be used to update data stores. Each record in the update files will have a record status set to ‘D’.</p>
<p>Well the 28th was last Friday, so I created a fresh clone of "the repository":https://github.com/psd/price-paid-data and ran "make" to download the latest version of the price-file and rebuild the data. That took a while.</p>
<p>Looking at the status, there are no 'D' records in the new complete file:</p>
<p>bc. cat old/data/status.tsv new/data/status.tsv<br />
19325571 A<br />
19455964 A</p>
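<p>A tally like this can also be sketched in Python. This is not how status.tsv was actually built; it assumes, as the rows quoted in this post suggest, that the record status is the last column of each CSV row. The sample rows and the count_statuses name are invented for illustration.</p>

```python
import csv
import io
from collections import Counter


def count_statuses(csv_text: str) -> Counter:
    """Count the values found in the last column of each CSV row."""
    counts = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        if row:  # skip blank lines
            counts[row[-1]] += 1
    return counts


# Hypothetical rows, cut down to five columns for readability.
SAMPLE = (
    '"{AAA}","100000","2003-05-01 00:00","AB1 2CD","A"\n'
    '"{BBB}","120000","2004-06-02 00:00","EF3 4GH","A"\n'
    '"{CCC}","90000","2003-07-03 00:00","IJ5 6KL","D"\n'
)

# Print one "count status" line per status, like the listing above.
for status, total in sorted(count_statuses(SAMPLE).items()):
    print(total, status)
```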
<p>It did occur to me there might be duplicate records within the CSV file, so checked, firstly by looking for duplicate transaction identifiers:</p>
<p>bc. $ awk -F, '!_[$1]++' < pp-complete.csv > pp-deduped.csv<br />
$ cmp pp-complete.csv pp-deduped.csv<br />
[no difference]</p>
<p>and then again by uniquely sorting the entire file:</p>
<p>bc. $ cat pp-complete.csv | iconv -f ISO-8859-1 -t UTF-8 | sort -u > pp-complete-uniq.csv<br />
$ wc -l pp-complete.csv pp-complete-uniq.csv<br />
19325571 pp-complete.csv<br />
19325571 pp-complete-uniq.csv</p>
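<p>For readers who prefer Python to awk and sort, the two checks can be sketched as follows. The sketch assumes the transaction identifier is the first column, as in the rows quoted in this post; the function names and sample rows are hypothetical. Holding every identifier of a 19-million-row file in a Counter needs a fair amount of memory, so treat this as an illustration of the idea, not a replacement for the one-liners.</p>

```python
import csv
import io
from collections import Counter


def duplicate_ids(csv_text: str) -> list:
    """Return transaction identifiers (first column) that occur more than once."""
    counts = Counter(row[0] for row in csv.reader(io.StringIO(csv_text)) if row)
    return sorted(tid for tid, seen in counts.items() if seen > 1)


def duplicate_lines(csv_text: str) -> list:
    """Return whole lines that occur more than once, the check sort -u performs."""
    counts = Counter(line for line in csv_text.splitlines() if line.strip())
    return sorted(line for line, seen in counts.items() if seen > 1)


# Hypothetical rows, cut down to five columns for readability.
# {AAA} is repeated verbatim; {BBB} is repeated with a different price.
SAMPLE = (
    '"{AAA}","100000","2003-05-01 00:00","AB1 2CD","A"\n'
    '"{BBB}","120000","2004-06-02 00:00","EF3 4GH","A"\n'
    '"{AAA}","100000","2003-05-01 00:00","AB1 2CD","A"\n'
    '"{BBB}","125000","2004-06-02 00:00","EF3 4GH","A"\n'
)

print(duplicate_ids(SAMPLE))
print(duplicate_lines(SAMPLE))
```

<p>The two functions answer different questions: a repeated identifier with a changed price shows up in the first check but not the second, which is why it is worth running both.</p>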
<p>I then wanted to compare the old CSV with the new version. Ordinarily I'd use "opendiff":http://blog.whatfettle.com/2005/12/05/i-opendiff/ to visually compare two versions of text, but these files are way too big for that. Also the records could be in a different order, so I sorted them by date and then ran them through the good old Unix "diff":http://en.wikipedia.org/wiki/Diff_utility:</p>
<p>bc. $ cat old/data/pp-complete-old.csv | iconv -f ISO-8859-1 -t UTF-8 | sort -t, -k3 > old/data/pp-complete-sorted.tsv<br />
$ cat new/data/pp-complete.csv | iconv -f ISO-8859-1 -t UTF-8 | sort -t, -k3 > new/data/pp-complete-sorted.tsv<br />
$ diff old/data/pp-complete-sorted.tsv new/data/pp-complete-sorted.tsv > diffs.txt</p>
<p>On my quite constrained, otherwise busy laptop, running on battery whilst I was "sat on a delayed train":https://twitter.com/psd/status/539342955065405441, that took quite a while, but not long enough to make me feel a need to provision and spin up a Hadoop cluster.</p>
<p>bc. real 8m9.073s<br />
user 0m44.637s<br />
sys 3m57.099s</p>
<p>I put the output into a "gist":https://gist.github.com/psd/81f49b1429318fcdb2c2. We can then get a feel for how much has changed using "diffstat":http://invisible-island.net/diffstat/:</p>
<p>bc. $ diffstat diffs.txt<br />
unknown |243657 +++++++++++++++++++++++++++++++++++++++++++++++++---------------<br />
1 file changed, 187025 insertions(+), 56632 deletions(-)</p>
<p>Though that doesn't quite tell the story. So I split the CSV into a file for each year:</p>
<p>bc. $ awk -F, '{ file="years/" substr($3,2,4) ".csv"; print > file}' pp-complete-sorted.tsv</p>
<p>then looked at the difference for each year:</p>
<p>bc. $ for i in *; do diff $i ../../../old/data/years/$i > $i.txt ; done<br />
$ ls *txt | xargs -L 1 -t diffstat</p>
<p>bc. 1995 | 140 +++++++++++++++++++++++++++++++++++++---------------------------<br />
1 file changed, 81 insertions(+), 59 deletions(-)<br />
1996 | 202 ++++++++++++++++++++++++++++++++++++++--------------------------<br />
1 file changed, 120 insertions(+), 82 deletions(-)<br />
1997 | 194 ++++++++++++++++++++++++++++++++++++++--------------------------<br />
1 file changed, 116 insertions(+), 78 deletions(-)<br />
1998 | 205 ++++++++++++++++++++++++++++++++++++++--------------------------<br />
1 file changed, 122 insertions(+), 83 deletions(-)<br />
1999 | 244 ++++++++++++++++++++++++++++++++++++++--------------------------<br />
1 file changed, 147 insertions(+), 97 deletions(-)<br />
2000 | 282 +++++++++++++++++++++++++++++++++++-----------------------------<br />
1 file changed, 156 insertions(+), 126 deletions(-)<br />
2001 | 366 +++++++++++++++++++++++++++++++++++-----------------------------<br />
1 file changed, 203 insertions(+), 163 deletions(-)<br />
2002 | 365 ++++++++++++++++++++++++++++++++++------------------------------<br />
1 file changed, 194 insertions(+), 171 deletions(-)<br />
2003 |19166 ----------------------------------------------------------------<br />
1 file changed, 199 insertions(+), 18967 deletions(-)<br />
2004 |30190 ----------------------------------------------------------------<br />
1 file changed, 209 insertions(+), 29981 deletions(-)<br />
2005 | 396 +++++++++++++++++++++++++++-------------------------------------<br />
1 file changed, 169 insertions(+), 227 deletions(-)<br />
2006 | 415 +++++++++++++++++++++++++++++++++-------------------------------<br />
1 file changed, 220 insertions(+), 195 deletions(-)<br />
2007 | 494 ++++++++++++++++++++++++++++++++++------------------------------<br />
1 file changed, 265 insertions(+), 229 deletions(-)<br />
2008 | 229 ++++++++++++++++++++++++++++++++--------------------------------<br />
1 file changed, 118 insertions(+), 111 deletions(-)<br />
2009 | 233 +++++++++++++++++++++++++++++++++++-----------------------------<br />
1 file changed, 129 insertions(+), 104 deletions(-)<br />
2010 | 278 +++++++++++++++++++++++++++++++++-------------------------------<br />
1 file changed, 147 insertions(+), 131 deletions(-)<br />
2011 | 244 ++++++++++++++++++++++++++++++++++------------------------------<br />
1 file changed, 131 insertions(+), 113 deletions(-)<br />
2012 | 370 +++++++++++++++++++++++++++++++++++-----------------------------<br />
1 file changed, 208 insertions(+), 162 deletions(-)<br />
2013 | 1434 +++++++++++++++++++++++++++++++++++++++++++++-------------------<br />
1 file changed, 1014 insertions(+), 420 deletions(-)<br />
2014 |188210 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--<br />
1 file changed, 183077 insertions(+), 5133 deletions(-)</p>
<p>That's a lot of changes, and matches the number of changes for the years outlined in the announcement, but there are quite a few other changes as well.</p>
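<p>The per-year comparison can also be sketched in Python without sorting first. This is a swapped-in technique and not the awk, diff and diffstat pipeline used above: it treats each file as a set of lines, so row order is ignored, and it assumes the third column holds the transaction date with the year in its first four characters. The function names and sample rows are hypothetical.</p>

```python
import csv
import io
from collections import Counter


def year_of(line: str) -> str:
    """Return the year taken from the third column of one CSV line."""
    row = next(csv.reader(io.StringIO(line)))
    return row[2][:4]


def per_year_changes(old_csv: str, new_csv: str) -> dict:
    """Map year -> (lines only in old, lines only in new), ignoring row order."""
    old_lines = {line for line in old_csv.splitlines() if line.strip()}
    new_lines = {line for line in new_csv.splitlines() if line.strip()}
    removed = Counter(year_of(line) for line in old_lines - new_lines)
    added = Counter(year_of(line) for line in new_lines - old_lines)
    years = sorted(set(removed) | set(added))
    return {year: (removed[year], added[year]) for year in years}


# Hypothetical rows, cut down to five columns for readability.
OLD = (
    '"{AAA}","100000","2003-05-01 00:00","AB1 2CD","S"\n'
    '"{BBB}","120000","2004-06-02 00:00","EF3 4GH","S"\n'
    '"{CCC}","65000","1995-01-20 00:00","IJ5 6KL","S"\n'
)
NEW = (
    '"{CCC}","65000","1995-01-20 00:00","IJ5 6KL","D"\n'
    '"{BBB}","120000","2004-06-02 00:00","EF3 4GH","S"\n'
)

for year, (only_old, only_new) in per_year_changes(OLD, NEW).items():
    print(year, "only in old:", only_old, "only in new:", only_new)
```

<p>As with diff, a row whose contents changed counts once on each side, while a row that was simply dropped only appears in the old column.</p>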
<p>Summaries are useful, but there's no replacement for digging into the detail. Mostly the differences seem to be changes to addresses, such as:</p>
<p>bc. 9715c9715<br />
< ...,"533","FLAT ABOVE SHOP","BATTERSEA PARK ROAD","LONDON","LONDON","WANDSWORTH","GREATER LONDON","A"^M
---
> ... ,"533A","","BATTERSEA PARK ROAD","","LONDON","WANDSWORTH","GREATER LONDON","A"^M</p>
<p>and:</p>
<p>bc. 8447592c8447867<br />
< ... ,"ROSEWAITE, 1","","PANTON ROAD","BENNIWORTH","MARKET RASEN","EAST LINDSEY","LINCOLNSHIRE","A"^M
---
> ... ,"ROSEWAITE","","","BENNIWORTH","MARKET RASEN","EAST LINDSEY","LINCOLNSHIRE","A"^M</p>
<p>or even:</p>
<p>bc. 8605627c8605907<br />
< ... ,"GLANLLYN","FLAT 2","WATER STREET","BARMOUTH","BARMOUTH","GWYNEDD","GWYNEDD","A"^M
---
> ... ,"BRON LLYN","","WATER STREET","","BARMOUTH","GWYNEDD","GWYNEDD","A"^M</p>
<p>As mentioned on "Day 13":http://blog.whatfettle.com/2014/11/06/one-csv-thirty-stories-13-postcodes/ addresses are a fluffy, movable feast, but I'm surprised to see changes like this to records from 1995. This has further reduced my confidence in the use of informal lines of text as a means of identifying a location. I'll dig deeper into addresses in a subsequent post, but my position remains: addresses are just an attribute, not a key. What we need in the price-paid data file is a stable identifier for each property transacted upon, and a stable identifier for a street address, with links between the two identities.</p>
<p>Then there are records where other fields such as the property-type has changed, as in this example where a Semi-detached house sold in January 1995 is now recorded as being Detached:</p>
<p>bc. 25601c25601<br />
< "{1FB96B78-6395-4C6F-9A7C-F1D8ABC78EB6}","65000","1995-01-20 00:00","NR14 7SX","S","N","F","22","","CAWSTONS MEADOW","PORINGLAND","NORWICH","SOUTH NORFOLK","NORFOLK","A"^M
---
> "{1FB96B78-6395-4C6F-9A7C-F1D8ABC78EB6}","65000","1995-01-20 00:00","NR14 7SX","D","N","F","22","","CAWSTONS MEADOW","PORINGLAND","NORWICH","SOUTH NORFOLK","NORFOLK","A"^M</p>
<p>Hmmm. I wondered if a more recent transaction might have impacted this early record:</p>
<p>bc. $ grep "NR14 7SX" old/data/pp-complete-sorted.tsv | grep '"22"'<br />
"{1FB96B78-6395-4C6F-9A7C-F1D8ABC78EB6}","65000","1995-01-20 00:00","NR14 7SX","S","N","F","22","","CAWSTONS MEADOW","PORINGLAND","NORWICH","SOUTH NORFOLK","NORFOLK","A"<br />
"{B78457EE-8921-4211-837F-27A3EE2F7895}","177500","2007-09-17 00:00","NR14 7SX","S","N","F","22","","CAWSTONS MEADOW","PORINGLAND","NORWICH","SOUTH NORFOLK","NORFOLK","A"</p>
<p>bc. $ grep "NR14 7SX" new/data/pp-complete-sorted.tsv | grep '"22"'<br />
"{1FB96B78-6395-4C6F-9A7C-F1D8ABC78EB6}","65000","1995-01-20 00:00","NR14 7SX","D","N","F","22","","CAWSTONS MEADOW","PORINGLAND","NORWICH","SOUTH NORFOLK","NORFOLK","A"<br />
"{B78457EE-8921-4211-837F-27A3EE2F7895}","177500","2007-09-17 00:00","NR14 7SX","D","N","F","22","","CAWSTONS MEADOW","PORINGLAND","NORWICH","SOUTH NORFOLK","NORFOLK","A"</p>
<p>It seems not. That's even more worrying. This kind of detail shouldn't just change arbitrarily in what should after all be an immutable, tamper-proof register.</p>
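<p>A comparison like the grep above can be automated by keying both versions on the transaction identifier and listing the columns that differ. The following is a minimal sketch, assuming the identifier is the first column; the changed_fields name is hypothetical, and the rows are the four quoted above.</p>

```python
import csv
import io


def changed_fields(old_csv: str, new_csv: str) -> dict:
    """Map each transaction id found in both inputs to its changed columns.

    Each change is a (column index, old value, new value) tuple.
    """
    old_rows = {row[0]: row for row in csv.reader(io.StringIO(old_csv)) if row}
    changes = {}
    for row in csv.reader(io.StringIO(new_csv)):
        if not row:
            continue
        before = old_rows.get(row[0])
        if before is not None and before != row:
            changes[row[0]] = [
                (index, was, now)
                for index, (was, now) in enumerate(zip(before, row))
                if was != now
            ]
    return changes


# The two records for 22 Cawstons Meadow, as quoted from the old and new files.
OLD = (
    '"{1FB96B78-6395-4C6F-9A7C-F1D8ABC78EB6}","65000","1995-01-20 00:00","NR14 7SX","S","N","F","22","","CAWSTONS MEADOW","PORINGLAND","NORWICH","SOUTH NORFOLK","NORFOLK","A"\n'
    '"{B78457EE-8921-4211-837F-27A3EE2F7895}","177500","2007-09-17 00:00","NR14 7SX","S","N","F","22","","CAWSTONS MEADOW","PORINGLAND","NORWICH","SOUTH NORFOLK","NORFOLK","A"\n'
)
NEW = (
    '"{1FB96B78-6395-4C6F-9A7C-F1D8ABC78EB6}","65000","1995-01-20 00:00","NR14 7SX","D","N","F","22","","CAWSTONS MEADOW","PORINGLAND","NORWICH","SOUTH NORFOLK","NORFOLK","A"\n'
    '"{B78457EE-8921-4211-837F-27A3EE2F7895}","177500","2007-09-17 00:00","NR14 7SX","D","N","F","22","","CAWSTONS MEADOW","PORINGLAND","NORWICH","SOUTH NORFOLK","NORFOLK","A"\n'
)

for transaction_id, differences in changed_fields(OLD, NEW).items():
    print(transaction_id, differences)
```

<p>For these rows the only difference reported is the fifth column (index 4) moving from S to D on both transactions.</p>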
<p>bq. Despite the number of transactions affected, we can confirm that there is no impact on the House Price Index figure published each month. However sales volume figures will change.</p>
<p>That does indeed sound hopeful, and is borne out by my difficulty in spotting differences between the regenerated visualisations and those in previous posts. I was particularly worried about losing some of the more interesting anomalies in the heatmap we made into a poster and pinned up on the Land Registry canteen wall on "Day 14":http://blog.whatfettle.com/2014/11/07/one-csv-thirty-stories-14-hackday/:</p>
<p><a href="https://www.flickr.com/photos/psd/15303511564" title="New data by Paul Downey, on Flickr"><img decoding="async" loading="lazy" src="https://farm8.staticflickr.com/7470/15303511564_4bed91160f_o.png" width="635" height="249" alt="New data"></a></p>
<p>Not much of a visible difference between the two plots. Phew!</p>
<p>So after being initially irritated by the title of this post and the "non-apology apology":http://en.wikipedia.org/wiki/Non-apology_apology tone, and disappointed about how I came to hear about the issue, I think this is actually a very cool story for open data, and I have to applaud the Land Registry for their openness.</p>
<p>People working in Government are often expected to be infallible and get a lot of criticism when they're not. Often that's because there's no alternative but to use their information or service, which is one of the reasons being called a "customer" when you're "the user":http://russelldavies.typepad.com/planning/2014/06/consumers-users-people-mammals.html of a public service without any alternative can feel quite insulting.</p>
<p>As the canonical data provider, such pressures can make it very easy to let perfection become the enemy of good, and limit the amount of data released. Although I'm pushing the Land Registry to do better in these posts, I remain best pleased the Land Registry releases this data openly, and in convenient form. Mistakes will happen, but admitting fault like this quickly and openly is great, and exactly the right thing to do.</p>
<p>This issue is a great example of how publishing open data can really help improve its quality. I suspect the "customer feedback" must have been based on knowledge of particular transactions. </p>
<p>"Richard":http://www.memespring.co.uk/ has suggested ways of "programatically testing regulatory data":http://blog.memespring.co.uk/2014/07/16/programatically-testing-regulatory-data/ and enabling greater scrutiny is exactly why we should open up more data. Being open allows anyone to independently cross-check the validity and veracity of public data, building a better relationship with users and encouraging improvement in the quality of our public data at source. Establishing such feedback loops is why one of my favourite Government Digital Service design principles is to "make things open, it makes them better":https://www.gov.uk/design-principles#tenth</p>
<p><img decoding="async" src="https://raw.githubusercontent.com/psd/design-principles-doodles/master/postcards/10-make-things-open-it-makes-them-better.png"></p>
<p>One thing publishers of data should do is to maintain files such as this under "revision control":http://en.wikipedia.org/wiki/Revision_control and publish differences, so as consumers we can track changes and assess their impact.</p>
<p>bq. If you have any queries or concerns over this correction please contact us at commercial.services@landregistry.gsi.gov.uk. We welcome your feedback.</p>
<p>I'm a little put off by mention of "commercial services", but I have sent mail to this address with a link to this post.</p>
<p>This blip did take the wind out of my sails a little, but I'll aim to pick up with another post in the series "tomorrow".</p>
<p><i>"Owen Boswarva":http://www.owenboswarva.com/ has shared similar thoughts on this issue in his post "how far can we trust open data":http://mapgubbins.tumblr.com/post/103854046790/how-far-can-we-trust-open-data</i></p>
<p><i>Update: I had a very nice response to my mail from "Lorna Jordan":http://blog.landregistry.gov.uk/author/lorna/ who I know has been busy supporting users and working on releasing more data. Lorna confirmed the 'D' status appears in the monthly files, and that those records are then removed when compiling the complete file. She explained that the Land Registry continue to receive cases as old as 1995 even now which are being registered voluntarily for the first time, which results in additions. She also explained the process for publishing price-paid data, which can introduce corrections: "The price paid data does not come from the register, but is captured separately at the beginning of a case, therefore the register is not incorrect for property type etc when these changes occur. However, if there are changes to the price or date of a transaction we will always check against the register and amend if needed."</i></p>
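<p>The way a monthly file might be folded into a local copy of the data can be sketched as below. This is an assumption-laden illustration, not the Land Registry's process: it assumes the transaction identifier is the first column and the record status the last, it handles only the 'A' and 'D' statuses mentioned in this post, and the apply_monthly_update name and sample rows are invented.</p>

```python
import csv
import io


def apply_monthly_update(store: dict, update_csv: str) -> dict:
    """Return a copy of store with 'A' rows added and 'D' rows removed.

    store maps transaction id -> row; any other status is ignored in this sketch.
    """
    result = dict(store)
    for row in csv.reader(io.StringIO(update_csv)):
        if not row:
            continue
        transaction_id, status = row[0], row[-1]
        if status == "D":
            result.pop(transaction_id, None)  # tolerate ids we never held
        elif status == "A":
            result[transaction_id] = row
    return result


# Hypothetical data store and update file, cut down to five columns.
STORE = {
    "{AAA}": ["{AAA}", "100000", "2003-05-01 00:00", "AB1 2CD", "A"],
    "{BBB}": ["{BBB}", "120000", "2004-06-02 00:00", "EF3 4GH", "A"],
}
UPDATE = (
    '"{AAA}","100000","2003-05-01 00:00","AB1 2CD","D"\n'
    '"{CCC}","90000","2014-10-03 00:00","IJ5 6KL","A"\n'
)

updated = apply_monthly_update(STORE, UPDATE)
print(sorted(updated))
```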
]]></content:encoded>
</item>
<item>
<title>One CSV, thirty stories: 20. Unknown prices</title>
<link>http://blog.whatfettle.com/2014/11/27/one-csv-thirty-stories-20-unknown-prices/</link>
<dc:creator><![CDATA[Paul Downey]]></dc:creator>
<pubDate>Thu, 27 Nov 2014 09:42:06 +0000</pubDate>
<category><![CDATA[Uncategorized]]></category>
<guid isPermaLink="false">http://blog.whatfettle.com/?p=1224</guid>
<description><![CDATA[This is day 20 of "One CSV, 30 stories":http://blog.whatfettle.com/2014/10/13/one-csv-thirty-stories/ a series of articles exploring "price paid data":https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads from the Land Registry found on GOV.UK. The code for this and the other articles is available as open source from "GitHub":https://github.com/psd/price-paid-data "Yesterday":http://blog.whatfettle.com/2014/11/25/one-csv-thirty-stories-19-bubblepleth/ was hard work to no avail, so today for a quick-win I've fallen back on […]]]></description>
<content:encoded><![CDATA[<p><em>This is day 20 of "One CSV, 30 stories":http://blog.whatfettle.com/2014/10/13/one-csv-thirty-stories/ a series of articles exploring "price paid data":https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads from the Land Registry found on GOV.UK. The code for this and the other articles is available as open source from "GitHub":https://github.com/psd/price-paid-data</em></p>
<p>"Yesterday":http://blog.whatfettle.com/2014/11/25/one-csv-thirty-stories-19-bubblepleth/ was hard work to no avail, so today for a quick win I've fallen back on something reasonably obvious, "using d3 to plot":http://psd.github.io/price-paid-data/html/unknown-prices.html the average price-per-postcode to form a well-known album cover:</p>
<p><a href="https://www.flickr.com/photos/psd/15884902641" title="Unknown Prices by Paul Downey, on Flickr"><img decoding="async" loading="lazy" src="https://farm8.staticflickr.com/7470/15884902641_21b1fdc32c_c.jpg" width="800" height="575" alt="Unknown Prices"></a></p>
<p>Er, why trot out that old trope, you might ask. Well I was 16 when Joy Division released "Unknown Pleasures":http://en.wikipedia.org/wiki/Unknown_Pleasures and it spoke to me, not least because of Peter Saville's iconic album cover which had no text, but what appeared to be an intriguing, futuristic, computer rendered landscape. I particularly remember trying to recreate the image using 6502 machine code on my friend's dad's "Rockwell AIM 65":http://en.wikipedia.org/wiki/AIM-65 the only computer I'd seen at that point with any kind of output beyond a seven-segment display. The results on ephemeral, thermal printer paper are sadly long-gone. </p>
<p>I also remember being thrilled to see the same image appear later the same year, as the lander descended in "Alien":http://en.wikipedia.org/wiki/Alien_%28film%29:</p>
<p><a href="http://psd.tumblr.com/post/103707198587/alien-1979-snagged-by-eric-carl"><img decoding="async" src="http://31.media.tumblr.com/755e1569670eaf5ca639d748dd30adbb/tumblr_nfos4nfkBT1qz4bcto1_500.jpg"></a></p>
<p>Growing up in the north-east of England, we were very aware of "Ridley Scott's":http://en.wikipedia.org/wiki/Ridley_Scott local heritage (he's a "monkey hanger":http://en.wikipedia.org/wiki/Monkey_hanger), and since then I've felt a similar, strange bond with other people who have found ways of paying homage to this image. They include a number of people I know, such as "Dan Catt":http://revdancatt.com/ whose "Project CAT820":http://revdancatt.com/2013/03/06/project-cat820-joy-divisualization uses Processing.js, a great little library which I've experimented with myself, and "written about here, before":http://blog.whatfettle.com/2008/05/11/tiddlyprocessing/. I have also lost far too much time exploring "Elevation Lines":http://mroctopus.net/geo/elevation-lines.html but it was "Brian Suda":http://suda.co.uk/'s original image of Iceland which really captured my imagination:</p>
<p><a href="https://www.flickr.com/photos/suda/5384299394" title="Unknown Pleasures: Iceland by Brian Suda, on Flickr"><img decoding="async" loading="lazy" src="https://farm6.staticflickr.com/5217/5384299394_d6c52bb84f_z.jpg" width="640" height="452" alt="Unknown Pleasures: Iceland"></a></p>
<p>The research tools available to today's 16-year-old are very different to those in 1979. Nevertheless, I'm still somewhat embarrassed to admit I had little idea where this visualisation originated until Jeremy wrote about the data behind the design of "FACT 10":https://adactio.com/journal/1489 and the way "Jocelyn Bell Burnell's":https://adactio.com/journal/6531 ground-breaking pulsar discovery was explained using the original visualisation published in the Cambridge Encyclopaedia of Astronomy. I've written before about how "I'm not bothered about attribution":http://blog.whatfettle.com/2008/10/24/on-the-vanity-of-demanding-attribution/ and ultimately I'm very down with remix culture, of which this is a great, early digitalish example, but I'm less sanguine about how the work of great women like Jocelyn goes mundanely unacknowledged. </p>
<p>Anyway, we are, hopefully, moving towards a different age. Here's Peter on the story behind the cover, and the impact of his design:</p>
<p><iframe loading="lazy" width="640" height="360" src="//www.youtube.com/embed/BxyDT11RD04" frameborder="0" allowfullscreen></iframe></p>
<p>So it might be a lazy trope, and it's definitely very late for me to jump on the bandwagon, but I still find it an aesthetically pleasing image, and it speaks directly to my inner teenager. "Tomorrow":http://blog.whatfettle.com/2014/12/02/mistakes-were-made/ I'll start to dig into addresses.</p>
]]></content:encoded>
</item>
<item>
<title>One CSV, thirty stories: 19. Bubblepleth</title>
<link>http://blog.whatfettle.com/2014/11/25/one-csv-thirty-stories-19-bubblepleth/</link>
<dc:creator><![CDATA[Paul Downey]]></dc:creator>
<pubDate>Tue, 25 Nov 2014 13:43:32 +0000</pubDate>
<category><![CDATA[Uncategorized]]></category>
<guid isPermaLink="false">http://blog.whatfettle.com/?p=1211</guid>
<description><![CDATA[This is day 19 of "One CSV, 30 stories":http://blog.whatfettle.com/2014/10/13/one-csv-thirty-stories/ a series of articles exploring "price paid data":https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads from the Land Registry found on GOV.UK. The code for this and the other articles is available as open source from "GitHub":https://github.com/psd/price-paid-data "Yesterday":http://blog.whatfettle.com/2014/11/20/one-csv-thirty-stories-18-choropleth/ I made a simple choropleth map of average prices. Today I wanted to iterate on […]]]></description>
<content:encoded><![CDATA[<p><em>This is day 19 of "One CSV, 30 stories":http://blog.whatfettle.com/2014/10/13/one-csv-thirty-stories/ a series of articles exploring "price paid data":https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads from the Land Registry found on GOV.UK. The code for this and the other articles is available as open source from "GitHub":https://github.com/psd/price-paid-data</em></p>
<p>"Yesterday":http://blog.whatfettle.com/2014/11/20/one-csv-thirty-stories-18-choropleth/ I made a simple choropleth map of average prices. Today I wanted to iterate on this hack. Once again this took me longer than expected, this time because I didn't like the results.</p>
<p>First-off it was a little remiss of me not to call out one of the design decisions in yesterday's post. The colours are scaled across the entire range of yearly maps, illustrating how house prices have hotted up over twenty years. There is an alternative: to scale the prices within each year to show how the distribution of prices has moved over twenty years: </p>
<p><a href="https://www.flickr.com/photos/psd/15233560543" title="Hotting-up"><img decoding="async" loading="lazy" src="https://farm9.staticflickr.com/8587/15233560543_55268933ff_c.jpg" width="800" height="252" alt="Hotting-up"></a></p>
<p>I wondered about changing the squares to match the Land Registry's marvellously retro logo:</p>
<p><img decoding="async" src="http://upload.wikimedia.org/wikipedia/en/1/12/HM_Land_Registry.png"></p>
<p>This wasn't too tricky thanks to the "CSS tricks":http://jtauber.github.io/articles/css-hexagon.html outlined by "James Tauber":http://jtauber.com/ which use adjacent blocks with enlarged borders to create a mesh of hexagonal divs which tessellate across a plane:</p>
<p><a href="https://www.flickr.com/photos/psd/15853161675" title="Atomic cauliflowers"><img decoding="async" loading="lazy" src="https://farm8.staticflickr.com/7503/15853161675_64f16c1777_c.jpg" width="565" height="800" alt="Atomic cauliflowers"></a></p>
<p>Using the grid values with these shapes was a bit of a cheat; I really should have recalculated the averages based on the geometry of each hexagon, and worked harder to make them work in any browser beyond Firefox and Chrome, but this experiment was enough to convince me I really didn't like the look of where the hack was heading. Hexagons are just not my bag, unless I'm playing Settlers of Catan:</p>
<p><a href="https://www.flickr.com/photos/psd/1175737778" title="Settlers of Catan by Paul Downey, on Flickr"><img decoding="async" loading="lazy" src="https://farm2.staticflickr.com/1233/1175737778_2682a7af16_z.jpg" width="640" height="274" alt="Settlers of Catan"></a></p>
<p>So I decided to try a different tack and experimented with turning each square div into a circle using a single line of CSS:</p>
<p>bc. .circle { border-radius: 50% }</p>
<p>I then resized each div to show both the average price and number of transactions for each postcode:</p>
<p><a href="https://www.flickr.com/photos/psd/15868805122" title="Blobs"><img decoding="async" loading="lazy" src="https://farm8.staticflickr.com/7500/15868805122_041255ea76_c.jpg" width="713" height="800" alt="Blobs"></a></p>
<p>This looked more promising, but not great, so I played quite a bit, experimenting with the size, shape and colour of the bubbles:</p>
<p><a href="https://www.flickr.com/photos/psd/15681656470" title="Futzing"><img decoding="async" loading="lazy" src="https://farm8.staticflickr.com/7569/15681656470_349021b762_c.jpg" width="800" height="314" alt="Futzing"></a></p>
<p>The biggest difficulty was finding a way of revealing the map, illustrating the massive difference in the price-paid and number of transactions within London as opposed to the immediately surrounding area. A logarithmic scale might have helped, but in the end I settled on spheres, which meant taking the cube root of the number of transactions at each postcode and applying a small amount of border-shadow on each sphere:</p>
<p><a href="https://www.flickr.com/photos/psd/15869121901" title="CSS spheres"><img decoding="async" loading="lazy" src="https://farm8.staticflickr.com/7480/15869121901_150a20721e_c.jpg" width="800" height="517" alt="CSS spheres"></a></p>
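<p>To see why the cube root tames London, here is a minimal Python sketch comparing it with a logarithmic scale. It is an illustration, not the code behind the page: the function names and the <code>scale</code> factor are assumptions.</p>

```python
import math


def sphere_diameter(transactions, scale=1.0):
    """Diameter of a sphere whose volume is proportional to the count.

    Volume grows with the cube of the diameter, so the diameter grows
    with the cube root of the count. `scale` is an arbitrary display factor.
    """
    return scale * transactions ** (1.0 / 3.0)


def log_diameter(transactions, scale=1.0):
    """The logarithmic alternative: each tenfold increase adds one step."""
    return scale * math.log10(transactions + 1)
```

<p>With the cube root, a postcode with 1,000 transactions is drawn about ten times wider than one with a single transaction, rather than 1,000 times wider.</p>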
<p>I continued to try, but couldn't get this visualisation to work. I elected to make the spheres transparent, but that created darker colours where bubbles overlap, which say nothing about the price at that location, and blurred both the discrepancy in the number of transactions and the price, which can vary greatly in adjacent postcodes:</p>
<p><a href="https://www.flickr.com/photos/psd/15685185209" title="pricegridtx"><img decoding="async" loading="lazy" src="https://farm9.staticflickr.com/8616/15685185209_56802393a3_c.jpg" width="566" height="800" alt="pricegridtx"></a></p>
<p>And, as mentioned previously, transparency and gradients don't work well in PostScript, making the resultant PDFs large and unprintable. So I spent even more time futzing with this page, trying to flog a dead-tree, to no avail:</p>
<p><a href="https://www.flickr.com/photos/psd/15875557435" title="pricegridtx2 by Paul Downey, on Flickr"><img decoding="async" loading="lazy" src="https://farm8.staticflickr.com/7540/15875557435_8a0b9f21b7_c.jpg" width="565" height="800" alt="pricegridtx2"></a></p>
<p>Literally the bottom line: today I iterated wildly, but failed to improve on "yesterday":http://blog.whatfettle.com/2014/11/20/one-csv-thirty-stories-18-choropleth/. I should probably move along, but I've still at least one more idea I want to try out with this data "tomorrow":http://blog.whatfettle.com/2014/11/27/one-csv-thirty-stories-20-unknown-prices/.</p>
]]></content:encoded>
</item>
<item>
<title>One CSV, thirty stories: 18. Choropleth</title>
<link>http://blog.whatfettle.com/2014/11/20/one-csv-thirty-stories-18-choropleth/</link>
<dc:creator><![CDATA[Paul Downey]]></dc:creator>
<pubDate>Thu, 20 Nov 2014 22:39:43 +0000</pubDate>
<category><![CDATA[Uncategorized]]></category>
<guid isPermaLink="false">http://blog.whatfettle.com/?p=1207</guid>
<description><![CDATA[This is day 18 of "One CSV, 30 stories":http://blog.whatfettle.com/2014/10/13/one-csv-thirty-stories/ a series of articles exploring "price paid data":https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads from the Land Registry found on GOV.UK. The code for this and the other articles is available as open source from "GitHub":https://github.com/psd/price-paid-data Following on from "yesterday":http://blog.whatfettle.com/2014/11/17/one-csv-thirty-stories-17-scattermap-calendar/ I wanted to create a "choropleth map":http://en.wikipedia.org/wiki/Choropleth_map to show how prices are […]]]></description>
<content:encoded><![CDATA[<p><em>This is day 18 of "One CSV, 30 stories":http://blog.whatfettle.com/2014/10/13/one-csv-thirty-stories/ a series of articles exploring "price paid data":https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads from the Land Registry found on GOV.UK. The code for this and the other articles is available as open source from "GitHub":https://github.com/psd/price-paid-data</em></p>
<p>Following on from "yesterday":http://blog.whatfettle.com/2014/11/17/one-csv-thirty-stories-17-scattermap-calendar/ I wanted to create a "choropleth map":http://en.wikipedia.org/wiki/Choropleth_map to show how prices are distributed across the country. A number of people have constructed shapefiles for postcodes which can be used in "d3":https://github.com/roblascelles/uk-postcode-map/wiki/Displaying-the-data but as discussed on "day 13":http://blog.whatfettle.com/2014/11/06/one-csv-thirty-stories-13-postcodes/ the licensing of this data isn't clear.</p>
<p>So I wrote a small "Perl script":https://github.com/psd/price-paid-data/blob/master/bin/pricegrid.pl to use the points in the "OS OpenData™ Code-Point® Open":http://www.ordnancesurvey.co.uk/business-and-government/products/opendata-products.html dataset to place each price into one of 1024 squares on a 32x32 grid, then used a small "PHP template":https://github.com/psd/price-paid-data/blob/master/bin/pricegrid.php to present the average price of each square as a coloured grid on an HTML page. Re-running the script for each year also shows how property prices have heated up over time:</p>
<p><a href="https://www.flickr.com/photos/psd/15812904406" title="pricegrid"><img decoding="async" loading="lazy" src="https://farm9.staticflickr.com/8273/15812904406_ac460f0ac5_c.jpg" width="554" height="800" alt="pricegrid"></a></p>
<p>["PDF":https://github.com/psd/price-paid-data/blob/master/posters/pricegrid.pdf]</p>
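<p>The binning step can be sketched in a few lines of Python. This is an illustration rather than the linked Perl script: the function name, the shape of the input tuples and the grid extent (700km by 1,300km, the nominal size of the Ordnance Survey National Grid) are all assumptions.</p>

```python
from collections import defaultdict

GRID = 32  # 32 x 32 = 1024 squares, as in the post


def grid_averages(points, max_easting=700_000, max_northing=1_300_000, grid=GRID):
    """Average price per grid square.

    `points` is an iterable of (easting, northing, price) tuples.
    Returns a dict mapping (column, row) to the mean price in that square.
    """
    totals = defaultdict(lambda: [0, 0])  # (col, row) -> [sum of prices, count]
    for easting, northing, price in points:
        # Clamp so a point exactly on the far edge lands in the last square.
        col = min(int(grid * easting / max_easting), grid - 1)
        row = min(int(grid * northing / max_northing), grid - 1)
        cell = totals[(col, row)]
        cell[0] += price
        cell[1] += 1
    return {square: total / count for square, (total, count) in totals.items()}
```

<p>Squares with no transactions are simply absent from the result, which a template can render as blank cells.</p>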
<p>A choropleth map made without a map. I've an idea about iterating on this hack for "tomorrow":http://blog.whatfettle.com/2014/11/25/one-csv-thirty-stories-19-bubblepleth/.</p>
]]></content:encoded>
</item>
<item>
<title>One CSV, thirty stories: 17. Scattermap Calendar</title>
<link>http://blog.whatfettle.com/2014/11/17/one-csv-thirty-stories-17-scattermap-calendar/</link>
<dc:creator><![CDATA[Paul Downey]]></dc:creator>
<pubDate>Mon, 17 Nov 2014 16:27:09 +0000</pubDate>
<category><![CDATA[Uncategorized]]></category>
<guid isPermaLink="false">http://blog.whatfettle.com/?p=1196</guid>
<description><![CDATA[This is day 17 of "One CSV, 30 stories":http://blog.whatfettle.com/2014/10/13/one-csv-thirty-stories/ a series of articles exploring "price paid data":https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads from the Land Registry found on GOV.UK. The code for this and the other articles is available as open source from "GitHub":https://github.com/psd/price-paid-data The "previous post":http://blog.whatfettle.com/2014/11/11/one-csv-thirty-stories-16-mapination/ resulted in a scatter map of the property transactions in the price-paid dataset. […]]]></description>
<content:encoded><![CDATA[<p><em>This is day 17 of "One CSV, 30 stories":http://blog.whatfettle.com/2014/10/13/one-csv-thirty-stories/ a series of articles exploring "price paid data":https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads from the Land Registry found on GOV.UK. The code for this and the other articles is available as open source from "GitHub":https://github.com/psd/price-paid-data</em></p>
<p>The "previous post":http://blog.whatfettle.com/2014/11/11/one-csv-thirty-stories-16-mapination/ resulted in a scatter map of the property transactions in the price-paid dataset. Having made the images, I thought it would be a simple matter to string them into a single page to make a calendar of transactions per-postcode for each day.</p>
<p><a href="https://www.flickr.com/photos/psd/15805520555" title="Volume of Land Registry Transactions 1995-2014 by Paul Downey, on Flickr"><img decoding="async" loading="lazy" src="https://farm9.staticflickr.com/8661/15805520555_6dcc380b3e_c.jpg" width="800" height="565" alt="Volume of Land Registry Transactions 1995-2014"></a></p>
<p>["75MB A1 PDF":https://github.com/psd/price-paid-data/blob/master/posters/scattermap-calendar.pdf]</p>
<p>It turned out the plot took me a lot longer to develop than expected. As much as I like this poster, in particular the techniques used for its production, I am now several days older and more than a little wiser.</p>
<p>I should start by thanking everyone who completed my "short, anonymous survey":http://tinyurl.com/1csv-30stories-survey or "contacted me":http://blog.whatfettle.com/about/ about the series thus far. I'll incorporate people's suggestions into the hacks and give a wrap-up of the survey in the final post. Your feedback is really helping keep me motivated, even when I've been blocked, and guilt-tripping me into not kittening and dropping this project in favour of shiny new ventures, or indeed playing with our new kittens.</p>
<p><a href="https://www.flickr.com/photos/psd/15589998268" title="Blame @matwall by Paul Downey, on Flickr"><img decoding="async" loading="lazy" src="https://farm6.staticflickr.com/5610/15589998268_30627a33bc_c.jpg" width="800" height="528" alt="Blame @matwall"></a></p>
<p>One actionable suggestion came from my brilliant colleague, "Anna":http://anna.ps, who nudged me to change the scatter maps from plotting the radius to the area. That was a simple matter of using the square-root of the value when scaling the circle for each postcode, and removed some of the splodgy outliers in the daily plots.</p>
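<p>As a rough Python sketch of that change (the function name and <code>scale</code> parameter are assumptions, not taken from the scripts in the repository): making the area of a circle proportional to a value means making its radius proportional to the square root of the value.</p>

```python
import math


def circle_radius(count, scale=1.0):
    """Radius of a circle whose area is proportional to `count`.

    Area is pi * r**2, so r must grow with sqrt(count); `scale` is an
    arbitrary display factor.
    """
    return scale * math.sqrt(count)
```

<p>A postcode with 100 transactions is then drawn ten times wider than one with a single transaction, and covers 100 times the area.</p>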
<p>A question a number of people have asked is what happens when the Land Registry release more data in a month or so's time? All of the hacks thus far have been built using a "Makefile":https://github.com/psd/price-paid-data/blob/master/Makefile which downloads the dataset from GOV.UK, re-runs all of the scripts and recreates all the images. All apart from the hackday posters, which "Michael":https://twitter.com/mikiee_t built by hand, applying his lovely design fu in Adobe InDesign. Whilst it might be possible to script and automate InDesign, I'm disinclined to have to fathom how to manage Adobe software licences on my machine, and it doesn't help others who can't afford such luxuries as professional publishing tools.</p>
<p>So I turned to recreating the posters as HTML, and used the browser to design and composite the page. There are a lot of different ways of generating HTML from our data, but I wanted to use a "templating language":http://en.wikipedia.org/wiki/Comparison_of_web_template_engines for which there are again a lot of options. I wrote a "PHP script":https://github.com/psd/price-paid-data/blob/master/bin/scattermap-calendar.php, partly because selecting something like Ruby or Python means using yet another language for the templating, but mostly because PHP is ubiquitous. There's a certain amount of snobbery about PHP, but I've no truck with such snark. PHP is the Web's BASIC. It's everywhere, beginner-friendly, and I like it.</p>
<p>This approach seemed hopeful, but then I hit an issue. Chrome in particular struggled, flaking out where there were more than 5,000 images on a page. I experimented with ways of reducing the number of individual files, including making a single, massive image file, and only showing weekdays, but neither really helped. So I turned to tiling the images, and displaying a portion of each image in each square using <a href="http://en.wikipedia.org/wiki/Sprite_(computer_graphics)#Sprites_by_CSS">CSS sprites</a>. This is where much of my time was lost. Front-end development remains a tricky craft, and creating a page of 7,034 tiles on a grid using responsive images was a bit of a faff. Getting these images to scale and sit inside table cells was beyond my skill. I really should have grabbed one of my many front-end specialist friends to help, but after a Swan load of failed attempts, I landed on something that worked: a series of divs, each with a spacer image:</p>
<p>bc. .day {<br />
float: left;<br />
overflow: hidden;<br />
height: 5mm;<br />
max-width: 5mm;<br />
}<br />
...</p>
<p>bc. &lt;div class='day y2007 m02 d01 _S'&gt;<br />
&lt;img class='spacer' src='mapination/blank.gif' title='2007-02-01'&gt;<br />
&lt;img class='sprite' src='mapination/sprites-2007-02.gif' style='left:0%' title='2007-02-01'&gt;<br />
&lt;/div&gt;<br />
&lt;div class='day y2007 m02 d02 _F'&gt;<br />
&lt;img class='spacer' src='mapination/blank.gif' title='2007-02-02'&gt;<br />
&lt;img class='sprite' src='mapination/sprites-2007-02.gif' style='left:-100%' title='2007-02-02'&gt;<br />
&lt;/div&gt;<br />
&lt;div class='day y2007 m02 d03 _S'&gt;<br />
&lt;img class='spacer' src='mapination/blank.gif' title='2007-02-03'&gt;<br />
&lt;img class='sprite' src='mapination/sprites-2007-02.gif' style='left:-200%' title='2007-02-03'&gt;<br />
&lt;/div&gt;</p>
<p>...</p>
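<p>The pattern in that markup is that each day shows its own slice of a month-wide sprite image by shifting the image left by 100% per day. A hedged Python sketch of generating such divs follows; the real page is produced by the PHP script linked above, and this version assumes one sprite per month with the days laid out left to right, and omits the extra weekday class seen in the sample.</p>

```python
import calendar
from datetime import date


def day_divs(year, month):
    """Return one <div> string per day of the month, each offsetting the
    shared sprite image so only that day's tile is visible."""
    days_in_month = calendar.monthrange(year, month)[1]
    sprite = f"mapination/sprites-{year}-{month:02d}.gif"
    divs = []
    for day in range(1, days_in_month + 1):
        iso = date(year, month, day).isoformat()
        left = -100 * (day - 1)  # day 1 -> 0%, day 2 -> -100%, ...
        divs.append(
            f"<div class='day y{year} m{month:02d} d{day:02d}'>"
            f"<img class='spacer' src='mapination/blank.gif' title='{iso}'>"
            f"<img class='sprite' src='{sprite}' style='left:{left}%' title='{iso}'>"
            f"</div>"
        )
    return divs
```

<p>For the offsets to take effect, the sprite image presumably needs to be positioned within its clipped parent, which the elided part of the stylesheet would have to supply.</p>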
<p>Finally, printing such a large page reliably crashed Chrome. Fortunately the "wkhtmltopdf":http://wkhtmltopdf.org/ command can be used to generate a PDF from HTML, works reliably for our page and can be driven from a Makefile:</p>
<p>bc. posters/scattermap-calendar.pdf: html/scattermap-calendar.html<br />
wkhtmltopdf -q --page-size a1 --orientation landscape html/scattermap-calendar.html $@</p>
<p>I have to continue to learn how to deal with getting stuck like this if I'm to regain any kind of momentum on this venture. Let's see how well I do "tomorrow":http://blog.whatfettle.com/2014/11/20/one-csv-thirty-stories-18-choropleth/.</p>
]]></content:encoded>
</item>
<item>
<title>One CSV, thirty stories: 16. Mapination</title>
<link>http://blog.whatfettle.com/2014/11/11/one-csv-thirty-stories-16-mapination/</link>
<dc:creator><![CDATA[Paul Downey]]></dc:creator>
<pubDate>Mon, 10 Nov 2014 23:04:54 +0000</pubDate>
<category><![CDATA[Uncategorized]]></category>
<guid isPermaLink="false">http://blog.whatfettle.com/?p=1187</guid>
<description><![CDATA[This is day 16 of "One CSV, 30 stories":http://blog.whatfettle.com/2014/10/13/one-csv-thirty-stories/ a series of articles exploring "price paid data":https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads from the Land Registry found on GOV.UK. The code for this and the other articles is available as open source from "GitHub":https://github.com/psd/price-paid-data "Yesterday":http://blog.whatfettle.com/2014/11/07/one-csv-thirty-stories-15-hotspots/ we made a map with the total volume of transactions over 20 years. I wanted […]]]></description>
<content:encoded><![CDATA[<p><em>This is day 16 of "One CSV, 30 stories":http://blog.whatfettle.com/2014/10/13/one-csv-thirty-stories/ a series of articles exploring "price paid data":https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads from the Land Registry found on GOV.UK. The code for this and the other articles is available as open source from "GitHub":https://github.com/psd/price-paid-data</em></p>
<p>"Yesterday":http://blog.whatfettle.com/2014/11/07/one-csv-thirty-stories-15-hotspots/ we made a map with the total volume of transactions over 20 years. I wanted to see how that distribution changed over time. A spot of knife-and-forking:</p>
<p>bc. cut -d'⋯' -f2,3 data/pp.tsv |<br />
sed -e 's/ //' |<br />
awk '$2' |<br />
sort |<br />
uniq -c |<br />
sort -rn |<br />
sed -e 's/^ *//' -e 's/ */⋯/' -e 's/ *$//' |<br />
sort -k2 > daily-postcodes.tsv</p>
<p>gives a count of the number of transactions for each postcode on each date:</p>
<p>bc. 1⋯1995-01-01⋯B297NS<br />
1⋯1995-01-01⋯B315DF<br />
1⋯1995-01-01⋯B458LY<br />
1⋯1995-01-01⋯BB99RQ<br />
1⋯1995-01-01⋯BS110JH<br />
1⋯1995-01-01⋯BS16XF<br />
1⋯1995-01-01⋯BS81BY<br />
1⋯1995-01-01⋯CA119JD<br />
1⋯1995-01-01⋯CO70BZ<br />
1⋯1995-01-01⋯CR35SU<br />
...</p>
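<p>For readers less at home with shell pipelines, roughly the same tally can be sketched in Python. This is an illustration under assumptions: the function name is made up, and the input is taken to be (date, postcode) pairs already read from the TSV.</p>

```python
from collections import Counter


def daily_postcode_counts(rows):
    """Count transactions per (date, postcode).

    `rows` is an iterable of (date, postcode) pairs. Spaces are removed
    from postcodes and rows without a postcode are skipped, mirroring the
    sed and awk steps of the pipeline.
    """
    counts = Counter()
    for day, postcode in rows:
        postcode = postcode.replace(" ", "")
        if postcode:
            counts[(day, postcode)] += 1
    # Sorted by date, then postcode.
    return sorted(counts.items())
```

<p>Each item in the result pairs a (date, postcode) key with its count, which carries the same information as the three columns listed above.</p>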
<p>Making a version of the map for each of the days in the price-paid dataset requires an awfully large number of postcode lookups. Rather than sorting and merging individual files 7181 times I elected to write some code to read the OS OpenData™ Code-Point® into a Perl hash table:</p>
<p>bc. my $geocodes = "data/codepo_gb.tsv";<br />
my %postcode = ();<br />
open my $file, "&lt;", $geocodes or die "unable to open $geocodes";<br />
while (my $line = &lt;$file&gt;) {<br />
my ($postcode, $easting, $northing) = split /\t/, $line;<br />
$postcode{$postcode} = { easting => $easting, northing => $northing };<br />
}</p>
<p>Which we can use to look-up the easting and northing to draw a circle for each postcode:</p>
<p>bc. my $c = $postcode{$p->{postcode}};<br />
my $x = $width * $c->{easting} / $max_easting;<br />
my $y = $height - ($height * $c->{northing} / $max_northing);<br />
my $size = $p->{count};<br />
printf($fp "circle %d,%d,%d,%d\n", $x, $y, $x+$size, $y+$size);</p>
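<p>The same lookup and projection, sketched in Python for comparison. The names here are assumptions for illustration; the arithmetic follows the Perl above, including flipping the y axis because northings increase upwards while image rows increase downwards.</p>

```python
def load_geocodes(lines):
    """Build a postcode -> (easting, northing) lookup from tab-separated lines."""
    lookup = {}
    for line in lines:
        postcode, easting, northing = line.rstrip("\n").split("\t")[:3]
        lookup[postcode] = (int(easting), int(northing))
    return lookup


def to_pixel(easting, northing, width, height, max_easting, max_northing):
    """Scale grid coordinates to image pixels, with the origin at top-left."""
    x = width * easting / max_easting
    y = height - (height * northing / max_northing)
    return int(x), int(y)
```

<p>A dictionary lookup per transaction keeps the cost of geocoding each day's postcodes low once the file has been read into memory.</p>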
<p>The complete script generates an individual image for each day; here are the 365 images for 2007:</p>
<p><a href="https://www.flickr.com/photos/psd/15759722375" title="Daily volume of Land Registry transactions by postcode 2007"><img decoding="async" loading="lazy" src="https://farm8.staticflickr.com/7475/15759722375_4812e88a43_z.jpg" width="640" height="394" alt="Daily volume of Land Registry transactions by postcode 2007"></a></p>
<p>Stringing these individual images using "gifsicle":http://www.lcdf.org/gifsicle/man.html gives a rather large animated gif:</p>
<p><a href="https://github.com/psd/price-paid-data/blob/master/out/mapination/daily-2012.gif"><img decoding="async" src="https://raw.githubusercontent.com/psd/price-paid-data/master/out/mapination/daily-2012.gif"></a></p>
<p>If you are minded, you can upload these to "gifprint.com":http://gifprint.com/ to make a flip book. A partial success:</p>
<p><img decoding="async" src="https://raw.githubusercontent.com/psd/price-paid-data/master/out/flipbook.gif"></p>
<p>The daily images are quite noisy and should benefit from some polishing, so more iteration on this hack "tomorrow":http://blog.whatfettle.com/2014/11/17/one-csv-thirty-stories-17-scattermap-calendar/.</p>
]]></content:encoded>
</item>
<item>
<title>One CSV, thirty stories: 15. Hotspots</title>
<link>http://blog.whatfettle.com/2014/11/07/one-csv-thirty-stories-15-hotspots/</link>
<dc:creator><![CDATA[Paul Downey]]></dc:creator>
<pubDate>Fri, 07 Nov 2014 14:20:59 +0000</pubDate>
<category><![CDATA[Uncategorized]]></category>
<guid isPermaLink="false">http://blog.whatfettle.com/?p=1180</guid>
<description><![CDATA[This is day 15 of "One CSV, 30 stories":http://blog.whatfettle.com/2014/10/13/one-csv-thirty-stories/ a series of articles exploring "price paid data":https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads from the Land Registry found on GOV.UK. The code for this and the other articles is available as open source from "GitHub":https://github.com/psd/price-paid-data Following on from "yesterday":http://blog.whatfettle.com/2014/11/07/one-csv-thirty-stories-14-hackday/'s hackday I had a spot of breakfast with "Michael":https://twitter.com/mikiee_t. I tweaked the […]]]></description>
<content:encoded><![CDATA[<p><em>This is day 15 of "One CSV, 30 stories":http://blog.whatfettle.com/2014/10/13/one-csv-thirty-stories/ a series of articles exploring "price paid data":https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads from the Land Registry found on GOV.UK. The code for this and the other articles is available as open source from "GitHub":https://github.com/psd/price-paid-data</em></p>
<p>Following on from "yesterday":http://blog.whatfettle.com/2014/11/07/one-csv-thirty-stories-14-hackday/'s hackday I had a spot of breakfast with "Michael":https://twitter.com/mikiee_t. I tweaked the map from "Day 13":http://blog.whatfettle.com/2014/11/06/one-csv-thirty-stories-13-postcodes/ to "scale the size of each postcode":https://github.com/psd/price-paid-data/commit/38bedba7c6d07d8397380780a1a62d81c48c2236 using the total volume of transactions over 20 years:</p>
<p><a href="https://www.flickr.com/photos/psd/15545225717" title="Volume Scattermap by Paul Downey, on Flickr"><img decoding="async" loading="lazy" src="https://farm8.staticflickr.com/7491/15545225717_8c021454fb_c.jpg" width="652" height="800" alt="Volume Scattermap"></a></p>
<p>Michael quickly turned this image into a second poster [<a href="https://github.com/LandRegistry/hackday/raw/master/price-paid-data-scattermap.pdf">PDF</a>] before dashing to the station.</p>
<p>I'm cock-a-hoop about how this image has turned out. The detail looks like bacteria on a Petri dish, but zoomed out it's apparent not just where people live, but where people buy and sell houses the most, with the coast of East Anglia, Eastbourne and Cornwall darker than you'd expect. I'd guess this may be due to turnover of retirement homes or holiday cottages but it's plain to see there are many more interesting stories begging to be discovered from this simple map weighted with other open data from the likes of the "ONS":http://www.ons.gov.uk/.</p>
<p>This afternoon I'm on a train travelling back from Plymouth which means several hours "without access to the interwebs":http://en.wikipedia.org/wiki/Communications_blackout and so an ideal opportunity to "iterate this hack":http://blog.whatfettle.com/2014/11/11/one-csv-thirty-stories-16-mapination/.</p>
]]></content:encoded>
</item>
<item>
<title>One CSV, thirty stories: 14. Hackday</title>
<link>http://blog.whatfettle.com/2014/11/07/one-csv-thirty-stories-14-hackday/</link>
<dc:creator><![CDATA[Paul Downey]]></dc:creator>
<pubDate>Thu, 06 Nov 2014 23:47:19 +0000</pubDate>
<category><![CDATA[Uncategorized]]></category>
<guid isPermaLink="false">http://blog.whatfettle.com/?p=1174</guid>
<description><![CDATA[This is day 14 of "One CSV, 30 stories":http://blog.whatfettle.com/2014/10/13/one-csv-thirty-stories/ a series of articles exploring "price paid data":https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads from the Land Registry found on GOV.UK. The code for this and the other articles is available as open source from "GitHub":https://github.com/psd/price-paid-data "Yesterday":http://blog.whatfettle.com/2014/11/04/one-csv-thirty-stories-12-stacked/ I started to look into where all the houses listed inside the CSV are located, […]]]></description>
<content:encoded><![CDATA[<p><em>This is day 14 of "One CSV, 30 stories":http://blog.whatfettle.com/2014/10/13/one-csv-thirty-stories/ a series of articles exploring "price paid data":https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads from the Land Registry found on GOV.UK. The code for this and the other articles is available as open source from "GitHub":https://github.com/psd/price-paid-data</em></p>
<p>"Yesterday":http://blog.whatfettle.com/2014/11/04/one-csv-thirty-stories-12-stacked/ I started to look into where all the houses listed inside the CSV are located, but today I was down in Plymouth to help run an internal "Land Registry Hackday":https://github.com/landregistry/hackday</p>
<p><img decoding="async" src="https://raw.githubusercontent.com/LandRegistry/hackday/master/poster.png"></p>
<p>I think it was a great event, with some amazing hacks presented. My two favourite hacks had multidisciplinary teams build something, try it with users, realise it was the wrong thing, and then build something better as a result. All in a single day!</p>
<p><a href="https://www.flickr.com/photos/psd/15538436767" title="Let the hacking begin! by Paul Downey, on Flickr"><img decoding="async" loading="lazy" src="https://farm4.staticflickr.com/3952/15538436767_7889cab7b1_c.jpg" width="800" height="279" alt="Let the hacking begin!"></a></p>
<p>For my part I tried to help one of the hacks consume the "Land Registry INSPIRE index polygons":https://www.gov.uk/inspire-index-polygons-spatial-data by converting the "GML":http://en.wikipedia.org/wiki/Geography_Markup_Language format shapefiles into "TopoJSON":http://en.wikipedia.org/wiki/GeoJSON#TopoJSON files. Unfortunately this took a lot of compute time, and both "topojson":https://github.com/mbostock/topojson and "mapshaper":https://github.com/mbloch/mapshaper kept running out of memory on my huge EC2 machine. You can see the code and some of the polygons in "github.com/psd/landregistry-inspire-data":https://github.com/psd/landregistry-inspire-data</p>
<p><a href="https://www.flickr.com/photos/psd/15726079761" title="City of London topojson by Paul Downey, on Flickr"><img decoding="async" loading="lazy" src="https://farm6.staticflickr.com/5614/15726079761_4b42b4b67c_c.jpg" width="800" height="480" alt="City of London topojson"></a></p>
<p>Close to the final show and tell I paired up with "Michael":https://twitter.com/mikiee_t and we turned the heatmap from "Day 8":http://blog.whatfettle.com/2014/10/25/one-csv-thirty-stories-8-heatmap-meh/ into a poster which we framed and presented to the Land Registry, to be hung on their canteen wall:</p>
<p><a href="https://github.com/LandRegistry/hackday/raw/master/price-paid-data-heatmap.pdf"><img decoding="async" loading="lazy" src="http://farm8.staticflickr.com/7553/15108565183_9f04cfc2a9_c.jpg" width="800" height="534" alt="Poster"></a></p>
<p>[<a href="https://github.com/LandRegistry/hackday/raw/master/price-paid-data-heatmap.pdf">PDF</a>]</p>
<p>Tomorrow I hope to make another, similar poster based on the geographical data.</p>
]]></content:encoded>
</item>
</channel>
</rss>