This story was originally published on my blog on June 19th, 2009. I’m moving it here for posterity.
In the spring, I was asked by Wired UK if I would be interested in producing something for the two-page ‘infoporn’ spread that runs in every issue. They had seen my experimentations with the NYTimes APIs, and were interested in the idea of non-conventional data visualizations. After a bit of research, I proposed an piece about the UK’s National DNA Database. It was a subject that interested me and I felt that there would be some interesting political territory to cover. Luckily, Wired agreed.
By searching through Parliamentary minutes, and sifting over annual reports, I was able to put together a fair amount of information about the NDNAD and I settled on a few key points that I wanted to convey with the piece. First, I wanted to somehow demonstrate how large the database is — with over 4.5M individuals profiled, it’s the largest DNA database in the world. It holds profiles for more than 7% of the UK’s population. As well as the size of the database, I wanted to show how it broke down — in racial groups, in age groups, and in terms of those who have been charged versus those who are ‘innocent’. Finally, I wanted to talk about the difference between the UK’s population demographics and the demographics represented by the profiles in the NDNAD.
The central graphic, then, is a DNA strand with one dot for each of the profiles in the database — more than 5M! Of course, I didn’t do this by hand. I wrote a program in Processing that would generate a single, continuous strand that filled up a certain size area. I was inspired by electron microscope images that I had seen of real DNA in which it looks like a loop of thread:
The nice looping threads were rendered using Perlin noise — I had a few parameters inside the program which allowed me to control how ‘messy’ the tangle became, and how much variation in thickness each strand had. While I was at it, I colour-coded each DNA dot to indicate the database’s ethnic breakdown. The result was a giant tangle, which was pretty much exactly what I wanted:
Here, you can see the individual dots, and the colour breakdown:
The next step was to break down the big tangle into three parts — one representing the bulk of the database, one representing the 948,535 profiles that were taken from people under the age of 18, and one representing the ~500,000 profiles from people who had never been charged, convicted, or warned by police. The original image had a static centre-point for the DNA loop; to break the tangle apart, I modified the program so that the centrepoint could move to pre-determined points once certain counts had been reached. The final graphic changes centre-points three times. What was nice about this set-up what that it was easy to move and adjust the positioning of the graphic to fit the page layout. Rendering out a new version of the main image took just a few minutes.
Working with these kinds of generative strategies meant that I could explore many variations. As you can see from the graphics posted here, I went through a variety of compositional and colour changes, all of which were relatively painless. Using Processing, I built a mini-application whose entire purpose was to create these DNA systems. I also built a second min-app, which rendered out a set of pie-charts that were used to display related information along with the main graphic in the spread. I wanted these pie charts to fit in visually with the main graphic, so I created a very simple sketch to output charts from any set of data:
There ended up being 11 of these little pie-charts that accompanied the main graphic. Again, by building tools, I was able to do some interesting things, while at the same time avoiding large amounts of manual labour. Just how I like it! You can see the final result in the image at the top of this post, and of course, in Wired UK — the July issue hit newsstands a couple of weeks ago. If you are in the UK, go out and buy a copy!