Lecture notes on internet method N°1743

It seems one of the key questions in the contemporary philosophy of science: now that it is possible to share the data that underlie research studies, is there any reason not to? If a scientific idea is one that can be refuted or replicated, how can a scientist claim the right to hide methods and workings? Of course, before the internet and electronic data collection, such audit and sharing may have been notional. But if some keen German postgrad wrote to the department explaining how perusing an archive of old data might be useful to her, one imagines that, bona fides established, academic etiquette would have been to allow access. How pleasant to welcome a visiting colleague! And now that the internet is here, we can welcome colleagues from all over the world, right? Well, it's not so simple, as today's excellent seminar hosted by the University of Sussex Research Hive made clear. Louise Corti, a director at the UK Data Archive, sensibly bypassed the difficult ethical considerations that often underlie making data publicly available, and got on to that other substantial matter: the practicalities of actually doing it [3.1MB PDF] for your datasets. Many are the benefits; many are also the pitfalls. One has the impression of best practice in IT and archivism filtering out more widely into the body of researchers, as they avoid proprietary formats and adopt sensible file-naming conventions.

Janet Boddy's talk was more challenging, addressing as it did the vexèd question: given that it can be done, should it be done? She's a social researcher who's worked with many vulnerable groups, and is also something of a research ethics guru (she is ambivalent about describing herself as an ethics "expert"). Thus is she fully conscious of all of the difficulties implicit in making data derived from personal interviews ("disclosive data") world-readable; she recalled her formative days in the not-so-distant past when destruction of the data at the conclusion of any study was the norm. But she has also been greatly influenced by her pleasure and privilege in returning to Townsend's marginalia, revisiting his classic studies of poverty.

So there are no easy answers: citing Mauther (2012) that 'research works because respondents trust us,' Boddy pointed out that even the issue of consent for release of data was fraught with difficulty. For example, in a qualitative study where an interview transcript will form part of the raw data of the study, should consent for its release be sought as part of the overall consent prior to agreement to participate in the study? Or should the participant be shown their transcript before giving consent? One cannot even assume that the participant would wish for the data to be anonymous: the example of the proud oral history interviewee devoutly wishing that their name be attached to the record for posterity was posited. All this is tricky indeed: will an "extrovert bias" result from research participants having a post hoc right to censor themselves from a research dataset?

Of course ethical consideration must precede any scientific act, but my own intervention, in the discussion after the talks, raised the question of whether data that cannot be fully published are indeed scientific. What considerations of reproducibility and fidelity apply if a dataset is to be sealed from public view, or even destroyed forever? We are all too conscious of the data burial/publication bias problem in the pharmaceutical industry. Scientific fraud and research misconduct are well-recognised problems. Surely greater transparency of method, and opening datasets to outside scrutiny, will be part of the solution?

A simple response from Boddy: since post-modernism struck, the concept of a "social scientist" has been a highly doubtful one, and ethical practice must be "reflexive, situated, negotiated." Peter Moss's notions of rigorous subjectivity make him the hep cat to quote, apparently.

I found all this fair enough (I recalled wiping my consultation analysis tapes, for example), though I remain sceptical that the researchers formerly known as social scientists do not claim special privilege for their writings, which are, after all, the result of method, and serious attempts to be thorough and avoid the worst pitfalls of all-too-human sketchiness and the jaundiced eye of prejudice...

But yeh, it made me realise that I've been charging along with the internet now for the best part of twenty years, and the radical openness that is commonplace online is certainly meeting resistance in the social sciences, just as it does (rightly) in medicine. "Post it on the net and let the net decide!" may embody a certain hip hivemind egolessness, but it's hardly an archiving strategy; it's hardly even a claim to be a grown-up. But in the era of silver surfers, ubiquitous smartphones, and pretty much universal network access, I bet research participants are more clued-up about the implications of data publication than researchers give them credit for.

After the formal part of the seminar I had a wee chat with the IT services guys at the back, who, once the fretful intellectuals have agonised sufficiently, must keep lights on and hard disk platters spinning. And backups. Tested backups. Backups in fireproof cabinets. Offsite backups. Backups! This costs £1400/TB/yr, apparently: you can put it in your grant proposal. The hard sciences aren't holding back: the physicists here are about to take possession of 160TB from Manchester, apparently. And, just as a CD in the post used to be quicker than dialup, so is it still quicker to stick a server in a van up the M6 to go and get it than to max out the university's bandwidth for months on end. The biologists have just booked 4TB for 200 Drosophila genomes. Moar! At least I hope we can all agree that Drosophila are so stupid and shortlived that they couldn't care less who, what, where, how and why someone is hosting their data. Humans, naturally, are more tricky beasts. If you care to study them, and publish the results, some closely argued rhetoric for the ethics committee will be required, before working through a significant list of technical chores. Which is all as it should be, I suppose.
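
For anyone drafting that grant proposal, the arithmetic behind the figures above can be sketched in a few lines of Python. This is a rough illustration only: the £1400/TB/yr rate and the 160TB and 4TB volumes are the ones quoted above, while the sustained link speed of 100 Mbit/s is a purely hypothetical assumption, not a figure from the seminar, and the function names are invented for the example.

```python
# Figures quoted in the notes above.
COST_PER_TB_PER_YEAR_GBP = 1400
PHYSICS_TB = 160
DROSOPHILA_TB = 4

# ASSUMPTION: a hypothetical sustained transfer rate, chosen for illustration.
ASSUMED_LINK_MBIT_PER_S = 100


def annual_storage_cost(terabytes, rate=COST_PER_TB_PER_YEAR_GBP):
    """Yearly cost in pounds of keeping `terabytes` of backed-up storage."""
    return terabytes * rate


def transfer_days(terabytes, link_mbit_per_s):
    """Days needed to move the data at a constant sustained rate.

    Uses decimal units: 1 TB = 10**12 bytes = 8,000,000 megabits.
    """
    megabits = terabytes * 8_000_000
    seconds = megabits / link_mbit_per_s
    return seconds / 86_400  # seconds in a day


if __name__ == "__main__":
    for label, size in [("physics", PHYSICS_TB), ("Drosophila", DROSOPHILA_TB)]:
        cost = annual_storage_cost(size)
        days = transfer_days(size, ASSUMED_LINK_MBIT_PER_S)
        print(f"{label}: {size} TB, £{cost:,} per year, "
              f"about {days:.0f} days over the wire")
```

Under that assumed link speed the 160TB would take roughly five months to transfer, which is at least consistent with the preference for a van; a faster or less contended link would change the answer proportionally.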

