Information Overload Part 2
Friday, January 16, 2009 at 9:55 am UTC by David Crotty
Given some of the reactions in the comments on my last posting, perhaps I wasn’t clear enough in what I was trying to say, so a bit more here. As many have pointed out, scientists have long been bombarded with large amounts of potentially useful information, and have developed a sophisticated set of filters to deal with it, both online and offline. That’s not the issue. The issue is that, due to the exponential growth in the amount of research being done and published, even with highly effective filters that eliminate everything extraneous, one is often still left with more information than can be dealt with in a reasonable amount of time. Let me try to explain with a hypothetical example:
I’m a professor at University X. I have a busy schedule, between doing my own bench research, writing grants, managing my students/postdocs and my faculty duties. I have time in my schedule to read (choosing this number randomly) 10 papers a week in depth for a full understanding. 25 years ago, this was fine. The filters I had built pointed me toward 4 quality papers a week directly relevant to my research, and this allowed me to read 6 other papers in other fields. I had complete knowledge of the important work in my own field, plus a good working knowledge of many other fields that could be applied to my own. Fast forward to today, using even better filters, including Connotea, Digg, Science Blogs, what-have-you, I am now pointed toward 12 quality papers a week directly relevant to my research. This is not a filter failure–my filters are better than ever. They’re discarding more than ever before. But the quantity of research published has increased so much that even with more powerful filters, there’s more directly relevant information out there that I need to take in. I have no time for papers outside of my own field, not even enough time for the papers within my field.
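The arithmetic of this hypothetical can be laid out as a minimal sketch. The numbers are the made-up ones from the example (10 papers a week of reading time, 4 relevant papers then, 12 now), and the function name `reading_budget` is invented here purely for illustration.

```python
# Hypothetical numbers from the example above; none of them are measured data.
WEEKLY_READING_CAPACITY = 10  # papers one can read in depth per week


def reading_budget(relevant_papers: int, capacity: int = WEEKLY_READING_CAPACITY) -> dict:
    """Split a fixed weekly reading capacity between in-field papers and everything else."""
    in_field_read = min(relevant_papers, capacity)
    return {
        "in_field_read": in_field_read,
        "in_field_unread": relevant_papers - in_field_read,
        "left_for_other_fields": capacity - in_field_read,
    }


then = reading_budget(4)   # 25 years ago: filters surface 4 relevant papers a week
now = reading_budget(12)   # today: better filters still surface 12 relevant papers a week

print("then:", then)
print("now: ", now)
```

With these assumed figures, the earlier case leaves 6 slots for other fields and nothing unread, while the current case leaves no slots for other fields and 2 in-field papers unread each week, even though the filters themselves have not failed.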
That’s what most scientists I know mean by “information overload”. They’re filtering like crazy, but due to the exponential growth in research and journals, there’s more knowledge to assimilate. The solutions available seem to be:
1) Specialization–this is basically the answer I’m being given by those who just say that we merely need to improve our filters and eliminate more material. Doing so means a shallower knowledge of our own field, and a much shallower knowledge of other fields. This is not good for science, and seems contradictory to the cross-disciplinary world that science has become, where the skill set required is much bigger than ever. The more one filters, the more one narrows one’s focus.
2) Spend more time with the literature–this seems to be the approach most scientists are taking, and other parts of their careers and lives are suffering for it. Either their students, their universities or their families end up neglected.
Yes, it’s true, as AJ Cann notes, “every scientist since Aristotle has suffered from information overload,” but the quantity of that overload has grown exponentially. It’s one thing to follow the dozens of labs doing molecular biology in the late 1950s; it’s another to follow the tens (if not hundreds) of thousands of molecular biology labs today. At some point, even the most sophisticated filters become overwhelmed, or at least they return more information than one can read without sacrificing elsewhere. And many are finding this frustrating, finding that it takes away from some other part of their research/lives. Solving the problem with more filters just means more specialization, which is also a sacrifice, and a way toward doing less important, less interesting science.
Posted in Online Tools, Science Publishing, Social Software, Web 2.0 | 8 Comments »

Friday, January 16, 2009 at 10:38 am UTC
Here, I’m going to contend that the amount of high quality (=essential) research has not increased significantly since Aristotle’s day. Arguably, it has declined. How many “me too” papers do you need to read? How many mutations in how many amino acids do you need to spend half an hour reading about (as opposed to pulling the information from a database)?
I can see three alternatives:
1. Give up science and shut yourself away in a cave.
2. Fail.
3. Filter. Filter more, filter smarter.
I’m not suggesting we have all the tools we could use for filtering at the present time, but I am saying that filtering is the ONLY way forward, unless the system changes to stop rewarding people for publishing low impact rubbish. Maybe that’s the answer, maybe we should only be allowed to publish one paper (of specified length) a year (one a career)?
Friday, January 16, 2009 at 10:50 am UTC
Wow. I have to vehemently disagree with that statement. That implies that there is only a constant, limited number of people on earth capable of doing quality research, regardless of the size of the population. That seems ridiculous. Let me put it to you this way–when Nobel Prize winners Andrew Fire and Craig Mello first started working on RNAi, it was easy for them to keep up with all the research on RNAi–there were very few people working on this unproven phenomenon. Once RNAi was established as an incredibly useful technique and interesting biological process, the number of labs working on it increased exponentially. Are you saying that even with this increase, the number of quality results being derived in RNAi research remains exactly the same as it was before the field was fully established? Or are there more quality results to track?
Those are the papers you were already filtering out. Now what do you do with the stack of “non-me too” papers? Again, if you really believe there’s the exact same amount of good science being done today as there was 30 years ago, there’s no point in continuing this conversation. If you accept that more researchers means more total results, and a similar percentage as always are going to be of high quality, then that means a greater amount to assimilate.
Friday, January 16, 2009 at 11:51 am UTC
Both your posts seem pretty spot-on to me, David. I thoroughly enjoyed reading both, and I think you make a very good point about why so few scientists read science blogs.
Friday, January 16, 2009 at 12:20 pm UTC
I still feel you’re conflating volume and quality of research.
Friday, January 16, 2009 at 12:31 pm UTC
While they’re not the same thing, there is a relationship between the two. If there are only 10 labs in the world doing research on subject X, and only 2 of them produce quality results, must that number stay static if the field expands and there are now 10,000 labs working in that field? Is it possible that more than 2 labs could be doing quality work? What about all the students and postdocs who came out of those 2 labs over the years? Could they add to the volume of high quality research being published? Or is there an absolutely constant limited number of quality results that can be produced, regardless of the size of a field?
My argument would be that a small percentage of researchers in any given field produce high quality results. As the field increases, that percentage may remain constant, or even decrease somewhat, but that still results in a larger quantity of good results produced. Which means more to read.
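As a rough illustration of that argument, here is a minimal sketch using the lab counts from the hypothetical above (10 labs with 2 producing quality results, then 10,000 labs). The 1% figure for a much lower quality rate is an assumption added here for illustration, as is the function name `quality_labs`.

```python
def quality_labs(total_labs: int, quality_fraction: float) -> int:
    """Number of labs producing high quality results, given the fraction that do."""
    return round(total_labs * quality_fraction)


# Small field from the hypothetical: 10 labs, 2 of which (20%) do quality work.
small_field = quality_labs(10, 0.20)

# Expanded field of 10,000 labs with the same fraction doing quality work.
large_field_same_rate = quality_labs(10_000, 0.20)

# Expanded field where the fraction has dropped to 1% (an assumed, illustrative rate).
large_field_lower_rate = quality_labs(10_000, 0.01)

print(small_field, large_field_same_rate, large_field_lower_rate)
```

Under these assumptions the count of quality labs goes from 2 to 2,000 at a constant rate, and is still 100 if the rate falls twentyfold, so the amount worth reading grows either way.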
Friday, January 16, 2009 at 4:08 pm UTC
Thanks Maxine. I read a lot of science blogs, but then again, keeping up with the zeitgeist of science and watching how new technologies are used is an important part of my job. These things are obviously going to be lower priorities than doing research for those who are actively at the bench.
Sunday, January 18, 2009 at 2:11 pm UTC
David, I think in this follow-up post you’ve actually made a more sophisticated and science-specific restatement of Shirky’s case, at least as I (thought I) understood it. Where you differ is on the consequences of his conclusion, if I’m reading you and him right. He sees it mainly as a technical problem to be solved (build better filters), whereas you are concerned about the consequences of applying stronger filters.
Any researcher has a limited amount of time to read papers – and we can for convenience’s sake agree that for many researchers, even for the papers they have to be well acquainted with, we have reached saturation. So we either have to give up, or improve filters. It seems there are really only four options:
1. Publish less (simply reduce the amount of science being done until we can cope)
2. Publish smarter – by which I mean make the medium appropriate for the information that people need to get at so they can get at it more efficiently. I’m thinking mainly of mechanisms for e.g. data publication here – I don’t need to read a paper if all I want is a single number, or a single data file, or the details of a method.
3. Filter harder and finer – become more specialized – and I agree this has potentially bad consequences – or at least would require changes in the way the research effort is managed.
4. Distribute the reading process amongst a trusted network and re-aggregate the information in summarised form. Do you really need to personally read all of those papers? Could you get away with an executive summary for, say 50% of them?
Monday, January 19, 2009 at 1:45 pm UTC
Cameron–
I think the comments helped me to clarify my thinking a bit.
You’re generally on the right track here. I think the big difference is that someone like Shirky would assume that better filters would solve the problem, whereas I don’t agree. If the problem is defined as an inability to take in the necessary information, and more filtering means throwing out vital information, then no, it’s not a solution. That’s more of a surrender than a solution.
I think these two are intertwined in some ways. Publishing less is probably not a practical option, given the business models of the vast and expanding number of journals out there (they have to publish something in order to exist), and the need for an expanding number of scientists to have concrete markers of their research achievements (to push for graduation, tenure, grants, etc). Limiting the number of publications would hurt the open access/author pays type journals the most–numbers so far show that they need to publish higher quantities of work with less editorial overhead in order to be financially viable (although for papers that are just indexed data as you suggest, open access would be a great fit).
The semantic technological improvements many are predicting would also hopefully create efficiencies in taking in the literature.
Probably my main point here–more filtering means less-rounded knowledge. Some of the most interesting science is cross-disciplinary, and lessons learned in one area can be highly valuable in another. Cutting yourself off is a bad move.
While I’d agree, the problem is that you can’t know in advance which 50% will prove to be important. Sometimes you won’t know for years. When you read paper B, you put that together with the results of paper A that you read 2 years ago and you’re onto something new and unique. The devil is often in the details, and what one scientist reads into a paper is very different from what another reads. With a background in developmental biology, I would read something very different from the first GFP papers (a vital dye for studying cell migration) than would someone like Roger Tsien (a means of measuring protein:protein interactions via FRET). One key to your comment is the phrase “trusted network”. We need to think more in terms of the groups where scientists work, labs, collaborators and departments, than in the free-for-all worldwide sorts of networks so many are coming up with. If you’re working on that level, online exchanges of information aren’t a big step beyond what scientists already do with lab meetings, journal clubs, etc. They’re just a different presentation method, albeit perhaps a more efficient one. There’s a freedom to speak openly and critically in those small private groups that is lost when they become open public forums. It’s unclear, though, whether that small gain in efficiency is enough to overcome the current expansion of the literature.
There’s also the problem of the time needed to write those executive summaries taking away from either doing more research or reading more papers. How one balances the altruism for the group/community versus the need to advance one’s own career is going to be a problem for a very long time.
There’s a real limit, though, to how much you want to turn over to the group to do for you. I think what many proponents of Web 2.0 for science are underestimating is how much of the nature of science is individual effort. The great breakthroughs, the creative leaps, come from individual minds, not from the hive mind of executive summaries. There’s a level at which, no matter how much you want to automate and distribute things, you still need to put in deep individual intellectual effort to fully understand anything and then to creatively move beyond that. It can’t be avoided, it can’t be fobbed off on others.