Sister blog of Physicists of the Caribbean. Shorter, more focused posts specialising in astronomy and data visualisation.

Tuesday 20 November 2018

Risky research means failure is always an option

In astronomy we often have to do repeat observations of potential detections to confirm they're real. A good confirmation rate is about 50%. Much less than this and we'd be wasting telescope time on follow-ups, and we'd start to worry that some of the sources we thought were real might not be so secure. Conversely, a much higher fraction would mean the repeat observations were largely redundant, and would imply that we hadn't been as careful in our search as we thought - there'd still be other interesting things hidden in the data that we hadn't seen.
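
To make the trade-off concrete, here's a minimal Monte Carlo sketch of the follow-up process. Everything in it is made up for illustration: a toy exponential distribution of true signal-to-noise ratios, unit Gaussian measurement noise, and arbitrary thresholds - it's not modelled on any real survey.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy population: true signal-to-noise ratios (SNRs) drawn from a steep
# distribution, so faint sources vastly outnumber bright ones.
n_candidates = 100_000
true_snr = rng.exponential(scale=2.0, size=n_candidates)

def follow_up(selection_threshold, confirm_threshold=4.0):
    """Select candidates in a first pass, then re-observe them independently.

    Each 'observation' is the true SNR plus unit Gaussian measurement noise.
    Returns the number selected and the fraction confirmed on re-observation.
    """
    first_pass = true_snr + rng.normal(size=n_candidates)
    selected = first_pass > selection_threshold
    second_pass = true_snr[selected] + rng.normal(size=selected.sum())
    return selected.sum(), (second_pass > confirm_threshold).mean()

for thresh in (3.0, 4.0, 5.0, 6.0):
    n_sel, rate = follow_up(thresh)
    print(f"select at {thresh:.0f} sigma: {n_sel:6d} candidates, {rate:.0%} confirmed")
```

A lenient first-pass cut selects plenty of candidates, but most turn out to be noise fluctuations that fail the repeat observation; a very strict cut confirms almost everything, at the cost of never selecting the fainter real sources at all. In this toy model, as in practice, a confirmation rate around 50% sits between those two failure modes.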

I suggest that something similar holds, to some extent, in psychology. There seems to be a science-wide call for more risky, controversial research. Well, risky, controversial research requires a certain failure rate: if every finding were replicated, that would suggest the research wasn't risky enough; if none of them were, that would imply lousy research practices. The actual replication rate turns out to be, by happy coincidence, about 50%.

But likewise, in astronomy we don't write a paper drawing conclusions from sources we haven't confirmed yet (or at least it's a very bad idea to do so). We wait until we've got those repeat observations. Risky, preliminary pilot studies ought to have a failure rate by definition, otherwise they wouldn't be risky at all. The big "end-result" studies, on the other hand - the ones that are actually used to draw secure conclusions and, in the case of psychology, influence social policy - are exactly the ones whose basic results you'd want on a secure footing.

The Many Labs 2 project was specifically designed to address these criticisms. With 15,305 participants in total, the new experiments had, on average, 60 times as many volunteers as the studies they were attempting to replicate. The researchers involved worked with the scientists behind the original studies to vet and check every detail of the experiments beforehand. And they repeated those experiments many times over, with volunteers from 36 different countries, to see if the studies would replicate in some cultures and contexts but not others.

Despite the large sample sizes and the blessings of the original teams, the team failed to replicate half of the studies it focused on. It couldn’t, for example, show that people subconsciously exposed to the concept of heat were more likely to believe in global warming, or that moral transgressions create a need for physical cleanliness in the style of Lady Macbeth, or that people who grow up with more siblings are more altruistic. And as in previous big projects, online bettors were surprisingly good at predicting beforehand which studies would ultimately replicate. Somehow, they could intuit which studies were reliable.

Maybe anecdotes are evidence, after all... :P

Many Labs 2 “was explicitly designed to examine how much effects varied from place to place, from culture to culture,” says Katie Corker, the chair of the Society for the Improvement of Psychological Science. “And here’s the surprising result: The results do not show much variability at all.” If one of the participating teams successfully replicated a study, others did, too. If a study failed to replicate, it tended to fail everywhere.

Many researchers have noted that volunteers from Western, educated, industrialized, rich, and democratic countries—WEIRD nations—are an unusual slice of humanity who think differently than those from other parts of the world. In the majority of the Many Labs 2 experiments, the team found very few differences between WEIRD volunteers and those from other countries. But Miyamoto notes that its analysis was a little crude—in considering “non-WEIRD countries” together, it’s lumping together people from cultures as diverse as Mexico, Japan, and South Africa. “Cross-cultural research,” she writes, “must be informed with thorough analyses of each and all of the cultural contexts involved.”

Sanjay Srivastava from the University of Oregon says the lack of variation in Many Labs 2 is actually a positive thing. Sure, it suggests that the large number of failed replications really might be due to sloppy science. But it also hints that the fundamental business of psychology—creating careful lab experiments to study the tricky, slippery, complicated world of the human mind—works pretty well. “Outside the lab, real-world phenomena can and probably do vary by context,” he says. “But within our carefully designed studies and experiments, the results are not chaotic or unpredictable. That means we can do valid social-science research.”

https://www.theatlantic.com/science/archive/2018/11/psychologys-replication-crisis-real/576223/

2 comments:

  1. I am not sure I completely follow the reasoning here. Whilst I understand that not every attempt to confirm some result will be successful, I don't understand why successful repetition would imply that previous attempts were not careful. Just because someone wants to revisit your results does not mean they necessarily doubt them, and even if they did, that's how science is supposed to work. Besides, tools and methodologies improve, so revisiting older studies armed with the latest techniques and new information can make a difference.

    Human studies are not necessarily directly comparable, especially when they are not so much about instruments, computation, and/or technical progress. Physical observations tend to be far more technical by nature than human interviews or behavioral studies. I wouldn't be surprised if some of the example hypotheses given above were simply misplaced, and if the failures to reproduce the results were simply down to weak hypotheses. Some of them sound like pretty simplistic or even naïve generalizations. In psychology, few things have simple causes. People are highly multivariate subjects and can't easily be captured by single-variable explanations.

  2. It would definitely be helpful to have a few case studies here. There's just not enough information to say why the replications failed - it could be that some of the research was done in an obviously flawed way, or it could be something else entirely.

