Former rector Sijtsma: Turn to statistician to fight fraud and sloppiness
As interim Dean of the Tilburg School of Social and Behavioral Sciences (TSB), Professor Klaas Sijtsma witnessed the Stapel affair up close. In its wake, he wrote a book aimed at preventing both academic fraud and clumsy but inadvertent mistakes, Never Waste a Good Crisis: Lessons Learned from Data Fraud and Questionable Research Practices. It has just been released by Routledge.
Why this book?
“Without the Stapel affair I would never have written this book. In it, I draw on my management experience and on what I as a statistician know about the use of statistics. The drama that unfolded in September 2011 shook the entire Dutch academic community to its core, and the international community was affected as well. Stapel, a scientist and the Dean, was summarily dismissed and the Rector Magnificus, Philip Eijlander, asked me to become interim Dean of TSB. It was not what I wanted; I was Vice-Dean for Research at the time and I had no desire to become Dean, but in the face of this emergency I felt I could not decline.
I had come across academic fraud before, albeit indirectly. In Groningen, in 1989, the Kingma affair caused great turmoil, and I witnessed how my former PhD supervisor struggled with it. He kept me posted and I think that’s what made me more alert to other cases, like those of Buck in Eindhoven and Diekstra in Leiden, both in the 1990s. These cases made me fully aware of the havoc academic fraud wreaks.
People are capable of coping with major upsets by simply carrying on
When it broke, the Stapel affair spelt disaster for the School and for the university. All credit to Rector Magnificus Philip Eijlander for managing the whole sorry affair the way he did. In dealing with the situation, we both took control and I became his right-hand man at TSB, because at the end of the day, a Dean is but a cog in the wheel. What I think proved decisive for TSB was that nearly all staff, once they had recovered from the initial shock, carried on and kept the ship on course. That response is similar to how people reacted a few years ago, when the impact of the Covid-19 pandemic became clear. People are capable of coping with major upsets by simply carrying on.
In all large organizations there will be people who go beyond the pale
Even if the beginning was disastrous, I realized that as an organization we should not allow ourselves to get carried away by the affair; in all large organizations there will be people who go beyond the pale. At the time, the Dutch universities had 55,000 staff and that number alone makes it inevitable that there will occasionally be trouble. That is a statistical probability. But the fallout of this affair was enormous. In nearly one hundred book chapters and papers published in journals, including Science, data and research had been tampered with. And to my knowledge there are no other cases of a supervisor making their unsuspecting PhD students complicit in fraud. A very grave situation.
In the summer of 2012, when the final report on the affair had yet to be released, I was on holiday in the US, and one night, when we were staying in a log cabin in a nature reserve, I thought to myself: I should start writing things down. And so I did, partly because of a lecture I was due to give at Carnegie Mellon University in Pittsburgh that fall, to which I had been invited as a manager and statistician who had dealt with data fraud. My lecture notes laid the foundation for this book. It was a ‘workplace accident’ that brought it on and I felt we ought to learn from it: never waste a good crisis.”
As a statistician I know that many researchers struggle with statistics. Many honest mistakes are made.
Stapel had fabricated data and he had confessed, but what about honest statistical mistakes?
“As a statistician I know that many researchers struggle with statistics. Many honest mistakes are made; there is no evil intent, it’s just that mistakes are easy to make. There is a difference between fraud and clumsy use of statistics. Fraud comes in three main forms. The first is falsifying data: changing genuine data to achieve a desired result. The second is fabricating data: inventing entire data sets for the same purpose. The third is plagiarism: copying other people’s work without source references. That kind of fraud is also wrong, but scientifically not as bad as the other two forms, because the result still checks out. All three forms are intentional and must be opposed.
Then there’s the use of statistics that is flawed, incorrect, but not intentionally so. This clumsy but harmful use of statistics occurs because most researchers have not been trained as statisticians but must use statistics. It’s a situation we’re used to, but it really should give us pause. Statisticians are working hard to design better software and to make researchers aware of how easy it is to make mistakes. Kudos to them for their commitment, but it doesn’t go to the heart of the problem. Real solutions must come from executive measures.”
Not publishing research data is completely at odds with the scientific tenet that everything must be verifiable
Universities, ours included, are taking measures left, right, and center to ensure data are used carefully. Do these not suffice?
“These measures often concern students, PhD students, course participants, discussion groups, and PhD supervisors. That’s good, but the question is: what impact do these measures have on the workplace? How many data are actually published? Not all that many. Ten to fifteen years ago, scientists did not publish their data as a matter of course; most often they would simply keep them on their own computers. Today, there is a growing awareness that data ought to be publicly available. Universities could make it a condition for researchers to act accordingly, for example by including a data release obligation in their contracts. In the corporate sector, it is quite common for companies to own all research data, and universities could take a leaf from that book. If something were then to go wrong, it would be easier to take steps. It’s a good thing that the number of initiatives is growing, for instance in the area of open science. I’m not cynical about how things are going, but sometimes management should exert a little pressure; otherwise things will remain optional, as they tend to do at universities.
I do recognize that there is a privacy issue: people must be protected. Their personal data cannot be disclosed just like that. At the same time, it is important to realize that not publishing research data is completely at odds with the scientific tenet that everything must be verifiable. This is a core principle and an urgent one. It won’t do to take someone’s word for a claim simply because they look reliable. So even if sharing data is sometimes hard, it is something to strive for, not to be obstructed on the basis of self-serving arguments.”
If my car doesn’t work, I take it to a repair shop. Data processing issues should be resolved in a statistics workshop, by statisticians
What do you think needs to be done?
“First of all, all research data must be publicly available. That accords with the essence of science: when someone makes a claim, others should be able to review the underlying data. Sadly, all too often no such data review is possible: there might be privacy issues, but researchers sometimes claim they have lost the data or don’t have access to them, and sometimes it’s a simple case of unwillingness to share. And some researchers are concerned that others might use the data for publications of their own, but as there are practical solutions to that particular issue, it is not a good reason to block release of the data.
Secondly, we should acknowledge that many researchers are not competent statisticians. Psychologists, sociologists, health researchers, biologists – to them, statistics is a course they take. But can anyone who has studied chemistry for only a year honestly claim to be an accomplished chemist? Those who are competent often see statistical errors straightaway. Those who are not take what they see for granted because everybody they know does.
The best thing would be this: if the going gets tough, call in a statistician. That may not feel like an obvious step, but if my car doesn’t work, I take it to a repair shop. Data processing issues should be resolved in a statistics workshop, by statisticians. So researchers should be sure to have a statistician join their research team.”
Date of publication: 28 November 2023