Dividing data into groups without creating unwanted clusters of similar elements is of great importance for the analysis of medical data. Psychologists and computer scientists from Heinrich Heine University Düsseldorf (HHU) developed a new method in 2020 to solve this “anticlustering” problem, that is, dividing a data set into groups that are as similar to one another as possible. Together with researchers from the University of California, San Francisco (UCSF), they have now developed an extension that is important for the analysis of high-throughput sequencing data, among other applications. The researchers describe their new tool, applied to the chronic disease endometriosis, in the scientific journal Cell Reports Methods.
Endometriosis is a complex and often painful condition, which affects millions of women worldwide. Tissue similar to the lining of the uterus grows outside the uterus, for example on the ovaries or even on the intestine. The tissue can change over the course of the menstrual cycle.
To investigate the cellular and molecular factors that play a role in the development and severity of endometriosis, multidisciplinary experts from UCSF and Stanford University are analysing data from hundreds of women as part of the ENACT Center, led by Professors Linda C. Giudice and Marina Sirota (UCSF) and Professors Brice Gaudilliere and David K. Stevenson (Stanford). A team headed by UCSF Associate Professor Dr Tomiko T. Oskotsky is leading efforts to ensure robust experimental design for studies involving high-throughput technologies, including single-nucleus RNA sequencing.
The samples have to be processed in batches for technical reasons. If these batches are not carefully balanced – e.g. with regard to disease stage or age of the patients – so-called batch effects can distort the results and ultimately make it difficult to judge whether observed differences have a biological cause or are simply artefacts from the technical process.
This is where anticlustering comes in. Dr Martin Papenberg from the Department of Experimental Psychology and Professor Dr Gunnar Klau, holder of the Chair of Algorithmic Bioinformatics, both from HHU, presented the method in the journal Psychological Methods in 2020. The researchers have made the “anticlust” module available free of charge.
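To make the balancing idea concrete, here is a minimal Python sketch. It is not the anticlust implementation and does not use its interface; the function names `diversity` and `anticluster`, the simple exchange heuristic and the example values are illustrative assumptions. The sketch maximises one common anticlustering criterion, the sum of squared distances between samples that share a batch. Spreading similar samples across batches in this way tends to make the batches resemble one another, for example in disease stage and age.

```python
import itertools
import random


def diversity(features, labels):
    """Sum of squared Euclidean distances between all pairs of samples
    assigned to the same batch. Higher values mean each batch is more
    heterogeneous inside, which tends to make the batches more alike."""
    total = 0.0
    for i, j in itertools.combinations(range(len(features)), 2):
        if labels[i] == labels[j]:
            total += sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
    return total


def anticluster(features, n_batches, seed=0):
    """Illustrative exchange heuristic (an assumption, not anticlust's code):
    start from a random assignment with equal batch sizes, then swap two
    samples from different batches whenever the swap increases diversity."""
    n = len(features)
    rng = random.Random(seed)
    labels = [i % n_batches for i in range(n)]  # batch sizes differ by at most 1
    rng.shuffle(labels)
    best = diversity(features, labels)
    improved = True
    while improved:
        improved = False
        for i in range(n):
            for j in range(i + 1, n):
                if labels[i] == labels[j]:
                    continue
                labels[i], labels[j] = labels[j], labels[i]  # try the swap
                score = diversity(features, labels)
                if score > best:
                    best, improved = score, True  # keep it
                else:
                    labels[i], labels[j] = labels[j], labels[i]  # undo it
    return labels


# Hypothetical samples: (disease stage, age in decades), on comparable scales.
samples = [(1, 3.1), (1, 2.8), (2, 4.0), (2, 3.5), (3, 2.9), (3, 4.4), (4, 3.8), (4, 3.0)]
batches = anticluster(samples, n_batches=2)
```

Because every accepted step is a swap, batch sizes never change. In practice, variables such as age and disease stage would usually be standardised first so that no single variable dominates the distance.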
“Addressing the technical needs of the ENACT team requires, in addition to the previous scope of anticlust, that related samples – such as multiple tissue samples from the same patient – are grouped in the same batch to enable meaningful comparisons to be drawn for individual patients,” says Dr Papenberg, explaining the new challenge, which he was able to solve by developing the so-called “Must-Link Method”.
Professor Klau: “We were able to expand our approach successfully to enable samples that need to remain together to be sorted into one batch, while maintaining a good balance of samples across different batches. This prevents methodological bias and medical colleagues can draw conclusions from the data, which relate specifically to the influences of the genes on endometriosis.”
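The constraint of keeping related samples together can also be sketched in a few lines, again under stated assumptions. The Python code below is not the published Must-Link Method and not the anticlust interface. It is a hypothetical illustration in which all samples from one patient form an indivisible unit. The units are placed greedily into batches of fixed capacity, and then only equally sized units are swapped between batches, so that both the must-link groups and the batch sizes are preserved while the within-batch diversity increases. The function name `anticluster_must_link`, the patient IDs and the greedy start are assumptions.

```python
import itertools
import math
from collections import defaultdict


def diversity(features, labels):
    """Sum of squared Euclidean distances between samples in the same batch."""
    total = 0.0
    for i, j in itertools.combinations(range(len(features)), 2):
        if labels[i] == labels[j]:
            total += sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
    return total


def anticluster_must_link(features, patient_ids, n_batches):
    """Hypothetical sketch: samples sharing a patient ID always end up in one batch."""
    groups = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        groups[pid].append(idx)
    units = sorted(groups.values(), key=len, reverse=True)  # largest groups first

    # Greedy start: put each unit into the batch with the most free places.
    # This simple rule may fail even when a feasible split exists.
    capacity = math.ceil(len(features) / n_batches)
    room = [capacity] * n_batches
    unit_batch = []
    for unit in units:
        b = max(range(n_batches), key=lambda k: room[k])
        if room[b] < len(unit):
            raise ValueError("a patient group does not fit into any batch")
        room[b] -= len(unit)
        unit_batch.append(b)

    def to_labels():
        labels = [0] * len(features)
        for unit, b in zip(units, unit_batch):
            for idx in unit:
                labels[idx] = b
        return labels

    best = diversity(features, to_labels())
    improved = True
    while improved:
        improved = False
        for u, v in itertools.combinations(range(len(units)), 2):
            # Swap only equally sized units in different batches,
            # so batch sizes never change.
            if unit_batch[u] == unit_batch[v] or len(units[u]) != len(units[v]):
                continue
            unit_batch[u], unit_batch[v] = unit_batch[v], unit_batch[u]
            score = diversity(features, to_labels())
            if score > best:
                best, improved = score, True
            else:
                unit_batch[u], unit_batch[v] = unit_batch[v], unit_batch[u]
    return to_labels()


# Hypothetical data: patients P1 and P2 each contributed two tissue samples.
ids = ["P1", "P1", "P2", "P2", "P3", "P4", "P5", "P6"]
feats = [(1, 3.1), (1, 3.1), (4, 4.0), (4, 4.0), (2, 2.9), (3, 4.4), (2, 3.8), (3, 3.0)]
batch_of_sample = anticluster_must_link(feats, ids, n_batches=2)
```

Restricting swaps to units of equal size is a deliberately simple design choice. It guarantees that the constraints always hold, at the cost of exploring fewer possible assignments than a more general method would.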
Professor Oskotsky: “By using anticlust to minimise batch effects through better experimental design, we are confident that the findings in our molecular data genuinely reflect the underlying biology. This approach contributes to gaining new insights into endometriosis and demonstrates how well thought-out computational methods can significantly improve biomedical research.”
The research work was supported by the Eunice Kennedy Shriver National Institute of Child Health & Human Development, one of the National Institutes of Health of the USA, grant ID P01HD106414. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Martin Papenberg, Cheng Wang, Maïgane Diop, Syed Hassan Bukhari, Boris Oskotsky, Brittany R. Davidson, Kim Chi Vo, Binya Liu, Juan C. Irwin, Alexis Combes, Brice Gaudilliere, Jingjing Li, David K. Stevenson, Gunnar W. Klau, Linda C. Giudice, Marina Sirota, Tomiko T. Oskotsky. Anticlustering for Sample Allocation To Minimize Batch Effects. Cell Reports Methods (2025).
DOI: 10.1016/j.crmeth.2025.101137
Dr Martin Papenberg (left) from the Department of Experimental Psychology and Professor Dr Gunnar Kl ...
Copyright: HHU / Nicolas Stumpe