DupChecker

Meta-analysis has become a popular approach for high-throughput genomic data analysis because it can often substantially increase the statistical power to detect biological signals or patterns across datasets. However, when meta-analysis draws on publicly available databases, duplicated samples are a frequently encountered problem, especially for gene expression data. Failing to remove duplicates can make study results questionable. We developed an R package, DupChecker (https://github.com/shengqh/DupChecker), that efficiently identifies duplicated samples by generating MD5 fingerprints of the raw data files.
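To illustrate the general fingerprinting idea, the following is a minimal Python sketch. It is not DupChecker's R code or API. It hashes each raw data file with MD5 and groups together the files that share a digest. The function names (md5_of_file, find_duplicates, demo) and the CEL-style file names are hypothetical. The sketch assumes that duplicated samples are stored as byte-identical files. Files that differ in any byte are not flagged, and that includes gzip-compressed copies whose headers differ.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Group files by MD5 fingerprint and keep only groups containing 2+ files.

    Assumption: duplicates are byte-identical raw files.
    """
    groups = defaultdict(list)
    for p in paths:
        groups[md5_of_file(p)].append(str(p))
    return {digest: files for digest, files in groups.items() if len(files) > 1}


def demo():
    """Hypothetical example: two of three raw files share identical content."""
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "GSM_A.CEL").write_bytes(b"intensity data 1")
        (d / "GSM_B.CEL").write_bytes(b"intensity data 2")
        (d / "GSM_A_copy.CEL").write_bytes(b"intensity data 1")
        dups = find_duplicates(sorted(d.iterdir()))
        # Report bare file names, since the temporary directory is deleted on exit.
        return [sorted(Path(f).name for f in files) for files in dups.values()]


if __name__ == "__main__":
    print(demo())
```

Hashing the raw bytes means only one fixed-length digest per file needs to be compared, rather than whole files pairwise. In this demo, GSM_A.CEL and GSM_A_copy.CEL are reported as one duplicate group.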

Manuscript https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-323