Background:
Many studies of human microbiomes have demonstrated the great interpersonal variability of microbial communities, as well as the potential for specific aspects of the microbiome to be uniquely tied to an individual. The significance of this study lies both in criminal forensic applications and in privacy concerns for individuals who participate in microbiome research studies. Despite criminal forensics’ history of personal identification through fingerprints, DNA, and blood type, microbial data has not yet been established as a method of personal identification.
The scientific community has already begun brainstorming how microbial data could be leveraged for forensic use. Using knowledge of how an individual’s microbiota changes depending on diet, lifestyle, medication, and pathology, forensic analysts may be able to trace suspects from their bacterial sheddings at a crime scene. Even without direct identification, this lifestyle information could assist in the apprehension of an assailant (Hampton-Marcell et al. 2017). This study’s purpose is to investigate the capabilities of “fingerprinting” individuals using their microbiome. Microbial fingerprinting (MF) will be defined as using a set of microbial data to trace and identify a unique individual from a larger population. The main benefit of microbial fingerprinting in forensics would be suspect identification when human DNA is not usable. This advantage comes from the resilience of bacterial DNA, which is not as easily destroyed as human DNA (Nema 2018). While researchers in the past have used metagenomic shotgun sequencing to identify microbial populations, they found that increases in data set size decreased the efficacy of this profiling method (Segata et al. 2012). For this reason, the study uses the method described by Segata et al. (2012), in which clade-specific marker genes are used to identify microbial clades in larger data sets.
Central question:
Can individuals be uniquely identified using solely their microbial data?
Evidence:
The researchers (Franzosa et al. 2015) constructed metagenomic codes that were unique to each subject. Each code was a set of microbial features chosen by the authors to serve later as an identifier for that individual. Four different kinds of metagenomic features were used to construct these codes: bacterial abundance (from two different sequencing methods) and species-specific marker genes (from two different databases). The researchers collected samples from subjects across multiple visits and cross-referenced the later samples against the constructed codes, using “hitting set” scripts that they developed. Results showed that certain metagenomic features making up the codes were more stable over time than others. The most stable were the marker gene-based codes, which were not only robust over time but also remained unique as population size increased.
The researchers conducted a statistical analysis of the observed false positive occurrence, as well as the probable occurrence in larger populations. The analysis showed that codes built from the marker gene-based feature in stool were the most resistant to false positives. This was due to the feature’s stability against loss of uniqueness over time; these codes showed only a 2% false positive probability. The authors determined that the stool microbiota’s stability contributes to the maintenance of unique features over time, as well as to the inhibition of new feature acquisition. The observed stability led the authors to conclude that the human microbiome is stable enough to identify a unique individual when properly targeted codes are used for identification. It is important to note that the researchers did say they would expect a much greater possibility of false positives in a more realistically sized human population.
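To make the idea of a “hitting set” code more concrete for myself, here is a minimal Python sketch. It is not the authors’ script. It assumes each sample can be reduced to a set of feature names (for example, detected marker genes), and it uses a simple greedy heuristic that I chose for illustration. The function names (`build_code`, `matches`, `identify`) and the toy features `m1` to `m6` are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Set


def build_code(subject: str, profiles: Dict[str, Set[str]]) -> Optional[FrozenSet[str]]:
    """Greedily pick a small set of the subject's features that no other subject fully shares.

    For every other subject we list the features that `subject` has and they lack.
    A valid code must "hit" (contain at least one feature from) each of those lists.
    """
    own = profiles[subject]
    unhit = {other: own - feats for other, feats in profiles.items() if other != subject}
    if any(not diff for diff in unhit.values()):
        return None  # someone else carries every feature this subject has; no code exists

    code: Set[str] = set()
    while unhit:
        # Choose the feature that rules out the most remaining subjects.
        # Sorting first makes ties resolve alphabetically, so results are repeatable.
        best = max(sorted(own - code),
                   key=lambda f: sum(f in diff for diff in unhit.values()))
        code.add(best)
        unhit = {other: diff for other, diff in unhit.items() if best not in diff}
    return frozenset(code)


def matches(code: FrozenSet[str], sample: Set[str]) -> bool:
    """A sample matches a code when it contains every feature in the code."""
    return code <= sample


def identify(sample: Set[str], codes: Dict[str, FrozenSet[str]]) -> List[str]:
    """Return every subject whose code is fully present in the sample."""
    return sorted(name for name, code in codes.items() if matches(code, sample))


# Toy first-visit profiles (invented data, not from the paper).
visit1 = {
    "A": {"m1", "m2", "m3"},
    "B": {"m2", "m3", "m4"},
    "C": {"m1", "m3", "m5"},
}
codes = {name: build_code(name, visit1) for name in visit1}

# Toy second-visit samples: A is stable, B lost a feature, C gained one.
visit2 = {
    "A": {"m1", "m2", "m6"},
    "B": {"m2", "m3"},
    "C": {"m1", "m2", "m5"},
}
results = {name: identify(sample, codes) for name, sample in visit2.items()}
```

In this toy data, subject A’s second sample matches only A’s code (a correct identification), subject B’s matches nothing because a code feature was lost, and subject C’s matches both A’s and C’s codes because C acquired a new feature. That last case is a false positive, which is why the stability of the features over time matters so much in the paper.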
Although this appears to limit the usefulness of microbiome data in forensics, the researchers do lay out future technological advancements that could revive the possibility. Even with forensic applications not yet within reach, the study still illustrates potential issues with microbiome studies. In the authors’ discussion, the focus shifts from forensics to privacy concerns in microbiome research studies. A direct quote from the study states, “This finding has important ethical ramifications for microbiome study design, particular those involving stool, as we have shown conclusively that metagenomic samples from a variety of body sites can be linked to individuals without additional identifying information.” The researchers claim that, while it is not 100% accurate, microbial data is still specific enough to be considered identifiable information. This would require the establishment of new regulations on how microbial data is handled in labs, and by whom. Individual identification through microbial data would be of specific concern in cases where sample collection is linked to sensitive phenotypes, including but not limited to health status, sexually transmitted infections, and cohabitation.
My questions:
How can we, in light of this information, protect the identity of microbiome research subjects moving forward? Will stricter regulations be developed on the security of microbiome information within the lab? The researchers proposed in the discussion section that microbial fingerprinting’s accuracy might be increased through technological advances in the detection of rare features. I wondered how single nucleotide polymorphism (SNP) detection technology has changed since the paper was published in 2015, which brought me to a segue for the “next step”. I believe it would be beneficial to repeat this study, but incorporate the SNP analysis approach outlined in a very recent study from Pirmoradi et al. (2020). That study, published in September 2020, analyzed the efficiency of a proposed SNP data analysis method. The results indicated that the method’s efficiency across multiple different samples ranged from 94.4% to 100%. Using this new technology, false positive probability could be greatly reduced, which could strengthen the case for incorporating microbiome data in forensics.
Overall, I found this type of data and analysis difficult to read. It contained a lot of computer science, which is outside the scope of my education. It did, however, make me more interested in the computer science aspects of bioinformatics. I wonder how complex this paper might read to a computer science major, and whether they would find the specific scripts the researchers ran simple, complex, or somewhere in the middle.
Further reading:
- What are SNPs?: This is a YouTube video explaining what a SNP (single nucleotide polymorphism) is. SNPs are referenced in the paper, as well as in the discussion of how the research might continue with current technological developments.
- How the Microbiome is used in Forensics: This article describes some of the existing trends in the use of microbial data in criminal forensics. Special attention should be paid to the discussion of the skin microbiome’s role in forensics, as that is most relevant to the article discussed in the blog post.
- How the Microbiome has raised privacy concerns: This is an article from a scientific news journal highlighting privacy concerns of microbial data. The article even refers to the [gut] microbiome as a “gut print”, and discusses the possibility of identifying specific individuals from gut samples.
References:
- Franzosa, E. A., Huang, K., Meadow, J. F., Gevers, D., Lemon, K. P., Bohannan, B. J. M., & Huttenhower, C. (2015). Identifying personal microbiomes using metagenomic codes. Proceedings of the National Academy of Sciences, 112(22), E2930–E2938. doi: 10.1073/pnas.1423854112
- Hampton-Marcell, J. T., Lopez, J. V., & Gilbert, J. A. (2017). The human microbiome: an emerging tool in forensics. Microbial biotechnology, 10(2), 228–230. doi: 10.1111/1751-7915.12699
- Nema, V. (2018). Microbial Forensics: Beyond a Fascination. DNA Fingerprinting: Advancements and Future Endeavors, 295–306. doi: 10.1007/978-981-13-1583-1_17
- Pirmoradi, S., Teshnehlab, M., Zarghami, N., & Sharifi, A. (2020). A Self-organizing Deep Auto-Encoder approach for Classification of Complex Diseases using SNP Genomics Data. Applied Soft Computing, 106718. doi: 10.1016/j.asoc.2020.106718
- Segata, N., Waldron, L., Ballarini, A., Narasimhan, V., Jousson, O., & Huttenhower, C. (2012). Metagenomic microbial community profiling using unique clade-specific marker genes. Nature Methods, 9(8), 811–814. doi: 10.1038/nmeth.2066