Stromatolites are layered sediment structures built by microbes. Paleontologists had believed they started disappearing about 560 million years ago, once those tiny organisms started getting eaten en masse by newly evolving multicellular creatures. But after deploying their new tool, researchers found a stronger correlation with seawater chemistry: specifically, whether the carbonate mineral dolomite was abundant. They looked not just for where stromatolites showed up in rocks across history, but also for where they could have appeared and didn’t.
The researchers built two systems to collect and parse the colossal range of data. The first was GeoDeepDive, a digital library that could rapidly read millions of papers and pluck out particular nuggets. The massive computing power it requires is supplied by UW-Madison’s Center for High Throughput Computing and its HTCondor system. The second, Macrostrat, is a database that tracks the geological properties of North America’s upper crust at different depths and across time.
The undertaking started when one of the study’s authors, Julia Wilcots, then an undergraduate at Princeton, took it on as a summer project in 2015. By the end, it had pioneered a new way to parse a massive volume of academic publications and pick out particular references.
“Doing this study without GeoDeepDive would be all but impossible,” the study’s first author, Shanan Peters, told UW-Madison’s newsroom. “Reading thousands of papers to pick out references to stromatolites, and then linking them to a certain rock unit and geologic period, would take an entire career, even with Google Scholar. Here we got started with a talented undergrad working on a summer project. GeoDeepDive has greatly lowered the barrier to compiling literature data in order to answer many questions.”