Diehard Coders Just Rescued NASA’s Earth Science Data
ON SATURDAY MORNING, the white stone buildings on UC Berkeley’s campus radiated with unfiltered sunshine. The sky was blue, the campanile was chiming. But instead of enjoying the beautiful day, 200 adults had willingly sardined themselves into a fluorescent-lit room in the bowels of Doe Library to rescue federal climate data.
Like similar groups across the country—in more than 20 cities—they believe that the Trump administration might want to disappear this data down a memory hole. So these hackers, scientists, and students are collecting it to save outside government servers.
But now they’re going even further. Groups like DataRefugeand the Environmental Data and Governance Initiative, which organized the Berkeley hackathon to collect data from NASA’s earth sciences programs and the Department of Energy, are doing more than archiving. Diehard coders are building robust systems to monitor ongoing changes to government websites. And they’re keeping track of what’s already been removed—because yes, the pruning has already begun.Tag It, Bag It
The data collection is methodical, mostly. About half the group immediately sets web crawlers on easily-copied government pages, sending their text to the Internet Archive, a digital library made up of hundreds of billions of snapshots of webpages. They tag more data-intensive projects—pages with lots of links, databases, and interactive graphics—for the other group. Called “baggers,” these coders write custom scripts to scrape complicated data sets from the sprawling, patched-together federal websites.
It’s not easy. “All these systems were written piecemeal over the course of 30 years. There’s no coherent philosophy to providing data on these websites,” says Daniel Roesler, chief technology officer at UtilityAPI and one of the volunteer guides for the Berkeley bagger group.
One coder who goes by Tek ran into a wall trying to download multi-satellite precipitation data from NASA’s Goddard Space Flight Center. Starting in August, access to Goddard Earth Science Data required a login. But with a bit of totally legal digging around the site (DataRefuge prohibits outright hacking), Tek found a buried link to the old FTP server. He clicked and started downloading. By the end of the day he had data for all of 2016 and some of 2015. It would take at least another 24 hours to finish.