Project aims to make LibGen, with 33 terabytes of scientific papers, much more stable
by Matthew Gault / Dec 2 2019

“It’s hard to find free and open access to scientific material online. The latest studies and current research huddle behind paywalls unread by those who could benefit. But over the last few years, two sites—Library Genesis and Sci-Hub—have become high-profile, widely used resources for pirating scientific papers. The problem is that these sites have had a lot of difficulty actually staying online. They have faced both legal challenges and logistical hosting problems that have knocked them offline for long periods of time. But a new project by data hoarders and freedom of information activists hopes to bring some stability to one of the two Pirate Bays of science.

Library Genesis (LibGen) contains 33 terabytes of books, scientific papers, comics, and more in its scientific library. That’s a lot of data to host when countries and science publishers are constantly trying to get you shut down. Last week, redditors launched a project to better seed, or host, LibGen’s files. “It’s the largest free library in the world, servicing tens of thousands of scientists and medical professionals around the world who live in developing countries that can’t afford to buy books and scientific journals. There’s almost nothing else like this on Earth.”

“They’re using torrents to fulfill World Health Organization and U.N. charters. And it’s not just one site index—it’s a network of mirrored sites, where a new one pops up every time another gets taken down,” user shrine said on Reddit. Shrine is helping to start the project. Two seedbox companies (services that provide high-bandwidth remote servers for uploading and downloading data), including UltraSeedbox, stepped in to support the project. A week later, LibGen is seeding 10 terabytes and 900,000 scientific books thanks to help from the seedbox companies, including UltraSeedbox.

LibGen also teamed up with another massive online archiving project, The-Eye, to facilitate the tracking, storage, and seeding of LibGen’s scientific archive. The-Eye is run by a user named -Archivist, who has previously tried to archive a petabyte of porn and the entirety of Instagram, and has archived 80 gigabytes of Apple videos deleted by YouTube in addition to the terabytes of data archived on The-Eye, which include conspiracy theory documents, old software, video game ROMs, books, and a lot more.

“We’re not only trying to get the Library Genesis main collection torrents healthier, but also trying to get the complete collection so that The-Eye can properly back it up AND distribute it out in all its glory,” shrine said on Reddit. “There is currently no one doing that, so I think it’s a big step towards keeping the collection safe as well as making it available to more developers who want to do something with the collection.”

Library Genesis is powered by Sci-Hub, an embattled website that provides users free access to scientific papers. Created in 2011 by hacker and scientific researcher Alexandra Elbakyan, Sci-Hub scrapes papers from behind the paywalls of the world’s scientific journals and posts them for free online. Governments and private companies have repeatedly attempted to shut down Sci-Hub and sue Elbakyan, but the site remains.”

Archivists Are Saving the History of Internet Piracy
by Karl Bode / Dec 13 2019

“An ongoing project to catalog the history of piracy has just topped the 300 gigabyte mark, with a goal of offering a searchable index of more than 5 terabytes of piracy-related metadata once complete. It is now the largest searchable index of piracy metadata in the history of the internet. The “warez” scene is as old as the internet itself. From the earliest days of BBS (bulletin board systems) to the rise of BitTorrent, the piracy community is as vibrant as any on the internet. From the ASCII and other art included in the .nfo files that accompany group releases, to piracy group logos and brands, there are decades of residual data documenting the rise and fall of an ocean of different groups and subcultures that might otherwise be lost to the sands of time.


Enter The Eye: a pet project of a man who calls himself the Archivist, whose obsession with cataloging the ever-shifting, impermanent history of the internet has ranged from archiving a petabyte of porn and the entirety of Instagram to preserving 80 gigabytes of old Apple videos deleted by YouTube. The Archivist told Motherboard his efforts are funded entirely by community donations. “These files contain unique artworks, information about scene groups, the trials and tribulations of those groups be it inter-personal feuds, issues with being raided by law enforcement, law trying to infiltrate the groups, how the groups acquire media, how they crack games and software, how they work on early movie releases to get them looking the best they can for wider release and so on,” he said. “Without archives like this so much history of a huge online world vanishes and that’s simply not acceptable,” he added.

As with the YouTube metadata and Instagram archives, the Archivist says the biggest obstacle to tracking and cataloging such content is the sheer volume of files involved. The initial release of this latest piracy dataset included upwards of 13,000,000 files, and while the total size of this metadata was just under 400 GB, organizing them remains extremely time consuming. “The most recent milestone in this endeavour came only yesterday, when I finally finished the unpacking and compressing of 4,000,000+ SRR files from—a site which now tries their best to thwart scraping,” he said. “This addition comes in at a fair 1.2TB and is still not everything from the site; I’ve yet to grab files released after February of this year.”

A moderate level of piracy can have a positive impact on the bottom line for both manufacturer and retailer – and not at the expense of consumers – finds a new study. Because piracy can affect pricing power of manufacturer and retailer, it injects "shadow" competition into a monopolistic market.

Other internet archival efforts tend to get far more attention, in large part because the press treats piracy as the black sheep of the internet. When piracy is mentioned, it’s usually portrayed by analysts and the media exclusively as a nefarious, irredeemable phenomenon. But it’s not that simple. Data suggests that piracy is better viewed as an expression of consumer dissatisfaction. Studies indicate that piracy can often act as a form of “invisible competition,” prompting everyone from the cable TV sector to the video game industry to try a little harder, be it offering better streaming TV services, or backing off obnoxious DRM or monetized DLC.

Regardless of one’s opinion on piracy itself, the surrounding internet subculture’s long history—estimated to have begun somewhere around 1975—is well worth preservation. That said, piracy isn’t the only thing the Archivist and the folks at The Eye are working on. “Recently YouTube is back at their bullshit again removing or forcing the mass removal of content as well as straight up undeniable censorship of opinion, so myself and friends at The-Eye are working on a service to take care of this issue to the best of our underfunded abilities,” he said. The fruit of those efforts should pop up next week, when the website is expected to release more than 10 billion YouTube video metadata files and launch a new service that should allow the public to lend a hand in organizing vast troves of data.”



