Today I read an interesting post at TextYard, where they’ve built an open-source platform that lets students (and other programmers) scrape bookstore course-to-book data for their own uses. How is this possible? Well, it’s already happening (the post provides examples with links), and it’s not as difficult as one might think, given that the majority of college bookstores use one of six software systems. Open those six systems and you’ve really opened up almost all individual bookstores. As for legality . . . the guys at TextYard who developed the scrape technique and ported it believe that it is legal (and ethical) and have presented their case.
Data scraping is not a new topic so much as an extension and outgrowth of P2P and file-sharing technology and of the conversations about who owns what information, what is sharing, and what is stealing. Now we are talking about providing the middle information access point in code rather than the end product, such as a PDF of a scanned book. My first brush with bookstore scrapes was in 2007 with the company Ugenie. They collected bookstore information and provided it to merchants across the Web so that students could purchase books in a shop-by-course fashion. Almost all of their data came from Follett stores, and they were forced to drop the service after a legal battle with the bookstore giant ensued.
There is no doubt that many sites have been collecting, are collecting, and will continue to collect data by scraping college bookstore systems. It’s been an unspoken known for some time, something that just sort of floated around in the ether, and while it was seen as enough of a problem that store systems put anti-scraping technology in place, it wasn’t seen as that much of a threat. But now, with TextYard pulling together all of the code, making it open source, and providing documentation, the technology becomes a concerted and much more powerful tool, and thus a much larger threat to the status quo. While I salute TextYard for taking this on and putting it out there, I wonder what’s in it for them and whether they’re asking for trouble by publicizing the project.