Today I read an interesting post at TextYard, where they’ve built an open-source platform that lets students (and other programmers) scrape bookstore course-to-book data for their own uses. How is this possible? Well, it’s already happening (the post provides examples with links), and it’s not as difficult as one might think, given that the majority of college bookstores use one of six software systems. Open those six systems and you’ve really opened up almost all individual bookstores. As for legality... the guys at TextYard who developed the scrape technique and ported it believe that it is legal (and ethical) and have presented their case.

Data scraping is not a new topic so much as an extension and outgrowth of P2P and file-sharing technology and of conversations about who owns what information, what is sharing, and what is stealing. Now we are talking about providing the middle information access point in code rather than the end product, such as a PDF of a scanned book. My first brush with bookstore scrapes was in 2007 with the company Ugenie. They collected bookstore information and provided it to merchants across the Web so that students could purchase books in a shop-by-course fashion. Almost all of their data came from Follett stores, and they were forced to drop the service after a legal battle with the bookstore giant ensued.

There is no doubt that many sites have been, are, and will continue to be collecting data by scraping college bookstore systems. It’s been an unspoken known for some time, something that just sort of floated around in the ether. While it was seen as enough of a problem that store systems put anti-scraping technology in place, it wasn’t seen as that much of a threat. But now, with TextYard pulling together all of the code, making it open source, and providing documentation, the technology becomes concerted, a much more powerful tool, and thus a much larger threat to the status quo. While I salute TextYard for taking this on and putting it out there, I question what’s in it for them and whether they’re asking for trouble by publicizing the project.


3 Responses to Is Scraping Bookstore Data Wrong? Is It Legal? Is It a Game Changer?

  1. Jeff,

    Thanks for the write-up and for sharing your thoughts. I am confident that it is both ethical and legal to get students unfettered course-book information. That was the intent of the Higher Education Opportunity Act, but the schools, and the chains like Follett and Barnes & Noble that they partner with, have found ways to interpret it so that the spirit of the law is not followed.

    Even if they somehow managed to shut down TextYard.com (which I barely maintain at this point), the source code is now out there and there is nothing they can do to stop the spread of the class-book data… especially if we build an open-source project around keeping these scrapers up to date.

  2. Rich Hershman says:

    Ben is wrong with respect to the Higher Education Opportunity Act, and I have told him this several times over the last two years. I know this because I was directly involved in the drafting and negotiation of the law. The law was written with the intent, in part, of arming students with their course material information. The main intent was to hold faculty more accountable for the cost of the materials they assigned. Institutions are not obligated under HEOA to make this information available to commercial entities like TextYard unless those entities are somehow affiliated with the institution, i.e., they are supporting or offsetting the cost of producing the information.

    The idea that institutions should bear the significant cost of collecting, researching, validating, and posting course material adoption information and then turn it over to commercial entities to exploit was discussed at length in Congress, and it was rejected early on in the House drafting of the language. There is direct language in the law, in the conference report, and in the Congressional Budget Office scoring that does not support Ben’s assertion on HEOA law or intent.
