How to Recover Deleted Coursera and Berkeley Courses from Archive.org

In June 2016, Coursera.org deleted 472 open online courses as it migrated from its old platform to a new one. From March to August 2017, UC Berkeley made all 350+ courses on Webcast.berkeley private. Fortunately, most of these courses have been archived by the Archive Team. This post explains how to download and open the archived courses.

Recovering UC Berkeley courses

Recovering Berkeley courses is easy. Go to this link:

http://www.archiveteam.org/index.php?title=UC_Berkeley_Course_Captures#Status

Find the course you want to watch and click the archive.org link in the right column. You can watch the videos online or download them. Archived course descriptions are available on this page, or you can search Berkeley’s academic guide.
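If you would rather download a whole course than click through each video, the following is a minimal sketch of one way to list the video files of an archive.org item. It assumes the public archive.org metadata endpoint (https://archive.org/metadata/<identifier>) and download URL pattern (https://archive.org/download/<identifier>/<filename>). The function names and the choice to keep only .mp4 files are illustrative assumptions, not part of the Archive Team's instructions.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def fetch_item_metadata(identifier):
    """Fetch the JSON metadata of an archive.org item (requires network access)."""
    with urlopen(f"https://archive.org/metadata/{identifier}") as response:
        return json.load(response)


def video_download_urls(identifier, metadata, extensions=(".mp4",)):
    """Build direct download URLs for files whose names end in one of `extensions`."""
    urls = []
    for entry in metadata.get("files", []):
        name = entry.get("name", "")
        if name.lower().endswith(extensions):
            # quote() escapes spaces and other special characters in file names
            urls.append(f"https://archive.org/download/{identifier}/{quote(name)}")
    return urls
```

The identifier is the last part of the item's archive.org address (the part after /details/). Passing it to fetch_item_metadata and then to video_download_urls should give a list of links you can feed to a download manager.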

Recovering Coursera courses

This Google Doc lists all the removed courses along with their now-dead course links. Course descriptions can be found on MOOC search engines such as Class Central. The removed courses were packed into 50GB files and uploaded to archive.org. Unfortunately, the Archive Team preserved only one third of the removed courses (even though they claim to have preserved all of them). To find out which courses were preserved, you need to download these files and extract them, which can be cumbersome.
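One way to peek inside a downloaded file without extracting it is to scan its record headers. The sketch below uses only the Python standard library and assumes the file is an ordinary gzipped WARC in which each record carries a WARC-Target-URI header. The function names are hypothetical, and a payload line that happens to start with the same text would be miscounted, so treat the result as a rough overview. Scanning a 50GB file this way is still slow.

```python
import gzip
import sys
from collections import Counter
from urllib.parse import urlsplit


def iter_target_uris(warc_gz_path):
    """Yield the WARC-Target-URI value of each record in a gzipped WARC file."""
    prefix = b"WARC-Target-URI:"
    # gzip.open reads through concatenated gzip members transparently
    with gzip.open(warc_gz_path, "rb") as stream:
        for line in stream:
            if line.startswith(prefix):
                uri = line[len(prefix):].strip().decode("utf-8", "replace")
                # some WARC writers wrap the URI in angle brackets
                yield uri.strip("<>")


def count_hosts(warc_gz_path):
    """Count how many records were captured from each host name."""
    hosts = Counter()
    for uri in iter_target_uris(warc_gz_path):
        hosts[urlsplit(uri).netloc] += 1
    return hosts


if __name__ == "__main__" and len(sys.argv) > 1:
    for host, count in count_hosts(sys.argv[1]).most_common(20):
        print(count, host)
```

Printing the URIs themselves instead of the host counts shows the course names embedded in the paths, which helps decide whether a full extraction is worth the time.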

 

Easier alternatives

About 50% of the removed courses can be found from other sources:

  • A clone of the Coursera course on the professor’s personal website, YouTube channel, or a dedicated course website. This happens more often for technology-related courses.
  • BitTorrent or file hosting services.
  • Uploads by past students on YouTube or GitHub.
  • The course has migrated to another MOOC platform.
  • The course is still on Coursera but under a different title, often remodeled as a specialization.
  • Courses by the University of Wisconsin can be found on an official webpage (click “10 channels” for a better view).

Recover from archive.org

It’s harder to browse the contents using this method. Use a reliable download manager to reduce errors. To download the files that might contain the courses of interest, go to this link:

https://gist.github.com/mihaitodor/b0d8c8dd824ab936c057508edec377ad

Each removed course is listed in one or two of the files, but only one third of the courses are actually preserved in at least one file. The other two thirds were not archived. To extract the files after downloading, you need the following tools:

  • Download and install Python 3
  • Download Warcat by clicking the green button

Installing Python 3 should be straightforward. To install Warcat and extract a file on a Windows PC, either follow the instructions on the Warcat webpage (which may contain typos) or proceed as follows:

  1. Search for “cmd” in the Start menu and run it.
  2. Type pip install warcat into the window and press Enter.
  3. Type py -m warcat --help and press Enter to view the help text. If the help text is displayed, Warcat was installed successfully.
  4. Assuming your file is stored at d:\download\filename.megawarc.warc.gz, type cd /d d:\download and press Enter.
  5. Type py -m warcat extract filename.megawarc.warc.gz --output-dir /coursera/ --progress and press Enter to extract the file into a folder named coursera. The process should take less than 30 minutes.
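If you downloaded several files, the extraction step can be repeated from a small script instead of typing the command each time. The following is a minimal sketch that assumes Warcat is already installed for the Python interpreter running the script. It calls the same extract command shown in step 5. The function names and the one-folder-per-file layout are illustrative choices, not something Warcat requires.

```python
import subprocess
import sys
from pathlib import Path

SUFFIX = ".megawarc.warc.gz"


def build_extract_command(warc_path, output_dir):
    """Return the Warcat command line from step 5 as an argument list."""
    return [sys.executable, "-m", "warcat", "extract", str(warc_path),
            "--output-dir", str(output_dir), "--progress"]


def plan_extractions(download_dir, output_root):
    """Pair every downloaded file with its own output folder under output_root."""
    plan = []
    for warc_path in sorted(Path(download_dir).glob("*" + SUFFIX)):
        target = Path(output_root) / warc_path.name[:-len(SUFFIX)]
        plan.append((warc_path, target))
    return plan


def extract_all(download_dir, output_root):
    """Run Warcat once per downloaded file, stopping at the first failure."""
    for warc_path, target in plan_extractions(download_dir, output_root):
        target.mkdir(parents=True, exist_ok=True)
        subprocess.run(build_extract_command(warc_path, target), check=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    extract_all(sys.argv[1], sys.argv[2])
```

Saved under a name of your choice (for example extract_all.py, a hypothetical name), it could be run as py extract_all.py d:\download d:\coursera.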

In the extracted folder, almost all of the courses are in the folder named d396qusza40orc.cloudfront.net. The remaining folders take up 40% of the space and contain either low-resolution versions of the same videos or files irrelevant to the courses. You can use a space visualization tool like SpaceSniffer to see what is occupying most of the space.
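If you prefer not to install a separate visualization tool, a short Python script can give a rough version of the same overview. This is a minimal sketch using only the standard library. The function names are hypothetical, and the script only totals the size of each top-level folder in the extracted directory.

```python
import os
import sys


def directory_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def sizes_by_subfolder(extracted_root):
    """Return (folder name, size in bytes) pairs, largest folder first."""
    sizes = {}
    with os.scandir(extracted_root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes[entry.name] = directory_size(entry.path)
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, size in sizes_by_subfolder(sys.argv[1]):
        print(f"{size / 1024 ** 3:8.2f} GB  {name}")
```

Folders that turn out to hold only duplicates or unrelated files can then be deleted to reclaim space.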

For advanced users, the original archive.org page that contains more metadata files is here.

Downloading current Coursera courses

Coursera has deleted at least another 60 courses since June 2016. To download current Coursera courses before they disappear, you can use a different Python tool:

https://github.com/coursera-dl/coursera-dl

The instructions on the page are detailed and work as described.

The author has another tool for downloading edX courses. EdX has closed over 100 courses so far but many of them are remodeled under new titles. Also, edX rarely cuts course access (for enrolled students) when courses become indefinitely closed and/or unlisted.

Downloading current FutureLearn courses

Since March 2017, FutureLearn has let only paying students access course materials after the sessions are over. There is another Python tool for downloading current FutureLearn courses, but it has some bugs and is no longer being updated:

https://github.com/mjbright/futurelearn-dl

Other dead or endangered courses

With the decline of traditional open courses, many opencourseware platforms have closed, including those of NYU, Berkeley, and Notre Dame. Others have stopped adding new materials, including Harvard, Stanford (on iTunes U), and Yale. NYU’s past open courses are still on its YouTube channel. UC San Diego posts many public lecture webcasts online, but most of them are deleted shortly after the quarter is over or within a year.

 
