In June 2016, Coursera.org deleted 472 open online courses as it migrated from an old system to a new system; from March to August 2017, UC Berkeley made all 350+ courses on Webcast.berkeley private. Fortunately, most of these courses have been archived by the Archive Team. This post provides the instructions for downloading and opening the archived courses.
Recovering UC Berkeley courses
Recovering Berkeley courses is easy. Go to this link:
http://www.archiveteam.org/index.php?title=UC_Berkeley_Course_Captures#Status
Find the course you want to watch and click the link to archive.org in the right column. You can watch the videos online or download them. Archived course descriptions are on this page, or you can search Berkeley’s academic guide.
Recovering Coursera courses
This Google Doc lists all the removed courses with dead course links. Course descriptions can be found on MOOC search engines such as Class Central. The removed courses were packed into 50 GB files and uploaded to archive.org. Unfortunately, the Archive Team preserved only 1/3 of the removed courses (even though they claim to have preserved all of them). To find out which courses were preserved, you need to download these files and extract them, which can be cumbersome.
Easier alternatives
About 50% of the removed courses can be found from other sources:
- A clone of the Coursera course on the professor’s personal website, YouTube channel, or a dedicated course website. This happens more often for technology-related courses.
- BitTorrent or file hosting services, which carry about 150 courses.
- YouTube or GitHub uploads by past students.
- The course has migrated to other MOOC platforms.
- The course is still on Coursera but uses a different title, often remodeled as a specialization.
- Courses by University of Wisconsin are found on an official webpage (click “search channels” for a better view).
Recover from archive.org
It’s harder to browse the contents using this method. Use a reliable download manager to reduce errors. To download the files that might contain the courses of interest, go to this link:
https://gist.github.com/mihaitodor/b0d8c8dd824ab936c057508edec377ad
Each removed course is listed in one or two of the files, but only 1/3 of the courses were actually preserved in at least one file, and it’s hard to know which unless you download and extract the file. The other 2/3 of the courses were not archived. To extract the files after downloading, you need two tools: Python 3 and Warcat.
Installing Python 3 should be straightforward. To install Warcat on a Windows PC, either follow the instructions on the Warcat webpage (which may contain typos) or proceed as follows:
- Search for “cmd” in the Start menu and run it.
- Type pip install warcat into the window and press Enter.
- Type py -m warcat --help and press Enter to view the help. If the help text is displayed, Warcat was installed successfully.
- Assuming your file is stored at d:\download\filename.megawarc.warc.gz, type cd /d d:\download and press Enter.
- Type py -m warcat extract filename.megawarc.warc.gz --output-dir /coursera/ --progress and press Enter to extract the file into a folder named coursera. The process should take less than 30 minutes.
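If you downloaded several files, typing the extract command for each one gets tedious. The following is a minimal sketch, not part of Warcat itself, that builds the same extract command shown in the steps above and runs it for every *.megawarc.warc.gz file in a folder. The function names and the example folder paths are made up for illustration, and the sketch assumes Warcat is installed for the same Python interpreter that runs the script.

```python
import subprocess
import sys
from pathlib import Path


def build_extract_command(archive, output_dir):
    """Return the argument list for one `warcat extract` run.

    sys.executable stands in for `py`, so the command uses the same
    Python interpreter that is running this script (assumption: Warcat
    was installed for that interpreter).
    """
    return [
        sys.executable, "-m", "warcat", "extract",
        str(archive),
        "--output-dir", str(output_dir),
        "--progress",
    ]


def extract_all(download_dir, output_dir):
    """Extract every *.megawarc.warc.gz in download_dir, one at a time."""
    archives = sorted(Path(download_dir).glob("*.megawarc.warc.gz"))
    for archive in archives:
        # check=True stops at the first file that fails to extract.
        subprocess.run(build_extract_command(archive, output_dir), check=True)
    return archives


# Example call (hypothetical paths, adjust to your own folders):
# extract_all(r"d:\download", r"d:\coursera")
```

Files are processed one after another because each is about 50 GB, so make sure the output drive has enough free space before starting.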
In the extracted folder, almost all of the courses are in the folder named d396qusza40orc.cloudfront.net. The rest occupy 40% of the space and are either low resolution versions of the same videos or files irrelevant to courses. You can use space visualization tools like SpaceSniffer to see what’s occupying most of the space.
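If you prefer not to install a visualization tool, a short script can give a rough picture of the same thing. The sketch below is an illustration only: it adds up the file sizes under each top-level folder of the extracted directory and prints them from largest to smallest. The function names are hypothetical, and the path in the example call is just a placeholder for wherever you extracted the files.

```python
import os
from pathlib import Path


def folder_sizes(root):
    """Return {top-level folder name: total bytes} for the folders in root."""
    sizes = {}
    for entry in Path(root).iterdir():
        if not entry.is_dir():
            continue  # loose files directly under root are ignored
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry):
            for name in filenames:
                total += os.path.getsize(os.path.join(dirpath, name))
        sizes[entry.name] = total
    return sizes


def print_report(root):
    """Print each top-level folder with its size and share of the total."""
    sizes = folder_sizes(root)
    grand_total = sum(sizes.values()) or 1  # avoid dividing by zero
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{size / 1e9:8.2f} GB  {100 * size / grand_total:5.1f}%  {name}")


# Example call (placeholder path):
# print_report(r"d:\coursera")
```

The folder named d396qusza40orc.cloudfront.net should appear near the top of the report if the file contains preserved courses.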
For advanced users, the original archive.org page that contains more metadata files is here.
Downloading current Coursera courses
Coursera has deleted at least another 60 courses since June 2016. To download current Coursera courses before they disappear, you can use a different Python tool:
https://github.com/coursera-dl/coursera-dl
The instructions on the page are detailed and work as described.
The author has another tool for downloading edX courses. EdX has closed over 100 courses so far but many of them are remodeled under new titles. Also, edX rarely cuts course access (for enrolled students) when courses become indefinitely closed and/or unlisted.
Downloading current FutureLearn courses
Since March 2017, FutureLearn has allowed only paying students to access course materials after the sessions are over. There is another Python tool for downloading current FutureLearn courses, but it has some bugs and is no longer being updated:
https://github.com/mjbright/futurelearn-dl
Other dead or endangered courses
With the decline of traditional open courses, many OpenCourseWare platforms have closed, including those of NYU, Berkeley, and Notre Dame. Others have stopped adding new materials, including Harvard, Stanford (on iTunes U), and Yale. NYU’s past open courses are still on its YouTube channel. UC San Diego posts many public lecture webcasts online, but most of them are deleted shortly after the quarter is over or within a year.