Interesting read: Backing up Geocities: Lessons so far.
A side effect of the whole process is that I now know way, way, way more about Geocities than I ever expected to. We’ve had to dissect every aspect of how the site functions to understand how to mirror things, from its history through how it does crazy JavaScript ads. Some of it is stupid and some is hilarious, but this contextual bit is important to understanding the data we have.