Today’s half-finished code hack scrapes ffffound: for each post it collects the original image URL and source site, along with the ffffound ID, URL, image, related IDs and various metadata, and then puts everything into one big JSON file.
- posts newer than about two days don’t get a date. Will try chronic later
- dates may not be completely accurate: I’m not sure which TZ ffffound uses
- populate database, not JSON (see update below)
- argument driven
- re-run-able (incremental running; abort when matches ID)
- determine post vs found (cleverly or by brute force - see ‘type’)
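The "re-run-able" item could be sketched roughly like this. It is only an illustration, assuming the script is Ruby (chronic is a Ruby gem) and that pages arrive newest-first; `collect_new_posts`, the `:id` key and the sample IDs are made up for the example and are not taken from the actual script.

```ruby
require 'set'

# Hypothetical helper: walk pages newest-first and stop at the first
# post whose ID is already stored, so a re-run only picks up new posts.
def collect_new_posts(pages, known_ids)
  fresh = []
  pages.each do |page|
    page.each do |post|
      # Everything from here on was mirrored by an earlier run.
      return fresh if known_ids.include?(post[:id])
      fresh << post
    end
  end
  fresh
end

# Made-up sample data: IDs already mirrored, and two scraped pages.
known = Set.new(%w[a3 a2 a1])
pages = [
  [{ id: 'a5' }, { id: 'a4' }],
  [{ id: 'a3' }, { id: 'a2' }]
]

new_posts = collect_new_posts(pages, known)
puts new_posts.map { |p| p[:id] }.inspect
```

Stopping at the first known ID only works if posts are always listed newest-first and an earlier run was not cut off partway through; otherwise a full pass with a per-ID check is the safer brute-force option.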
Update: OK, I hacked in a database. Create it with ffffound_create_db and back up to it with ffffound_mirror_db. It handles interrupts gracelessly; use the sqlite command-line tool to ‘delete images’ for now.
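One way the interrupt handling might be made less graceless, sketched in plain Ruby with no database involved. This is an assumption about how it could work, not what ffffound_mirror_db does; `mirror_each` is a hypothetical name, and the block stands in for whatever stores a single post.

```ruby
# Hypothetical wrapper: process items one at a time and, on Ctrl-C,
# stop cleanly and report which items were fully processed.
def mirror_each(items)
  done = []
  items.each do |item|
    yield item      # stand-in for "fetch and store one post"
    done << item    # only recorded once the block has finished
  end
  done
rescue Interrupt
  warn "Interrupted after #{done.length} item(s); stopping cleanly."
  done
end

finished = mirror_each(%w[a5 a4 a3]) { |id| puts "mirroring #{id}" }
puts finished.inspect
```

With a real database, the same shape would wrap each post in its own transaction so that an interrupted post leaves nothing half-written.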