2008-05-11
ffffound_mirror.rb
link 20:21:00
Today’s half-finished code hack scrapes ffffound: it collects each post’s original image URL and source site along with the ffffound ID, URL, image, related IDs and various metadata, then writes everything to one big JSON file.
Known bugs:
- posts newer than about two days don’t get a date. Will try chronic later
- dates may not be completely accurate: I’m not sure which TZ ffffound uses
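Until chronic gets wired in, the missing dates could probably be filled with a few lines of plain Ruby. This is only a sketch: it assumes recent posts carry a relative label such as "2 days ago" (I have not checked the exact wording in the markup), and `parse_relative_date` is a made-up helper name. It works in UTC, which sidesteps rather than answers the TZ question.

```ruby
# Hypothetical helper: turn a relative label such as "2 days ago" into an
# absolute Time. The label format is an assumption, not taken from
# ffffound's actual markup.
UNIT_SECONDS = {
  "second" => 1,
  "minute" => 60,
  "hour"   => 3600,
  "day"    => 86_400,
  "week"   => 604_800
}.freeze

def parse_relative_date(text, now = Time.now.utc)
  match = text.strip.match(/\A(\d+|an?)\s+(second|minute|hour|day|week)s?\s+ago\z/i)
  return nil unless match # not a relative label; leave it to the normal date parser

  # "a" / "an" count as 1, anything else is a plain number
  count = match[1].downcase.start_with?("a") ? 1 : match[1].to_i
  now - count * UNIT_SECONDS.fetch(match[2].downcase)
end

# Usage: pass the time of the scrape so results are reproducible.
scraped_at = Time.utc(2008, 5, 11, 20, 21, 0)
puts parse_relative_date("2 days ago", scraped_at)
```

Anything the pattern does not recognise comes back as `nil`, so the existing date handling can stay as the fallback.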
To do:
- populate database not JSON (see update below)
- argument driven
- re-run-able (incremental running; abort when matches ID)
- determine post vs found (cleverly or by brute force - see ‘type’)
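The "abort when matches ID" idea from the list above might look roughly like this. It is a sketch under assumptions: posts arrive newest-first, each one is a hash with an "id" key, and `new_posts` is a hypothetical name, not something in the current script.

```ruby
require "set"

# Hypothetical sketch of an incremental run: walk the posts newest-first
# and stop at the first ID that an earlier run already stored.
def new_posts(posts, known_ids)
  posts.take_while { |post| !known_ids.include?(post["id"]) }
end

# IDs saved by a previous run (made-up values for illustration).
known_ids = Set.new(%w[id3 id2 id1])

# Freshly scraped page, newest first (also made up).
scraped = [
  { "id" => "id5" },
  { "id" => "id4" },
  { "id" => "id3" }, # already stored, so the run stops here
  { "id" => "id2" }
]

fresh = new_posts(scraped, known_ids)
puts fresh.map { |post| post["id"] }.join(", ")
```

Stopping at the first known ID keeps re-runs cheap, but it only works if an interrupted run never leaves a gap behind the newest stored post.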
Harder
- something like flickrtouchr?
- delish style UI
Question
- Does the presence of javascript:FoundAPI in the page source mean anything?
Update: OK, I hacked in a database. Create it with ffffound_create_db and back up to it with ffffound_mirror_db. It handles interrupts gracelessly; for now, use the sqlite command-line tool to ‘delete images’.