NYT: 24 hours to convert 11 million images to PDF

Derek Gottfrid of the New York Times on how they converted 11 million TIFF images from the NYT archives to PDF using Amazon's EC2 and S3 services.

I had been using Amazon S3 service for some time and was quite impressed. And in late 2006 I had begun playing with Amazon EC2. So the basic idea I had was this: upload 4TB of source data into S3, write some code that would run on numerous EC2 instances to read the source data, create PDFs, and store the results back into S3. S3 would then be used to serve the PDFs to the general public. It all sounded pretty simple, and that is how I got the folks in charge to agree to such an idea, not to mention that Amazon S3/EC2 is pretty easy on the wallet.

Derek Gottfrid, Self-service, Prorated Super Computing Fun!, nytimes.com
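
For a feel of the shape of that pipeline, here is a minimal sketch of the read, convert, store fan-out in Python. It is not Gottfrid's code and does not talk to AWS: the two dictionaries are hypothetical in-memory stand-ins for the source and result S3 buckets, the thread pool stands in for the fleet of EC2 instances, and `convert_to_pdf` is a placeholder for the real TIFF-to-PDF step. All names are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for the two S3 buckets (not a real S3 API).
source_bucket = {f"scan-{i:04d}.tif": b"TIFF-BYTES-%d" % i for i in range(8)}
result_bucket = {}


def convert_to_pdf(tiff_bytes: bytes) -> bytes:
    """Placeholder for the real TIFF-to-PDF conversion step."""
    return b"%PDF-stub\n" + tiff_bytes


def process(key: str) -> str:
    """One unit of work: read a source object, convert it, store the result."""
    pdf_key = key.rsplit(".", 1)[0] + ".pdf"
    result_bucket[pdf_key] = convert_to_pdf(source_bucket[key])
    return pdf_key


def run(workers: int = 4) -> list[str]:
    """Fan the keys out to a pool of workers (standing in for EC2 instances)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sorted(pool.map(process, source_bucket))


if __name__ == "__main__":
    print(run())
```

Each image is converted independently of the others, so the work divides cleanly across as many workers as you care to rent, which is presumably what made a 24-hour turnaround feasible.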

