[colug-432] Photo digitization recommendations?
jim at rossberry.com
Tue Jan 17 11:22:53 EST 2017
If you have the negatives, negative scanning might be an option as well. My wife worked at Walmart photo for about a year and managed to
scan all of our pre-digital negatives to CD on her lunch breaks.
On Mon, 16 Jan 2017, Rick Hornsby wrote:
> I'll give you my feedback as a photographer, but not necessarily as someone who has tried to come up with a shared/distributed end-to-end
> system like you're aiming at here.
> On January 12, 2017 at 12:32:19, Peter Kukla (fruviad at yahoo.com) wrote:
> Hi COLUGers,
> I have hundreds -- maybe thousands -- of photographs that I'd like to scan. My goal is to digitize the whole batch and distribute them
> to the family members of those in the pictures.
> For the "thousands" volume you're looking at, you won't want a flatbed scanner. It is going to take you freaking forever. Even if you put
> multiple images on the scanner bed at once, you still have to digitally cut them back into individual photographs afterward.
> You definitely _do_ want a flatbed scanner for old, damaged, or delicate prints that can't be put through an automatic document feeder (ADF)
> type system. If you use an ADF, make sure it is one designed for photographs. A normal sheet feeder may mangle your one-of-a-kind print.
> Photographers digitizing images will sometimes use an actual camera to photograph the print. This requires a light box and is
> labor-intensive, but it can provide a very high-quality result and can reduce the risk to the print.
> With that said, I strongly suggest finding a local vendor that can do the digitizing for you, unless you just have a lot of time on your hands.
> I have a Linux-based household, so the hardware & software would need to be Linux-friendly. Hence my bugging you guys.
> The project will require a fast scanner, given the number of photos in question. Waiting 2-3 minutes for each scan to complete is
> fine if I only have a few pages to scan, but not for a large project like this.
> You can't do a fast _and_ quality scan of a print photograph with off-the-shelf gear. Remember that a print is an analog medium with a
> (comparatively) massive resolution. Your best bet, honestly, is to give your prints to a vendor that specializes in digitizing them. From
> there, you can take the digital versions and do whatever you like with them: cataloging, setting metadata, allowing others to set metadata
> through a web UI, etc. You can ask the vendor to provide you the original TIFF images (they're going to be huge files), and then you can use
> something like ImageMagick to convert them down to JPEG for the next steps in your process.
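A minimal sketch of that TIFF-to-JPEG step in Python, assuming ImageMagick is installed (version 7 provides the `magick` binary; version 6 installs typically provide `convert`, which accepts the same arguments used here). The folder layout, the 1600px cap, and the 85% quality are illustrative assumptions. The TIFF originals are only read, never modified.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: ImageMagick 7 ("magick") or ImageMagick 6 ("convert") is on PATH.
MAGICK = shutil.which("magick") or shutil.which("convert") or "magick"


def jpeg_command(src: Path, dst: Path, quality: int = 85, long_edge: int = 1600) -> list:
    """Build an ImageMagick command that writes a downsized JPEG copy of src.

    The "WxH>" geometry only shrinks images larger than the box and keeps the
    aspect ratio, so the long edge ends up at most long_edge pixels.
    """
    return [
        MAGICK,
        str(src),
        "-resize", f"{long_edge}x{long_edge}>",
        "-quality", str(quality),
        str(dst),
    ]


def convert_folder(tiff_dir: Path, jpeg_dir: Path, dry_run: bool = True) -> list:
    """Convert every .tif/.tiff in tiff_dir into jpeg_dir (hypothetical layout).

    With dry_run=True the commands are only returned, not executed.
    """
    commands = []
    for src in sorted(tiff_dir.iterdir()):
        if src.suffix.lower() not in (".tif", ".tiff"):
            continue
        commands.append(jpeg_command(src, jpeg_dir / (src.stem + ".jpg")))
    if not dry_run:
        jpeg_dir.mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Running it with `dry_run=True` first lets you eyeball the command list before anything is written; `convert_folder(Path("scans/tiff"), Path("scans/jpeg"), dry_run=False)` (hypothetical paths) then does the actual conversion.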
> If you decide to do it yourself, you're going to have to be patient and it is going to take a long time to scan them all.
> You can save file size by converting the TIFFs down to monitor-ready 72dpi JPEGs once they're scanned, but to transfer the analog into
> digital initially, you want as much resolution as your scanner will offer for a color print. If you set the scanner to capture at 72dpi, your
> images will look like garbage. To preserve as much quality as possible, I'd also probably not let the scanner software do the JPEG
> conversion, even if it offers that feature.
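To put numbers on that, here is a small Python sketch. A screen only cares about pixel dimensions (the dpi value is a tag used when printing), so the useful question is how many pixels a scan produces and roughly how much disk that costs. The 6x4 print size, the dpi values, and the 8-bit-per-channel RGB assumption are examples; real TIFFs may also be compressed, so treat the byte count as the uncompressed upper estimate.

```python
def scan_size(width_in: float, height_in: float, dpi: int, bytes_per_pixel: int = 3) -> tuple:
    """Return (width_px, height_px, uncompressed_bytes) for scanning a print.

    bytes_per_pixel=3 assumes 8 bits per channel RGB; a 48-bit scan would use 6.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px, height_px, width_px * height_px * bytes_per_pixel


for dpi in (72, 300, 600):
    w, h, size = scan_size(6, 4, dpi)
    print(f"6x4 in at {dpi}dpi: {w}x{h} px, ~{size / 1e6:.1f} MB uncompressed")
```

At 72dpi a 6x4 print becomes only 432x288 pixels, which is why such scans look so poor; at 600dpi it is 3600x2400 pixels, about 26 MB before any TIFF compression.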
> Once scanned, I'll need to catalog the photos, describing who is depicted in each one. In the interest of ensuring that the
> metadata is associated with the photos, one thought is to embed the metadata via IPTC tags, although I haven't explored that
> option very deeply yet.
> IPTC tags are a good choice. Take a look at the tags and tools for managing EXIF/IPTC data. For library and CLI usage, exiftool is one of the
> best I know of:
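A minimal sketch of writing a caption and names into IPTC fields by calling exiftool from Python. `Caption-Abstract` and `Keywords` are standard IPTC tags that exiftool can write; putting people's names in `Keywords` is just one possible convention, and the file name and caption are made up for illustration. Passing arguments as a list (rather than a shell string) avoids quoting trouble with names containing spaces or apostrophes.

```python
import subprocess


def iptc_command(photo: str, caption: str, people: list) -> list:
    """Build an exiftool command that sets the IPTC caption and adds name keywords."""
    cmd = [
        "exiftool",
        "-overwrite_original",  # skip exiftool's default "<file>_original" backup copy
        f"-IPTC:Caption-Abstract={caption}",
    ]
    # Keywords is a list-type tag: "+=" appends a value instead of replacing the list.
    cmd += [f"-IPTC:Keywords+={name}" for name in people]
    cmd.append(photo)
    return cmd


def tag_photo(photo: str, caption: str, people: list) -> None:
    """Run the command; raises CalledProcessError if exiftool reports a failure."""
    subprocess.run(iptc_command(photo, caption, people), check=True)
```

For example, `tag_photo("scan-0042.jpg", "Grand Penguin Ball, 1973", ["Uncle Gump"])` (hypothetical file), then `exiftool -IPTC:all scan-0042.jpg` to read the result back.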
> However, the photos being scanned also are mostly of people I don't know (many pictures are from the wife's side of the family)
> and the family members who would know are widely dispersed, geographically. It would be nice to have some sort of web-based, open
> source solution where I could load the images into a database of some sort and allow users to tag the pictures with details that I
> may not know ("Hey...that's Uncle Gump at our 1973 Grand Penguin Ball!")
> Anyone ever done anything along these lines?
> I'm looking for recommendations & advice for:
> * Good, reliable, and fast flat-bed scanners that are Linux-friendly
> The best I'm able to give you there is to point you at something like this: http://www.pcmag.com/article2/0,2817,2362752,00.asp
> Here's what else I'd say about that: you are free to insist on Linux if you wish. However, you may have difficulty finding software that
> provides an efficient workflow pattern. That is, I think you can use GIMP to scan things, but you'll probably want to smash your mouse with a
> big hammer trying to use GIMP for the volume you need. It simply wasn't designed for that. Chances are you'll have much better luck finding
> the right tool for volume scanning on Windows or macOS. You may have to bite the bullet and buy a Windows license for this project. (I think
> you can download and use Windows 10 for 30 days before it starts harassing you.)
> To put it another way, I can use a screwdriver to apply and sand drywall mud because dammit, everyone in the house has a screwdriver and it's
> the best tool there is. But holy crap, it's going to be a pain in the ass to fix a head-sized dent this way.
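If you do insist on Linux, the SANE project's `scanimage` command-line tool is one way to avoid clicking through GIMP for every print. A hedged sketch that builds a batch-scan command from Python: `--format`, `--batch`, `--batch-start`, and `--batch-prompt` are generic scanimage options, but `--resolution` and `--mode` are provided by the scanner's backend, so check what your device actually supports with `scanimage -A`. The file-name pattern and the 600dpi default are assumptions.

```python
import subprocess


def batch_scan_command(dpi: int = 600, start: int = 1, device: str = "") -> list:
    """Build a scanimage command that scans one print per press of Enter."""
    cmd = ["scanimage"]
    if device:
        cmd += ["--device-name", device]  # a name as listed by `scanimage -L`
    cmd += [
        "--format=tiff",             # lossless TIFF masters
        "--resolution", str(dpi),    # backend option: confirm with `scanimage -A`
        "--mode", "Color",           # backend option: value names vary by device
        "--batch=print-%04d.tif",    # printf-style counter in each file name
        f"--batch-start={start}",    # resume numbering after an interrupted session
        "--batch-prompt",            # wait for Enter so you can swap prints on the glass
    ]
    return cmd


def run_batch_scan(dpi: int = 600, start: int = 1, device: str = "") -> None:
    """Start the interactive batch; stop it with Ctrl-C when the pile is done."""
    subprocess.run(batch_scan_command(dpi, start, device), check=True)
```

This only removes the mouse-clicking; each flatbed pass still takes as long as the scanner needs at that resolution.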
> * Preferred image formats (the IPTC aspect may reduce the number of options?)
> TIFF for the archived originals, JPEG for everything else. JPEG without question, whether you decide to keep the TIFF originals or not. As for
> JPEG compression, 85% quality is right around the break-even point: above it, the file size keeps growing without any discernible gain in
> image quality. You can go up to 90%, but at 72dpi I don't think it will matter. You could go down to 80%, but it's really not worth
> it. Anything lower and your photos will start to look bad.
> I should mention that once you drop the image resolution down to 72dpi, the resulting file will not be suitable for printing. Yes, the local
> drug store will still print it for you, but the quality is going to be poor. Creating prints from digital images, especially anything larger
> than small sizes like 4x6, needs an image resolution of at least 240dpi.
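The arithmetic behind that 240dpi rule of thumb is just pixels divided by dpi. A small sketch; the 240dpi default is the figure from the paragraph above, and the pixel sizes in the note below are illustrative.

```python
def max_print_inches(width_px: int, height_px: int, min_dpi: int = 240) -> tuple:
    """Largest print size (inches) that still gets at least min_dpi."""
    return width_px / min_dpi, height_px / min_dpi
```

So a 1600px-wide web copy tops out at about 6.7 inches wide at 240dpi, and a 6x4 print needs at least 1440x960 pixels (6 x 240 by 4 x 240).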
> Side note: if you want a quality print, please don't go to the local drug store. They produce awful results. I mean, really awful. Use a
> professional print service like mpix or one of the others.
> I suggested dropping down to 72dpi because that's typically the highest useful dpi that can be displayed on a tv or monitor. Anything higher
> is just wasted file size, including on a website where people are just manipulating the metadata. You can always copy the metadata from the
> screen-resolution images to your high-res images on the backend.
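Copying the metadata back is one exiftool call per file using its `-tagsFromFile` option. A hedged sketch that pairs each web JPEG with a same-named master TIFF; the directory layout and matching by file name are assumptions about how you might organize things. Since this only rewrites metadata, the TIFF's pixel data is untouched.

```python
import subprocess
from pathlib import Path


def copy_iptc_command(web_jpeg: Path, master_tiff: Path) -> list:
    """exiftool command that copies all IPTC tags from the JPEG into the TIFF."""
    return [
        "exiftool",
        "-overwrite_original",
        "-tagsFromFile", str(web_jpeg),
        "-IPTC:all",
        str(master_tiff),
    ]


def sync_iptc(web_dir: Path, master_dir: Path, dry_run: bool = True) -> list:
    """Pair web_dir/NAME.jpg with master_dir/NAME.tif and copy IPTC data across."""
    commands = []
    for jpeg in sorted(web_dir.glob("*.jpg")):
        tiff = master_dir / (jpeg.stem + ".tif")
        if tiff.exists():  # skip web images with no matching master
            commands.append(copy_iptc_command(jpeg, tiff))
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```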
> JPEG is an excellent format for storing photographs. TIFF offers superior quality (lossless compression), but at a very large file size. PNG
> compression is lossless as well, but handling the larger color palette of a photograph means either a significantly larger file or a reduced
> color depth, which is bad for a photograph. PNG is much better suited as a GIF replacement or for screen grabs; it's not good for photos.
> JPEG can handle the EXIF/IPTC data. However, here's something important to remember about JPEG images: because the compression is lossy, every
> image edit incurs a quality penalty in the resulting saved file. If you need to make image edits (crop, color correction, etc.), do those in
> the original TIFF image and then export to JPEG. Don't open the JPEG, edit the image, save/close it, and then come back later to do more
> edits. At a low compression ratio (high quality), you can get away with doing it a few times. Even at the 100% quality setting, there's still
> lossy compression and thus still a quality penalty.
> This penalty does not apply to editing the metadata using things like exiftool. It only applies to the image itself.
> * A web-based digital archive system that supports user feedback and possibly downloading of the original images
> Feedback and downloading the original, yes. There are tons of both self-hosted and "cloud-based" solutions. But which of these allow unlimited
> access to edit the IPTC fields? That may have to be something more home-grown; I'm not sure.
> If you plan to allow them to download the original from the website and you're rolling your own solution, then plan to have two different JPEG
> files per photo: a medium-resolution one (1600px on the long edge should be good) for display on the web page, and a high-resolution one
> they can download. You may want the high-res image to preserve a higher dpi from the TIFF as well. 4000px on the long edge at
> 200dpi should be sufficient for most people's small print needs.
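A sketch of producing both files from one TIFF master with ImageMagick, assuming the `magick` binary from ImageMagick 7. The sizes and the 200dpi tag follow the suggestions above; the 90% quality for the download copy and the `-web`/`-download` file-name suffixes are assumptions. `-units PixelsPerInch -density 200` only sets the resolution tag stored in the file; it does not resample the pixels.

```python
from pathlib import Path

# Hypothetical derivative specs: (suffix, long edge in px, JPEG quality, dpi tag or None).
DERIVATIVES = [
    ("web", 1600, 85, None),
    ("download", 4000, 90, 200),
]


def derivative_commands(master: Path, out_dir: Path) -> list:
    """Build one ImageMagick command per derivative of a single TIFF master."""
    commands = []
    for suffix, long_edge, quality, dpi in DERIVATIVES:
        # "WxH>" shrinks only; smaller masters keep their native size.
        cmd = ["magick", str(master), "-resize", f"{long_edge}x{long_edge}>"]
        if dpi is not None:
            # Sets the stored resolution tag only; pixel data is not resampled.
            cmd += ["-units", "PixelsPerInch", "-density", str(dpi)]
        cmd += ["-quality", str(quality), str(out_dir / f"{master.stem}-{suffix}.jpg")]
        commands.append(cmd)
    return commands
```

Because `>` only ever shrinks, a 3600x2400 scan stays at its native size for the download copy rather than being blown up to 4000px.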
> * Any pitfalls I may encounter that I haven't yet thought about
> Remember to think about your audience when designing a UI for them. These will almost certainly be non-technical people who won't understand
> the IPTC labels, and who will probably make mistakes when entering the data. It might be helpful to design it in such a way that there's some
> editorial control. For example, if possible, try to ensure grandma can't accidentally overwrite Aunt June's valid/good description
> with a copy/paste of your email address. Figuring out ways to manage the data and manage the user input of your system will probably end up
> being more work than you anticipate. Then again, it's very possible someone has already solved this very thing and there's a suitable product
> out there.
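One way to get that editorial control, sketched with Python's built-in sqlite3: treat every relative's description as a new, pending row instead of an edit in place, and only show (or later write to IPTC) the latest one an editor has approved. The table and column names are hypothetical; a real web app would also need accounts, per-person tagging, and so on.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS photo (
    id       INTEGER PRIMARY KEY,
    filename TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS description (
    id           INTEGER PRIMARY KEY,
    photo_id     INTEGER NOT NULL REFERENCES photo(id),
    author       TEXT NOT NULL,
    body         TEXT NOT NULL,
    status       TEXT NOT NULL DEFAULT 'pending'
                 CHECK (status IN ('pending', 'approved', 'rejected')),
    submitted_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def submit(db: sqlite3.Connection, photo_id: int, author: str, body: str) -> int:
    """Record a suggestion as a new pending row; earlier descriptions are untouched."""
    cur = db.execute(
        "INSERT INTO description (photo_id, author, body) VALUES (?, ?, ?)",
        (photo_id, author, body),
    )
    return cur.lastrowid


def review(db: sqlite3.Connection, description_id: int, approve: bool) -> None:
    """Editor decision: mark a suggestion approved or rejected."""
    db.execute(
        "UPDATE description SET status = ? WHERE id = ?",
        ("approved" if approve else "rejected", description_id),
    )


def current_description(db: sqlite3.Connection, photo_id: int):
    """Most recently approved description, or None if nothing is approved yet."""
    row = db.execute(
        "SELECT body FROM description WHERE photo_id = ? AND status = 'approved' "
        "ORDER BY id DESC LIMIT 1",
        (photo_id,),
    ).fetchone()
    return row[0] if row else None
```

With this design, grandma's pasted email address just sits in the queue as a pending row until someone rejects it, and Aunt June's approved text keeps being the one shown.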
> To preserve your sanity, it might be helpful to spend some time with your wife or your geographically proximate family to knock out what you
> can, before crowdsourcing to the family what you can't.
> Good luck, and let us know what you come up with.
Jim Wildman, CISSP, RHCE jim at rossberry.com http://www.rossberry.net
"Society in every state is a blessing, but Government, even in its best
state, is a necessary evil; in its worst state, an intolerable one."