[colug-432] Photo digitization recommendations?

Rick Hornsby richardjhornsby at gmail.com
Mon Jan 16 21:40:04 EST 2017


I'll give you my feedback as a photographer, but not necessarily as someone
who has tried to come up with a shared/distributed end-to-end system like
you're aiming at here.

On January 12, 2017 at 12:32:19, Peter Kukla (fruviad at yahoo.com) wrote:

Hi COLUGers,

I have hundreds -- maybe thousands -- of photographs that I'd like to
scan.  My goal is to digitize the whole batch and distribute them to the
family members of those in the pictures.

For the "thousands" volume you're looking at, you won't want a flatbed
scanner. It is going to take you freaking forever. Even if you put multiple
images on the scanner bed at once, you still have to now digitally cut them
back into individual photographs.

You definitely _do_ want a flatbed scanner for old, damaged, or delicate
prints that can't be put through an automatic document feeder type system.
If you use an ADF, make sure it is one designed for photographs. A normal
sheet feeder may mangle your one-of-a-kind print.

Photographers digitizing images will sometimes use an actual camera to
photograph the print. This requires a light box and is labor intensive,
but it can provide a very high quality result and can reduce the risk to the
print.

With that said, I strongly suggest finding a local vendor that can do the
digitizing for you, unless you just have a lot of time on your hands.

I have a Linux-based household, so the hardware & software would need to be
Linux-friendly.  Hence my bugging you guys.

The project will require a fast scanner, given the number of photos in
question.  Waiting 2-3 minutes for each scan to complete is fine if I only
have a few pages to scan, but not for a large project like this.

You can't do a fast _and_ quality scan of a print photograph with off the
shelf gear. Remember that a print is an analog medium with a
(comparatively) massive resolution. Your best bet, honestly, is to give your
prints to a vendor that specializes in digitizing them. From there, you can
take the digital versions and do whatever you like with them - cataloging,
setting metadata, allowing others to set metadata through a web UI, etc.
You can ask the vendor to provide you the original TIFF images (they're
going to be huge files), and then you can use something like imagemagick to
convert them down to JPEG for the next steps in your process.

If you decide to do it yourself, you're going to have to be patient and it
is going to take a long time to scan them all.

You can save yourself filesize by converting the TIFFs down to
monitor-ready 72dpi JPEGs once they're scanned, but to transfer the analog
into digital initially, you want as much resolution as your scanner will
offer for a color print. If you set the scanner to capture at 72dpi, your
images will look like garbage. For preserving as much quality as possible,
I'd also probably not let the scanner software do the JPEG conversion even
if it offers that feature.

Once scanned, I'll need to catalog the photos, describing who is depicted
in each one.  In the interest of ensuring that the metadata is associated
with the photos, one thought is to embed the metadata via IPTC tags,
although I haven't explored that option very deeply yet.


IPTC tags are a good choice. Take a look at the tags and tools for managing
EXIF/IPTC data. For library and CLI usage, exiftool is one of the best I
know of:

http://www.sno.phy.queensu.ca/~phil/exiftool/
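
For what it's worth, here is a rough sketch of how a script might hand names and a caption to exiftool. It only builds the argument list; the file name, caption, and helper name are made up for illustration, and I'm assuming exiftool is on your PATH. The two tags used (IPTC:Caption-Abstract and IPTC:Keywords) are standard IPTC tags that exiftool can write.

```python
from typing import List


def iptc_write_command(image_path: str, caption: str, people: List[str]) -> List[str]:
    """Build an exiftool argument list that sets an IPTC caption and keywords."""
    cmd = ["exiftool", f"-IPTC:Caption-Abstract={caption}"]
    # Keywords is a list-type tag: one "-IPTC:Keywords=" argument per value.
    # Plain "=" assignments replace whatever keyword list the file already has.
    cmd += [f"-IPTC:Keywords={name}" for name in people]
    cmd.append(image_path)
    return cmd


# Hypothetical example values, not taken from a real photo.
example = iptc_write_command(
    "scan_0001.jpg", "Grand Penguin Ball, 1973", ["Uncle Gump"]
)
print(example)
```

Passing the list to subprocess.run(example, check=True) would actually write the tags. Try it on a copy of a file first.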

However, the photos being scanned also are mostly of people I don't know
(many pictures are from the wife's side of the family) and the family
members who would know are widely dispersed, geographically.  It would be
nice to have some sort of web-based, open source solution where I could
load the images into a database of some sort and allow users to tag the
pictures with details that I may not know ("Hey...that's Uncle Gump at our
1973 Grand Penguin Ball!")

Anyone ever done anything along these lines?

I'm looking for recommendations & advice for:

    * Good, reliable, and fast flat-bed scanners that are Linux-friendly

The best I'm able to give you there is to point you at something like this:
http://www.pcmag.com/article2/0,2817,2362752,00.asp

Here's what else I'd say about that: you are free to insist on Linux if you
wish. However, you may have difficulty finding software that provides an
efficient workflow. That is, I think you can use GIMP to scan
things, but you'll probably want to smash your mouse with a big hammer
trying to use GIMP for the volume you need. It simply wasn't designed for
that. Chances are you'll have much better luck finding the right tool for
volume scanning in Windows or macOS. You may have to bite the bullet and
buy a Windows license for this project. (I think you can d/l and use
Windows 10 for 30 days before it starts harassing you.)

To put it another way, I can use a screwdriver to apply and sand drywall
mud because dammit everyone in the house has a screwdriver and it's the best
tool there is. But holy crap it's going to be a pain in the ass to fix a
head-sized dent this way.

    * Preferred image formats (the IPTC aspect may reduce the number of
options?)

TIFF for the archived originals, JPEG for everything else. JPEG without
question, whether you decide to keep the TIFF originals or not. As far as
JPEG compression goes, 85% quality is right around the point where going
higher costs you file size without any discernible gain in image quality.
You can go up to 90%, but at 72dpi I don't think it will matter.
You could go down to 80%, but it's really not worth it. Anything lower and
your photos will start to look bad.
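
As a minimal sketch of the TIFF to JPEG step with ImageMagick, something like the following builds the command for one file. The function name, file names, and the 1600px default are my own placeholders. I'm assuming the classic "convert" binary; newer ImageMagick installs may call it "magick" instead, which is why it is a parameter.

```python
from typing import List


def jpeg_command(src_tiff: str, dst_jpeg: str, quality: int = 85,
                 long_edge: int = 1600, binary: str = "convert") -> List[str]:
    """Build an ImageMagick argument list for one TIFF -> JPEG conversion."""
    if not 1 <= quality <= 100:
        raise ValueError("quality must be between 1 and 100")
    return [
        binary, src_tiff,
        # "WxH>" fits the image inside the box, keeps the aspect ratio,
        # and only shrinks (an image that is already smaller is left alone).
        "-resize", f"{long_edge}x{long_edge}>",
        "-quality", str(quality),
        dst_jpeg,
    ]


# Hypothetical file names for illustration.
print(jpeg_command("scan_0001.tif", "scan_0001.jpg"))
```

Hand the list to subprocess.run() as-is rather than pasting it into a shell, since the ">" in the resize argument would otherwise be treated as a redirect.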

I should mention that once you drop the image resolution down to 72dpi, the
resulting file will not be suitable for printing. Yes, the local drug store
will still print it for you, but the quality is going to be poor. Creating
prints from digital images, especially anything larger than small sizes
like 4x6, needs an image resolution of at least 240dpi.
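
To put numbers on that, the arithmetic is just inches times dpi. A quick sketch, using the 240dpi figure above as the default (the print sizes are only common examples, and the function name is mine):

```python
from typing import Tuple


def pixels_for_print(width_in: float, height_in: float, dpi: int = 240) -> Tuple[int, int]:
    """Return the (width, height) in pixels needed to print at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


for width_in, height_in in [(4, 6), (5, 7), (8, 10)]:
    w, h = pixels_for_print(width_in, height_in)
    print(f"{width_in}x{height_in} in -> {w} x {h} px")
# 4x6 in -> 960 x 1440 px
# 5x7 in -> 1200 x 1680 px
# 8x10 in -> 1920 x 2400 px
```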

Side note: if you want a quality print, please don't go to the local drug
store. They produce awful results. I mean, really awful. Use a professional
print service like mpix or one of the others.

I suggested dropping down to 72dpi because that's typically the highest
useful dpi that can be displayed on a tv or monitor. Anything higher is
just wasted filesize - including on a website where people are just
manipulating the metadata. You can always copy the metadata from the
screen-resolution images to your high res images on the backend.
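
Copying the metadata across can be done with exiftool's -TagsFromFile option. Here is a hedged sketch that builds that command. It assumes the screen-resolution and high-res files share a file name and live in two different directories; the directory names and helper names are invented for the example.

```python
from pathlib import PurePosixPath
from typing import List


def high_res_path(screen_path: str, high_res_dir: str) -> str:
    """Map e.g. web/scan_0001.jpg to <high_res_dir>/scan_0001.jpg."""
    return str(PurePosixPath(high_res_dir) / PurePosixPath(screen_path).name)


def copy_iptc_command(tagged_copy: str, target: str) -> List[str]:
    """Build an exiftool argument list copying all IPTC tags to target."""
    return ["exiftool", "-TagsFromFile", tagged_copy, "-IPTC:all", target]


# Hypothetical layout: tagged copies in web/, print-quality files in print/.
src = "web/scan_0001.jpg"
print(copy_iptc_command(src, high_res_path(src, "print")))
```

Only the metadata is rewritten, so this does not recompress the image data in the high-res file.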

JPEG is an excellent format for storing photographs. TIFF is superior
quality (lossless compression), but a very large file size. PNG compression
is lossless as well, but to handle the larger color palette of a photograph
means a significantly larger file size, or a smaller color depth which is
bad for a photograph. PNGs are much better suited as GIF replacements or
for screengrabs. They're not good for photos.

JPEG can handle the EXIF/IPTC data. However, here's something important to
remember about JPEG images: because the compression is lossy, every image
edit incurs a quality penalty in the resulting saved file. If you need to
make image edits (crop, color correction, etc) - do those in the original
TIFF image and then export to JPEG. Don't open the JPEG, edit the image,
save/close it, and then come back later to do more edits. At a low
compression ratio (high quality), you can get away with doing it a few
times. Even at a 100% quality setting, there's still lossy compression and
thus still a quality penalty.

This penalty does not apply to editing the metadata using things like
exiftool. It only applies to the image itself.

    * A web-based digital archive system that supports user feedback and
possibly downloading of the original images

Feedback and downloading the original, yes. There are tons of both
self-hosted and "cloud-based" solutions. Of these, which allow unlimited
access to edit the IPTC fields? That may have to be something more home
grown, I'm not sure.

If you plan to allow them to download the original from the website and
you're rolling your own solution, then plan to have two different JPEG
image files - one a medium resolution (1600px on the long edge should be
good) for displaying in the webpage, and the high resolution JPEG file they
can download. You may want the high res image to have a higher preserved
dpi from the TIFF as well. 4000px on the long edge at 200dpi should be
sufficient for most people's small print needs.
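
If it helps to see what those two sizes work out to, here is a small sketch that scales a scan down to a given long edge. The 7200x4800 scan is a made-up example (roughly a 6x4 inch print at 1200dpi), and the function name is my own.

```python
from typing import Tuple


def fit_long_edge(width: int, height: int, long_edge: int) -> Tuple[int, int]:
    """Scale (width, height) so the longer side equals long_edge; never enlarge."""
    longest = max(width, height)
    if longest <= long_edge:
        return width, height
    return (round(width * long_edge / longest),
            round(height * long_edge / longest))


scan = (7200, 4800)  # hypothetical scan dimensions in pixels
print(fit_long_edge(*scan, 1600))  # web display copy  -> (1600, 1067)
print(fit_long_edge(*scan, 4000))  # downloadable copy -> (4000, 2667)
print(4000 / 200)                  # long edge in inches at 200dpi -> 20.0
```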

    * Any pitfalls I may encounter that I haven't yet thought about

Remember to think about your audience when designing a UI for them. These
will almost certainly be non-technical people who won't understand the IPTC
labels, and who will probably make mistakes when entering the data. It
might be helpful to design it in such a way that there's some editorial
control. For example, if possible, try to ensure grandma can't
accidentally overwrite Aunt June's valid/good description with a copy/paste
of your email address. Figuring out ways to manage the data and manage the
user input of your system will probably end up being more work than you
anticipate. Then again, it's very possible someone has already solved this
very thing and there's a suitable product out there.
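
One possible shape for that editorial control, purely as a sketch: family members submit suggestions, and nothing replaces the approved description until you accept it. All of the class and method names here are hypothetical, and a real system would sit on a database rather than in memory.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Suggestion:
    photo_id: str
    author: str
    text: str


@dataclass
class CaptionStore:
    approved: Dict[str, str] = field(default_factory=dict)
    pending: List[Suggestion] = field(default_factory=list)

    def suggest(self, photo_id: str, author: str, text: str) -> Suggestion:
        """Queue a proposed caption; the approved caption is not touched."""
        suggestion = Suggestion(photo_id, author, text.strip())
        self.pending.append(suggestion)
        return suggestion

    def approve(self, suggestion: Suggestion) -> None:
        """Editor accepts a suggestion, replacing the approved caption."""
        self.pending.remove(suggestion)
        self.approved[suggestion.photo_id] = suggestion.text

    def reject(self, suggestion: Suggestion) -> None:
        """Editor discards a suggestion without changing anything else."""
        self.pending.remove(suggestion)


store = CaptionStore()
store.approve(store.suggest("photo-17", "Aunt June", "Uncle Gump, 1973"))
oops = store.suggest("photo-17", "Grandma", "someone@example.com")
print(store.approved["photo-17"])  # still Aunt June's description
store.reject(oops)
```

The approved caption is what you would eventually write into the IPTC fields; the pending list is what the editor reviews.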

To preserve your sanity, it might be helpful to spend some time with your
wife or your geographically proximate family to knock out what you can,
before crowdsourcing to the family what you can't.

Good luck, and let us know what you come up with.

-rj