[colug-432] file replication across data centers

Angelo McComis angelo at mccomis.com
Wed Feb 1 18:34:11 EST 2017


Thanks for the kind words, Rick.

After hitting send and seeing your addition, a couple things came to mind
that are important to point out.


On Wed, Feb 1, 2017 at 5:22 PM, Rick Troth <rmt at casita.net> wrote:

>
> *I must recommend RSYNC* as a first course for proving things or for
> in-a-pinch replication needs. It's like option C since RSYNC is just an
> application. One feature of RSYNC is that it can avoid sending bits which
> match what the receiver already has. I also like that it can be told to not
> step on a newer copy (if source and target may each have newer versions of
> some files).
>
>
RSYNC is great. It's been around for years, it's well-documented, and super
reliable. The main downside of RSYNC (or, on Windows, for those who care,
RSYNC's distant cousin ROBOCOPY) is that you have to kick it off each time
you want to replicate. One could create a wrapper that 1) checks whether
rsync is already running (and exits if so), then 2) starts the rsync
process to do the replication. Bi-Di (bidirectional) replication works too,
but you'll have to go out of your way if you ever want to delete a file
(which you also mentioned), since a naive two-way sync just copies the file
back from the other side on the next pass. RSYNC is tried and proven, and
I've seen more than one organization use a script run by a cron job to
replicate their filesystems across a distance. Spacing a timed RSYNC out at
something like 15-minute intervals exposes you to some data loss (anything
written since the last completed run), but at the same time provides a bit
of a safety net in the case of an accidental deletion (unless you delete
the file right as the script is kicking off, and then you're hosed).
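
For what it's worth, here is a minimal sketch of that kind of wrapper, in
Python rather than shell. It's an illustration under assumptions, not what
any of those organizations actually ran: the function names are made up,
and instead of looking for a running rsync in the process list it takes an
advisory lock on a lock file (a swapped-in technique that does the same
job). The rsync options used (-a, --update, --delete) are standard ones;
--update is the "don't step on a newer copy" behavior Rick mentioned.

```python
import fcntl
import subprocess


def build_rsync_command(src, dest, keep_newer=True, delete=False):
    """Return the rsync argv for a one-way pass from src to dest."""
    cmd = ["rsync", "-a"]        # -a: archive mode (recursive, keeps perms/times)
    if keep_newer:
        cmd.append("--update")   # skip files that are newer on the receiver
    if delete:
        cmd.append("--delete")   # propagate deletions; off by default on purpose
    cmd += [src, dest]
    return cmd


def run_exclusive(lock_path, cmd):
    """Run cmd unless another holder already has the lock.

    Returns the command's exit status, or None if the run was skipped.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fails at once if a run is in flight.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None          # previous run still going; skip this one
        # The lock is released when lock_file is closed on leaving the block.
        return subprocess.run(cmd).returncode
```

A cron entry on a */15 schedule could then call a small script that does
run_exclusive("/var/lock/replicate.lock",
build_rsync_command("/srv/data/", "dr-site:/srv/data/")). Those paths and
the host name are hypothetical. Leaving --delete off is what gives you the
safety net for accidental deletions, at the cost of cleaning up by hand.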


> I've *also seen option A used* in rigorous off-site and heavily exercised
> D/R. (We're talking greater than 300 miles.) Options A and C both allow for
> any filesystems you might need or choose. (Not that there's anything wrong
> with option B, FS based solutions.) Option A often requires that you go
> deep with a specific storage vendor or service.
>
>
There are many ways to deploy "enterprise storage" -- DR is the primary
use case for this, rather than HA (high availability). To make it really
bulletproof, one would employ a 3-site configuration: a primary, a replica
close enough for synchronous replication, and then a farther-away
asynchronous replica (near-near-far, as I refer to it). This gives you a
nearby synchronous mirror of the data, and meets most organizations'
requirement that the DR copy be geographically separate. It also requires
that you commit heavily to a storage vendor and have matching kit in all
locations, which, of course, depends on how much of an investment the
company or organization is willing to make (read: how much revenue is
exposed if the data weren't recoverable after an outage). RPO and RTO
(recovery point and recovery time objectives) factor heavily into what kind
of design is needed. Near-near-far is pretty much the only guaranteed way
to get zero RPO that I know of. If someone comes to the meeting and says "I
can never lose data, ever," this is the prescription for that requirement.
If RPO is greater than 0 but still measured in minutes, one could choose a
near-far 2-site configuration, replicating asynchronously, and probably
still meet that requirement.
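
To put rough numbers on that, here is a back-of-the-envelope sketch in
Python. The function names and the simple "interval plus run time" bound
are my own assumptions for illustration, and it assumes each run finishes
before the next one is due. It is a way to sanity-check an RPO target, not
a sizing tool.

```python
def worst_case_loss_minutes(interval_min, run_min):
    """Upper bound on the data-loss window for a timed, asynchronous copy.

    Assumption: a change made just after a run has passed over a file is
    not safe at the far site until the *next* run has finished.
    """
    return interval_min + run_min


def suggest_topology(rpo_min):
    """Map an RPO target (in minutes) to the layouts described above."""
    if rpo_min < 0:
        raise ValueError("RPO cannot be negative")
    if rpo_min == 0:
        return "near-near-far"   # synchronous near replica plus async far replica
    return "near-far"            # asynchronous replication to one far site


def meets_rpo(rpo_min, interval_min, run_min):
    """True if a timed copy's worst-case window fits inside the RPO target."""
    return worst_case_loss_minutes(interval_min, run_min) <= rpo_min
```

For example, a copy kicked off every 15 minutes that takes 4 minutes to run
has a worst-case window of 19 minutes, so meets_rpo(30, 15, 4) is True and
meets_rpo(10, 15, 4) is False. Nothing on a timer gets you to zero.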

> The option A work that I was involved in was for a former employer. Most of
> the exercise was truly outstanding, excellent work. But procedures
> historically included *applying updates via tape* (after snap-shot of the
> storage across the 300-mile link). I am just not a fan of tape anymore for
> a variety of reasons. Was annoyed that the D/R exercise would consistently
> burn many hours for all the tape work, but was not the decision maker.
>
>
Good point here about tape in this context: offsite replication is NEVER a
substitute for a rigorous backup routine. If someone borks a file or,
worse, a database at the primary site, and that data is being replicated,
the far copy is equally useless. Don't let anyone ever talk you out of
backups for the sake of replication and snapshots and all the features you
have available these days. (Same goes for RAID -- RAID is not backup,
either.) You may have some ability to undo a database change by halting
the database and rolling back log entries to the point where the database
was un-borked, but that's both time-consuming and not always reliable.
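
If it helps to see the failure mode, here is a toy Python sketch. It is
purely illustrative: the file names are made up and plain file copies in
temp directories stand in for both the replication and the backup job.

```python
import shutil
import tempfile
from pathlib import Path


def demo():
    """Show a bad change reaching the replica while a dated backup survives."""
    root = Path(tempfile.mkdtemp())
    primary, replica, backup = (root / n for n in ("primary", "replica", "backup"))
    for d in (primary, replica, backup):
        d.mkdir()

    def replicate():
        # Stand-in for rsync or array replication: far copy mirrors primary.
        shutil.copy2(primary / "db.txt", replica / "db.txt")

    (primary / "db.txt").write_text("good data\n")
    replicate()
    # Backup: a separate, dated copy that later replication never touches.
    shutil.copy2(primary / "db.txt", backup / "db.txt.day1")

    (primary / "db.txt").write_text("borked\n")   # the bad change
    replicate()                                    # faithfully shipped off-site

    return {
        "replica": (replica / "db.txt").read_text(),
        "backup": (backup / "db.txt.day1").read_text(),
    }
```

Calling demo() returns the replica holding the borked contents and the
backup still holding the good ones, which is the whole argument in one
dictionary.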


Best,
Angelo