SRE Book Excerpt: The Drudgery of Operations

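SRE defines the large volume of repetitive, manual operations work that produces no lasting result as toil. SRE team members spend no more than half of their working time on operations; the other half goes to engineering work. So what exactly counts as toil?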

So what is toil? Toil is the kind of work tied to running a production service that
tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and
that scales linearly as a service grows. Not every task deemed toil has all these
attributes, but the more closely work matches one or more of the following
descriptions, the more likely it is to be toil:

  • Manual

This includes work such as manually running a script that automates some task.
Running a script may be quicker than manually executing each step in the script,
but the hands-on time a human spends running that script (not the elapsed time)
is still toil time.

  • Repetitive

If you’re performing a task for the first time ever, or even the second time, this
work is not toil. Toil is work you do over and over. If you’re solving a novel
problem or inventing a new solution, this work is not toil.

  • Automatable

If a machine could accomplish the task just as well as a human, or the need for
the task could be designed away, that task is toil. If human judgment is essential
for the task, there’s a good chance it’s not toil.

  • Tactical

Toil is interrupt-driven and reactive, rather than strategy-driven and proactive.
Handling pager alerts is toil. We may never be able to eliminate this type of work
completely, but we have to continually work toward minimizing it.

  • No enduring value

If your service remains in the same state after you have finished a task, the task
was probably toil. If the task produced a permanent improvement in your
service, it probably wasn’t toil, even if some amount of grunt work, such as digging
into legacy code and configurations and straightening them out, was involved.

  • O(n) with service growth

If the work involved in a task scales up linearly with service size, traffic volume,
or user count, that task is probably toil. An ideally managed and designed service
can grow by at least one order of magnitude with zero additional work, other
than some one-time efforts to add resources.
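
The six attributes above can be read as a rough checklist, and the last one as a simple linear cost model. The following is a minimal Python sketch of both ideas, not anything taken from the SRE Book: the `Task` record, `toil_score`, `weekly_hands_on_hours`, and all the numbers are hypothetical and only illustrate the reasoning.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical record: one flag per attribute from the list above."""
    name: str
    manual: bool
    repetitive: bool
    automatable: bool
    tactical: bool
    no_enduring_value: bool
    scales_linearly: bool


def toil_score(task: Task) -> int:
    """Count how many of the six attributes the task matches (0 to 6).

    A higher count means the task is more likely to be toil; the count
    is an illustrative heuristic, not a formal definition.
    """
    return sum([
        task.manual,
        task.repetitive,
        task.automatable,
        task.tactical,
        task.no_enduring_value,
        task.scales_linearly,
    ])


def weekly_hands_on_hours(instances: int, minutes_per_instance: float) -> float:
    """Hands-on human time per week for work that is O(n) in service size."""
    return instances * minutes_per_instance / 60


# Assumed example: restarting a stuck worker by hand after each page.
restart = Task("restart stuck worker", True, True, True, True, True, True)
# Assumed example: designing a new sharding scheme needs human judgment.
design = Task("design sharding scheme", True, False, False, False, False, False)

print(toil_score(restart), toil_score(design))
print(weekly_hands_on_hours(60, 10), weekly_hands_on_hours(600, 10))
```

With these assumed numbers, growing the service from 60 to 600 instances moves the manual work from 10 to 100 hands-on hours per week. That linear growth is what the last attribute describes, and it is why a one-time automation effort tends to pay off as the service scales.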
