May 9, 2020 · 1 min read · 199 words · engineering
A fault is defined as a component of a system deviating from its spec. A failure is when the system as a whole stops providing the required service to the user. Systems that can anticipate faults and cope with them are fault-tolerant or resilient. Obviously, we can't anticipate every type of fault so it only makes sense to talk about certain types.
MTTF) of about 10 to 50 yearsCarefully thinking about assumption and interactions made in the software can help
Some tools exist to try and trigger faults as a way to see where the weak points in the application are.
This post was based on the book Designing Data-Intensive Applications by Martin Kleppmann.
Previous
Make Git Forget About Files Previously Tracked
Next
Preserving leading zeros when writing CSV files for Excel
Authored by Anthony Fox on May 9, 2020
Have comments or feedback? I'd love to hear from you.