Fault-Tolerance Techniques for High-Performance Computing

This timely text/reference presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correcti...

Corporate Author: SpringerLink (Online service)
Other Authors: Herault, Thomas (Editor); Robert, Yves (Editor)
Language: English
Published: Cham : Springer International Publishing : Imprint: Springer, 2015.
Edition: 1st ed. 2015.
Series: Computer Communications and Networks
Online Access: https://doi.org/10.1007/978-3-319-20943-2