Please use this identifier to cite or link to this item: http://hdl.handle.net/1860/48

Title: Fault-tolerant distributed shared memory on a broadcast-based interconnection architecture
Authors: Hecht, Diana Lynn
Keywords: Fault-tolerant computing; Distributed shared memory; Multiprocessors; Electrical and computer engineering
Issue Date: 16-Jan-2003
Publisher: Drexel University
Abstract: This thesis focuses on the issue of reliability and fault tolerance in Distributed Shared Memory Multiprocessors, and on the performance impact of implementing fault tolerance protocols that allow for Backward Error Recovery through the use of synchronized checkpointing. High Performance Parallel computing systems that implement Distributed Shared Memory (DSM) require interconnection networks capable of providing low latency, high bandwidth, and efficient support for multicast and synchronization operations. Software-based DSM systems rely on the operating system to manage the replicated memory pages, and consequently their performance suffers due to operating system overhead, false sharing, and page thrashing. In order to obtain high levels of performance, the activities related to maintaining the consistency of shared data in a DSM should be implemented in hardware so that latencies for data access can be minimized. The recoverable DSM system examined in this thesis is intended for the class of broadcast-based interconnection networks in order to provide the low latencies required for the application workloads characteristic of DSM. An example of this class of interconnection network is the Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus). The unique architecture of the SOME-Bus provides for strong integration of the transmitter, receiver, and cache controller hardware to produce a highly integrated system-wide coherence mechanism. This thesis presents four protocols for fault-tolerant DSM and uses simulation and theoretical analysis to examine the performance of the protocols on the SOME-Bus multiprocessor. The proposed fault tolerance protocols exploit the inherent data distribution operations that occur as part of the management of shared data in DSMs in order to hide the overhead of fault tolerance.
The increased availability of shared data for the support of fault tolerance can be used to enhance the performance of the DSM by increasing the likelihood that a request for data can be filled locally without requiring communication with remote nodes.
URI: http://dspace.library.drexel.edu/handle/1860/48
Appears in Collections: Drexel Theses and Dissertations

Files in This Item:

File: hecht_thesis.pdf
Size: 3.72 MB
Format: Adobe PDF

Items in iDEA are protected by copyright, with all rights reserved, unless otherwise indicated.

 
