
Thursday, 20 February 2014

Advanced Computer Architecture

Distributed Shared-Memory Architectures

  • Distributed shared-memory machines need cache coherence for the same reasons that centralized shared-memory machines need it.
 
  • However, due to the interconnection network and scalability requirements, centralized protocols have drawbacks in these architectures.
 
  • There are several options:
    • No coherence
      • Instead, focus on scalability (Cray T3D).
      • In this scheme, only data that actually resides in the private memory may be cached (shared data is marked uncacheable).

    • Coherence is maintained by software, which has several disadvantages.

      • Compiler mechanisms are very limited.
        • They have to be very conservative, e.g. treat a block on another processor as dirty even though it may not be.
        • This results in excessive coherence overhead (extra fetching).

Distributed Shared-Memory Architectures

  • Disadvantages of software-implemented coherence:
  • Multiple words in a block provide no advantage.
      • Software coherence must be run each time a word is needed.
      • The advantage of spatial locality (the "prefetch" of other words in the block) is lost in single word fetching.
 
  • Latencies to remote memory are relatively high.
      • Remote references can take 50 - 1000 CPU cycles, making coherency "misses" a very costly proposition.
 
  • Snooping
    • The lack of scalability of snooping coherence schemes is a problem in DSMs.
      • The snooping protocol's data structure (which maintains the state of the cache blocks) is distributed among the individual caches, so there is no single place to look up a block's state; this does NOT scale well.
 
    • Snooping requires broadcast (communication with all caches on every miss) which is very expensive with an interconnection network.

Directory-based Cache Coherence Protocols

  • Directory-based coherence
    • Information kept in the directory:
  • The state of every block in memory, e.g. shared, uncached, or exclusive.
      • For exclusive, the block has been written, is in exactly one cache, and memory is out of date.
      • This information is also kept in the cache for efficiency reasons.
 
  • Which caches have copies.
      • Can be implemented using a bit vector for each block with the processors identified by the bit's position.
 
  • Whether or not the block is dirty.
 
    • The amount of information in the directory is proportional to the number of memory blocks times the number of processors.

      • This works O.K. for fewer than about 100 processors -- other solutions are needed beyond that. (A possible directory-entry layout is sketched below.)
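
    • A minimal sketch of what one directory entry might look like, assuming at most 64 processors and one entry per memory block (the type and field names are illustrative, not from the text):

        /* One directory entry per memory block. */
        #include <stdint.h>

        typedef enum {
            UNCACHED,    /* no cache holds the block; memory is up to date           */
            SHARED,      /* one or more caches hold a clean, read-only copy          */
            EXCLUSIVE    /* exactly one cache holds the block; memory is out of date */
        } dir_state_t;

        typedef struct {
            dir_state_t state;     /* shared, uncached, or exclusive                   */
            uint64_t    sharers;   /* bit vector: bit i set => processor i has a copy  */
        } dir_entry_t;

        /* Add processor p to the sharing list for this entry. */
        static inline void add_sharer(dir_entry_t *e, unsigned p)
        {
            e->sharers |= 1ULL << p;
        }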

Directory-based Cache Coherence Protocols

  • The directory entries can also be distributed along with the memory.
    • The high-order bits of an address can be used to identify the home node, i.e. the location of the memory and directory entries for that portion of the address space (see the sketch below).
  • This structure avoids broadcast.
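
    • A minimal sketch of that decoding, assuming 64 nodes and an equal split of a 40-bit physical address space (the constants are illustrative assumptions):

        #include <stdint.h>

        #define NODE_BITS 6     /* log2(64 nodes)                  */
        #define ADDR_BITS 40    /* physical address width, assumed */

        /* The home node is simply the top NODE_BITS of the physical address. */
        static inline unsigned home_node(uint64_t paddr)
        {
            return (unsigned)(paddr >> (ADDR_BITS - NODE_BITS));
        }

      A miss on a block is sent directly to that block's home node, so no broadcast is needed to find the directory entry.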

Directory-based Cache Coherence Protocols

  • Two basic primitives that must be handled:
  • Handling a read miss.
  • Handling a write to a shared, clean cache block.
 
  • Handling a write miss is a combination of these two.
 
  • Our simplifying assumptions still hold here.
    • Writes to non-exclusive data generate write misses.
    • Write misses are atomic (processors block until the access completes).
 
  • This introduces two complications:
    • Since there is no longer a bus, there is no single point of arbitration.
 
    • Since broadcast is to be avoided, the directory and cache must issue explicit response messages, e.g., invalidate and write-back request messages.

Directory-based Cache Coherence Protocols

  • The states and transitions at each cache are identical to the snooping protocol.
    • The actions are somewhat different however.
 
  • First let's look at the message types:
    • Local node: Where the request originates.
    • Home node: Where the memory and directory live.
    • Remote node: Node that has a copy of the block (exclusive or shared).

      Message type   Source           Destination      Contents   Function
      Read miss      Local cache      Home directory   P, A       P has a read miss at addr A; request data and make P a read sharer.
      Write miss     Local cache      Home directory   P, A       P has a write miss at addr A; request data and make P exclusive owner.
      Invalidate     Home directory   Remote cache     A          Invalidate a shared copy of data at addr A.

Directory-based Cache Coherence Protocols

  • More message types (a sketch of how these messages might be encoded follows the table):

      Message type       Source           Destination      Contents   Function
      Fetch              Home directory   Remote cache     A          Fetch block at addr A and send to home directory; change the state of A in the remote cache to shared.
      Fetch/invalidate   Home directory   Remote cache     A          Fetch block at addr A and send to home directory; invalidate the block in the cache.
      Data value reply   Home directory   Local cache      Data       Return a data value from the home memory.
      Data write back    Remote cache     Home directory   A, data    Write back a data value for addr A.
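
  • As a rough sketch, the seven message types above could be encoded like this (the names and field layout are illustrative, not from the text):

        #include <stdint.h>

        typedef enum {
            READ_MISS,          /* local cache    -> home directory : P, A    */
            WRITE_MISS,         /* local cache    -> home directory : P, A    */
            INVALIDATE,         /* home directory -> remote cache   : A       */
            FETCH,              /* home directory -> remote cache   : A       */
            FETCH_INVALIDATE,   /* home directory -> remote cache   : A       */
            DATA_VALUE_REPLY,   /* home directory -> local cache    : data    */
            DATA_WRITE_BACK     /* remote cache   -> home directory : A, data */
        } msg_type_t;

        typedef struct {
            msg_type_t type;
            unsigned   proc;       /* P: requesting processor, when present  */
            uint64_t   addr;       /* A: block address, when present         */
            uint8_t    data[64];   /* block contents, when present           */
        } coherence_msg_t;

    Because there is no bus, every invalidate, fetch, and reply is an explicit point-to-point message like this, rather than something other caches simply observe.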

Directory-based Cache Coherence Protocols

  • The protocol actions to which an individual cache responds.

Directory-based Cache Coherence Protocols

  • Actions taken by the directory in response to messages received.

Directory-based Cache Coherence Protocols

  • Directory operation
    • The directory can receive three kinds of messages (their handling across the three block states is sketched in code at the end of this discussion):
      • Read miss
      • Write miss
      • Write-back
 
  • Uncached state
    • When the block is uncached, the directory can only receive two kinds of messages: read miss and write miss.

    • A read miss moves the block into the shared state.
    • A write miss moves it into exclusive.
 
    • In either case, the directory updates its list of sharing nodes to include only the node that requested the data.

Directory-based Cache Coherence Protocols

  • Shared state
    • Again, only read or write misses are possible, since all caches have the same value as memory.
 
    • If it is a read miss, the node requesting the data is added to the list of sharing nodes.
 
    • If it's a write miss:
  • The block is moved to the exclusive state.
  • Invalidate messages are sent to all current sharing nodes.
  • The sharing list is updated to only the requesting processor.
 
  • Exclusive state
    • On a read miss, the owner node is sent a fetch message.
      • This tells the node to write its data back to memory.
      • The requesting node is added to the sharing list, and the block is marked as shared.

Directory-based Cache Coherence Protocols

  • Exclusive state
    • If it's a write miss, the block must be written back by the current owner, so the directory sends out a fetch message.
 
    • When the data is written, the directory forwards it to the new owner and replaces the old owner with the new owner in the sharing list.
 
    • On a write-back, the data is updated in memory and the block goes into the uncached state.
      • The sharing list is cleared.
 
    • One obvious optimization is to have the old owner send the data directly to the new owner on a write miss.
      • This can be done either instead of or in addition to writing the data to the home.
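
  • Putting the three message kinds together, here is a compact sketch (not the text's implementation) of the directory's actions, following the transitions described above; the send_msg() stub, the owner_of() helper, and the 64-node limit are illustrative assumptions:

        #include <stdio.h>
        #include <stdint.h>

        typedef enum { UNCACHED, SHARED, EXCLUSIVE } dir_state_t;

        typedef struct {
            dir_state_t state;
            uint64_t    sharers;                   /* bit i set => node i holds a copy */
        } dir_entry_t;

        static void send_msg(const char *kind, unsigned node, uint64_t addr)
        {
            printf("%-16s -> node %u, block 0x%llx\n",
                   kind, node, (unsigned long long)addr);
        }

        static unsigned owner_of(uint64_t sharers)  /* index of the single set bit */
        {
            unsigned i = 0;
            while (i < 63 && !((sharers >> i) & 1)) i++;
            return i;
        }

        void dir_read_miss(dir_entry_t *e, unsigned p, uint64_t a)
        {
            if (e->state == EXCLUSIVE)              /* owner must write its copy back   */
                send_msg("fetch", owner_of(e->sharers), a);
            send_msg("data value reply", p, a);     /* requester receives the block     */
            e->sharers |= 1ULL << p;                /* add p to the sharing list        */
            e->state = SHARED;
        }

        void dir_write_miss(dir_entry_t *e, unsigned p, uint64_t a)
        {
            if (e->state == SHARED) {               /* invalidate every current sharer  */
                for (unsigned n = 0; n < 64; n++)
                    if ((e->sharers >> n) & 1)
                        send_msg("invalidate", n, a);
            } else if (e->state == EXCLUSIVE) {     /* old owner writes back and drops its copy */
                send_msg("fetch/invalidate", owner_of(e->sharers), a);
            }
            send_msg("data value reply", p, a);     /* p becomes the exclusive owner    */
            e->sharers = 1ULL << p;
            e->state = EXCLUSIVE;
        }

        void dir_write_back(dir_entry_t *e, unsigned p, uint64_t a)
        {
            (void)p; (void)a;                       /* data goes back to home memory    */
            e->sharers = 0;                         /* clear the sharing list           */
            e->state = UNCACHED;
        }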

Directory-based Cache Coherence Protocols

  • Issues:
  • When read-only data is replaced
    • Note that this scheme does not explicitly notify the directory when a clean block is replaced in the cache.
 
    • This is fine, since the cache will simply ignore invalidate messages for blocks that are not currently cached (see the sketch below).
 
    • The only problem is that it will cause the directory to send out a few unnecessary messages.
      • But that is probably not as bad as having the remote caches send a message each time they replace a block.
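
    • A small sketch of why silent replacement is safe: a cache that no longer holds the block simply drops the invalidate (the toy direct-mapped cache and the cache_lookup() helper are illustrative assumptions, not from the text):

        #include <stddef.h>
        #include <stdint.h>

        typedef enum { INVALID, SHARED_LINE, EXCLUSIVE_LINE } line_state_t;

        typedef struct {
            uint64_t     tag;      /* block address >> 6 (64-byte blocks) */
            line_state_t state;
        } cache_line_t;

        #define NLINES 256
        static cache_line_t lines[NLINES];          /* toy direct-mapped cache */

        static cache_line_t *cache_lookup(uint64_t addr)
        {
            cache_line_t *l = &lines[(addr >> 6) % NLINES];
            return (l->state != INVALID && l->tag == (addr >> 6)) ? l : NULL;
        }

        void on_invalidate(uint64_t addr)
        {
            cache_line_t *line = cache_lookup(addr);
            if (line == NULL)
                return;                  /* block already replaced: the message is ignored */
            line->state = INVALID;       /* otherwise drop the (clean) shared copy         */
        }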

Directory-based Cache Coherence Protocols

  • Issues:
  • Synchronization
    • Deciding the order of accesses in a distributed memory system is much harder.
 
    • Without a shared bus, it is impossible to tell which writes come first.
 
    • It is not feasible to stall all accesses until a write completes.
 
    • Often, this can be handled by requiring all writes to be atomic.
      • But doing so slows down the system greatly.
