Cache design in multiprocessors

... one tag line and that would be the fastest in terms of search time. Of course, storing only one address in the cache would also sharply lower the probability that requested data is found there. In determining the efficiency of a single-processor cache, the cache hit ratio reflects how often the cache is actually being accessed usefully: it is the probability that data will be in the cache whenever the processor requests it. To increase the hit ratio, the cache needs more tag lines, which makes it more likely that the data to be retrieved is already in the cache. Again, taken to the extreme, this reasoning yields a near-perfect hit ratio, but each line then holds an extremely small amount of data and main memory has to be accessed repeatedly. With smaller sections of memory, internal fragmentation is also more likely to occur, wasting space within the expensive cache. A large number of tag lines therefore gives a higher hit ratio, but it also substantially increases the search time of the cache and can lower the total efficiency. The best compromise almost certainly lies between these two extremes, and it depends heavily on the sizes and types of data the system will be working with. Could the speed of the entire system be increased in other ways?
The recent rise in the popularity of shared-memory multiprocessor systems (MIMDs) has created a demand for suitable techniques and hardware designs to combat the cache coherence problem in large-scale MIMD systems. Cache coherence is a problem in multiprocessor systems because every processor is connected to the same main memory, yet each processor has its own cache. If one processor writes (back or through) to main memory a datum that is also stored in another processor's cache, the copy in that other cache is no longer consistent with main memory.
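The trade-off between the number of tag lines and the amount of data each line holds can be illustrated with a small simulation. This is a minimal sketch under assumptions the essay does not state: a fully associative cache with least-recently-used (LRU) replacement, a fixed total capacity split evenly into lines, and two tiny hypothetical address traces. The function name `hit_ratio` is illustrative only.

```python
from collections import OrderedDict


def hit_ratio(trace, capacity_words, num_lines):
    """Fraction of accesses in `trace` that hit in a fully associative LRU cache.

    The cache holds `capacity_words` words split into `num_lines` lines,
    so each line (one tag) covers capacity_words // num_lines consecutive words.
    """
    if num_lines <= 0 or capacity_words % num_lines:
        raise ValueError("capacity must divide evenly into lines")
    line_size = capacity_words // num_lines
    lines = OrderedDict()  # tag -> None, ordered least to most recently used
    hits = 0
    for addr in trace:
        tag = addr // line_size  # memory block that this word belongs to
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)  # mark as most recently used
        else:
            if len(lines) == num_lines:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = None
    return hits / len(trace) if trace else 0.0


# Hypothetical traces: a sequential scan and two scattered, reused addresses.
sequential = [0, 1, 2, 3]
scattered = [0, 100, 0, 100]
for n in (1, 4):
    print(n, hit_ratio(sequential, 4, n), hit_ratio(scattered, 4, n))
# 1 0.75 0.0
# 4 0.0 0.5
```

One large line suits the sequential scan, while four small lines suit the scattered reuse, which matches the point that the best setting depends on the data the system works with. A fully associative lookup must also compare against up to `num_lines` tags, so more lines means a longer search.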
To maintain cache consistency in MIMD systems, a variety of techniques are being explored; some are suitable only for small MIMD systems, while others scale to systems with 20 or more processors. Efficiency in cache design is important because the cache memory of each processor is very fast but also very expensive, so making the best use of it can improve the overall cost-performance of a machine. The cache coherence problem can be demonstrated with just two processors (figure below), and it becomes more prevalent as the number of processors grows. For instance, if a variable F(X) with the value 3.14 is stored in the caches of both Processor A and Processor B, it is at risk of becoming inconsistent. When Processor A updates F(X) to something other than 3.14 during a calculation, the new value is written to main memory and to Processor A's cache. The copy of F(X) in Processor B's cache, however, still holds 3.14, so if the same program or a new program accesses this variable, each processor sees a different value of F(X) and they can produce two different answers: one correct (using Processor A's cached value of F(X)) and one incorrect (using Processor B's stale cached value of F(X)). The techniques employed in small-scale MIMDs are not suitable for large-scale MIMDs, which have many more processors. Broadcast-based protocols suit small-scale multiprocessor systems, in which all the processors can be attached to a common bus. These early techniques for preventing cache inconsistencies are known as “snoopy” or “snoop” cache protocols.
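The two-processor scenario can be reproduced in a few lines. This is a minimal sketch assuming write-through caches with no coherence mechanism at all; the `Processor` class, the address key `"F(X)"`, and the new value 2.71 are illustrative choices, not part of the essay.

```python
class Processor:
    """A processor with a private write-through cache and no coherence protocol."""

    def __init__(self, memory):
        self.memory = memory  # shared main memory: address -> value
        self.cache = {}       # private cache: address -> value

    def read(self, addr):
        if addr not in self.cache:              # miss: fetch from main memory
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]                 # hit: may be a stale copy

    def write(self, addr, value):
        self.cache[addr] = value                # update this processor's cache
        self.memory[addr] = value               # write through to main memory
        # Nothing notifies the other caches, which is the source of the problem.


memory = {"F(X)": 3.14}
a = Processor(memory)   # Processor A
b = Processor(memory)   # Processor B
a.read("F(X)")
b.read("F(X)")          # both caches now hold 3.14
a.write("F(X)", 2.71)   # 2.71 is an arbitrary new value
print(a.read("F(X)"), b.read("F(X)"), memory["F(X)"])  # 2.71 3.14 2.71
```

Main memory and Processor A agree on the new value, but Processor B keeps answering from its stale copy because its read hits in its own cache.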
Broadcast-based (snoopy) cache protocols are designed to be quite active: every cache “snoops” on each memory transaction in order to keep the data in every processor's cache consistent with the corresponding data in shared memory. A multiple-cache system is said to be consistent, or coherent, if its caches maintain this state after every memory write. Because every cache must snoop on every memory transfer, this architecture can be implemented only if all the processors share the same bus, and that is possible only if all the processors can actually be attached to one bus. If a system needs more processors than the bus can support, a broadcast-based protocol obviously does not scale to that system. In fact, many broadcast-based protocols allow only one processor at a time to write a given datum, while every other processor is only permitted to read it; this property does not scale well to large MIMD systems, in which operations other than simple calculations are needed and the number of write-backs grows with the size of the system. Broadcast-based protocols give rise to an obvious dichotomy of solutions to the cache coherence problem: one approach invalidates the other cached copies of a datum once it has been modified (write-invalidate), and the other updates the modified datum in every processor's cache (write-update). Broadcast-based protocols allow each cache line to be in one of four basic states. Each state records whether the line is exclusive or shared and whether the cached data is altered or unaltered. By labeling cache lines this way, the processor knows whether the data may be inconsistent, that is, both altered and shared. The combinations of the two attributes yield four distinct possibilities (some models use five): Exclusively owned and unaltered: this state ensures that no other processor holds a copy of this line, so no other processor can have altered the data stored under this tag.
When a cache line is placed into this state, it resembles the state that any single processor system...
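The four line states described above can be made concrete with a small simulation. This is a minimal sketch, not the essay's own protocol: it assumes write-invalidate snooping, one word per cache line, and one plausible mapping of the essay's states onto the MOESI protocol (exclusive-altered = Modified, shared-altered = Owned, exclusive-unaltered = Exclusive, shared-unaltered = Shared, with Invalid as the fifth state that some models add). The `SnoopyBus` class and its method names are hypothetical.

```python
from enum import Enum


class State(Enum):
    # The essay's (exclusive/shared, altered/unaltered) pairs plus INVALID.
    # The letters are the corresponding MOESI names (an assumed mapping).
    EXCLUSIVE_UNALTERED = "E"
    EXCLUSIVE_ALTERED = "M"
    SHARED_UNALTERED = "S"
    SHARED_ALTERED = "O"
    INVALID = "I"


ALTERED = (State.EXCLUSIVE_ALTERED, State.SHARED_ALTERED)
EXCLUSIVE = (State.EXCLUSIVE_UNALTERED, State.EXCLUSIVE_ALTERED)


class SnoopyBus:
    """Write-invalidate snooping on one shared bus (one word per cache line).

    Every transaction is checked against every other cache, so the work per
    transaction grows with the number of processors on the bus.
    """

    def __init__(self, num_cpus, memory):
        self.memory = memory
        self.caches = [dict() for _ in range(num_cpus)]  # addr -> (State, value)

    def state_of(self, cpu, addr):
        return self.caches[cpu].get(addr, (State.INVALID, None))[0]

    def _snoop(self, cpu, addr):
        """Return (other_cpu, state, value) for every valid remote copy."""
        return [(other, cache[addr][0], cache[addr][1])
                for other, cache in enumerate(self.caches)
                if other != cpu and addr in cache
                and cache[addr][0] is not State.INVALID]

    def read(self, cpu, addr):
        if self.state_of(cpu, addr) is not State.INVALID:  # read hit: no bus traffic
            return self.caches[cpu][addr][1]
        value = self.memory[addr]
        sharers = self._snoop(cpu, addr)
        for other, st, val in sharers:
            if st in ALTERED:
                value = val  # the owner supplies the up-to-date data
                self.caches[other][addr] = (State.SHARED_ALTERED, val)
            elif st is State.EXCLUSIVE_UNALTERED:
                self.caches[other][addr] = (State.SHARED_UNALTERED, val)
        new = State.SHARED_UNALTERED if sharers else State.EXCLUSIVE_UNALTERED
        self.caches[cpu][addr] = (new, value)
        return value

    def write(self, cpu, addr, value):
        if self.state_of(cpu, addr) not in EXCLUSIVE:
            # Other copies would become stale, so broadcast an invalidation.
            for other, _, _ in self._snoop(cpu, addr):
                self.caches[other][addr] = (State.INVALID, None)
        self.caches[cpu][addr] = (State.EXCLUSIVE_ALTERED, value)

    def evict(self, cpu, addr):
        state, value = self.caches[cpu].pop(addr, (State.INVALID, None))
        if state in ALTERED:
            self.memory[addr] = value  # altered data must be written back


memory = {"F(X)": 3.14}
bus = SnoopyBus(2, memory)
bus.read(0, "F(X)")          # A alone: EXCLUSIVE_UNALTERED
bus.read(1, "F(X)")          # both copies become SHARED_UNALTERED
bus.write(0, "F(X)", 2.71)   # B's copy is invalidated; A is EXCLUSIVE_ALTERED
print(bus.read(1, "F(X)"))   # 2.71, supplied by A's cache
print(bus.state_of(0, "F(X)").name, bus.state_of(1, "F(X)").name)
# SHARED_ALTERED SHARED_UNALTERED
bus.evict(0, "F(X)")         # A writes its altered copy back
print(memory["F(X)"])        # 2.71
```

Unlike the uncoherent case, Processor B's stale copy is invalidated on the write, so its next read fetches the current value. Storing the new value into each sharer's copy instead of invalidating it would move this toward the write-update alternative, though a real update protocol also needs different state transitions.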
