RISC and CISC

...ghtforward cases, like RISC processors, this can be done in two steps: fetching the referenced address registers and calculating the effective address. Once the effective address is available, the next step is usually to forward the effective (virtual) address to the MMU for translation and to access the data cache. Here, and in the subsequent discussion, we do not go into whether the referenced cache is physically or virtually addressed, and thus we neglect the corresponding issues. Furthermore, we assume that the referenced data is available in the cache, so that it is fetched in one or a few cycles. Usually, the fetched data is made directly available to the requesting unit, such as the FX or FP unit, through bypassing. Finally, the last subtask is writing the accessed data into the specified register.

For store instructions, the address calculation phase is identical to that already discussed for loads. Subsequently, however, the virtual address and the data to be stored can be sent out in parallel to the MMU and the cache, respectively. This concludes the processing of the store instruction.

RISC pipelining is a very effective technique for speeding up instruction execution along a sequential path. But if a branch enters the pipeline and disrupts the sequential processing, the performance of the pipeline will be seriously impeded unless appropriate techniques are used.

In order to demonstrate the problems that branches cause in pipelining, let us consider the execution of an unconditional branch in a pipeline, with reference to Figure 1.1.6. The figure shows how an unconditional branch is executed on a traditional RISC pipeline when no special care is taken to improve efficiency. The pipeline is assumed to process instructions in four consecutive cycles: fetch (F), decode (D), execute (E) and write back (WB).
The target address (TA) of the branch is then computed during the E cycle, as depicted in Figure 1.1.6(a). Now, let us consider the execution of a simple instruction sequence containing an unconditional branch (B), with reference to Figure 1.1.6(b). Here, for simplicity, let us suppose that each of the indicated instructions can be processed in four consecutive cycles. When the given instruction sequence is executed using straightforward pipelining, the following happens (Figure 1.1.6(c)). The pipeline, like an assembly line, continues processing subsequent instructions until the branch (B) is detected during its decoding in cycle t(i+2). In the next cycle, t(i+3), the target address is calculated and the sequential processing is interrupted. Since the target address becomes known only at the end of the E cycle, the pipeline can start fetching the first target instruction (it1) no earlier than cycle t(i+4). By this time, however, the two sequential instructions following the unconditional branch (i3 and i4) have already entered the pipeline. These instructions must be cancelled. Thus, in our example, processing an unconditional branch causes a two-cycle penalty.

CISC Pipeline

We now give an overview of the layout of FX pipelines in CISC processors. CISC pipelines differ from RISC pipelines mainly in that they must be able to process both register and memory operands and destinations. In order to access a memory operand (which is assumed to be in the cache), two additional subtasks are carried out: calculating the operand address and fetching the operand. Therefore, a traditional CISC pipeline, which is laid out to execute register-memory instructions efficiently, contains two more stages than a traditional RISC pipeline. As illustrated in Figure 1.3.1, such a pipeline consists of the following six stages: instruction fetch (F), decode (D), address calculation (A), cache access (C), execute (E) and writing back the result into the register file (WB).
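
The four-stage and six-stage layouts described so far can be compared cycle by cycle. The following Python sketch is an illustration only, not a model of any real processor: it lays out an ideal pipeline for each stage list and recomputes the two-cycle penalty of the unconditional branch example. The function names (timeline, branch_penalty, render) and the assumption that one instruction enters per cycle without stalls are hypothetical choices made for this sketch; only the stage names come from the text.

```python
# Illustrative sketch: stage names follow the text, everything else is assumed.
RISC_STAGES = ["F", "D", "E", "WB"]
CISC_STAGES = ["F", "D", "A", "C", "E", "WB"]


def timeline(stages, n_instructions):
    """Return one row per instruction; row[c] is the stage occupied in cycle c.

    Assumes an ideal pipeline: one instruction enters per cycle, nothing stalls.
    Cycles in which an instruction is not in the pipeline hold an empty string.
    """
    total_cycles = n_instructions + len(stages) - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for offset, name in enumerate(stages):
            row[i + offset] = name
        rows.append(row)
    return rows


def branch_penalty(stages, resolve_stage):
    """Cycles lost when the branch target is known only at the end of resolve_stage.

    If the branch is fetched in cycle 0, it occupies stage k in cycle k, so the
    first target instruction can be fetched in cycle k + 1. Without the branch,
    a useful instruction would have been fetched in cycle 1. The difference, k,
    equals the number of sequential instructions that must be cancelled.
    """
    return stages.index(resolve_stage)


def render(rows):
    """Format a timeline as fixed-width text, one instruction per line."""
    return "\n".join(" ".join(f"{cell:>2}" for cell in row) for row in rows)


if __name__ == "__main__":
    print(render(timeline(RISC_STAGES, 4)))
    print()
    print(render(timeline(CISC_STAGES, 4)))
    print("RISC unconditional-branch penalty:", branch_penalty(RISC_STAGES, "E"))
```

With the target address resolved in the E cycle of the four-stage pipeline, branch_penalty returns 2, which agrees with the two cancelled instructions (i3 and i4) in the example.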
This six-stage pipeline is used by several CISC processors, such as the MC 68040 and MC 68060. Evidently, it can also be used to execute register-register and load/store instructions (Figure 1.3.2). In order to execute register-register instructions, the referenced register operands are fetched in the D cycle, while the A and C cycles remain unused. Subsequently, the required operation is performed in the E cycle, and the result is written back into the register file during the concluding WB cycle. We note here that unused internal cycles do not directly affect performance, since they do not cause pipeline stalls. Nevertheless, in practice a slight performance decrease can be expected, because the higher number of pipeline stages increases the frequency of dependencies.

For load instructions, the referenced address registers, if any, are fetched in the D cycle, and the memory address is generated in the A cycle. Then, the addressed data is fetched from the cache in the C cycle, and during the WB cycle the fetched data is written into the reg...
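
The stage usage described above can be summarised as a small lookup table. The Python sketch below is illustrative only: the dictionary layout and the helper names (unused_stages, total_cycles) are hypothetical, the per-stage notes paraphrase the text, and treating the E cycle of a load as unused is an assumption, since the text lists no work for it.

```python
# Illustrative sketch of stage usage on the six-stage pipeline described in the text.
CISC_STAGES = ["F", "D", "A", "C", "E", "WB"]

# Work done in each stage, per instruction class. A stage that is absent from
# an entry is treated as unused (for the load's E cycle this is an assumption).
STAGE_WORK = {
    "register-register": {
        "F": "fetch instruction",
        "D": "decode, fetch register operands",
        "E": "perform the operation",
        "WB": "write result into the register file",
    },
    "load": {
        "F": "fetch instruction",
        "D": "decode, fetch address registers (if any)",
        "A": "generate the memory address",
        "C": "fetch the addressed data from the cache",
        "WB": "write back the fetched data",
    },
}


def unused_stages(kind):
    """Stages that an instruction of this class passes through without doing work."""
    return [stage for stage in CISC_STAGES if stage not in STAGE_WORK[kind]]


def total_cycles(sequence):
    """Cycles needed to complete a stall-free instruction sequence.

    Every instruction moves through all six stages, used or not, so unused
    stages add latency but do not reduce the one-per-cycle throughput.
    """
    for kind in sequence:
        if kind not in STAGE_WORK:
            raise ValueError(f"unknown instruction class: {kind}")
    if not sequence:
        return 0
    return len(sequence) + len(CISC_STAGES) - 1


if __name__ == "__main__":
    for kind in STAGE_WORK:
        print(kind, "leaves unused:", unused_stages(kind))
    print("cycles for 3 instructions:", total_cycles(["load", "register-register", "load"]))
```

Because total_cycles depends only on the number of instructions, the sketch reflects the remark that unused internal cycles do not by themselves cause stalls; it does not model the dependency effects mentioned in the text.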
