... 1 TINY GENERAL PURPOSE RISC PROCESSOR

The MMC should have the abilities of a typical microcontroller:
· integer arithmetic
· logical operations
· bit manipulation
· program flow control
· memory operations

Well-balanced controller features will allow the MMC to serve as the host processor in a system, just as “real” microcontrollers do. Of course, the MMC should be able to connect to the instruction and data cache memories and probably to some kind of memory management unit. Like any modern microcontroller, the MMC should provide flexible options for enhancing its capabilities, either by extending the instruction set or by using coprocessors. ... Such an interface should support a tight connection between the MMC and the coprocessor resources and be able to link the MMC instruction pipeline with the coprocessor instruction pipeline. ...

1-1 – General Structure of Matrix Media Core

As shown in the MMC general structure diagram in the figure above, any MMC configuration should include at least an Integer Unit and a Load/Store Unit. ... The gray-colored special registers in the table should be excluded from the full context switch. ... These instructions work across different stages of the pipe. ...

2) Different instructions
3) Except Move and Context Restore instructions
4) Except all Call and Return instructions
5) Up to 4 NOPs can be used in parallel

In general, only two IU instructions can be executed in parallel, excluding the special case of NOP and Control instructions. ... 3-3) In the case of unsigned arithmetic the flags behave differently. ... When the unsigned result cannot be represented within 32 bits, the C flag effectively acts as a sign extension, i.e. as the 33rd bit of the result. ...
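The role of the C flag as a 33rd result bit can be illustrated with a small C sketch. It is not taken from the specification: the `add_u32` helper and the `add_result` type are hypothetical names, only unsigned addition is modeled, and the sketch assumes that C simply holds bit 32 of the exact sum.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: 32-bit result plus the carry (C) flag. */
typedef struct {
    uint32_t result; /* low 32 bits of the sum */
    unsigned c;      /* assumed C flag: bit 32 of the exact sum */
} add_result;

/* Unsigned 32-bit addition computed in 64 bits, so that the bit lost by
 * a plain 32-bit addition can be recovered as the carry. */
add_result add_u32(uint32_t a, uint32_t b)
{
    uint64_t wide = (uint64_t)a + (uint64_t)b;
    add_result r;
    r.result = (uint32_t)wide;
    r.c = (unsigned)((wide >> 32) & 1u);
    return r;
}

/* Small self-check: 0xFFFFFFFF + 2 does not fit in 32 bits. */
void add_u32_example(void)
{
    add_result r = add_u32(0xFFFFFFFFu, 2u);
    assert(r.result == 0x00000001u);
    assert(r.c == 1u);
}
```

Concatenating `c` and `result` gives the exact 33-bit sum, which is the sense in which the carry extends the 32-bit result.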
When a data transfer instruction is executed in parallel with an arithmetic or logical instruction and the destination register of the Data Load stage is the source register of the Execution stage, the destination register contents are forwarded to the execution unit. ... In the particular case of a subroutine call, the IU does not perform the corresponding instruction completely. ... Thus, the direct conditional branch can produce a zero-delay flow change, provided that the programmer finds enough instructions to fill the delay line of the branch instruction. ... 4-12 shows the case when the number of repeats comes from a register. ... This optimization is based on a specific prediction, which has to be correct in every case. ...

4-13 – IU Pipeline: Delayed Repeat Single Instruction by indirect counter

The figure above presents the pipeline for the case when the prediction says that instruction C will be executed at least once. ...

4-14 – IU Pipeline: Delayed Repeat Single Instruction by indirect operand

The figure above presents the pipeline for the case when the prediction says that instruction B will be executed at least twice. ...

2 HW Loops

The HW loop mechanism is used when more than one instruction should be executed repeatedly, or when even a single instruction should be executed in a loop without blocking interrupts. ... 4-15 shows the case when the number of iterations is defined directly by an immediate value and three instructions are executed twice under the HW loop constraint. ... Therefore, when in the general case the loop start and loop end addresses are calculated on the D stage of the LOOP instruction, the loop start address is taken from the PC register value of the previous clock. ...
4-16 – IU Pipeline: 2-instruction HW loop with direct counter

In the first case, shown in Figure 2.4-16, the HW loop mechanism behaves as usual, except that all decisions should be made immediately after the special HW loop registers are updated. This means that some kind of data forwarding is needed between the register update and the read of the new values on the next clock. ...

4-17 – IU Pipeline: 1-instruction HW loop with direct counter

In the second case, shown in Figure 2. ... The delay aligns the occurrence and the processing of the end-of-loop condition, similarly to the previous case. ... As in the case of the directly encoded loop counter, it is desirable that the last instruction of the loop enters the pipeline no earlier than the loop counter operand is detected. ... 4-19 shows the case of a HW loop containing 4 instructions C, D, E, F and 1 instruction B providing the necessary delay. ... The delay parameter also determines when the loop start address and the loop end address should be stored in the LSTAR and LENAR special registers respectively. ...

Number of      Number of     Delay   Update LCR   Update LSTAR   Update LENAR
Instructions   Iterations
1              1             4       DL-stage     DL-stage       DL-stage
1              2             3       DL-stage     RAO-stage      RAO-stage
1              3             2       DL-stage     DA-stage       DA-stage
1              4 and more    1       DL-stage     D-stage        D-stage
2              1             3       DL-stage     DL-stage       DL-stage
2              2             1       DL-stage     DA-stage       DA-stage
2              3 and more    0       DL-stage     D-stage        D-stage
3              1             2       DL-stage     RAO-stage      RAO-stage
3              2 and more    0       DL-stage     D-stage        D-stage
4              1             1       DL-stage     DA-stage       DA-stage
4              2 and more    0       DL-stage     D-stage        D-stage
5 and more     1 and more    0       DL-stage     D-stage        D-stage

Internal Architecture Spec MMC Concept Highly Confidential 35

When the first loop iteration finishes before the loop counter operand is detected, the HW loop mechanism starts the next iteration regardless of the contents of the LCR. This case is shown in Figure 2. ...
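As a rough behavioral illustration of a HW loop, the following C sketch models the LCR, LSTAR and LENAR special registers at the instruction level. Several points are assumptions rather than statements of the specification: LCR is taken to hold the number of iterations still to run, LENAR is taken to hold the address of the last loop instruction, every instruction is assumed to occupy 4 bytes, and pipeline timing and delay slots are ignored entirely. The names `hw_loop`, `next_pc` and `run_loop` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the HW loop special registers. */
typedef struct {
    uint32_t lcr;   /* assumed: iterations still to run */
    uint32_t lstar; /* loop start address */
    uint32_t lenar; /* assumed: address of the last loop instruction */
} hw_loop;

/* Address of the instruction that follows the one at pc.
 * Assumes a fixed 4-byte instruction size (a simplification). */
uint32_t next_pc(hw_loop *lp, uint32_t pc)
{
    if (lp->lcr > 0u && pc == lp->lenar) {
        lp->lcr--;                /* one iteration completed */
        if (lp->lcr > 0u)
            return lp->lstar;     /* more iterations: jump back */
    }
    return pc + 4u;               /* fall through */
}

/* Counts how many instructions execute for a loop body spanning
 * [start, end] that is run for the given number of iterations. */
unsigned run_loop(uint32_t start, uint32_t end, uint32_t iterations)
{
    hw_loop lp = { iterations, start, end };
    unsigned executed = 0u;
    uint32_t pc = start;
    while (pc <= end) {
        executed++;
        pc = next_pc(&lp, pc);
    }
    return executed;
}

/* Self-check: three instructions executed twice give six executions. */
void run_loop_example(void)
{
    assert(run_loop(0x100u, 0x108u, 2u) == 6u);
}
```

The model only captures which instructions run and how often; the stage at which each register is updated, which is what the table above specifies, is outside its scope.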
On the rising clock edge, in the case of a write operation, the LSU reads the source data register and drives its contents onto the memory data bus. ... The LSU behavior is very similar to the previous case; only the return address detection happens on the DL/WAO stage, because it is only on this stage that the IU reads the target branch address from the register operand and the flow change takes place. ... 5-10 shows how the LSU assists the IU in the case of a direct conditional subroutine call. ... 5-11 shows how the LSU assists the IU in the case of an indirect conditional subroutine call. From the LSU point of view, this case is identical to the previous one. ... As in the previous case, the data memory effective read address is calculated on the DA stage, referencing the SP register value. ... Since the decision to change the program flow is made on the E/WB stage, the LSU updates the SP only when the program flow change actually takes place. ... In any case, the modifier value should be a multiple of 4. ...

3 Modulo Addressing Mode Configuring

To organize a circular buffer, an application should configure a modulo addressing mode. ... Indeed, the reaction to an interrupt event is very similar to a subroutine call, where the return address should be stored in the SW stack. ... In the case of an interrupt request the LSU already knows the return address at the beginning of the bubble clock, i. ... Since the LSU has 4GB of addressable memory space, the memory interface block should be configurable so that the on-core and off-core memory spaces can be mapped. ... Before accessing one of the internal registers, the APORT should be configured accordingly. ... There are two registers in every Address Comparator unit that should be configured in accordance with the memory region parameters: the Mask Register and the Address Register. ...
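One common way to use such a register pair is a masked compare, sketched below in C. This is an illustration under stated assumptions, not the documented comparator logic: the region length is assumed to be a power of two, the region start is assumed to be aligned to that length, and a hit is assumed to mean `(addr & mask) == address`. The names `addr_comparator`, `make_comparator` and `comparator_hit` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical image of one Address Comparator unit. */
typedef struct {
    uint32_t mask;    /* Mask Register: ones in the upper bits */
    uint32_t address; /* Address Register: upper bits of the region start */
} addr_comparator;

/* Builds the register values for a region.
 * Assumes length is a power of two and start is aligned to length. */
addr_comparator make_comparator(uint32_t start, uint32_t length)
{
    addr_comparator c;
    c.mask = ~(length - 1u);      /* e.g. 64 KB -> 0xFFFF0000 */
    c.address = start & c.mask;   /* low bits forced to zero */
    return c;
}

/* Assumed match rule: compare only the bits selected by the mask. */
bool comparator_hit(const addr_comparator *c, uint32_t addr)
{
    return (addr & c->mask) == c->address;
}

/* Self-check for a 64 KB region starting at 0x20000000. */
void comparator_example(void)
{
    addr_comparator c = make_comparator(0x20000000u, 0x10000u);
    assert(c.mask == 0xFFFF0000u);
    assert(comparator_hit(&c, 0x2000FFFFu));
    assert(!comparator_hit(&c, 0x20010000u));
}
```

With this scheme the comparison costs one AND and one equality test per access, which is why the region size and alignment restrictions are usually accepted.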
The Mask Register should contain ‘1’s in its MSBs and ‘0’s in its LSBs in accordance with the memory region length. ... Depending on the memory region length, the Address Register should contain in its MSBs the relevant bits of the memory region start address and zero bits in its LSBs. ...

30 RESTORE FULL CONTEXT
Format:
  16-bit: SRSTRCTXF
Algebraic Notation:
    switch (CSR) {
    case LCR:
    case LSTAR:
    case LENAR:
        if (FirstEntrance) {
            FirstEntrance = FALSE;
            STCR. ...

37 SAVE FULL CONTEXT
Format:
  16-bit: SSAVECTXF
Algebraic Notation:
    switch (CSR) {
    case LCR:
    case LSTAR:
    case LENAR:
        if (FirstEntrance) {
            FirstEntrance = FALSE;
            STCR. ...

8 RESTORE FULL CONTEXT ASSISTANCE
Format:
  16-bit: SRSTRCTXF
Algebraic Notation:
    {
    switch (CSR) {
    case LCR:
    case LSTAR:
    case LENAR:
        if (FirstEntrance) {
            FirstEntrance = FALSE;
            STCR. ...

11 SAVE FULL CONTEXT ASSISTANCE
Format:
  16-bit: SSAVECTXF
Algebraic Notation:
    {
    switch (CSR) {
    case LCR:
    case LSTAR:
    case LENAR:
        if (STCR. ...

12 SET POINTER CONFIGURATION
Format:
  16-bit: SSETCFG dstptr imm4
  32-bit: SETCFG dstptr imm4 imm16
Algebraic Notation:
  16-bit form (imm4 only):
    switch (dstptr) {
    case RA0: AMR = (AMR & 0xFFFFFFF0) | imm4;         break;
    case RA1: AMR = (AMR & 0xFFFFFF0F) | (imm4 << 4);  break;
    case RA2: AMR = (AMR & 0xFFFFF0FF) | (imm4 << 8);  break;
    case RA3: AMR = (AMR & 0xFFFF0FFF) | (imm4 << 12); break;
    case RA4: AMR = (AMR & 0xFFF0FFFF) | (imm4 << 16); break;
    case RA5: AMR = (AMR & 0xFF0FFFFF) | (imm4 << 20); break;
    case RA6: AMR = (AMR & 0xF0FFFFFF) | (imm4 << 24); break;
    case RA7: AMR = (AMR & 0x0FFFFFFF) | (imm4 << 28); break;
    }
  32-bit form (imm4 and imm16):
    switch (dstptr) {
    case RA0:
        AMR   = (AMR & 0xFFFFFFF0) | imm4;
        ACBL0 = (ACBL0 & 0xFFFF0000) | imm16;
        break;
    case RA1:
        AMR   = (AMR & 0xFFFFFF0F) | (imm4 << 4);
        ACBL0 = (ACBL0 & 0xFFFF0000) | imm16;
        break;
    case RA2:
        AMR   = (AMR & 0xFFFFF0FF) | (imm4 << 8);
        ACBL0 = (ACBL0 & 0x0000FFFF) | (imm16 << 16);
        break;
    case RA3:
        AMR   = (AMR & 0xFFFF0FFF) | (imm4 << 12);
        ACBL0 = (ACBL0 & 0x0000FFFF) | (imm16 << 16);
        break;
    case RA4:
        AMR   = (AMR & 0xFFF0FFFF) | (imm4 << 16);
        ACBL4 = (ACBL4 & 0xFFFF0000) | imm16;
        break;
    case RA5:
        AMR   = (AMR & 0xFF0FFFFF) | (imm4 << 20);
        ACBL4 = (ACBL4 & 0xFFFF0000) | imm16;
        break;
    case RA6:
        AMR   = (AMR & 0xF0FFFFFF) | (imm4 << 24);
        ACBL4 = (ACBL4 & 0x0000FFFF) | (imm16 << 16);
        break;
    case RA7:
        AMR   = (AMR & 0x0FFFFFFF) | (imm4 << 28);
        ACBL4 = (ACBL4 & 0x0000FFFF) | (imm16 << 16);
        break;
    }
Flags Update: F V C M Z
Pipeline:
    Stage    AMR     ACBL#   Immediate Operand
    PA
    PF
    D        Write   Write   imm4, imm16
    DA
    RAO
    DL/WAO
    E/WB
Legend:
  dstptr: RA#
  imm4: 4-bit configuration for the appropriate RA register
  imm16: 16-bit length of the circular buffer addressed by the specified RA register

3. ... This means that the last instruction of the delay slot should complete the 64-bit instruction word before the first instruction of the HW loop is entered. If the HW loop constraint does not have a delay slot, the LOOP instruction itself should either complete the 64-bit instruction word, fully occupy the 64-bit instruction word, or be combined with parallel instructions so that the 64-bit instruction word is filled before the first instruction of the HW loop is entered. ... When the conditions of both parallel JUMP instructions are true, the first JUMP instruction takes precedence. ... In this case, the original resource value is first read and used as the source of one operation, and only then is the new value written into the resource as the target of another operation. ...
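The SET POINTER CONFIGURATION notation given earlier in this section can be condensed into a compilable C sketch in which the per-register masks are computed from the register index instead of being listed case by case. The `ptr_cfg` structure and the `set_cfg16` and `set_cfg32` functions are hypothetical names, and the register index `ra` (0 to 7 for RA0 to RA7) is an assumed encoding of dstptr. The mapping of RA registers to ACBL halves follows the notation exactly as it appears there.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software image of the registers touched by SETCFG. */
typedef struct {
    uint32_t amr;   /* eight 4-bit mode fields, one per RA register */
    uint32_t acbl0; /* circular buffer lengths used with RA0..RA3 */
    uint32_t acbl4; /* circular buffer lengths used with RA4..RA7 */
} ptr_cfg;

/* 16-bit form: replace only the 4-bit AMR field of register RA<ra>.
 * ra must be in the range 0..7. */
void set_cfg16(ptr_cfg *s, unsigned ra, uint32_t imm4)
{
    unsigned shift = 4u * ra;
    s->amr = (s->amr & ~((uint32_t)0xFu << shift))
           | ((imm4 & 0xFu) << shift);
}

/* 32-bit form: additionally write the 16-bit buffer length.
 * As in the notation: RA0/RA1 and RA4/RA5 use the low half,
 * RA2/RA3 and RA6/RA7 use the high half. */
void set_cfg32(ptr_cfg *s, unsigned ra, uint32_t imm4, uint32_t imm16)
{
    uint32_t *acbl = (ra < 4u) ? &s->acbl0 : &s->acbl4;
    set_cfg16(s, ra, imm4);
    if ((ra & 2u) == 0u)
        *acbl = (*acbl & 0xFFFF0000u) | (imm16 & 0xFFFFu);
    else
        *acbl = (*acbl & 0x0000FFFFu) | ((imm16 & 0xFFFFu) << 16);
}

/* Self-check against two cases worked out by hand. */
void set_cfg_example(void)
{
    ptr_cfg s = { 0xFFFFFFFFu, 0u, 0u };
    set_cfg16(&s, 3u, 0x5u);
    assert(s.amr == 0xFFFF5FFFu);
    set_cfg32(&s, 6u, 0xAu, 0x1234u);
    assert(s.amr == 0xFAFF5FFFu);
    assert(s.acbl4 == 0x12340000u);
}
```

Unlike the notation, the sketch masks `imm4` and `imm16` to their field widths, so an out-of-range argument cannot disturb neighboring fields.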
o When the DL stage destination is a source of the DL stage

These exceptions happen in the case of two sequential instructions, as described in the table below:

Next \ Previous   load/move          store/move    ALU
load/move         Normal flow        Normal flow   Normal flow
store/move        Normal flow        Normal flow   Use ALU result
ALU               Use loaded value   Normal flow   Use ALU result

The second exception happens in the case of parallel instructions, as described in the next table:

Destination \ Source   store                  move
load                   source = destination   source = destination
store                  N/A                    N/A
move                   source = destination   Forbidden

The PC register should not be a source or destination operand in any data transfer instruction. ... When written, the MMC will behave as in the case of an unconditional relative jump instruction.