Design

Instruction Subsets

The minimum, mandatory set of RISC-V instructions is the integer instruction set (indicated by the letter "I"). This set by itself can implement a simplified general-purpose computer, with full software support, including a general-purpose compiler.[3]

A computer design may add further subsets: integer multiplication and division (set "M"), atomic instructions for handling real-time concurrency ("A"), and IEEE floating point ("F") with double-precision ("D") and quad-precision ("Q") options.[3]

A "privileged" instruction set defines instructions to support a UNIX-style operating system. There are plans for it to support hypervisors, to support virtualization.[12]

A computer with the "I", "M", "A", "F" and "D" sets (an "RVIMAFD") is said to be "general-purpose," summarized as "G".[3]

There is an optional "compact" subset to reduce code size (set "C"). Many RISC-V computers might add this ISA to reduce power, code size, and memory.[3]

There is also a 32-bit embedded subset ("E") that supports only 16 registers, to reduce the cost of the smallest CPUs.[3]

These sets are further described by the size of the registers, i.e., 32 or 64 bits. There are small differences in each subset for different register sizes. A small 32-bit computer for an embedded system might be "RV32EC." A large 64-bit computer might be "RV64G."[3]

Subsets are also planned for 128-bit computers, bit manipulation ("B"), decimal floating point ("L"), packed SIMD (i.e., budget multimedia, "P"), vector processing ("V") and transactional memory ("T").[3]

Register Sets

RISC-V has 31 general-purpose integer registers plus register 0, which always reads as the constant 0. Optionally, it also has 32 floating-point registers. All arithmetic, bitwise-logic and subroutine-call instructions reference only registers, to avoid delays from accessing memory.
The assembler uses register 0 as a placeholder to make any of several human-readable instructions into one machine instruction, e.g., "move rx to ry" becomes "add r0 to rx and store in ry".[3]

Control and status registers exist, but user-mode programs can only access those used for performance measurement.

There are no instructions to save and restore multiple registers. Those were thought to be unnecessary, too complex and perhaps too slow.[3]

To reduce circuitry and associated costs, very small ("embedded") RISC-V CPUs (set "E") may have only 16 of the most frequently used registers.[3]

Memory Access

Memory is addressed as 8-bit bytes. Loads and stores support data sizes from 8 bits to the computer's word size. Loads and stores larger than a byte need not be aligned to their natural word width, but alignment may increase performance. (E.g., 16-bit, 2-byte data may be fetched and stored in less time if it is always aligned to start on even addresses.) This feature reduces code size, and can be supported on simple CPUs with software emulation driven by an alignment-failure interrupt.[3]

Words larger than a single byte are "little-endian," i.e., the least significant byte has the smallest address.[3]

Loads and stores can access constants in code, local variables in the stack, or items in a data structure. They calculate the address by adding a 12-bit signed offset to a base register. If the base register is register 0 (the constant zero), the data or constants can be in low memory, or in high (negative offset) memory, such as ROM.

RISC-V handles 32-bit constants and addresses with instructions that set the upper 20 bits of a 32-bit register. Load upper immediate, lui, stores 20 bits to bits 31 through 12. Another instruction, auipc, generates the same 20 upper address bits by adding an offset to the program counter and storing the result into a base register. This permits position-independent code to have 32-bit addresses relative to the program counter.
The base register can be used as-is with the 12-bit offsets of the loads and stores. If needed, addi can set the lower 12 bits of a register. In 64-bit ISAs, lui and auipc sign-extend the result to 64 bits.[3]

For memory systems that are shared between CPUs or threads, RISC-V guarantees that a thread of execution always sees its own memory operations in the programmed order. Between threads and I/O devices, however, RISC-V is simplified: it does not guarantee the order of memory operations, except by specific instructions, such as fence. A fence instruction guarantees that the results of predecessor operations are visible to successor operations of other threads or I/O devices. fence can guarantee the order of combinations of both memory and memory-mapped I/O operations. E.g., it can separate memory read and write operations without affecting I/O operations. Or, if a system can operate I/O devices in parallel with memory, fence does not force them to wait for each other. A CPU with a single thread of execution may decode fence as a nop.

Like many RISC CPUs, RISC-V lacks address modes that "write back" to the registers. For example, it does not do auto-incrementing.[3]

RISC-V is little-endian to resemble other familiar, successful computers. This also slightly reduces a CPU's complexity and cost, because it reads all sizes of words in the same order. For example, the RISC-V instruction set decodes starting at the lowest-addressed byte of the instruction. The specification leaves open the possibility of non-standard big-endian or bi-endian systems.[3]

Some RISC CPUs (e.g., MIPS, PowerPC, DLX, Berkeley's RISC-I) place 16 bits of offset in the loads and stores. They set the upper 16 bits by a "load upper word" instruction. This permits upper-halfword values to be set easily, without shifting bits. However, most uses of the upper-halfword instruction make 32-bit constants, like addresses. RISC-V uses a SPARC-like combination of 12-bit offsets and 20-bit "set upper" instructions.
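The 20-bit/12-bit split used by lui and addi can be illustrated with a short Python sketch. The function names `split_hi_lo` and `rebuild` are hypothetical, chosen only for this illustration, and the sketch assumes a 32-bit register. Because the 12-bit immediate is signed, the upper part has to be rounded up by one whenever the lower 12 bits would be read as a negative number.

```python
from typing import Tuple


def split_hi_lo(value: int) -> Tuple[int, int]:
    """Split a 32-bit constant into a 20-bit upper field (as for lui)
    and a signed 12-bit lower part (as for addi).

    Hypothetical helper for illustration; assumes a 32-bit register.
    """
    value &= 0xFFFFFFFF
    lo = value & 0xFFF
    if lo >= 0x800:
        # The 12-bit immediate is sign-extended, so values with bit 11 set
        # are treated as negative; compensate in the upper part.
        lo -= 0x1000
    hi = ((value - lo) >> 12) & 0xFFFFF
    return hi, lo


def rebuild(hi: int, lo: int) -> int:
    """Model 'lui' followed by 'addi': shift the upper field into
    bits 31..12, then add the signed lower part (mod 2**32)."""
    return ((hi << 12) + lo) & 0xFFFFFFFF


if __name__ == "__main__":
    for constant in (0x12345678, 0xDEADBEEF, 0xFFFFFFFF):
        hi, lo = split_hi_lo(constant)
        print(hex(constant), "->", hex(hi), lo, "->", hex(rebuild(hi, lo)))
```

For a constant such as 0xDEADBEEF, whose low 12 bits have bit 11 set, the upper field comes out one larger than the plain top 20 bits, and the negative lower part brings the sum back to the intended value.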
The smaller 12-bit offset helps compact, 32-bit load and store instructions select two of 32 registers yet still have enough bits to support RISC-V's variable-length instruction coding.[3]

Subroutine Calls, Jumps and Branches

RISC-V's subroutine call, "jump and link" (jal), places its return address in a register. This is faster in many computer designs, because it saves a memory access compared to systems that push a return address directly on a stack in memory. jal has a 20-bit signed (two's complement) offset. The offset is multiplied by 2, then added to the PC to generate a relative address to a 32-bit instruction. If the result is not aligned to a 32-bit boundary (i.e., not evenly divisible by 4), the CPU may force an exception.[3]

RISC-V CPUs jump to calculated addresses using a "jump and link-register" (jalr) instruction. jalr is similar to jal, but gets its destination address by adding a 12-bit offset to a base register. (In contrast, jal adds a larger 20-bit offset to the PC.)

jalr's bit format is like the register-relative loads and stores. Like them, jalr can be used with the instructions that set the upper 20 bits of a base register in order to make 32-bit branches, either to an absolute address (using lui) or a PC-relative one (using auipc for position-independent code). (Using a constant zero base address allows single-instruction calls to a small, fixed positive or negative address given by the offset.)

RISC-V recycles jal and jalr to get unconditional 20-bit PC-relative jumps and unconditional register-based 12-bit jumps. Jumps simply use register 0 as the linkage register, so that no return address is saved.[3]

RISC-V also recycles jalr to return from a subroutine. To do this, jalr's base register is set to be the linkage register saved by jal or jalr. jalr's offset is zero and its own linkage register is register 0, so that there is no offset and no return address is saved.
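The jal address calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `jal_target` and `is_misaligned` are hypothetical, the offset is taken as an already-extracted 20-bit field value (the ordering of the bits inside the instruction word is ignored), and the PC is assumed to be 32 bits wide.

```python
def jal_target(pc: int, offset_field: int) -> int:
    """Compute a jal destination from the PC and a raw 20-bit offset field.

    Assumptions for this sketch: 32-bit PC, offset_field in 0..0xFFFFF,
    interpreted as a two's complement number.
    """
    offset_field &= 0xFFFFF
    if offset_field & 0x80000:        # sign bit of the 20-bit field
        offset_field -= 0x100000      # convert to a negative Python int
    # The offset counts in units of two bytes, so it is doubled.
    return (pc + 2 * offset_field) & 0xFFFFFFFF


def is_misaligned(target: int) -> bool:
    """True if the target is not on a 32-bit (4-byte) boundary,
    the case in which the text says the CPU may force an exception."""
    return target % 4 != 0


if __name__ == "__main__":
    print(hex(jal_target(0x1000, 4)))        # forward jump
    print(hex(jal_target(0x1000, 0xFFFFE)))  # backward jump (offset -2)
    print(is_misaligned(jal_target(0x1000, 1)))
```

With a PC of 0x1000, a field value of 4 yields 0x1008, and a field value of 0xFFFFE (that is, -2) yields 0xFFC. An odd field value such as 1 produces a target that is only 2-byte aligned.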
Like many RISC designs, in a subroutine call, a RISC-V compiler must use individual instructions to save registers to the stack at the start, and then restore these from the stack on exit. RISC-V has no "save multiple" or "restore multiple" register instructions. These were thought to make the CPU too complex, and possibly slow as well.[31]

RISC-V has no condition codes or carry bit. The designers believed that condition codes make fast CPUs more complex by forcing interactions between instructions in different stages of execution. This choice makes multiple-precision arithmetic more complex. Also, a few numerical tasks need more energy.[3]

Instead, RISC-V has short branches that perform comparisons: equal, not-equal, less-than, unsigned less-than, greater-than-or-equal and unsigned greater-than-or-equal. Ten comparison-branch operations are implemented with only six instructions, by reversing the order of operands in the assembler. For example, "branch if greater than" can be done by "less than" with a reversed order of operands.[3]

The comparing branches have a twelve-bit signed offset, and jump relative to the PC.[3]

RISC-V's ISA requires default branch predictions for CPUs: backward conditional branches should be predicted "taken," and forward conditional branches "not taken." The predictions are easy to decode in a pipelined CPU: branch offsets are signed numbers added to the PC. Backward branches have negative two's complement offsets, and therefore have a one in the most significant bit of the offset. Forward branches have a zero. The most significant bit is in a fixed location in the operation code in order to speed up the pipeline. Complex CPUs can add branch predictors to work well even with unusual data or situations.

The ISA manual recommends that software be optimized to avoid branch stalls by using the default branch predictions.
This reuses the most significant bit of the signed relative address as a "hint bit" to tell whether the conditional branch will be taken or not. So, no other hint bits are needed in the operation codes of RISC-V branches, which makes more bits available in the branch operation codes. Simple, inexpensive CPUs can merely follow the default predictions and still perform well with optimizing compilers. Compilers can still perform statistical path optimization, if desired.[3]

To avoid unnecessary loading of branch-prediction electronics (and therefore unnecessary pipeline stalls), the comparing branch codes should never be used for unconditional jumps.[3]
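The default prediction rule for conditional branches can be expressed as a check of a single bit. The following Python sketch is illustrative only: `sign_extend` and `predict_taken` are hypothetical names, and the branch offset is modelled as an already-extracted 12-bit field rather than decoded from a real instruction word.

```python
def sign_extend(field: int, bits: int = 12) -> int:
    """Interpret the low `bits` bits of `field` as a two's complement number."""
    sign = 1 << (bits - 1)
    field &= (1 << bits) - 1
    return (field & (sign - 1)) - (field & sign)


def predict_taken(offset_field: int, bits: int = 12) -> bool:
    """Default static prediction as described in the text:
    backward branches (most significant offset bit = 1) are predicted taken,
    forward branches (most significant offset bit = 0) are predicted not taken.
    """
    return ((offset_field >> (bits - 1)) & 1) == 1


if __name__ == "__main__":
    loop_back = 0xFFE   # offset field for a short backward branch (-2)
    skip_ahead = 0x010  # offset field for a forward branch (+16)
    print(sign_extend(loop_back), predict_taken(loop_back))
    print(sign_extend(skip_ahead), predict_taken(skip_ahead))
```

The predictor never needs the full sign-extended offset. It looks only at the top bit, which is why placing that bit at a fixed position in the instruction helps a simple pipeline.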
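The operand-reversal technique mentioned for the comparing branches can be sketched as a small assembler-style rewrite. This Python example is a simplified illustration, not an actual assembler: `expand_branch` is a hypothetical helper, and the register and label names in the usage lines are placeholders. The four extra mnemonics (bgt, ble, bgtu, bleu) are the standard assembler pseudo-instructions that map onto the six real branches.

```python
# The six comparing branches that exist as machine instructions.
REAL_BRANCHES = {"beq", "bne", "blt", "bge", "bltu", "bgeu"}

# Pseudo-branches and the real branch each becomes once its
# two source operands are swapped.
REVERSED_BRANCHES = {
    "bgt": "blt",    # a > b   is the same test as  b < a
    "ble": "bge",    # a <= b  is the same test as  b >= a
    "bgtu": "bltu",  # unsigned variants of the two above
    "bleu": "bgeu",
}


def expand_branch(op: str, rs1: str, rs2: str, label: str) -> tuple:
    """Rewrite a comparing branch into one of the six real instructions.

    Hypothetical helper for illustration: returns (op, rs1, rs2, label).
    """
    if op in REAL_BRANCHES:
        return (op, rs1, rs2, label)
    if op in REVERSED_BRANCHES:
        # Same comparison, expressed with the operands in reverse order.
        return (REVERSED_BRANCHES[op], rs2, rs1, label)
    raise ValueError("not a comparing branch: " + op)


if __name__ == "__main__":
    print(expand_branch("bgt", "a0", "a1", "loop"))
    print(expand_branch("bne", "a0", "a1", "done"))
```

Six real branches plus four reversed forms give the ten comparison operations the text refers to, with no additional encodings needed.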