Saturday, 18 June 2011

The Data Movement Engine (DME)

The PPU has a potentially vast amount of floating point power available, but this is of no use unless all the floating point units can be kept fed with data. The Data Movement Engine is responsible for doing this.
The DME comprises 5 memory control units, an external memory controller, a PCI bus interface and a “switch fabric”.
The switch fabric is a network of switches and buses that allows the different units to talk to one another.  In this case the switch fabric has 7 x 256 bit bidirectional ports; the number of units that can be talking simultaneously is not specified.
The work done in the Data Movement Engine is controlled by the 5 memory control units.  4 of these are connected to the Vector Processor Elements in the Floating Point Engine; the 5th is connected to the PPU Control Engine.
Each memory control unit contains a block of RAM and moves data into and out of it.  This will mainly involve passing data between the external RAM and the vector processor element it is connected to, with the memory control unit’s RAM acting as a buffer in between.  The units are not limited to this, however, and can also move data to and from the other memory control units and the PCI bus.
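As a rough sanity check on the port count, the sketch below attaches one port to each unit listed above. The one-port-per-unit mapping and all names in the code are assumptions made for illustration; the patent text summarised here does not spell out the mapping.

```python
# Hypothetical port map: one 256 bit port per DME unit (an assumption).
PORT_WIDTH_BITS = 256

units = [f"memory_control_unit_{i}" for i in range(5)]
units += ["external_memory_controller", "pci_bus_interface"]

# Assign port numbers in list order.
port_of = {name: port for port, name in enumerate(units)}

# Width of a single port expressed in bytes.
bytes_per_port = PORT_WIDTH_BITS // 8

print(len(units), "ports,", bytes_per_port, "bytes wide each")
```

Under that assumption, the 5 memory control units, the external memory controller and the PCI interface account for exactly the 7 ports mentioned.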
You may wonder why a memory control unit doesn’t just move data directly between external memory and the vector processors.  The buffering is there to make use of the external memory bus as efficient as possible: moving data in big chunks is faster than moving it in small chunks, so buffering increases performance.  Keeping data in on-chip buffers also allows data to be moved around the chip without going to main memory, again saving memory bandwidth.
The connections to the Switch Fabric and to the Vector Processor Elements are separate, so it looks like two types of communication can operate simultaneously.  For example, data could be written to one of the Vector Processor Elements while other data is being read in from external memory.
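The benefit of big chunks can be illustrated with a toy cost model in which every bus transaction pays a fixed setup overhead on top of a per-byte cost. The function name and the overhead and per-byte figures below are invented for illustration and are not taken from the patent.

```python
def transfer_time_ns(total_bytes, chunk_bytes, overhead_ns=50.0, ns_per_byte=0.25):
    """Time to move total_bytes when each chunk pays a fixed overhead.

    overhead_ns and ns_per_byte are made-up illustrative values.
    """
    chunks = -(-total_bytes // chunk_bytes)  # ceiling division
    return chunks * overhead_ns + total_bytes * ns_per_byte


# Move the same 4 KB in small chunks and in large chunks.
small = transfer_time_ns(4096, 32)
large = transfer_time_ns(4096, 1024)
print(small, large)
```

With these numbers the small-chunk transfer spends most of its time on per-transaction overhead, which is the cost the on-chip buffer is there to amortise.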

The Floating Point Engine (FPE)

The Floating Point Engine is the part of the PPU which does the real work: it performs all the actual physics calculations.
The FPE is made up of 4 Vector Processor Engines (VPEs), each of which is in turn made up of 4 Vector Processor Units, giving you in effect 16 vector processing cores.
The Vector Processor Units are not normal CPU cores, but they do contain some of the components normally found inside them, along with some decidedly non-standard units.
All the data processing is done on 32 bit values stored in 16 floating point registers or 8 integer registers.  There are likely other registers for program control and predication (a technique used in place of branches).
The execution unit appears to do vector processing with 6 elements, whereas the norm is 4.  This unit also contains a standard integer processing unit.  It is not described in any detail in the patent, but if it is anything like variant 1 the execution unit will use a hybrid processing model, which issues a single integer instruction and a 6 part vector instruction together as a single VLIW (Very Long Instruction Word) instruction.
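The hybrid model can be pictured as a bundle with two halves that are issued together. The sketch below is only a guess at the general idea: the Bundle class, the operation names and the layout of the vector registers as lists of 6 values are all assumptions, since the patent does not describe the instruction format.

```python
from dataclasses import dataclass

LANES = 6  # vector width reported for this variant

# Tiny illustrative operation set (hypothetical).
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


@dataclass
class Bundle:
    int_op: tuple  # (op name, dest, source a, source b) on integer registers
    vec_op: tuple  # (op name, dest, source a, source b) on vector registers


def execute(bundle, iregs, vregs):
    """Execute both halves of one bundle as a single issue step."""
    i_name, i_dst, i_a, i_b = bundle.int_op
    v_name, v_dst, v_a, v_b = bundle.vec_op
    # Both halves read their inputs before either result is written back.
    int_result = OPS[i_name](iregs[i_a], iregs[i_b])
    vec_result = [OPS[v_name](x, y) for x, y in zip(vregs[v_a], vregs[v_b])]
    iregs[i_dst] = int_result
    vregs[v_dst] = vec_result


iregs = [0, 3, 4, 0, 0, 0, 0, 0]             # 8 integer registers
vregs = [[0.0] * LANES for _ in range(4)]    # a few 6 lane vector registers
vregs[1] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
vregs[2] = [2.0] * LANES

execute(Bundle(("add", 0, 1, 2), ("mul", 0, 1, 2)), iregs, vregs)
print(iregs[0], vregs[0])
```

One bundle here performs one integer operation and six floating point operations, which is the kind of pairing a VLIW encoding makes explicit.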
The Vector Processor Units also contain a set of internal memories, one of which is dedicated to storing the program being executed.  There is also an “Inter-Element Memory” which is used to store data for processing.  This is really a pair of memory banks (A and B).  At any one point bank A is accessed by the processor while bank B can be accessed by the Memory Control Unit.  When the processing and any data transfer are complete, access to these memories “switches”: the processor uses bank B while bank A is accessed by the memory control unit.  This technique allows both memories to be accessed at full speed simultaneously; it is in effect a hardware double buffer.
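The bank switching can be modelled in software as a ping-pong buffer. This is a minimal sketch with hypothetical names (PingPongMemory, run), and the sum of squares merely stands in for real work. In hardware the load and the processing would overlap in time, whereas here they simply run one after the other.

```python
class PingPongMemory:
    """Two banks: one owned by the processor, the other by the memory control unit."""

    def __init__(self, size):
        self.banks = [[0.0] * size, [0.0] * size]
        self.cpu_bank = 0  # index of the bank the processor currently owns

    @property
    def processor_bank(self):
        return self.banks[self.cpu_bank]

    @property
    def mcu_bank(self):
        return self.banks[1 - self.cpu_bank]

    def switch(self):
        """Swap ownership of the two banks."""
        self.cpu_bank = 1 - self.cpu_bank


def run(blocks, size):
    mem = PingPongMemory(size)
    results = []
    # Prime the pipeline: load the first block, then hand it to the processor.
    mem.mcu_bank[:] = blocks[0]
    mem.switch()
    for i in range(len(blocks)):
        # The memory control unit fills the idle bank with the next block...
        if i + 1 < len(blocks):
            mem.mcu_bank[:] = blocks[i + 1]
        # ...while the processor works on the bank it owns.
        results.append(sum(x * x for x in mem.processor_bank))
        mem.switch()
    return results


print(run([[1, 2], [3, 4], [5, 6]], 2))
```

Neither side ever touches the bank the other is using, which is why no locking is needed between the loads and the processing.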
