Dragonfly and Hyperthreading....

Mon Feb 21 22:10:29 PST 2005

    I agree with all your comments, and I will add a few more negative effects
    of having a long pipeline.  The first is that pipeline stall and miss
    conditions are seriously aggravated when the pipeline is longer.  Intel
    underestimated the effect that branch prediction misses, register
    collisions, and main memory access delays would have on their pipeline.
    Perfectly predictable, hand-optimized code can run very, very fast on an
    Intel cpu, but most code is not perfectly predictable and is certainly
    not hand-optimized, so they wound up hitting these situations more often
    than they liked.
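    A rough way to see why a longer pipeline makes misses hurt more: if a
    mispredicted branch costs about one pipeline flush, the flush penalty
    grows with the number of stages, and that penalty is charged on every
    miss.  The sketch below is a textbook-style toy model, not a description
    of any real Intel design; the function name and all the numbers are
    hypothetical.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Average cycles per instruction once branch mispredictions are charged.

    Toy model (an assumption, not real hardware data): every mispredicted
    branch costs a full flush of flush_penalty_cycles, and all other stalls
    are already folded into base_cpi.
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


# Same code (20% branches, 5% of them mispredicted), two hypothetical pipelines.
short_pipe = effective_cpi(1.0, 0.2, 0.05, 10)  # roughly 1.1
long_pipe = effective_cpi(1.0, 0.2, 0.05, 20)   # roughly 1.2
print(short_pipe, long_pipe)
```

    Doubling the flush penalty doubles the misprediction overhead for the
    same code, so the longer pipeline needs a correspondingly faster clock
    just to break even.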

    Also, the latches separating each pipeline stage (even when using the
    trick of alternating the clock phase for each stage) impose a minimum
    of two gate delays, plus clock slop, plus wire routing delay.  This
    puts a limit on how little logic you can put in each stage and still
    be effective.  Having fewer, larger pipeline stages can actually wind
    up being faster in some cases, especially when you are shoving data
    out to a slower unit (like main memory) which can absorb additional
    time slop.  For example, if a pipeline stage can output either to a
    register or to a memory buffer, it could very well be that the logic
    feeding the memory buffer can exceed the nominal stage time without
    creating a problem.
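    The per-stage latch overhead can be put into a simple model: splitting
    a fixed amount of logic into N stages gives a cycle time of roughly
    logic/N + overhead, so the overhead becomes a floor on the cycle time,
    while the flush cost from the previous point grows with N.  The sketch
    below is a hypothetical textbook-style model (made-up delay units, all
    names invented for illustration, and it ignores the slack a slow memory
    path can absorb); it just shows that the combination yields a best depth
    rather than "deeper is always faster".

```python
def cycle_time(total_logic_delay, stages, latch_overhead):
    # Each stage gets an equal slice of the logic plus a fixed overhead
    # (latch gate delays + clock slop + wire routing), assumed identical per stage.
    return total_logic_delay / stages + latch_overhead


def time_per_instruction(total_logic_delay, stages, latch_overhead, stall_rate):
    # Assumption: a fraction stall_rate of instructions flushes the pipe,
    # costing about one cycle per stage.
    cpi = 1.0 + stall_rate * stages
    return cpi * cycle_time(total_logic_delay, stages, latch_overhead)


def best_depth(total_logic_delay, latch_overhead, stall_rate, max_stages=64):
    # Brute-force search for the depth that minimizes time per instruction.
    return min(
        range(1, max_stages + 1),
        key=lambda n: time_per_instruction(total_logic_delay, n, latch_overhead, stall_rate),
    )


# 100 units of logic, 2.5 units of overhead per stage, 10% of instructions flush.
print(best_depth(100.0, 2.5, 0.1))
```

    With these made-up numbers the minimum lands near sqrt(logic /
    (overhead * stall_rate)) stages; going deeper than that makes each
    instruction slower, because the extra latches and flush cycles cost more
    than the shorter stages save.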

    I'm not a practicing VLSI engineer :-)  I helped design a small ASIC many
    years ago, but I am a good logic board designer (mainly 68000-based
    stuff).

						-Matt