git: nvme - Fix interrupt pin support when MSI-X is unavailable.

Matthew Dillon dillon at
Fri May 19 11:10:51 PDT 2017

commit 2d7468370b6e6292608352c8d698d50d72df3453
Author: Matthew Dillon <dillon at>
Date:   Fri May 19 10:55:10 2017 -0700

    nvme - Fix interrupt pin support when MSI-X is unavailable.
    * All real hardware seen so far supports MSI-X, but some VMs emulating
      NVMe have been found not to.
    * Fix numerous assertion failures caused by the non-MSI-X case not
      installing the sc->cputovect[i] mapping.
      Install a fake cputovect[] mapping instead.  This mapping exists
      primarily to allow multiple submission queues (per-cpu when possible).
      Completion queues are further limited to reduce loop-check overheads.
    * For the non-MSI-X case, limit the number of completion queues to 4,
      since there is little point in having more when there is only one
      interrupt vector.  We use 4 so the chipset side can run optimally,
      even though that many is not necessarily useful on the cpu side.  To
      be fair, when the cpu-side driver polls for completions, multiple
      completion queues CAN help even with a single interrupt, because each
      completion queue is locked separately.
    * Properly set the interrupt masking registers in the non-MSI-X case
      (probably not needed).  Note that these registers are explicitly not
      supposed to be accessed by the host when MSI-X is used.
    * Fix a bug where the maximum number of queues possible was one too high.
      This limit is *never* reached anyway, but fix the code just in case.
    * Fix a bug where we assumed that the number of queues returned by the
      NVME_FID_NUMQUEUES command would always be <= the number of queues
      requested.  In fact, this does not hold for at least one chipset, nor
      for some VM emulations.  Clamp the returned values so they never
      exceed the requested values.
    * Set the queue->nqe field last when creating a completion queue.  This
      prevents interrupt handlers that poll multiple completion queues from
      polling one that has not finished being set up.  That situation (one
      handler polling several queues) always arises when pin-based interrupts
      are used, and sometimes when MSI-X vectors are used, depending on the
      topology.
    * NOTES ON DISABLING MSI-X.  Not all chipsets implement pin-based interrupts
      properly for NVMe.  The BPX NVMe card, for example, appears to just leave
      the pin interrupt in a stuck state (the chipset docs say the level
      interrupt is cleared once all doorbell heads are synchronized for the
      completion queues, but this does not happen).  So NVMe users should not
      explicitly disable MSI-X when it is nominally supported, except for
    Reported-by: sinetek
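
The fake cputovect[] mapping from the commit message can be sketched roughly as follows.  This is not the driver's code: struct nvme_softc_sketch, nvme_install_cputovect() and the round-robin assignment used for MSI-X are hypothetical; only the cputovect[] name comes from the commit.  The idea shown is that every cpu always gets a valid vector index, which is simply vector 0 when only the pin interrupt exists.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAXCPUS	64	/* hypothetical upper bound for this sketch */

/* Hypothetical, heavily simplified softc; only cputovect[] is from the commit. */
struct nvme_softc_sketch {
	int	ncpus;				/* cpus in the system */
	int	nirqs;				/* interrupt vectors allocated */
	bool	msix;				/* true if MSI-X vectors were obtained */
	int	cputovect[SKETCH_MAXCPUS];	/* cpu -> interrupt vector index */
};

/*
 * Install a cpu -> vector mapping for every cpu.  With MSI-X the cpus are
 * spread round-robin over the vectors (an assumption for illustration);
 * with a single pin-based interrupt every cpu maps to vector 0, so code
 * that indexes cputovect[cpu] never sees an uninitialized entry.
 */
static void
nvme_install_cputovect(struct nvme_softc_sketch *sc)
{
	assert(sc->ncpus > 0 && sc->ncpus <= SKETCH_MAXCPUS);
	if (!sc->msix)
		sc->nirqs = 1;		/* one shared pin interrupt */
	assert(sc->nirqs > 0);

	for (int i = 0; i < sc->ncpus; ++i)
		sc->cputovect[i] = sc->msix ? i % sc->nirqs : 0;
}
```

Code that looks up cputovect[] by the current cpu (for example, to pick a per-cpu submission queue) then always finds an initialized entry, which avoids the kind of assertion the commit describes.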
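
A sketch of the completion-queue cap for the pin-interrupt case.  The cap of 4 comes from the commit; the helper name, its parameters, and the one-queue-per-vector choice for MSI-X are assumptions made for illustration.  With a single interrupt, the handler has to check every completion queue each time it fires, so the cap bounds that loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Cap from the commit message: at most 4 completion queues without MSI-X. */
#define NVME_PIN_MAX_COMPQ	4

static int
imin(int a, int b)
{
	return (a < b) ? a : b;
}

/*
 * Hypothetical helper: choose how many I/O completion queues to create.
 * 'nvect' is the number of interrupt vectors, 'granted' the number of
 * completion queues the controller is willing to provide.
 */
static int
nvme_pick_ncompq(bool msix, int nvect, int granted)
{
	int n;

	assert(nvect > 0 && granted > 0);
	if (msix)
		n = nvect;			/* assumed: one queue per vector */
	else
		n = NVME_PIN_MAX_COMPQ;		/* one interrupt: a few queues suffice */
	return imin(n, granted);
}
```

The controller's limit still wins, so a device that grants fewer than 4 completion queues gets exactly what it granted.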
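
The NVME_FID_NUMQUEUES exchange follows the NVMe Set Features "Number of Queues" layout: the requested submission and completion queue counts go in command dword 11 (bits 15:0 and 31:16), the allocated counts come back in completion dword 0 with the same layout, and all four values are 0's based.  The helpers below are hypothetical; they show one way to encode the request and to clamp the reply to what was requested, as the commit does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Encode command dword 11.  Both fields are 0's based (0 means 1 queue);
 * a 0's based value of 65535 is not allowed, hence the 65535 upper bound.
 */
static uint32_t
nvme_numqueues_cdw11(uint32_t nsq, uint32_t ncq)
{
	assert(nsq >= 1 && nsq <= 65535);
	assert(ncq >= 1 && ncq <= 65535);
	return ((ncq - 1) << 16) | (nsq - 1);
}

/*
 * Decode completion dword 0 and clamp each count to what was requested,
 * since some controllers and VM emulations report more than asked for.
 */
static void
nvme_numqueues_decode(uint32_t cdw0, uint32_t req_nsq, uint32_t req_ncq,
    uint32_t *nsq, uint32_t *ncq)
{
	uint32_t got_nsq = (cdw0 & 0xFFFFu) + 1;	/* bits 15:0, 0's based */
	uint32_t got_ncq = (cdw0 >> 16) + 1;		/* bits 31:16, 0's based */

	*nsq = (got_nsq < req_nsq) ? got_nsq : req_nsq;
	*ncq = (got_ncq < req_ncq) ? got_ncq : req_ncq;
}
```

For example, requesting 8 submission and 4 completion queues and getting back 32 and 64 leaves the driver using 8 and 4, so nothing downstream sizes arrays for queues it never asked for.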
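
Setting nqe last is a publish-after-initialize pattern.  The sketch below uses C11 <stdatomic.h> release/acquire ordering as a stand-in for whatever barriers the kernel actually uses; struct compq_sketch and the three functions are hypothetical, and only the nqe field name comes from the commit.  A poller that reads nqe == 0 skips the queue; a nonzero value guarantees the other fields were written first.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified completion queue; only 'nqe' is from the commit. */
struct compq_sketch {
	void		*ring;	/* completion entries (DMA memory in a driver) */
	uint16_t	head;	/* next entry to consume */
	uint8_t		phase;	/* expected phase tag */
	atomic_uint	nqe;	/* number of entries; 0 = not ready yet */
};

/* Start with nqe == 0 so pollers ignore the queue. */
static void
compq_init(struct compq_sketch *q)
{
	q->ring = NULL;
	q->head = 0;
	q->phase = 0;
	atomic_init(&q->nqe, 0);
}

/* Fill in every other field first, then publish the queue by storing nqe. */
static void
compq_setup(struct compq_sketch *q, void *ring, unsigned nqe)
{
	assert(nqe > 0);
	q->ring = ring;
	q->head = 0;
	q->phase = 1;
	/* release: the stores above become visible before nqe != 0 does */
	atomic_store_explicit(&q->nqe, nqe, memory_order_release);
}

/*
 * Called from a shared interrupt handler that scans several queues.
 * Returns 0 if the queue is not set up yet, so the caller skips it.
 */
static unsigned
compq_ready_nqe(struct compq_sketch *q)
{
	unsigned n = atomic_load_explicit(&q->nqe, memory_order_acquire);

	/* if n != 0, ring/head/phase written before the store are visible */
	return n;
}
```

Without this ordering, the compiler or CPU could make nqe visible before ring or phase, which is the window the commit closes for handlers that scan queues they do not own.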

Summary of changes:
 sys/dev/disk/nvme/nvme.c        | 43 ++++++++++++++++++++++++++++++++++++--
 sys/dev/disk/nvme/nvme_admin.c  | 46 ++++++++++++++++++++++++++++++-----------
 sys/dev/disk/nvme/nvme_attach.c | 23 +++++++++++++++++++++
 3 files changed, 98 insertions(+), 14 deletions(-)

DragonFly BSD source repository

More information about the Commits mailing list