[issue717] panic: assertion: msg->ms_flags & MSGF_INTRANSIT

Matthew Dillon dillon at apollo.backplane.com
Wed Jul 4 12:53:55 PDT 2007


:Peter Avalos <pavalos at theshell.com> added the comment:
:
:Got another one...not sure if it's gunna help though:
:...
:boot() called on cpu#1
:Uptime: 2d12h22m33s
:
:dumping to dev #da/0x20001, blockno 378927
:dump devstat_end_transaction: HELP!! busy_count for da1 is < 0 (-1)!
:LWKT_WAIT_IPIQ WARNING! 0 wait 1 (-3)
:SECONDARY PANIC ON CPU 0 THREAD 0xc0354e04

    Was this after your ahd logic fix?

    That secondary panic can only happen if devstat_end_transaction() is
    called too many times.  The INTRANSIT panic implies a message being
    replied to twice, or a message being corrupted prior to being replied.

    This implies that the BIO related to a transaction is being replied
    to twice.  I've never seen this before in my life so I think it must
    be specific to the AHD driver.  I'll bet the driver is trying to
    complete an I/O twice.
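
    To illustrate why a double completion shows up as a negative busy
    count, here is a minimal user-space sketch.  It is not the kernel's
    devstat code; struct dev_stats, stats_start() and stats_end() are
    hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a per-device statistics record. */
struct dev_stats {
	const char *name;
	int busy_count;		/* transactions started but not yet ended */
};

/* Account for one transaction being issued to the device. */
static void
stats_start(struct dev_stats *ds)
{
	ds->busy_count++;
}

/*
 * Account for one transaction completing.  Returns 0 normally and -1
 * if the count went negative, which can only happen when there are
 * more end calls than start calls.
 */
static int
stats_end(struct dev_stats *ds)
{
	ds->busy_count--;
	if (ds->busy_count < 0) {
		fprintf(stderr, "busy_count for %s is < 0 (%d)\n",
			ds->name, ds->busy_count);
		return -1;
	}
	return 0;
}
```

    One stats_start() followed by two stats_end() calls on the same
    record leaves busy_count at -1, so the second end call is the one
    that reports the problem.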

    Let's try to catch the problem earlier and maybe get a more meaningful
    backtrace.  Make sure you have options DDB_TRACE set (I think you
    do).  I have committed some code to lwkt_msgport.c (1.44) which tries
    to catch the MSGF_INTRANSIT flag on the originating cpu.

    --

    Another data point... you got the ahd_run_qoutfifo() panic again
    as well, which implies that ahd_run_qoutfifo() was running when the
    original panic occurred.  I think there's an issue in AHD's transaction
    processing.  The driver does a LOT of work on a scb before it frees
    it by calling ahd_free_scb().  Somewhere in there a recursion is
    happening.

    It may be beneficial to add a 'processing in progress' flag to the scb
    in ahd_done() and assert that the flag is not set, to catch double-calls
    to ahd_done() on the same scb earlier.  Something like what I have
    below may help.
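
    As a standalone illustration of the same guard, separate from the
    driver patch, here is a user-space sketch.  struct request,
    REQ_RUNNINGDONE and request_done() are hypothetical names, and the
    function returns -1 where kernel code would trip an assertion.

```c
#include <assert.h>

/* Hypothetical flag marking that completion has already started. */
enum req_flags {
	REQ_RUNNINGDONE = 0x1
};

/* Hypothetical request object standing in for a driver's scb. */
struct request {
	unsigned int flags;
	int completions;	/* how many times completion work actually ran */
};

/*
 * Complete a request.  The first call sets the flag and does the work.
 * Any later call on the same request is rejected before it can repeat
 * the completion work; this is the point where an assertion would fire.
 */
static int
request_done(struct request *req)
{
	if (req->flags & REQ_RUNNINGDONE)
		return -1;	/* double completion detected */
	req->flags |= REQ_RUNNINGDONE;
	req->completions++;
	return 0;
}
```

    The check has to come before any other completion work so that the
    second caller is the one caught, with its own backtrace, rather than
    some later victim of the corrupted state.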

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>

Index: aic79xx.h
===================================================================
RCS file: /cvs/src/sys/dev/disk/aic7xxx/aic79xx.h,v
retrieving revision 1.2
diff -u -p -r1.2 aic79xx.h
--- aic79xx.h	17 Jun 2003 04:28:21 -0000	1.2
+++ aic79xx.h	4 Jul 2007 19:47:50 -0000
@@ -589,12 +589,13 @@ 	SCB_EXPECT_PPR_BUSFREE	= 0x01000,
 	SCB_PKT_SENSE		= 0x02000,
 	SCB_CMDPHASE_ABORT	= 0x04000,
 	SCB_ON_COL_LIST		= 0x08000,
-	SCB_SILENT		= 0x10000 /*
+	SCB_SILENT		= 0x10000,/*
 					   * Be quiet about transmission type
 					   * errors.  They are expected and we
 					   * don't want to upset the user.  This
 					   * flag is typically used during DV.
 					   */
+	SCB_RUNNINGDONE		= 0x20000
 } scb_flag;
 
 struct scb {
Index: aic79xx_osm.c
===================================================================
RCS file: /cvs/src/sys/dev/disk/aic7xxx/aic79xx_osm.c,v
retrieving revision 1.13
diff -u -p -r1.13 aic79xx_osm.c
--- aic79xx_osm.c	4 Jun 2007 17:21:55 -0000	1.13
+++ aic79xx_osm.c	4 Jul 2007 19:49:48 -0000
@@ -196,6 +196,9 @@ ahd_done(struct ahd_softc *ahd, struct s
 {
 	union ccb *ccb;
 
+	KKASSERT((scb->flags & SCB_RUNNINGDONE) == 0);
+	scb->flags |= SCB_RUNNINGDONE;
+
 	CAM_DEBUG(scb->io_ctx->ccb_h.path, CAM_DEBUG_TRACE,
 		  ("ahd_done - scb %d\n", SCB_GET_TAG(scb)));
 




