[issue717] panic: assertion: msg->ms_flags & MSGF_INTRANSIT
Matthew Dillon
dillon at apollo.backplane.com
Wed Jul 4 12:53:55 PDT 2007
:Peter Avalos <pavalos at theshell.com> added the comment:
:
:Got another one...not sure if it's gunna help though:
:...
:boot() called on cpu#1
:Uptime: 2d12h22m33s
:
:dumping to dev #da/0x20001, blockno 378927
:dump devstat_end_transaction: HELP!! busy_count for da1 is < 0 (-1)!
:LWKT_WAIT_IPIQ WARNING! 0 wait 1 (-3)
:SECONDARY PANIC ON CPU 0 THREAD 0xc0354e04
Was this after your ahd logic fix?
That secondary panic can only happen if devstat_end_transaction() is
called too many times. The INTRANSIT panic implies a message being
replied to twice, or a message being corrupted prior to being replied.
This implies that the BIO related to a transaction is being replied
to twice. I've never seen this before in my life so I think it must
be specific to the AHD driver. I'll bet the driver is trying to
complete an I/O twice.
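
To make the reasoning above concrete, here is a minimal user-space sketch of how a second completion of the same transaction drives a busy counter below zero, and how an in-transit flag catches it. This is only a model: `model_dev`, `model_msg`, `MODEL_INTRANSIT` and the functions below are hypothetical names made up for illustration, not the kernel's devstat or LWKT code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an "in transit" message flag. */
#define MODEL_INTRANSIT 0x1u

struct model_msg {
    unsigned flags;     /* MODEL_INTRANSIT is set while a reply is pending */
};

struct model_dev {
    int busy_count;     /* transactions started but not yet ended */
};

/* Start a transaction: bump the counter and mark the message in transit. */
static void model_start(struct model_dev *dev, struct model_msg *msg)
{
    assert(dev != NULL && msg != NULL);
    dev->busy_count++;
    msg->flags |= MODEL_INTRANSIT;
}

/* End a transaction with no checking, so a double call goes unnoticed here. */
static void model_end_unchecked(struct model_dev *dev)
{
    assert(dev != NULL);
    dev->busy_count--;
}

/*
 * Reply to a message, refusing a second reply. Returns false where a
 * kernel assertion on the in-transit flag would fire instead.
 */
static bool model_reply_checked(struct model_dev *dev, struct model_msg *msg)
{
    assert(dev != NULL && msg != NULL);
    if ((msg->flags & MODEL_INTRANSIT) == 0)
        return false;               /* already replied once */
    msg->flags &= ~MODEL_INTRANSIT;
    dev->busy_count--;
    return true;
}
```

Calling `model_end_unchecked()` twice after a single `model_start()` leaves `busy_count` at -1, while the second `model_reply_checked()` call on the same message returns false and leaves the counter at 0.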
Let's try to catch the problem earlier and maybe get a more meaningful
backtrace. Make sure you have options DDB_TRACE set (I think you
do). I have committed some code to lwkt_msgport.c (1.44) which tries
to catch the MSGF_INTRANSIT flag on the originating cpu.
--
Another data point... you got the ahd_run_qoutfifo() panic again
as well, which implies that ahd_run_qoutfifo() was running when the
original panic occurred. I think there's an issue in AHD's transaction
processing. The driver does a LOT of work on a scb before it frees
it by calling ahd_free_scb(). Somewhere in there a recursion is
happening.
It may be beneficial to add a 'processing in progress' flag to the scb
in ahd_done() and assert that the flag is not set to catch double-calls
to ahd_done() on the same scb earlier. Something like what I have
below may help.
-Matt
Matthew Dillon
<dillon at backplane.com>
Index: aic79xx.h
===================================================================
RCS file: /cvs/src/sys/dev/disk/aic7xxx/aic79xx.h,v
retrieving revision 1.2
diff -u -p -r1.2 aic79xx.h
--- aic79xx.h 17 Jun 2003 04:28:21 -0000 1.2
+++ aic79xx.h 4 Jul 2007 19:47:50 -0000
@@ -589,12 +589,13 @@ SCB_EXPECT_PPR_BUSFREE = 0x01000,
SCB_PKT_SENSE = 0x02000,
SCB_CMDPHASE_ABORT = 0x04000,
SCB_ON_COL_LIST = 0x08000,
- SCB_SILENT = 0x10000 /*
+ SCB_SILENT = 0x10000,/*
* Be quiet about transmission type
* errors. They are expected and we
* don't want to upset the user. This
* flag is typically used during DV.
*/
+ SCB_RUNNINGDONE = 0x20000
} scb_flag;
struct scb {
Index: aic79xx_osm.c
===================================================================
RCS file: /cvs/src/sys/dev/disk/aic7xxx/aic79xx_osm.c,v
retrieving revision 1.13
diff -u -p -r1.13 aic79xx_osm.c
--- aic79xx_osm.c 4 Jun 2007 17:21:55 -0000 1.13
+++ aic79xx_osm.c 4 Jul 2007 19:49:48 -0000
@@ -196,6 +196,9 @@ ahd_done(struct ahd_softc *ahd, struct s
{
union ccb *ccb;
+ KKASSERT((scb->flags & SCB_RUNNINGDONE) == 0);
+ scb->flags |= SCB_RUNNINGDONE;
+
CAM_DEBUG(scb->io_ctx->ccb_h.path, CAM_DEBUG_TRACE,
("ahd_done - scb %d\n", SCB_GET_TAG(scb)));
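
The guard in the patch above can be tried out in user space to see how it catches a recursive completion. The following is a rough sketch only: `demo_scb`, `demo_done()`, `demo_bad_hook()` and `DEMO_RUNNINGDONE` are hypothetical names, and the sketch counts the double call instead of panicking the way KKASSERT would.

```c
#include <assert.h>
#include <stddef.h>

/* Same bit value as SCB_RUNNINGDONE in the patch; the name is hypothetical. */
#define DEMO_RUNNINGDONE 0x20000u

struct demo_scb {
    unsigned flags;
    int double_calls;                     /* times the guard tripped */
    void (*on_done)(struct demo_scb *);   /* completion hook, may misbehave */
};

/* Completion routine guarded by a "done in progress" flag. */
static void demo_done(struct demo_scb *scb)
{
    assert(scb != NULL);
    if (scb->flags & DEMO_RUNNINGDONE) {
        scb->double_calls++;    /* the kernel patch would assert here */
        return;
    }
    scb->flags |= DEMO_RUNNINGDONE;
    if (scb->on_done != NULL)
        scb->on_done(scb);      /* rest of the completion work */
}

/* A hook that wrongly completes the same scb again, i.e. a recursion. */
static void demo_bad_hook(struct demo_scb *scb)
{
    demo_done(scb);
}
```

With `on_done` set to `demo_bad_hook`, a single call to `demo_done()` records one double call at the point of re-entry, which is the earlier, more meaningful failure site the flag is meant to expose.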