BFBI OTT bug or limitation? UPDATE2

Bill Hacker wbh at conducive.org
Mon Feb 23 02:37:14 PST 2009


Bill Hacker wrote:
Top-posting to my own post ...
Again.

Reproduced the original verbose crash. Only the last line is the same as 
below.

Failed to set up the HP-200LX as a serial console, so will run it again...

Bill

:-(

du -h > dulist

Two more runs: the first OK, with hammer mirror-stream over ssh NOT running; 
the second, with it mirroring a nearly empty dirtree (five static, 
one-line text files only), ran for several minutes, then dropped into 
the debugger with a mere three lines, rather than the original 
scrolled-off-the-screen output:

CRC DATA @ 9000000a3b15b280/128 FAILED
Debuger ("CRCFAILED: DATA")
Stopped at    Debugger+0x34:    movb    $0, in_Debugger.3970
But this does not seem to be related. It could be a NATACONTROL + old HDD I/O 
error artifact.
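
If it is an old-drive artifact, the array state may show it. A quick 
check, assuming the NATA RAID1 shows up as ar0 (that device name is a 
guess; substitute whatever natacontrol list reports):

   natacontrol list         # list ATA channels and attached devices
   natacontrol status ar0   # report the RAID1 array / member status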

The first, looong error message had to do with vnodes...

More testing to do, then I will swap in a newer 500 GB SATA drive.

Bill





Bill Hacker wrote:
Brute Force and Bloody Ignorance is Over The Top when:

'du -h'

saves keystrokes over:

'shutdown now'

Likewise a few other ways to either reboot or drop into the debugger.

Environment:

Target box:

VIA C7, 2 GB DDR2-533

2 X IBM 60GB PATA as NATACONTROL RAID1

2.3.0 'default install' to all-HAMMER

root-mounted /slave1 PFS, created as such by hammer mirror-copy over ssh 
(rough commands sketched below)

Source box:

Lenovo G400 3000 laptop. 33GB slice ad0s1 for DFLY.

2.2.0 installed to UFS; *ONLY* a spare 8 GB partition was later formatted 
and mounted as hammerfs '/hmr', with a master PFS made for testing, 
'/hmr/master'.
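
For reference, the master/slave pair above was set up approximately like 
this - a sketch from memory, not a transcript. As I recall, mirror-copy 
will create the slave PFS on the target if it does not yet exist, which 
is how /slave1 came to be:

   # on the source laptop: make the master PFS under the hammer mount
   hammer pfs-master /hmr/master

   # initial bulk copy to the target over ssh (note the colon in the
   # remote spec, per hammer(8): [user@]host:filesystem)
   hammer mirror-copy /hmr/master Thor@<target_IP>:/slave1

   # explicit equivalent on the target, if creating the slave by hand:
   # hammer pfs-slave /slave1 shared-uuid=<uuid reported by
   #     'hammer pfs-status /hmr/master' on the source>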

ACTION:

hammer mirror-stream /hmr/master Thor@<target_IP>/slave1
over ssh, on a 100 Mbps internal link.
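
To confirm the stream is actually advancing, checking the slave PFS on 
the target is handy; hammer pfs-status is a real subcommand, though the 
field name below is from memory:

   hammer pfs-status /slave1   # the sync-end-tid it reports should keep moving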
Fire off bonnie++ to fill the /hmr partition with fairly deep recursion.
It fills up and stops gracefully with a '...cannot write' message, then 
begins to clean up its work area.
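
For anyone wanting to reproduce the fill, the run was along these lines. 
The sizes and counts here are illustrative, not the exact ones used; per 
bonnie++'s man page, -n takes number-of-files (in multiples of 1024), 
max size, min size, and directory count:

   # -u is required when started as root; -d points at the hammer master
   bonnie++ -u root -d /hmr/master -s 4096 -n 128:1048576:1024:64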

Meanwhile, the ssh link has been doing its best - and its best is 
very good.

Watching the target, one sees /slave1 gradually clear as bonnie++ mops up 
the master, until du again shows zero usage there, slaves having no 
snapshots of their own.

But the /hmr/master mount has gone from zero to 94% used, and the 
target has gone from 76% used to 87% used.

'du' on the master cannot seem to locate where TF the '94%' that df reports 
for /hmr is hiding, but never mind... we can nuke and newfs that 
partition at will.
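
At least part of that gap should be expected on HAMMER: deleted files 
stay on the media as history until the filesystem is pruned and 
reblocked, so df keeps counting space that du can no longer see. Whether 
that accounts for all of the 94% is the open question, but the usual 
reclaim pass is roughly:

   hammer cleanup /hmr/master            # normal snapshot/prune/reblock maintenance
   # or, to throw away all history on that PFS and compact it:
   hammer prune-everything /hmr/master
   hammer reblock /hmr/master
   df -h /hmr                            # see whether the space comes back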

But where is the used space on the *target* hiding?

'du -h /' on the all-hammerfs target *reboots* it somewhere along the 
way.

...  Comes back up quick - I'll give it that...

But hang on.  Could an 'ordinary user' do that at will?

'du -h > dulist' (for later grep'ing) throws a panic and drops DFLY 
into the debugger...

Also worrisome...

By comparison, a UFS fs, when overloaded, ordinarily soldiers on at 109% 
utilized with a 'no space on device' message. For days...

Hammer needs to get there also...

If this is an out-of-memory situation with 2 GB of RAM, it shouldn't be.

If the fs is full, the exit should be graceful, not catastrophic.

If no one else can reproduce this, I'll try it on other hardware - and 
with a serial terminal.

NB: Rather small drives and partitions were used: /hmr/master is 8 GB, and 
the entire hammer fs on the target is only 60 GB.

That part is intentional.

No need to wait all day to see if it happens on a half-terabyte also.

Panic not captured. Do we need it, or is this a known issue?
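
If it is wanted, a hedged recipe for capturing it next time - the same 
knobs DragonFly shares with FreeBSD, as far as I know; device names 
below are guesses for this box:

   # /boot/loader.conf - mirror the console to the serial port
   console="comconsole"

   # /etc/rc.conf - dump to swap, recovered into /var/crash by savecore at boot
   dumpdev="/dev/ad0s1b"   # whatever the swap partition really is
   dumpdir="/var/crash"

   # optional: panic straight to the dump instead of sitting in DDB
   sysctl debug.debugger_on_panic=0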

Bill




