Sdhci spam on console, extremely slow boot

Fri Mar 24 10:20:26 PDT 2023

Thank you, Matt, for the suggestion on silencing the spam. I haven’t tried it, but from my initial look at the code it would silence the spam while still subjecting the system to a long wait on boot. I might be wrong; I’m not a kernel developer.

You bring up the question of how the kernel can handle this better. Maybe this is a good place to brainstorm some approaches.

I looked at how FreeBSD does it and they simply use ``device.hints`` to allow someone to block a particular device from even loading while keeping the driver for other devices.

In the course of trying anything, I tried setting ``dev.sdhci_pci.0.timeout=1`` in ``/boot/loader.conf`` but it was ineffective (reading sdhci.c kinda made me go “yeah that wouldn’t work”).

I don’t suggest creating ``device.hints`` because it seems… kludgey to do so and it’s even less accessible to an end user than a one-stop shop of ``loader.conf``. 

When I examine the flags from the empty microsd slot, it tells DragonFly “I am embedded and not removable” which is kinda true as one has to yank the microsd out of it (no dedicated eject spring). If I were to think up the minimum way of addressing this, being able to tell the kernel “please ignore this specific device” (something like ``hw.sdhci.slots_disable=0x1``, which could be a bit mask of slots to disable) would be nice. Or something like ``hw.sdhci.slot.0.timeout=-1`` or even ``hw.sdhci.controller_timeout_seconds=1``.
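
To make the bit mask idea concrete, here is a minimal sketch in plain C of how a mask like ``hw.sdhci.slots_disable`` might be interpreted. The tunable does not exist today, and ``sdhci_slot_disabled`` is a name invented for illustration; nothing here is taken from sdhci.c.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether a slot should be skipped, given
 * the value of the proposed (non-existent) hw.sdhci.slots_disable mask.
 * Bit N set means "ignore slot N".
 */
static bool
sdhci_slot_disabled(uint32_t disable_mask, unsigned int slot)
{
	/* Slots beyond the width of the mask can never be disabled. */
	if (slot >= 32)
		return false;
	return (disable_mask & (UINT32_C(1) << slot)) != 0;
}
```

With a mask of ``0x1`` only slot 0 would be skipped, and every other slot would still be probed as it is now.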

Perhaps the fact that we block on boot for 10 seconds per message for a total of 20 messages (200 seconds; I counted them last night but I might be off as I was very tired) might be something we can put into the background? Like ``hw.sdhci.controller_detect_async``? But that opens up a different question: if someone absolutely needed to boot from that slot, waiting for it to come online is mandatory (which is the current behavior).

I see the following use cases:

- end user must load from an embedded slot that is unoccupied and blocks till ready or timeout in 200 seconds (current implementation satisfies that)
- end user does not use embedded slot and slot is optional to operation (i.e. me booting from an external SSD due to a long unrelated story of exploding gateways and emergency replacements)
- end user uses embedded slot as a data drive and can mount later

The more I think about this, the more I think being able to limit the retries or total duration of the controller timeout on boot, for a single sdhci slot or even for all sdhci slots, might be the best solution. My board eventually does come up. Unfortunately I cannot disable the slots from the UEFI BIOS (thanks Intel).
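
A rough sketch of what "limit the retries or total duration" could mean, in plain C rather than kernel code. The struct, its field names, and both functions are hypothetical; the only figures taken from this thread are the 10 seconds per attempt and roughly 20 attempts I observed.

```c
#include <stdbool.h>

/* Hypothetical policy; neither field corresponds to an existing tunable. */
struct sdhci_probe_budget {
	int max_tries;         /* <= 0 means no limit on attempts */
	int total_timeout_sec; /* <= 0 means no limit on total time */
};

/* May the probe loop make another attempt? */
static bool
probe_may_retry(const struct sdhci_probe_budget *b, int tries_done,
    int elapsed_sec)
{
	if (b->max_tries > 0 && tries_done >= b->max_tries)
		return false;
	if (b->total_timeout_sec > 0 && elapsed_sec >= b->total_timeout_sec)
		return false;
	return true;
}

/*
 * Worst-case seconds spent on a slot that never responds, when each
 * attempt blocks for per_try_sec. Returns -1 if nothing bounds the wait.
 */
static int
worst_case_wait(const struct sdhci_probe_budget *b, int per_try_sec)
{
	int tries = 0, elapsed = 0;

	if (per_try_sec <= 0)
		return -1;
	if (b->max_tries <= 0 && b->total_timeout_sec <= 0)
		return -1;
	while (probe_may_retry(b, tries, elapsed)) {
		int wait = per_try_sec;

		/* Clamp the last attempt to the remaining time budget. */
		if (b->total_timeout_sec > 0 &&
		    elapsed + wait > b->total_timeout_sec)
			wait = b->total_timeout_sec - elapsed;
		elapsed += wait;
		tries++;
	}
	return elapsed;
}
```

With 20 tries and no time cap this reproduces the 200 second wait; capping the total at 1 second gives up after 1 second, which is what I was hoping ``timeout=1`` would do.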

Autumn

> On Mar 24, 2023, at 12:00 AM, Matthew Dillon <dillon at backplane.com> wrote:
> 
> You can kill kernel logging entirely, see if this works as a quick workaround in /boot/loader.conf:
> 
> kern.kprintf_logging=0
> 
> The less invasive solution is to compile up a kernel with that junk commented out.  Comment out lines 1071 and 1072 in /usr/src/sys/dev/disk/sdhci/sdhci.c.   And yes, I agree... we should limit the rate at which it prints messages like that.  What would be a reasonable rate limit?   Once every 30 seconds maximum?  It would be like 4-5 lines of code and a static variable, so not hard.
> 
> -Matt
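
For reference, the "static variable" rate limit Matt describes could look roughly like the sketch below. It is plain C, not the actual sdhci.c change: the function name and the macro are invented for illustration, the caller is assumed to pass in a monotonic time in seconds, and the 30 second interval is just the figure floated above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical interval; 30 seconds is the value suggested in the thread. */
#define SDHCI_MSG_INTERVAL_SEC 30

/*
 * Return true if a message may be printed at time now_sec, allowing at
 * most one message per SDHCI_MSG_INTERVAL_SEC. The first call always
 * succeeds.
 */
static bool
sdhci_msg_allowed(int64_t now_sec)
{
	static int64_t last_sec; /* time of the last printed message */
	static bool printed;     /* has anything been printed yet? */

	if (printed && now_sec - last_sec < SDHCI_MSG_INTERVAL_SEC)
		return false;
	last_sec = now_sec;
	printed = true;
	return true;
}
```

The print site would then wrap its existing message in ``if (sdhci_msg_allowed(now))``. Note that this only quiets the console and does nothing about the wait itself.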