Patch for inode SLIST conversion

Matthew Dillon dillon at apollo.backplane.com
Sat Jan 19 11:08:12 PST 2008


:If I understand serialization tokens correctly, they lead, in the case
:of the two functions above, to mutually exclusive execution, because
:neither of them voluntarily blocks or calls a blocking method. Is this
:assumption correct?

    Correct.
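
    To make that concrete, here is a sketch of the property being
    confirmed.  The calls illustrate LWKT serializing tokens; the exact
    signatures have varied between releases, so treat the names here as
    assumptions:

	/*
	 * Two routines serialized by the same token.  A token only
	 * guarantees exclusion while the holder is actually running:
	 * it is in effect released when the holder blocks and
	 * reacquired when the holder resumes.  So as long as neither
	 * body blocks, the two bodies are mutually exclusive.
	 */
	static struct lwkt_token ihash_token;	/* illustrative name */

	static void
	func_a(void)
	{
		lwkt_gettoken(&ihash_token);
		/* non-blocking work: runs atomically vs func_b() */
		lwkt_reltoken(&ihash_token);
	}

	static void
	func_b(void)
	{
		lwkt_gettoken(&ihash_token);
		/* non-blocking work: runs atomically vs func_a() */
		lwkt_reltoken(&ihash_token);
	}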

:                 /*
:                  * We must check to see if the inode has been ripped
:                  * out from under us after blocking.
:                  */
:                 INOFIND(ip, dev, inum);
:                 if (ip == NULL || ITOV(ip) != vp) {
:                         vput(vp);
:..
:
:See the comment on "ripped out from under us". This can only happen 
:because vget might block, right?

    Correct, though once you start calling random procedures which are
    not specifically documented as being non-blocking, it can get dangerous
    because the assumption might not hold if the code is changed in the
    future.

    But here... yes, correct.
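
    To make the pattern concrete, the lookup under discussion looks
    roughly like this (a sketch modeled on ufs_ihashget(); INOFIND and
    ITOV are the macros from the quote, while the vget() flag and the
    parameter types are assumptions):

	struct vnode *
	sketch_ihashget(dev_t dev, ino_t inum)
	{
		struct inode *ip;
		struct vnode *vp;
	loop:
		INOFIND(ip, dev, inum);		/* first chain walk */
		if (ip == NULL)
			return (NULL);
		vp = ITOV(ip);
		if (vget(vp, LK_EXCLUSIVE))	/* may block! */
			goto loop;
		/*
		 * We must check to see if the inode has been ripped
		 * out from under us after blocking.
		 */
		INOFIND(ip, dev, inum);		/* second chain walk */
		if (ip == NULL || ITOV(ip) != vp) {
			vput(vp);
			goto loop;
		}
		return (vp);
	}

    The second INOFIND() is the pass Michael is trying to avoid below.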

:Now, ufs_ihashget doesn't modify the hash table at all. So why not store 
:a "transaction id" in each hash table bucket, which gets incremented by 
:ufs_ihashins and ufs_ihashrem? Then in ufs_ihashget I read this id 
:*before* calling vget and check afterwards whether it has changed. If it 
:is unchanged, I can omit a second pass over the linked list.
:
:It's actually quite easy to implement and, if all my assumptions hold, 
:it is the correct way of doing it. Please correct me if I'm wrong.
:
:
:Regards,
:
:   Michael

    Ok, three things.  First, the transaction id is fairly coarse-grained
    (one id per bucket), meaning that if it does not match you have to
    loop and redo the lookup.  So the code is going to wind up being very
    slightly more complex than it is now.
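
    For reference, here is roughly what the proposed variant would look
    like.  INOGEN() is a hypothetical macro returning the bucket's
    transaction id (it does not exist in the tree); everything else is
    as in the sketch above:

	struct vnode *
	sketch_ihashget_gen(dev_t dev, ino_t inum)
	{
		struct inode *ip;
		struct vnode *vp;
		u_int gen;
	loop:
		INOFIND(ip, dev, inum);
		if (ip == NULL)
			return (NULL);
		vp = ITOV(ip);
		gen = INOGEN(dev, inum);	/* snapshot before blocking */
		if (vget(vp, LK_EXCLUSIVE))	/* may block */
			goto loop;
		if (gen != INOGEN(dev, inum)) {	/* bucket changed under us */
			vput(vp);
			goto loop;
		}
		return (vp);
	}

    The second chain walk is avoided only on the fast path where the id
    is unchanged; any insert or remove in the bucket forces a full retry
    from the top.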

    Second, the transaction id idea has been used in BSD code before,
    particularly with regard to vm_maps.  It will be higher-performing,
    but not by a lot, and it has the problem of obfuscating the code a
    bit (meaning you have to carefully document the algorithm in
    comments).  It is also somewhat algorithmically fragile, because
    there is no direct check against duplicate nodes.

    Third, the hash table is sized such that it has, or should have, very
    short chain lengths.  The relookup will all be in the CPU's L1 cache,
    and I doubt you'd notice any improvement in performance.

    I would personally like to keep the code as-is and not make any major
    changes to it, simply because we know it works and because it is
    extremely fragile.  If you really want to redo the code, you can, but
    I would like to wait until after the 2.0 release to commit it.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>




