[PATCH] Suggested FreeBSD merge

Mon Nov 15 13:30:05 PST 2004

:That's the problem. The movl strlen,%ebx only works if strlen is a static
:address. Otherwise the code has to do a lookup in the GOT first. This means
:typically two more instructions. Leaving out the normal function init,
:it would be something like this:
:	movl strlen@GOT(%ebx), %eax
:	movl (%eax), %eax
:	call *%eax
:The normal calling sequence for PIC is:
:	call strlen@PLT
:with strlen@PLT being translated into a relative address, which contains:
:	jmp strlen
:(the real address, somewhat simplified)

    (Discussion of the function pointer method)

    The 'strlen' pointer can be a static address...  that's why you declare
    the pointer in the special section.  Actually what would be declared
    in the special section would be an array of pointers representing the 
    library vectors, that being just one of them.

    Since they would be in a special section we can:

    * Align the section on a page boundary.

    * The kernel can remap that page to its OWN library vector list, pointing
      to a fixed (from the kernel's standpoint) high-memory user accessible
      address where the actual functions reside.

      *THOSE* can be at e.g. 0xc0100000 or something like that.  Just as long
      as the user program does not try to jump to the routines in high memory
      directly, since depending on the kernel the locations may be different.

    What we cannot do is have the user program load the function pointer
    directly from high memory, because that results in a situation where the
    kernel no longer has the flexibility to change the address (similar to
    the BSD/OS PS_STRINGS problem).
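
    As a rough illustration of the function pointer method described
    above, here is a minimal user-space sketch in C.  It is not code from
    any tree: the names klib_vectors, klib_vec, klib_remap and friends are
    hypothetical, the 4096 byte page size is an assumption, and the
    "remap" is simulated by a plain struct copy where a real kernel would
    replace the page mapping.  Placing the array in a dedicated ELF section
    is left out to keep the sketch portable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical vector layout: one pointer per library function. */
struct klib_vectors {
    size_t (*v_strlen)(const char *);
    void  *(*v_memcpy)(void *, const void *, size_t);
};

/* Default strlen, used until a replacement vector list is installed. */
size_t klib_default_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Stand-in for an optimized routine a kernel might supply. */
size_t klib_fast_strlen(const char *s)
{
    return strlen(s);
}

/*
 * The vector array itself.  It is page aligned (assumed 4096 byte pages)
 * so that the whole page could be swapped for another vector list.
 */
struct klib_vectors klib_vec __attribute__((aligned(4096))) = {
    klib_default_strlen,
    memcpy,
};

/* Simulates the remap: every vector is replaced in one step. */
void klib_remap(const struct klib_vectors *replacement)
{
    klib_vec = *replacement;
}

/* What a call site does: load the pointer from a fixed address and call. */
size_t klib_strlen(const char *s)
{
    return klib_vec.v_strlen(s);
}
```

    The call site only ever knows the address of klib_vec, never the
    address of the routine behind it, which is what leaves the routines
    free to move.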

:>     I just don't see this being viable generally without some significant
:>     work.  The only way I see a direct-call model working is if the 
:>     direct-call code reserved a fixed amount of space for each function
:>     so the offsets are well known, and if the function is too big to fit
:>     in the reserved space the space would be loaded with a JMP to the
:>     actual function instead.
:
:Exactly. The location of the mapping can be considered part of the ABI,
:with the best location being at the bottom of the virtual address space,
:I guess.
:
:>     So the THIRD way would be to do this:
:> 
:> 	.section	.klib-dragonfly01,"ax",@progbits
:> 	.globl		strlen
:> 	.type		strlen,@function
:> strlen:
:> 	[ the entire contents is replaced with actual function if the actual
:> 	  function does not exceed 64 bytes, else only the jump vector is
:> 	  modified ]
:> 	[ the default function can be placed here directly if it does not
:> 	  exceed 64 bytes ]
:> 	jmp		clib_strlen	# default overridden by kernel
:> 	.p2align	6,0x90		# 64 byte blocks
:> 
:>     Advantages: 
:> 
:> 	* Direct call, no jump table for simple functions.
:> 
:> 	* The kernel can just mmap() the replacement library right over the
:> 	  top.
:> 
:>     Disadvantages:
:> 
:> 	* requires a sophisticated utility to check whether the compiled
:> 	  function fits and decide whether to generate a jmp or whether
:> 	  to directly embed the function.
:> 
:> 	* space/compactness tradeoff means that the chosen size may not
:> 	  be cache friendly, or may be space friendly, but not both.
:
:Yes, exactly. This is what Apple is doing for MacOS X. The version problem
:is not that big, because like I said, the ABI would be fixed and could be
:bound to the normal COMPAT handling. Having a default included in libc as
:fallback would work too. IIRC the speed difference is bigger on PPC,
:because you are doing PIC almost always there.
:
:Cache friendliness is difficult, but we can at least do alignments
:pretty well. I don't think there's a difference for normal cache size
:length, because GCC does some padding of functions by default.
:
:Joerg

    (Discussion of the function reserve area method)

    I think we could do #3, as long as we do not reserve too much space for
    each function entry.  64 bytes ought to be plenty for the vast majority
    of optimized functions we might want to provide with the remainder going
    through a JMP.  This also preserves all the advantages we would expect...
    default functions would be available, the direct call model winds up being
    used.

    The only complexity is that we need to use 'size' and 'objdump' along
    with a script, or perhaps even write our own ELF utility to take a
    'library of functions' and map them into the reserved space, and to
    deal with functions that are > 64 bytes.
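
    The decision such a utility has to make per function can be sketched
    in a few lines of C.  This is only an illustration under stated
    assumptions: the klib_* names are hypothetical, the 64 byte slot size
    is the figure discussed above, and the 5 byte stub size assumes an
    i386 'jmp rel32'.  The function size would come from 'size' or
    'objdump' in practice.

```c
#include <assert.h>
#include <stddef.h>

#define KLIB_SLOT_SIZE 64   /* reserve area per function, as discussed */
#define KLIB_JMP_SIZE   5   /* assumption: i386 jmp rel32 (opcode + 4) */

enum klib_placement {
    KLIB_EMBED,             /* function body is copied into the slot */
    KLIB_JUMP               /* slot holds only a JMP to the real body */
};

/* Embed the function if it fits in the slot, otherwise emit a jump. */
enum klib_placement klib_place(size_t func_size)
{
    return func_size <= KLIB_SLOT_SIZE ? KLIB_EMBED : KLIB_JUMP;
}

/* Slots are fixed size, so the offset of entry N is well known. */
size_t klib_slot_offset(size_t index)
{
    return index * KLIB_SLOT_SIZE;
}

/* Bytes of padding left in the slot after the body or the jump stub. */
size_t klib_slot_padding(size_t func_size)
{
    size_t used = (klib_place(func_size) == KLIB_EMBED)
        ? func_size : KLIB_JMP_SIZE;

    assert(used <= KLIB_SLOT_SIZE);
    return KLIB_SLOT_SIZE - used;
}
```

    A 40 byte function would be embedded with 24 bytes of padding, while
    a 65 byte function would leave only the 5 byte jump in its slot.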

    In particular I can see this being used not only to optimize library
    functions but also to give us a clean syscall interface (the syscall 
    'layer' that I've been nattering on about for the last year).  It would
    kill two birds with one stone!

    The syscall layer will mostly take syscall arguments, stuff them into
    a message, and call a 'go' function.   In fact, this requires some
    experimentation because if most of the syscall layer functions eat more
    than 64 bytes we'd probably want to consider increasing the reserve area
    to 128 bytes.
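
    To give a feel for what one of these syscall layer functions might
    look like, here is a small C sketch.  Everything in it is assumed for
    illustration: struct sysmsg, its fields, sys_layer_call3 and the fake
    'go' function are hypothetical names, not an existing interface, and
    a real 'go' function would enter the kernel instead of doing
    arithmetic.

```c
#include <assert.h>
#include <stddef.h>

#define SYSMSG_MAXARGS 6    /* assumed upper bound on syscall arguments */

/* Hypothetical message: syscall number plus its arguments. */
struct sysmsg {
    int  sm_num;
    int  sm_nargs;
    long sm_args[SYSMSG_MAXARGS];
};

/* Hypothetical 'go' function type: hands the message off and returns. */
typedef long (*sysmsg_go_t)(struct sysmsg *);

/* Layer function for a three argument syscall: stuff args, call 'go'. */
long sys_layer_call3(sysmsg_go_t go, int num, long a0, long a1, long a2)
{
    struct sysmsg msg = {
        .sm_num   = num,
        .sm_nargs = 3,
        .sm_args  = { a0, a1, a2 },
    };

    assert(go != NULL);
    return go(&msg);
}

/* Stand-in 'go' function so the sketch can be exercised in user space. */
long sysmsg_fake_go(struct sysmsg *msg)
{
    long sum = 0;

    for (int i = 0; i < msg->sm_nargs; i++)
        sum += msg->sm_args[i];
    return msg->sm_num * 1000L + sum;
}
```

    Measuring the compiled size of functions shaped like sys_layer_call3
    is the kind of experiment that would settle the 64 versus 128 byte
    question.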

    The size of each reserve area could also vary from library to library.
    e.g. the library support functions could use 64 bytes while the syscall
    layer support functions could use 128.

    --

    Since the function-pointer method isn't really viable in a standard
    link configuration (since they are pointers rather than functions, a user
    program cannot override them and provide its own), I think if we were to
    do this we would have to use method #3.

					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>