[PATCH] Suggested FreeBSD merge
Matthew Dillon
dillon at apollo.backplane.com
Mon Nov 15 10:24:55 PST 2004
:The address was just a value off-hand. I think we can differentiate between
:(a) the application ABI and (b) the kernel version to be mapped.
:
:From the application point-of-view, having a fixed address is very useful,
:because it allows the compiler to skip the overhead of Position Independent
:Code, esp. the GOT/PLT setup. Since this should be used for sensitive
:low-level routines, it makes sense to skip this.
I'm not sure I understand what you mean here. I see only three ways to do
this, using strlen() as a contrived example. The first way I don't
think we can do, because it makes strlen() a function pointer rather than
a function. It would be something like:
#define __section(name) __attribute__((__section__(name)))
__section(".klib-dragonfly01") size_t (* const strlen)(const char *);
This would generate code as follows. This code would be AS FAST as a
direct jump due to the branch prediction cache. That is, the
movl strlen,%ebx + call combination will take no longer than call strlen
would take.
movl strlen,%ebx
call *%ebx
However, I don't think we can use a C declared function pointer and still
adhere to the standards unless the procedures are typically #define'd
entities in standard header files.
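As a rough illustration of this first approach (a sketch, not code from any tree), the indirect call can be modelled in portable C. The names klib_strlen, default_strlen, counting_strlen, klib_install_strlen and KLIB_STRLEN are all hypothetical, and the section attribute is left out so the sketch builds anywhere. The pointer is writable here only so that the kernel override can be simulated from user code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical default implementation, used until something overrides it. */
size_t default_strlen(const char *s)
{
    const char *p = s;

    assert(s != NULL);
    while (*p != '\0')
        ++p;
    return (size_t)(p - s);
}

/*
 * Hypothetical vector.  In the scheme described above it would live in the
 * special section and be const as far as the program is concerned; here it
 * is an ordinary writable pointer so the override can be simulated.
 */
size_t (*klib_strlen)(const char *) = default_strlen;

/* Callers go through the pointer: one load plus an indirect call. */
#define KLIB_STRLEN(s) (klib_strlen(s))

/* Stand-in for a replacement routine; it counts how often it is used. */
unsigned long override_calls;

size_t counting_strlen(const char *s)
{
    ++override_calls;
    return default_strlen(s);
}

/* Simulates the override by swapping the vector. */
void klib_install_strlen(size_t (*fn)(const char *))
{
    assert(fn != NULL);
    klib_strlen = fn;
}
```

The macro is what makes the standards problem visible: the program only gets a function-like strlen if the header hides the pointer behind a #define.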
A second way of doing this is a call/jump:
(strlen would be at a fixed offset within the special section)
.section .klib-dragonfly01,"ax",@progbits
.globl strlen
.type strlen,@function
strlen:
jmp clib_strlen ; default overridden by kernel
The kernel would modify the jump address. i.e. it would change it from
whatever address 'clib_strlen' was to point into its shared map.
However, this is MUCH slower than an indirect call because it forces the
cpu to resynchronize the instruction stream twice.
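To make "modify the jump address" concrete, here is a minimal sketch of retargeting a 5-byte i386 "jmp rel32" (opcode 0xE9 followed by a little-endian 32-bit displacement measured from the end of the instruction). The function names and the use of plain 32-bit integers for addresses are assumptions made for the illustration; this says nothing about how a kernel would actually get write access to the page:

```c
#include <assert.h>
#include <stdint.h>

enum { JMP_REL32_OPCODE = 0xE9, JMP_REL32_LEN = 5 };

/*
 * Point the jmp stored at insn to target.  insn_addr is the address the
 * instruction executes at.  The displacement is relative to the address
 * of the following instruction, and the arithmetic wraps modulo 2^32.
 */
void retarget_jmp(uint8_t *insn, uint32_t insn_addr, uint32_t target)
{
    uint32_t rel = target - (insn_addr + JMP_REL32_LEN);

    assert(insn[0] == JMP_REL32_OPCODE);
    /* Store byte by byte so the result does not depend on host endianness. */
    insn[1] = (uint8_t)(rel & 0xff);
    insn[2] = (uint8_t)((rel >> 8) & 0xff);
    insn[3] = (uint8_t)((rel >> 16) & 0xff);
    insn[4] = (uint8_t)((rel >> 24) & 0xff);
}

/* Decode where the jmp stored at insn currently lands. */
uint32_t jmp_target(const uint8_t *insn, uint32_t insn_addr)
{
    uint32_t rel;

    assert(insn[0] == JMP_REL32_OPCODE);
    rel = (uint32_t)insn[1] |
          ((uint32_t)insn[2] << 8) |
          ((uint32_t)insn[3] << 16) |
          ((uint32_t)insn[4] << 24);
    return insn_addr + JMP_REL32_LEN + rel;
}
```

For example, a stub at 0x1000 retargeted to 0x2000 ends up with displacement 0x2000 - 0x1005 = 0xFFB.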
:A good place to request the loading of this page[s] is libc. That way the
:linker can be told that the symbol is part of the libc namespace and using
:some magic, the compiler can be made aware of the fixed nature [for shared
:libraries]. The location of the page can be arbitrary, even 0x0 would make
:sense. Since this is part of the namespace of libc, it is bound by the
:ABI version of libc, so no additional compatibility problems should arise.
:If a library doesn't want to depend on this, it can use the normal indirect
:calls via GOT/PLT.
Sure, we could compile up our 'shared' library and then make the linker
aware of the symbol map, but that means that *EVERY* *TIME* we want
to modify the shared library, every single program that uses it would
have to be recompiled. Or, if not recompiled, we would have to keep
a copy of every version of the shared library that we ever wrote.
Not only that, but different compiler options would produce different
code, causing the offsets to change even without any code changes.
I just don't see this being viable generally without some significant
work. The only way I see a direct-call model working is if the
direct-call code reserved a fixed amount of space for each function
so the offsets are well known, and if the function is too big to fit
in the reserved space the space would be loaded with a JMP to the
actual function instead.
So the THIRD way would be to do this:
.section .klib-dragonfly01,"ax",@progbits
.globl strlen
.type strlen,@function
strlen:
[ the entire contents are replaced with the actual function if the actual
function does not exceed 64 bytes, else only the jump vector is
modified ]
[ the default function can be placed here directly if it does not
exceed 64 bytes ]
jmp clib_strlen ; default overridden by kernel
.p2align 6,0x90 ; 64 byte blocks
Advantages:
* Direct call, no jump table for simple functions.
* The kernel can just mmap() the replacement library right over the
top.
Disadvantages:
* requires a sophisticated utility to check whether the compiled
function fits and decide whether to generate a jmp or whether
to directly embed the function.
* space/compactness tradeoff means that the chosen slot size may be
cache friendly or space friendly, but not both.
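The "sophisticated utility" in the first disadvantage could look roughly like the sketch below. It is only an illustration under stated assumptions: 64-byte slots padded with 0x90 as in the .p2align line above, 32-bit addresses, and hypothetical names (slot_addr, build_slot, the SLOT_* constants). It also assumes an embedded body is position independent or already relocated for the slot address, which a real tool would have to verify:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    SLOT_SIZE    = 64,   /* matches .p2align 6 */
    SLOT_PAD_NOP = 0x90, /* matches the 0x90 fill byte */
    SLOT_JMP_OP  = 0xE9, /* i386 jmp rel32 */
    SLOT_JMP_LEN = 5
};

/* Address of the slot for function number index, given the section base. */
uint32_t slot_addr(uint32_t base, unsigned index)
{
    return base + (uint32_t)index * SLOT_SIZE;
}

/*
 * Fill one slot.  If the body fits it is embedded directly, otherwise the
 * slot gets a jmp to body_address.  Returns 1 if embedded, 0 if a jmp was
 * emitted.  Unused bytes are padded with NOPs.
 */
int build_slot(uint8_t slot[SLOT_SIZE], uint32_t slot_address,
               const uint8_t *body, size_t body_len, uint32_t body_address)
{
    uint32_t rel;

    assert(slot != NULL && body != NULL);
    memset(slot, SLOT_PAD_NOP, SLOT_SIZE);

    if (body_len <= SLOT_SIZE) {
        memcpy(slot, body, body_len);
        return 1;
    }

    /* Too big: displacement is relative to the end of the 5-byte jmp. */
    rel = body_address - (slot_address + SLOT_JMP_LEN);
    slot[0] = SLOT_JMP_OP;
    slot[1] = (uint8_t)(rel & 0xff);
    slot[2] = (uint8_t)((rel >> 8) & 0xff);
    slot[3] = (uint8_t)((rel >> 16) & 0xff);
    slot[4] = (uint8_t)((rel >> 24) & 0xff);
    return 0;
}
```

Because every slot sits at base + index * 64, callers can bind to the slot address directly and never see whether the slot holds the function itself or a jmp to it.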
:The jump table is not the problem, the problem having to resolve the
:references to it. For the code page itself, .TEXT relocations are not
:critical, that can be handled easily and with low overhead. Just to
:clarify, I mean calls / jumps from code in the code page to itself.
:It has to be self-contained, of course.
:
:If we have to use a jump table from a variable address, it adds at least
:two instructions to every reference [as variable] or call [as function].
:This can easily outweigh the performance improvements.
:
:Joerg
I'm not sure I understand what you are describing here relative to
what I am describing. I was not describing PIC, per se. The only way
to have a direct-call model is if an absolute, static amount of function
space is reserved for each procedure.
-Matt
Matthew Dillon
<dillon at xxxxxxxxxxxxx>