Initial filesystem design synopsis.

Jason Smethers jason at smethers.net
Mon Feb 26 21:44:11 PST 2007


Rupert Pigott wrote:
> On Mon, 26 Feb 2007 14:37:15 -0800, Matthew Dillon wrote:
>>     The alternative is to set a specific endianness for the filesystem in
>>     stone, just like the 'network byte order' concept locked in a
>>     particular byte ordering for packets.  People today STILL hate the
>>     fact that they have to translate certain protocols to network byte
>>     order even when the machines on both ends use the same byte order,
>>     but one that happens to not be 'network' byte order.
> 
> That doesn't seem like a good enough reason to add more conditional
> branches to the critical paths, make the wire protocol more complicated
> and make it more difficult for the Mk.I eyeball to parse the dumps. It
> also pushes an additional conditional into at runtime. A fixed-endian
> filesystem makes that conditional "free" and therefore reduces the
> performance hit - and makes the life of the compiler trivial.

It's one branch at load and one branch at store for meta-data. The
byte order is not likely to change once the filesystem is mounted;
therefore, in the critical path the processor's branch predictor should
almost always predict the correct direction (taken or not taken).

Besides, the meta-data is most likely already in memory, so you're
down to just the branch on store.
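A minimal C sketch of the one-branch-per-access idea, assuming a hypothetical per-mount flag `swap` that is set once at mount time. The names `struct fs_mount`, `fs_bswap32`, `meta_load32`, and `meta_store32` are illustrative only, not taken from any existing filesystem:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-mount state: set once at mount time and never changed
 * while mounted, so the branches below are almost always predicted. */
struct fs_mount {
    bool swap;  /* true if the on-media byte order differs from the host's */
};

/* Reverse the byte order of a 32-bit value. */
static inline uint32_t fs_bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* One branch on load: convert an on-media field to host order. */
static inline uint32_t meta_load32(const struct fs_mount *mp, uint32_t raw)
{
    return mp->swap ? fs_bswap32(raw) : raw;
}

/* One branch on store: convert a host-order value to media order. */
static inline uint32_t meta_store32(const struct fs_mount *mp, uint32_t val)
{
    return mp->swap ? fs_bswap32(val) : val;
}
```

Because `mp->swap` is fixed for the lifetime of the mount, the conditional in each helper should resolve the same way on nearly every call.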

If it really is an issue, you can always use per-filesystem function
pointers to either a no-op function or the correct byte-order
translation function. On today's processors, which do not yet have good
indirect branch prediction (i.e. for pointer-based function calls), this
would make the best case as slow as a branch prediction that always
misses. This will likely improve with the next generation of processors,
such as AMD's K8L, which includes indirect branch prediction.
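The function-pointer alternative might look like this, again as a sketch with hypothetical names (`struct fs_ops`, `fs_ops_init`, `conv32_nop`, `conv32_swap`). The choice is made once per filesystem; every later conversion is an indirect call rather than a conditional branch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-filesystem conversion hook, chosen once at mount time. */
typedef uint32_t (*conv32_fn)(uint32_t);

/* Native byte order: nothing to do. */
static uint32_t conv32_nop(uint32_t v)
{
    return v;
}

/* Foreign byte order: reverse the four bytes. */
static uint32_t conv32_swap(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

struct fs_ops {
    conv32_fn to_host;   /* media order -> host order */
    conv32_fn to_media;  /* host order -> media order */
};

/* Select the hooks once; later calls are indirect but branch-free. */
static void fs_ops_init(struct fs_ops *ops, bool media_is_foreign)
{
    conv32_fn f = media_is_foreign ? conv32_swap : conv32_nop;

    ops->to_host = f;
    ops->to_media = f;  /* byte swapping is its own inverse */
}
```

Whether this beats the plain branch depends on how well the processor predicts indirect calls, which is exactly the trade-off described above.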

Moving a filesystem between systems of different byte order will likely
be very rare anyway. The best case is an "endian field" in the meta-data
which, in practice, would always be set to little-endian.
Implementations which do not support byte-order conversion would simply
fail to mount a filesystem whose endian field does not match the host.
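One way to implement the endian field, sketched under an assumption not stated above: the creating host writes a known 32-bit marker (the hypothetical `FS_ENDIAN_MARK`, 0x01020304) in its own byte order, so a reader can tell a native volume from a byte-swapped one by how the marker reads back. The mount policy then rejects swapped volumes when conversion is unsupported:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical marker written by the creating host in its native order. */
#define FS_ENDIAN_MARK    0x01020304u
#define FS_ENDIAN_SWAPPED 0x04030201u

enum fs_order { FS_ORDER_NATIVE, FS_ORDER_SWAPPED, FS_ORDER_INVALID };
enum fs_mount_result { FS_MOUNT_OK, FS_MOUNT_UNSUPPORTED, FS_MOUNT_CORRUPT };

/* Classify the on-media byte order from the 4 raw marker bytes. */
static enum fs_order fs_classify(const unsigned char raw[4])
{
    uint32_t v;

    memcpy(&v, raw, sizeof v);  /* interpret the bytes in host order */
    if (v == FS_ENDIAN_MARK)
        return FS_ORDER_NATIVE;
    if (v == FS_ENDIAN_SWAPPED)
        return FS_ORDER_SWAPPED;
    return FS_ORDER_INVALID;
}

/* Mount policy: without conversion support, only native volumes mount. */
static enum fs_mount_result fs_mount_check(const unsigned char raw[4],
                                           bool can_convert)
{
    switch (fs_classify(raw)) {
    case FS_ORDER_NATIVE:
        return FS_MOUNT_OK;
    case FS_ORDER_SWAPPED:
        return can_convert ? FS_MOUNT_OK : FS_MOUNT_UNSUPPORTED;
    default:
        return FS_MOUNT_CORRUPT;
    }
}
```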

A separate tool could do byte-order conversion on the media before it
is mounted on a system of the other byte order. The kernel filesystem
code could then be compiled to support only the byte order of the
machine it is built for, and the compiler would optimize away any
conversion.
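A sketch of the core loop such an offline conversion tool might use, assuming a hypothetical fixed-layout record `struct meta_rec`; a real tool would need the exact layout of every on-media structure. Since swapping is its own inverse, the same routine converts in either direction:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-layout meta-data record (illustrative fields only). */
struct meta_rec {
    uint64_t inode;
    uint32_t mode;
    uint16_t nlink;
    uint16_t flags;
};

static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

static uint64_t swap64(uint64_t v)
{
    /* Swap each half, then exchange the halves. */
    return ((uint64_t)swap32((uint32_t)v) << 32) | swap32((uint32_t)(v >> 32));
}

/* Convert an array of records in place; intended to run offline,
 * before the filesystem is mounted. */
static void convert_records(struct meta_rec *recs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        recs[i].inode = swap64(recs[i].inode);
        recs[i].mode  = swap32(recs[i].mode);
        recs[i].nlink = swap16(recs[i].nlink);
        recs[i].flags = swap16(recs[i].flags);
    }
}
```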

We could also do a per-segment conversion of meta-data at run time as
each segment is accessed, but the need for conversion will likely be
rare anyway. Such conversions may require moving data into other,
"already converted" segments, since to ensure transactional safety we
would not want to overwrite existing meta-data in place. A background
thread could also do the conversion as necessary, starting with
"unused" segments, and do "performance optimization" at the same time.
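The lazy per-segment variant could be sketched as follows. The tiny in-memory `struct volume`, its slot map, and `seg_access` are all hypothetical; in a real filesystem the remapping step would have to be an atomic, journaled on-media update rather than a plain assignment. What the sketch shows is that the original segment is never overwritten during conversion:

```c
#include <stdbool.h>
#include <stdint.h>

#define SEG_WORDS 4  /* tiny segments, for illustration only */
#define NSEGS     8

struct segment {
    bool in_use;
    bool foreign;               /* stored in the other byte order */
    uint32_t words[SEG_WORDS];  /* meta-data viewed as 32-bit fields */
};

struct volume {
    struct segment segs[NSEGS];
    int map[NSEGS];             /* logical segment -> physical slot */
};

static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Return the physical slot holding logical segment `logical` in host
 * byte order, converting on first access. The converted copy goes into
 * a free slot and the original is released only after the map update,
 * so it is never overwritten in place. Returns -1 if no slot is free. */
static int seg_access(struct volume *vol, int logical)
{
    int old = vol->map[logical];

    if (!vol->segs[old].foreign)
        return old;
    for (int s = 0; s < NSEGS; s++) {
        struct segment *dst = &vol->segs[s];

        if (dst->in_use)
            continue;
        for (int i = 0; i < SEG_WORDS; i++)
            dst->words[i] = swap32(vol->segs[old].words[i]);
        dst->foreign = false;
        dst->in_use = true;
        vol->map[logical] = s;          /* the "commit" step */
        vol->segs[old].in_use = false;  /* old copy may now be reused */
        return s;
    }
    return -1;
}
```

A background thread could call `seg_access` on each logical segment in turn to finish the conversion ahead of demand.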

As far as I know, there is no good reason today for a microprocessor
designer to choose one byte order over the other from a design
standpoint; it more or less comes down to preference. Therefore, it is
highly unlikely that we will see a new processor architecture which is
neither big-endian nor little-endian.

As for network protocols, I prefer negotiating the byte order so that
the byte order of the "server", or of the "peer under higher load", gets
preference. The "client", or "peer under lower load", then takes on the
burden of conversion. The developer then only has to worry about
byte-order changes when interfacing with the network API. For most
developers, this means that communication between x86 machines will
almost always take place in little-endian.
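A sketch of the negotiation rule described above, using a hypothetical `struct peer_hello` that carries each peer's native byte order, a server flag, and a load metric. Both peers evaluate `negotiate_order` on the same pair of hellos, so they agree on the wire byte order, and only the peer whose native order differs pays for conversion:

```c
#include <stdbool.h>

enum byte_order { ORDER_LITTLE, ORDER_BIG };

/* Hypothetical hello message each peer sends during negotiation. */
struct peer_hello {
    enum byte_order native;  /* the peer's own byte order */
    bool is_server;          /* a server's order always takes preference */
    unsigned load;           /* hypothetical load metric, higher = busier */
};

/* Both peers call this with the same argument order (a = initiator,
 * b = responder), so they reach the same answer. */
static enum byte_order negotiate_order(const struct peer_hello *a,
                                       const struct peer_hello *b)
{
    if (a->is_server != b->is_server)
        return a->is_server ? a->native : b->native;
    if (a->load != b->load)
        return a->load > b->load ? a->native : b->native;
    return a->native;  /* arbitrary but deterministic: initiator wins */
}

/* A peer converts only if the negotiated wire order is not its own. */
static bool must_convert(const struct peer_hello *self, enum byte_order wire)
{
    return self->native != wire;
}
```

Between two x86 peers both hellos say little-endian, so `must_convert` is false on both sides and no conversion happens at all.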

As for making the binary human-readable, that's why we should build
tools to format the data. =)


- Jason





More information about the Kernel mailing list