Initial filesystem design synopsis.

Rupert Pigott darkb00ng at hotmail.com
Mon Feb 26 18:05:00 PST 2007


On Mon, 26 Feb 2007 14:37:15 -0800, Matthew Dillon wrote:

Thanks for replying. :)

>     The run-time variable is simply identifying what endianness the
>     filesystem is using, allowing the filesystem image (or pieces of it)
>     to be transported between boxes with different endianness without them
>     blowing up.   Machine architectures change all the time, but storage
>     is forever (or at least for longer)... It needs to 'just work'.

Sure.

>     Most clusters will be made up of boxes using the same endianness, and
>     the filesystem will be formatted for that self same endianness.  They
>     will look at the variable and say 'ho hum, that's already the format
>     I use natively so I don't have to do any translation at all'.

That is what I thought to be the case.

>     It is particularly important in any sort of clustering system or
>     protocol (or protocol used by a clustering system) to allow boxes with
>     different endianness to talk to each other, even if it means taking a
>     small performance hit in that communication.  Anyone remember the

Hell yeah. That is part of the reason why I think that making endianness an
option for the on-disk format is a bit crazy. It *adds* complexity to
the protocol and therefore increases the chances of mishap. The code has
to handle twice as many valid encodings of the same data as a fixed-endian
protocol.

>     'talk' bug from 20+ years ago? To this day I do not think Sun ever
>     fixed the byte ordering for 'talk' that made their talk incompatible
>     with everyone else's.  We are not going to repeat that mistake.

That would not be the first or last time Sun screwed up!

Someone could screw up the conversion process for a bi-endian FS in just
the same way. Anyone can write a broken implementation of a protocol; that
doesn't necessarily make the protocol bad. In my view, reducing the number
of permutations of the wire format would reduce the scope for screwing up.

: The on-disk format has a defined endianness.

>     The alternative is to set a specific endianess for the filesystem in
>     stone, just like the 'network byte order' concept locked in a
>     particular byte ordering for packets.  People today STILL hate the
>     fact that they have to translate certain protocols to network byte
>     order even when the machines on both ends use the same byte order,
>     but one that happens to not be 'network' byte order.

That doesn't seem like a good enough reason to add more conditional
branches to the critical paths, make the wire protocol more complicated
and make it more difficult for the Mk.I eyeball to parse the dumps. It
also pushes an additional conditional into the code at runtime. A
fixed-endian filesystem makes that conditional "free", which reduces the
performance hit and makes the compiler's job trivial.

>     We're not making that same mistake, hence the filesystem will be
>     specced for endian neutrality even if the code isn't written to
>     support it right off the bat.

... And in years to come, when people have rediscovered PDP double-word
ordering, they will be cursing the day that their filesystems were created
in Big Endian and Little Endian! :P

As a footnote: many architectures have some facility for bi-endian
support in hardware these days, so we're in for some interesting times
even without a lunatic reviving PDP double-word ordering! :)

Regards,
Rupert





More information about the Kernel mailing list