Storing hundreds of millions of files in HAMMER (1 or 2)
mneumann at ntecs.de
Thu Jul 16 07:54:37 PDT 2015
Am 15.07.2015 um 18:53 schrieb Matthew Dillon:
> You should use a database, frankly. A HAMMER1 inode is 128 bytes and
> for small files I think the data will run on 16-byte boundaries. Not
> sure of that. Be sure to mount with 'noatime', and also use the
> double buffer option because the kernel generally can't cache that
> many tiny files itself.
> The main issue with using millions of tiny files is that each one
> imposes a great deal of *ram* overhead for caching, since each one
> needs an in-memory vnode, in-memory inode, and all related file
> tracking infrastructure.
> Secondarily, hammer's I/O optimizations are designed for large files,
> not small files, so the I/O is going to be a lot more random.
Thanks for the insight. I'll use a database :). It's a pity that none of
the databases out there support HAMMER-like queue-less streaming out of
the box. I wish you had finished the Backplane database :).
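For reference, the noatime mount and double-buffer option mentioned above might be configured roughly like this (a sketch only; the device name is a placeholder, and the exact sysctl name should be checked against mount_hammer(8) and the vfs.hammer sysctls on your system):

```
# /etc/fstab: mount the HAMMER volume with noatime
# (device path below is hypothetical)
/dev/serno/XXXXXXXX.s1d   /data   hammer   rw,noatime   1 1

# /etc/sysctl.conf: double buffering, so data is cached at the
# block-device layer rather than per-vnode for millions of files
vfs.hammer.double_buffer=1
```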
> On Wed, Jul 15, 2015 at 8:58 AM, Michael Neumann <mneumann at ntecs.de
> <mailto:mneumann at ntecs.de>> wrote:
> Let's say I want to store 100 million small files (each one about
> 1k in size) in a HAMMER file system.
> Files are only written once, then kept unmodified and accessed
> randomly (older files will be accessed less often).
> It is basically a simple file based key/value store, but
> accessible by multiple processes.
> a) What is the overhead in size for HAMMER1? For HAMMER2 I expect
> each file to take exactly 1k when the file
> is below 512 bytes.
> b) Can I store all files in one huge directory? Or is it better to
> fan out the files into several sub-directories?
> c) What other issues should I expect to run into? For sure I
> should enable swapcache :)
> I probably should use a "real" database like LMDB, but I like the
> versatility of files.
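For question (b), a common answer is to fan the files out over a fixed two-level directory tree so no single directory holds millions of entries. A minimal sketch of that idea (my own illustration, not from the thread; the SHA-256 prefix fan-out and function names are assumptions): with two 256-way levels, 100 million files average about 1,500 entries per leaf directory.

```python
import hashlib
import os

def key_path(root, key):
    """Map a key to a fanned-out path: root/ab/cd/<digest>.

    Two 256-way levels (hex prefix) keep each leaf directory to
    roughly 1,500 entries at 100 million files.
    """
    digest = hashlib.sha256(key.encode()).hexdigest()
    return os.path.join(root, digest[:2], digest[2:4], digest)

def put(root, key, value):
    """Write-once store: write a temp file, then rename into place
    so readers in other processes never see a partial file.
    (Simplified: concurrent writers of the same key would race.)"""
    path = key_path(root, key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(value)
    os.rename(tmp, path)

def get(root, key):
    """Read the value stored for key; raises FileNotFoundError if absent."""
    with open(key_path(root, key), "rb") as f:
        return f.read()
```

Since the files are written once and then only read, the rename gives each key an all-or-nothing visibility to other processes without any locking.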