Storing hundreds of millions of files in HAMMER (1 or 2)

Matthew Dillon dillon at backplane.com
Wed Jul 15 09:53:00 PDT 2015


You should use a database, frankly.  A HAMMER1 inode is 128 bytes, and for
small files I think the data will run on 16-byte boundaries, though I'm not
sure of that.  Be sure to mount with 'noatime', and also use the double
buffer option, because the kernel generally can't cache that many tiny files
itself.
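For illustration, a rough sketch of that tuning (the device path and mount
point are placeholders, and the double-buffer knob I'm assuming here is the
vfs.hammer.double_buffer sysctl):

    # /etc/fstab: mount the HAMMER volume without atime updates
    /dev/serno/XXXXXXXX.s1d    /data    hammer    rw,noatime    2    2

    # /etc/sysctl.conf: double-buffer HAMMER data through the buffer cache
    vfs.hammer.double_buffer=1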

The main issue with millions of tiny files is that each one imposes a great
deal of *RAM* overhead for caching: every file needs an in-memory vnode, an
in-memory inode, and all the related file-tracking infrastructure.
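A rough back-of-envelope makes the point.  The ~1 KiB per cached file below
is an assumption on my part (vnode plus in-memory inode plus tracking
structures), and kern.maxvnodes is the sysctl that bounds how many of these
the kernel will keep around:

    # how many vnodes (and thus cached files) the kernel will hold at once
    sysctl kern.maxvnodes

    # assumption: ~1 KiB of kernel memory per cached file; caching all 100M
    # files would then need on the order of 100,000,000 * 1024 bytes ~= 95 GiB
    echo $((100000000 * 1024 / 1024 / 1024 / 1024)) GiB

In practice only a hot subset of the files can stay cached at any one time.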

Second, HAMMER's I/O optimizations are designed for large files, not small
ones, so the I/O on this kind of workload is going to be a lot more random.
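On question (b) in the quoted mail, the usual pattern for this many files is
to fan them out under a hashed prefix rather than keep one huge directory.
A minimal sh sketch, where $base, $key, and payload.bin are placeholders and
a sha256(1) utility is assumed (sha256sum on other systems):

    # bucket each file by the first two bytes of its key's hash:
    # 256 * 256 = 65536 directories, so 100M files is roughly 1500 per directory
    base=/data/store
    key=example-key
    h=$(printf '%s' "$key" | sha256 -q)     # assumes sha256(1) with -q; sha256sum also works
    dir="$base/$(echo "$h" | cut -c1-2)/$(echo "$h" | cut -c3-4)"
    mkdir -p "$dir"
    cp payload.bin "$dir/$key"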

-Matt

On Wed, Jul 15, 2015 at 8:58 AM, Michael Neumann <mneumann at ntecs.de> wrote:

> Hi,
>
> Let's say I want to store 100 million small files (each one about 1k in
> size) in a HAMMER file system.
> Files are only written once, then kept unmodified and accessed randomly
> (older files will be accessed less often).
> It is basically a simple file-based key/value store, but accessible by
> multiple processes.
>
> a) What is the size overhead for HAMMER1? For HAMMER2 I expect each file
> to take exactly 1k when the file is below 512 bytes.
>
> b) Can I store all files in one huge directory? Or is it better to fan out
> the files into several sub-directories?
>
> c) What other issues should I expect to run into? For sure I should enable
> swapcache :)
>
> I probably should use a "real" database like LMDB, but I like the
> versatility of files.
>
> Regards,
>
>   Michael
>