<html><head></head><body lang="en-US" style="background-color: rgb(255, 255, 255); line-height: initial;"> <div style="width: 100%; font-size: initial; font-family: Calibri, 'Slate Pro', sans-serif, sans-serif; color: rgb(31, 73, 125); text-align: initial; background-color: rgb(255, 255, 255);">Hi Mehmet,</div><div style="width: 100%; font-size: initial; font-family: Calibri, 'Slate Pro', sans-serif, sans-serif; color: rgb(31, 73, 125); text-align: initial; background-color: rgb(255, 255, 255);"><br></div><div style="width: 100%; font-size: initial; font-family: Calibri, 'Slate Pro', sans-serif, sans-serif; color: rgb(31, 73, 125); text-align: initial; background-color: rgb(255, 255, 255);">I think an OS having a seekable file interface pretty much means that it is suited best for seekable data, which is going to have some length to it. The interface is designed for that kind of consumption, so I would hope the implementation is optimized for it. Otherwise the interface overhead is wasted. </div><div style="width: 100%; font-size: initial; font-family: Calibri, 'Slate Pro', sans-serif, sans-serif; color: rgb(31, 73, 125); text-align: initial; background-color: rgb(255, 255, 255);"><br></div><div style="width: 100%; font-size: initial; font-family: Calibri, 'Slate Pro', sans-serif, sans-serif; color: rgb(31, 73, 125); text-align: initial; background-color: rgb(255, 255, 255);">Some databases are very good for relational data. Some are good at being a key-value store. Some are good at storing XML. 
A filesystem is good at storing seekable data of significant length, and that is what I want my filesystem optimized for.</div><div style="width: 100%; font-size: initial; font-family: Calibri, 'Slate Pro', sans-serif, sans-serif; color: rgb(31, 73, 125); text-align: initial; background-color: rgb(255, 255, 255);"><br></div><div style="width: 100%; font-size: initial; font-family: Calibri, 'Slate Pro', sans-serif, sans-serif; color: rgb(31, 73, 125); text-align: initial; background-color: rgb(255, 255, 255);">Sure, a filesystem should be more general-purpose than a specialized database, but expecting it to be the better solution for a purely small-file problem is, I think, asking too much. At some point, using the seekability has to be more appropriate.</div><div style="width: 100%; font-size: initial; font-family: Calibri, 'Slate Pro', sans-serif, sans-serif; color: rgb(31, 73, 125); text-align: initial; background-color: rgb(255, 255, 255);"><br></div><div style="width: 100%; font-size: initial; font-family: Calibri, 'Slate Pro', sans-serif, sans-serif; color: rgb(31, 73, 125); text-align: initial; background-color: rgb(255, 255, 255);">I could see wanting to use a filesystem for its other features, like the folder hierarchy, and still wanting small files, as with the MH email format. However, even in that extreme case, there will often be large files, like PDF and ZIP files. Of course, HAMMER still suits that case just fine. Maybe XFS would be faster. 
</div><div style="width: 100%; font-size: initial; font-family: Calibri, 'Slate Pro', sans-serif, sans-serif; color: rgb(31, 73, 125); text-align: initial; background-color: rgb(255, 255, 255);"><span style="font-size: initial; line-height: initial; text-align: initial;"><br></span></div><div style="width: 100%; font-size: initial; font-family: Calibri, 'Slate Pro', sans-serif, sans-serif; color: rgb(31, 73, 125); text-align: initial; background-color: rgb(255, 255, 255);"><span style="font-size: initial; line-height: initial; text-align: initial;">Think of building a fuse filesystem on top of postgres. It would be cool for compatibility, but it wouldn't be an appropriate thing to do besides that. For example, you could search an MH folder and store the results in a postgres database, and do it with grep and some shell generating results in the fuse mount. But that is an extreme case. </span></div> <div style="width: 100%; font-size: initial; font-family: Calibri, 'Slate Pro', sans-serif, sans-serif; color: rgb(31, 73, 125); text-align: initial; background-color: rgb(255, 255, 255);"><br></div> <div style="font-size: initial; font-family: Calibri, 'Slate Pro', sans-serif, sans-serif; color: rgb(31, 73, 125); text-align: initial; background-color: rgb(255, 255, 255);">Ben</div> <table width="100%" style="background-color:white;border-spacing:0px;"> <tbody><tr><td colspan="2" style="font-size: initial; text-align: initial; background-color: rgb(255, 255, 255);"> <div style="border-style: solid none none; border-top-color: rgb(181, 196, 223); border-top-width: 1pt; padding: 3pt 0in 0in; font-family: Tahoma, 'BB Alpha Sans', 'Slate Pro'; font-size: 10pt;"> <div><b>From: </b>Mehmet Erol Sanliturk</div><div><b>Sent: </b>Wednesday, July 15, 2015 6:43 PM</div><div><b>To: </b>Sepherosa Ziehau</div><div><b>Cc: </b>Matthew Dillon; users</div><div><b>Subject: </b>Re: Storing hundreds of millions of files in HAMMER (1 or 2)</div></div></td></tr></tbody></table><div 
style="border-style: solid none none; border-top-color: rgb(186, 188, 209); border-top-width: 1pt; font-size: initial; text-align: initial; background-color: rgb(255, 255, 255);"></div><br><div id="_originalContent" style=""><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jul 15, 2015 at 6:30 PM, Sepherosa Ziehau <span dir="ltr">&lt;<a href="mailto:sepherosa@gmail.com" target="_blank">sepherosa@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Maybe just use a large file, sub-index the chunks of that file,<br>
and add open/read/write/lseek/close-like APIs for users. You will<br>
have more control than with a database.<br>
<br>
On Wed, Jul 15, 2015 at 11:58 PM, Michael Neumann &lt;<a href="mailto:mneumann@ntecs.de">mneumann@ntecs.de</a>&gt; wrote:<br>
> Hi,<br>
><br>
> Let's say I want to store 100 million small files (each one about 1k in size)<br>
> in a HAMMER file system.<br>
> Files are only written once, then kept unmodified and accessed randomly<br>
> (older files will be accessed less often).<br>
> It is basically a simple file-based key/value store, but accessible by<br>
> multiple processes.<br>
><br>
> a) What is the overhead in size for HAMMER1? For HAMMER2 I expect each file<br>
> to take exactly 1k when the file<br>
> is below 512 bytes.<br>
><br>
> b) Can I store all files in one huge directory? Or is it better to fan out<br>
> the files into several sub-directories?<br>
><br>
> c) What other issues should I expect to run into? For sure I should enable<br>
> swapcache :)<br>
><br>
> I probably should use a "real" database like LMDB, but I like the<br>
> versatility of files.<br>
><br>
> Regards,<br>
><br>
> Michael<br>
<span class=""><font color="#888888"><br>
<br>
<br>
--<br>
Tomorrow Will Never Die<br>
</font></span></blockquote></div><br><br></div><div class="gmail_extra">In reality, an operating system is one of the "best" database management systems.<br><br>The question by Michael Neumann is very important with respect to this fact.<br><br></div><div class="gmail_extra">I was thinking of using DragonFly BSD for such a task, but it seems that it is not well suited to it, because the assumption that files should be large is not really appropriate for an operating system. XFS on Fedora Server then seems to be a good alternative.<br><br><br></div><div class="gmail_extra">Thank you very much.<br><br><br></div><div class="gmail_extra">Mehmet Erol Sanliturk<br><br><br></div><div class="gmail_extra"><br></div><div class="gmail_extra"><br><br></div></div>
<br><!--end of _originalContent --></div><br><br></body></html>