Globbing (was Re: HAMMER update 10-Feb-2008)

Aggelos Economopoulos aoiko at cc.ece.ntua.gr
Mon Feb 11 22:01:52 PST 2008


On Tuesday 12 February 2008, Justin C. Sherrill wrote:
> On Mon, February 11, 2008 7:25 pm, Simon 'corecode' Schubert wrote:
> 
> >> We have a solution (file limit) that doesn't address the problem (rm
> >> just doesn't work sometimes).
> >>
> >> Wasn't one of the slogans for DragonFly "Dragging BSD Unix, kicking and
> >> screaming, into the 21st century"?
> >
> > Sorry, I don't follow you there.  Could you elaborate?
> 
> We have two problems:
> - excess memory usage by handling too many files at once.
> - rm, ls, and other common programs will fail when handling many files.
> 
> xargs is a workaround for tiny memory spaces we don't have any more.

Not really. When you only operate once on each argument and then forget
about it, as in rm <n files>, there is no reason to first generate and
store all the arguments in memory and *then* start operating on them.
This requires that you have enough memory to store n * max_arg_size
bytes. Notice the 'n'. This means your space requirements grow linearly
with the number of arguments (I'm focusing on filenames here, so
max_arg_size is bounded by PATH_MAX). Removing the limit for the exec
system call does not change this fact.
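To make that concrete, this is roughly what the failure looks like in practice (an illustrative session; the directory and file count are made up, the exact error text varies by shell, and the point at which it fails depends on the system's argument-size limit):

  $ ls /tmp/manyfiles | wc -l
   2000000
  $ rm /tmp/manyfiles/*
  rm: Argument list too long

The shell has to expand '*' into every single filename and hand the whole list to one execve(), so it needs memory proportional to n before rm even starts, and it can trip over the kernel's argument-size limit on the way.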

Using xargs allows you to only treat k arguments at a time, so the
required memory space is bounded by k * max_arg_size + (space for
the operation, assumed to be constant here). Notice that k and
max_arg_size are also constant and n is irrelevant. This means that
your memory usage is always smaller than some constant. Therefore, if
you have enough memory to at least handle one argument at a time, you
can handle an arbitrarily large number of arguments (2^100? no problem,
at least not from the memory consumption of this pipeline ;) Larger
numbers of k give you better performance. Reading the manual page
for xargs should give you an idea how to affect k.
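For example, a pipeline along these lines keeps the batch size bounded (a sketch; the directory name is hypothetical, -print0 and -0 are there to cope with awkward filenames, and -n is the standard xargs knob for the number of arguments per invocation, see xargs(1)):

  $ find /tmp/manyfiles -type f -print0 | xargs -0 -n 1000 rm

Here xargs reads the filenames off the pipe and invokes rm with at most 1000 of them per run, so peak memory use (roughly k * max_arg_size) does not depend on the total number of files; -s can be used instead to bound the argument list by size in bytes.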

So, using xargs is fundamentally better as far as memory consumption is
concerned. Performance should never be an issue except maybe in
pathological cases. This silly analysis does not even take into account
the effects from all the memory allocation you'll have to do to for a
large number of arguments.

> I'm 
> expressing frustration that we don't have a solution that fixes both
> problems.  Or rather, that we have a solution that causes a problem.


Without meaning to offend anyone, this is a user education problem. Indeed,
it comes up regularly in unix users mailing lists around the world, along
with "But I can delete a root owned file that I don't have write permission
to". IMO, this is a case of using the wrong tool for the job. You can use a
brick as a hammer (and indeed if it's a small nail and the hammer's in the
other room, it makes sense to use it) but that doesn't mean you should keep
doing that and of course, you shouldn't be surprised when it breaks. Sorry,
but this is the best analogy I could come up with in a hurry :) I hope it
helps make the point clear.

HTH,
Aggelos





More information about the Kernel mailing list