Globbing (was Re: HAMMER update 10-Feb-2008)
Aggelos Economopoulos
aoiko at cc.ece.ntua.gr
Tue Feb 12 06:34:40 PST 2008
On Tuesday 12 February 2008, Chris Turner wrote:
> Aggelos Economopoulos wrote:
> >
> > So, using xargs is fundamentally better as far as memory consumption is
> > concerned. Performance should never be an issue except maybe in
> > pathological cases. This silly analysis does not even take into account
> > the effects from all the memory allocation you'll have to do for a
> > large number of arguments.
> >
>
> just being pedantic, but ignoring performance while discussing memory
> seems a bit short-sighted - e.g. for 1M files an 'xargs -L 1' will create
> 1M fork()s instead of 1, which will change the execution time
> significantly..
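(For concreteness, that is roughly the difference between something like
the following, with generate-paths standing in for whatever actually
produces the list:

  generate-paths | xargs -L 1 rm --   # roughly one fork()/exec() of rm per path
  generate-paths | xargs rm --        # rm exec'd only as often as ARG_MAX requires

so yes, the one-exec-per-file variant is the pathological case.)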
My point was (in response to Justin's comment about the tiny memory spaces
of the past) that you can fill memory of any size. Not to mention that I'm
using all 1G of RAM on my desktop plus a few hundred megs of swap, so if
I run some script that needlessly forces everything out to swap, a) it will
take a large performance hit and b) I'm going to be a bit annoyed at the author.
As for performance: yes, I'm not new to mailing lists; I knew when I sent
an email just to clarify the memory issue *without* covering all aspects
that someone with more time than I have was going to pick this up :) I'm
sure you realize you're being criminally inefficient if you use rm <1M files>
(what a convenient number btw ;), assuming there was no limit, for at least
a couple of reasons (sorry, I'm not going to bother coming up with more and,
yes, you can come up with counterexamples. Please only mention any *real*
issues.)
1. you can do the same thing without wasting many megabytes of memory by
piping (for example) to ruby -ne 'File.unlink($_.chop)' (this is easy to
adapt to \0-terminated filenames; see the sketch after this list). Yes, I
know, not everyone knows how to program; perhaps someone should patch rm
to read pathnames from its standard input?
2. where did you get those 1M pathnames anyway? Obviously from a program;
piping its output to xargs rm lets you run the deletion and
the path generation *in parallel* (your program is probably waiting on
the disk more than half the time anyway), so even then, this should run
faster than rm <1M files> (again, see the sketch below). Please test if
unsure. Also do the obvious optimization of using rm -r when removing a
whole directory. I think this would help the original poster as well.
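As a sketch of both points (the directory below is made up), something like

  find /some/build/tree -type f -print0 | ruby -0 -ne 'File.unlink($_.chomp("\0"))'

or, sticking with plain rm,

  find /some/build/tree -type f -print0 | xargs -0 rm --

handles \0-terminated names and lets the path generation and the unlinking
overlap, instead of first materializing a million arguments for one exec.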
The best (and typical) case would be if you're using find (just add
a -delete).
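(For instance, assuming the list really does come from a find over some
made-up tree:

  find /some/build/tree -name '*.o' -delete

no extra processes at all.)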
> but whatever.. this thread is annoying me :)
I hope you'll agree that the issue here is not the time/space complexity
(or the size of the constant factors). That was just to get everyone on
the same page. The point is to make it clear that command line
arguments are for interactive input (hence rm takes its pathnames there) and
batch input should only be fed through a file or a pipe (nobody complains that
wc `cat file` doesn't work).
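(To illustrate, with filelist standing in for whatever file holds the batch
of pathnames: wc -l `cat filelist` falls over with "Argument list too long"
once the list is big enough, while

  xargs wc -l < filelist

keeps working, modulo wc printing one total per batch.)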
Aggelos