Crawling the file system
kiwipenguin
10-29-2002, 11:08 PM
Does anyone have any Python code that recursively crawls the directory structure right down to the lowest directory? I have a Perl script that does it, but it's ugly and chokes on directory names like "a.bob", which is not much use when all my students' home directories are named that way.
Some bright spark on my school board decided to have "random inspections" of students' files. I figured I could save a lot of time if Python went out, scanned all the files and left me a report of the files that need human review.
Damn teachers! :eh:
inkedmn
10-30-2002, 12:33 AM
check out os.path.walk()
here's something i whipped up a while back to search my mother-in-law's hdd for all files ending with the extension passed as an argument:
import os.path, sys, time
filelist = []
if len(sys.argv) == 2:
    ending = sys.argv[1]
else:
    print "Usage: python %s <file extension>" % sys.argv[0]
    sys.exit(1)
def append(arg, dirname, fnames):
    # os.path.walk calls this once per directory with the names it contains
    for filename in fnames:
        if filename.endswith(ending):
            filelist.append(os.path.join(dirname, filename))
start = time.time()
os.path.walk('c:\\', append, None)
end = time.time()
print "total files:", len(filelist), "ending in", ending
print "total time:", (end - start), "seconds"
that's a pretty basic example, but it should get you started...
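For readers on a later Python: `os.walk()`, added in Python 2.3, yields a `(dirpath, dirnames, filenames)` tuple for each directory instead of calling a callback, and `os.path.walk()` no longer exists in Python 3. A minimal sketch of the same extension search with `os.walk()`, assuming Python 3 (`find_by_extension` is a hypothetical name chosen for illustration):

```python
import os


def find_by_extension(root, ending):
    """Return full paths of files under root whose names end with `ending`."""
    matches = []
    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory,
    # descending to the lowest level; dotted names like "a.bob" are no problem
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(ending):
                # os.path.join supplies the separator between directory and file
                matches.append(os.path.join(dirpath, filename))
    return matches
```

Calling `find_by_extension('c:\\', '.txt')` returns the matches as a list, so the count and timing lines from the script above carry over unchanged.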
kiwipenguin
10-30-2002, 05:06 AM
Why is it everything I used to do in Perl is a million times easier in Python? 8)
One problem tho.
D:\programs\Python>python crawl.py
File "crawl.py", line 16
os.path.walk('c:\', append, None)
^
SyntaxError: invalid token
What's up with that? I should probably mention I'm running this under Windows with Python 2.2.
Strike
10-30-2002, 08:45 AM
Looks like an indentation error to me. Could you post the entire file?
GnuVince
10-30-2002, 10:41 AM
I know nobody cares, but since I wrote a similar script last week to search for .exe files in the students' directories at the school where I work, here's the Ruby version:
require "find"
Find.find("\\\\serveurbsf\\eleves") { |file|
  puts file if file =~ /exe$/i
}
A more generic version would be:
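require "find"
if ARGV.length != 2 then
  puts "Usage: #$0 <path> <file extension>"
  exit 1
end
path = ARGV[0]
ext = ARGV[1]
Find.find(path) { |file|
  # Regexp.escape keeps a "." in the extension literal; \z anchors at the end of the string
  puts file if file =~ /#{Regexp.escape(ext)}\z/i
}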
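For comparison, Ruby's `Find.find` hands every path under the root, directories included, to the block. A rough Python 3 analogue is a generator over `os.walk()`; `find_paths` and `matching` are hypothetical names, and the `re.escape()` call is an addition so that the dot in an extension like `.exe` only matches a literal dot:

```python
import os
import re


def find_paths(root):
    """Yield root and every directory and file path beneath it, like Find.find."""
    yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)


def matching(root, ext):
    """Yield paths ending in `ext`, ignoring case (like /#{ext}$/i in Ruby)."""
    # re.escape stops "." from matching any character; \Z anchors at the very end
    pattern = re.compile(re.escape(ext) + r"\Z", re.IGNORECASE)
    for path in find_paths(root):
        if pattern.search(path):
            yield path
```

Because these are generators, paths are reported as they are found rather than after the whole tree has been scanned, which is closer to how the Ruby block behaves.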
Thanks for letting me be a little bitch.
/me goes back to play with his imaginary girlfriend
inkedmn
10-30-2002, 03:17 PM
make sure you're importing os.path, not just os...
recluse
10-30-2002, 03:32 PM
inkedmn, I must say I'm impressed with how far you've come with your knowledge of Python. It seems like just yesterday you were asking the questions. Now you're answering them with great competence. *tear forms*
GnuVince: Do those imaginary gfs waste much time? Can I get one that not only loves me dearly and gives me shoulder rubs, but also motivates me?
Ok I'll go back to my cave now. Have fun duders.
inkedmn
10-30-2002, 03:42 PM
thanks man, that's very nice of you to say :)
inkedmn
10-30-2002, 03:55 PM
ok, fixed that code:
import os.path, time
import sys
filelist = []
if len(sys.argv) == 2:
    ending = sys.argv[1]
else:
    print "Usage: python %s <file extension>" % sys.argv[0]
    sys.exit(1)
def append(arg, dirname, fnames):
    # os.path.walk calls this once per directory with the names it contains
    for filename in fnames:
        if filename.endswith(ending):
            filelist.append(os.path.join(dirname, filename))
start = time.time()
os.path.walk('c:\\', append, None)
end = time.time()
print "total files:", len(filelist), "ending in", ending
print "total time:", (end - start), "seconds"
that works :)
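A note on building the paths: joining `dirname+filename` directly drops the separator, because `os.path.walk()` passes subdirectory names such as `c:\windows` without a trailing backslash. `os.path.join()` adds one only where it is missing. A quick check, using `ntpath` (the Windows implementation behind `os.path`, importable on any OS) so it behaves the same everywhere:

```python
import ntpath

dirname = "c:\\windows"      # a subdirectory as the walk callback receives it
filename = "notepad.exe"

# Plain concatenation runs the two names together
print(dirname + filename)               # c:\windowsnotepad.exe

# ntpath.join inserts a backslash only where one is missing
print(ntpath.join(dirname, filename))   # c:\windows\notepad.exe
print(ntpath.join("c:\\", filename))    # c:\notepad.exe
```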
[edit]
ok, for some reason, vbulletin is removing the second backslash in the "os.path.walk(...)" line. it should look like this:
c:\ \ (remove the space between the backslashes)
Strike
10-30-2002, 05:04 PM
Originally posted by inkedmn
[edit]
ok, for some reason, vbulletin is removing the second backslash in the "os.path.walk(...)" line. it should look like this:
c:\ \ (remove the space between the backslashes)
Hint: use raw strings where slashes may be problematic. :)
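Why `'c:\'` blows up: the backslash escapes the closing quote, so Python never sees the end of the string. Raw strings help with backslashes in the middle of a path, but a raw string still cannot end in a single backslash, so for a bare drive root `'c:\\'` (or `'c:/'`, which Windows also accepts) is the safer spelling. A quick check, assuming Python 3; `is_valid_python` is a hypothetical helper that only wraps `compile()` so the SyntaxError can be caught:

```python
def is_valid_python(source):
    """Return True if `source` compiles, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

# A raw string keeps backslashes literally in the middle of a path
print(r"c:\windows\system" == "c:\\windows\\system")   # True

# ...but it cannot end with a lone backslash: that backslash still escapes
# the closing quote, the same failure as 'c:\' in the traceback above
print(is_valid_python("root = r'c:\\'"))                # False

# Doubling the backslash spells the drive root correctly
print(is_valid_python("root = 'c:\\\\'"))               # True
```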