Unix.lseek and lines
GnuVince
07-22-2002, 12:28 AM
I wish to make a program that will go to the 7th line of a text file, take what's on that line, and print it. Right now, I have this:
# #load "unix.cma";;
# open Unix;;
# let a = openfile "words.txt";;
val a : Unix.open_flag list -> Unix.file_perm -> Unix.file_descr = <fun>
# let a = openfile "words.txt" [O_RDONLY] 0o400;;
val a : Unix.file_descr = <abstr>
# lseek a 7 SEEK_SET;;
- : int = 7
# let b = in_channel_of_descr a;;
val b : in_channel = <abstr>
# let c = input_line b ;;
val c : string = "aal"
But lseek counts characters, not lines, from what I can see, so how would I go to the 7th line? Also, if there's something "unclean" in my procedure, tell me please.
phubuh
07-22-2002, 03:29 AM
Iterate through it and look for newlines. :D
I am the master of kludges.
jemfinch
07-22-2002, 04:09 AM
I think you'd be better off writing a function that returns a list of the lines in the file, converting that list to an Array, and referencing cell 6.
Jeremy
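A minimal sketch of the list-then-array idea described above, not code from the thread. The names read_all_lines and nth_line are hypothetical, and the sketch assumes a reasonably recent OCaml (exception cases in match need 4.02 or later, Fun.protect needs 4.08 or later).

```ocaml
(* Read every remaining line of a channel into a list, in file order. *)
let read_all_lines ic =
  let rec loop acc =
    match input_line ic with
    | line -> loop (line :: acc)
    | exception End_of_file -> List.rev acc
  in
  loop []

(* Return the n-th line (1-based) of the file at [path], if it exists.
   The whole file is loaded into an array first, as suggested above. *)
let nth_line path n =
  let ic = open_in path in
  let lines =
    Fun.protect
      ~finally:(fun () -> close_in ic)
      (fun () -> Array.of_list (read_all_lines ic))
  in
  if n >= 1 && n <= Array.length lines then Some lines.(n - 1) else None
```

With this, nth_line "words.txt" 7 would give the 7th line wrapped in Some, at the cost of holding the entire file in memory.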
PrBacterio
07-22-2002, 05:52 PM
Originally posted by jemfinch
I think you'd be better off writing a function that returns a list of the lines in the file, converting that list to an Array, and referencing cell 6.
Jeremy
Actually I think phubuh's suggestion to skip through the first 6 newline characters in the file and then read the following line would be better, as that would be more efficient, and neither of the two suggestions is any cleaner or more elegant for this specific purpose.
I would write this as a general function "seek_to_line" or somesuch, i.e.
# let seek_to_line ch l =
    seek_in ch 0;
    for i = 1 to l - 1 do
      try while input_char ch <> '\n' do () done
      with End_of_file -> ()
    done;;
val seek_to_line : in_channel -> int -> unit = <fun>
# let f=open_in"e:/compiler/parser.mly";;
val f : in_channel = <abstr>
# seek_to_line f 7;;
- : unit = ()
# input_line f;;
- : string = "%token TKbegin TKend TKcomma TKsemicolon TKcolon"
GnuVince
07-22-2002, 06:28 PM
Hey PrBacterio! Long time no see! Glad to see you're still lurking around!
jemfinch
07-22-2002, 10:00 PM
Originally posted by PrBacterio
Actually I think phubuh's suggestion to skip through the first 6 newline characters in the file and then read the following line would be better, as that would be more efficient, and neither of the two suggestions is any cleaner or more elegant for this specific purpose.
But phubuh's suggestion isn't generally useful; mine is.
Anytime you're searching through a file for a specific line, you're almost always sure to be going for more than one line. In general, in fact, you'll want all the lines, which is why a readlines function is probably the most useful.
If you're worried about memory efficiency (i.e., you'll be working with big files with which it isn't practical to keep the entire contents in memory) then I think there's a much better (more general, cleaner, more functional) solution than the one you posted.
Here are the two function signatures, I don't feel like writing the actual functions (which would surely have syntactical errors and whatnot since I've been coding mostly in Python lately.)
readlines : Unix.file_descr -> string list
forlines : Unix.file_descr -> (string -> 'a option) -> 'a
"readlines" takes a file descriptor and returns a list of the lines, obviously.
"forlines" takes a function which is called with each line as the line is processed (dynamic programming could be used to cache lines so subsequent calls to the function don't have to re-read the file). The function returns an 'a option -- if it's Some v, then v is returned; if it's None, then the function is called again with the next line.
It could be used to solve GnuVince's problem like this:
let get_line i =
  let v = ref i in
  fun s -> if !v = 0 then Some s else (v := pred !v; None);;

forlines fd (get_line 6);;
I hope the code there is correct, but the idea should be easily understandable.
In my opinion, it's a much more functional and cleaner solution to the problem than seek_to_line, and it's certainly (objectively) more general.
As a small point of note, I don't think End_of_file should be caught in your seek_to_line function -- it's an error condition and as such should be propagated to the rest of the program.
Jeremy
(EDIT: buggy code)
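For illustration, a variant of seek_to_line that lets End_of_file escape, as argued above, might look like the sketch below. The name seek_to_line_exn is hypothetical and is not something posted in the thread; the only change from the earlier function is that the exception is not caught.

```ocaml
(* Position [ch] at the start of line [l] (1-based).
   Raises End_of_file if the file ends before [l - 1] newlines are found,
   so the caller can decide how to handle a file that is too short. *)
let seek_to_line_exn ch l =
  seek_in ch 0;
  for _ = 1 to l - 1 do
    while input_char ch <> '\n' do () done
  done
```

A caller that wants the old silent behaviour can still wrap the call in try ... with End_of_file -> (), but the choice is then made at the call site.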
GnuVince
07-22-2002, 10:22 PM
I think PrBacterio's function is much more useful to me. I have a file containing half a million words and I want to randomly select only 10. I don't see why I should use a readlines function, it would be a waste of memory (didn't you say that a list element takes 3 words of memory?)
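One way to pick a handful of random lines without loading the whole file, and without seeking once per pick, is reservoir sampling. This is a swapped-in technique that nobody in the thread proposed; the sketch below is only an assumption about what would fit the stated goal, and sample_lines is a hypothetical name.

```ocaml
(* Pick up to [k] lines uniformly at random from [ic] in a single pass
   (reservoir sampling). Only [k] lines are kept in memory at a time. *)
let sample_lines ic k =
  let reservoir = Array.make k "" in
  let seen = ref 0 in
  (try
     while true do
       let line = input_line ic in
       if !seen < k then reservoir.(!seen) <- line
       else begin
         (* Keep the new line with probability k / (seen + 1),
            overwriting a uniformly chosen slot. *)
         let j = Random.int (!seen + 1) in
         if j < k then reservoir.(j) <- line
       end;
       incr seen
     done
   with End_of_file -> ());
  (* If the file had fewer than [k] lines, return only those. *)
  Array.sub reservoir 0 (min k !seen)
```

Calling Random.self_init () once beforehand gives a different selection on each run; sample_lines (open_in "words.txt") 10 would then return ten words after reading the file once.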
jemfinch
07-23-2002, 01:19 AM
Originally posted by GnuVince
I think PrBacterio's function is much more useful to me. I have a file containing half a million words and I want to randomly select only 10. I don't see why I should use a readlines function, it would be a waste of memory (didn't you say that a list element takes 3 words of memory?)
Did you read my whole post? I offered a general, clean approach to doing exactly what you want without populating an entire list. It's far more general and far cleaner than seek_to_line.
Jeremy
jemfinch
07-23-2002, 01:37 AM
Ok, since AIM conversations forced me to write forlines myself (which should probably be renamed "withlines" to be a little closer to what it's doing), here's the code for it:
let rec forlines fd f =
  let s = input_line fd in
  match f s with
  | None -> forlines fd f
  | Some v -> v
If I was really writing it, I'd probably rewrite it to save the original offset and lseek back to it when I was done, but this was just off-the-cuff.
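A rough sketch of that save-and-restore idea, under some assumptions: since forlines calls input_line, its first argument is really an in_channel, so the sketch uses pos_in and seek_in rather than Unix.lseek. The wrapper name forlines_restoring is hypothetical, and Fun.protect needs OCaml 4.08 or later.

```ocaml
(* forlines as posted above, repeated so the sketch is self-contained. *)
let rec forlines ic f =
  match f (input_line ic) with
  | None -> forlines ic f
  | Some v -> v

(* Run forlines, then put the channel back where it was, whether the
   callback produced a value or End_of_file was raised. *)
let forlines_restoring ic f =
  let saved = pos_in ic in
  Fun.protect
    ~finally:(fun () -> seek_in ic saved)
    (fun () -> forlines ic f)
```

With the position restored, two successive calls on the same channel see the same lines, which the plain forlines does not guarantee.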
Here are some examples on how to use forlines:
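(* Collect every line; forlines only stops here by raising End_of_file. *)
let readlines fd =
  let l = ref [] in
  try
    forlines fd (fun s -> l := (s :: !l); None)
  with End_of_file ->
    List.rev (!l)

(* Return line i (0-based), counting from the channel's current position. *)
let get_line fd i =
  let n = ref i in
  forlines fd (fun s -> if !n = 0 then Some s else (n := pred !n; None))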
There you go :)
Jeremy