View Full Version : Ink's Challenge!!!
inkedmn
05-21-2002, 06:49 PM
ok kids, it's time for the next challenge...
Here's what we're going to do, step-by-step:
- Open a text file (any will work but i figured we could use the GPL - http://www.gnu.org/licenses/gpl.txt ).
- Record the current time
- Iterate over the entire text file and (using a hash/dictionary) record how many times each letter/number/punctuation is used (do not count spaces).
- Record the current time
- Order the hash most -> least (most used letter is first, least used is last).
- Return the first 10 entries in the hash
- Show the time it took to iterate over the file and record the number of instances of each character.
now, GET TO WORK!!! :)
Dru Lee Parsec
05-21-2002, 07:21 PM
I like this one a lot.
GnuVince
05-21-2002, 07:36 PM
Yup, me too. One question: does our program have to be case sensitive or case insensitive?
inkedmn
05-21-2002, 07:50 PM
let's say case INsensetive just to make it less cumbersome.
if anybody disagrees, please speak up because i'm open to new ideas...
inkedmn
05-21-2002, 08:31 PM
Hey Vince!
you wanna benchmark this challenge too?
Dru Lee Parsec
05-21-2002, 08:43 PM
Can we ignore punctuation and just do letters?
Well, ok, I'm just asking because my letters are counting correctly but my punctuation isn't :(
I'll get back to work now.
inkedmn
05-21-2002, 08:46 PM
Originally posted by Dru Lee Parsec
Can we ignore punctuation and just do letters?
i honestly don't care either way, whatever the majority thinks...
Bradmont
05-21-2002, 10:02 PM
#include <iostream>
#include <fstream>
#include <cctype>
#include <time.h>
#include <algorithm>
using namespace std;
class character {
public: // yeah, I'm bad. Sue me.
long int count; // number of times the character showed up
char value; // what the character is
friend bool operator<(const character & c1, const character & c2);
};
bool operator<(const character & c1, const character & c2){ // so I can use the STL sort
return c1.count < c2.count;
}
int main(){
int starttime;
int endtime;
character characters[256]; // our hash
for (int i = 0; i < 256; i++){ // initialise the hash, we store the value because we sort the list later
characters[i].value = i;
characters[i].count = 0;
}
char temp;
ifstream in ("gpl.txt"); // open the file, too lazy to take the name as any sort of argument...
if (in.fail()){
cerr << "Error opening file." << endl;
return 1;
}
starttime = time( NULL); // get the initial time
while (!in.eof()){
in >> temp;
characters[tolower(temp)].count++; // each character's position in the hash is equal to its ascii value, at this point.
}
in.close(); // close the file
endtime = time( NULL); // get the end time
cout << "Read time: " << endtime - starttime << endl; // display the read time
sort(characters, characters+256); // sort
endtime = time( NULL); // get the end time
cout << "Read and sort time:" << endtime - starttime << endl; // display the total run time
cout << "Top 10 chars:" << endl;
for (int i = 0; i < 10; i++){
cout << characters[255-i].value << '\t' << characters[255-i].count << endl; // the last 10 elements in the list, because those are the ones that got sorted to the end.
}
return 0;
}
I didn't bother to get the time in anything smaller than seconds, so it says its run time is 0 ;)
Bradmont
05-21-2002, 10:03 PM
oh, here's the output:
Read time: 0
Read and sort time:0
Top 10 chars:
e 1644
t 1350
o 1315
i 1131
r 1090
a 950
n 919
s 880
h 599
c 536
Bradmont
05-21-2002, 10:05 PM
Originally posted by inkedmn
i honestly don't care either way, whatever the majority thinks... I think we shouldn't ignore punctuation. :) (can you guess why? :D )
xilica
05-21-2002, 10:07 PM
no punctuation calls for sloppy code
and when u go back through it looks the same and is hard to disect.
Bradmont
05-21-2002, 10:16 PM
Originally posted by xilica
no punctuation calls for sloppy code
and when u go back through it looks the same and is hard to disect.
I really don't see how. the only thing I'd have to change in my solution there is this:
characters[tolower(temp)].count++; // each character's position in the hash is equal to its ascii value, at this point.
to:
if (!ispunct(temp))
characters[tolower(temp)].count++; // each character's position in the hash is equal to its ascii value, at this point.
Bradmont
05-21-2002, 11:09 PM
oh, btw, for those who couldn't tell, mine's in c++
Bradmont
05-22-2002, 12:57 AM
ok, I updated my code for a (vastly) more accurate time measurement.
#include <iostream>
#include <fstream>
#include <cctype>
#include <time.h>
#include <sys/time.h>
#include <algorithm>
using namespace std;
class character {
public: // yeah, I'm bad. Sue me.
long int count; // number of times the character showed up
char value; // what the character is
friend bool operator<(const character & c1, const character & c2);
};
bool operator<(const character & c1, const character & c2){
// so I can use the STL sort
return c1.count < c2.count;
}
int main(){
timeval starttime;
timeval endtime;
character characters[256]; // our hash
for (int i = 0; i < 256; i++){ // initialise the hash, we store the
//value because we sort the list later
characters[i].value = i;
characters[i].count = 0;
}
char temp;
ifstream in ("gpl.txt"); // open the file, too lazy to take the name
//as any sort of argument...
if (in.fail()){
cerr << "Error opening file." << endl;
return 1;
}
gettimeofday(&starttime, NULL); // get the initial time
while (!in.eof()){
in >> temp;
characters[tolower(temp)].count++; // each character's position
//in the hash is equal to its ascii value, at this point.
}
in.close(); // close the file
gettimeofday(&endtime, NULL); // get the end time
cout << "Read time: " << (endtime.tv_sec +
(endtime.tv_usec * 0.000001F)) -
(starttime.tv_sec + (starttime.tv_usec * 0.000001F))
<< " seconds" << endl; // display the read time
sort(characters, characters+256); // sort
gettimeofday(&endtime, NULL); // get the end time
cout << "Read and sort time: " << (endtime.tv_sec +
(endtime.tv_usec * 0.000001F)) -
(starttime.tv_sec + (starttime.tv_usec * 0.000001F))
<< " seconds" << endl; // display the total run time
cout << "Top 10 chars:" << endl;
for (int i = 0; i < 10; i++){
cout << characters[255-i].value << '\t' <<
characters[255-i].count << endl;
// the last 10 elements in the list, because
//those are the ones that got sorted to the end.
}
return 0;
}
PrBacterio
05-22-2002, 01:18 AM
In OCaml:
(* prepare the counting arrays *)
let count_array = Array.make 256 0
let chars_array = Array.create 256 0
let _ = for i = 0 to 255 do chars_array.(i) <- i done
(* is a character whitespace? *)
let whitespace = function
| ' ' | '\n' | '\r' | '\t' | '\b' -> true
| _ -> false
(* count the characters *)
let count_from f_in =
try while true do
let c = Char.uppercase (input_char f_in) in
if not (whitespace c) then count_array.(int_of_char c) <- count_array.(int_of_char c) + 1
done with End_of_file -> ()
(* sort the results *)
let sort_results () = Array.sort (fun c d -> count_array.(d) - count_array.(c)) chars_array
(* main program *)
let main =
let f_input = open_in "gpl.txt" in
let read_time_start = Sys.time () in
(* read the file and increment the character counters *)
( try count_from f_input
with x -> close_in f_input; raise x);
let read_time = Sys.time () -. read_time_start in
close_in f_input;
let sort_start_time = Sys.time () in
(* sort *)
sort_results ();
let sort_time = Sys.time () -. sort_start_time in
(* print the results *)
Printf.printf "Read time: %2.4f\nSort time: %2.4f\n" read_time sort_time;
for i = 0 to 9 do
Printf.printf "%d. %c (%d times)\n" (i + 1) (char_of_int chars_array.(i)) count_array.(chars_array.(i))
done
Read time: 0.0000
Sort time: 0.0000
1. E (1644 times)
2. T (1350 times)
3. O (1315 times)
4. I (1131 times)
5. R (1090 times)
6. A (950 times)
7. N (919 times)
8. S (880 times)
9. H (599 times)
10. C (536 times)
Bradmont
05-22-2002, 01:27 AM
I think we're going to need to find a very slow machine to test these on... ;)
inkedmn
05-22-2002, 01:30 AM
ok, got it (with the help of a friend of mine who's ass better register here shortly)...
in python (duh)
#!/usr/bin/env python
# ink's contest, for cf.net
import time
dict = {}
try:
file = open('gpl.txt', 'r')
except Exception, e:
print e
begin = time.time()
stuff = file.read()
for item in stuff:
if item != ' ':
if dict.has_key(item) == 1:
dict[item] += 1
elif dict.has_key(item) == 0:
dict[item] = 1
items = [(v, k) for k, v in dict.items()]
items.sort()
items.reverse()
items = [(k, v) for v, k in items]
end = time.time()
total = end - begin
x = 0
for k, v in items:
if x >= 10:
break
else:
print "%s : %d" % (k, v)
x += 1
print "Took %f seconds." % total
and here's the output:
e : 1510
t : 1220
o : 1217
i : 1010
r : 981
a : 838
n : 834
s : 793
h : 550
c : 491
Took 0.110000 seconds.
:)
PrBacterio
05-22-2002, 01:58 AM
I did it in Lisp too for good measure:
(defpackage COUNTCHARS
(:nicknames CCS)
(:export "MAKE-CHARACTER-COUNT-TABLE"
"SORTED-CHAR-TABLE-BY"
"FILE-CHAR-COUNTS-TIMED"))
(in-package "COUNTCHARS")
(defun make-character-count-table (file-name)
(let ((counts (make-array char-code-limit :initial-element 0)))
(with-open-file (f-in file-name :direction :input)
(loop
(let ((c (read-char f-in nil)))
(if c
(if (graphic-char-p c)
(incf (aref counts (char-code (char-upcase c)))))
(return counts)))))))
(defun sorted-char-table-by (counts-array)
(let ((chars (make-array char-code-limit :initial-element #\space)))
(dotimes (i char-code-limit)
(setf (aref chars i) (code-char i)))
(sort chars #'(lambda (ch1 ch2) (> (aref counts-array (char-code ch1))
(aref counts-array (char-code ch2)))))
chars))
(defun file-char-counts-timed (file-name)
(let* ((counts (time (make-character-count-table file-name)))
(lchars (time (sorted-char-table-by counts))))
(dotimes (i 10)
(format t "~W. ~W (~W)" (1+ i) (aref lchars i) (aref counts (char-code (aref lchars i)))) (terpri))))
?(ccs::file-char-counts-timed "e:/proggis/tests/gpl.txt")
Total Execution time: 0.258914 seconds
Time spent garbage collecting: 0.0 seconds
Total Execution time: 0.015391 seconds
Time spent garbage collecting: 0.0 seconds
1. #\E (1644)
2. #\T (1350)
3. #\O (1315)
4. #\I (1131)
5. #\R (1090)
6. #\A (950)
7. #\N (919)
8. #\S (880)
9. #\H (599)
10. #\C (536)
NIL
GnuVince
05-22-2002, 10:27 AM
PrBacterio: That's some very nice looking code, congrats! By the way, how do you get all these colored sources?
PrBacterio
05-22-2002, 11:06 AM
Originally posted by GnuVince
PrBacterio: That's some very nice looking code, congrats! By the way, how do you get all these colored sources?
I wrote a small program for creating colored versions of source code in various languages in various output formats, e.g. UBB, irc and html. I think source code becomes almost unreadable without any syntax coloring; Ive become so used to it that its almost become an intrinsical property of programming languages to me. And ubb has a built-in feature to colorize PHP code; so as I saw that, and as it really bugged me that most source code posted on message boards is without syntax coloring (which I think of as really hard to read), I quickly slapped together this small utility. So far its got modules for C, C++, OCaml, Scheme and Lisp; but there are still some bugs in some of those.
Strike
05-22-2002, 11:59 AM
(just as a side note: I'm working on a Java version though Dru will likely beat me to it, I'm guessing)
PrBacterio, what is this utility of yours written in? It'd be cool to take one that used the vim syntax files (since they have tons for various languages), and produced output based on that. Anyway, if it's in a language I know, I might write modules for other languages that I know.
PrBacterio
05-22-2002, 12:02 PM
Strike: I wrote it in OCaml and I write the parsers/syntax modules in ocamllex.
Strike
05-22-2002, 12:12 PM
I have a question - it seems like this challenge is calling for an in-place sort of a dictionary or hash, which seems useless/pointless to me. Dictionary/hash searches are already O(1), which is why there's no need to sort them. Do you want it put into a sorted array?
inkedmn
05-22-2002, 12:46 PM
The sorting is just to make the output nicely ordered. Where/how the hash is sorted makes no difference to me...
does that answer your question? or did i totally miss the point?
GnuVince
05-22-2002, 02:53 PM
PrBacterio: Could you be nice enough to send me the source of that program? Thank you very much. (say, are you a professional programmer?)
Strike
05-22-2002, 03:39 PM
How come when I run it I get the same numbers as inkedmn, but no one else? (of course, mine is in Python as well, but ...)
1: 'e' = '1510'
2: 't' = '1220'
3: 'o' = '1217'
4: 'i' = '1010'
5: 'r' = '981'
6: 'a' = '838'
7: 'n' = '834'
8: 's' = '793'
9: 'h' = '550'
10: 'c' = '491'
Took 0.454274 seconds.
Code here:
#!/usr/bin/env python
def main():
from time import time
import re
fd = open("gpl.txt")
start = time()
# prelim stuff - compile a regex to recognize non-whitespace
nonwhitespace = re.compile("\S")
dict = {}
fd.seek(0)
for char in fd.read():
if nonwhitespace.match(char):
if dict.has_key(char):
dict[char] += 1
else:
dict[char] = 1
end = time()
# print the top 10
pairs = zip(dict.values(), dict.keys())
pairs.sort(paircmp)
top_ten = pairs[:10]
print top_ten
x = 1
for pair in top_ten:
print "%d: '%c' = '%d'" % (x, pair[1], pair[0])
x += 1
elapsed = end - start
print "Took %.6f seconds." % elapsed
def paircmp(a, b):
# does a reverse sort
if a[0] > b[0]:
return -1
elif a[0] == b[0]:
return 0
else:
return 1
if __name__ == "__main__":
main()
Bradmont
05-22-2002, 03:51 PM
Strike: Because everybody else did thiers case insensitively.
xilica
05-22-2002, 04:20 PM
Originally posted by GnuVince
PrBacterio: That's some very nice looking code, congrats! By the way, how do you get all these colored sources?
when i add color i use the forums built in reply system. when you reply you can change the color. yes it is a little bit more difficult but hey whatever works for you
inkedmn
05-22-2002, 04:25 PM
i'm sure he'd rather see the source for the syntax highlighting than change the color in the reply window for all the different functions/statements that appear in his code...
just a guess :)
Strike
05-22-2002, 04:49 PM
Ah, that was it Bradmont. Case insensitivity makes me get:
1: 'e' = '1644'
2: 't' = '1350'
3: 'o' = '1315'
4: 'i' = '1131'
5: 'r' = '1090'
6: 'a' = '950'
7: 'n' = '919'
8: 's' = '880'
9: 'h' = '599'
10: 'c' = '536'
Took 0.696814 seconds.
inkedmn
05-22-2002, 04:53 PM
here's what i get after changing to case-insensetivity:
e : 1644
t : 1350
o : 1315
i : 1131
r : 1090
a : 950
n : 919
s : 880
h : 599
c : 536
Took 0.231000 seconds.
Strike
05-22-2002, 04:59 PM
What's cool is that the top ten is almost the same as the top ten of all gauged occurences in the English language:
1.. E (12.7%)
2. T (9.1%)
3. A (8.2%)
4. O (7.5%)
5. I (7.0%)
6. N (6.7%)
7. S (6.3%)
8. H (6.1%)
9. R (6.0%)
10. D (4.3%)
(C is 12th, 2.8%)
Our percentages and rankings:
1. E (9.1%)
2. T (7.5%)
3. O (7.3%)
4. I (6.3%)
5. R (6.1%)
6. A (5.3%)
7. N (5.1%)
8. S (4.9%)
9. H (3.3%)
10. C (3.0%)
inkedmn
05-22-2002, 05:04 PM
Originally posted by Strike
What's cool is that the top ten is almost the same as the top ten of all gauged occurences in the English language
That's the idea...
this is going to be multi-step challenge (Dru came up with most of it)...
more to come...
Bradmont
05-22-2002, 05:33 PM
cool
Dru Lee Parsec
05-23-2002, 12:27 PM
OK, here's my entry:
import java.util.*;
import java.io.*;
public class TextHash {
public static void main(String[] args) {
if ((args.length == 0 ) || (args.length > 2)) {
System.out.println("\n\n");
System.out.println("Usage: java TextHash [flag] textFileName ");
System.out.println("\n Parameters: textFileName is the file to be parsed.\n");
System.out.println(" Flags:\n -t Text Only. Don't count punctuation ");
System.out.println("");
System.exit(0);
}
boolean textOnly = false;
String fileName = "";
if (args.length == 2) {
if (args[0].equalsIgnoreCase("-t")) {
textOnly = true;
}
fileName = args[1];
}
else {
fileName = args[0];
}
// debug info
System.out.println("fileName is " + fileName);
System.out.println("textOnly parse is " + textOnly);
File textFile = null;
BufferedReader bReader = null;
try {
textFile = new File(fileName);
bReader = new BufferedReader(new FileReader(textFile));
}
catch (Exception e) {
System.err.println("File: " + fileName + " not found");
System.exit(0);
}
long time1 = System.currentTimeMillis();
Hashtable hash = new Hashtable();
int lettercount = 0;
try {
String line = bReader.readLine();
// for each line
while (line != null) {
//System.out.println("line is >" + line);
line = line.toUpperCase();
// get each letter in the line.
for (int i = 0;i < line.length(); i++ ) {
lettercount++;
String sub = line.substring(i, i+1);
char c = sub.charAt(0);
//System.out.println("sub is : >" + sub + "< isLetter is " + Character.isLetter(c));
if ((textOnly == true) && (Character.isLetter(c) == false)) {
// then ignore it
}
else if (Character.isWhitespace(c)){
// ignore spaces
}
else {
MyCounter counter = (MyCounter)hash.get(sub);
if (counter == null) {
counter = new MyCounter(sub);
}
else {
counter.increment();
}
hash.put(sub, counter );
}
}
line = bReader.readLine();
}
}
catch (java.io.IOException ioe) {
System.err.println("IO Exception");
}
long time2 = System.currentTimeMillis();
long delta = time2-time1;
System.out.println("Parsing time is " +delta + "milliseconds");
MyCounter[] theArray = new MyCounter[hash.size()];
Enumeration keys = hash.keys();
int index = 0;
while (keys.hasMoreElements()) {
String key = (String)keys.nextElement();
//System.out.println(hash.get(key));
theArray[index] = (MyCounter)hash.get(key);
index++;
}
System.out.println("\nSorting");
long time3 = System.currentTimeMillis();
quickSort(theArray, 0, theArray.length -1 );
long time4 = System.currentTimeMillis();
long delta2 = time4-time3;
System.out.println("Sorting took " + delta2 + "milliseconds");
System.out.println("\nPrinting");
System.out.println("Total letters is " + lettercount);
for (int j = theArray.length - 1;j>= 0;j-- ) {
//double percent = theArray[j].getCount() / (lettercount/100);
//System.out.println("theArray[j].getCount() " + theArray[j].getCount());
//System.out.println(theArray[j] + " " + percent + "%");
System.out.println(theArray[j]);
}
System.out.println("letter count is " + lettercount);
}
private static void quickSort(MyCounter a[], int lo0, int hi0)
{
int lo = lo0;
int hi = hi0;
int mid;
MyCounter temp;
if ( hi0 > lo0)
{
mid = ((MyCounter)(a[ ( lo0 + hi0 ) / 2 ])).getCount();
while( lo <= hi )
{
while( ( lo < hi0 ) && ( ((MyCounter)(a[lo])).getCount() < mid ) ) {
++lo;
}
while( ( hi > lo0 ) && ( ((MyCounter)(a[hi])).getCount() > mid ) ) {
--hi;
}
if( lo <= hi )
{
temp = a[lo];
a[lo] = a[hi];
a[hi] = temp;
++lo;
--hi;
}
}
if( lo0 < hi )
quickSort( a, lo0, hi );
if( lo < hi0 )
quickSort( a, lo, hi0 );
}
}
}
// a helper class that keeps track of the count
class MyCounter {
int count;
String s;
public MyCounter(String x){
count = 1;
this.s = x;
}
public void increment(){
count++;
}
public int getCount(){
return count;
}
public void setCount(int count){
this.count = count;
}
public String toString(){
return s + " : " + count;
}
}
and here's my output
D:\Dev_Projects\myjava>java TextHash -t gpl.txt
fileName is gpl.txt
textOnly parse is true
Parsing time is 62milliseconds
Sorting
Sorting took 0milliseconds
Printing
Total letters is 17669
E : 1644
T : 1350
O : 1315
I : 1131
R : 1090
A : 950
N : 919
S : 880
H : 599
C : 536
D : 476
U : 434
L : 424
P : 381
F : 365
Y : 353
M : 350
G : 269
W : 228
B : 214
V : 120
K : 56
X : 26
Q : 10
J : 9
letter count is 17669
GnuVince
05-23-2002, 06:23 PM
I have something. It's not the cleanest way to do things, but it works:
let create_my_hash () =
let hash = Hashtbl.create 0 in
for pos = 97 to 122 do
Hashtbl.add hash (Char.chr pos) 0
done;
hash
let _ =
(* Open file for reading and create Hashtbl *)
let fd = open_in "gpl.txt" in
let myhash = create_my_hash () in
(* Start reading time *)
let read_t1 = Sys.time () in
(* Process the file *)
try
while true do
let mychar = Char.lowercase (input_char fd) in
if Hashtbl.mem myhash mychar then
Hashtbl.replace myhash mychar ((Hashtbl.find myhash mychar) + 1)
done
with End_of_file -> ();
(* End reading time and total time *)
close_in fd;
let read_t2 = Sys.time () in
let read_time = read_t2 -. read_t1 in
(* Sort *)
let sort_t1 = Sys.time () in
let arr = Array.make 26 (0, '_') in
let x = ref 0 in
Hashtbl.iter (fun k v ->
arr.(!x) <- (v,k);
x := !x + 1;
) myhash;
Sort.array (>) arr;
let sort_t2 = Sys.time () in
let sort_time = sort_t2 -. sort_t1 in
Printf.printf "Read time: %.2f\nSort time: %.2f\n" read_time sort_time;
for i = 0 to 9 do
Printf.printf "%c : %d\n" (snd arr.(i)) (fst arr.(i))
done
and output
Read time: 0.01
Sort time: 0.00
e : 1644
t : 1350
o : 1315
i : 1131
r : 1090
a : 950
n : 919
s : 880
h : 599
c : 536
It was hard but I got it. And I learned a lesson: Hash tables are hard to manipulate
inkedmn
05-23-2002, 09:06 PM
Bravo Vince! i knew you could do it! :)
Strike
05-23-2002, 10:11 PM
I propose that if you have to scroll through more than 2 screens worth of code that you attach your code for those of us who want to give it a shot or play with it. Sound fair? I mean, I don't mind big code postings, I just don't like having to paste them all :)
Dru Lee Parsec
05-24-2002, 12:34 PM
I think that was directed at me :|
Sure no problem. But then again, IT"S BEAUTIFUL CODE! Feel the beauty! Feel the love!
I particularly like the little helper class that contains the character and the count. And the quick sort that sorts objects instead of just int's, that's cool.
My Crypto challenge code will have a full GUI. SCHWEEET!
inkedmn
05-24-2002, 12:36 PM
Originally posted by Dru Lee Parsec
My Crypto challenge code will have a full GUI. SCHWEEET!
showoff ;)
Strike
05-24-2002, 02:19 PM
Dru: yeah, it was for your code. No hard feelings, just a thought :)
Dru Lee Parsec
05-24-2002, 03:19 PM
Oh I know. No worries.
But you must admit, the code is beautiful.
iDxMan
05-25-2002, 05:09 PM
Unfortunately this perl example requires the time::hires module..
#!/usr/bin/perl
die "Usage: $0 gpl.txt\n" if (!@ARGV);
use Time::HiRes qw(time);
$start = Time::HiRes::time();
while(<>) {
chomp;
tr/a-zA-Z//cd;
tr/a-z/A-Z/;
foreach $c (split(//)) {
$idx{$c}++;
}
}
foreach (sort {$idx{$b} <=> $idx{$a}} %idx) {
$i++;
print "$_ : $idx{$_}\n";
if ($i == 10) {
$stop = Time::HiRes::time();
print "Runtime: ", ($stop-$start),"\n";
exit;
}
}
E : 1644
T : 1350
O : 1315
I : 1131
R : 1090
A : 950
N : 919
S : 880
H : 599
C : 536
Runtime: 0.211485028266907
(yeah, its on a priddy slow p233 /64ram running far too much as is :D --- 0.0347909927368164 on my real box )
PrBacterio
05-26-2002, 09:44 AM
Originally posted by GnuVince
PrBacterio: Could you be nice enough to send me the source of that program? Thank you very much. (say, are you a professional programmer?)
Of course I could send it to you; in fact, I have simply attached it to this post (I dont know if this will work, if it doesnt simply post your email adress where I should send it to). But note that this program is pretty much a quick, ugly hack, so dont use it as an example :)
And no Im not a professional as in, having a job in the field; Im currently a math student. But Ive been programming for the better part of 10 years or so now so I guess Ive got quite some experience by now.
GnuVince
05-26-2002, 12:06 PM
Thanks, would you also happen to have a Makefile?
PrBacterio
05-26-2002, 01:49 PM
Originally posted by GnuVince
Thanks, would you also happen to have a Makefile?
Nope, I didnt bother to make one. As I said, its a quick thing I threw together when the need arose; as soon as each of the individual files compiled & worked once I never touched them again :)
Simply put all of the .mll files through ocamllex, and then compiler all of the .ml files. shl.ml needs all of the others, and all of the files need tshl.ml, but apart from that, the order does not matter. Oh and of course it also needs the labltk library.
GnuVince
05-26-2002, 01:58 PM
[vince@vincent: ~/prog/ocaml/colors]% ocamlc ./cppshl.cma ./ocshl.cma ./scmshl.cma ./tshl.cma labltk/labltk.cma shl.ml -o shl
File "shl.ml", line 194, characters 18-27:
Unbound value Tk.openTk
[vince@vincent: ~/prog/ocaml/colors]%
Any idea?
recluse
05-30-2002, 04:36 AM
For the most part I understand how iDxMan's code works but could someone please explain how this works, I don't understand how the sort works, esp in conjunction with foreach.
foreach (sort {$idx{$b} <=> $idx{$a}} %idx) {
phubuh
05-30-2002, 09:00 AM
Originally posted by recluse
For the most part I understand how iDxMan's code works but could someone please explain how this works, I don't understand how the sort works, esp in conjunction with foreach.
foreach (sort {$idx{$b} <=> $idx{$a}} %idx) {
phubuh@igloo ~$ perl recluse.pl
1 <=> 5 == -1
5 <=> 1 == 1
5 <=> 5 == 0
As you can see, the <=> operator returns 1 if LH > RH, -1 if LH < RH and 0 if LH == RH.
The 'sort' keyword also uses this method of determining how to sort the array. 'sort' performs the given subroutine on every element and if it returns 1, it moves the current element up in the sorted list, etc.
I hope you understand. It's a kickass function.
PrBacterio
05-30-2002, 11:58 AM
Originally posted by GnuVince
Any idea?
Hmm it seems it doesn't find all of labltk. Try it with the -I option, i.e. ocamlc -I labltl labltk.cma etc.
jemfinch
05-31-2002, 02:57 AM
For a more accurate time measurement in O'Caml, use Unix.gettimeofday.
Jeremy
ChefNinja
07-22-2002, 09:24 AM
Woop, content deleted :) Check below
ChefNinja: The string module already has strings representing letters in various forms: uppercase, lowercase, etc. rather than creating your own array, why not use that?
ChefNinja
07-22-2002, 06:20 PM
<-- hasn't learned string module yet :D
inkedmn
07-22-2002, 07:13 PM
http://www.python.org/doc/current/lib/module-string.html
ChefNinja
07-23-2002, 02:55 PM
How can I take the dictionary containing each letter and its corresponding number, and then sort it and take the top 10 :\
Edit: LOL. string.ascii_lowercase.... oh man, that beats typing out my own dictionary of the alphabet :D
Edit 2: Using my own dictionary is actually faster than using string.ascii_lowercase... yay, my first optimization? :)
inkedmn
07-23-2002, 03:20 PM
i put up an example on www.codeexamples.org on how to sort a dictionary by value (it's actually the code from this challenge, so it might be in this thread somewhere...)
ChefNinja
07-23-2002, 08:44 PM
SHUWEEEET! :D :D :D :D
Here it is, pretty fast if you ask me :)
#for cf.net
import time
def count(str, stime):
abc = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
dict = {}
n = 0
i = 0
while i < len(abc):
for letter in str:
if letter == abc[i]:
n = n + 1
dict[n] = abc[i]
n = 0
i = i + 1
ftime = time.time()
ntime = ftime - stime
sort(dict, ntime, abc)
def sort(dict, ntime, abc):
nlist = []
n = 0
i = 0
while n < len(abc):
nlist.append(abc[i])
i = i + 1
n = n + 1
nlist = dict.keys()
nlist.sort()
nlist.reverse()
x = 0
i = 0
while x < 10:
if dict.has_key(nlist[i]):
print dict[nlist[i]], nlist[i]
i = i + 1
x = x + 1
print "Took", ntime, "seconds"
f=open("gpl.txt", "r")
str = f.read()
stime = time.time()
count(str, stime)
Output:
e 1510
t 1220
o 1217
i 1010
r 981
a 838
n 834
s 793
h 550
c 491
Took 0.21899998188 seconds
ChefNinja
08-31-2002, 04:20 AM
Aww jeah, using a totally pointless class to do this ;)
import time
import string
class gplReader:
def __init__(self):
self.abc = string.ascii_lowercase
self.dict = {}
f=open("gpl.txt", "r")
self.str = f.read()
def myCount(self):
n = 0
i = 0
str = self.str
abc = self.abc
self.stime = time.time()
while i < len(abc):
for letter in str:
if letter == abc[i]:
n = n + 1
self.dict[n] = abc[i]
n = 0
i = i + 1
self.etime = time.time()
self.ntime = self.etime - self.stime
def mySort(self):
nlist = []
nlist = self.dict.keys()
nlist.sort()
nlist.reverse()
x = 0
i = 0
while x < 10:
if self.dict.has_key(nlist[i]):
print self.dict[nlist[i]], nlist[i]
i = i + 1
x = x + 1
print "Took %s seconds." % self.ntime
blah = gplReader()
blah.myCount()
blah.mySort()
yay :)
inkedmn
08-31-2002, 01:40 PM
nice work my friend :)
ChefNinja
08-31-2002, 03:59 PM
import time
import string
class gplReader:
def __init__(self):
f=open("gpl.txt", "r")
self.str = f.read()
self.str = string.lower(self.str)
self.abc = string.ascii_lowercase
self.dict = {'a':0, 'b':0, 'c':0, 'd':0, 'e':0, 'f':0, 'g':0, 'h':0, 'i':0, 'j':0, 'k':0, 'l':0, 'm':0, 'n':0, 'o':0, 'p':0,
'q':0, 'r':0, 's':0, 't':0, 'u':0, 'v':0, 'w':0, 'x':0, 'y':0, 'z':0,}
self.ndict = {}
self.nlist = []
def gplCount(self):
self.stime = time.time()
for letter in self.str:
if self.dict.has_key(letter):
self.dict[letter] += 1
else:
self.dict[letter] = 1
self.ftime = time.time()
self.ttime = self.ftime - self.stime
def gplSort(self):
n = 0
while n < len(self.abc):
temp = self.dict[self.abc[n]]
self.ndict[temp] = self.abc[n]
n = n + 1
self.nlist = self.ndict.keys()
self.nlist.sort()
self.nlist.reverse()
def gplPrint(self):
i = 0
for x in range(10):
if self.ndict.has_key(self.nlist[i]):
print self.ndict[self.nlist[i]], self.nlist[i]
i = i + 1
print "Took %s seconds." % self.ttime
blah = gplReader()
blah.gplCount()
blah.gplSort()
blah.gplPrint()
Major speed improvement :)
e 1644
t 1350
o 1315
i 1131
r 1090
a 950
n 919
s 880
h 599
c 536
Took 0.0199999809265 seconds.
jemfinch
09-05-2002, 02:54 AM
Really, one shouldn't use hash tables or dictionaries for this. Remember, characters are just small integers, so you can just use an array.
Here's generic code to count the most prevalent characters of *any* type:
#!/usr/bin/env python
import string
if __name__ == '__main__':
fd = file('gpl.txt', 'r')
table = [0] * 256
for c in fd.read():
table[ord(c)] += 1
table = zip(table, string.maketrans('', ''))
table.sort()
table.reverse()
for (i, letter) in table[:10]:
print '%s: %s' % (letter, i)
That's not case insensitive, though.
Here's something that more closely fulfills the specification:
#!/usr/bin/env python
import time
import string
if __name__ == '__main__':
fd = file('gpl.txt', 'r')
start = time.time()
table = [0] * 26
letters = fd.read()
read = time.time()
for c in letters:
if c.isalpha():
table[ord(c.upper()) - 65] += 1
table = zip(table, string.uppercase)
table.sort()
table.reverse()
readandsort = time.time()
print 'Time to read: %s' % (read - start)
print 'Time to iterate, tabulate, and sort: %s' % (readandsort - read)
print 'Total time: %s' % (readandsort - start)
for (i, letter) in table[:10]:
print '%s: %s' % (letter, i)
Here's what I get:
Jeremy Fincher@FUNCTOR ~/src/my/python/competitions
$ ./inkie.py
Time to read: 0.000889897346497
Time to iterate, tabulate, and sort: 0.105870008469
Total time: 0.106759905815
E: 1644
T: 1350
O: 1315
I: 1131
R: 1090
A: 950
N: 919
S: 880
H: 599
C: 536
Jeremy
jemfinch
09-05-2002, 03:08 AM
This one's slightly faster:
#!/usr/bin/env python
import time
import string
if __name__ == '__main__':
fd = file('gpl.txt', 'r')
start = time.time()
table = [0] * 26
letters = fd.read()
read = time.time()
for c in letters.upper():
if c.isalpha():
table[ord(c) - 65] += 1
table = zip(table, string.uppercase)
table.sort()
table.reverse()
readandsort = time.time()
print 'Time to read: %s' % (read - start)
print 'Time to iterate, tabulate, and sort: %s' % (readandsort - read)
print 'Total time: %s' % (readandsort - start)
for (i, letter) in table[:10]:
print '%s: %s' % (letter, i)
Jeremy
jemfinch
09-05-2002, 03:24 AM
My timing wasn't fulfilling the spec:
if __name__ == '__main__':
fd = file('gpl.txt', 'r')
start = time.time()
table = [0] * 26
letters = fd.read()
read = time.time()
for c in letters.upper():
if c.isalpha():
table[ord(c) - 65] += 1
tabulate = time.time()
table = zip(table, string.uppercase)
table.sort()
table.reverse()
readandsort = time.time()
print 'Time to read: %s' % (read - start)
print 'Time to iterate and tabulate: %s' % (tabulate - read)
print 'Total time: %s' % (readandsort - start)
for (i, letter) in table[:10]:
print '%s: %s' % (letter, i)
Jeremy
jemfinch
09-05-2002, 03:30 AM
And the new, most improved, probably-won't-get-faster-than-this version:
#!/usr/bin/env python
import time
import string
ascii = string.maketrans('', '')
def delete(s, deletes):
return s.translate(ascii, deletes)
nonUppercase = delete(ascii, string.uppercase)
if __name__ == '__main__':
fd = file('gpl.txt', 'r')
start = time.time()
table = [0] * 26
letters = delete(fd.read().upper(), nonUppercase)
read = time.time()
for c in letters:
table[ord(c) - 65] += 1
tabulate = time.time()
table = zip(table, string.uppercase)
table.sort()
table.reverse()
readandsort = time.time()
print 'Time to read: %s' % (read - start)
print 'Time to iterate and tabulate: %s' % (tabulate - read)
print 'Total time: %s' % (readandsort - start)
for (i, letter) in table[:10]:
print '%s: %s' % (letter, i)
Jeremy
Uranium-235
09-05-2002, 06:55 PM
<?php
//Paul "Uranium-235" Linebarger
//remote http method
//yes this will take more time then wanted, but it dosent require a local file
function getmicrotime()
{
$mtime = explode(" ",microtime());
return $mtime[1] + $mtime[0];
}
$starttime = getmicrotime();
$file = file("http://www.gnu.org/licenses/gpl.txt");
for($i = 0; $i < count($file); $i++)
{
for($k = 65; $k <= 122; $k++)
{
$ascii = chr($k);
$letter[strtolower($ascii)] += substr_count($file[$i], $ascii);
}
}
arsort($letter);
$letter = array_slice($letter, 0, 10);
while (list ($key, $val) = each ($letter))
{
echo "$key : $val
\n";
}
$endtime = getmicrotime() - $starttime;
print("
Final time: $endtime");
?>
Output: http://dynamic2.gamespy.com/~extreme/scripts/gplcount.php
Uranium-235
09-05-2002, 10:54 PM
<?php
//Paul "Uranium-235" Linebarger
//local method
//faster way, local file required
if(!is_file("gpl.txt"))
die("0n0! teh file is missing! I bet Gryph took it! AFTER HIM!"); // ;)
function getmicrotime()
{
$mtime = explode(" ",microtime());
return $mtime[1] + $mtime[0];
}
$starttime = getmicrotime();
$filestr = implode('', file("gpl.txt"));
for($k = 65; $k <= 122; $k++) //ascii 65 - 122 is a-zA-Z
{
$ascii = chr($k);
$letter[strtolower($ascii)] += substr_count($filestr, $ascii);
}
arsort($letter);
$letter = array_slice($letter, 0, 10);
while (list ($key, $val) = each ($letter))
{
echo "$key : $val
\n";
}
$finaltime = getmicrotime() - $starttime;
print("
Final time: $finaltime");
?>
Output: http://dynamic2.gamespy.com/~extreme/scripts/gplcount2.php
Uranium-235
09-05-2002, 10:56 PM
ahhahaha I just exposed a bug, it will print the html if you insert a smilie into the php code (hence the <img src... after die())
hahahha
imported_Gryphon
09-06-2002, 01:34 AM
:P
TheLinuxDuck
09-06-2002, 09:59 AM
Here's my own go. I tried to make it compact, thus not-very-readable, but I'm fine with that. (= I thought about using a module that gets more descriptive time, but I've not seen one that works quickly, so I said "BAHHH!!!".
#!/usr/bin/perl # strict and warnings omitted only for length
use Benchmark;
my(%cnt, @list, $stT, $endT);
my($gpl) = "gpl.txt";
$stT = new Benchmark;
open IN, $gpl or die "Can't open '$gpl': $!\n";
while(<IN>) { $_!~/^\n|\s$/?$cnt{lc($_)}++:0 foreach(split(//,$_)); }
close IN;
$endT = new Benchmark;
@list = sort { $cnt{$b} <=> $cnt{$a} } keys %cnt;
print "'$list[$_]': $cnt{$list[$_]}\n" for(0..9);
print "It took ", timestr(timediff($endT, $stT)), "\n";
exit;
Word ya mugs.
vBulletin® v3.7.0, Copyright ©2000-2009, Jelsoft Enterprises Ltd.