PDA

View Full Version : Dru's 2nd challenge


Dru Lee Parsec
05-20-2002, 01:26 PM
OK folks, we've had a lot of people writing some very interesting code that randomized 1 million integers. In fact, I've been quite impressed at all the different languages and approaches that I've seen. Good job to every one of you. :goodjob:

So the first challenge had us creating an array (list, vector, collection) of integers from 0 to 999999. We then had to randomize them in such a way as to have no repeatitions. That leads us directly to programming challenge #2:

Sort those suckers!

Yep, now that we have a million integers in random order the new challenge is:

1) start a new timer, or get the current time or somehow mark the time.
2) sort the million integers.
3) get the time
4) Show the time it took to sort a million integers
5) Print out the first 100 integers to show that they're sorted.

ALL sorting code must be implemented between the first time stamp and the second time stamp. No "preparation" code is allowed before the first time stamp. ;)

You may write your code in ANY LANGUAGE YOU WANT. You may use any sorting algorithm you want.

Good luck folks. Have fun with it. :D

GnuVince
05-20-2002, 02:13 PM
Hehehehe :) I don't know who that new rule was for ;)

PrBacterio
05-20-2002, 02:14 PM
So its an array of exactly 1,000,000 elements? In which each number between 1 and 1,000,000 occurs exactly one time? Now that's an easy one...

In C:
/* takes an array arr of exactly
* n integers between 1 one n,
* where each integer occurs exactly one
* time, and sorts it in-place */
void sort_again(int n, int *arr)
{
int i;
for(i=0; i<n; ++i) arr[i] = i+1;
}
This works perfectly as long as the preqrequisites of the contract of this function are fulfilled.

Otherwise, as its a completely random array, a simple quick- and/or mergesort should give optimal results. So its a one-liner in most languages .... in Lisp:
(sort arr #'<)

In OCaml:
Sort.array ( < ) arr

GnuVince
05-20-2002, 02:43 PM
let _ =
(* Creates an array of one million elements containing all zeros, so that
all elements are integers (strict typing is forcing me)
*)
let capacity = 1000000 in
let myarray = Array.make capacity 0 in

(* Initialize Random *)
Random.self_init ();

(* Put values from 0 to 999,999 in each array element *)
for i = 0 to (capacity-1) do myarray.(i) <- i done;

(* Let's shuffle the whole thing (I made it like Dru) *)
let t = ref 0 in
let x = ref 0 in
for j = 0 to (capacity-1) do
x := Random.int (capacity-1);
t := myarray.(j);
myarray.(j) <- myarray.(!x);
myarray.(!x) <- !t;
done;

(* Get the start time. *)
let time1 = Sys.time () in

(* Sort the array *)
(* Array.stable_sort compare myarray; *)
Sort.array (<) myarray;

(* Get the end time *)
let time2 = Sys.time () in

(* Get the difference (-. is used because we are working with floats) *)
let delta = time2 -. time1 in

(* Print the time it took *)
Printf.printf "Sorting %d numbers took %.2f seconds\n" capacity delta;
print_endline "Here are the first 100";
flush stdout;

(* Print the first 100 elements *)
for p = 0 to 99 do
Printf.printf "%d " myarray.(p);
done;
print_newline ()


I get times in the range of 2.94 seconds to 3.12 seconds.

inkedmn
05-20-2002, 02:45 PM
here it is in python (sorting took about 12 seconds):


#!/usr/bin/env python

# challenge 2

import time, random

list = [x for x in range(1000000)]
random.shuffle(list)
begin = time.time()
list.sort()
end = time.time()
total = end - begin
for num in range(100):
print list[num],
print "\nSorting took %d seconds." % total


:)

GnuVince
05-20-2002, 02:53 PM
inkedmn: why to you use:

myarray = [x for x in range(1000000)]

? Just use:

myarray = range(1000000)

inkedmn
05-20-2002, 03:38 PM
good point...


#!/usr/bin/env python

# challenge 2

import time, random

list = range(1000000)
random.shuffle(list)
begin = time.time()
list.sort()
end = time.time()
total = end - begin
for num in range(100):
print list[num],
print "\nSorting took %d seconds." % total


thanks vince :)

Bradmont
05-20-2002, 03:38 PM
Originally posted by PrBacterio
So its an array of exactly 1,000,000 elements? In which each number between 1 and 1,000,000 occurs exactly one time? Now that's an easy one...

In C:
/* takes an array arr of exactly
* n integers between 1 one n,
* where each integer occurs exactly one
* time, and sorts it in-place */
void sort_again(int n, int *arr)
{
int i;
for(i=0; i<n; ++i) arr[i] = i+1;
}
This works perfectly as long as the preqrequisites of the contract of this function are fulfilled.


you're not actually sorting those... you're just recreating the original list.

PrBacterio
05-20-2002, 03:48 PM
Originally posted by Bradmont


you're not actually sorting those... you're just recreating the original list.

Oh wow. You don't say?

Dru Lee Parsec
05-20-2002, 05:39 PM
So its an array of exactly 1,000,000 elements? In which each number between 1 and 1,000,000 occurs exactly one time? Now that's an easy one.

Ok, Although that's a creative solution the intent of the competition is to sort the actual random data.

So I'll add an additional requirement that we're actually sorting the randomized array. Also, no fair using the integer as the index into a new array. Obviously since we have 1 million integers from 0 to 999,999 then number 12 should go into the 12th index of the array. But I want to see an actual sorting algorithm.

Good luck.

GnuVince
05-20-2002, 05:52 PM
Is using a builtin function ok?

inkedmn
05-20-2002, 05:54 PM
Originally posted by GnuVince
Is using a builtin function ok?

i was just going to ask the same thing...

Dru Lee Parsec
05-20-2002, 07:17 PM
Oh I think that would be OK, But you may find it faster to write your own. I know that in our first programming challenge we found that it was much faster to "roll our own" than to use a built in task.

In any case, you'd learn more by writing your own sort routine.

Remember, the prize you win by competing in this programming challenge is to learn more about coding.

GnuVince
05-20-2002, 07:57 PM
Hrmmm... I can't see any clean way to sort an array...

xilica
05-20-2002, 08:04 PM
Are the integers sorted ascending or just plain randomly?

GnuVince
05-20-2002, 08:15 PM
xilica: you definition of sorted includes the word random? Like [2, 3, 1] are in order?

xilica
05-20-2002, 08:19 PM
gnuvince, im a little confused... when it says sort does Dru want us to sort like this : 1,2,3,4,5,6,
or 1,456, 4, 214321, 3?

GnuVince
05-20-2002, 08:32 PM
He said, take [2;3;0;1] and sort it (this means that it looks like: [0;1;2;3])

GnuVince
05-20-2002, 09:00 PM
Hrmmm... I'm leaving my solution as is for now. Using Bubble sort, takes WAY too long (stopped it after 15 inutes). QuickSort seems way to complicated.

Bradmont
05-20-2002, 09:17 PM
quicksort isn't that complicated:

take any element in the set (the pivot), and sort all other elements into two sets, those greater and those less than the pivot (those equal to it can go in either, but be consistant). Now do the same with each of the two subsets, and so on, until everything is sorted.

inkedmn
05-20-2002, 11:33 PM
Originally posted by Bradmont
quicksort isn't that complicated:

take any element in the set (the pivot), and sort all other elements into two sets, those greater and those less than the pivot (those equal to it can go in either, but be consistant). Now do the same with each of the two subsets, and so on, until everything is sorted.

where in the list/array would you start?

wouldn't it be easier to, say, go two by to through the list and (if the first/left number is greater than the right/second number, switch them? continue doing this until the left number is always smaller?

just a thought...

Bradmont
05-20-2002, 11:41 PM
Originally posted by inkedmn


where in the list/array would you start?

wouldn't it be easier to, say, go two by to through the list and (if the first/left number is greater than the right/second number, switch them? continue doing this until the left number is always smaller?

just a thought...
Yes, that (the bubblesort) is easier, but it's also a lot slower. Bubblesort is an O(n^2) algorithm, whereas quicksort is O(nlog(n)). What that means is that for a data set of n items, the slowest performance of a bubblesort will vary directly as the square of n (eg, for a 2 element list, the sort will take 4 time units, for a 3 element list it'll be 9, etc), whereas for a quicksort it'll be n * log(n) (much more efficient).

Bradmont
05-20-2002, 11:42 PM
oh, as for starting the quicksort, it doesn't matter. You can choose any element as the pivot, though it's customary to use the first element in the list (or sublist)

GnuVince
05-20-2002, 11:46 PM
What the heck do O(n^2) and O(nlog(n)) mean?

inkedmn
05-20-2002, 11:52 PM
Bradmont-
i want you to know that if your goal was confusing the living hell out of me, you were WILDLY successful.

now, that being said...

WHA!?!?!?!?

GnuVince
05-21-2002, 12:01 AM
I gave it a shot with a quick sort. It took 32 seconds, so Sort.array (>) myarray is still faster. I used Doug Bagley's implementation: http://www.bagley.org/~doug/ocaml/Notes/okoans_answered.shtml

Bradmont
05-21-2002, 12:03 AM
Here's an article about big O notation. It uses matlab code for examples, but the code is pretty easily understandable.
http://www.me.berkeley.edu/~e77/lecnotes/ch20/ch20.htm

Bradmont
05-21-2002, 12:06 AM
Originally posted by GnuVince
I gave it a shot with a quick sort. It took 32 seconds, so Sort.array (>) myarray is still faster. I used Doug Bagley's implementation: http://www.bagley.org/~doug/ocaml/Notes/okoans_answered.shtml

That's not really much of an argument, because you don't know what algorithm Sort.array uses. Many generic library sort functions are quicksorts anyway, and they are often optimised for speed, so it's not too much of a wonder that the library is faster.

inkedmn
05-21-2002, 12:09 AM
/me checks out the python source for the library sort function...

GnuVince
05-21-2002, 12:11 AM
Array.sort uses heap sort and Array.stable_sort uses merge sort, so it's probably one of those (caml.inria.fr is down at the moment (been for quite a while actually...) so I can't download the source to tell you)

Bradmont
05-21-2002, 12:15 AM
Heapsort is another O(n log(n)) sort, so it should have similar results to the quicksort, but the initial state of the dataset will have an effect on the run time (for example, if you use a quicksort that always takes the first element in an array/list as the pivot, an already sorted list will yield the worst performence).

Dru Lee Parsec
05-21-2002, 06:44 PM
OK, here's the output from my first attempt:


randomizing 1000000 numbers took : 328 milliseconds
Here are the first 100
112923 481695 888154 936438 367919 983901 471815 943639 725621 572749
682039 826316 831247 915030 448208 61229 565117 441568 478600 264485
320369 80179 48130 529571 451050 956434 310135 640211 358729 16863
899596 702904 378842 42012 813432 368421 783179 218388 18300 671137
81834 686003 7028 826869 7764 831099 999435 165087 447999 82107
693895 449283 347557 284739 379449 719236 138355 847346 356716 658820
861449 425158 198478 704421 937834 106186 462512 434906 690427 300246
62632 335102 407209 806424 700699 77114 258908 156861 969341 784939
428532 288491 274104 364835 58441 451318 835377 561013 172024 615944
987935 442337 439734 779860 290841 199960 921227 690864 870349 569433
Here are the last 10
155990 154209 999997 373294 186896 116902 308072 249991 262425 999990
Now we're sorting
Sorting 1000000 numbers took 734 milliseconds
Here are the first 100
0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69
70 71 72 73 74 75 76 77 78 79
80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99


I'll post my code after I've optimized it.

Dru Lee Parsec
05-21-2002, 06:51 PM
OK, got it down to 610 milliseconds:


import java.util.*;

public class RandomTest {

private static int initialCapacity = 1000000;
private static int[] list = null;

public static void main(String[] args) {
list = new int[initialCapacity];
for (int i= 0;i< initialCapacity ;i++ ) {
list[i] = i;
}

Random random = new Random();
long time1 = System.currentTimeMillis();
// now randomize them
int t = 0;
int upperlimit = (int)(initialCapacity /2) +1;
for (int j= 0;j< upperlimit ;j++ ) {
int x = random.nextInt(initialCapacity);
t = list[j];
list[j] = list[x];
list[x] = t;
x = random.nextInt(initialCapacity);
t = list[upperlimit - j];
list[upperlimit - j] = list[x];
list[x] = t;

}


long time2 = System.currentTimeMillis();
long delta = time2 - time1;
System.out.println("randomizing " + initialCapacity + " numbers took : " + delta + " milliseconds");

System.out.println("Here are the first 100");
for (int p=0;p<10;p++ ) {
for (int q=0;q<10;q++ ) {
System.out.print(list[(p*10)+q] + " ");
}
System.out.println("");
}

System.out.println("Here are the last 10");
for (int r = 0;r<10;r++ ) {
System.out.print(list[initialCapacity - r - 1] + " ");
}
System.out.println("");
System.out.println("Now we're sorting");

// get time before we start
time1 = System.currentTimeMillis();

// sort it
quickSort(list, 0, initialCapacity -1);

// get end time
time2 = System.currentTimeMillis();
delta = time2 - time1;
System.out.println("Sorting " + initialCapacity + " numbers took " + delta + " milliseconds");

// same code you saw above.
System.out.println("Here are the first 100");
for (int p=0;p<10;p++ ) {
for (int q=0;q<10;q++ ) {
System.out.print(list[(p*10)+q] + " ");
}
System.out.println("");
}

}

private static void quickSort(int a[], int lo0, int hi0)
{
int lo = lo0;
int hi = hi0;
int mid;
int temp;

if ( hi0 > lo0)
{

/* Arbitrarily establishing partition element as the midpoint of
* the array.
*/
mid = a[ ( lo0 + hi0 ) / 2 ];

// loop through the array until indices cross
while( lo <= hi )
{
/* find the first element that is greater than or equal to
* the partition element starting from the left Index.
*/
while( ( lo < hi0 ) && ( a[lo] < mid ) ) {
++lo;
}

/* find an element that is smaller than or equal to
* the partition element starting from the right Index.
*/
while( ( hi > lo0 ) && ( a[hi] > mid ) ) {
--hi;
}

// if the indexes have not crossed, swap
if( lo <= hi )
{
temp = a[lo];
a[lo] = a[hi];
a[hi] = temp;
++lo;
--hi;
}
}

/* If the right index has not reached the left side of array
* must now sort the left partition.
*/
if( lo0 < hi )
quickSort( a, lo0, hi );

/* If the left index has not reached the right side of array
* must now sort the right partition.
*/
if( lo < hi0 )
quickSort( a, lo, hi0 );

}
}

}



here's the output without all the data being dumped:


Now we're sorting
Sorting 1000000 numbers took 610 milliseconds


If I could figure out how to do a non-recursive version of quick sort then I could speed it up even more, but I think 610 milliseconds is a pretty good time.

The optimization down to 610 ms. came from putting the swap code inside the quicksort method instead of having it as a separate method.

Arker
09-27-2002, 08:46 PM
I guess I'm going to revive yet another dead thread... Since I wrote some code to randomize the million integers (other thread), I had might as well re-sort it to eh?

It seems the quicksort and bubblesort are popular in this thread, since this particular competition is dead, I thought I'd throw all of my ints from the array into a huge linkedlist and mergesort 'em... Just to be different...

I'll chop all the random array code out since it's irrelevant in this thread and the code is long enough as it is.


#define SIZE 1000000
#define PRINTNUM 100
#define EMPTY -1

/* This is the linked list node definition */
typedef struct nodestr
{
int data;
struct nodestr *next;
}
node;

node *add (node * head, int a);
void freeAll (node * head);
node *mergeSort (node * here);
node *splitList (node * here);
node *fuse (node * left, node * right);

int intlist[SIZE], rand_intlist[SIZE];
node *llist = NULL;

----------<SNIP>------------

insert array building/randomizing code here...

----------<SNIP>------------

/* begin runtime timer #2 */
time_start2 = time (NULL);

for (i = 0; i < SIZE; i++)
llist = add (llist, rand_intlist[i]);

llist = mergeSort (llist);

for (i = 0; i < PRINTNUM; i++)
{
printf ("%d ", llist->data);
llist = llist->next;
}

freeAll (llist);

/* calculate and display sorting runtime in seconds */
printf ("\n\nRuntime for merge sort was %g seconds.\n",
difftime (time (NULL), time_start2));

return (0);
}


/* adds an element to the list */
node *
add (node * head, int data)
{
node *addMe;
addMe = malloc (sizeof (node));
if (addMe == NULL)
{
printf ("Error: malloc failed in add\n");
exit (1);
}
addMe->data = data;
addMe->next = head;
return (addMe);
}

/* recursively sorts a linked list using two helper methods */
node *
mergeSort (node * here)
{
node *other;

if (here->next == NULL)
return here;
other = splitList (here);
other = mergeSort (other);
here = mergeSort (here);
here = fuse (other, here);
return (here);
}

/* splits the linked list into two pieces to be sorted and merged */
node *
splitList (node * here)
{
node *other = here;
if (here->next == NULL)
return (here);
here = here->next->next;
while (here != NULL)
{
here = here->next;
if (here != NULL)
here = here->next;
other = other->next;
}
here = other;
other = other->next;
here->next = NULL;
return (other);
}

/* fuses two sorted linked lists together, forming a third, sorted, list */
node *
fuse (node * left, node * right)
{
node *sortedHead, *sortedTail;
if (left->data <= right->data)
{
sortedHead = left;
sortedTail = left;
left = left->next;
sortedTail->next = NULL;
}
else
{
sortedHead = right;
sortedTail = right;
right = right->next;
sortedTail->next = NULL;
}
while (left != NULL && right != NULL)
{
if (left->data <= right->data)
{
sortedTail->next = left;
sortedTail = left;
left = left->next;
sortedTail->next = NULL;
}
else
{
sortedTail->next = right;
sortedTail = right;
right = right->next;
sortedTail->next = NULL;
}
}
if (left != NULL)
sortedTail->next = left;
if (right != NULL)
sortedTail->next = right;
return (sortedHead);
}


/* deletes all of the list nodes and frees the allocated memory for each */
void
freeAll (node * head)
{
node *p = head;
while (p != NULL)
{
head = head->next;
free (p);
p = head;
}
}


Here is the output:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26 27 28 29 30 31
32 33 34 35 36 37 38 39 40 41 42 43 44 45
46 47 48 49 50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69 70 71 72 73
74 75 76 77 78 79 80 81 82 83 84 85 86 87
88 89 90 91 92 93 94 95 96 97 98 99

Runtime for merge sort was 5 seconds.


Sorry about the length... I just wanted to share.. This is fun!@ :)

Cheers,
Arker.