Drims -- I made a programming language!


phubuh
11-04-2002, 11:27 AM
About two weeks ago, I was bored, and decided to write an interpreter. I had never done that before, so my code got quite ugly, but after a few rewrites it looks pretty good. Now, I'm sharing with you what I hacked out that night.


Welcome to the interactive Drims environment!
>> "Hello!".
String("Hello!")
>> 5000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.
Integer(5000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)
>> [5000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 squared].
Integer(25000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)
>> [squared 500].
Integer(250000)
>> [56 plus 40].
Integer(96)
>> [[10 plus 2] isGreaterThan [9 plus 3]].
Boolean(No)
>> $bool = [[10 plus 2] equals [9 plus 3]].
Boolean(Yes)
>> $bool.
Boolean(Yes)
>> [$bool negated].
Boolean(No)
>> [$system print "hello!"].
"hello!"
String("hello!")
>> $aCodeBlock = {[$system print "hello!"].}.
org.phubuh.drims2.parser.Codeblock@5483cd
>> [$bool test: ifTrue: $aCodeBlock ifFalse: {[$system print ":-("].}].
"hello!"
String("hello!")
>> $bool = [$bool negated].
Boolean(No)
>> [$bool test: ifTrue: $aCodeBlock ifFalse: {[$system print ":-("].}].
":-("
String(":-(")
>> [plus 10 59].
Integer(69)
>> [[$system readLine] toUpperCase].
hello! lol! lol! lol!
String("HELLO! LOL! LOL! LOL!")
>> [lengthOf "hello"].
Integer(5)
>> [9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 99999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 squared].
Integer(99999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 9980000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001)
>>

First of all: Everything is an object. A variable is an object, a codeblock is an object, a string is an object, an integer is an object, etc.

Second of all: Anything that does anything at all is a method. There are no special magic keywords. For example, the 'if' keyword that exists in basically all modern languages is implemented in Drims as a keyword message (I'll get to those later) on the Boolean object. So is 'while'.

In the last paragraph, I said 'methods' so you'd understand what I talked about -- but in Drims, methods are called messages. There are three types of messages:

Unary messages. These include Integer's 'squared' and String's 'toUpperCase'. They are called unary, because they know only one thing: which object to send themselves to.

Binary messages. Integer's 'plus', String's 'concatenatedWith', etc. These know which object to send themselves to, and they also carry one object (the argument) to give to the receiver.

Keyword messages. So far, only Boolean's 'test'. These are the only messages that can take multiple arguments. What separates these from the run-of-the-mill f(x, y, z) is that all arguments are required to have names: f(firstArgument: x secondArgument: y thirdArgument: z). The keywords can be arranged however you want. They don't even have an internal order.
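To make the three kinds concrete, here is a rough sketch in Python of how a message could be represented as plain data. The `Message` class and its field names are assumptions for illustration only; the thread does not show how Drims stores messages internally.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    """A message as plain data (hypothetical representation, not Drims' own)."""
    selector: str
    argument: object = None                        # binary messages carry one argument
    keywords: dict = field(default_factory=dict)   # keyword messages carry named arguments

    @property
    def kind(self) -> str:
        if self.keywords:
            return "keyword"
        if self.argument is not None:
            return "binary"
        return "unary"


squared = Message("squared")                       # like [500 squared]
plus = Message("plus", argument=40)                # like [56 plus 40]
test_a = Message("test", keywords={"ifTrue": "A", "ifFalse": "B"})
test_b = Message("test", keywords={"ifFalse": "B", "ifTrue": "A"})

print(squared.kind, plus.kind, test_a.kind)   # unary binary keyword
print(test_a == test_b)                       # True: dict comparison ignores key order
```

Storing the keyword arguments in a dictionary is one simple way to get the "no internal order" property: two keyword messages compare equal no matter how the keywords were written.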


{ ... } blocks are what I like to call lambda subroutines. For those not into functional programming: lambda functions (they usually live in functional languages, but I've heard that Python has them too) are anonymous functions that can be passed as arguments, assigned to variables, and used in basically any way you can use regular values. My codeblocks are like those, only you can't pass arguments to them.

Codeblocks are useful for stuff like Boolean's test. test takes two keyword arguments: ifTrue and ifFalse. (Actually, you can pass just one of the two, but that doesn't matter.) Those arguments are supposed to be codeblocks. When called upon, the test method checks its own value, and sends the respective argument the 'evaluate' message, which evaluates the codeblock.
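The mechanism described here can be sketched in Python as follows. This is only an illustration of the idea, not the Java code behind Drims; the class names `Codeblock` and `Boolean` and the `log` list are assumptions made for the example.

```python
class Codeblock:
    """Deferred code: nothing runs until the block receives 'evaluate'."""

    def __init__(self, body):
        self.body = body          # a zero-argument callable, since blocks take no arguments

    def evaluate(self):
        return self.body()


class Boolean:
    def __init__(self, value):
        self.value = value

    def negated(self):
        return Boolean(not self.value)

    def test(self, ifTrue=None, ifFalse=None):
        # Pick the branch matching our own value; either keyword may be omitted.
        chosen = ifTrue if self.value else ifFalse
        return chosen.evaluate() if chosen is not None else None


log = []  # records which blocks actually ran
a_code_block = Codeblock(lambda: log.append("hello!") or "hello!")
sad_block = Codeblock(lambda: log.append(":-(") or ":-(")

flag = Boolean(True)
print(flag.test(ifTrue=a_code_block, ifFalse=sad_block))             # hello!
print(flag.negated().test(ifTrue=a_code_block, ifFalse=sad_block))   # :-(
print(log)                                                           # ['hello!', ':-(']
```

Because the branches are passed as unevaluated blocks, only the chosen one ever runs, which is what lets 'if' be an ordinary message instead of a keyword.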

I haven't implemented the syntax for creating your own classes or methods yet, but I'll do that whenever I have the time. Actually, I haven't figured out how stuff like inheritance and polymorphism will work yet, but I guess I'll do that too.

So, any comments are gladly accepted. Preferably constructive criticism though. :)

phubuh
11-04-2002, 11:29 AM
Oh, I forgot to mention one thing. Integers are stored as Strings in memory, so the only limit for integer arithmetic is your memory!
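For readers wondering how arithmetic on string-stored integers can work, here is a small Python sketch of schoolbook addition on decimal strings. It is an assumption about the general technique, not a copy of what Drims does; the function name `add_decimal_strings` is made up for the example.

```python
def add_decimal_strings(a: str, b: str) -> str:
    """Add two non-negative integers given as decimal strings."""
    digits = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    # Walk both numbers from the least significant digit, carrying as we go.
    while i >= 0 or j >= 0 or carry:
        total = carry
        if i >= 0:
            total += ord(a[i]) - ord("0")
            i -= 1
        if j >= 0:
            total += ord(b[j]) - ord("0")
            j -= 1
        digits.append(chr(ord("0") + total % 10))
        carry = total // 10
    return "".join(reversed(digits)) or "0"


print(add_decimal_strings("56", "40"))    # 96
print(add_decimal_strings("999", "1"))    # 1000
```

Nothing in the loop depends on a machine word size, so the only limit is how long a string fits in memory.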

jemfinch
11-04-2002, 01:02 PM
Originally posted by phubuh
First of all: Everything is an object. A variable is an object, a codeblock is an object, a string is an object, an integer is an object, etc.

Second of all: Anything that does anything at all is a method. There are no special magic keywords. For example, the 'if' keyword that exists in basically all modern languages is implemented in Drims as a keyword message (I'll get to those later) on the Boolean object. So is 'while'.

In the last paragraph, I said 'methods' so you'd understand what I talked about -- but in Drims, methods are called messages. There are three types of messages:

Unary messages. These include Integer's 'squared' and String's 'toUpperCase'. They are called unary, because they know only one thing: which object to send themselves to.

Binary messages. Integer's 'plus', String's 'concatenatedWith', etc. These know which object to send themselves to, and they also have an object to give the receiver. (argument)

Keyword messages. So far, only Boolean's 'test'. These are the only messages that can give multiple arguments. What separates these from the run-of-the-mill f(x, y, z) is that all arguments are required to have names; f(firstArgument: x secondArgument:y thirdArgument:z). The keywords can be arranged however you want. They don't even have an internal order.



So basically you've reimplemented the Smalltalk object model. It seems kinda strange that you wouldn't mention that as your source of inspiration.


{ ... } blocks are what I like to call lambda subroutines.


In Smalltalk and Self, two languages that use the object model you've implemented, they're just called "blocks."


For those not into functional programming (lambda functions are usually in those languages, but I've heard that Python has them too. Oh well.),


Perl has them too (sub { }); Ruby has them too (in the form of the blocks you have there)


Codeblocks are useful for stuff like Boolean's test. test takes two keyword arguments: ifTrue and ifFalse.


Yep, just like in Smalltalk and Self. Same names and capitalization, too :)


I haven't implemented the syntax for creating your own classes or methods yet, but I'll do that whenever I have the time. Actually, I haven't figured out how stuff like inheritance and polymorphism will work yet, but I guess I'll do that too.


I'm planning to implement something like Smalltalk or Self (more likely Self) in SML at some point in the future, and have been churning over the representation of objects in my head. Since SML is strictly typed, I have to jump through a few more hoops to do it, but basically, objects are sets of closures (their internal data has to be hidden in order to allow objects to have various types of internal data). There is a set of closures that can convert the object into basic SML types (like strings, integers, words, etc.); that's the "internal" (as in, available to the internal language, SML) set of closures. There's also a closure that represents the "external" object (i.e., the language implemented, SelfInSML or whatever), which takes "messages" (which, as in Smalltalk and Self, can be unary, binary, or keyword) and returns other objects. Then (and this is where inheritance comes in) there's a list of other objects, which is the inheritance hierarchy. Special processing would be necessary to implement multiple inheritance well, but simple single inheritance would be easy to implement.

Obviously, some more thought will have to go into that, but that's the basic idea of things...
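As a rough illustration of the "message closure plus list of parents" idea, here is a Python sketch. It is one possible reading of the description above, not the planned SML design; `make_object`, `MessageNotUnderstood`, and the example objects are all hypothetical names.

```python
class MessageNotUnderstood(Exception):
    """Raised when neither an object nor its parents handle a selector."""


def make_object(handlers, parents=()):
    """An object is a closure over its own handlers plus a list of parent objects."""

    def receive(selector, *args):
        if selector in handlers:
            return handlers[selector](*args)
        for parent in parents:                 # walk the inheritance list in order
            try:
                return parent(selector, *args)
            except MessageNotUnderstood:
                continue
        raise MessageNotUnderstood(selector)

    return receive


animal = make_object({"describe": lambda: "some animal", "legs": lambda: 4})
dog = make_object({"describe": lambda: "a dog"}, parents=[animal])

print(dog("describe"))   # a dog  (own handler wins)
print(dog("legs"))       # 4      (found on the parent)
```

With a single parent this is plain single inheritance; with several parents the first match in list order wins, which is the simplest (and not necessarily the best) answer to multiple inheritance.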


So, any comments are gladly accepted. Preferably constructive criticism though. :)

I just think you should've mentioned where you got your ideas, that's all.

What language are you implementing Drims in, btw?

Jeremy

sans-hubris
11-04-2002, 01:46 PM
You should add support for lexicals! And closures too!

phubuh
11-04-2002, 06:29 PM
Wow, jemfinch, that was rather harsh.

I'm being completely honest when I say this:
I have never programmed Smalltalk nor Self.

You see, my brother and I were eating kebab, and got to discussing how to construct a programming language. He brought up the object model, and lots of other things. Occasionally, he would say something like "just like in Smalltalk" -- but having never programmed Smalltalk, I didn't care much.

I wrote this interpreter (in Java, by the way) for the sole reason of teaching myself to do it. To improve as a programmer. You see, I'm fourteen years old, and I don't yet have access to any programming classes or anything like that, so I just come up with stuff to write and write it. If I had programmed Smalltalk, I probably still would have used 'their' object model and message passing system -- but of course, I'd pay them homage when talking about the language I implemented.

jemfinch
11-04-2002, 07:00 PM
Originally posted by phubuh
Wow, jemfinch, that was rather harsh.

I'm being completely honest when I say this:
I have never programmed Smalltalk nor Self.

You see, my brother and I were eating kebab, and got to discussing how to construct a programming language. He brought up the object model, and lots of other things. Occasionally, he would say something like "just like in Smalltalk" -- but having never programmed Smalltalk, I didn't care much.


I wasn't being harsh, I was being realistic. You'll find out very quickly when you get to college that the only difference between plagiarism (passing off someone's ideas as one's own) and research is attribution. Your brother attributed when he occasionally mentioned Smalltalk in describing the object model -- you didn't, and that's where I think you went wrong.

You see, when you post something that's so conspicuously based on Smalltalk/Self, but don't actually mention either, it seems like you're trying to pass someone else's ideas off as your own. My immediate response was, "Does he think we're not smart enough to notice the similarities between this Drims and Smalltalk?" I figured you'd have a decent reason for not mentioning Smalltalk/Self (and I have no reason to doubt the explanation you just gave) but the correction, IMO, needed to be made nonetheless, since it can only get worse as you get older. People are fired or expelled or blackballed for mistakes of this kind at higher levels.


If I had programmed Smalltalk, I probably still would have used 'their' object model and message passing system -- but of course, I'd pay them homage when talking about the language I implemented.

It's a good object system, and it's consistent enough to make writing an interpreter easier than for many other languages. I'm also curious how you solved the problem of accessing data from the Java side (are you going to release the code?).

Just remember to give credit where credit is due next time :)

Jeremy

kmj
11-04-2002, 07:11 PM
phubuh: can we see the code? :)

madMoney
11-21-2002, 10:10 PM
phubuh, this is really interesting, and i keep hoping you'll post the source for us :p

are you still working on drims or what? do you plan on letting us see how you made it work?

sicarius
11-22-2002, 09:57 PM
I know the question wasn't directed at me, but I'm not sure what you (jemfinch) meant when you asked how he solved the problem of accessing data from the Java side. Could you clarify that some? I can't answer it regardless, but I'm trying to figure out how there would even be a problem.

jemfinch
11-25-2002, 09:45 PM
Originally posted by sicarius
I know the question wasn't directed at me, but I'm not sure what you (jemfinch) meant when you asked how he solved the problem of accessing data from the Java side. Could you clarify that some? I can't answer it regardless, but I'm trying to figure out how there would even be a problem.

I started answering this awhile back, but I went to use the restroom and the computer (it wasn't mine) crashed in the meantime, so I never got back around to answering this. I'll be implementing some of the stuff I'm talking about sooner or later, in a language I'll probably call "SMiLe".

Anyway, here goes. First, to understand most of what I'm saying, you'll probably want to understand the Hindley-Milner (HM) type system, used in ML (O'Caml, SML, etc.), Haskell, and several other languages. If you don't know the HM type system, then mostly what you need to know is that a function can't return two different types. If you want a function to return either a string or an int, you need to define a new datatype like this:


datatype string_or_int = String of string | Int of int


So then, when you call the function, you can be assured of proper typing -- you won't be able to use the string or int returned until you determine whether it's a String or an Int.

So back to the representation of objects in a strictly typed language like SML. Basically, objects in class-based object systems (probably about every OO language you've ever used) are a combination of some state (the internal data, the private attributes, etc.) and a class (the operations on that data, the set of methods, basically.) In a language like C, you might implement an object something like this:

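typedef struct methods methods;  /* e.g. a hash table: name -> function pointer */

struct object {
    void *state;        /* instance data, opaque to callers */
    methods *methods;   /* operations shared by the whole class */
};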

Where type "methods" is something like a hash table mapping strings to pointers to functions taking a void* (the state) and a void* (an object argument; for simplicity's sake, we'll assume there's only one argument allowed) and returning a void* (the object return value). Then, when you wanted to call a method on an object, you'd do something like (pardon my C if it's bad, I haven't used C in a long while):

/* has_method and get_method are assumed lookups on the method table. */
void *call_method(struct object *object, const char *method_name, void *argument) {
    if (has_method(object->methods, method_name)) {
        return get_method(object->methods, method_name)(object->state, argument);
    } else {
        return NULL;
    }
}

Or something like that. Anyway, the point is that since a void* can be a pointer to anything, it's easy to implement a class in C, because the state (a void*) can be whatever you want it to be. The state for a string might be "struct { int length; char* value }" whereas the state for a file might be "int fd" (obviously, the state would actually be pointers to both of those, but you get the idea).
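For comparison, the same "state plus shared method table" layout can be written as a runnable Python sketch. Python stands in for C here only because it is easy to run; `Obj`, `call_method`, and `STRING_METHODS` are illustrative names, and the two string methods are assumptions chosen for the example.

```python
from typing import Any, Callable, Dict, NamedTuple, Optional

Method = Callable[[Any, Any], Any]   # (state, argument) -> result


class Obj(NamedTuple):
    state: Any                    # plays the role of void *state
    methods: Dict[str, Method]    # shared by every instance of the "class"


def call_method(obj: Obj, name: str, argument: Any = None) -> Optional[Any]:
    method = obj.methods.get(name)
    if method is None:
        return None               # mirrors the NULL return for unknown methods
    return method(obj.state, argument)


STRING_METHODS: Dict[str, Method] = {
    "length": lambda state, _arg: len(state),
    "concatenatedWith": lambda state, other: Obj(state + other.state, STRING_METHODS),
}

hello = Obj("hello", STRING_METHODS)
print(call_method(hello, "length"))                                             # 5
print(call_method(hello, "concatenatedWith", Obj("!", STRING_METHODS)).state)   # hello!
print(call_method(hello, "nope"))                                               # None
```

Note that "concatenatedWith" reaches straight into `other.state`. That is exactly the kind of untyped access that is free in C (or Python) and becomes the sticking point in a strictly typed language.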

That works fine for C. It doesn't work for strictly typed languages like SML, where we scoff at the idea of having void*s running rampant throughout our program. Don't get me wrong -- you can parameterise a type over another type in SML; you can define a type like this:

datatype 'a list = CONS of 'a * 'a list | NIL

and be able to make lists of any type. But you can't make heterogeneous lists, lists which have more than one type in them. But don't think that limitation is a bad thing -- it catches a lot of errors at compile time, and makes the language a wonderful language to program in. And, of course, you can define a new type (as previously) that can be either String or Int, and thus have both strings and ints in your list.

So you might think that you could implement objects similarly to C in SML. You might think you can do this:

type 'a object = { state: 'a, methods: 'a methods }

And be happy. You could do that, but you wouldn't be happy. You'd basically be restricting your programming language to have state of only one type -- you couldn't, say, make an abstract syntax tree that had objects with different states just like you couldn't have a list of objects with different states.

So the hard part in SML is getting that 'a out of there and still making it easy to write extension types in SML. The secret is hiding the 'a type parameter under the "sheep's clothing" of closures. We can use any state representation we want to use, but instead of having an object be a pair of the state and the methods that operate on the state, we'll instead just have an object defined by its methods, except now, we'll have different methods for every instance. We'll make closures that hide the state so only the methods can see it.

Let's say I want to define a bool class in SML this way. Here's what it might look like:

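(* An object is its methods: a list of (name, method) pairs. SML type
   abbreviations can't be recursive, so a datatype constructor wraps the list. *)
datatype object = Object of (string * method) list
(* A method is a function taking an object and returning an object. *)
withtype method = object -> object

(* sendMsg is not defined in this post; this is one possible definition.
   An object with no methods stands in for "nothing". *)
val none = Object []
fun sendMsg (Object ms, name) =
  case List.find (fn (n, _) => n = name) ms of
      SOME (_, m) => m none
    | NONE => none

fun makeSMiLeBool (b : bool) : object =
  Object [ ("negated", fn _ => makeSMiLeBool (not b)),
           ("ifTrue", fn block => if b then sendMsg (block, "value") else none) ]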

So the actual underlying state representation (the boolean value, either true or false) is hidden by the closure. So something like that could be well-typed in SML. (Obviously, in an actual language, rather than use an association list of string, method pairs, I'd use something faster like a hash table.)
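The closure trick itself can be tried out in a few lines of Python. This is only a sketch of the idea under the same assumptions as the SML above: objects are method tables, and `send`, `make_bool`, and `make_block` are hypothetical helper names. Python does not enforce the hiding the way SML's type system would, but the structure is the same.

```python
def send(obj, selector, argument=None):
    """Look the selector up in the receiver's own method table."""
    method = obj.get(selector)
    return method(argument) if method is not None else None


def make_bool(b: bool):
    # `b` is captured by the lambdas; the method table itself never exposes it.
    return {
        "negated": lambda _arg: make_bool(not b),
        "ifTrue": lambda block: send(block, "value") if b else None,
    }


def make_block(body):
    # A block is an object whose only method, "value", runs the wrapped code.
    return {"value": lambda _arg: body()}


yes = make_bool(True)
print(send(yes, "ifTrue", make_block(lambda: "ran")))                    # ran
print(send(send(yes, "negated"), "ifTrue", make_block(lambda: "ran")))   # None
```

Every instance gets its own freshly built method table, which is the price paid for dropping the separate state field.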

But there's another problem. It's all swell when we're writing objects that don't really interact with other objects, but what happens when we want to implement numbers in SMiLe? It might look like this:

fun makeSMiLeNumber (i : int) =
  [ ("+", (fn otherNumber => (* What goes here? *) ...)) ]


What goes there? How do we get the underlying integer value behind the object otherNumber? We need that if we're going to be able to add numbers or do anything significant with the extension types we write in SML.

So we need a new idea. Here's the best way I've thought of so far (as ugly as it is...):


(* This basically replaces the void* in C, but it's more typesafe *)
datatype sml_data = Int of int
                  | String of string
                  | Word of word
                  | File of file_type (* I forget the name of the type at the moment *)
                  | Array of sml_data array
                  | List of sml_data list
                  | Ref of sml_data ref

type state = sml_data
(* object and method refer to each other, so a datatype is needed here *)
datatype object = Object of (string * method) list
withtype method = state * object -> object


It's unfortunate, and ugly, and extending it to include a different state type means extending the sml_data type, but it might work. I'm still trying to think of a prettier way to do it, though.
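Here is a small Python sketch of how a tagged state type lets "+" get at the other number's value. It is one possible reading of the scheme, with loud assumptions: each object exposes a hypothetical "state" entry returning its tagged data, only two of the sml_data cases (`Int` and `Str`) are modelled, and `make_number` and `make_string` are made-up names.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Int:
    value: int


@dataclass(frozen=True)
class Str:
    value: str


SmlData = Union[Int, Str]   # stands in for the sml_data datatype (two cases only)


def make_number(i: int):
    def plus(other):
        data = other["state"]()              # ask the other object for its tagged state
        if not isinstance(data, Int):        # the tag must be checked before use
            raise TypeError("'+' expects a number")
        return make_number(i + data.value)

    return {"state": lambda: Int(i), "+": plus}


def make_string(s: str):
    return {"state": lambda: Str(s)}


total = make_number(56)["+"](make_number(40))
print(total["state"]())   # Int(value=96)
```

The tag check is where the scheme earns its "more typesafe than void*" label: passing a string to "+" fails with a clear error instead of being reinterpreted as an integer. The cost is the one already noted, namely that a new kind of state means adding a new case to the tagged type.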

Anyway, I'm curious how it's done in other strictly typed languages. In Java, I would imagine it involves a lot of casts to Object, but anyway...

Jeremy

sicarius
11-26-2002, 10:30 PM
I'm sure I understood what you are saying, Jem, but if he is writing the interpreter, can't he just define how data is stored for his language any way he likes? From what you posted, I got the impression that you were expecting him to translate his language into some target language, like from his Drims to SML. In that case I can see the problem of data access. On the other hand, if he simply defines a grammar for the language, parses it, and then stores whatever data he needs however he likes, the rest should be simple enough.
Maybe I'm missing something.

jemfinch
11-26-2002, 11:54 PM
Originally posted by sicarius
Maybe I'm missing something.

Yeah, you are, but that's ok :)


I'm sure I understood what you are saying, Jem, but if he is writing the interpreter, can't he just define how data is stored for his language any way he likes?


Of course he can. But objects have to be implemented somehow, and that somehow has to hold the state of the object in some form. The problem is that in a strictly typed language like SML, all the objects have to either (a) hide their state so nothing else sees it, or (b) use the same datatype to hold their state. Choice (b) isn't very friendly, because it's either very dynamic (and that dynamism leads to more bugs), or because it's not extensible (some objects need some brand-new never-before-conceived state representation, and that's just not possible in this scheme without modifying the interpreter code itself.) Choice (a) is conceivable, but not possible: at some point, you'll need to get at the state of an object, and at that point you've devolved into choice (b).


From what you posted I got the impression that you were expecting him to translate his language into some target language.


No, not at all. But a language has to be implemented, and that implementation has to choose some way to represent objects or values in that language.


On the other hand, if he simply defines a grammar for the language, parses it, and then stores whatever data he needs however he likes, the rest should be simple enough.


The "however he likes" part is the part that isn't possible :)

There is another way around the implementation I presented before, but I can't say much about it now, I've got some testing and some more conceptualization to do before it's ready :)

Jeremy