• Welcome to Valhalla Legends Archive.
 

Lexical parsers and generators

Started by Telos, March 04, 2004, 10:14 AM

Previous topic - Next topic

Telos

A project that Im currently working on takes input in the form of a string and needs to parse it into respective tokens of a language.  While looking around at Flex and Bison (Lex and Yacc more accurately) as a method of making my program do what I want and decided that I want to write my own simply for the experience.  If anyone has suggestions or references on how to create something like this Id appreciate it

iago

I wish I'd kept my notes from Compiler class, but I didn't.  They'll be back online in september, probably, at http://www.cs.umanitoba.ca/~cs329.  That'll probably be too late, though.
This'll make an interesting test for broken AV:
QuoteX5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*


Adron

So, what did you decide?
To write your own flex and bison?
To use flex and bison to write your own application?
To write your own application without the help of flex or bison?

What will you need to do, read tokens or understand statements made up of tokens?

Do you understand the responsibilities of flex and bison respectively?





Telos

Id like to write my application without flex and bison but Im not familiar with the theory behind doing it.  My program will need to take a string of input and parse it into respective tokens as well as evaluating any functions in the line.  The language is clearly defined tho its just a matter of turning the string into an appropriate tree of tokens.

Adron

I believe flex and bison should be your tools of choice. But apart from that, you could just reimplement them in your code if you like. They're generic implementations of well known algorithms, and if you would like to rewrite them, go ahead.

For the scanner, you may want to have a large set of regular expressions, checking the stream against them all in parallell. If you get to the end of a regular expression, consume as many characters as it accepts, and pass the token represented by the regular expression on to the parser.

For the parser, there are several options. If your language is interpretable and you can make a decision what to do with a token based on it and a lookahead of the next token, you might implement it as a lot of functions, each function representing a node in the parse tree.

The first function will read a token, and based on that and a lookahead decide which function to call next. This works for some languages, but not most. You could also write a bison-style parser that reads tokens into strings and then simplifies them according to rules. But I don't see what the advantage would be to writing your own suboptimal solution when you could use the one that exists.

Telos

Mainly for the learning experience involved in creating something like that but if its really going to be that much better to use flex and bison then I will do that.

Adron

I just believe that learning to use flex and bison, and understanding the theory behind their operation is a much better way of spending your time than trying to implement their algorithms.

You will need to understand the theory behind them to be able to use them effectively, including debugging your application when it doesn't behave the way you expect. So you will still learn how to parse.

And it's not that they are all that hard to implement. I've done it for small cases on paper just to understand, but it's complex enough that I wouldn't want to actually implement the code for it.

Telos

All right I will be doing that instead of attempting to write the parser myself.  The "language" Im looking to implement consists of maybe 6 operators a set of user-defined functions and a lot of parentheses so I thought it might be easy to do without an extra tool

Adron

The biggest advantage to using flex and bison is that you won't be locking yourself into a solution. You can easily extend the language later without getting bitten by design decisions you made much earlier to simplify coding the parser.

Telos

In this case the language is already defined and will not expand any time soon which is part of why it was attractive enough to make me want to write my own parser but I figure Ill learn from flex and bison first.

Adron

Show the definition of the language?

Telos

I dont know of any online specification its first order logic

Operators:
OR (As in P OR Q)
AND (As in P AND Q)
XOR (As in P XOR Q)
NOT (As in NOT P or NOT Q)
IMPLIES (As in P IMPLIES Q)
IFF (As in P IFF Q) [If and only if]

Quantifiers:
FORALL ( FOR ALL P Condition(x) IMPLIES Condition2(x) )
THEREIS ( THEREIS P Condition(x) IMPLIES Condition2(x) )

Unification:
UNIFY ( Condition(x, y), Condition(y, z) ) = { x/z }

Adron

It doesn't seem that simple with all the syntax around it. It would require some thinking to determine whether it can be easily solved at all.