# HG changeset patch # User windel # Date 1356363322 -3600 # Node ID fe145e42259dd621ae81f66cfd6c78cc6330c63c # Parent 6efbeb90377702b1db95396d86ae2fb3030b048c Fixes after movage diff -r 6efbeb903777 -r fe145e42259d python/codeeditor.py --- a/python/codeeditor.py Mon Dec 24 15:03:30 2012 +0100 +++ b/python/codeeditor.py Mon Dec 24 16:35:22 2012 +0100 @@ -1,6 +1,6 @@ from PyQt4.QtCore import * from PyQt4.QtGui import * -import compiler.lexer +#import compiler.lexer import os.path class MySyntaxHighlighter(QSyntaxHighlighter): @@ -11,9 +11,9 @@ fmt = QTextCharFormat() fmt.setForeground(Qt.darkBlue) fmt.setFontWeight(QFont.Bold) - for kw in compiler.lexer.keywords: - pattern = '\\b'+kw+'\\b' - self.rules.append( (pattern, fmt) ) + #for kw in compiler.lexer.keywords: + # pattern = '\\b'+kw+'\\b' + # self.rules.append( (pattern, fmt) ) # Comments: fmt = QTextCharFormat() diff -r 6efbeb903777 -r fe145e42259d python/ide.py --- a/python/ide.py Mon Dec 24 15:03:30 2012 +0100 +++ b/python/ide.py Mon Dec 24 16:35:22 2012 +0100 @@ -10,8 +10,9 @@ # Compiler imports: from project import Project -from ppci import Compiler -from widgets import CodeEdit, AstViewer +from ppci import KsCompiler +from astviewer import AstViewer +from codeeditor import CodeEdit lcfospng = base64.decodestring(b'iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAYAAABzenr0AAAAAXNSR0IArs4c6QAAAAZiS0dEAP8A\n/wD/oL2nkwAAAAlwSFlzAAALEwAACxMBAJqcGAAAAAd0SU1FB9sJEhMKBk7B678AAAA/SURBVFjD\n7dbBCQAgDATBi9h/y7EFA4Kf2QLCwH1S6XQu6sqoujublc8BAAAAAAAAAAB8B+zXT6YJAAAAAKYd\nWSgFQNUyijIAAAAASUVORK5CYII=\n') @@ -116,7 +117,7 @@ def __init__(self, parent=None): super(Ide, self).__init__(parent) self.setWindowTitle('LCFOS IDE') - self.compiler = Compiler() + self.compiler = KsCompiler() icon = QPixmap() icon.loadFromData(lcfospng) self.setWindowIcon(QIcon(icon)) diff -r 6efbeb903777 -r fe145e42259d python/ppci/compilers/kscompiler.py --- a/python/ppci/compilers/kscompiler.py Mon Dec 24 15:03:30 2012 +0100 +++ b/python/ppci/compilers/kscompiler.py Mon Dec 24 16:35:22 2012 +0100 @@ -1,10 +1,11 @@ import hashlib # Import compiler components: -from ..frontends.ks import Parser -from .codegenerator import CodeGenerator -from .nodes import ExportedSymbol -from .errors import CompilerException -from .. import version +from ..frontends.ks import KsParser +#from .codegenerator import CodeGenerator +#from .nodes import ExportedSymbol +#from .errors import CompilerException +#from .. import version +version='0.0.1' class KsCompiler: def __repr__(self): diff -r 6efbeb903777 -r fe145e42259d python/ppci/frontends/ks/__init__.py --- a/python/ppci/frontends/ks/__init__.py Mon Dec 24 15:03:30 2012 +0100 +++ b/python/ppci/frontends/ks/__init__.py Mon Dec 24 16:35:22 2012 +0100 @@ -1,3 +1,10 @@ -from parser import Parser +""" + Frontend for the K# language. + + This module can parse K# code and create LLVM intermediate code. +""" + +from .parser import KsParser + diff -r 6efbeb903777 -r fe145e42259d python/ppci/frontends/ks/nodes.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/python/ppci/frontends/ks/nodes.py Mon Dec 24 16:35:22 2012 +0100 @@ -0,0 +1,310 @@ +""" +Parse tree elements +""" +class Node: + location = None + def getChildren(self): + children = [] + members = dir(self) + for member in members: + member = getattr(self, member) + if isinstance(member, Node): + children.append(member) + elif type(member) is list: + for mi in member: + if isinstance(mi, Node): + children.append(mi) + return children + +class Symbol(Node): + pass + +class Id(Node): + def __init__(self, name): + self.name = name + def __repr__(self): + return 'ID {0}'.format(self.name) + +# Selectors: +class Field(Node): + def __init__(self, fieldname): + self.fieldname = fieldname + def __repr__(self): + return 'FLD {0}'.format(self.fieldname) + +class Index(Node): + def __init__(self, index, typ): + self.index = index + self.typ = typ + def __repr__(self): + return 'IDX {0}'.format(self.index) + +class Deref(Node): + pass + +class Designator(Node): + def __init__(self, obj, selectors, typ): + self.obj = obj + self.selectors = selectors + self.typ = typ + def __repr__(self): + return 'DESIGNATOR {0}, selectors {1}, type {2}'.format(self.obj, self.selectors, self.typ) + +""" +Type classes +""" +def isType(a, b): + """ Compare types a and b and check if they are equal """ + if type(a) is type(b): + if type(a) is BaseType: + return (a.name == b.name) and (a.size == b.size) + elif type(a) is ArrayType: + return (a.dimension == b.dimension) and isType(a.elementType, b.elementType) + elif type(a) is ProcedureType: + if len(a.parameters) != len(b.parameters): + print('Number of parameters does not match') + return False + for aparam, bparam in zip(a.parameters, b.parameters): + if not isType(aparam.typ, bparam.typ): + print('Parameter {0} does not match parameter {1}'.format(aparam, bparam)) + return False + if a.result is None: + # TODO: how to handle a None return type?? + pass + if not isType(a.result, b.result): + print('Procedure return value mismatch {0} != {1}'.format(a.result, b.result)) + return False + return True + else: + print(a) + print(b) + Error('Not implemented {0}'.format(a)) + else: + return False + +class Type: + def isType(self, b): + return isType(self, b) + +class BaseType(Type): + def __init__(self, name, size): + self.name = name + self.size = size + def __repr__(self): + return '[TYPE {0}]'.format(self.name) + +class NilType(Node): + # TODO: how to handle nil values?? + def __repr__(self): + return 'NILTYPE' + +class ArrayType(Type): + def __init__(self, dimension, elementType): + self.dimension = dimension + self.elementType = elementType + self.size = elementType.size * dimension + def __repr__(self): + return '[ARRAY {0} of {1}]'.format(self.dimension, self.elementType) + +class RecordType(Type): + def __init__(self, fields): + self.fields = fields + self.size = 0 + for fieldname in self.fields: + self.size += self.fields[fieldname].size + def __repr__(self): + return '[RECORD {0}]'.format(self.fields) + +class PointerType(Type): + def __init__(self, pointedType): + self.pointedType = pointedType + self.size = 8 + def __repr__(self): + return '[POINTER {0}]'.format(self.pointedType) + +class ProcedureType(Type): + def __init__(self, parameters, returntype): + self.parameters = parameters + self.returntype = returntype + def __repr__(self): + return '[PROCTYPE {0} RET {1}]'.format(self.parameters, self.returntype) + +class DefinedType(Type): + def __init__(self, name, typ): + self.name = name + self.typ = typ + def __repr__(self): + return 'Named type {0} of type {1}'.format(self.name, self.typ) + +# Classes for constants like numbers and strings: +class StringConstant(Symbol): + def __init__(self, txt): + self.txt = txt + self.typ = 'string' + def __repr__(self): + return "STRING '{0}'".format(self.txt) + +# Variables, parameters, local variables, constants: +class Constant(Symbol): + def __init__(self, value, typ, name=None, public=False): + self.name = name + self.value = value + self.typ = typ + self.public = public + def __repr__(self): + return 'CONSTANT {0} = {1}'.format(self.name, self.value) + +class Variable(Symbol): + def __init__(self, name, typ, public): + self.name = name + self.typ = typ + self.public = public + self.isLocal = False + self.isReadOnly = False + self.isParameter = False + def __repr__(self): + txt = '[public] ' if self.public else '' + return '{2}VAR {0} : {1}'.format(self.name, self.typ, txt) + +class Parameter(Node): + """ A parameter has a passing method, name and typ """ + def __init__(self, kind, name, typ): + self.kind = kind + self.name = name + self.typ = typ + def __repr__(self): + return 'PARAM {0} {1} {2}'.format(self.kind, self.name, self.typ) + +# Operations: +class Unop(Node): + def __init__(self, a, op, typ): + self.a = a + self.op = op # Operation: '+', '-', '*', '/', 'mod' + self.typ = typ + self.place = None + def __repr__(self): + return 'UNOP {0}'.format(self.op) + +class Binop(Node): + def __init__(self, a, op, b, typ): + self.a = a + self.b = b + self.op = op # Operation: '+', '-', '*', '/', 'mod' + self.typ = typ # Resulting type :) + self.place = None + def __repr__(self): + return 'BINOP {0} {1}'.format(self.op, self.typ) + +class Relop(Node): + def __init__(self, a, relop, b, typ): + self.a = a + self.relop = relop + self.b = b + self.typ = typ + def __repr__(self): + return 'RELOP {0}'.format(self.relop) + +# Modules +class Module(Node): + def __init__(self, name): + self.name = name + def __repr__(self): + return 'MODULE {0}'.format(self.name) + +# Imports and Exports: +class ImportedSymbol(Node): + def __init__(self, modname, name): + self.modname = modname + self.name = name + def __repr__(self): + return 'IMPORTED SYMBOL {0}'.format(self.name) + +class ExportedSymbol(Node): + def __init__(self, name, typ): + self.name = name + self.typ = typ + def __repr__(self): + return 'EXPORTED PROCEDURE {0} : {1}'.format(self.name, self.typ) + +# Procedure types +class BuiltinProcedure(Node): + def __init__(self, name, typ): + self.name = name + self.typ = typ + def __repr__(self): + return 'BUILTIN PROCEDURE {0} : {1}'.format(self.name, self.typ) + +class Procedure(Symbol): + """ Actual implementation of a function """ + def __init__(self, name, typ, block, symtable, retexpr): + self.name = name + self.block = block + self.symtable = symtable + self.typ = typ + self.retexpr = retexpr + def __repr__(self): + return 'PROCEDURE {0} {1}'.format(self.name, self.typ) + +# Statements +class StatementSequence(Node): + def __init__(self, statements): + self.statements = statements + def __repr__(self): + return 'STATEMENTSEQUENCE' + +class EmptyStatement(Node): + def __repr__(self): + return 'EMPTY STATEMENT' + +class Assignment(Node): + def __init__(self, lval, rval): + self.lval = lval + self.rval = rval + def __repr__(self): + return 'ASSIGNMENT' + +class ProcedureCall(Node): + def __init__(self, proc, args): + self.proc = proc + self.args = args + self.typ = proc.typ.returntype + def __repr__(self): + return 'CALL {0} '.format(self.proc) + +class IfStatement(Node): + def __init__(self, condition, truestatement, falsestatement=None): + self.condition = condition + self.truestatement = truestatement + self.falsestatement = falsestatement + def __repr__(self): + return 'IF-statement' + +class CaseStatement(Node): + def __init__(self, condition): + self.condition = condition + def __repr__(self): + return 'CASE-statement' + +class WhileStatement(Node): + def __init__(self, condition, statements): + self.condition = condition + self.dostatements = statements + def __repr__(self): + return 'WHILE-statement' + +class ForStatement(Node): + def __init__(self, variable, begin, end, increment, statements): + self.variable = variable + self.begin = begin + self.end = end + self.increment = increment + self.statements = statements + def __repr__(self): + return 'FOR-statement' + +class AsmCode(Node): + def __init__(self, asmcode): + self.asmcode = asmcode + def __repr__(self): + return 'ASM CODE' + diff -r 6efbeb903777 -r fe145e42259d python/ppci/frontends/ks/parser.py --- a/python/ppci/frontends/ks/parser.py Mon Dec 24 15:03:30 2012 +0100 +++ b/python/ppci/frontends/ks/parser.py Mon Dec 24 16:35:22 2012 +0100 @@ -1,49 +1,651 @@ """ - This module define a grammar for the 'K#' language. + This module parses source code into an abstract syntax tree (AST) """ +from .symboltable import SymbolTable from .nodes import * -from .errors import CompilerException, Error -from .modules import loadModule -from .display import printNode -from .builtin import * -from . import assembler +from ...core.errors import CompilerException, Error +#from .modules import loadModule +#from .display import printNode +#from .builtin import * +#from . import assembler + +class KsParser: + def __init__(self, tokens): + """ provide the parser with the tokens iterator from the lexer. """ + self.tokens = tokens + self.NextToken() + self.errorlist = [] + + def Error(self, msg): + raise CompilerException(msg, self.token.row, self.token.col) + + # Lexer helpers: + def Consume(self, typ=''): + if self.token.typ == typ or typ == '': + v = self.token.val + self.NextToken() + return v + else: + self.Error('Excected: "{0}", got "{1}"'.format(typ, self.token.val)) + + def hasConsumed(self, typ): + if self.token.typ == typ: + self.Consume(typ) + return True + return False + + def NextToken(self): + self.token = self.tokens.__next__() + # TODO: store filename in location? + self.location = (self.token.row, self.token.col) + + # Helpers to find location of the error in the code: + def setLocation(self, obj, location): + obj.location = location + return obj + def getLocation(self): + return self.location + + """ + Recursive descent parser functions: + A set of mutual recursive functions. + Starting symbol is the Module. + """ + def parseModule(self): + self.imports = [] + loc = self.getLocation() + self.Consume('module') + modname = self.Consume('ID') + self.Consume(';') + mod = Module(modname) + + # Construct a symbol table for this program + mod.symtable = SymbolTable() + # Add built in types and functions: + for x in [real, integer, boolean, char, chr_func]: + mod.symtable.addSymbol(x) + + self.cst = mod.symtable + self.parseImportList() + + self.parseDeclarationSequence() + # Procedures only allowed in this scope + self.parseProcedureDeclarations() + + if self.hasConsumed('begin'): + mod.initcode = self.parseStatementSequence() + else: + mod.initcode = EmptyStatement() + + self.Consume('end') + endname = self.Consume('ID') + if endname != modname: + self.Error('end denoter must be module name') + self.Consume('.') + + mod.imports = self.imports + return self.setLocation(mod, loc) + + # Import part + def parseImportList(self): + if self.hasConsumed('import'): + self.parseImport() + while self.hasConsumed(','): + self.parseImport() + self.Consume(';') + + def parseImport(self): + loc = self.getLocation() + modname = self.Consume('ID') + mod = loadModule(modname) + self.setLocation(mod, loc) + self.cst.addSymbol(mod) + + # Helper to parse an identifier defenitions + def parseIdentDef(self): + loc = self.getLocation() + name = self.Consume('ID') + ispublic = self.hasConsumed('*') + # Make a node of this thing: + i = Id(name) + i.ispublic = ispublic + return self.setLocation(i, loc) + + def parseIdentList(self): + ids = [ self.parseIdentDef() ] + while self.hasConsumed(','): + ids.append( self.parseIdentDef() ) + return ids + + def parseQualIdent(self): + """ Parse a qualified identifier """ + name = self.Consume('ID') + if self.cst.has(Module, name): + modname = name + mod = self.cst.get(Module, modname) + self.Consume('.') + name = self.Consume('ID') + # Try to find existing imported symbol: + for imp in self.imports: + if imp.modname == modname and imp.name == name: + return imp + # Try to find the symbol in the modules exports: + for sym in mod.exports: + if sym.name == name: + impsym = ImportedSymbol(modname, name) + impsym.typ = sym.typ + impsym.signature = mod.signature + self.imports.append(impsym) + return impsym + self.Error("Cannot find symbol {0}".format(name)) + else: + return self.cst.getSymbol(name) -class Grammar: - # TODO: implement some base class? - pass + # Helper to parse a designator + def parseDesignator(self): + """ A designator designates an object. + The base location in memory is denoted by the qualified identifier + The actual address depends on the selector. + """ + loc = self.getLocation() + obj = self.parseQualIdent() + typ = obj.typ + selectors = [] + while self.token.typ in ['.', '[', '^']: + if self.hasConsumed('.'): + field = self.Consume('ID') + if typ is PointerType: + selectors.append(Deref()) + typ = typ.pointedType + if not type(typ) is RecordType: + self.Error("field reference, type not record but {0}".format(typ)) + typ = typ.fields[field] + selectors.append(Field(field)) + elif self.hasConsumed('['): + indexes = self.parseExpressionList() + self.Consume(']') + for idx in indexes: + if not type(typ) is ArrayType: + self.Error('Cannot index non array type') + if not isType(idx.typ, integer): + self.Error('Only integer expressions can be used as an index') + selectors.append(Index(idx, typ)) + typ = typ.elementType + elif self.hasConsumed('^'): + selectors.append(Deref()) + typ = typ.pointedType + return self.setLocation(Designator(obj, selectors, typ), loc) + + # Declaration sequence + def parseDeclarationSequence(self): + """ 1. constants, 2. types, 3. variables """ + self.parseConstantDeclarations() + self.parseTypeDeclarations() + self.parseVariableDeclarations() + + # Constants + def evalExpression(self, expr): + if type(expr) is Binop: + a = self.evalExpression(expr.a) + b = self.evalExpression(expr.b) + if expr.op == '+': + return a + b + elif expr.op == '-': + return a - b + elif expr.op == '*': + return a * b + elif expr.op == '/': + return float(a) / float(b) + elif expr.op == 'mod': + return int(a % b) + elif expr.op == 'div': + return int(a / b) + elif expr.op == 'or': + return a or b + elif expr.op == 'and': + return a and b + else: + self.Error('Cannot evaluate expression with {0}'.format(expr.op)) + elif type(expr) is Constant: + return expr.value + elif type(expr) is Designator: + if type(expr.obj) is Constant: + return self.evalExpression(expr.obj) + else: + self.Error('Cannot evaluate designated object {0}'.format(expr.obj)) + elif type(expr) is Unop: + a = self.evalExpression(expr.a) + if expr.op == 'not': + return not a + elif expr.op == '-': + return -a + else: + self.Error('Unimplemented unary operation {0}'.format(expr.op)) + else: + self.Error('Cannot evaluate expression {0}'.format(expr)) + + def parseConstExpression(self): + e = self.parseExpression() + return self.evalExpression(e), e.typ + + def parseConstantDeclarations(self): + """ Parse const part of a module """ + if self.hasConsumed('const'): + while self.token.typ == 'ID': + i = self.parseIdentDef() + self.Consume('=') + constvalue, typ = self.parseConstExpression() + self.Consume(';') + c = Constant(constvalue, typ, name=i.name, public=i.ispublic) + self.setLocation(c, i.location) + self.cst.addSymbol(c) + + # Type system + def parseTypeDeclarations(self): + if self.hasConsumed('type'): + while self.token.typ == 'ID': + typename, export = self.parseIdentDef() + self.Consume('=') + typ = self.parseStructuredType() + self.Consume(';') + t = DefinedType(typename, typ) + self.cst.addSymbol(t) + + def parseType(self): + if self.token.typ == 'ID': + typename = self.Consume('ID') + if self.cst.has(Type, typename): + typ = self.cst.get(Type, typename) + while type(typ) is DefinedType: + typ = typ.typ + return typ + else: + self.Error('Cannot find type {0}'.format(typename)) + else: + return self.parseStructuredType() + + def parseStructuredType(self): + if self.hasConsumed('array'): + dimensions = [] + dimensions.append( self.parseConstExpression() ) + while self.hasConsumed(','): + dimensions.append( self.parseConstExpression() ) + self.Consume('of') + arr = self.parseType() + for dimension, consttyp in reversed(dimensions): + if not isType(consttyp, integer): + self.Error('array dimension must be an integer type (not {0})'.format(consttyp)) + if dimension < 2: + self.Error('array dimension must be bigger than 1 (not {0})'.format(dimension)) + arr = ArrayType(dimension, arr) + return arr + elif self.hasConsumed('record'): + fields = {} + while self.token.typ == 'ID': + # parse a fieldlist: + identifiers = self.parseIdentList() + self.Consume(':') + typ = self.parseType() + self.Consume(';') + for i in identifiers: + if i.name in fields.keys(): + self.Error('record field "{0}" multiple defined.'.format(i.name)) + fields[i.name] = typ + # TODO store this in another way, symbol table? + self.Consume('end') + return RecordType(fields) + elif self.hasConsumed('pointer'): + self.Consume('to') + typ = self.parseType() + return PointerType(typ) + elif self.hasConsumed('procedure'): + parameters, returntype = self.parseFormalParameters() + return ProcedureType(parameters, returntype) + else: + self.Error('Unknown structured type "{0}"'.format(self.token.val)) -class Parser: - #TODO - pass + # Variable declarations: + def parseVariableDeclarations(self): + if self.hasConsumed('var'): + if self.token.typ == 'ID': + while self.token.typ == 'ID': + ids = self.parseIdentList() + self.Consume(':') + typename = self.parseType() + self.Consume(';') + for i in ids: + v = Variable(i.name, typename, public=i.ispublic) + self.setLocation(v, i.location) + self.cst.addSymbol(v) + else: + self.Error('Expected ID, got'+str(self.token)) + + # Procedures + def parseFPsection(self): + if self.hasConsumed('const'): + kind = 'const' + elif self.hasConsumed('var'): + kind = 'var' + else: + kind = 'value' + names = [ self.Consume('ID') ] + while self.hasConsumed(','): + names.append( self.Consume('ID') ) + self.Consume(':') + typ = self.parseType() + parameters = [Parameter(kind, name, typ) + for name in names] + return parameters + + def parseFormalParameters(self): + parameters = [] + self.Consume('(') + if not self.hasConsumed(')'): + parameters += self.parseFPsection() + while self.hasConsumed(';'): + parameters += self.parseFPsection() + self.Consume(')') + if self.hasConsumed(':'): + returntype = self.parseQualIdent() + else: + returntype = void + return ProcedureType(parameters, returntype) + + def parseProcedureDeclarations(self): + procedures = [] + while self.token.typ == 'procedure': + p = self.parseProcedureDeclaration() + procedures.append(p) + self.Consume(';') + return procedures + + def parseProcedureDeclaration(self): + loc = self.getLocation() + self.Consume('procedure') + i = self.parseIdentDef() + procname = i.name + proctyp = self.parseFormalParameters() + procsymtable = SymbolTable(parent = self.cst) + self.cst = procsymtable # Switch symbol table: + # Add parameters as variables to symbol table: + for parameter in proctyp.parameters: + vname = parameter.name + vtyp = parameter.typ + if parameter.kind == 'var': + vtyp = PointerType(vtyp) + variable = Variable(vname, vtyp, False) + if parameter.kind == 'const': + variable.isReadOnly = True + variable.isParameter = True + self.cst.addSymbol(variable) + self.Consume(';') + self.parseDeclarationSequence() + # Mark all variables as local: + for variable in self.cst.getAllLocal(Variable): + variable.isLocal = True + + if self.hasConsumed('begin'): + block = self.parseStatementSequence() + if self.hasConsumed('return'): + returnexpression = self.parseExpression() + else: + returnexpression = None + + if proctyp.returntype.isType(void): + if not returnexpression is None: + self.Error('Void procedure cannot return a value') + else: + if returnexpression is None: + self.Error('Procedure must return a value') + if not isType(returnexpression.typ, proctyp.returntype): + self.Error('Returned type {0} does not match function return type {1}'.format(returnexpression.typ, proctyp.returntype)) + + self.Consume('end') + endname = self.Consume('ID') + if endname != procname: + self.Error('endname should match {0}'.format(name)) + self.cst = procsymtable.parent # Switch back to parent symbol table + proc = Procedure(procname, proctyp, block, procsymtable, returnexpression) + self.setLocation(proc, loc) + self.cst.addSymbol(proc) + proc.public = i.ispublic + return proc + + # Statements: + def parseAssignment(self, lval): + loc = self.getLocation() + self.Consume(':=') + rval = self.parseExpression() + if isType(lval.typ, real) and isType(rval.typ, integer): + rval = Unop(rval, 'INTTOREAL', real) + if type(rval.typ) is NilType: + if not type(lval.typ) is ProcedureType and not type(lval.typ) is PointerType: + self.Error('Can assign nil only to pointers or procedure types, not {0}'.format(lval)) + elif not isType(lval.typ, rval.typ): + self.Error('Type mismatch {0} != {1}'.format(lval.typ, rval.typ)) + return self.setLocation(Assignment(lval, rval), loc) + + def parseExpressionList(self): + expressions = [ self.parseExpression() ] + while self.hasConsumed(','): + expressions.append( self.parseExpression() ) + return expressions + + def parseProcedureCall(self, procedure): + self.Consume('(') + if self.token.typ != ')': + args = self.parseExpressionList() + else: + args = [] + self.Consume(')') + parameters = procedure.typ.parameters + if len(args) != len(parameters): + self.Error("Procedure requires {0} arguments, {1} given".format(len(parameters), len(args))) + for arg, param in zip(args, parameters): + if not arg.typ.isType(param.typ): + print(arg.typ, param.typ) + self.Error('Mismatch in parameter') + return ProcedureCall(procedure, args) -class KsParser(Parser): - def __init__(self): - self.loadGrammar(KsGrammar) + def parseIfStatement(self): + loc = self.getLocation() + self.Consume('if') + ifs = [] + condition = self.parseExpression() + if not isType(condition.typ, boolean): + self.Error('condition of if statement must be boolean') + self.Consume('then') + truestatement = self.parseStatementSequence() + ifs.append( (condition, truestatement) ) + while self.hasConsumed('elsif'): + condition = self.parseExpression() + if not isType(condition.typ, boolean): + self.Error('condition of if statement must be boolean') + self.Consume('then') + truestatement = self.parseStatementSequence() + ifs.append( (condition, truestatement) ) + if self.hasConsumed('else'): + statement = self.parseStatementSequence() + else: + statement = None + self.Consume('end') + for condition, truestatement in reversed(ifs): + statement = IfStatement(condition, truestatement, statement) + return self.setLocation(statement, loc) + + def parseCase(self): + # TODO + pass + + def parseCaseStatement(self): + self.Consume('case') + expr = self.parseExpression() + self.Consume('of') + self.parseCase() + while self.hasConsumed('|'): + self.parseCase() + self.Consume('end') + + def parseWhileStatement(self): + loc = self.getLocation() + self.Consume('while') + condition = self.parseExpression() + self.Consume('do') + statements = self.parseStatementSequence() + if self.hasConsumed('elsif'): + self.Error('elsif in while not yet implemented') + self.Consume('end') + return self.setLocation(WhileStatement(condition, statements), loc) + + def parseRepeatStatement(self): + self.Consume('repeat') + stmt = self.parseStatementSequence() + self.Consume('until') + cond = self.parseBoolExpression() + + def parseForStatement(self): + loc = self.getLocation() + self.Consume('for') + variable = self.parseDesignator() + if not variable.typ.isType(integer): + self.Error('loop variable of for statement must have integer type') + assert(variable.typ.isType(integer)) + self.Consume(':=') + begin = self.parseExpression() + if not begin.typ.isType(integer): + self.Error('begin expression of a for statement must have integer type') + self.Consume('to') + end = self.parseExpression() + if not end.typ.isType(integer): + self.Error('end expression of a for statement must have integer type') + if self.hasConsumed('by'): + increment, typ = self.parseConstExpression() + if not typ.isType(integer): + self.Error('Increment must be integer') + else: + increment = 1 + assert(type(increment) is int) + self.Consume('do') + statements = self.parseStatementSequence() + self.Consume('end') + return self.setLocation(ForStatement(variable, begin, end, increment, statements), loc) -# For now, try to parse an expression as test case: -class KsGrammer(Grammar): + def parseAsmcode(self): + # TODO: move this to seperate file + def parseOpcode(): + return self.Consume('ID') + def parseOperand(): + if self.hasConsumed('['): + memref = [] + memref.append(parseOperand()) + self.Consume(']') + return memref + else: + if self.token.typ == 'NUMBER': + return self.Consume('NUMBER') + else: + ID = self.Consume('ID') + if self.cst.has(Variable, ID): + return self.cst.get(Variable, ID) + else: + return ID + + def parseOperands(n): + operands = [] + if n > 0: + operands.append( parseOperand() ) + n = n - 1 + while n > 0: + self.Consume(',') + operands.append(parseOperand()) + n = n - 1 + return operands + self.Consume('asm') + asmcode = [] + while self.token.typ != 'end': + opcode = parseOpcode() + func, numargs = assembler.opcodes[opcode] + operands = parseOperands(numargs) + asmcode.append( (opcode, operands) ) + #print('opcode', opcode, operands) + self.Consume('end') + return AsmCode(asmcode) - def __init__(self): - pass + def parseStatement(self): + try: + # Determine statement type based on the pending token: + if self.token.typ == 'if': + return self.parseIfStatement() + elif self.token.typ == 'case': + return self.parseCaseStatement() + elif self.token.typ == 'while': + return self.parseWhileStatement() + elif self.token.typ == 'repeat': + return self.parseRepeatStatement() + elif self.token.typ == 'for': + return self.parseForStatement() + elif self.token.typ == 'asm': + return self.parseAsmcode() + elif self.token.typ == 'ID': + # Assignment or procedure call + designator = self.parseDesignator() + if self.token.typ == '(' and type(designator.typ) is ProcedureType: + return self.parseProcedureCall(designator) + elif self.token.typ == ':=': + return self.parseAssignment(designator) + else: + self.Error('Unknown statement following designator: {0}'.format(self.token)) + else: + # TODO: return empty statement??: + return EmptyStatement() + self.Error('Unknown statement {0}'.format(self.token)) + except CompilerException as e: + print(e) + self.errorlist.append( (e.row, e.col, e.msg)) + # Do error recovery by skipping all tokens until next ; or end + while not (self.token.typ == ';' or self.token.typ == 'end'): + self.Consume(self.token.typ) + return EmptyStatement() + + def parseStatementSequence(self): + """ Sequence of statements seperated by ';' """ + statements = [ self.parseStatement() ] + while self.hasConsumed(';'): + statements.append( self.parseStatement() ) + return StatementSequence( statements ) # Parsing expressions: """ grammar of expressions: - expression = term { addoperator term } - addoperator = '+' | '-' + expression = SimpleExpression [ reloperator SimpleExpression ] + reloperator = '=' | '<=' | '>=' | '<>' + Simpleexpression = [ '+' | '-' ] term { addoperator term } + addoperator = '+' | '-' | 'or' term = factor { muloperator factor } - muloperator = '*' | '/' - factor = number | "(" expression ")" + muloperator = '*' | '/' | 'div' | 'mod' | 'and' + factor = number | nil | true | false | "(" expression ")" | + designator [ actualparameters ] | 'not' factor """ + def parseExpression(self): + """ The connector between the boolean and expression domain """ + expr = self.parseSimpleExpression() + if self.token.typ in ['>=','<=','<','>','<>','=']: + relop = self.Consume() + expr2 = self.parseSimpleExpression() + # Automatic type convert to reals: + if isType(expr.typ, real) and isType(expr2.typ, integer): + expr2 = Unop(expr2, 'INTTOREAL', real) + if isType(expr2.typ, real) and isType(expr.typ, integer): + expr = Unop(expr, 'INTTOREAL', real) + # Type check: + if not isType(expr.typ, expr2.typ): + self.Error('Type mismatch in relop') + if isType(expr.typ, real) and relop in ['<>', '=']: + self.Error('Cannot check real values for equality') - @rule(Term) - def Expression1(self, term): - return Expression(term) - - @rule(Term, AddOperator, Term) - def Expression2(self, term1, op, term2): - return Expression(term1, op, term2) + expr = Relop(expr, relop, expr2, boolean) + return expr # Parsing arithmatic expressions: def parseTerm(self): @@ -99,7 +701,6 @@ a = self.setLocation(Binop(a, op, b, typ), loc) return a - @rule( def parseFactor(self): if self.hasConsumed('('): e = self.parseExpression() @@ -116,6 +717,9 @@ elif self.token.typ == 'CHAR': val = self.Consume('CHAR') return Constant(val, char) + elif self.token.typ == 'STRING': + txt = self.Consume('STRING') + return StringConstant(txt) elif self.token.typ in ['true', 'false']: val = self.Consume() val = True if val == 'true' else False diff -r 6efbeb903777 -r fe145e42259d python/ppci/frontends/ks/symboltable.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/python/ppci/frontends/ks/symboltable.py Mon Dec 24 16:35:22 2012 +0100 @@ -0,0 +1,80 @@ +from .nodes import * +from ...core.errors import Error + +class SymbolTable: + """ + Symbol table for a current scope. + It has functions: + - hasname for checking for a name in current scope or above + - addSymbol to add an object + """ + def __init__(self, parent=None): + self.parent = parent + self.syms = {} + + def __repr__(self): + return 'Symboltable with {0} symbols\n'.format(len(self.syms)) + + def printTable(self, indent=0): + for name in self.syms: + print(self.syms[name]) + + def getAllLocal(self, cls): + """ Get all local objects of a specific type """ + r = [] + for key in self.syms.keys(): + sym = self.syms[key] + if issubclass(type(sym), cls): + r.append(sym) + return r + + def getLocal(self, cls, name): + if name in self.syms.keys(): + sym = self.syms[name] + if isinstance(sym, cls): + return sym + else: + Error('Wrong type found') + else: + Error('Symbol not found') + + # Retrieving of specific classes of items: + def get(self, cls, name): + if self.hasSymbol(name): + sym = self.getSymbol(name) + if issubclass(type(sym), cls): + return sym + raise SymbolException('type {0} undefined'.format(typename)) + + def has(self, cls, name): + if self.hasSymbol(name): + sym = self.getSymbol(name) + if issubclass(type(sym), cls): + return True + return False + + # Adding and retrieving of symbols in general: + def addSymbol(self, sym): + if sym.name in self.syms.keys(): + raise Exception('Symbol "{0}" redefined'.format(sym.name)) + else: + self.syms[sym.name] = sym + + def getSymbol(self, name): + if name in self.syms.keys(): + return self.syms[name] + else: + if self.parent: + return self.parent.getSymbol(name) + else: + Error('Symbol "{0}" undeclared!'.format(name)) + + def hasSymbol(self, name): + if name in self.syms.keys(): + return True + else: + if self.parent: + return self.parent.hasSymbol(name) + else: + return False + diff -r 6efbeb903777 -r fe145e42259d python/ppci/nodes.py --- a/python/ppci/nodes.py Mon Dec 24 15:03:30 2012 +0100 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,310 +0,0 @@ -""" -Parse tree elements -""" -class Node: - location = None - def getChildren(self): - children = [] - members = dir(self) - for member in members: - member = getattr(self, member) - if isinstance(member, Node): - children.append(member) - elif type(member) is list: - for mi in member: - if isinstance(mi, Node): - children.append(mi) - return children - -class Symbol(Node): - pass - -class Id(Node): - def __init__(self, name): - self.name = name - def __repr__(self): - return 'ID {0}'.format(self.name) - -# Selectors: -class Field(Node): - def __init__(self, fieldname): - self.fieldname = fieldname - def __repr__(self): - return 'FLD {0}'.format(self.fieldname) - -class Index(Node): - def __init__(self, index, typ): - self.index = index - self.typ = typ - def __repr__(self): - return 'IDX {0}'.format(self.index) - -class Deref(Node): - pass - -class Designator(Node): - def __init__(self, obj, selectors, typ): - self.obj = obj - self.selectors = selectors - self.typ = typ - def __repr__(self): - return 'DESIGNATOR {0}, selectors {1}, type {2}'.format(self.obj, self.selectors, self.typ) - -""" -Type classes -""" -def isType(a, b): - """ Compare types a and b and check if they are equal """ - if type(a) is type(b): - if type(a) is BaseType: - return (a.name == b.name) and (a.size == b.size) - elif type(a) is ArrayType: - return (a.dimension == b.dimension) and isType(a.elementType, b.elementType) - elif type(a) is ProcedureType: - if len(a.parameters) != len(b.parameters): - print('Number of parameters does not match') - return False - for aparam, bparam in zip(a.parameters, b.parameters): - if not isType(aparam.typ, bparam.typ): - print('Parameter {0} does not match parameter {1}'.format(aparam, bparam)) - return False - if a.result is None: - # TODO: how to handle a None return type?? - pass - if not isType(a.result, b.result): - print('Procedure return value mismatch {0} != {1}'.format(a.result, b.result)) - return False - return True - else: - print(a) - print(b) - Error('Not implemented {0}'.format(a)) - else: - return False - -class Type: - def isType(self, b): - return isType(self, b) - -class BaseType(Type): - def __init__(self, name, size): - self.name = name - self.size = size - def __repr__(self): - return '[TYPE {0}]'.format(self.name) - -class NilType(Node): - # TODO: how to handle nil values?? - def __repr__(self): - return 'NILTYPE' - -class ArrayType(Type): - def __init__(self, dimension, elementType): - self.dimension = dimension - self.elementType = elementType - self.size = elementType.size * dimension - def __repr__(self): - return '[ARRAY {0} of {1}]'.format(self.dimension, self.elementType) - -class RecordType(Type): - def __init__(self, fields): - self.fields = fields - self.size = 0 - for fieldname in self.fields: - self.size += self.fields[fieldname].size - def __repr__(self): - return '[RECORD {0}]'.format(self.fields) - -class PointerType(Type): - def __init__(self, pointedType): - self.pointedType = pointedType - self.size = 8 - def __repr__(self): - return '[POINTER {0}]'.format(self.pointedType) - -class ProcedureType(Type): - def __init__(self, parameters, returntype): - self.parameters = parameters - self.returntype = returntype - def __repr__(self): - return '[PROCTYPE {0} RET {1}]'.format(self.parameters, self.returntype) - -class DefinedType(Type): - def __init__(self, name, typ): - self.name = name - self.typ = typ - def __repr__(self): - return 'Named type {0} of type {1}'.format(self.name, self.typ) - -# Classes for constants like numbers and strings: -class StringConstant(Symbol): - def __init__(self, txt): - self.txt = txt - self.typ = 'string' - def __repr__(self): - return "STRING '{0}'".format(self.txt) - -# Variables, parameters, local variables, constants: -class Constant(Symbol): - def __init__(self, value, typ, name=None, public=False): - self.name = name - self.value = value - self.typ = typ - self.public = public - def __repr__(self): - return 'CONSTANT {0} = {1}'.format(self.name, self.value) - -class Variable(Symbol): - def __init__(self, name, typ, public): - self.name = name - self.typ = typ - self.public = public - self.isLocal = False - self.isReadOnly = False - self.isParameter = False - def __repr__(self): - txt = '[public] ' if self.public else '' - return '{2}VAR {0} : {1}'.format(self.name, self.typ, txt) - -class Parameter(Node): - """ A parameter has a passing method, name and typ """ - def __init__(self, kind, name, typ): - self.kind = kind - self.name = name - self.typ = typ - def __repr__(self): - return 'PARAM {0} {1} {2}'.format(self.kind, self.name, self.typ) - -# Operations: -class Unop(Node): - def __init__(self, a, op, typ): - self.a = a - self.op = op # Operation: '+', '-', '*', '/', 'mod' - self.typ = typ - self.place = None - def __repr__(self): - return 'UNOP {0}'.format(self.op) - -class Binop(Node): - def __init__(self, a, op, b, typ): - self.a = a - self.b = b - self.op = op # Operation: '+', '-', '*', '/', 'mod' - self.typ = typ # Resulting type :) - self.place = None - def __repr__(self): - return 'BINOP {0} {1}'.format(self.op, self.typ) - -class Relop(Node): - def __init__(self, a, relop, b, typ): - self.a = a - self.relop = relop - self.b = b - self.typ = typ - def __repr__(self): - return 'RELOP {0}'.format(self.relop) - -# Modules -class Module(Node): - def __init__(self, name): - self.name = name - def __repr__(self): - return 'MODULE {0}'.format(self.name) - -# Imports and Exports: -class ImportedSymbol(Node): - def __init__(self, modname, name): - self.modname = modname - self.name = name - def __repr__(self): - return 'IMPORTED SYMBOL {0}'.format(self.name) - -class ExportedSymbol(Node): - def __init__(self, name, typ): - self.name = name - self.typ = typ - def __repr__(self): - return 'EXPORTED PROCEDURE {0} : {1}'.format(self.name, self.typ) - -# Procedure types -class BuiltinProcedure(Node): - def __init__(self, name, typ): - self.name = name - self.typ = typ - def __repr__(self): - return 'BUILTIN PROCEDURE {0} : {1}'.format(self.name, self.typ) - -class Procedure(Symbol): - """ Actual implementation of a function """ - def __init__(self, name, typ, block, symtable, retexpr): - self.name = name - self.block = block - self.symtable = symtable - self.typ = typ - self.retexpr = retexpr - def __repr__(self): - return 'PROCEDURE {0} {1}'.format(self.name, self.typ) - -# Statements -class StatementSequence(Node): - def __init__(self, statements): - self.statements = statements - def __repr__(self): - return 'STATEMENTSEQUENCE' - -class EmptyStatement(Node): - def __repr__(self): - return 'EMPTY STATEMENT' - -class Assignment(Node): - def __init__(self, lval, rval): - self.lval = lval - self.rval = rval - def __repr__(self): - return 'ASSIGNMENT' - -class ProcedureCall(Node): - def __init__(self, proc, args): - self.proc = proc - self.args = args - self.typ = proc.typ.returntype - def __repr__(self): - return 'CALL {0} '.format(self.proc) - -class IfStatement(Node): - def __init__(self, condition, truestatement, falsestatement=None): - self.condition = condition - self.truestatement = truestatement - self.falsestatement = falsestatement - def __repr__(self): - return 'IF-statement' - -class CaseStatement(Node): - def __init__(self, condition): - self.condition = condition - def __repr__(self): - return 'CASE-statement' - -class WhileStatement(Node): - def __init__(self, condition, statements): - self.condition = condition - self.dostatements = statements - def __repr__(self): - return 'WHILE-statement' - -class ForStatement(Node): - def __init__(self, variable, begin, end, increment, statements): - self.variable = variable - self.begin = begin - self.end = end - self.increment = increment - self.statements = statements - def __repr__(self): - return 'FOR-statement' - -class AsmCode(Node): - def __init__(self, asmcode): - self.asmcode = asmcode - def __repr__(self): - return 'ASM CODE' - diff -r 6efbeb903777 -r fe145e42259d python/ppci/parsergen.py --- a/python/ppci/parsergen.py Mon Dec 24 15:03:30 2012 +0100 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,787 +0,0 @@ -""" - This module parses source code into an abstract syntax tree (AST) -""" - -from .symboltable import SymbolTable -from .nodes import * -from .errors import CompilerException, Error -from .modules import loadModule -from .display import printNode -from .builtin import * -from . import assembler - -class Parser: - def __init__(self, tokens): - """ provide the parser with the tokens iterator from the lexer. """ - self.tokens = tokens - self.NextToken() - self.errorlist = [] - - def Error(self, msg): - raise CompilerException(msg, self.token.row, self.token.col) - - # Lexer helpers: - def Consume(self, typ=''): - if self.token.typ == typ or typ == '': - v = self.token.val - self.NextToken() - return v - else: - self.Error('Excected: "{0}", got "{1}"'.format(typ, self.token.val)) - - def hasConsumed(self, typ): - if self.token.typ == typ: - self.Consume(typ) - return True - return False - - def NextToken(self): - self.token = self.tokens.__next__() - # TODO: store filename in location? - self.location = (self.token.row, self.token.col) - - # Helpers to find location of the error in the code: - def setLocation(self, obj, location): - obj.location = location - return obj - def getLocation(self): - return self.location - - """ - Recursive descent parser functions: - A set of mutual recursive functions. - Starting symbol is the Module. - """ - def parseModule(self): - self.imports = [] - loc = self.getLocation() - self.Consume('module') - modname = self.Consume('ID') - self.Consume(';') - mod = Module(modname) - - # Construct a symbol table for this program - mod.symtable = SymbolTable() - # Add built in types and functions: - for x in [real, integer, boolean, char, chr_func]: - mod.symtable.addSymbol(x) - - self.cst = mod.symtable - self.parseImportList() - - self.parseDeclarationSequence() - # Procedures only allowed in this scope - self.parseProcedureDeclarations() - - if self.hasConsumed('begin'): - mod.initcode = self.parseStatementSequence() - else: - mod.initcode = EmptyStatement() - - self.Consume('end') - endname = self.Consume('ID') - if endname != modname: - self.Error('end denoter must be module name') - self.Consume('.') - - mod.imports = self.imports - return self.setLocation(mod, loc) - - # Import part - def parseImportList(self): - if self.hasConsumed('import'): - self.parseImport() - while self.hasConsumed(','): - self.parseImport() - self.Consume(';') - - def parseImport(self): - loc = self.getLocation() - modname = self.Consume('ID') - mod = loadModule(modname) - self.setLocation(mod, loc) - self.cst.addSymbol(mod) - - # Helper to parse an identifier defenitions - def parseIdentDef(self): - loc = self.getLocation() - name = self.Consume('ID') - ispublic = self.hasConsumed('*') - # Make a node of this thing: - i = Id(name) - i.ispublic = ispublic - return self.setLocation(i, loc) - - def parseIdentList(self): - ids = [ self.parseIdentDef() ] - while self.hasConsumed(','): - ids.append( self.parseIdentDef() ) - return ids - - def parseQualIdent(self): - """ Parse a qualified identifier """ - name = self.Consume('ID') - if self.cst.has(Module, name): - modname = name - mod = self.cst.get(Module, modname) - self.Consume('.') - name = self.Consume('ID') - # Try to find existing imported symbol: - for imp in self.imports: - if imp.modname == modname and imp.name == name: - return imp - # Try to find the symbol in the modules exports: - for sym in mod.exports: - if sym.name == name: - impsym = ImportedSymbol(modname, name) - impsym.typ = sym.typ - impsym.signature = mod.signature - self.imports.append(impsym) - return impsym - self.Error("Cannot find symbol {0}".format(name)) - else: - return self.cst.getSymbol(name) - - # Helper to parse a designator - def parseDesignator(self): - """ A designator designates an object. - The base location in memory is denoted by the qualified identifier - The actual address depends on the selector. - """ - loc = self.getLocation() - obj = self.parseQualIdent() - typ = obj.typ - selectors = [] - while self.token.typ in ['.', '[', '^']: - if self.hasConsumed('.'): - field = self.Consume('ID') - if typ is PointerType: - selectors.append(Deref()) - typ = typ.pointedType - if not type(typ) is RecordType: - self.Error("field reference, type not record but {0}".format(typ)) - typ = typ.fields[field] - selectors.append(Field(field)) - elif self.hasConsumed('['): - indexes = self.parseExpressionList() - self.Consume(']') - for idx in indexes: - if not type(typ) is ArrayType: - self.Error('Cannot index non array type') - if not isType(idx.typ, integer): - self.Error('Only integer expressions can be used as an index') - selectors.append(Index(idx, typ)) - typ = typ.elementType - elif self.hasConsumed('^'): - selectors.append(Deref()) - typ = typ.pointedType - return self.setLocation(Designator(obj, selectors, typ), loc) - - # Declaration sequence - def parseDeclarationSequence(self): - """ 1. constants, 2. types, 3. variables """ - self.parseConstantDeclarations() - self.parseTypeDeclarations() - self.parseVariableDeclarations() - - # Constants - def evalExpression(self, expr): - if type(expr) is Binop: - a = self.evalExpression(expr.a) - b = self.evalExpression(expr.b) - if expr.op == '+': - return a + b - elif expr.op == '-': - return a - b - elif expr.op == '*': - return a * b - elif expr.op == '/': - return float(a) / float(b) - elif expr.op == 'mod': - return int(a % b) - elif expr.op == 'div': - return int(a / b) - elif expr.op == 'or': - return a or b - elif expr.op == 'and': - return a and b - else: - self.Error('Cannot evaluate expression with {0}'.format(expr.op)) - elif type(expr) is Constant: - return expr.value - elif type(expr) is Designator: - if type(expr.obj) is Constant: - return self.evalExpression(expr.obj) - else: - self.Error('Cannot evaluate designated object {0}'.format(expr.obj)) - elif type(expr) is Unop: - a = self.evalExpression(expr.a) - if expr.op == 'not': - return not a - elif expr.op == '-': - return -a - else: - self.Error('Unimplemented unary operation {0}'.format(expr.op)) - else: - self.Error('Cannot evaluate expression {0}'.format(expr)) - - def parseConstExpression(self): - e = self.parseExpression() - return self.evalExpression(e), e.typ - - def parseConstantDeclarations(self): - """ Parse const part of a module """ - if self.hasConsumed('const'): - while self.token.typ == 'ID': - i = self.parseIdentDef() - self.Consume('=') - constvalue, typ = self.parseConstExpression() - self.Consume(';') - c = Constant(constvalue, typ, name=i.name, public=i.ispublic) - self.setLocation(c, i.location) - self.cst.addSymbol(c) - - # Type system - def parseTypeDeclarations(self): - if self.hasConsumed('type'): - while self.token.typ == 'ID': - typename, export = self.parseIdentDef() - self.Consume('=') - typ = self.parseStructuredType() - self.Consume(';') - t = DefinedType(typename, typ) - self.cst.addSymbol(t) - - def parseType(self): - if self.token.typ == 'ID': - typename = self.Consume('ID') - if self.cst.has(Type, typename): - typ = self.cst.get(Type, typename) - while type(typ) is DefinedType: - typ = typ.typ - return typ - else: - self.Error('Cannot find type {0}'.format(typename)) - else: - return self.parseStructuredType() - - def parseStructuredType(self): - if self.hasConsumed('array'): - dimensions = [] - dimensions.append( self.parseConstExpression() ) - while self.hasConsumed(','): - dimensions.append( self.parseConstExpression() ) - self.Consume('of') - arr = self.parseType() - for dimension, consttyp in reversed(dimensions): - if not isType(consttyp, integer): - self.Error('array dimension must be an integer type (not {0})'.format(consttyp)) - if dimension < 2: - self.Error('array dimension must be bigger than 1 (not {0})'.format(dimension)) - arr = ArrayType(dimension, arr) - return arr - elif self.hasConsumed('record'): - fields = {} - while self.token.typ == 'ID': - # parse a fieldlist: - identifiers = self.parseIdentList() - self.Consume(':') - typ = self.parseType() - self.Consume(';') - for i in identifiers: - if i.name in fields.keys(): - self.Error('record field "{0}" multiple defined.'.format(i.name)) - fields[i.name] = typ - # TODO store this in another way, symbol table? - self.Consume('end') - return RecordType(fields) - elif self.hasConsumed('pointer'): - self.Consume('to') - typ = self.parseType() - return PointerType(typ) - elif self.hasConsumed('procedure'): - parameters, returntype = self.parseFormalParameters() - return ProcedureType(parameters, returntype) - else: - self.Error('Unknown structured type "{0}"'.format(self.token.val)) - - # Variable declarations: - def parseVariableDeclarations(self): - if self.hasConsumed('var'): - if self.token.typ == 'ID': - while self.token.typ == 'ID': - ids = self.parseIdentList() - self.Consume(':') - typename = self.parseType() - self.Consume(';') - for i in ids: - v = Variable(i.name, typename, public=i.ispublic) - self.setLocation(v, i.location) - self.cst.addSymbol(v) - else: - self.Error('Expected ID, got'+str(self.token)) - - # Procedures - def parseFPsection(self): - if self.hasConsumed('const'): - kind = 'const' - elif self.hasConsumed('var'): - kind = 'var' - else: - kind = 'value' - names = [ self.Consume('ID') ] - while self.hasConsumed(','): - names.append( self.Consume('ID') ) - self.Consume(':') - typ = self.parseType() - parameters = [Parameter(kind, name, typ) - for name in names] - return parameters - - def parseFormalParameters(self): - parameters = [] - self.Consume('(') - if not self.hasConsumed(')'): - parameters += self.parseFPsection() - while self.hasConsumed(';'): - parameters += self.parseFPsection() - self.Consume(')') - if self.hasConsumed(':'): - returntype = self.parseQualIdent() - else: - returntype = void - return ProcedureType(parameters, returntype) - - def parseProcedureDeclarations(self): - procedures = [] - while self.token.typ == 'procedure': - p = self.parseProcedureDeclaration() - procedures.append(p) - self.Consume(';') - return procedures - - def parseProcedureDeclaration(self): - loc = self.getLocation() - self.Consume('procedure') - i = self.parseIdentDef() - procname = i.name - proctyp = self.parseFormalParameters() - procsymtable = SymbolTable(parent = self.cst) - self.cst = procsymtable # Switch symbol table: - # Add parameters as variables to symbol table: - for parameter in proctyp.parameters: - vname = parameter.name - vtyp = parameter.typ - if parameter.kind == 'var': - vtyp = PointerType(vtyp) - variable = Variable(vname, vtyp, False) - if parameter.kind == 'const': - variable.isReadOnly = True - variable.isParameter = True - self.cst.addSymbol(variable) - self.Consume(';') - self.parseDeclarationSequence() - # Mark all variables as local: - for variable in self.cst.getAllLocal(Variable): - variable.isLocal = True - - if self.hasConsumed('begin'): - block = self.parseStatementSequence() - if self.hasConsumed('return'): - returnexpression = self.parseExpression() - else: - returnexpression = None - - if proctyp.returntype.isType(void): - if not returnexpression is None: - self.Error('Void procedure cannot return a value') - else: - if returnexpression is None: - self.Error('Procedure must return a value') - if not isType(returnexpression.typ, proctyp.returntype): - self.Error('Returned type {0} does not match function return type {1}'.format(returnexpression.typ, proctyp.returntype)) - - self.Consume('end') - endname = self.Consume('ID') - if endname != procname: - self.Error('endname should match {0}'.format(name)) - self.cst = procsymtable.parent # Switch back to parent symbol table - proc = Procedure(procname, proctyp, block, procsymtable, returnexpression) - self.setLocation(proc, loc) - self.cst.addSymbol(proc) - proc.public = i.ispublic - return proc - - # Statements: - def parseAssignment(self, lval): - loc = self.getLocation() - self.Consume(':=') - rval = self.parseExpression() - if isType(lval.typ, real) and isType(rval.typ, integer): - rval = Unop(rval, 'INTTOREAL', real) - if type(rval.typ) is NilType: - if not type(lval.typ) is ProcedureType and not type(lval.typ) is PointerType: - self.Error('Can assign nil only to pointers or procedure types, not {0}'.format(lval)) - elif not isType(lval.typ, rval.typ): - self.Error('Type mismatch {0} != {1}'.format(lval.typ, rval.typ)) - return self.setLocation(Assignment(lval, rval), loc) - - def parseExpressionList(self): - expressions = [ self.parseExpression() ] - while self.hasConsumed(','): - expressions.append( self.parseExpression() ) - return expressions - - def parseProcedureCall(self, procedure): - self.Consume('(') - if self.token.typ != ')': - args = self.parseExpressionList() - else: - args = [] - self.Consume(')') - parameters = procedure.typ.parameters - if len(args) != len(parameters): - self.Error("Procedure requires {0} arguments, {1} given".format(len(parameters), len(args))) - for arg, param in zip(args, parameters): - if not arg.typ.isType(param.typ): - print(arg.typ, param.typ) - self.Error('Mismatch in parameter') - return ProcedureCall(procedure, args) - - def parseIfStatement(self): - loc = self.getLocation() - self.Consume('if') - ifs = [] - condition = self.parseExpression() - if not isType(condition.typ, boolean): - self.Error('condition of if statement must be boolean') - self.Consume('then') - truestatement = self.parseStatementSequence() - ifs.append( (condition, truestatement) ) - while self.hasConsumed('elsif'): - condition = self.parseExpression() - if not isType(condition.typ, boolean): - self.Error('condition of if statement must be boolean') - self.Consume('then') - truestatement = self.parseStatementSequence() - ifs.append( (condition, truestatement) ) - if self.hasConsumed('else'): - statement = self.parseStatementSequence() - else: - statement = None - self.Consume('end') - for condition, truestatement in reversed(ifs): - statement = IfStatement(condition, truestatement, statement) - return self.setLocation(statement, loc) - - def parseCase(self): - # TODO - pass - - def parseCaseStatement(self): - self.Consume('case') - expr = self.parseExpression() - self.Consume('of') - self.parseCase() - while self.hasConsumed('|'): - self.parseCase() - self.Consume('end') - - def parseWhileStatement(self): - loc = self.getLocation() - self.Consume('while') - condition = self.parseExpression() - self.Consume('do') - statements = self.parseStatementSequence() - if self.hasConsumed('elsif'): - self.Error('elsif in while not yet implemented') - self.Consume('end') - return self.setLocation(WhileStatement(condition, statements), loc) - - def parseRepeatStatement(self): - self.Consume('repeat') - stmt = self.parseStatementSequence() - self.Consume('until') - cond = self.parseBoolExpression() - - def parseForStatement(self): - loc = self.getLocation() - self.Consume('for') - variable = self.parseDesignator() - if not variable.typ.isType(integer): - self.Error('loop variable of for statement must have integer type') - assert(variable.typ.isType(integer)) - self.Consume(':=') - begin = self.parseExpression() - if not begin.typ.isType(integer): - self.Error('begin expression of a for statement must have integer type') - self.Consume('to') - end = self.parseExpression() - if not end.typ.isType(integer): - self.Error('end expression of a for statement must have integer type') - if self.hasConsumed('by'): - increment, typ = self.parseConstExpression() - if not typ.isType(integer): - self.Error('Increment must be integer') - else: - increment = 1 - assert(type(increment) is int) - self.Consume('do') - statements = self.parseStatementSequence() - self.Consume('end') - return self.setLocation(ForStatement(variable, begin, end, increment, statements), loc) - - def parseAsmcode(self): - # TODO: move this to seperate file - def parseOpcode(): - return self.Consume('ID') - def parseOperand(): - if self.hasConsumed('['): - memref = [] - memref.append(parseOperand()) - self.Consume(']') - return memref - else: - if self.token.typ == 'NUMBER': - return self.Consume('NUMBER') - else: - ID = self.Consume('ID') - if self.cst.has(Variable, ID): - return self.cst.get(Variable, ID) - else: - return ID - - def parseOperands(n): - operands = [] - if n > 0: - operands.append( parseOperand() ) - n = n - 1 - while n > 0: - self.Consume(',') - operands.append(parseOperand()) - n = n - 1 - return operands - self.Consume('asm') - asmcode = [] - while self.token.typ != 'end': - opcode = parseOpcode() - func, numargs = assembler.opcodes[opcode] - operands = parseOperands(numargs) - asmcode.append( (opcode, operands) ) - #print('opcode', opcode, operands) - self.Consume('end') - return AsmCode(asmcode) - - def parseStatement(self): - try: - # Determine statement type based on the pending token: - if self.token.typ == 'if': - return self.parseIfStatement() - elif self.token.typ == 'case': - return self.parseCaseStatement() - elif self.token.typ == 'while': - return self.parseWhileStatement() - elif self.token.typ == 'repeat': - return self.parseRepeatStatement() - elif self.token.typ == 'for': - return self.parseForStatement() - elif self.token.typ == 'asm': - return self.parseAsmcode() - elif self.token.typ == 'ID': - # Assignment or procedure call - designator = self.parseDesignator() - if self.token.typ == '(' and type(designator.typ) is ProcedureType: - return self.parseProcedureCall(designator) - elif self.token.typ == ':=': - return self.parseAssignment(designator) - else: - self.Error('Unknown statement following designator: {0}'.format(self.token)) - else: - # TODO: return empty statement??: - return EmptyStatement() - self.Error('Unknown statement {0}'.format(self.token)) - except CompilerException as e: - print(e) - self.errorlist.append( (e.row, e.col, e.msg)) - # Do error recovery by skipping all tokens until next ; or end - while not (self.token.typ == ';' or self.token.typ == 'end'): - self.Consume(self.token.typ) - return EmptyStatement() - - def parseStatementSequence(self): - """ Sequence of statements seperated by ';' """ - statements = [ self.parseStatement() ] - while self.hasConsumed(';'): - statements.append( self.parseStatement() ) - return StatementSequence( statements ) - - # Parsing expressions: - """ - grammar of expressions: - expression = SimpleExpression [ reloperator SimpleExpression ] - reloperator = '=' | '<=' | '>=' | '<>' - Simpleexpression = [ '+' | '-' ] term { addoperator term } - addoperator = '+' | '-' | 'or' - term = factor { muloperator factor } - muloperator = '*' | '/' | 'div' | 'mod' | 'and' - factor = number | nil | true | false | "(" expression ")" | - designator [ actualparameters ] | 'not' factor - """ - def parseExpression(self): - """ The connector between the boolean and expression domain """ - expr = self.parseSimpleExpression() - if self.token.typ in ['>=','<=','<','>','<>','=']: - relop = self.Consume() - expr2 = self.parseSimpleExpression() - # Automatic type convert to reals: - if isType(expr.typ, real) and isType(expr2.typ, integer): - expr2 = Unop(expr2, 'INTTOREAL', real) - if isType(expr2.typ, real) and isType(expr.typ, integer): - expr = Unop(expr, 'INTTOREAL', real) - # Type check: - if not isType(expr.typ, expr2.typ): - self.Error('Type mismatch in relop') - if isType(expr.typ, real) and relop in ['<>', '=']: - self.Error('Cannot check real values for equality') - - expr = Relop(expr, relop, expr2, boolean) - return expr - - # Parsing arithmatic expressions: - def parseTerm(self): - a = self.parseFactor() - while self.token.typ in ['*', '/', 'mod', 'div', 'and']: - loc = self.getLocation() - op = self.Consume() - b = self.parseTerm() - # Type determination and checking: - if op in ['mod', 'div']: - if not isType(a.typ, integer): - self.Error('First operand should be integer, not {0}'.format(a.typ)) - if not isType(b.typ, integer): - self.Error('Second operand should be integer, not {0}'.format(b.typ)) - typ = integer - elif op == '*': - if isType(a.typ, integer) and isType(b.typ, integer): - typ = integer - elif isType(a.typ, real) or isType(b.typ, real): - if isType(a.typ, integer): - # Automatic type cast - a = Unop(a, 'INTTOREAL', real) - if isType(b.typ, integer): - b = Unop(b, 'INTTOREAL', real) - if not isType(a.typ, real): - self.Error('first operand must be a real!') - if not isType(b.typ, real): - self.Error('second operand must be a real!') - typ = real - else: - self.Error('Unknown operands for multiply: {0}, {1}'.format(a, b)) - elif op == '/': - # Division always yields a real result, for integer division use div - if isType(a.typ, integer): - # Automatic type cast - a = Unop(a, 'INTTOREAL', real) - if isType(b.typ, integer): - b = Unop(b, 'INTTOREAL', real) - if not isType(a.typ, real): - self.Error('first operand must be a real!') - if not isType(b.typ, real): - self.Error('second operand must be a real!') - typ = real - elif op == 'and': - if not isType(a.typ, boolean): - self.Error('First operand of and must be boolean') - if not isType(b.typ, boolean): - self.Error('Second operand of and must be boolean') - typ = boolean - else: - self.Error('Unknown operand {0}'.format(op)) - - a = self.setLocation(Binop(a, op, b, typ), loc) - return a - - def parseFactor(self): - if self.hasConsumed('('): - e = self.parseExpression() - self.Consume(')') - return e - elif self.token.typ == 'NUMBER': - loc = self.getLocation() - val = self.Consume('NUMBER') - return self.setLocation(Constant(val, integer), loc) - elif self.token.typ == 'REAL': - loc = self.getLocation() - val = self.Consume('REAL') - return self.setLocation(Constant(val, real), loc) - elif self.token.typ == 'CHAR': - val = self.Consume('CHAR') - return Constant(val, char) - elif self.token.typ == 'STRING': - txt = self.Consume('STRING') - return StringConstant(txt) - elif self.token.typ in ['true', 'false']: - val = self.Consume() - val = True if val == 'true' else False - return Constant(val, boolean) - elif self.hasConsumed('nil'): - return Constant(0, NilType()) - elif self.hasConsumed('not'): - f = self.parseFactor() - if not isType(f.typ, boolean): - self.Error('argument of boolean negation must be boolean type') - return Unop(f, 'not', boolean) - elif self.token.typ == 'ID': - designator = self.parseDesignator() - # TODO: handle functions different here? - if self.token.typ == '(' and type(designator.typ) is ProcedureType: - return self.parseProcedureCall(designator) - else: - return designator - else: - self.Error('Expected NUMBER, ID or ( expr ), got'+str(self.token)) - - def parseSimpleExpression(self): - """ Arithmatic expression """ - if self.token.typ in ['+', '-']: - # Handle the unary minus - op = self.Consume() - a = self.parseTerm() - typ = a.typ - if not isType(typ,real) and not isType(typ, integer): - self.Error('Unary minus or plus can be only applied to real or integers') - if op == '-': - a = Unop(a, op, typ) - else: - a = self.parseTerm() - while self.token.typ in ['+', '-', 'or']: - loc = self.getLocation() - op = self.Consume() - b = self.parseTerm() - if op in ['+', '-']: - if isType(a.typ, real) or isType(b.typ, real): - typ = real - if isType(a.typ, integer): - # Automatic type cast - a = Unop(a, 'INTTOREAL', real) - if not isType(a.typ, real): - self.Error('first operand must be a real!') - if isType(b.typ, integer): - b = Unop(b, 'INTTOREAL', real) - if not isType(b.typ, real): - self.Error('second operand must be a real!') - elif isType(a.typ, integer) and isType(b.typ, integer): - typ = integer - else: - self.Error('Invalid types {0} and {1}'.format(a.typ, b.typ)) - elif op == 'or': - if not isType(a.typ, boolean): - self.Error('first operand must be boolean for or operation') - if not isType(b.typ, boolean): - self.Error('second operand must be boolean for or operation') - typ = boolean - else: - self.Error('Unknown operand {0}'.format(op)) - a = self.setLocation(Binop(a, op, b, typ), loc) - return a - diff -r 6efbeb903777 -r fe145e42259d python/ppci/symboltable.py --- a/python/ppci/symboltable.py Mon Dec 24 15:03:30 2012 +0100 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,80 +0,0 @@ -from .nodes import * -from .errors import Error - -class SymbolTable: - """ - Symbol table for a current scope. - It has functions: - - hasname for checking for a name in current scope or above - - addSymbol to add an object - """ - def __init__(self, parent=None): - self.parent = parent - self.syms = {} - - def __repr__(self): - return 'Symboltable with {0} symbols\n'.format(len(self.syms)) - - def printTable(self, indent=0): - for name in self.syms: - print(self.syms[name]) - - def getAllLocal(self, cls): - """ Get all local objects of a specific type """ - r = [] - for key in self.syms.keys(): - sym = self.syms[key] - if issubclass(type(sym), cls): - r.append(sym) - return r - - def getLocal(self, cls, name): - if name in self.syms.keys(): - sym = self.syms[name] - if isinstance(sym, cls): - return sym - else: - Error('Wrong type found') - else: - Error('Symbol not found') - - # Retrieving of specific classes of items: - def get(self, cls, name): - if self.hasSymbol(name): - sym = self.getSymbol(name) - if issubclass(type(sym), cls): - return sym - raise SymbolException('type {0} undefined'.format(typename)) - - def has(self, cls, name): - if self.hasSymbol(name): - sym = self.getSymbol(name) - if issubclass(type(sym), cls): - return True - return False - - # Adding and retrieving of symbols in general: - def addSymbol(self, sym): - if sym.name in self.syms.keys(): - raise Exception('Symbol "{0}" redefined'.format(sym.name)) - else: - self.syms[sym.name] = sym - - def getSymbol(self, name): - if name in self.syms.keys(): - return self.syms[name] - else: - if self.parent: - return self.parent.getSymbol(name) - else: - Error('Symbol "{0}" undeclared!'.format(name)) - - def hasSymbol(self, name): - if name in self.syms.keys(): - return True - else: - if self.parent: - return self.parent.hasSymbol(name) - else: - return False - diff -r 6efbeb903777 -r fe145e42259d python/ppci/test.py --- a/python/ppci/test.py Mon Dec 24 15:03:30 2012 +0100 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,8 +0,0 @@ - -from core import BitReader - -with open('main.s.bc', 'rb') as f: - br = BitReader(f) - br.parseModule() - print(br) - diff -r 6efbeb903777 -r fe145e42259d python/runtests.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/python/runtests.py Mon Dec 24 16:35:22 2012 +0100 @@ -0,0 +1,274 @@ +#!/usr/bin/python + +import unittest +import os + +from compiler.compiler import Compiler +from compiler.errors import CompilerException, printError +from compiler import lexer +from compiler.parser import Parser +from compiler import assembler +from compiler.codegenerator import CodeGenerator +from project import Project + +class CompilerTestCase(unittest.TestCase): + """ test methods start with 'test*' """ + def testSource1(self): + source = """ + module lcfos; + var + a : integer; + + procedure putchar(num : integer); + begin + end putchar; + + procedure WriteNum( num: integer); + var + d, base : integer; + dgt : integer; + begin + d := 1; + base := 10; + while num div d >= base do + d := d * base + end; + while d <> 0 do + dgt := num div d; + num := num mod d; + d := d div base; + putchar(48 + dgt) + end + end WriteNum; + + begin + a := 1; + while a < 26 + do + putchar(65+a); + a := a * 2 + end; + end lcfos. + """ + pc = Compiler() + pc.compilesource(source) + def testSource2(self): + source = """ + module lcfos; + var + a, b : integer; + arr: array 30 of integer; + arr2: array 10, 12 of integer; + procedure t2*() : integer; + begin + a := 2; + while a < 5 do + b := arr[a-1] + arr[a-2]; + arr2[a,2] := b; + arr2[a,3] := arr2[a,2] + arr2[a,2]*3 + b; + arr[a] := b; + a := a + 1; + end; + return b + end t2; + begin + b := 12; + arr[0] := 1; + arr[1] := 1; + end lcfos. + """ + pc = Compiler() + mod = pc.compilesource(source) + def testSource5(self): + source = """ + module lcfos; + procedure WriteLn() : integer; + const zzz = 13; + var + a, b, c: integer; + begin + a := 2; + b := 7; + c := 10 * a + b*10*a; + return c + end WriteLn; + begin end lcfos. + """ + pc = Compiler() + pc.compilesource(source) + def tstForStatement(self): + source = """ + module fortest; + var + a,b,c : integer; + begin + c := 0; + for a := 1 to 10 by 1 do + b := a + 15; + c := c + b * a; + end; + end fortest. + """ + pc = Compiler() + pc.compilesource(source) + def testSourceIfAndWhilePattern(self): + source = """ + module lcfos; + procedure WriteLn() : integer; + const zzz = 13; + var + a, b, c: integer; + begin + a := 1; + b := 2; + if a * 3 > b then + c := 10*a + b*10*a*a*a*b; + else + c := 13; + end; + while a < 101 do + a := a + 1; + c := c + 2; + end; + return c + end WriteLn; + begin end lcfos. + """ + pc = Compiler() + pc.compilesource(source) + + def testPattern1(self): + """ Test if expression can be compiled into byte code """ + src = "12*13+33-12*2*3" + tokens = lexer.tokenize(src) + ast = Parser(tokens).parseExpression() + code = CodeGenerator().genexprcode(ast) + + def testAssembler(self): + """ Check all kind of assembler cases """ + assert(assembler.shortjump(5) == [0xeb, 0x5]) + assert(assembler.shortjump(-2) == [0xeb, 0xfc]) + assert(assembler.shortjump(10,'GE') == [0x7d, 0xa]) + assert(assembler.nearjump(5) == [0xe9, 0x5,0x0,0x0,0x0]) + assert(assembler.nearjump(-2) == [0xe9, 0xf9, 0xff,0xff,0xff]) + assert(assembler.nearjump(10,'LE') == [0x0f, 0x8e, 0xa,0x0,0x0,0x0]) + + def testCall(self): + assert(assembler.call('r10') == [0x41, 0xff, 0xd2]) + assert(assembler.call('rcx') == [0xff, 0xd1]) + def testXOR(self): + assert(assembler.xorreg64('rax', 'rax') == [0x48, 0x31, 0xc0]) + assert(assembler.xorreg64('r9', 'r8') == [0x4d, 0x31, 0xc1]) + assert(assembler.xorreg64('rbx', 'r11') == [0x4c, 0x31, 0xdb]) + + def testINC(self): + assert(assembler.increg64('r11') == [0x49, 0xff, 0xc3]) + assert(assembler.increg64('rcx') == [0x48, 0xff, 0xc1]) + + def testPush(self): + assert(assembler.push('rbp') == [0x55]) + assert(assembler.push('rbx') == [0x53]) + assert(assembler.push('r12') == [0x41, 0x54]) + def testPop(self): + assert(assembler.pop('rbx') == [0x5b]) + assert(assembler.pop('rbp') == [0x5d]) + assert(assembler.pop('r12') == [0x41, 0x5c]) + + def testAsmLoads(self): + # TODO constant add testcases + assert(assembler.mov('rbx', 'r14') == [0x4c, 0x89, 0xf3]) + assert(assembler.mov('r12', 'r8') == [0x4d, 0x89, 0xc4]) + assert(assembler.mov('rdi', 'rsp') == [0x48, 0x89, 0xe7]) + + def testAsmMemLoads(self): + assert(assembler.mov('rax', ['r8','r15',0x11]) == [0x4b,0x8b,0x44,0x38,0x11]) + assert(assembler.mov('r13', ['rbp','rcx',0x23]) == [0x4c,0x8b,0x6c,0xd,0x23]) + + assert(assembler.mov('r9', ['rbp',-0x33]) == [0x4c,0x8b,0x4d,0xcd]) + #assert(assembler.movreg64('rbx', ['rax']) == [0x48, 0x8b,0x18]) + + assert(assembler.mov('rax', [0xb000]) == [0x48,0x8b,0x4,0x25,0x0,0xb0,0x0,0x0]) + assert(assembler.mov('r11', [0xa0]) == [0x4c,0x8b,0x1c,0x25,0xa0,0x0,0x0,0x0]) + + assert(assembler.mov('r11', ['RIP', 0xf]) == [0x4c,0x8b,0x1d,0x0f,0x0,0x0,0x0]) + + def testAsmMemStores(self): + assert(assembler.mov(['rbp', 0x13],'rbx') == [0x48,0x89,0x5d,0x13]) + assert(assembler.mov(['r12', 0x12],'r9') == [0x4d,0x89,0x4c,0x24,0x12]) + assert(assembler.mov(['rcx', 0x11],'r14') == [0x4c,0x89,0x71,0x11]) + + + assert(assembler.mov([0xab], 'rbx') == [0x48,0x89,0x1c,0x25,0xab,0x0,0x0,0x0]) + assert(assembler.mov([0xcd], 'r13') == [0x4c,0x89,0x2c,0x25,0xcd,0x0,0x0,0x0]) + + assert(assembler.mov(['RIP', 0xf], 'r9') == [0x4c,0x89,0x0d,0x0f,0x0,0x0,0x0]) + + def testAsmMOV8(self): + assert(assembler.mov(['rbp', -8], 'al') == [0x88, 0x45, 0xf8]) + assert(assembler.mov(['r11', 9], 'cl') == [0x41, 0x88, 0x4b, 0x09]) + + assert(assembler.mov(['rbx'], 'al') == [0x88, 0x03]) + assert(assembler.mov(['r11'], 'dl') == [0x41, 0x88, 0x13]) + + def testAsmLea(self): + assert(assembler.leareg64('r11', ['RIP', 0xf]) == [0x4c,0x8d,0x1d,0x0f,0x0,0x0,0x0]) + assert(assembler.leareg64('rsi', ['RIP', 0x7]) == [0x48,0x8d,0x35,0x07,0x0,0x0,0x0]) + + assert(assembler.leareg64('rcx', ['rbp', -8]) == [0x48,0x8d,0x4d,0xf8]) + + def testAssemblerCMP(self): + assert(assembler.cmpreg64('rdi', 'r13') == [0x4c, 0x39, 0xef]) + assert(assembler.cmpreg64('rbx', 'r14') == [0x4c, 0x39, 0xf3]) + assert(assembler.cmpreg64('r12', 'r9') == [0x4d, 0x39, 0xcc]) + + assert(assembler.cmpreg64('rdi', 1) == [0x48, 0x83, 0xff, 0x01]) + assert(assembler.cmpreg64('r11', 2) == [0x49, 0x83, 0xfb, 0x02]) + def testAssemblerADD(self): + assert(assembler.addreg64('rbx', 'r13') == [0x4c, 0x01, 0xeb]) + assert(assembler.addreg64('rax', 'rbx') == [0x48, 0x01, 0xd8]) + assert(assembler.addreg64('r12', 'r13') == [0x4d, 0x01, 0xec]) + + assert(assembler.addreg64('rbx', 0x13) == [0x48, 0x83, 0xc3, 0x13]) + assert(assembler.addreg64('r11', 0x1234567) == [0x49, 0x81, 0xc3, 0x67, 0x45,0x23,0x1]) + assert(assembler.addreg64('rsp', 0x33) == [0x48, 0x83, 0xc4, 0x33]) + + def testAssemblerSUB(self): + assert(assembler.subreg64('rdx', 'r14') == [0x4c, 0x29, 0xf2]) + assert(assembler.subreg64('r15', 'rbx') == [0x49, 0x29, 0xdf]) + assert(assembler.subreg64('r8', 'r9') == [0x4d, 0x29, 0xc8]) + + assert(assembler.subreg64('rsp', 0x123456) == [0x48, 0x81, 0xec, 0x56,0x34,0x12,0x0]) + assert(assembler.subreg64('rsp', 0x12) == [0x48, 0x83, 0xec, 0x12]) + + def testAssemblerIDIV(self): + assert(assembler.idivreg64('r11') == [0x49, 0xf7, 0xfb]) + assert(assembler.idivreg64('rcx') == [0x48, 0xf7, 0xf9]) + assert(assembler.idivreg64('rsp') == [0x48, 0xf7, 0xfc]) + + def testAssemblerIMUL(self): + assert(assembler.imulreg64_rax('rdi') == [0x48, 0xf7, 0xef]) + assert(assembler.imulreg64_rax('r10') == [0x49, 0xf7, 0xea]) + assert(assembler.imulreg64_rax('rdx') == [0x48, 0xf7, 0xea]) + + assert(assembler.imulreg64('r11', 'rdi') == [0x4c, 0xf, 0xaf, 0xdf]) + assert(assembler.imulreg64('r12', 'rbx') == [0x4c, 0xf, 0xaf, 0xe3]) + # nasm generates this machine code: 0x4d, 0x6b, 0xff, 0xee + # This also works: 4D0FAFFE (another variant?? ) + assert(assembler.imulreg64('r15', 'r14') == [0x4d, 0x0f, 0xaf, 0xfe]) + def testProject(self): + p = Project('test.xml', isnew=True) + p.name = "Test project" + p.files.append('main.mod') + p.files.append('test.mod') + p.save('test.xml') + + q = Project('test.xml') + + assert(p.name == q.name) + assert(p.files == q.files) + # TODO: remove test.xml test file + os.remove('test.xml') + +if __name__ == '__main__': + unittest.main() + diff -r 6efbeb903777 -r fe145e42259d python/tests/main.s.bc Binary file python/tests/main.s.bc has changed diff -r 6efbeb903777 -r fe145e42259d python/tests/ppcitest.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/python/tests/ppcitest.py Mon Dec 24 16:35:22 2012 +0100 @@ -0,0 +1,8 @@ + +from core import BitReader + +with open('main.s.bc', 'rb') as f: + br = BitReader(f) + br.parseModule() + print(br) + diff -r 6efbeb903777 -r fe145e42259d python/tests/runtests.py --- a/python/tests/runtests.py Mon Dec 24 15:03:30 2012 +0100 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,272 +0,0 @@ -import unittest -import os - -from compiler.compiler import Compiler -from compiler.errors import CompilerException, printError -from compiler import lexer -from compiler.parser import Parser -from compiler import assembler -from compiler.codegenerator import CodeGenerator -from project import Project - -class CompilerTestCase(unittest.TestCase): - """ test methods start with 'test*' """ - def testSource1(self): - source = """ - module lcfos; - var - a : integer; - - procedure putchar(num : integer); - begin - end putchar; - - procedure WriteNum( num: integer); - var - d, base : integer; - dgt : integer; - begin - d := 1; - base := 10; - while num div d >= base do - d := d * base - end; - while d <> 0 do - dgt := num div d; - num := num mod d; - d := d div base; - putchar(48 + dgt) - end - end WriteNum; - - begin - a := 1; - while a < 26 - do - putchar(65+a); - a := a * 2 - end; - end lcfos. - """ - pc = Compiler() - pc.compilesource(source) - def testSource2(self): - source = """ - module lcfos; - var - a, b : integer; - arr: array 30 of integer; - arr2: array 10, 12 of integer; - procedure t2*() : integer; - begin - a := 2; - while a < 5 do - b := arr[a-1] + arr[a-2]; - arr2[a,2] := b; - arr2[a,3] := arr2[a,2] + arr2[a,2]*3 + b; - arr[a] := b; - a := a + 1; - end; - return b - end t2; - begin - b := 12; - arr[0] := 1; - arr[1] := 1; - end lcfos. - """ - pc = Compiler() - mod = pc.compilesource(source) - def testSource5(self): - source = """ - module lcfos; - procedure WriteLn() : integer; - const zzz = 13; - var - a, b, c: integer; - begin - a := 2; - b := 7; - c := 10 * a + b*10*a; - return c - end WriteLn; - begin end lcfos. - """ - pc = Compiler() - pc.compilesource(source) - def tstForStatement(self): - source = """ - module fortest; - var - a,b,c : integer; - begin - c := 0; - for a := 1 to 10 by 1 do - b := a + 15; - c := c + b * a; - end; - end fortest. - """ - pc = Compiler() - pc.compilesource(source) - def testSourceIfAndWhilePattern(self): - source = """ - module lcfos; - procedure WriteLn() : integer; - const zzz = 13; - var - a, b, c: integer; - begin - a := 1; - b := 2; - if a * 3 > b then - c := 10*a + b*10*a*a*a*b; - else - c := 13; - end; - while a < 101 do - a := a + 1; - c := c + 2; - end; - return c - end WriteLn; - begin end lcfos. - """ - pc = Compiler() - pc.compilesource(source) - - def testPattern1(self): - """ Test if expression can be compiled into byte code """ - src = "12*13+33-12*2*3" - tokens = lexer.tokenize(src) - ast = Parser(tokens).parseExpression() - code = CodeGenerator().genexprcode(ast) - - def testAssembler(self): - """ Check all kind of assembler cases """ - assert(assembler.shortjump(5) == [0xeb, 0x5]) - assert(assembler.shortjump(-2) == [0xeb, 0xfc]) - assert(assembler.shortjump(10,'GE') == [0x7d, 0xa]) - assert(assembler.nearjump(5) == [0xe9, 0x5,0x0,0x0,0x0]) - assert(assembler.nearjump(-2) == [0xe9, 0xf9, 0xff,0xff,0xff]) - assert(assembler.nearjump(10,'LE') == [0x0f, 0x8e, 0xa,0x0,0x0,0x0]) - - def testCall(self): - assert(assembler.call('r10') == [0x41, 0xff, 0xd2]) - assert(assembler.call('rcx') == [0xff, 0xd1]) - def testXOR(self): - assert(assembler.xorreg64('rax', 'rax') == [0x48, 0x31, 0xc0]) - assert(assembler.xorreg64('r9', 'r8') == [0x4d, 0x31, 0xc1]) - assert(assembler.xorreg64('rbx', 'r11') == [0x4c, 0x31, 0xdb]) - - def testINC(self): - assert(assembler.increg64('r11') == [0x49, 0xff, 0xc3]) - assert(assembler.increg64('rcx') == [0x48, 0xff, 0xc1]) - - def testPush(self): - assert(assembler.push('rbp') == [0x55]) - assert(assembler.push('rbx') == [0x53]) - assert(assembler.push('r12') == [0x41, 0x54]) - def testPop(self): - assert(assembler.pop('rbx') == [0x5b]) - assert(assembler.pop('rbp') == [0x5d]) - assert(assembler.pop('r12') == [0x41, 0x5c]) - - def testAsmLoads(self): - # TODO constant add testcases - assert(assembler.mov('rbx', 'r14') == [0x4c, 0x89, 0xf3]) - assert(assembler.mov('r12', 'r8') == [0x4d, 0x89, 0xc4]) - assert(assembler.mov('rdi', 'rsp') == [0x48, 0x89, 0xe7]) - - def testAsmMemLoads(self): - assert(assembler.mov('rax', ['r8','r15',0x11]) == [0x4b,0x8b,0x44,0x38,0x11]) - assert(assembler.mov('r13', ['rbp','rcx',0x23]) == [0x4c,0x8b,0x6c,0xd,0x23]) - - assert(assembler.mov('r9', ['rbp',-0x33]) == [0x4c,0x8b,0x4d,0xcd]) - #assert(assembler.movreg64('rbx', ['rax']) == [0x48, 0x8b,0x18]) - - assert(assembler.mov('rax', [0xb000]) == [0x48,0x8b,0x4,0x25,0x0,0xb0,0x0,0x0]) - assert(assembler.mov('r11', [0xa0]) == [0x4c,0x8b,0x1c,0x25,0xa0,0x0,0x0,0x0]) - - assert(assembler.mov('r11', ['RIP', 0xf]) == [0x4c,0x8b,0x1d,0x0f,0x0,0x0,0x0]) - - def testAsmMemStores(self): - assert(assembler.mov(['rbp', 0x13],'rbx') == [0x48,0x89,0x5d,0x13]) - assert(assembler.mov(['r12', 0x12],'r9') == [0x4d,0x89,0x4c,0x24,0x12]) - assert(assembler.mov(['rcx', 0x11],'r14') == [0x4c,0x89,0x71,0x11]) - - - assert(assembler.mov([0xab], 'rbx') == [0x48,0x89,0x1c,0x25,0xab,0x0,0x0,0x0]) - assert(assembler.mov([0xcd], 'r13') == [0x4c,0x89,0x2c,0x25,0xcd,0x0,0x0,0x0]) - - assert(assembler.mov(['RIP', 0xf], 'r9') == [0x4c,0x89,0x0d,0x0f,0x0,0x0,0x0]) - - def testAsmMOV8(self): - assert(assembler.mov(['rbp', -8], 'al') == [0x88, 0x45, 0xf8]) - assert(assembler.mov(['r11', 9], 'cl') == [0x41, 0x88, 0x4b, 0x09]) - - assert(assembler.mov(['rbx'], 'al') == [0x88, 0x03]) - assert(assembler.mov(['r11'], 'dl') == [0x41, 0x88, 0x13]) - - def testAsmLea(self): - assert(assembler.leareg64('r11', ['RIP', 0xf]) == [0x4c,0x8d,0x1d,0x0f,0x0,0x0,0x0]) - assert(assembler.leareg64('rsi', ['RIP', 0x7]) == [0x48,0x8d,0x35,0x07,0x0,0x0,0x0]) - - assert(assembler.leareg64('rcx', ['rbp', -8]) == [0x48,0x8d,0x4d,0xf8]) - - def testAssemblerCMP(self): - assert(assembler.cmpreg64('rdi', 'r13') == [0x4c, 0x39, 0xef]) - assert(assembler.cmpreg64('rbx', 'r14') == [0x4c, 0x39, 0xf3]) - assert(assembler.cmpreg64('r12', 'r9') == [0x4d, 0x39, 0xcc]) - - assert(assembler.cmpreg64('rdi', 1) == [0x48, 0x83, 0xff, 0x01]) - assert(assembler.cmpreg64('r11', 2) == [0x49, 0x83, 0xfb, 0x02]) - def testAssemblerADD(self): - assert(assembler.addreg64('rbx', 'r13') == [0x4c, 0x01, 0xeb]) - assert(assembler.addreg64('rax', 'rbx') == [0x48, 0x01, 0xd8]) - assert(assembler.addreg64('r12', 'r13') == [0x4d, 0x01, 0xec]) - - assert(assembler.addreg64('rbx', 0x13) == [0x48, 0x83, 0xc3, 0x13]) - assert(assembler.addreg64('r11', 0x1234567) == [0x49, 0x81, 0xc3, 0x67, 0x45,0x23,0x1]) - assert(assembler.addreg64('rsp', 0x33) == [0x48, 0x83, 0xc4, 0x33]) - - def testAssemblerSUB(self): - assert(assembler.subreg64('rdx', 'r14') == [0x4c, 0x29, 0xf2]) - assert(assembler.subreg64('r15', 'rbx') == [0x49, 0x29, 0xdf]) - assert(assembler.subreg64('r8', 'r9') == [0x4d, 0x29, 0xc8]) - - assert(assembler.subreg64('rsp', 0x123456) == [0x48, 0x81, 0xec, 0x56,0x34,0x12,0x0]) - assert(assembler.subreg64('rsp', 0x12) == [0x48, 0x83, 0xec, 0x12]) - - def testAssemblerIDIV(self): - assert(assembler.idivreg64('r11') == [0x49, 0xf7, 0xfb]) - assert(assembler.idivreg64('rcx') == [0x48, 0xf7, 0xf9]) - assert(assembler.idivreg64('rsp') == [0x48, 0xf7, 0xfc]) - - def testAssemblerIMUL(self): - assert(assembler.imulreg64_rax('rdi') == [0x48, 0xf7, 0xef]) - assert(assembler.imulreg64_rax('r10') == [0x49, 0xf7, 0xea]) - assert(assembler.imulreg64_rax('rdx') == [0x48, 0xf7, 0xea]) - - assert(assembler.imulreg64('r11', 'rdi') == [0x4c, 0xf, 0xaf, 0xdf]) - assert(assembler.imulreg64('r12', 'rbx') == [0x4c, 0xf, 0xaf, 0xe3]) - # nasm generates this machine code: 0x4d, 0x6b, 0xff, 0xee - # This also works: 4D0FAFFE (another variant?? ) - assert(assembler.imulreg64('r15', 'r14') == [0x4d, 0x0f, 0xaf, 0xfe]) - def testProject(self): - p = Project('test.xml', isnew=True) - p.name = "Test project" - p.files.append('main.mod') - p.files.append('test.mod') - p.save('test.xml') - - q = Project('test.xml') - - assert(p.name == q.name) - assert(p.files == q.files) - # TODO: remove test.xml test file - os.remove('test.xml') - -if __name__ == '__main__': - unittest.main() -