
Standard/Extended Java Abstract Syntax Tree Format


What are the "Standard Java Abstract Syntax Tree" (standard Java AST) and the "Extended Java Abstract Syntax Tree" (extended Java AST)?

The Tree class that represents abstract syntax trees is very general, so it can even express meaningless trees such as the following.
 (foo (literalTree xyz "123") (bar (id X) (baz) (id "a b c")))
However, the abstract syntax trees actually produced by parsing Java source code take a much more restricted form.

For example, a tree with the tag ":class" always has exactly six subtrees. Furthermore, the types of certain subtrees are fixed: subtree number 0 always has the tag ":modifiers", and subtree number 3 always has the tag ":extends".

The abstract syntax tree format that the EPP parser can generate is called the "standard Java abstract syntax tree".

Programs that process the abstract syntax tree of the Java language may assume that the tree conforms to the "standard Java abstract syntax tree" format. (Without this assumption, processing a class would first require checking that the number of subtrees is six, that the tag of subtree number 0 is ":modifiers", and so on, which would be very tedious.)
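To make the cost of that check concrete, here is a minimal sketch of what a plug-in would have to do for every class node if the standard format could not be assumed. The Node type is a hypothetical stand-in for the EPP Tree class, not its actual API, and the tag names are written without the leading colon, as in the grammar later in this document.

```java
import java.util.Arrays;
import java.util.List;

final class ClassShapeCheck {
    // Hypothetical stand-in for the EPP Tree class: a tag plus ordered subtrees.
    // A bare symbol is modelled as a node without subtrees (a simplification).
    static final class Node {
        final String tag;
        final List<Node> kids;

        Node(String tag, Node... kids) {
            this.tag = tag;
            this.kids = Arrays.asList(kids);
        }
    }

    // Returns true when the node has the six-subtree shape of a standard class node.
    // Only the tags of the fixed subtrees are checked, not their contents.
    static boolean looksLikeStandardClass(Node n) {
        if (!n.tag.equals("class") || n.kids.size() != 6) {
            return false;
        }
        return n.kids.get(0).tag.equals("modifiers")
            && n.kids.get(3).tag.equals("extends")
            && n.kids.get(4).tag.equals("implements")
            && n.kids.get(5).tag.equals("classBody");
    }

    public static void main(String[] args) {
        Node ok = new Node("class",
            new Node("modifiers"),
            new Node("id", new Node("class")),
            new Node("id", new Node("Foo")),
            new Node("extends"),
            new Node("implements"),
            new Node("classBody"));
        Node bad = new Node("class", new Node("modifiers"));
        System.out.println(looksLikeStandardClass(ok));   // prints true
        System.out.println(looksLikeStandardClass(bad));  // prints false
    }
}
```

With the standard format guaranteed, a plug-in can skip this check and read subtree 3 directly as the extends clause.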

Plug-ins can extend the syntax of the Java language and add new types of nodes to the abstract syntax tree. However, an over-extended abstract syntax tree format may cause problems when the plug-in is used together with other plug-ins. In other words, composability is reduced.

This is where the "extended Java abstract syntax tree" comes in: it is a recommended framework for extending the abstract syntax tree.

Plug-ins that define new nodes by extending the abstract syntax tree should stay within the framework of the "extended Java abstract syntax tree". In turn, plug-ins that process the abstract syntax tree may assume that the tree conforms to the "extended Java abstract syntax tree" format.

All plug-ins must translate the "extended Java abstract syntax tree" into the "standard Java abstract syntax tree" before the code-emitting pass begins. The code-emitting pass only guarantees that it can translate a tree in the "standard Java abstract syntax tree" format into character strings. If you pass it a tree that does not conform to the format, the code-emitting pass will either raise a fatal error or output code that does not conform to the Java language specification.
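As an illustration of such a translation, the sketch below rewrites a made-up extended statement (unless Expression Statement) into standard nodes before emission. The "unless" node and the Node class are hypothetical and exist only for this example; the target tags (if, !, paren, emptyStatement) are taken from the grammar in this document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class UnlessDesugar {
    // Hypothetical stand-in for the EPP Tree class.
    static final class Node {
        final String tag;      // tag of a tree, or the text of a bare symbol
        final boolean atom;    // true for a bare symbol such as the x in (id x)
        final List<Node> kids;

        private Node(String tag, boolean atom, List<Node> kids) {
            this.tag = tag;
            this.atom = atom;
            this.kids = kids;
        }

        static Node tree(String tag, Node... kids) {
            return new Node(tag, false, Arrays.asList(kids));
        }

        static Node atom(String text) {
            return new Node(text, true, Collections.<Node>emptyList());
        }

        @Override
        public String toString() {
            if (atom) {
                return tag;
            }
            StringBuilder sb = new StringBuilder("(").append(tag);
            for (Node k : kids) {
                sb.append(' ').append(k);
            }
            return sb.append(')').toString();
        }
    }

    // Rewrites every (unless Cond Body) into
    // (if (! (paren Cond)) Body (emptyStatement)), bottom-up.
    static Node toStandard(Node n) {
        if (n.atom) {
            return n;
        }
        List<Node> kids = new ArrayList<>();
        for (Node k : n.kids) {
            kids.add(toStandard(k));
        }
        if (n.tag.equals("unless") && kids.size() == 2) {
            Node negated = Node.tree("!", Node.tree("paren", kids.get(0)));
            return Node.tree("if", negated, kids.get(1), Node.tree("emptyStatement"));
        }
        return new Node(n.tag, false, kids);
    }

    public static void main(String[] args) {
        Node extended = Node.tree("block",
            Node.tree("unless",
                Node.tree("id", Node.atom("done")),
                Node.tree("emptyStatement")));
        System.out.println(toStandard(extended));
    }
}
```

After this pass the tree contains only tags that the code-emitting pass is guaranteed to handle.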

Note, however, that the string generated from a "standard Java abstract syntax tree" is not necessarily a correct Java program. javac may report syntax errors or type errors, or, even if no error occurs, may interpret the program differently than expected. For example, the expression (id "a b c") conforms to the "standard Java abstract syntax tree" format, and the code-emitting pass will translate it into a character string, but compiling the result with javac produces a syntax error.
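A plug-in that wants to catch this kind of problem early could test identifier text itself. The following is a minimal sketch using only the standard java.lang.Character methods; the class and method names are made up for this example and are not part of EPP. It does not reject reserved words, which is deliberate here because the grammar itself uses forms such as (id class) and (id void).

```java
final class IdentifierCheck {
    // Returns true when the text is lexically shaped like a Java identifier.
    // Reserved words are not rejected, and characters outside the Basic
    // Multilingual Plane are not handled (both are simplifications).
    static boolean isJavaIdentifier(String text) {
        if (text.isEmpty() || !Character.isJavaIdentifierStart(text.charAt(0))) {
            return false;
        }
        for (int i = 1; i < text.length(); i++) {
            if (!Character.isJavaIdentifierPart(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isJavaIdentifier("abc"));    // prints true
        System.out.println(isJavaIdentifier("a b c"));  // prints false
    }
}
```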

Also note that the "extended Java abstract syntax tree" framework is not absolute. The language that can be expressed within the framework is quite limited, so there may be times when you want to use a special abstract syntax tree format even at the expense of composability. The plug-in implementor decides the trade-off between this limitation and composability. (However, the implementor must explicitly document that composability is low.) Also, since proof of concept matters more than composability when prototyping a plug-in, deviating from the "extended Java abstract syntax tree" format should be allowed in order to make implementation easier.


Standard/Extended Java Abstract Syntax Tree Format

----

CompilationUnit:
	(compilationUnit
		(packageDeclaration PackageName)
		(imports ImportDeclaration*)
		(typeDeclarations TypeDeclaration*))


PackageName:
	Name
	<e>


ImportDeclaration:
	(importSingle Name)
	(importOnDemand Name)
	<extended alternatives>

TypeDeclaration:
	Class
	(emptyTypeDeclaration)
	<extended alternatives>


Class:
	(class  (modifiers Modifier*)
		ClassKeyword
		Identifier
		(extends Name*)
		(implements Name*)
		(classBody ClassBodyDeclaration*))


ClassKeyword:
	(id class)
	(id interface)


ClassBodyDeclaration:
	(emptyClassBodyDeclaration)
	VariableDeclaration
	Class
	(staticInitializer Block)
	(instanceInitializer Block)
	(constructor
		(modifiers Modifier*)
		(name (id void))
		Identifier
		(formalParameters FormalParameter*)
		(throws Name*)
		MethodBody)
	(method 
		(modifiers Modifier*)
		Type
		Identifier
		(formalParameters FormalParameter*)
		(throws Name*)
		MethodBody)
	<extended alternatives>


FormalParameter:
	(pdecl (modifiers Modifier*) Type VariableDeclaratorId)


MethodBody:
	(emptyStatement)
	Block
	<extended alternatives>


Block:
	(block Statement*)


Statement:
	VariableDeclaration
	Block
	Class
	(emptyStatement)
	(expressionStatement Expression)
	(labeled Identifier Statement)
	(switch Expression Block)
	(case Expression)
	(default)
	(do Block Expression)
	(break)
	(breakWithLabel Identifier)
	(continue)
	(continueWithLabel Identifier)
	(return)
	(returnWithExp Expression)
	(synchronized Expression Block)
	(throw Expression)
	(try Block (catch FormalParameter Block)*)
	(try Block (catch FormalParameter Block)* (finally Block))
	(if Expression Statement Statement)
	(while Expression Statement)
	(for VariableDeclaration ForArg ForArg Statement)
	(for ForArg ForArg ForArg Statement)
	<extended alternatives>


VariableDeclaration:
	(decl	(modifiers Modifier*)
		Type
		(vardecls VariableDeclarator*))


VariableDeclarator:
	VariableDeclaratorId
	(varInit VariableDeclaratorId VariableInitializer)


VariableDeclaratorId:
	Identifier
	(arrayVar VariableDeclaratorId)
	<extended alternatives>


VariableInitializer:
	Expression
	(arrayInitializer Expression*)
	<extended alternatives>


ForArg:
	(forArg Expression*)


Expression:
	Identifier
	(literalTree LiteralTag String)
	InstanceCreation
	(newEnclosingInstance Expression InstanceCreation)
	(classLiteral Type)
	(this)
	(invokeThisConstructor ArgumentList)
	(invokeSuperConstructor ArgumentList)
	(fieldLong Name Identifier)
	(fieldExp Expression Identifier)
	(fieldStatic Name Identifier)
	(fieldSuper Identifier ArgumentList)
	(invoke1 Identifier ArgumentList)
	(invokeLong Name Identifier ArgumentList)
	(invokeExp Expression Identifier ArgumentList)
	(invokeStatic Name Identifier ArgumentList)
	(invokeSuper Identifier ArgumentList)
	(cast Type Expression)
	(paren Expression)
	(anonymousClass Name ArgumentList (classBody ClassBodyDeclaration*))
	(newArray NewArrayArg)
	(anonymousArray Type (arrayInitializer Expression*))
	(array Expression Expression)
	(postInc Expression)
	(postDec Expression)
	(~ Expression)
	(! Expression)
	(preInc Expression)
	(preDec Expression)
	(unaryPlus Expression)
	(unaryMinus Expression)
	(instanceof Expression Type)
	(* Expression Expression)
	(/ Expression Expression)
	(% Expression Expression)
	(+ Expression Expression)
	(- Expression Expression)
	(<< Expression Expression)
	(>> Expression Expression)
	(>>> Expression Expression)
	(< Expression Expression)
	(> Expression Expression)
	(<= Expression Expression)
	(>= Expression Expression)
	(== Expression Expression)
	(!= Expression Expression)
	(& Expression Expression)
	(^ Expression Expression)
	(| Expression Expression)
	(&& Expression Expression)
	(|| Expression Expression)
	(conditionalExpression Expression Expression Expression)
	(= Expression Expression)
	(*= Expression Expression)
	(/= Expression Expression)
	(%= Expression Expression)
	(+= Expression Expression)
	(-= Expression Expression)
	(<<= Expression Expression)
	(>>= Expression Expression)
	(>>>= Expression Expression)
	(&= Expression Expression)
	(^= Expression Expression)
	(|= Expression Expression)
	<extended alternatives>


LiteralTag:
	long
	char
	float
	double
	string
	<extended alternatives>


InstanceCreation:
	(new Type ArgumentList)


ArgumentList:
	(argumentList Expression*)


NewArrayArg:
	Type
	(dimExpr NewArrayArg Expression)
	<extended alternatives>


Type:
	Name
	(arrayOf Type)
	(innerClassExp Expression Identifier)          // Since epp1.1.0beta7
	(innerClassStatic Type Identifier)             // Since epp1.1.0beta7
	<extended alternatives>


Modifier:
	Identifier
	<extended alternatives>


Name:
	(name Identifier Identifier*)


Identifier:
	(id Symbol)
	(id String)


Symbol: Java language identifier


String: Java language string literal


NOTE

I plan to provide a library function that automatically checks whether an abstract syntax tree conforms to the "standard/extended Java abstract syntax tree" format. Furthermore, when a plug-in runs in debug mode, this format-checking function will be called automatically, and if it finds any deviation, it will identify the plug-in and method name that caused it.

Reference: Generic Format of the Abstract Syntax Tree

The abstract syntax tree that can be expressed with the Tree class can be defined as follows. The "standard/extended Java abstract syntax tree" format is a subset of this definition.
Tree :
	LiteralTree
	Identifier
	(Symbol Tree*)

LiteralTree :
	(literalTree Symbol String)

Identifier :
	(id Symbol)
	(id String)
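
The three alternatives of the generic format can be modelled directly as three small classes. The sketch below is only an illustration under that assumption; the class names are invented and do not reflect the real Tree class. It builds and prints the meaningless example tree shown at the top of this document. String contents are printed without escaping, which is a simplification.

```java
import java.util.Arrays;
import java.util.List;

final class GenericTreeSketch {
    interface Tree {
        String show();
    }

    // LiteralTree: (literalTree Symbol String)
    static final class LiteralTree implements Tree {
        final String kind;
        final String text;

        LiteralTree(String kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        public String show() {
            return "(literalTree " + kind + " \"" + text + "\")";
        }
    }

    // Identifier: (id Symbol) or (id String)
    static final class Identifier implements Tree {
        final String name;
        final boolean quoted;  // true for the (id String) form

        Identifier(String name, boolean quoted) {
            this.name = name;
            this.quoted = quoted;
        }

        public String show() {
            return quoted ? "(id \"" + name + "\")" : "(id " + name + ")";
        }
    }

    // General case: (Symbol Tree*)
    static final class TaggedTree implements Tree {
        final String tag;
        final List<Tree> kids;

        TaggedTree(String tag, Tree... kids) {
            this.tag = tag;
            this.kids = Arrays.asList(kids);
        }

        public String show() {
            StringBuilder sb = new StringBuilder("(").append(tag);
            for (Tree k : kids) {
                sb.append(' ').append(k.show());
            }
            return sb.append(')').toString();
        }
    }

    // Builds the meaningless but well-formed example tree.
    static Tree example() {
        return new TaggedTree("foo",
            new LiteralTree("xyz", "123"),
            new TaggedTree("bar",
                new Identifier("X", false),
                new TaggedTree("baz"),
                new Identifier("a b c", true)));
    }

    public static void main(String[] args) {
        // prints (foo (literalTree xyz "123") (bar (id X) (baz) (id "a b c")))
        System.out.println(example().show());
    }
}
```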

