The case for AST-based computer language representation and specification

This is the idea:

(Note: AST = Abstract Syntax Tree, the internal representation of the source code that the parser in a compiler produces.)

Some advantages:

Some disadvantages:

One of the advantages of making the AST the primary specification of the language is that it sits half-way between the user and the tools. Refactoring becomes easier because a refactoring tool doesn't have to worry about how the refactored code should be formatted. A compiler should also be faster, because reading an AST dump is less complex than parsing free-form source text.

Looking at source code from an AST point of view, a project is just an immense tree. Taking Java as an example, the package names form a tree with classes/enums/etc as leaves, and below those appear methods and fields, and below those come type specifications and nested lists of statements and expressions.
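To make the "one immense tree" picture concrete, here is a minimal sketch in plain Java. It assumes a single generic node type for every level of the project; TreeNode, ProjectTreeDemo and the "kind" labels are hypothetical names chosen for illustration, not part of any real tool, and a real AST would use a distinct type per node kind.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type: one kind of node for every level of the project tree.
final class TreeNode {
    final String kind;   // e.g. "package", "class", "method", "statement"
    final String label;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String kind, String label) {
        this.kind = kind;
        this.label = label;
    }

    // Adds a child and returns it, so nested structure can be built by chaining.
    TreeNode add(String kind, String label) {
        TreeNode child = new TreeNode(kind, label);
        children.add(child);
        return child;
    }

    // Counts this node plus all of its descendants.
    int size() {
        int n = 1;
        for (TreeNode c : children) n += c.size();
        return n;
    }

    // Renders the subtree as an indented outline, one node per line.
    String outline() {
        StringBuilder sb = new StringBuilder();
        outline(sb, 0);
        return sb.toString();
    }

    private void outline(StringBuilder sb, int depth) {
        for (int i = 0; i < depth; i++) sb.append("  ");
        sb.append(kind).append(' ').append(label).append('\n');
        for (TreeNode c : children) c.outline(sb, depth + 1);
    }
}

final class ProjectTreeDemo {
    // Builds a tiny project: packages at the top, a class below them,
    // then a field and a method, then types and statements.
    static TreeNode sample() {
        TreeNode root = new TreeNode("project", "demo");
        TreeNode pkg = root.add("package", "org").add("package", "example");
        TreeNode cls = pkg.add("class", "Counter");
        cls.add("field", "count").add("type", "int");
        TreeNode method = cls.add("method", "increment");
        method.add("type", "void");
        method.add("statement", "count = count + 1;");
        return root;
    }

    public static void main(String[] args) {
        System.out.print(sample().outline());
    }
}
```

Packages, classes, members and statements all hang off the same root, so any tool that can walk one level of the tree can walk all of them.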

So in an abstract form we have this huge tree. We could dump it out in folders and files, similar to how Java source is structured right now -- which is probably the best way to interact with a version control system (git, mercurial, etc).

Alternatively we could load it up into some kind of database optimised for storing trees. Since it is a database, we could maintain indexes for cross-referencing. This would make it very easy to do rename refactoring -- if every reference to class A is a database pointer to A, then we can rename A with one operation -- and similarly for local variables and all other named entities. Reverse indexes could optimise searches for references to a definition. This is an ideal environment for refactoring and analysis. It could even be used for global optimisation -- make a temporary copy of the database and eliminate dead code or inline code globally, before passing it on to the compiler.
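The rename idea can be sketched with ordinary object references standing in for database pointers. This is only an in-memory illustration under that assumption, not a tree database; Definition, Reference, RenameDemo and the location strings are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical definition record: the name is stored in exactly one place.
final class Definition {
    String name;
    // Reverse index: every use site that points at this definition.
    final List<Reference> referencedBy = new ArrayList<>();

    Definition(String name) {
        this.name = name;
    }
}

// A use site holds a pointer to the definition, never a copy of its name.
final class Reference {
    final Definition target;
    final String location; // illustrative only, e.g. "Main.run"

    Reference(Definition target, String location) {
        this.target = target;
        this.location = location;
        target.referencedBy.add(this); // keep the reverse index up to date
    }

    // The text shown to the user is looked up when the reference is rendered.
    String render() {
        return target.name;
    }
}

final class RenameDemo {
    public static void main(String[] args) {
        Definition a = new Definition("A");
        Reference r1 = new Reference(a, "Main.run");
        Reference r2 = new Reference(a, "Util.helper");

        a.name = "Account"; // the whole rename is this one assignment

        // Both use sites now render the new name.
        System.out.println(r1.render() + " " + r2.render());

        // "Find all references" is a walk over the reverse index, not a text search.
        for (Reference r : a.referencedBy) {
            System.out.println(r.location);
        }
    }
}
```

Because no use site stores the name, there is nothing to search and replace; the rename cannot miss a reference or hit an unrelated identifier that happens to share the same spelling.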

Putting this into practice

As a practical test of some of these ideas, I'm working on a Java editor which edits the AST, not the source. The source is loaded and parsed, and the resulting AST is then reformatted into the user's preferred syntax for display and editing. On saving, the edited AST is reformatted back into standard syntax, and white space is then adjusted to minimise changes to the file.
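The core of that round trip is that one AST can be printed in more than one layout. The toy sketch below shows only that step, for a single if-statement. It does not use JDT, and Style, Stmt, ExprStmt, IfStmt, Layout and FormatDemo are hypothetical names; a real editor would need a full parser and a far richer set of style options.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical formatting preferences for one user or one project.
final class Style {
    final int indentWidth;
    final boolean braceOnNewLine;

    Style(int indentWidth, boolean braceOnNewLine) {
        this.indentWidth = indentWidth;
        this.braceOnNewLine = braceOnNewLine;
    }
}

// Minimal statement AST: nodes know their structure, not their layout.
interface Stmt {
    void render(StringBuilder out, Style style, int depth);
}

final class ExprStmt implements Stmt {
    final String code;

    ExprStmt(String code) {
        this.code = code;
    }

    public void render(StringBuilder out, Style style, int depth) {
        Layout.indent(out, style, depth);
        out.append(code).append(";\n");
    }
}

final class IfStmt implements Stmt {
    final String condition;
    final List<Stmt> body;

    IfStmt(String condition, Stmt... body) {
        this.condition = condition;
        this.body = Arrays.asList(body);
    }

    public void render(StringBuilder out, Style style, int depth) {
        Layout.indent(out, style, depth);
        out.append("if (").append(condition).append(")");
        if (style.braceOnNewLine) {
            out.append('\n');
            Layout.indent(out, style, depth);
            out.append("{\n");
        } else {
            out.append(" {\n");
        }
        for (Stmt s : body) s.render(out, style, depth + 1);
        Layout.indent(out, style, depth);
        out.append("}\n");
    }
}

final class Layout {
    // Emits depth * indentWidth spaces.
    static void indent(StringBuilder out, Style style, int depth) {
        for (int i = 0; i < depth * style.indentWidth; i++) out.append(' ');
    }

    // Renders a statement from the top level in the given style.
    static String render(Stmt s, Style style) {
        StringBuilder out = new StringBuilder();
        s.render(out, style, 0);
        return out.toString();
    }
}

final class FormatDemo {
    static Stmt sample() {
        return new IfStmt("x > 0", new ExprStmt("x--"));
    }

    public static void main(String[] args) {
        Stmt ast = sample();
        // Assumed "standard" layout used when saving.
        System.out.print(Layout.render(ast, new Style(4, false)));
        // Assumed personal layout used for display and editing.
        System.out.print(Layout.render(ast, new Style(2, true)));
    }
}
```

The AST carries no white space at all, so choosing a layout is purely a display decision and two users can view the same code differently without producing a diff.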

Like this I will get a number of advantages:

I'm using Eclipse JDT as a base for now (org.eclipse.jdt.core.dom) to see whether I can make this fly.

The future

I wonder whether we will ever see a language like the one I envisage. Having worked through all this in my head, it is now frustrating to see languages still being designed in terms of their ASCII representation. It just seems so limiting. We will have to see how far I can develop these ideas myself, and what the future may hold.

-- Peru, 19-Dec-2011

These pages and files, including applets and artwork, are Copyright (c) 1997-2019 Jim Peters unless otherwise stated. Please contact me if you'd like to use anything not explicitly released, or if you have something interesting to discuss.