Thursday, September 20, 2007

Some reflections on Eclipse-based language IDEs

A few years ago, when Eclipse's popularity was quickly growing, an interesting phenomenon was going on: a number of small projects were popping up (for a time, at the rate of a dozen or more per week) that implemented some extension to Eclipse, often for supporting some programming language. Some of them died away quickly when the interest of the founders moved on to something else, some of them became obsolete when their main idea was implemented by the Eclipse project itself. But there were some language support plugins that remained active and have continued development to this day. (EclipseFP is one of them; others include PHPEclipse, RDT (Ruby), Erlide (Erlang), PyDev (Python), EPIC (Perl) and Eclipse.org's CDT project.)

The obvious paradigm for all Eclipse-based IDEs is the Java Development Tools (JDT) from Eclipse.org. They were a most important factor in making Eclipse popular, and they pioneered many of the cool IDE features that we have come to expect from Java IDEs nowadays. However, keeping up with JDT proved hard for each of the language support projects. One obvious reason for this is of course that the effort that went into the JDT project exceeded by far the means available to the other projects, all of them (with the exception of CDT) being volunteer projects. But there are other reasons, too; I've got a few observations about one of them.


The small step and the big step

For building language support on the basis of the Eclipse Platform, there is a small step and a big step: the small step is to get basic support, the big step is deep support. In order to make the small step, you just have to do a bit of Java coding, learn a few Eclipse APIs, and have some modest understanding of the target language. To make the big step, you have to implement, or integrate someone else's implementation of, deeply language-aware functionality (i.e. functionality aware of the syntax and semantics of the target language).

The equivalent of the small step can be done even with many text editors: they often have a feature where you can enter some syntax rules and get syntax coloring for content formats such as HTML or Java source code. In Eclipse, the small step includes not only syntax coloring, but project types (called 'Project Natures' in Eclipse terminology), integration of external build tools (normally compilers), output parsers to populate the Problems View, import and export wizards, etc.
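To give an idea of how little is involved in one of these small-step pieces, here is a minimal sketch of an output parser. It assumes the external compiler reports problems as lines of the form `<file>:<line>:<column>: <message>`; that format, and the names `CompilerOutputParser` and `Problem`, are assumptions made up for this illustration, not part of any Eclipse API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: turns compiler output lines into problem records. */
final class CompilerOutputParser {

    /** One entry as it might later be shown in a problems list. */
    static final class Problem {
        final String file;
        final int line;
        final String message;

        Problem(String file, int line, String message) {
            this.file = file;
            this.line = line;
            this.message = message;
        }
    }

    // Assumed output format: <file>:<line>:<column>: <message>
    private static final Pattern LOCATION =
        Pattern.compile("^(.+?):(\\d+):(\\d+):\\s*(.*)$");

    /** Collects every line of the output that matches the assumed format. */
    static List<Problem> parse(String compilerOutput) {
        List<Problem> problems = new ArrayList<>();
        for (String line : compilerOutput.split("\\R")) {
            Matcher m = LOCATION.matcher(line);
            if (m.matches()) {
                problems.add(new Problem(
                    m.group(1), Integer.parseInt(m.group(2)), m.group(4)));
            }
            // Lines that do not match the assumed format are ignored here.
        }
        return problems;
    }
}
```

In an actual plugin, each record would presumably be attached to the corresponding workspace file as a problem marker (via `IResource.createMarker(IMarker.PROBLEM)`), which is what makes it show up in the Problems View. Note that nothing here needs to understand the language itself; it only needs to understand the compiler's output format.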

The big step, on the other hand, includes all the attractive features such as automated refactorings, search for references to language elements ('Find all calls to this function'), navigation to declarations, debugger integration and so on. (A while ago, I co-authored an article in a German journal that discussed which features belong in the language-sensitive category; my co-author illustrated the main points with examples from the RDT project.)


What the big step involves

What everybody wants, of course, is language support that has taken the big step. But this means that a backend must be built that can analyze the syntactic and semantic structure of source code written in the target language. (And not only source code in the language itself, but also supporting formats, for example - in the Haskell world - Cabal configuration files, GHC package configuration files, etc.) It must understand the concepts behind the structure of the language (e.g. it must know what a module is, and what it means to find the name of a module in the name of a source file, or in a declaration in a Cabal configuration, and so on).
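As a tiny illustration of such a concept, take the relation between a module name and a source file name. The sketch below assumes the usual convention for hierarchical Haskell modules, where a module `Data.Map` is expected in a file `Data/Map.hs` below some source folder. The class and method names are hypothetical, and literate sources (`.lhs`) are left out for brevity.

```java
/** Hypothetical sketch: relates Haskell module names to source file paths. */
final class ModuleNames {

    /** Expected path of a module's source file, relative to a source folder. */
    static String expectedPath(String moduleName) {
        return moduleName.replace('.', '/') + ".hs";
    }

    /** True if a file's path (relative to the source folder) fits the declared module name. */
    static boolean matches(String relativePath, String declaredModule) {
        // Normalize Windows separators before comparing.
        return relativePath.replace('\\', '/').equals(expectedPath(declaredModule));
    }
}
```

Even this trivial piece of knowledge is language-specific, and a real backend would need many more rules of this kind (source folders, Cabal declarations, package databases and so on).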

JDT has achieved this. The Java IDE in Eclipse includes a fully standards-compliant Java compiler, and much of the interesting functionality relies on the source code parsing and incremental compilation facilities provided by it. In addition, JDT maintains a model of all language elements in the workspace; it has representations for all methods, types, (Java) packages and so on, both from source files and attached libraries. All this is naturally implemented in Java. (It is also certainly helpful that the JDT project can practice 'Eat your own dogfood', i.e. that Eclipse can be built using Eclipse itself.)

But what are you going to do if your target language isn't the same language as the one Eclipse is implemented in? First of all, there is not really a way around implementing Eclipse plugins at least partially in Java. That's unfavorable: even if you actually want to build all that functionality yourself, chances are that you want to do it in the target language, not in Java. If you don't want to re-implement everything (and that should be the common case), then it's highly improbable that the existing stuff is written in Java. Haskell is a very good example: there is almost everything you'd like, in various degrees of maturity: source parsers (both for Haskell 98 code and many language extensions), a type inference and typechecking engine (GHC API), refactoring (HaRe), API search (Hoogle), a debugger (in GHCi), and lots and lots more. But guess in what language all that is written ... ;-)


Interoperability

There are the usual options for language interoperability: you can run executables written in other languages and communicate via standard input and output, or use language interoperability interfaces (such as Haskell's FFI) in conjunction with Java's native interface, JNI. Some languages can be run in an interpreter; others can be compiled to Java Virtual Machine bytecode. Although this is feasible, it's sometimes ugly and complicated, and it often limits deeper integration (there's only so much information that you can send via console I/O, and in any case if your data is highly structured, you have to put additional effort into serializing and de-serializing it).
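To make the first of these options concrete, here is a rough sketch of the Java side of such a conversation over standard input and output. The protocol is invented for this illustration: one tab-separated request line goes to the tool, and the reply consists of lines terminated by an empty line. The class name `ToolBridge` and its methods are hypothetical as well; only `ProcessBuilder` and the stream classes are standard Java.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: talks to an external tool over a line-based protocol. */
final class ToolBridge {

    /** Serializes a request as one tab-separated line (assumed protocol, no escaping). */
    static String encode(String command, List<String> args) {
        StringBuilder sb = new StringBuilder(command);
        for (String arg : args) {
            sb.append('\t').append(arg);
        }
        return sb.toString();
    }

    /** Sends one request, then reads reply lines until an empty line or end of input. */
    static List<String> ask(Writer toTool, BufferedReader fromTool,
                            String command, List<String> args) {
        try {
            toTool.write(encode(command, args));
            toTool.write('\n');
            toTool.flush();
            List<String> reply = new ArrayList<>();
            String line;
            while ((line = fromTool.readLine()) != null && !line.isEmpty()) {
                reply.add(line);
            }
            return reply;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Starts the external tool and runs a single request against its stdin/stdout. */
    static List<String> askProcess(List<String> commandLine,
                                   String command, List<String> args) throws IOException {
        Process process = new ProcessBuilder(commandLine).start();
        try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                 process.getOutputStream(), StandardCharsets.UTF_8));
             BufferedReader in = new BufferedReader(new InputStreamReader(
                 process.getInputStream(), StandardCharsets.UTF_8))) {
            return ask(out, in, command, args);
        } finally {
            process.destroy();
        }
    }
}
```

The sketch already shows the limitation: arguments containing tabs or newlines would break the encoding, and anything more structured than a list of strings needs a proper serialization format on both sides.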

Apart from that, Eclipse has its own dynamic module system, which relies on loading code at runtime on demand - most of the plugins that are actually in an installation may not be loaded at all in a particular session. Furthermore, the entire extensibility model in Eclipse relies on the idea that plugins can not only plug into pre-existing extension points, but may very well (and very easily) provide new extension points themselves, which are available for anyone else to extend. In order to honor these features of Eclipse, there is much more to interoperate with than just the programming language (Java) and its runtime (the JVM). It's the Eclipse platform concept that must be taken into account. And there is no obvious way to do that in any other language than Java. (There has been some experimentation recently with executing scripts in the place of Eclipse extensions, usually scripts written in languages that can be compiled to Java bytecode. If you are interested, look at this Google Summer of Code project and this post about Eclipse plugins in Scala.)

Thus taking the big step is not just more difficult for others than for JDT because of a lack of resources. It is inherently more complicated because it adds the burden of interoperating with the platform from a different language and/or runtime. It also doesn't help, of course, that the group of potential contributors is smaller if they are required to be fluent in two different languages and sets of APIs. (I know from years of experience as a trainer for Eclipse plugin development that it takes some time to learn to find your way around in the set of Eclipse APIs, even for experienced Java programmers - simply because of their substantial size and the number of associated concepts.)


The way out

The lesson to be learned from this is, I think, clearly that a crucial piece in building an Eclipse-based IDE for a language must be to make code written in that language interoperate with the Eclipse Platform - that is, it is necessary to make it possible to write Eclipse plugins in that language. Once this is in place it becomes possible to

  • re-use existing tools written in the target language,

  • find and motivate contributors (i.e. people who have an interest in and a good knowledge of that language), and finally

  • make the big step :-)

9 comments:

David said...

So, for someone who doesn't know much about Cohatoe, how does it relate to all of this?

Leif Frenzel said...

Cohatoe is my attempt to provide a way to implement Eclipse plugins partly in Haskell. The goal is to be able to plug in Haskell code without having to manage a lot of Eclipse-related stuff on the Haskell side and without having to do unusual things on the Eclipse side.

I.e. on the Eclipse side you declare an extension in the plugin.xml as usual, provide some Java interface and implementation class, and get an API to call your Haskell code. On the Haskell side, you get an entry point function (similar to a main function), where you can start to basically do anything the usual Haskell way.

There are some earlier posts in this blog that describe how it is used :-)

Thanks && ciao,
Leif

David said...

Thanks for the info! I haven't had time to read up on Cotahoe yet, but I will. In your post you talk about extension points provided by plugins. I suppose that isn't possible with Cotahoe yet?

David said...

Sorry for misspelling Cohatoe :-)

Leif Frenzel said...

>In your post you talk about
>extension points provided by
>plugins. I suppose that isn't
>possible with Cotahoe yet?
Well, in one sense, it is: With a solution such as Cohatoe, what is possible is that you can provide ('contribute') Haskell code from any plugin. You just have to declare an extension in the plugin.xml of that plugin. Therefore, your Haskell code could be spread over multiple plugins.

Compare this with an approach where you link all your Haskell code into a single dll and run that via FFI/JNI (which is exactly what we did in an earlier version of EclipseFP, our Haskell IDE). There you can only extend the set of Haskell functions that you use by compiling a new version of that dll. If EclipseFP had extension points, these would be Java-only, and the possible extensions would be in Java only, unless they did the same and provided their own dll, thus duplicating all the complicated FFI/JNI bridging.

Now the elegant thing with Cohatoe is that we could provide extension points in EclipseFP (whose plugins are partly in Haskell and partly in Java) and anybody else could write another plugin that connects to these extension points and may again, using Cohatoe, be implemented partly in Haskell, without any extra infrastructure.

So in this sense, Cohatoe takes the modularity in Eclipse's extension/extension point system more seriously.

Of course, this always goes via plugin.xml and thus necessarily via some Java interfaces; there is not yet anything in Cohatoe to have Haskell code from a Cohatoe-plugin 'directly' extend Haskell code in another plugin via an extension point. (Would be an interesting idea for Cohatoe 2.0 or 3.0 :-). What is possible, however, is code re-use between the Haskell code in Cohatoe plugins, if they share a library. E.g. you could have that Haskell library in a GHC package and then you could compile both plugins' Haskell code against that GHC package.

Hope this makes sense ... :-)

Thanks && ciao,
Leif

Leif Frenzel said...

I might perhaps add one more thing:

>I.e. on the Eclipse side you declare
>an extension in the plugin.xml as
>usual, provide some Java interface
>and implementation class, and get an
>API to call your Haskell code.
Most of this stuff on the Java side is just boilerplate and there is already a wizard in the Cohatoe SDK that generates much of this Java code.

David Waern said...

Yes, this makes sense to me and I think it
sounds very good!

Thanks

jervin said...

You know that your comments seem to me to fit what the Groovy Eclipse community and I have been going through in the last year or so. Even with the fact that Groovy is a "Java" language and as such does not incur all of the issues in your blog posting, there is still a jump from the little to the big step. I myself have gained a fair amount of appreciation for what the JDT people did, it isn't easy.

Still there is another issue, one that we have encountered: the varying degree to which Eclipse platform and JDT APIs are made open and reusable, and the quality of some of those APIs. I don't know how many times I have had to go fishing for a particular piece of functionality, it seems 9 or 10 layers deep. Some of the code seems to take the CS axiom, "everything can be solved with another layer of indirection" to a whole new level.

And just don't get me started about EMF (in particular WTP's use of it)... ick....

Still Eclipse is the most modular, extensible and most open framework for development today, gotta love it...

Leif Frenzel said...

You're quite right: the more one depends not only on the Eclipse Platform itself, but also on frameworks on top of it (such as JDT, EMF, WTP etc.), the more complicated it gets. They have less API stability, they're sometimes not well documented, they tend to lag behind the Platform when it comes to adoption of the latest features etc. I think JDT is the positive exception sometimes. That's often because new features have originally been developed in JDT and then pushed down, which means that the integration is already done there. There is, after all, a sort of a 'special relationship' between JDT and the Platform, because they have co-evolved.

Still, the situation is much worse with other frameworks outside the Eclipse world, so Eclipse seems still the best horse to bet on :-)

As far as Cohatoe and EclipseFP are concerned, luckily that doesn't affect us much :-) We only have to deal with the Platform. But I presume that your experiences are also shared by people who are working on languages more closely integrated with Java (e.g. Scala).

Thanks && ciao,
Leif