Sunday, September 30, 2007

Eclipse's IDE Metatooling project

There is a new initiative at Eclipse.org for supporting projects that want to build Eclipse-based IDEs for new programming languages: the IMP project ('Eclipse IDE Meta-tooling Platform'). It seems to be a continuation of an IBM-internal project called SAFARI (of which we saw a presentation at the last EclipseCON). This is great news; I think it will make the 'small step' that I have described in an earlier post much more extensive.

The IMP approach

Basically the idea behind the IMP approach is that for any IDE you create, there is a lot of stuff that you have (or someone else has) done already for other IDEs. The language specifics (such as the concrete syntax and semantics) may differ, but when you want to color syntax in the editor, show an Outline view and so on, you find yourself doing basically the same things that are done for any other language IDE. Much of the specifics of a language can be captured in a condensed description such as a grammar - why not derive the common stuff from there?

With IMP, you start with identifying your language and some basic information (e.g. file extension), and then providing a syntactical description, from which the lexer and parser are generated. From this, IMP can already derive some development support (e.g. mark syntax errors in the code editor). Now the rest of your IDE is built up from 'IDE services', which you can add one by one.

These IDE services are an abstraction over the well-known Eclipse APIs that one would have to implement when creating an IDE, e.g. project natures, New Project wizards, Outline page, syntax colorer, Code Folding, Content Assist etc. Since the parser is already in place, such IDE services can be generated in order to make use of the AST delivered by it; the generated code can then be customized to particular needs.

More detailed information can be found in the IMP Users Guide and in the SAFARI presentation from EclipseCON 2007.


The small step and the big step

So much for the small step (that is, what you get for free). What the IMP approach doesn't help with, though, is making the big step, once the small step is done.

Well, perhaps for many languages an extensive small step is all that is needed. An IMP-based IDE will have a lot of nice features. Here's a list from their website:

  • syntax highlighting, outline view population, package explorer-like navigation, content assistance, project natures and builders, error markers
  • refactoring support (not only "Move" and "Rename", but type- and code-related refactorings requiring non-trivial analysis, e.g. "Extract Method" and "Infer Type Arguments")
  • static program analysis (pointer analysis, type analysis, etc.) in support of the above
  • execution and debugging support
For many languages, IDE support of this order will get them quite some way, in particular if it comes mostly for free, or is at any rate easier to do than with the current Eclipse APIs :-) Especially experimental languages or languages with a focused field of application (DSLs) will benefit from having such IDE support, and they might not have a real need to make the big step.

IMP and established languages

But what about established, general-purpose languages? They have been around for a while; there is a community, and there are programming tools. Such languages tend to be self-hosting: given one that is sufficiently general-purpose, as soon as there is some community around it, people will start to write development tools in that language, for that language. Furthermore, its community works in the target language. They're not so much interested in building their IDE in Java. (Perhaps even less so in hacking around in generated Java files to 'customize' it?) For taking the big step, making the project attractive to a larger community is essential. To avoid duplication of functionality, it is important to integrate existing tools - but these are written in the target language and don't necessarily integrate well with Java and Eclipse concepts.

Quite possibly, having gained so much by the small step, it might even become more difficult to get started with the big step. If the big step is not done in Java, it will involve much re-work, and there is a certain discouragement for this in the fact that so much already works ...

As far as I can see, the IMP approach doesn't take this into account, and will thus mostly be helpful in areas where languages are restricted to some area of application, or are expected to be experimental and short-lived (or are sufficiently close to Java).

However that may be, this is definitely an interesting project that addresses a real need, and I'll be interested to follow it and see how it develops. (As a side note: there may well be some potential to integrate Cohatoe with IMP, in order to further reduce the amount of code one has to write on the Eclipse/Java side when implementing plugins in Haskell. I'll have a closer look into this sometime in the future and keep you posted.)

Saturday, September 22, 2007

What Cohatoe is not

For understanding what a tool or framework is good for, it is sometimes helpful to eliminate a few potential misunderstandings, i.e. what one might think from a first glance at a description or screenshot, but what is in fact not the case. There is always room for such misunderstandings, since people come from different backgrounds and with different expectations, and since descriptions are never perfect.

So here is what Cohatoe is not:

  • a set of Haskell bindings to the Eclipse APIs

    Cohatoe is not for manipulating a running Eclipse instance (and its UI) from Haskell code, or for 'scripting Eclipse in Haskell'. What we want to do in Haskell is the internal logic; the UI and the general layout of the application are structured by Eclipse concepts.

    In this respect Cohatoe is different from the Haskell-JVM-Bridge (which makes it possible to create and manipulate Java objects in a JVM instance from Haskell code), and from approaches such as EclipseMonkey (where JavaScript code is executed that binds to some Eclipse APIs).

  • support for Haskell-Java language interoperability

    The point is not primarily to make Haskell code talk to Java code, or vice versa. It is rather to integrate logic that is written in Haskell with Eclipse's plugin model in a way that preserves the modularity and extensibility that comes with this plugin model as it is implemented in Java. There will be some more support for transporting data structures between the Haskell side and the Java side, but the interoperability part of Cohatoe is just a generic vehicle (currently simply a list of strings), and the clients of Cohatoe on both sides (Haskell and Java) currently have to do their marshalling and unmarshalling themselves.

    Much more important than the language interoperability aspect is the modularity and extensibility that is so important in the architecture of Eclipse. We want to preserve this modularity and extensibility even when we write or integrate large parts of an application in Haskell.

  • an approach to UI programming in Haskell

    Basically the same as the first point above. The idea is to have the UIs built using Eclipse APIs. Only the logic is in Haskell.

  • a plugin system for Haskell

    This is what hs-plugins is, and Cohatoe uses hs-plugins in order to load (possibly compile) and execute Haskell code.
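
    To illustrate the 'list of strings' vehicle mentioned under the interoperability point above, here is a minimal sketch of what marshalling on the Java side might look like. The names ProblemMarker and MarkerCodec, and the three-strings-per-marker layout, are assumptions made up for this example; they are not part of the Cohatoe API. The Haskell side would have to agree on the same layout.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical example type: a problem marker computed on the Haskell side. */
final class ProblemMarker {
    final String file;
    final int line;
    final String message;

    ProblemMarker(String file, int line, String message) {
        this.file = file;
        this.line = line;
        this.message = message;
    }
}

/** Hypothetical codec: flattens markers into a list of strings and back. */
final class MarkerCodec {
    private static final int FIELDS_PER_MARKER = 3;

    private MarkerCodec() {}

    /** Each marker becomes three consecutive strings: file, line, message. */
    static List<String> marshal(List<ProblemMarker> markers) {
        List<String> out = new ArrayList<>();
        for (ProblemMarker m : markers) {
            out.add(m.file);
            out.add(Integer.toString(m.line));
            out.add(m.message);
        }
        return out;
    }

    /** Inverse of marshal; rejects lists whose length is not a multiple of three. */
    static List<ProblemMarker> unmarshal(List<String> raw) {
        if (raw.size() % FIELDS_PER_MARKER != 0) {
            throw new IllegalArgumentException(
                "Expected groups of " + FIELDS_PER_MARKER + " strings, got " + raw.size());
        }
        List<ProblemMarker> result = new ArrayList<>();
        for (int i = 0; i < raw.size(); i += FIELDS_PER_MARKER) {
            int line = Integer.parseInt(raw.get(i + 1));
            result.add(new ProblemMarker(raw.get(i), line, raw.get(i + 2)));
        }
        return result;
    }
}
```

    A flat layout like this is easy to get wrong on one side only, which is why the length check is there; it is also the kind of boilerplate that better data transport support would make unnecessary.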

Thursday, September 20, 2007

Some reflections on Eclipse-based language IDEs

A few years ago, when Eclipse's popularity was quickly growing, an interesting phenomenon was going on: a number of small projects were popping up (for a time, at the rate of a dozen or more per week) that implemented some extension to Eclipse, often for supporting some programming language. Some of them quickly died away when the interest of the founders moved on to something else, some of them became obsolete when their main idea was implemented by the Eclipse project itself. But there were some language support plugins that remained active and have continued development until today. (EclipseFP is one of them, others include PHPEclipse, RDT (Ruby), Erlide (Erlang), PyDev (Python), EPIC (Perl) and Eclipse.org's CDT project.)

The obvious paradigm for all Eclipse-based IDEs is the Java Development Tools (JDT) from Eclipse.org. They were a most important factor in making Eclipse popular, and they pioneered many of the cool IDE features that we have come to expect from Java IDEs nowadays. However, keeping up with JDT proved hard for each of the language support projects. One obvious reason for this is of course that the effort that went into the JDT project exceeded by far the means available to the other projects, all of them (with the exception of CDT) being volunteer projects. But there are other reasons, too; I've got a few observations about one of them.


The small step and the big step

For building language support on the basis of the Eclipse Platform, there is a small step and a big step: the small step is to get basic support, the big step is deep support. In order to make the small step, you just have to do a bit of Java coding, learn a few Eclipse APIs, and you need some modest understanding of the target language. To make the big step, you have to implement, or integrate someone else's implementation of, deeply language-aware functionality (i.e. functionality aware of the syntax and semantics of the target language).

The equivalent of the small step can be done even with many text editors: they often have a feature where you can enter some syntax rules and get syntax coloring for content formats such as HTML or Java source code. In Eclipse, the small step includes not only syntax coloring, but project types (called 'Project Natures' in Eclipse terminology), integration of external build tools (normally compilers), output parsers to populate the Problems View, import and export wizards, etc.

The big step, on the other hand, includes all the attractive features such as automated refactorings, search for references to language elements ('Find all calls to this function'), navigation to declarations, debugger integration and so on. (A while ago, I co-authored an article in a German journal that discussed which features belong in the language-sensitive category; my co-author illustrated the main points with examples from the RDT project.)


What the big step involves

What everybody wants, of course, is language support that has taken the big step. But this means that a backend must be built that can analyze the syntactical and semantical structure of source code written in the target language. (And not only source code in the language itself, but also supporting formats, for example - in the Haskell world - Cabal configuration files, GHC package configuration files, etc.) It must understand the concepts behind the structure of the language (e.g. it must know what a module is, and what it means to find the name of a module in the name of a source file, or in a declaration in a Cabal configuration, and so on).

JDT has achieved this. The Java IDE in Eclipse includes a fully standards-compliant Java compiler, and much of the interesting functionality relies on the source code parsing and incremental compilation facilities provided by it. In addition, JDT maintains a model of all language elements in the workspace; it has representations for all methods, types, (Java) packages and so on, both from source files and attached libraries. All this is naturally implemented in Java. (It is also certainly helpful that the JDT project can practice 'Eat your own dogfood', i.e. that Eclipse can be built using Eclipse itself.)

But what are you going to do if your target language isn't the same language as the one Eclipse is implemented in? First of all, there is not really a way around implementing Eclipse plugins at least partially in Java. That's unfavorable: even if you actually want to build all that functionality yourself, chances are that you want to do it in the target language, not in Java. If you don't want to re-implement everything (and that should be the common case), then it's highly improbable that the existing stuff is written in Java. Haskell is a very good example: there is almost everything you'd like, in various degrees of maturity: source parsers (both for Haskell 98 code and many language extensions), a type inference and typechecking engine (GHC API), refactoring (HaRe), API search (Hoogle), a debugger (in GHCi), and lots and lots more. But guess in what language all that is written ... ;-)


Interoperability

There are the usual options for language interoperability: you can run executables written in other languages and communicate via standard input and output, or use language interoperability interfaces (such as Haskell's FFI) in conjunction with Java's native interface, JNI. Some languages can be run in an interpreter; others can be compiled to Java Virtual Machine bytecode. Although this is feasible, it's sometimes ugly and complicated, and it often limits deeper integration (there's only so much information that you can send via console I/O, and in any case if your data is highly structured, you have to put additional effort into serializing and de-serializing it).

Apart from that, Eclipse has its own dynamic module system, which relies on loading code at runtime on demand - most of the plugins that are actually in an installation may not be loaded at all in a particular session. Furthermore, the entire extensibility model in Eclipse relies on the idea that plugins can not only plug into pre-existing extension points, but may very well (and very easily) provide new extension points themselves, which are available for anyone else to extend. In order to honor these features of Eclipse, there is much more to interoperate with than just the programming language (Java) and its runtime (the JVM). It's the Eclipse platform concept that must be taken into account. And there is no obvious way to do that in any other language than Java. (There has been some experimentation recently with executing scripts in the place of Eclipse extensions, usually scripts written in languages that can be compiled to Java bytecode. If you are interested, look at this Google Summer of Code project and this post about Eclipse plugins in Scala.)

Thus taking the big step is not just more difficult for others than JDT for lack of resources. It is inherently more complicated because it adds the burden of interoperating with the platform from a different language and/or runtime. It also doesn't help of course that naturally the group of potential contributors is smaller if they are required to be fluent in two different languages and sets of APIs. (I know from years of experience as a trainer for Eclipse plugin development that it takes some time to learn to find your way around in the set of Eclipse APIs, even for experienced Java programmers - simply because of their substantial size and the number of associated concepts.)


The way out

The lesson to be learned from this is, I think, clearly that a crucial piece in building an Eclipse-based IDE for a language must be to make code written in that language interoperate with the Eclipse Platform - that is, it is necessary to make it possible to write Eclipse plugins in that language. Once this is in place it becomes possible to

  • re-use existing tools written in the target language,

  • find and motivate contributors (i.e. people who have an interest in, and a good knowledge of, that language), and finally

  • make the big step :-)

Tuesday, September 18, 2007

Progress on Cohatoe

Here is another update about the latest additions to the Cohatoe repo :-)

We now manage the library files for the Cohatoe API (the GHC package against which Haskell code for Cohatoe-based Eclipse plugins must be compiled) next to the server executable in platform-specific fragments. They are automatically found, possibly extracted (when the plugin lives in a .jar file), and then provided to the hs-plugins library both when object code is loaded and when source code is compiled, in the form of an 'on the fly'-GHC package. This removes the requirement that Cohatoe users had to have the cohatoe-api package installed in addition to their GHCi installation.

So far, Cohatoe worked only on machines that had a Cohatoe API installed as GHC package. This is clearly not good, partly because it is not a reasonable pre-requisite (the API should only be needed for developing Cohatoe-based plugins, but not for using them), and partly because there might be version conflicts (e.g. if a plugin has been compiled against a later API version than is installed on the user's system).

The situation is now better: a user who has a GHC installation, but no cohatoe-api package installed, will be able to run Cohatoe-based plugins. The Cohatoe server locates the Cohatoe API library that is shipped together with it. The additional advantage is that the server executable will always get the version of the API that it was itself compiled against. The shipped version has precedence. (Note that this does not guarantee that the plugins which are loaded are also compiled against that version of the API.) If you want to compile against the Cohatoe API, you still need to install the cohatoe-api package, of course.

I have also finalized the version of the cohatoe-api package to 1.0 (the final release version). There remains a minimal risk that the API must be changed again before the release, in which case some early adopters would possibly have to recompile.

On the other hand, I was having some trouble with object files that I had compiled against different versions of the API (not actually different code, the only difference was in the version number). So I think for this package I'm switching to a versioning policy that only updates the version number when the content actually changes. This means that even after the release the API will only change in version if there are actual changes to the interface. Version numbers will not be increased in step with the other Cohatoe plugins.

Later versions of Cohatoe will probably need some mechanism to enable running both code that was targeted at the 1.0 version and code that was written for later versions. This means that the old API will probably have to remain available, and the server will have to determine which API to use and to load against. (This will cause some work, but that is definitely not a topic for the 1.0 release, and I'm confident that it can be sorted out.)

Apart from that, I have also continued working on getting the Linux version ready, and I have cleaned up the Haskell code of the server executable a bit. I think I will be able to get the next preview version out soon.

Thursday, September 13, 2007

Cohatoe presentation video online

The video of my presentation at the Haskell in Leipzig 2 meeting is now online :-) You can watch it on Tobias' blog, along with the other talks from the event. (And don't ask me about the Stolperstein ;-)

As already said, the talk is in German; there is a summary that I've posted to the EclipseFP mailing list earlier, and here are the slides.

Thanks to Tobias and everybody who made this fun possible :-)

Wednesday, September 12, 2007

Cohatoe development progress

Here's a brief status update about the latest few patches that just went into the Darcs repo.

I have fixed a bug that caused Cohatoe to select the wrong platform-specific fragment, and thus the wrong server executable, if there was more than one present. In other words, if you had Cohatoe running with both the Linux and the Windows fragment installed, it crashed under Windows because it tried to run the Linux executable - even though the Linux fragment wasn't activated. Cohatoe can now figure out which fragment is the correct one. (I suspect that has troubled nobody but me so far, but it was nagging me a bit. I wanted to catch up with the Linux version, but I had the fragment project disabled because of this bug. I think I will be able to get a well-working Linux version along with the Windows one for the next preview release. I've already re-activated my old Gentoo box for this :-)
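
For illustration, here is a minimal sketch of the idea behind the fix: pick the fragment that matches the current operating system instead of taking whichever one happens to come first. ServerFragment, FragmentSelector and the executable paths are hypothetical names invented for this example, not the actual Cohatoe code; the OS identifiers "win32", "linux" and "macosx" follow the style Eclipse uses for platform-specific fragments.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical descriptor of a platform-specific fragment and its server executable. */
final class ServerFragment {
    final String os;          // Eclipse-style OS identifier, e.g. "win32" or "linux"
    final String executable;  // path of the server executable inside the fragment

    ServerFragment(String os, String executable) {
        this.os = os;
        this.executable = executable;
    }
}

final class FragmentSelector {
    private FragmentSelector() {}

    /** Maps a JVM os.name value (e.g. "Windows XP") to an Eclipse-style OS identifier. */
    static String normalizeOs(String osName) {
        String lower = osName.toLowerCase(Locale.ROOT);
        if (lower.startsWith("windows")) return "win32";
        if (lower.startsWith("mac")) return "macosx";
        if (lower.contains("linux")) return "linux";
        return lower; // unknown systems are passed through unchanged
    }

    /** Returns the installed fragment for the given OS, or empty if none matches. */
    static Optional<ServerFragment> select(List<ServerFragment> installed, String os) {
        return installed.stream().filter(f -> f.os.equals(os)).findFirst();
    }
}
```

A caller would typically pass FragmentSelector.normalizeOs(System.getProperty("os.name")) as the second argument; inside a running Eclipse, the platform's own OS identifier would be the more natural source.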

I have also built in a mechanism that automatically passes on GHC runtime options to the Cohatoe runtime. If you have some Haskell code contributed via Cohatoe which needs to enable RTS options, you can now specify them on the command line for the Eclipse executable (or better, in the eclipse.ini file), and they are detected by Cohatoe and passed on to the Haskell server. You can specify the +RTS ... -RTS as you would pass them to any GHC-compiled executable. (In the eclipse.ini, take care that you have exactly one option per line, because that's how Eclipse expects them.)

At the moment, options that are not accepted by the GHC RTS break the Cohatoe server (i.e. the server executable doesn't even start at all). So if you are playing around with the options, you had better have Cohatoe's tracing enabled to see what actually happens. Without that, all you see is an IllegalStateException that complains about a dead server. I will build in some more sophisticated recovery with more info within one of the next iterations.
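
To make the +RTS ... -RTS handling described above a bit more concrete, here is a rough sketch of how such options could be picked out of an argument list and re-wrapped for the server's command line. The class RtsOptions and its methods are hypothetical and only illustrate the idea; this is not the actual Cohatoe implementation. It assumes the usual GHC convention that everything between +RTS and -RTS belongs to the runtime system, and that a section left open runs to the end of the arguments.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds GHC RTS options among the Eclipse command line arguments. */
final class RtsOptions {
    private RtsOptions() {}

    /** Returns the arguments found between +RTS and -RTS, in their original order. */
    static List<String> extract(List<String> args) {
        List<String> rts = new ArrayList<>();
        boolean inRts = false;
        for (String arg : args) {
            if (arg.equals("+RTS")) {
                inRts = true;          // start of an RTS section
            } else if (arg.equals("-RTS")) {
                inRts = false;         // end of an RTS section
            } else if (inRts) {
                rts.add(arg);          // an option meant for the GHC runtime
            }
        }
        return rts;
    }

    /** Wraps the extracted options in +RTS ... -RTS again; empty if there were none. */
    static List<String> forServer(List<String> args) {
        List<String> rts = extract(args);
        if (rts.isEmpty()) {
            return rts;
        }
        List<String> out = new ArrayList<>();
        out.add("+RTS");
        out.addAll(rts);
        out.add("-RTS");
        return out;
    }
}
```

Given the arguments -consoleLog, +RTS, -K64m, -H128m, -RTS, the sketch would hand +RTS -K64m -H128m -RTS to the server and leave -consoleLog to Eclipse. Note that it does no validation at all, so an option the GHC RTS rejects would still reach the server unchanged.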

Have fun :-)

Wednesday, September 05, 2007

Cohatoe 0.5 preview

Here comes another preview version of Cohatoe. I think it is now time to take a bold step and declare it 'beta' (no longer 'alpha' :-). This means in practice that the features which are in now will pretty much remain the same until version 1.0, but I still expect to have to do some more updates, mainly with fixes and documentation (and perhaps a little more tool support for PDE).

This version is only available from the new download site at EclipseFP! Please make sure that you update your bookmarks :-) Likewise, I have pushed the latest patches only to the new repo location. See this previous post for links and a little more info.

Version 0.5 has the extension wizardry that I have mentioned before, and it is well able to handle running extensions from Haskell source files now. (There were several fixes I had to do for this since my last post.) It is basically the version that I have presented at the Haskell in Leipzig 2 meeting.

Any feedback is appreciated - have fun :-)