
02 April 2013

Why the PhyloPic Relaunch Took So Long

Or, A Lesson in Development Strategy.

As I announced last week, my website, PhyloPic, has been relaunched with a massive update. One of the key updates is a public API for developers. A lot of people have been looking forward to this, and it was actually almost ready for release last summer. So why didn't I release it?

Failure to Branch

Basal tracheophyte.
Public Domain.

As I was writing up the documentation for the API, I learned of Bootstrap, a CSS/JavaScript framework. I realized that it could solve a lot of the design issues I was having — problems with the site on mobile devices, older browsers, etc.

What I should have done: Created a new development branch for adding Bootstrap while continuing to polish up the API branch. That way, I could have released the API shortly while still being able to work on the design issues in parallel.

What I actually did: Continued working in the same branch, ensuring that I couldn't release the API update until the Bootstrap update was complete.

Having Other Projects

By the end of summer I was mostly done with the revisions, but there was still some cleanup to do. By then, some other projects I'm attached to, one of them with collaborators, were suffering. So I spent most of my free time in the autumn working on those. (I have a full-time job and a toddler, so that isn't much.)

Homo habilis.
Public Domain.

Becoming Enamored of New Technology

In the autumn, Microsoft released a preview version of TypeScript, and I quickly saw that it was going to be extremely useful. So I rewrote PhyloPic's client-side code; it wasn't too hard, and it made further development a lot easier. This caused some delay up front, but I don't regret it.

Becoming Enamored of the Wrong New Technology

Around this time I also realized that I could finally do away with the last bit of Flash on the website: the Image Submission Tool. HTML5 had become mature enough to do all the image manipulation in the browser itself. I did a lot of research, learning about the Canvas, Typed Arrays, etc. And after a lot of work I actually created an image-processing workflow that works in HTML5-enabled browsers. As a bonus, I got a little standalone project out of it: Pictish.

But there were problems. One is that the best existing JavaScript library for creating PNG files doesn't use Typed Arrays — it uses strings, which means that it is slow for large files. I tried creating my own PNG encoder, or adapting that one, but soon realized it was far too much work. Another problem is that I was no longer supporting older browsers (although this was a trade-off against supporting mobile platforms, so I didn't feel too bad about it).

But there was a much more fundamental danger: doing the image processing on the client side meant that the API had to trust the client to do it properly. What if some developer used the PhyloPic API to add images to the database but didn't do it right? That could be disastrous.

Octopus bimaculatus.
Public Domain.

I realized I would have to do things the old-fashioned way: on the server. After a bit of research, I identified ImageMagick and Inkscape as the best tools. The new methodology was so completely different that I ended up making a lot of database changes, too. Until recently, all files were stored in the database; now they're just stored as flat files. The good news is that this makes load times faster.

Doing Things the "Right" Way

Throughout all this I had been making an effort to "dogfood" my own API, i.e., to use it on the site itself. This has the advantage of making load times faster, since the basic page can be cached and then the data can be loaded in secondarily in a much smaller format. Unfortunately this meant a lot of rewrites for how the pages are rendered.

After a while, the code to generate pages from the data had gotten really complex (mostly involving on-the-fly element generation using jQuery). Around the time I was redoing the Image Submission Page, I realized my whole approach was untenable. I needed a cleaner way to divorce presentation logic from control logic.

I ended up using Knockout for the entire site. It made things a lot more manageable.

In Summary

The biggest problem was my branching model, or, rather, my lack of one. Solitary developers often fall into this trap: we think that, since we're doing all the work, there's no need to have more than a single branch of development. At work, we've been using a proper branching model and have found it very successful. Going forward, I plan to do the same on PhyloPic. No more massive updates where everything is different. Just incremental features and fixes.

15 February 2013

JSEN: JavaScript Expression Notation

That idea I was talking about yesterday? Storing mathematical expressions as JSON? I went ahead and made it as a TypeScript project and released it on GitHub:

JavaScript Expression Notation (JSEN)

Still need to complete the unit test coverage and add a couple more features. I made a change from my original post to the syntax for namespace references. (The reason? I realized I needed to be able to use "*" as a local identifier for multiplication.) ~~They work within Namespace declaration blocks, but I need to make them work at the higher level of Namespaces declaration blocks as well.~~ (Done.) ~~I also want to allow functions to be used as namespaces.~~ (Done.)

This is possible right now:

jsen.decl('my-fake-namespace', {
   'js': 'http://ecma-international.org/ecma-262/5.1',

   'x': 10,
   'y': ['js:Array', 1, 2, 3],
   'z': ['js:[]', 'y', 1]
});

jsen.eval('my-fake-namespace', 'x'); // 10
jsen.eval('my-fake-namespace', 'y'); // [1, 2, 3]
jsen.eval('my-fake-namespace', 'z'); // 2

jsen.expr('my-fake-namespace', 'x'); // 10 // Deprecated
jsen.expr('my-fake-namespace', 'y'); // Deprecated
    // ["http://ecma-international.org/ecma-262/5.1:Array", 1, 2, 3]
jsen.expr('my-fake-namespace', 'z'); // Deprecated
    // ["http://ecma-international.org/ecma-262/5.1:[]", "y", 1]

Eventually something like this will be possible as well:

Mathematical expressions as JSON (and phyloreferencing)

For Names on Nodes I did a lot of work with MathML (specifically MathML-Content), an application of XML for representing mathematical concepts. But now, as XML wanes and JSON waxes, I've started to look at ideas for porting Names on Nodes concepts over to JSON.

I've been drawing up a very basic and extensible way to interpret JSON mathematically. Each of the core JSON values translates like so:

Null, Boolean, and Number values are interpreted as themselves.
Strings are interpreted as qualified identifiers (if they include ":") or local identifiers (otherwise).
Arrays are interpreted as the application of an operation, where the first element is a string identifying the operation and the remaining elements are arguments.
Objects are interpreted either as:

a set of declarations, where each key is a [local] identifier and each value is an evaluable JSON expression (see above), or
a namespace, where each key is a URI and each value is a series of declarations (see previous).

Examples

Here's a simple object declaring some mathematical constants (approximately):

{
    "e": 2.718281828459045,
    "pi": 3.141592653589793
}

Supposing we had declared some operations (only possible in JavaScript, since JSON doesn't have functions) equivalent to those of MathML (whose namespace URI is "http://www.w3.org/1998/Math/MathML"), we could do this:

{
    "x":
        ["http://www.w3.org/1998/Math/MathML:plus",
            1,
            2
        ],
    "y":
        ["http://www.w3.org/1998/Math/MathML:sin",
            ["http://www.w3.org/1998/Math/MathML:divide",
                "http://www.w3.org/1998/Math/MathML:pi",
                2
            ]
        ]
}

Once evaluated, x would be 3 and y would be 1 (or close to it, given that this is floating-point math).
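The interpretation rules above can be sketched as a small recursive evaluator. This is only an illustration of the idea, not a finished implementation: the names evaluate, Declarations, and Globals are made up for this sketch, only a hypothetical "times" operation and the sine function are registered, and nested declaration or namespace objects are not handled.

```typescript
// Any value that JSON.parse can produce.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Local declarations: each key is a local identifier, each value an expression.
type Declarations = { [identifier: string]: Json };

// Qualified identifiers map to JavaScript values (constants or functions).
type Globals = { [qualifiedIdentifier: string]: unknown };

function has(table: object, key: string): boolean {
    return Object.prototype.hasOwnProperty.call(table, key);
}

function evaluate(expr: Json, locals: Declarations, globals: Globals): unknown {
    // Null, Boolean, and Number values are interpreted as themselves.
    if (expr === null || typeof expr === "boolean" || typeof expr === "number") {
        return expr;
    }
    // Strings are qualified identifiers (with ":") or local identifiers.
    if (typeof expr === "string") {
        if (expr.indexOf(":") >= 0) {
            if (!has(globals, expr)) throw new Error("Unknown qualified identifier: " + expr);
            return globals[expr];
        }
        if (!has(locals, expr)) throw new Error("Unknown local identifier: " + expr);
        return evaluate(locals[expr], locals, globals);
    }
    // Arrays apply an operation (first element) to the remaining elements.
    if (Array.isArray(expr)) {
        const head = expr[0];
        if (typeof head !== "string") throw new Error("An operation must be named by a string.");
        const operation = evaluate(head, locals, globals);
        if (typeof operation !== "function") throw new Error("Not an operation: " + head);
        const args = expr.slice(1).map(arg => evaluate(arg, locals, globals));
        return operation(...args);
    }
    // Nested declaration and namespace objects are left out of this sketch.
    throw new Error("Objects are not evaluable in this sketch.");
}

// Assumed setup: two operations registered under the MathML namespace URI.
const MATHML = "http://www.w3.org/1998/Math/MathML";
const globals: Globals = {};
globals[MATHML + ":times"] = (a: number, b: number) => a * b;
globals[MATHML + ":sin"] = Math.sin;

const locals: Declarations = {
    "pi": 3.141592653589793,
    "tau": [MATHML + ":times", 2, "pi"]
};

console.log(evaluate("tau", locals, globals)); // 6.283185307179586
```

Local identifiers are resolved lazily here, so declarations can refer to each other in any order, at the cost of looping forever on a circular definition.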

Now for the interesting stuff. Suppose we had declared Names on Nodes operations and some taxa using LSIDs:

{
    "Homo sapiens": "urn:lsid:ubio.org:namebank:109086",
    "Ornithorhynchus anatinus": "urn:lsid:ubio.org:namebank:7094675",
    "Mammalia":
        ["http://namesonnodes.org/ns/math/2013:clade",
            ["http://www.w3.org/1998/Math/MathML:union",
                "Homo sapiens",
                "Ornithorhynchus anatinus"
            ]
        ]
}

Voilà, a phylogenetic definition of Mammalia in JSON!

I think this could be pretty useful. My one issue is the repetition of long URIs. It would be nice to have a mechanism to import them using shorter handles. Maybe something like this?

{
    "mathml":   "http://www.w3.org/1998/Math/MathML:*",
    "namebank": "urn:lsid:ubio.org:namebank:*",
    "NoN":      "http://namesonnodes.org/ns/math/2013:*",

    "Mammalia":
        ["NoN:clade",
            ["mathml:union",
                "namebank:109086",
                "namebank:7094675"
            ]
        ]
}
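One way such an import mechanism might work is as a preprocessing pass that rewrites short handles back into full identifiers before evaluation. The sketch below assumes that any declaration whose value is a string ending in ":*" is an import; the function names (collectHandles, expand, expandDeclarations) are invented for illustration and are not part of any existing library.

```typescript
// Any value that JSON.parse can produce.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type Declarations = { [identifier: string]: Json };
type Handles = { [handle: string]: string };

function has(table: object, key: string): boolean {
    return Object.prototype.hasOwnProperty.call(table, key);
}

// Assumption: a declaration whose value is a string ending in ":*" is an import.
function collectHandles(decls: Declarations): Handles {
    const handles: Handles = {};
    for (const key of Object.keys(decls)) {
        const value = decls[key];
        if (typeof value === "string" && value.slice(-2) === ":*") {
            handles[key] = value.slice(0, -1); // drop "*", keep the trailing ":"
        }
    }
    return handles;
}

// Rewrite every "handle:name" string inside an expression to its full form.
function expand(expr: Json, handles: Handles): Json {
    if (typeof expr === "string") {
        const colon = expr.indexOf(":");
        if (colon < 0) return expr; // local identifier
        const handle = expr.slice(0, colon);
        return has(handles, handle)
            ? handles[handle] + expr.slice(colon + 1)
            : expr; // already a full identifier, or an unknown prefix
    }
    if (Array.isArray(expr)) {
        return expr.map(item => expand(item, handles));
    }
    return expr; // null, Booleans, numbers, and nested objects pass through
}

// Expand all declarations and drop the import entries themselves.
function expandDeclarations(decls: Declarations): Declarations {
    const handles = collectHandles(decls);
    const result: Declarations = {};
    for (const key of Object.keys(decls)) {
        if (!has(handles, key)) {
            result[key] = expand(decls[key], handles);
        }
    }
    return result;
}

const expanded = expandDeclarations({
    "mathml": "http://www.w3.org/1998/Math/MathML:*",
    "namebank": "urn:lsid:ubio.org:namebank:*",
    "NoN": "http://namesonnodes.org/ns/math/2013:*",
    "Mammalia": ["NoN:clade", ["mathml:union", "namebank:109086", "namebank:7094675"]]
});

console.log(JSON.stringify(expanded, null, 4));
```

Doing the expansion up front keeps the evaluator itself unaware of handles, though it does mean a handle such as "http" or "urn" would shadow real URI schemes if someone declared one.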

Something to ponder. Another thing to ponder: what should I call this? MathON? MaSON?

28 January 2013

Using TypeScript to Define JSON Data

JSON has gradually been wearing away at XML's position as the primary format for data communication on the Web. In some ways, that's a good thing: JSON is much more compact and readable. In other ways, it's not so great: JSON lacks some of XML's features.

One of these features is document type definitions. For XML, there are a variety of formats (DTD, XML Schema, RELAX NG, etc.) for specifying exactly what your XML data looks like: what are the tag names, possible attributes, etc. JSON is a lot more loosey-goosey here.

Okay, that's not entirely true: there is JSON Schema. I've never known anyone to use it, but it's there. It's awfully verbose, though. (So are the definitional formats for XML, but it's XML — you expect it!)

I was thinking about this the other day, and I realized that there is actually a great definitional format for JSON already in existence: TypeScript! If you haven't heard of it, TypeScript is a superset of JavaScript which introduces optional static typing. And since JSON is a subset of JavaScript, TypeScript is applicable to JSON as well.

One of the great features of TypeScript is that interface implementation is implicit. In Java or ActionScript, you have to specifically say that a type "implements MyInterface". In TypeScript, if it fits, it fits. For example:

interface List
{
    length: number;
}

function isEmpty(list: List): bool
{
    return list.length === 0;
}

console.log(isEmpty("")); // true
console.log(isEmpty("foo")); // false
console.log(isEmpty({ length: 0 })); // true
console.log(isEmpty({ length: 3 })); // false
console.log(isEmpty({ size: 1})); // Compiler error!

(Note: for some reason that I can't fathom, isEmpty() doesn't work on arrays. Well, TypeScript is still in development — version 0.8.2 right now. Update: I filed this as a bug.)

Note that you can use interfaces even on plain objects. So of course you can use it to describe a JSON format. Here's an example from a project I hope to release before too long:

interface Model
{
    uid: string;
}

interface Name extends Model
{
    citationStart?: number;
    html?: string;
    namebankID?: string;
    root?: bool;
    string?: string;
    type?: string;
    uri?: string;
    votes?: number;
}

interface Taxon
{
    canonicalName?: Name;
    illustrated?: bool;
    names?: Name[];
}

Now, for example, I can declare that an API search method will return data as an array of Taxon objects (Taxon[]). And look how compact and readable it is!

Note that there is one drawback here: there is no way to enforce this at run-time. JSON Schema might be a better choice if that's what you need. But for compile-time checking and documentation, it's a pretty great tool.
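If some run-time checking is wanted without leaving TypeScript, one option is to write the checks by hand. The following is a rough sketch rather than anything from the project itself: it uses type predicates and the unknown type, which arrived in TypeScript versions later than 0.8.2, it spells the Boolean type as boolean (the current name for what the code above calls bool), and the helper names (isRecord, isOptional, isName, isTaxon, parseTaxa) and the sample uid are invented for illustration.

```typescript
interface Model
{
    uid: string;
}

interface Name extends Model
{
    citationStart?: number;
    html?: string;
    namebankID?: string;
    root?: boolean;
    string?: string;
    type?: string;
    uri?: string;
    votes?: number;
}

interface Taxon
{
    canonicalName?: Name;
    illustrated?: boolean;
    names?: Name[];
}

function isRecord(value: unknown): value is { [key: string]: unknown }
{
    return typeof value === "object" && value !== null && !Array.isArray(value);
}

// True if the property is absent or has the expected primitive type.
function isOptional(value: unknown, type: "string" | "number" | "boolean"): boolean
{
    return value === undefined || typeof value === type;
}

function isName(value: unknown): value is Name
{
    if (!isRecord(value)) return false;
    return typeof value["uid"] === "string"
        && isOptional(value["citationStart"], "number")
        && isOptional(value["html"], "string")
        && isOptional(value["namebankID"], "string")
        && isOptional(value["root"], "boolean")
        && isOptional(value["string"], "string")
        && isOptional(value["type"], "string")
        && isOptional(value["uri"], "string")
        && isOptional(value["votes"], "number");
}

function isTaxon(value: unknown): value is Taxon
{
    if (!isRecord(value)) return false;
    const canonicalName = value["canonicalName"];
    const names = value["names"];
    return (canonicalName === undefined || isName(canonicalName))
        && isOptional(value["illustrated"], "boolean")
        && (names === undefined || (Array.isArray(names) && names.every(item => isName(item))));
}

// Parse a JSON response and refuse anything that is not a Taxon[].
function parseTaxa(json: string): Taxon[]
{
    const data: unknown = JSON.parse(json);
    if (!Array.isArray(data) || !data.every(item => isTaxon(item)))
    {
        throw new Error("Response is not an array of Taxon objects.");
    }
    return data;
}

const taxa = parseTaxa('[{"illustrated": true, "names": [{"uid": "example-uid", "votes": 2}]}]');
console.log(taxa.length); // 1
```

The obvious cost is that the checks have to be kept in sync with the interfaces by hand, which is exactly the duplication a schema language is meant to avoid.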

A Three-Pound Monkey Brain