
18 October 2013

The PhyloCode Has a Deadline

As most of you probably know, the PhyloCode (more verbosely, the International Code of Phylogenetic Nomenclature) is a proposed nomenclatural code, intended as an alternative to the rank-based codes. It was first drafted in April 2000, and at that time the starting date was given as "1 January 200n". On this date the code would be enacted and published along with a companion volume, which would provide the first definitions under the code, establishing best practices and defining the most commonly-used clade names across all fields of biology.

Well, the '00s (the zeroes? the aughts?) came and went without the code being enacted. The hold-up was not the code itself, which has been at least close to its final form since 2007. (The last revision, in January 2010, was minor.) And it hasn't been the software for the registration database, which has been completed. The hold-up was the companion volume, which turned out to be a much more daunting project than expected. (And considering that the zoological code took 66 years to go from being proposed to being published, perhaps the initial estimate should have been hedged, anyway.)

At the 2008 meeting of the International Society for Phylogenetic Nomenclature (ISPN), this problem was discussed. It was decided that the companion volume should be narrowed in scope. Instead of waiting to get definitions for commonly-used clade names across all fields of biology (many of which did not even have willing authors), entries would be limited to those already in progress. Later on, a revision was also made to the editorial process to help speed things up.

Now for some news: at the website for the ISPN (recently revamped by yrs trly), there is a new progress report for Phylonyms, the companion volume to the PhyloCode. There will be at most 268 entries. Currently 186 of those (over two thirds) have already been accepted. The rest are at various stages of review. But perhaps most excitingly, there is a deadline:

The contract with University of California Press calls for the manuscript to be submitted by September 1, 2014.

Yes, folks, we will see the PhyloCode enacted in our lifetime! (Pending nuclear holocaust or alien invasion.)

15 February 2013

JSEN: JavaScript Expression Notation

That idea I was talking about yesterday? Storing mathematical expressions as JSON? I went ahead and made it as a TypeScript project and released it on GitHub:

JavaScript Expression Notation (JSEN)

Still need to complete the unit test coverage and add a couple more features. I made a change from my original post to the syntax for namespace references. (The reason? I realized I needed to be able to use "*" as a local identifier for multiplication.) ~~They work within Namespace declaration blocks, but I need to make them work at the higher level of Namespaces declaration blocks as well.~~ (Done.) ~~I also want to allow functions to be used as namespaces.~~ (Done.)

This is possible right now:

jsen.decl('my-fake-namespace', {
   'js': 'http://ecma-international.org/ecma-262/5.1',

   'x': 10,
   'y': ['js:Array', 1, 2, 3],
   'z': ['js:[]', 'y', 1]
});

jsen.eval('my-fake-namespace', 'x'); // 10
jsen.eval('my-fake-namespace', 'y'); // [1, 2, 3]
jsen.eval('my-fake-namespace', 'z'); // 2

jsen.expr('my-fake-namespace', 'x'); // 10 // Deprecated
jsen.expr('my-fake-namespace', 'y'); // Deprecated
    // ["http://ecma-international.org/ecma-262/5.1:Array", 1, 2, 3]
jsen.expr('my-fake-namespace', 'z'); // Deprecated
    // ["http://ecma-international.org/ecma-262/5.1:[]", "y", 1]
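For readers who want to see the basic idea in miniature, here is a toy evaluator for JSON-style expressions. It is only a sketch and is not JSEN's actual implementation: the `evaluate` function, the `Scope` type, and the operator names in `demoScope` are all made up for illustration, and it skips namespaces entirely. The assumption is that a string is an identifier, an array is an operator followed by its arguments, and anything else is a literal.

```typescript
// Toy sketch only: NOT the JSEN library. All names here are hypothetical.
// Assumption: strings are always identifiers, arrays are [operator, ...args].
type Expr = number | string | boolean | null | Expr[];
type Operator = (...args: unknown[]) => unknown;
type Scope = { [identifier: string]: Expr | Operator };

function evaluate(scope: Scope, expr: Expr, depth: number = 0): unknown {
    if (depth > 100) {
        throw new Error('Expression nesting too deep (circular declaration?)');
    }
    if (typeof expr === 'string') {
        // Identifier: look up its declaration and evaluate that.
        if (!Object.prototype.hasOwnProperty.call(scope, expr)) {
            throw new Error('Unknown identifier: ' + expr);
        }
        const bound = scope[expr];
        if (bound === undefined) {
            throw new Error('Unknown identifier: ' + expr);
        }
        return typeof bound === 'function' ? bound : evaluate(scope, bound, depth + 1);
    }
    if (Array.isArray(expr)) {
        // Operation: evaluate every part, then apply the first to the rest.
        const values = expr.map(part => evaluate(scope, part, depth + 1));
        const head = values[0];
        if (typeof head !== 'function') {
            throw new Error('Not an operator: ' + JSON.stringify(expr[0]));
        }
        return (head as Operator)(...values.slice(1));
    }
    // Numbers, booleans, and null evaluate to themselves.
    return expr;
}

const demoScope: Scope = {
    '+': (...args: unknown[]) => (args as number[]).reduce((sum, n) => sum + n, 0),
    'Array': (...args: unknown[]) => args,
    '[]': (list: unknown, index: unknown) => (list as unknown[])[index as number],
    'x': 10,
    'y': ['Array', 1, 2, 3],
    'z': ['[]', 'y', 1]
};

console.log(evaluate(demoScope, 'z'));             // element 1 of y
console.log(evaluate(demoScope, ['+', 'x', 'z'])); // x plus z
```

Keeping declarations as plain data means the whole scope (apart from the built-in operators) can be serialized as JSON, which is the point of the exercise.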

Eventually something like this will be possible as well:

All Known Great Ape Individuals (Messinian to Present)

Happy 2013, everyone!

Recently I announced a code package I was working on, called Haeckel, for generating vector-based charts related to evolutionary biology. Here's an image I've created using it:

Known Great Ape Individuals

This chart represents all known hominid individuals (Hominidae = great apes, including humans and stem-humans) from the Messinian to the present, erring on the conservative side when the material is too poor to determine the exact number.

If you've been following this blog for a few years you may remember an earlier version of this. I've done a lot of refinement to the data since then. The earlier versions were dissatisfying to me because the horizontal axis was essentially arbitrary. For this version I used matrices from a phylogenetic analysis (Strait and Grine 2004, Table 3 and Appendix C) of craniodental characters to generate a distance matrix, and then inferred positions for other taxa based on phylogenetic proximity and containing clade. This is similar to the metric I used in this chart, except that it incorporates Appendix C, uses inference, and averages distance from humans against distance from [Bornean] orangutans. Don't be mistaken — this is still arbitrary. But it's a bit closer to something real.
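To make the distance idea concrete, here is a rough sketch of how a character matrix can be turned into a horizontal position. This is an assumption-laden illustration, not the code behind the chart: `meanCharacterDistance` and `horizontalPosition` are hypothetical names, the distance is a simple proportion of differing scored characters, the way the two distances are averaged is a guess at one reasonable formula, and the character rows are invented placeholders rather than anything from Strait and Grine.

```typescript
// Sketch only. Character rows below are invented placeholders, not real scorings.
// null marks a character that could not be scored for that taxon.
type CharacterRow = (number | null)[];

// Proportion of mutually scored characters whose states differ (0 = identical).
function meanCharacterDistance(a: CharacterRow, b: CharacterRow): number {
    let compared = 0;
    let differing = 0;
    const count = Math.min(a.length, b.length);
    for (let i = 0; i < count; i++) {
        const stateA = a[i];
        const stateB = b[i];
        if (stateA === null || stateB === null || stateA === undefined || stateB === undefined) {
            continue; // skip characters missing in either taxon
        }
        compared++;
        if (stateA !== stateB) differing++;
    }
    return compared === 0 ? NaN : differing / compared;
}

// Assumed formula: average "closeness to humans" with "distance from orangutans",
// so orangutans land at 0 and humans at 1 when the two are maximally different.
function horizontalPosition(taxon: CharacterRow, human: CharacterRow, orangutan: CharacterRow): number {
    const fromHuman = meanCharacterDistance(taxon, human);
    const fromOrangutan = meanCharacterDistance(taxon, orangutan);
    return ((1 - fromHuman) + fromOrangutan) / 2;
}

const humanRow: CharacterRow = [1, 1, 1, 1];
const orangutanRow: CharacterRow = [0, 0, 0, 0];
const mysteryRow: CharacterRow = [1, 0, null, 1]; // third character unscored

console.log(horizontalPosition(mysteryRow, humanRow, orangutanRow));
```

Taxa with no scorings at all would still need their positions inferred some other way (as described above, from phylogenetic proximity and containing clade); this sketch does not attempt that.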

Stray notes:

I'm pretty sure there are Pliocene stem-orangutans somewhere, right? Might have some work left to do on that data.
The dot with no taxon above "Australopithecus" is an indeterminate stem-human from Laetoli. It should probably go further left.
The Ardipithecus bubble includes the poorly-known "Australopithecus" praegens. (Although in some runs it moves outside — there's a random element to the plotting.)
The Holocene is barely visible up at the top. What a worthless epoch.
Homo floresiensis (hobbits) are far to the left of Homo sapiens because I placed them outside Clade(Homo erectus ∪ Homo sapiens).
You may recall Lufengpithecus? wushanensis as "Wushan Man", as it was originally placed in Homo erectus. (Hey, it's just teeth.)
A couple of fossil chimpanzees, lots of fossil orangutans, but no fossil gorillas. :(

(Unless you count Chororapithecus, but that's pre-Messinian. Very pre-Messinian. Suspiciously pre-Messinian....)

Look at all that overlap between Homo, Paranthropus, and Australopithecus!

I have a feeling, though, that if I added another dimension, Paranthropus and Homo would jut out in opposite directions.
Reclassifying Australopithecus sediba as Homo sediba would also decrease the overlap. (Although its position is inferred — actually scoring it might do the same thing.)
It's frustrating that the type species of Australopithecus and Paranthropus are also just about the most similar species across the two genera.

Kenyanthropus and Praeanthropus have been provisionally sunk into Australopithecus.
Should we just sink Orrorin and Sahelanthropus into Ardipithecus? Why not?
My guess is that if I added postcranial characters, the stem-humans would all shift right (humanward). Oh, for a good matrix of postcranial characters....

Update
Oh yeah, and if you want a peek at the data, go here.

02 April 2012

An Idea for the EOL Phylogenetic Tree Challenge

Earlier this year, the Encyclopedia of Life announced the EOL Phylogenetic Tree Challenge. The goal: to produce "a very large, phylogenetically-organized set of scientific names suitable for ingestion into the Encyclopedia of Life as an alternate browsing hierarchy". The prize: an all-expenses-paid trip to iEvoBio 2012 in Ottawa!

This interested me greatly, because:

It's exactly the sort of thing I'm working on for PhyloPic.
I can't really justify paying for a trip to iEvoBio this year. (Phyloinformatics is my hobby, not my profession!)

After reading Rod Page's thoughts on the challenge, I came up with a basic idea, and started to implement it. Unfortunately, now that we're two weeks from the deadline, I'm realizing that:

I do not have the time to complete it.
Even if it were paid for, I can't justify a trip on my own out of town right now.

Why not? Simply put, this.

So, instead, I'm going to outline the general approach I was going to take, and if someone else wants to run with it, knock yourself out. (Just give me partial credit.)

What Is Phylogenetic Nomenclature?

Sometimes when discussing the PhyloCode, I get the feeling a lot of potentially interested parties don't understand what phylogenetic nomenclature actually is. I have gone into excruciating detail on this topic elsewhere, but who wants to be excruciated? So here's a brief summary of the process of creating a phylogenetic taxonomy.

1. Declare Operational Taxonomic Units

Result: Alpha Taxonomy

The very first step is to decide what your units are. Are you dealing with individual organisms? Populations? Species? Which ones? Whatever you select, there should be an unambiguous way of referring to these taxonomic units (specimen numbers, species names, etc.).

Phylogenetic nomenclature is flexible as to how you determine and name taxonomic units. (Although the names must be relatable to those used in definitions [see Step 3].)

Example: My operational taxonomic units are the whale species Aetiocetus cotylalveus, Balaena mysticetus, Balaenoptera physalus, Delphinus delphis, and Monodon monoceros.

Operational Taxonomic Units
Silhouettes by Chris huh and T. Michael Keesey, taken from PhyloPic.
Image license: CC-BY-SA 3.0

Human Clades: A Look at a Complex Phylogeny

Most methods of phylogenetic analysis deal with simple trees. In these phylogenies, every taxonomic unit has a single direct ancestor (or "parent"). But we know that phylogeny is often more complex than this. Our own species is an excellent example—while we are all primarily descended from one population in Africa, different peoples around the globe have inherited smaller percentages of ancestry from preexisting populations.

A new study by Reich & al. looks in some detail at peoples who have inherited DNA from the Denisovans, a fossil group known from Siberia. Ancient DNA has been retrieved from these fossils, although unfortunately the fossils are otherwise too scant to tell us much about what Denisovans looked like (other than "humanlike").

Reich & al. posit a complex phylogeny wherein populations are often descended from multiple ancestral populations. Let's take a look at the clades posited in this study.

Operational Taxonomic Units

Reich & al. used the following nine populations, seven extant and two extinct, as operational taxonomic units.

Yoruba.—An ethnicity from West Africa (Nigeria, Benin, Ghana, etc.)
(Photo by Marc Trip.)

Han.—The most populous Chinese ethnicity.
(Photo by Brian Yap.)

Mamanwa.—One of the "Lumad" ("indigenous") ethnicities of the southern Philippines.
(Photo by Richard Parker.)

Jehai.—One of the Orang Asli ("original people") groups of Malaysia.
Note: this photo is of a woman from a different Orang Asli tribe, the Batek.
(Photo by Wazari Wazir.)

Onge.—A group of Andaman Islanders, from the Bay of Bengal.
(Photo from The Andamanese, by George Weber.)

Australians.—The indigenous ("aboriginal") peoples of Australia.
(Photo by Rusty Stewart.)

Papuans.—The indigenous peoples of the New Guinean highlands.
(Photo owned by the Center for International Forestry Research.)

Neandertals.—An extinct group of robust near-human peoples from West Eurasia.
(Photo by myself, of a sculpture by John Gurche.)

Denisovans.—An extinct group of near-human peoples known from Siberia but thought to have had a wider range.
Note: The photo is of a sculpture of Homo heidelbergensis, thought to be the common ancestor of humans, Neandertals, and Denisovans. Denisovans may not have looked exactly like this.
(Photo by myself, of a sculpture by John Gurche.)

Phylogeny

Reich & al. postulated the simplest phylogeny that could possibly explain their data. (Note that the actuality is likely more complex than this, but it's a good starting point.) More recent groups are to the right, and the thickness of the lines indicates the percentage of DNA contributed from population to population.

My diagram, not theirs. Any inaccuracies are my own.
Free for reuse under Public Domain.

I've added a line for the Denisovans' mitochondrial (motherline) ancestor, even though it's not part of the paper's phylogeny. More on that as we start looking through the various clades.

For looking at the clades I'll use a different diagram that does not reflect percentage of ancestry, but simply shows direct descent as unweighted arcs connecting parent and child taxonomic units.

Phylogeny of human and near-human populations according to Reich & al. 2011.
Created using Names on Nodes.
Free for reuse under Public Domain.
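The two diagrams above correspond to two views of the same data structure: a graph whose arcs carry ancestry percentages, and the same graph with the percentages thrown away. Here is a minimal sketch of that relationship. Everything in it is hypothetical: the type and function names are invented, and the populations and percentages in `toyGraph` are deliberately generic placeholders, not figures from Reich & al.

```typescript
// Sketch only. Populations and proportions are placeholders, not data from the paper.
// Each population maps to its direct ancestors and the share of DNA each contributed.
type AdmixtureGraph = { [population: string]: { [ancestor: string]: number } };

// Fraction of a population's ancestry that traces back to `source`,
// found by multiplying shares along each path and summing over paths.
function ancestryFraction(graph: AdmixtureGraph, population: string, source: string): number {
    if (population === source) return 1;
    const contributions = graph[population];
    if (contributions === undefined) return 0; // a root with no recorded ancestors
    let total = 0;
    for (const ancestor of Object.keys(contributions)) {
        const share = contributions[ancestor];
        if (share !== undefined) {
            total += share * ancestryFraction(graph, ancestor, source);
        }
    }
    return total;
}

// Drop the weights, keeping only [parent, child] arcs of direct descent.
function unweightedArcs(graph: AdmixtureGraph): [string, string][] {
    const arcs: [string, string][] = [];
    for (const child of Object.keys(graph)) {
        const contributions = graph[child];
        if (contributions === undefined) continue;
        for (const ancestor of Object.keys(contributions)) {
            arcs.push([ancestor, child]);
        }
    }
    return arcs;
}

const toyGraph: AdmixtureGraph = {
    'A': { 'Root': 1 },
    'B': { 'Root': 1 },
    'C': { 'B': 1 },
    'Admixed': { 'A': 0.75, 'C': 0.25 } // two parent populations
};

console.log(ancestryFraction(toyGraph, 'Admixed', 'B'));
console.log(unweightedArcs(toyGraph));
```

The unweighted arcs are all that is needed for working out clades, since a clade only cares whether there is descent, not how much.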

Characters that Support the Great Ape Clades

Anyone familiar with the current state of great ape phylogeny knows that the following structure is well-supported:

Great apes are a clade.

Orangutans are a subclade of great apes.
African great apes are a subclade of great apes.

Gorillas are a subclade of African great apes.
Mangani are a subclade of African great apes.

Humans are a subclade of mangani.
Chimpanzees are a subclade of mangani.

And most such people probably know that the primary evidence for this structure is molecular. But there has to be morphological data to back this up, right?

Great Apes

I've been trying to hunt down such morphological data, but it's been a bit hard. There really aren't that many morphology-based cladistic studies of primates, and the few that exist either exclude humans or focus on stem-humans more than living great apes.

An example of a study that looks at a wide array of fossil and living primates, but fails to include humans:

Rossie & Seiffert (2006). Continental paleobiogeography as phylogenetic evidence. Pages 469–522 in Lehman & Fleagle (eds.) Primate Biogeography: Progress and Prospects. Springer, New York. 546pp. isbn:0387298711

An example of a study that includes some living great apes, but focuses on stem-humans:

Strait & al. (1997). A reappraisal of early hominid phylogeny. Journal of Human Evolution 32(1):17–82. pmid:9034954

I've compiled some shared character lists from these:

Introducing PhyloPic: An Open Database of Reusable Silhouettes

Ever had this problem? "Boy, I could sure use a silhouette of [some kind of organism] for this diagram I'm working on. But I can't find anything on the web! Well, except for a few images which are copyrighted...."

What if there were a website with an open database of reusable images, available under Creative Commons licenses? What if you could do phylogenetic searches, so that, even if there weren't a silhouette for the taxon in question, you could at least find something close? What if you could build images like this...

Evolution of the Aardvark

...without having to look all over the web for figures?

Well, now you can! I've launched a new site called:

PHYLOPIC

It's currently in public alpha, which means it's not quite done. So, I have some caveats:

I'm pulling most taxonomic data from uBio. It's great because it's really comprehensive. But it's also a huge mess because it stores multiple classifications, many of which are outdated and disagree with each other. (This isn't uBio's fault, as its goal is to store all these classifications, not to offer one nice, neat classification.) So you may (will) find some errata in the phylogenetic system. I'm working on cleaning it up, but there are a lot of taxonomic names out there....
It's still early on, so there are only about a hundred images in the database. It will grow over time, but don't be surprised if the closest image it has for your favorite invertebrate is some kind of indeterminate worm.
There are some known bugs (and I don't mean Hemiptera). The Issues Page is open to all, though, so you can read the known issues and report new ones. (Please do!)

It's a work in progress, but I think it has enormous potential. And I think it's reached a state where it's ready for public use and feedback. So have a look, see what you think, and let me know! (And, if you're artistically inclined, please consider submitting some silhouettes of your own.)
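For the curious, the "find something close" idea behind a phylogenetic search can be sketched in a few lines. This is not PhyloPic's actual code, just one simple way to do it under stated assumptions: each taxon has at most one parent, and the `Taxonomy` type, the function names, and the tiny sample taxonomy (including which taxa have images) are all invented for illustration. The search looks in the taxon itself, then in progressively larger containing groups.

```typescript
// Sketch only: hypothetical names and sample data, not PhyloPic's implementation.
interface TaxonRecord {
    parentName: string | null; // null for the root
    hasImage: boolean;
}
type Taxonomy = { [taxonName: string]: TaxonRecord };

function childrenOf(taxonomy: Taxonomy, taxonName: string): string[] {
    return Object.keys(taxonomy).filter(candidate => {
        const record = taxonomy[candidate];
        return record !== undefined && record.parentName === taxonName;
    });
}

// Breadth-first search of one group for any taxon with a silhouette.
function firstImageInGroup(taxonomy: Taxonomy, rootName: string): string | null {
    const queue: string[] = [rootName];
    while (queue.length > 0) {
        const current = queue.shift() as string;
        const record: TaxonRecord | undefined = taxonomy[current];
        if (record !== undefined && record.hasImage) return current;
        for (const child of childrenOf(taxonomy, current)) queue.push(child);
    }
    return null;
}

// Try the taxon itself, then each successively larger containing group.
function closestIllustratedTaxon(taxonomy: Taxonomy, taxonName: string): string | null {
    let current: string | null = taxonName;
    while (current !== null) {
        const found: string | null = firstImageInGroup(taxonomy, current);
        if (found !== null) return found;
        const record: TaxonRecord | undefined = taxonomy[current];
        current = record === undefined ? null : record.parentName;
    }
    return null;
}

const sampleTaxonomy: Taxonomy = {
    'Mammalia': { parentName: null, hasImage: false },
    'Afrotheria': { parentName: 'Mammalia', hasImage: false },
    'Tubulidentata': { parentName: 'Afrotheria', hasImage: false },
    'Orycteropus afer': { parentName: 'Tubulidentata', hasImage: false },
    'Proboscidea': { parentName: 'Afrotheria', hasImage: false },
    'Loxodonta africana': { parentName: 'Proboscidea', hasImage: true },
    'Primates': { parentName: 'Mammalia', hasImage: false },
    'Homo sapiens': { parentName: 'Primates', hasImage: true }
};

console.log(closestIllustratedTaxon(sampleTaxonomy, 'Orycteropus afer'));
```

With messy source data where a taxon can sit in several classifications at once, the single `parentName` would have to become a list, and the walk upward would need to track which groups it has already visited.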

17 August 2010

An Example of Why We Need the PhyloCode

I just ran Radish on Zea mays (maize, a.k.a. corn). Look what the combined taxonomies from uBio look like:

What a mess! Keep in mind that the multiplicity of paths is not due to differing phylogenies (they all seem to agree on that), but to differing nomenclature. Even if uBio were to add some of the more obvious synonymies (e.g., Embryophyta and "Embryophytes"), it'd still be pretty wild.

Eventually I plan to have Radish work with automatically-generated taxonomies, made by placing PhyloCode names (from RegNum) onto TreeBase phylogenies using Names on Nodes algorithms, but until then I guess this is the best option.

And Zea mays is just a particularly egregious example. In contrast, here's a nice, neat "radish" for Scarabaeus sacer:

Apart from the one errant use of Animalia, pretty nice!

27 May 2010

Upcoming Names on Nodes Presentation

I'll also be presenting Names on Nodes at iEvoBio, at the Software Bazaar on June 29. Here's the abstract:

Names on Nodes: Automating the Application of Taxonomic Names within a Phylogenetic Context
Names on Nodes¹ is an open-source² Flex application which utilizes a mathematical approach to automate the application of phylogenetically-defined names to phylogenetic hypotheses. Phylogenetic hypotheses are modeled as directed, acyclic graphs, and may be read from bioinformatics or graph files (Nexus, NexML, Newick, and GraphML) or created de novo. Hypotheses may also be merged from multiple sources. Names on Nodes stores hypotheses as MathML, an XML-based language for representing mathematical content and presentation. Phylogenetic definitions may be constructed using a visual editor and exported in MathML. Thus, it is possible to create a dictionary of defined names and automatically apply them to phylogenetic hypotheses. In the current version of the application, such dictionaries exist only as MathML files, but in future versions definitions may also be loaded from databases (e.g., RegNum).
Additional functionality in Names on Nodes includes the ability to coarsen a phylogenetic graph (thereby simplifying it while still reflecting the overall structure) or to export it as an image file (raster or vector, potentially with semantic annotations).
¹ Source code available at: http://bitbucket.org/keesey/namesonnodes-sa/
² MIT license

I have my work cut out for me....

09 April 2010

Biota: Another Example of Coarse vs. Fine Phylogenies

Yesterday I posted an example of how graph-coarsening algorithms can be used to make the high-level patterns of a phylogeny more immediately visible. That example used a phylogenetic hypothesis about placental mammals. The hypothesis involves a lot of nodes (i.e., taxonomic units), but not much branching complexity. By which I mean each node has only a single parent.

So here's an example where nodes may have multiple parents. This is a phylogeny of Biota, i.e., Life:

Eukaryotes (organisms with cellular nuclei, i.e., plants [Embryophyta, etc.], animals [Metazoa], fungi [Eumycota], and "protists") have been highlighted in yellow. Some nodes have multiple parents due to one of two phenomena:

Lateral transfer. Many organisms (especially bacteria) are capable of acquiring genetic material from unrelated organisms.
Endosymbiosis. Some organisms have evolved into organelles within the cells of other organisms, notably mitochondria (descended from proteobacteria related to Rickettsia, the bacteria that cause typhus) and plastids (photosynthesizing organelles in plants, descended from cyanobacteria). In these cases, the organelle often retains its own DNA, although much of it may have leapt over to the "host's" nuclear DNA. In some cases, all of it may have leapt over (as with mitochondrial descendants like mitosomes).

Both lateral transfer and endosymbiosis are considered valid forms of descent in this hypothesis.

Here's the graph coarsened one step:

We can see the general patterns more clearly here. Eukaryotes share a relationship with archaeans, but also have descent from proteobacteria (via mitochondria). One clade of eukaryotes (Plastida) is also descended from a basal form of cyanobacteria (via plastids). A few cases of lateral transfer are visible, but not in detail. We can also see that there is a lot of bacterial diversity, although the details are not spelled out.

Here's the graph coarsened another step:

The endosymbiosis is made even clearer, although most other relationships are obscured.
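As a rough illustration of why a coarsened graph can still show multiple parentage, here is a sketch of one very simple kind of coarsening: lump nodes into named groups and keep only the arcs that cross between groups. This is an assumption on my part about one way to do it, and not necessarily the algorithm used for the figures above; the `coarsen` function, the grouping, and the toy arcs are all invented for the example.

```typescript
// Sketch only: one simple coarsening (collapsing nodes into named groups).
// Not necessarily the algorithm behind the figures; the toy data is invented.
type Arc = [string, string]; // [parent, child]
type Grouping = { [node: string]: string };

// Nodes without an entry in the grouping stay as they are.
function groupName(grouping: Grouping, node: string): string {
    const group = grouping[node];
    return group === undefined ? node : group;
}

function coarsen(arcs: Arc[], grouping: Grouping): Arc[] {
    const seen: { [key: string]: boolean } = {};
    const result: Arc[] = [];
    for (const arc of arcs) {
        const parentGroup = groupName(grouping, arc[0]);
        const childGroup = groupName(grouping, arc[1]);
        if (parentGroup === childGroup) continue; // arc is internal to one group
        const key = JSON.stringify([parentGroup, childGroup]);
        if (seen[key]) continue;                  // already recorded this arc
        seen[key] = true;
        result.push([parentGroup, childGroup]);
    }
    return result;
}

const fineArcs: Arc[] = [
    ['Archaea', 'Early eukaryotes'],
    ['Proteobacteria', 'Mitochondria'],
    ['Mitochondria', 'Early eukaryotes'],
    ['Early eukaryotes', 'Metazoa'],
    ['Early eukaryotes', 'Plastida'],
    ['Cyanobacteria', 'Plastids'],
    ['Plastids', 'Plastida']
];

const eukaryoteGrouping: Grouping = {
    'Mitochondria': 'Eukaryota',
    'Plastids': 'Eukaryota',
    'Early eukaryotes': 'Eukaryota',
    'Metazoa': 'Eukaryota',
    'Plastida': 'Eukaryota'
};

console.log(coarsen(fineArcs, eukaryoteGrouping));
```

Seven arcs collapse to three, and the coarse "Eukaryota" node still has three parents, so the endosymbiosis pattern survives even though the internal detail is gone.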

Disclaimer: This hypothesis was cobbled together from a number of sources and does not represent any rigorous research on my part. I suspect parts of it are outdated, but this area of the Tree of Life is not my bailiwick. I just wanted to throw something together for a demonstration.

01 March 2010

The Great PhyloCode Land Run

Sometime in the near future, the PhyloCode will be enacted. For this to happen, two things need to happen concurrently:

1. The registration database (called "RegNum") must be completed and opened to the public. This is necessary because the PhyloCode requires all names to be registered electronically.

2. Phylonyms: a Companion to the PhyloCode must be published. This is a multi-authored volume that will include the earliest definitions under the PhyloCode.

Which names will be defined in Phylonyms? The original goal was to cover the most historically important names (what Alain Dubois calls "sozonyms"). However, proponents of phylogenetic nomenclature tend to be clustered in several fields (most notably vascular plant botany and vertebrate zoology—note that the code's authorship reflects this). This means certain parts of the Tree of Life (e.g., entomology) will unfortunately be underrepresented, due to lack of interest in those fields. (The alternative, having non-specialists define such names in Phylonyms, does not bear consideration.) So Phylonyms will be less about providing coverage and more about providing sturdy, well-reasoned definitions that can serve as examples.

What about all the names that it omits? What will happen to those once the PhyloCode is enacted? That will be interesting to see.

One thing I could envision is a sort of "land run". I picture it working this way. Let's consider a field, say, anthropology, where phylogenetic nomenclature has not taken much of a hold. Currently there is debate about how to use some taxonomic names related to the field. Some workers like to use the familial name "Hominidae" to refer to a large taxon, including humans and great apes. Others prefer to restrict it to the human total clade (i.e., humans and everything closer to them than to other extant taxa). Similarly, some workers use the generic name "Homo" in a broad sense to include short, small-brained species like Homo habilis, while others prefer to restrict it to the tall, large-brained clade (relegating H. habilis to another genus, e.g., Australopithecus).

Let's say there's a researcher out there named Dr. Statler, who prefers a strict usage for "Hominidae" and a broad use for "Homo". But his colleague, Dr. Waldorf, prefers a broad usage for "Hominidae". Dr. Waldorf isn't really that interested in phylogenetic nomenclature, but when he notes that "Hominidae" is not in the registration database, he sees an opportunity. He writes a quick paper defining "Hominidae" as a node-based clade: "The clade originating with the last common ancestor of humans (Homo sapiens Linnaeus 1758), Bornean orangutans (Pongo pygmaeus Linnaeus 1760), common chimpanzees (Pan troglodytes Oken 1816, originally Simia troglodytes Blumenbach 1775), and western gorillas (Gorilla gorilla Geoffroy 1852, originally Troglodytes gorilla Savage 1847)."

Dr. Statler is, of course, outraged. Not that he cares that much about phylogenetic nomenclature, but what if anthropologists do start using it? What if someone ruins another taxonomic name? His colleagues Drs. Honeydew and Beaker prefer a strict definition of "Homo"—what if they author a paper cementing that definition under the PhyloCode?

This cannot come to pass! Dr. Statler does some reading on the code and decides that a branch-based definition would work nicely for his broader usage. He defines "Homo" as, "The clade consisting of Homo sapiens Linnaeus 1758 and all organisms that share a more recent common ancestor with H. sapiens than with Australopithecus africanus Dart 1925, Paranthropus robustus Broom 1938, Zinjanthropus boisei Leakey 1959, or Australopithecus afarensis Johanson & White 1978." This sets off another anthropologist, and soon all sorts of anthropological/primatological names are being defined under the PhyloCode, as workers struggle to assert their usages.
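The difference between Dr. Waldorf's node-based definition and Dr. Statler's branch-based one is easy to see once a phylogeny is written down as data. Here is a minimal sketch, with the caveat that it is only an illustration: the function names are invented, the toy phylogeny (including the unnamed "ancestor" units and where Homo habilis sits) is a made-up hypothesis, and Dr. Statler's four external specifiers are cut down to one to keep things short.

```typescript
// Sketch only: hypothetical function names and an invented toy hypothesis.
type Phylogeny = { [unit: string]: string[] }; // unit -> its direct parents

// A unit together with all of its ancestors.
function ancestorsOf(phylogeny: Phylogeny, unit: string): string[] {
    const found: string[] = [];
    const pending: string[] = [unit];
    while (pending.length > 0) {
        const current = pending.pop() as string;
        if (found.indexOf(current) >= 0) continue;
        found.push(current);
        const parents: string[] | undefined = phylogeny[current];
        if (parents !== undefined) for (const p of parents) pending.push(p);
    }
    return found;
}

// The given units together with all of their descendants.
function withDescendants(phylogeny: Phylogeny, units: string[]): string[] {
    const found = units.slice();
    let grew = true;
    while (grew) {
        grew = false;
        for (const child of Object.keys(phylogeny)) {
            if (found.indexOf(child) >= 0) continue;
            const parents = phylogeny[child];
            if (parents !== undefined && parents.some(p => found.indexOf(p) >= 0)) {
                found.push(child);
                grew = true;
            }
        }
    }
    return found;
}

// Node-based: the last common ancestor(s) of the specifiers, plus all descendants.
function nodeBasedClade(phylogeny: Phylogeny, specifiers: string[]): string[] {
    if (specifiers.length === 0) return [];
    const common = specifiers
        .map(s => ancestorsOf(phylogeny, s))
        .reduce((shared, next) => shared.filter(u => next.indexOf(u) >= 0));
    // "Last" = common ancestors that are not ancestral to another common ancestor.
    const last = common.filter(candidate => !common.some(other =>
        other !== candidate && ancestorsOf(phylogeny, other).indexOf(candidate) >= 0));
    return withDescendants(phylogeny, last);
}

// Branch-based: ancestors of the internal specifier that are not shared with any
// external specifier, plus all descendants of those ancestors.
function branchBasedClade(phylogeny: Phylogeny, internal: string, externals: string[]): string[] {
    const excluded: string[] = [];
    for (const e of externals) {
        for (const a of ancestorsOf(phylogeny, e)) excluded.push(a);
    }
    const stem = ancestorsOf(phylogeny, internal).filter(u => excluded.indexOf(u) < 0);
    return withDescendants(phylogeny, stem);
}

const toyPhylogeny: Phylogeny = {
    'Hylobates lar': ['outgroup ancestor'],
    'great ape ancestor': ['outgroup ancestor'],
    'Pongo pygmaeus': ['great ape ancestor'],
    'African ape ancestor': ['great ape ancestor'],
    'Gorilla gorilla': ['African ape ancestor'],
    'mangani ancestor': ['African ape ancestor'],
    'Pan troglodytes': ['mangani ancestor'],
    'stem-human A': ['mangani ancestor'],
    'Australopithecus africanus': ['stem-human A'],
    'stem-human B': ['stem-human A'],
    'Homo habilis': ['stem-human B'],
    'Homo sapiens': ['stem-human B']
};

console.log(nodeBasedClade(toyPhylogeny,
    ['Homo sapiens', 'Pongo pygmaeus', 'Pan troglodytes', 'Gorilla gorilla']));
console.log(branchBasedClade(toyPhylogeny, 'Homo sapiens', ['Australopithecus africanus']));
```

Note that neither definition names its contents directly. Swap in a different hypothesis (say, one where Homo habilis branches off before Australopithecus africanus) and the same definitions pick out different sets, which is exactly why our two doctors care so much about who gets to write them.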

This is not an ideal situation. It would be much nicer if a group of anthropologists were to come together, discuss the matters rationally, and arrive at an agreement which they then publish together. But it's still not a horrible situation—at least people are defining phylogenetic names and at least interest in phylogenetic nomenclature is being spread. I can't predict the future, but I feel like this sort of "land run" is bound to occur at least in some fields—and maybe that's okay.

04 January 2010

The Mangani Holotypes, Entry I: Carl Linnaeus (Uppsala domkyrka)

I, Human

Humans are an egotistic species. Ancient writers considered humans to be created in the image of the gods, destined to rule all other entities. We humans have not one, but two major fields of study devoted to ourselves and named accordingly (anthropology and the humanities). Pick up a book at random and its main topic is likely to be humans (or at least anthropomorphized non-humans).

Yet we are also an outward-looking species. Alone among the life forms of Earth, we regard the skies, the deep, the land. We observe what is, fashion tests to determine how it came to be, and speculate on where things are going. We are self-centered, but our curiosity about things other than ourselves is boundless.

One of the best examples of this apparent paradox lies with systematics, the naming and organizing of life. And no one person illustrates it better than the founder of systematics, Swedish botanist Carl Linnaeus.

Out of Chaos, Order

Carl Linnaeus lived during the 18th century, a time when science, in its modern meaning, was still emerging from what had been called "natural philosophy". The term "biology" had not even been invented yet. Microbes and cells had been discovered, but things like evolution, germ theory, genetics, biochemistry, etc. were a long way off. The study of life was largely a chaotic mess.

Carl Linnaeus as a young adventurer, dressed in Sámi clothing, painted by Martin Hoffman.

Enter the organizer: Linnaeus observed natural entities and saw order, not chaos. He began to arrange animals, plants, and minerals into hierarchical groups, first in his notes, then in pamphlets, and finally as a series of volumes, Systema Naturae. He was not the only naturalist of his time to do this, but he went further than most, and enjoyed more success. Unlike many scholars, his brilliance was recognized in his own time.

Perhaps nobody recognized it more than Linnaeus himself. True to his species, he had a healthy ego. "Deus creavit, Linnaeus disposuit," he was fond of saying: God created, Linnaeus organized. He thought enough of himself to slave over his autobiography almost as much as his systematic work. And he thought enough of his species to give it the name Homo sapiens—"wise human"—and place it in an order called Primates—"primary ones".

But religious leaders of the day took a different view of Primates. To them, the idea that humans could possibly be grouped alongside such lowly creations as lemurs, apes, and monkeys (and bats, originally included in Primates but long since removed) was sacrilege. (Compounding this, "primate" is a religious title as well.) The Roman Catholic Papa Clement XIII banned Linnaeus's books outright in 1758 (although in 1774 Papa Clement XIV actually fired his Professor of Botany for deficient knowledge of Linnaeus's system!) (Soulsby 1993:39). Even Linnaeus's own religious leader, the Lutheran Bishop of Uppsala, considered him impious (Aczel 2007), although this was no bar to Linnaeus being ennobled later on, whereafter he was known as Carl von Linné.

Carl von Linné in 1775, painted by Alexander Roslin

Privately, Linnaeus confessed that he would have liked to go even further in arranging humans with other members of Order Primates. He saw no anatomical reason not to include apes, monkeys, and humans in the same genus (which was a much broader category as he used it than as we use it today), let alone the same order. The only reason he did not name us Simia sapiens was because he feared theological backlash. (Linnaeus 1747)

So here we have a man who saw his species as "wise" and "primary", but recognized that it did not stand apart from other species. Subsequent biological research has upheld our connection to other living things. Ethologists have found that other species use tools, communicate vocally, and even domesticate other life forms. Geneticists have discovered that our DNA is little different from that of a chimpanzee. Paleontologists have found series of extinct species showing that we evolved from ancestors that we share with other animals. Phylogenetically, his inclination was correct—we are one of many kinds of monkey.

Today we struggle to find things that make humans unique. There are still a few—for one thing, no other terrestrial species has attempted to catalogue its fellow life forms. Ironically, this effort, which brings us into the fold with other life forms, also sets us apart.

Naming the Animals

"And Yahweh [of the El Gods] sculpted from the ground every living thing of the field and every flier of the sky-waters. And he brought the Human in to see how he would call them. And whatever the Human called it, that was that living animal's name. And the Human called names to all the beasts, to the fliers of the sky-waters, and to every living thing of the field."

—Anonymous Yahudi, Bereshith 2:19–20a (my translation)

Modern zoological nomenclature, as governed by the International Commission on Zoological Nomenclature according to the International Code of Zoological Nomenclature (ICZN), descends directly from Linnaeus's Systema Naturae. Many of his groupings seem quaint or even laughable today, but, on the other hand, many don't, and a large number of the names he coined are still in use (albeit often for somewhat different groups). The tenth edition of Systema Naturae, published in 1758, is considered one of the founding works of zoological nomenclature (along with Carl Alexander Clerck's lesser-known 1757 work, Aranei Suecici ["Swedish Spiders"]). By the ICZN's rules, these are the earliest works to contain valid zoological names.

The ICZN's way of doing things is a bit different from that of Linnaeus and other early systematists. In some respects this may be regrettable (e.g., the tying of names to ranks has led to much nomenclatural instability—in Linnaeus's time names were free to be ranked however the systematist saw fit, without any spelling change required [de Queiroz 2005]). In other ways, there has been improvement. One notable improvement is the mandating of type specimens.

In Linnaeus's works, names are paired with diagnoses—descriptions of the entities which the name signifies. But diagnoses are an unstable way to define biological groups. They may be too general, bringing unrelated forms into the same group. They may be too specific, excluding forms which should rightly belong. Sometimes they are flat-out wrong. Whatever the case, they are constantly revised in the literature.

What biological nomenclature needed was a way of anchoring definitions. Thus, the ICZN (as well as other nomenclatural codes) uses the concept of a type, one entity which "sets the standard" for the entire group. One specimen (a specimen being some object that has been catalogued within a collection) is selected as the standard-bearer for each species name. There are various types of types in zoological nomenclature, but the most important one is the holotype, the one specimen that anchors the name. Other individuals may be included or excluded as the systematist sees fit, but the one represented by the holotype must remain. (Note that, as practiced, this is different from the Platonic concept of an archetype, in that the holotype need not be a "typical" specimen. That concept is too subjective to be useful in science.)

The Human Holotype

The requirement that zoological names must have a holotype was not applied retroactively, or too many old names would have been invalidated. Instead, provisions were made such that subsequent authors could select a holotype if the original author did not. There are certain restrictions on this, set up to guarantee that the holotype is something that the original author would have included.

When Linnaeus named Homo sapiens, he diagnosed it much more succinctly than usual. "HOMO nosce Te ipsum," he wrote: "HUMAN know yourself." Nothing further was needed, at least at the time.

In 1959, in honor of the 200th anniversary of the tenth edition of Systema Naturae, W. T. Stearn wrote a commemorative article that, among other topics, addressed the lack of a holotype specimen for Homo sapiens:

"Since for nomenclatorial purposes the specimen most carefully studied and recorded by the author is to be accepted as the type, clearly Linnaeus himself, who was much addicted to autobiography, must stand as the type of his Homo sapiens!"

Although stated jokingly, this meets the ICZN's requirements for the designation of a type specimen. Linnaeus's remains, interred at the Uppsala Dome-Church, are the standard-bearer for the species Homo sapiens (and, by proxy, Genus Homo, Family Hominidae, etc.). A fitting tribute to his brilliance ... and his ego.

The Mangani Holotypes

Like any good human, I am fascinated by my own species. I spend much of my spare time studying our origins. It's tough going at times, because many people are fascinated by the same topic, and so there is a huge wealth of hypotheses, ranging from crackpot to well-substantiated. On one hand, the wealth of material is great, but, on the other hand, it's hard to sort out the solid ideas from the less solid. In short, it's a chaotic mess.

I am no Linnaeus (and I'm sure he would agree), but I like to organize my thoughts. So this is the first post in a series where I will take a look at what anchors we do have in this sea of confusion. One by one, I intend to look at each holotype specimen within the human-chimpanzee group, which I informally call "mangani", as explained in an earlier post.

I haven't decided on a particular order, but in many ways it seems that the most apt way to begin is with the first species to be named.

Carl Linnaeus (Uppsala domkyrka)

Collection	Uppsala domkyrka, Uppsala, Sweden (Sverige), Europe
Name	Carl Linnaeus
Other Names	Carolus Linnaeus (Latin); Carl von Linné (after ennoblement); Carolus von Linné (Latin, after ennoblement); L. (standard abbreviation in botanical literature)
Remains	interred corpse
Geography	born in Älmhult, Småland, Sweden (Sverige), Europe; died in Uppsala, Sweden (Sverige), Europe
Chronology	born 1707 CE May 23; died 1778 CE January 10
Sex	male
Age	70 years
Height	~1.8m? ~1.6–1.7m?
Typified Taxa Names	Species Homo sapiens Linnaeus 1758 [holotype]; Superspecies Homo (sapiens) Linnaeus 1758 [holotype]; Subspecies Homo sapiens sapiens Linnaeus 1758 [holotype]. Homo sapiens typifies: Genus Homo Linnaeus 1758; Subgenus Homo (Homo) Linnaeus 1758. Homo typifies: Superfamily Hominoidea Gray 1825; Family Hominidae Gray 1825; Subfamily Homininae Gray 1825; Tribe Hominini Gray 1825; Subtribe Hominina Gray 1825.
Taxonomy	Although most of the higher taxa have varying usages, the species Homo sapiens is used fairly stably nowadays to include all living humans and their ancestors for approximately the past 200,000 years. More inclusive usages in the past included forms now generally placed in other species, such as Homo neanderthalensis and Homo heidelbergensis. (Genetic data has supported this for H. neanderthalensis [Krings & al. 1997].) Early specimens are similar to Homo heidelbergensis and Homo rhodesiensis, and are often placed in subspecies other than Homo sapiens sapiens (to be detailed in later entries). Of the higher taxa, the most stable is Hominoidea, which is generally used for the clade of tailless primates (gibbons and great apes, the latter including humans).
Comments	Many things about the designation of this specimen as the holotype are odd, not the least of which is that the individual represented by the specimen founded biological nomenclature. Apart from that, this specimen is not "typical" of its species in several ways. Notably, although Homo sapiens originated in Africa, this specimen is from a boreal peninsula of Europe, where members of the species exhibit some aberrant local adaptations, such as marked depigmentation. Even so, the individual still bears the distinctive hallmarks of the species: extremely high, vaulted cranium with high capacity, large body size coupled with gracile build, extremely flat face and small brow ridges, etc. Designation of this specimen as a holotype is problematic in that it is not available for study, on religious and cultural grounds. However, the individual is otherwise well documented, in both writings and paintings, and was physically normal. Additionally, he has dozens of living descendants, via two of his daughters.

Carl von Linné's gravestone, at Uppsala domkyrka. Photo by Wrote.

Biotechnologist Martin Nervall with a painting of his great great great great great great grandfather, Carl Linnaeus. Photo by Teddy Thörnlund, appearing on Uppsala Universitet's page here.

References

Aczel, A. D. (2007). The Jesuit and the Skull: Teilhard de Chardin, Evolution, and the Search for Peking Man. Riverhead Books. isbn:1594489564
Clerck, C. A. (1757). Aranei suecici, descriptionibus et figuris oeneis illustrati, ad genera subalterna redacti speciebus ultra LX determinati. Svenska spindlar, uti sina hufvud-slagter indelte samt. Stockholmiae.
de Queiroz, K. (2005). Linnaean, rank-based, and phylogenetic nomenclature: restoring primacy to the link between names and taxa. Symb. Bot. Ups. 33(3):127–140. Available online at http://si-pddr.si.edu/dspace/bitstream/10088/4506/1/VZ_2005deQueirozSymBotUps.pdf
International Commission on Zoological Nomenclature (ICZN) (1999). International Code of Zoological Nomenclature, 4th Ed. London: International Trust for Zoological Nomenclature.
Krings, M., A. Stone, R. Schmitz, H. Krainitzki, M. Stoneking & S. Pääbo (1997). Neandertal DNA sequences and the origin of modern humans. Cell 90(1):19–30. doi:10.1016/S0092-8674(00)80310-4
Linnaeus, C. (1747). [Letter to J. G. Gmelin]. Available via The Linnaean Correspondence, http://linnaeus.c18.net, letter L0783 (consulted 2009 Jan 31).
Linnaeus, C. (1758). Systema naturae per regna tria naturae, secundum classes, ordines, genera, species, cum characteribus, differentiis, synonymis, locis. ed. X, tom. I–II. Holmiae: Impensis L. Salvii.
Soulsby, B. H. (1933). A Catalogue of the Works of Linnaeus in the British Museum (2nd Ed.). British Museum. Available online in partim at http://www.nhm.ac.uk/resources-rx/files/xi-zoological-works-23636.pdf
Stearn, W. T. (1959). The background of Linnaeus's contributions to the nomenclature and methods of systematic biology. Systematic Zoology 8:4–22.

23 July 2009

Two "Names on Nodes"-Related Launches

I'm still a long way from launching the beta application, but I've just made a couple of launches related to my long-time work-in-progress, Names on Nodes.

First up, and probably of more interest to most people, I've begun the documentation for the MathML definitions used by Names on Nodes. The document includes general reviews of relevant mathematical and biological concepts, a quick review of MathML and the technologies it's based on, some comments on correlating mathematical and biological concepts, and definitions for all entities (including operations) used by Names on Nodes. Note that this covers a lot of the same ground as in my 2007 paper, with a few minor changes in the symbols and terminology (e.g., I now call the ancestor of a clade a "cladogen" rather than a "cladogenetic set").

Secondly, I've made the project open-source, by moving it to Google Code. If you are a developer interested in checking this out, go here. It's incomplete, so I don't know if anyone will have any real interest in looking at it yet. (Honestly, I mostly posted so that, on the off chance that I unexpectedly kick the bucket, my magnum opus won't be lost forever.)

This information is also on the new Names on Nodes home page.

22 July 2009

"The Case for Human Evolution" - Illustrations

I have been working on an essay entitled The Case for Human Evolution for a while. I've just posted some illustrations I've been working on:

The Case for Human Evolution (Flickr Set)

Enjoy!

24 June 2009

Human-Chimpanzee Systematics

I've been working on a couple of projects to do with stem-humans. Naturally, these efforts necessitate creating a working phylogeny. I thought I'd post what I more or less have so far. I haven't done any rigorous work here; I'm just trying to piece things together from various publications.

This is a phylogeny of all known species within Clade(Homo sapiens Linnaeus 1758 ← Troglodytes gorilla Savage vide Savage & Wyman 1847), including some unnamed, fragmentary species that can only be differentiated from other species by location and/or time. (Note: Sahelanthropus tchadensis Brunet & al. 2002 is excluded because it is not clear that it falls within this clade.) I've included links for all citations with permanent identifiers, when available, or popups with fuller information, when not. The phylogeny is interspersed with a rank-based taxonomy. (Unfortunately, there are no published phylogeny-based names to apply here.) Outlined circles indicate that the species may be ancestral to what are shown as sister groups. Species names are listed with their original prenomina (genera), regardless of current placement. I've added a note when the listed species is the type of its prenominal genus or another genus.

- ?Orrorin tugenensis Senut & al. 2001 [typus]
- ≅ Tribus Hominini Gray 1825
  - Ardipithecus kadabba Haile-Selassie 2001
  - ≅ Genus Pan Oken 1816
    - Pan sp. innom. McBrearty & Jablonski 2005
    - Pan paniscus Schwarz 1929
    - Simia troglodytes Blumenbach 1776 [typus Pan]
  - ≅ Subtribus Hominina Gray 1825
    - Australopithecus ramidus White & al. 1994 [typus Ardipithecus]
    - - Australopithecus anamensis Leakey & al. 1995 (?= praegens)
      - Homo praegens Ferguson 1989
      - Australopithecus afarensis Johanson & al. 1978 [typus Praeanthropus]
        Australopithecus bahrelghazali Brunet & al. 1995 (?= afarensis)
        Australopithecus garhi Asfaw & al. 1999
        Kenyanthropus platyops Leakey & al. 2001 [typus] (?= afarensis)
        ≅ Genus Australopithecus Dart 1925
        Australopithecus aethiopicus Olson 1985
        Australopithecus africanus Dart 1925 [typus]
        Zinjanthropus boisei Leakey 1959 [typus]
        Australopithecus robustus Broom 1938 [typus Paranthropus]
        ≅ Genus Homo Linnaeus 1758
        Homo sp. innom. Kimbel & al. 1996
        Homo habilis Leakey & al. 1965
        Homo rudolfensis Alexeev 1986
        ≅ Subgenus Homo (Homo) Linnaeus 1758
        Anthropopithecus erectus Dubois 1892 [typus]
        Homo ergaster Groves & Mazák 1975
        ?Homo floresiensis Brown & al. 2004
        Homo georgicus Vekua & al. 2002
        ≅ Superspecies Homo (sapiens) Linnaeus 1758
        Homo antecessor Bermúdez de Castro & al. 1997
        Homo cepranensis Mallegni & al. 2003 (?= antecessor)
        Homo heidelbergensis Schoetensack 1908
        Homo neanderthalensis King 1864
        Homo rhodesiensis Woodward 1921 (?= heidelbergensis)
        Homo sapiens Linnaeus 1758 [typus]

17 March 2009

Refactoring "Names on Nodes" Entities, Part II

As I discussed previously, the Names on Nodes project had reached a point where the schema just wasn't working out. I went through a list of what was wrong with it: confusing nomenclature, various unnecessary classes, unnecessary references, and major practical problems with looking up contextual relations.

Another big problem was the home-brewed keyword search system I had going. Synchronizing the keyword lists was becoming problematic, and I realized there are already perfectly good (better, even) tools out there such as Hibernate Search. That's a chief rule of programming: don't reinvent something that people smarter than you, with more time on their hands, have already invented.

After a clear, honest look at the contextual relations, I came to a realization: they should be in the client, not the back end. No need to bog down the server with computing definition applications when it can be done in the client. That simplified things a great deal.

Another thing I didn't really need was categories. They were basically an ad hoc form of class inheritance, e.g., a species name is a nomen, a nomenclatural code is a publication, etc. For a little while I considered implementing this as a class hierarchy, as I had in earlier versions. But, really, this is irrelevant data—Names on Nodes doesn't really need to know what category an identifier falls in.

Finally, I had another problem in the way datasets and taxon identifiers (=signifiers) used qualified names. Each one was supposed to have a unique qualified name. While I was able to guarantee uniqueness within datasets and within taxon identifiers, I wasn't able to guarantee that qualified names would be unique between datasets and taxon identifiers.

So, here's the new version (click to magnify):

Again, white arrows indicate "is-a" relationships ("inheritance")—so a PhyloDefinition is a type of Definition, a Dataset is a type of Qualified object, etc. And black diamonds indicate "has-a" relationships ("composition")—so a TaxonIdentifier has one (and only one) Taxon, an Equation has at least two TaxonIdentifier objects, etc. (I've left out a few non-core classes, like BioFile and UserAccount.)

Brief discussions of each class:

Authority.—An authority can be a publication, a person, a bioinformatics file, a database, a specimen catalogue, etc. Each authority has a canonical name (e.g., "Yale Peabody Museum: Vertebrate Paleontology Collection") and an optional abbreviation (e.g., "YPM-VP").

AuthorityIdentifier.—One or more identifiers may be used to indicate an authority, each one associated with a unique URI. Examples:

<urn:isbn:0853010064> (The International Code of Zoological Nomenclature, 4th Edition)
<http://iczn.org/iczn> (Another way of referring to the ICZN.)
<mailto:keesey@gmail.com> (myself)
<http://peabody.yale.edu/collections/vp> (Yale Peabody Museum: Vertebrate Paleontology Collection)
<urn:sha1:bc0ccc8a379edc44cf91b013d2da6238d4258a56> (a bioinformatics file, indicated by its SHA-1 hash key)
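
The last kind of identifier above can be derived mechanically from a file's contents. Here is a minimal sketch in Python (not the project's own code), assuming the hash is written as 40 lowercase hexadecimal digits after a `urn:sha1:` prefix, as in the example. The function name and the sample bytes are hypothetical.

```python
import hashlib


def sha1_urn(data: bytes) -> str:
    """Return a urn:sha1: identifier for the given file contents.

    Assumption: hex digest form, matching the example identifier in the text.
    """
    return "urn:sha1:" + hashlib.sha1(data).hexdigest()


# Hypothetical file contents, for illustration only.
example = sha1_urn(b"#NEXUS\nBEGIN TAXA;\nEND;\n")
```

Because the identifier depends only on the bytes, two copies of the same file get the same URI no matter where they are stored.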

Qualified.—This new abstract class makes it possible for qualified names to be unique across all classes that use them. Each refers to an authority identifier and contains a local name, which is unique to that identifier. When combined, the identifier's URI and the local name form a qualified name, e.g., <urn:isbn:0853010064::Homo+sapiens> or <http://peabody.yale.edu/collections/vp::1450>.
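
To make the idea concrete, here is a rough Python sketch of how a shared base class could guarantee uniqueness across datasets and taxon identifiers. It is not the actual Names on Nodes code. The `Registry` class is hypothetical, and the rule that spaces in local names become `+` is an assumption read off the `Homo+sapiens` example.

```python
from dataclasses import dataclass

SEPARATOR = "::"  # taken from the examples in the text


@dataclass(frozen=True)
class AuthorityIdentifier:
    uri: str


@dataclass(frozen=True)
class Qualified:
    authority: AuthorityIdentifier
    local_name: str

    @property
    def qualified_name(self) -> str:
        # Assumption: spaces in local names are written as "+".
        return self.authority.uri + SEPARATOR + self.local_name.replace(" ", "+")


@dataclass(frozen=True)
class TaxonIdentifier(Qualified):
    pass


@dataclass(frozen=True)
class Dataset(Qualified):
    pass


class Registry:
    """Hypothetical store keyed by qualified name, shared by all subclasses."""

    def __init__(self) -> None:
        self._by_name = {}

    def add(self, obj: Qualified) -> None:
        name = obj.qualified_name
        if name in self._by_name:
            raise ValueError("qualified name already in use: " + name)
        self._by_name[name] = obj


iczn = AuthorityIdentifier("urn:isbn:0853010064")
ypm = AuthorityIdentifier("http://peabody.yale.edu/collections/vp")
homo_sapiens = TaxonIdentifier(iczn, "Homo sapiens")
specimen = TaxonIdentifier(ypm, "1450")
```

Since the registry only looks at the qualified name, a `Dataset` and a `TaxonIdentifier` can no longer collide silently.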

TaxonIdentifier & Taxon.—Formerly called "signifiers", taxon identifiers are qualified objects that each refer to a taxon. Taxon identifiers may be scientific names, vernacular names, specimen identifiers, character state descriptions, etc. As with authorities, each taxon may have more than one identifier referring to it. For example, the following qualified names all refer to the same species: <urn:isbn:0853010064::Abeillia+abeillei>, <http://iucnredlist.org::species:142883>, and <http://iucnredlist.org::common_name:Eng:Emerald-chinned+Hummingbird>.

Label.—Authorities, datasets, and taxon identifiers are all labelled entities, possessing one label object. Each label has a name, an optional abbreviation, and a flag telling whether it should be italicized. Labels are merely cosmetic, and need not be unique. They are used as the targets of searches, using Hibernate Search.

Definition.—Each definition has one taxon identifier, and only one definition pertains to that taxon identifier. How do I accommodate differing definitions, then? I use a concept from the PhyloCode: conversion. Consider the name "Aves". Under the ICZN, it refers to a suprafamilial ranked taxon with no type. According to Sereno's TaxonSearch, it refers to a node-based clade including Archaeopteryx. According to Gauthier and de Queiroz (2001), it refers to a crown group. But instead of having multiple definitions for the same identifier, I consider each definition to define a different identifier, each indicating a (potentially) different taxon: <urn:isbn:0853010064::Aves>, <http://www.taxonsearch.org/Archive/stem-archosauria-1.0.php::Aves>, and <urn:bici:0912532572(200112)%3C7:FDFDCD%3E2.0.TX;2-H::Aves>, respectively. In cases of conversion, the definition also indicates the original identifier.
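
The one-definition-per-identifier rule can be sketched in a few lines of Python. This is an illustration only, not the project's code: `DefinitionStore` and the `summary` field are hypothetical, and treating the two phylogenetic definitions as conversions of the ICZN identifier is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Definition:
    identifier: str                        # qualified name being defined
    summary: str                           # free-text stand-in for the real definition
    converted_from: Optional[str] = None   # original identifier, if converted


class DefinitionStore:
    """Hypothetical store enforcing one definition per identifier."""

    def __init__(self) -> None:
        self._by_identifier: Dict[str, Definition] = {}

    def add(self, definition: Definition) -> None:
        if definition.identifier in self._by_identifier:
            raise ValueError("identifier already defined: " + definition.identifier)
        self._by_identifier[definition.identifier] = definition

    def get(self, identifier: str) -> Definition:
        return self._by_identifier[identifier]


ICZN_AVES = "urn:isbn:0853010064::Aves"
TAXONSEARCH_AVES = "http://www.taxonsearch.org/Archive/stem-archosauria-1.0.php::Aves"
GAUTHIER_AVES = "urn:bici:0912532572(200112)%3C7:FDFDCD%3E2.0.TX;2-H::Aves"

store = DefinitionStore()
store.add(Definition(ICZN_AVES, "suprafamilial ranked taxon with no type"))
# Assumption for illustration: both are treated as conversions of the ICZN name.
store.add(Definition(TAXONSEARCH_AVES, "node-based clade including Archaeopteryx",
                     converted_from=ICZN_AVES))
store.add(Definition(GAUTHIER_AVES, "crown group", converted_from=ICZN_AVES))
```

The three "Aves" entries coexist because their qualified names differ, while a second definition for any one of them would be rejected.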

PhyloDefinition & RankDefinition.—These have not changed much, except that they now refer directly to their specifiers and types, respectively. No more useless "Anchor" class.

Dataset.—Instead of storing a bunch of relations of unspecified type, each type of relation falls within its own set. I've also added optional ratios for converting weights in phylogenetic networks to generations and/or years.
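
As a small illustration of the optional ratios, here is a Python sketch. The class name, the field names, and the direction of the ratios (generations or years per unit of weight) are all assumptions, since the text does not spell them out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetScale:
    """Hypothetical holder for a dataset's optional conversion ratios."""

    generations_per_weight: Optional[float] = None
    years_per_weight: Optional[float] = None

    def to_generations(self, weight: float) -> Optional[float]:
        # No ratio supplied means no conversion is possible.
        if self.generations_per_weight is None:
            return None
        return weight * self.generations_per_weight

    def to_years(self, weight: float) -> Optional[float]:
        if self.years_per_weight is None:
            return None
        return weight * self.years_per_weight


# Made-up ratio: one unit of weight stands for a million years.
scale = DatasetScale(years_per_weight=1_000_000.0)
years = scale.to_years(2.5)
generations = scale.to_generations(2.5)
```

Returning `None` when a ratio is missing keeps unscaled datasets usable without inventing a time scale for them.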

Equation.—I almost called this "Synonymy". This is a new type of relation, which asserts that two or more identifiers refer to the same taxon.
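
One simple way to resolve equations on the client is a union-find structure over qualified names. The Python sketch below is a swapped-in illustration of that technique, not a description of how Names on Nodes actually does it. The class and method names are hypothetical.

```python
class TaxonResolver:
    """Hypothetical union-find over identifier names."""

    def __init__(self) -> None:
        self._parent = {}

    def _find(self, name: str) -> str:
        self._parent.setdefault(name, name)
        root = name
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited name straight at the root.
        while self._parent[name] != root:
            self._parent[name], name = root, self._parent[name]
        return root

    def equate(self, *names: str) -> None:
        """Assert that two or more identifiers refer to the same taxon."""
        if len(names) < 2:
            raise ValueError("an equation needs at least two identifiers")
        first = self._find(names[0])
        for other in names[1:]:
            self._parent[self._find(other)] = first

    def same_taxon(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)


resolver = TaxonResolver()
resolver.equate(
    "urn:isbn:0853010064::Abeillia+abeillei",
    "http://iucnredlist.org::species:142883",
    "http://iucnredlist.org::common_name:Eng:Emerald-chinned+Hummingbird",
)
```

Equations chain together, so identifiers linked only indirectly through a shared name still resolve to the same taxon.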

Heredity & Inclusion.—Heredity was previously called "Parentage". The new nomenclature better reflects its real meaning, since it models ancestor-descendant relationships, not necessarily parent-child. These two classes are little changed, except that now they don't both descend from a useless Relation class, so their nomenclature can be clearer (predecessor and superset used to be "a"; successor and subset used to be "b").
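
The point that heredity models ancestor-descendant relationships, not just parent-child ones, can be shown with a short Python sketch. This is an assumed in-memory representation, not the project's schema. The class name and the placeholder node labels are hypothetical.

```python
from collections import defaultdict, deque


class HereditySet:
    """Hypothetical set of (predecessor, successor) pairs."""

    def __init__(self) -> None:
        self._successors = defaultdict(set)

    def add(self, predecessor: str, successor: str) -> None:
        self._successors[predecessor].add(successor)

    def precedes(self, ancestor: str, descendant: str) -> bool:
        """True if descendant is reachable from ancestor in one or more steps."""
        seen = set()
        queue = deque([ancestor])
        while queue:
            node = queue.popleft()
            for nxt in self._successors.get(node, ()):
                if nxt == descendant:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False


# Placeholder node labels, not real taxa.
heredity = HereditySet()
heredity.add("a", "b")
heredity.add("b", "c")
heredity.add("a", "d")
```

Here "a" precedes "c" even though no single stored pair says so. An inclusion set could be queried the same way, with superset and subset in place of predecessor and successor.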

This schema is much cleaner, and will make for a more efficient server side. I've already implemented the entities, removed deprecated code, and updated the relevant code. After some hiccups with a Hibernate upgrade, unit tests are working again. The back end should be complete fairly soon (pending some ideas about user accounts), and then it will be time to look at some massive refactorings for the front end!