Most Archaea and Bacteria Are Nameless. SeqCode Could Change That
Let’s say you’ve discovered a new bacterium or archaeon and want to give it a name so that it can be classified and placed on the appropriate branch of the tree of life. According to the International Committee on Systematics of Prokaryotes (ICSP), the community of scientists that decides how to handle the naming of bacteria and other prokaryotes, there are a few steps you need to take before you can give it a name under the International Code of Nomenclature of Prokaryotes (ICNP). Chief among those steps is successfully culturing your new organism and depositing it in at least two separate culture collections, which must be located in different countries, for storage.
Unfortunately, that remains impossible for the vast majority of prokaryotic life on Earth. Most are what microbiologists call “fastidious organisms,” meaning that their nutritional or environmental requirements are complex or extremely particular, making it difficult, if not impossible, to grow them in culture. And if that were the case for your newly discovered microbe, it would have to remain nameless, a part of the planet’s so-called microbial dark matter—for now. As The Scientist previously reported, a number of new systems of nomenclature have emerged in recent years to accommodate the uncultured masses. Among them is SeqCode, a code of nomenclature described in a Nature Microbiology paper published earlier this month (September 19) that uses genome sequences, rather than cultured specimens, as nomenclatural types: the reference material to which each name is permanently anchored.
To learn more about why naming the planet’s uncultured microbes is so essential, The Scientist spoke with study coauthor William Whitman, a microbiologist at the University of Georgia, who first proposed in 2016 that the ICSP allow genomes of bacteria and archaea that can’t (yet) be cultured to count as types and, when his proposal was voted down, began work on SeqCode instead.
The Scientist: How did your research as a microbiologist lead you to this interest in systematics and nomenclature?
William Whitman: Well, I’ve been an editor for Bergey’s Manual, which is an encyclopedia of bacteria, for about 15 years or so. And we often found organisms that were really interesting biologically, [for which] the nomenclature—the names—kept changing, or they weren’t formally named. It didn’t seem to make any sense. For instance, I do my research on archaea, and there’s a group of really interesting organisms that are symbionts that couldn’t be named because they couldn’t be deposited into culture collections. Some of the major groups of organisms that do these really important geochemical transformations couldn’t be named under the current rules. It just really screws things up.
TS: Could you talk a little bit more about the history of SeqCode and where this project started?
WW: So, [at] Bergey’s, we try to have a chapter on every type of bacteria or archaea. And I was really frustrated with the naming of fastidious organisms, [which] couldn’t be deposited in culture collections. And so, because of my interest in Bergey’s, I was also a member of the ICSP. I came up with the idea of using the genome as the type material for the fastidious organisms about the same time that metagenomic technology was developed, where you could get reliable genome sequences from uncultured organisms. So, the two goals sort of morphed together. Now it’s possible to get a genome just from isolated DNA.
TS: Can you walk me through a little bit more of what SeqCode offers to researchers working in microbiology that the ICNP, as it currently operates, cannot? Not to pit them against each other, but as an alternative set of standards.
WW: Well, the thing about the ICNP is there’s nothing fundamentally wrong with the basic concept. So, the SeqCode uses many of the same ideas. For instance, in biological nomenclature, you say, “What is the species?” and in fact, there is no definition of a species, or a genus, or a family, of any of the taxa. Nobody really knows what they are. So, the way you actually define them is [by saying] a species is a group of organisms that includes the type for the species. In the ICNP, the type has to be an organism, which is fine. But in the SeqCode, a type can be a genome.
The advantage to that is, first of all, you can have the genome of . . . a fastidious organism. The second advantage is that in the ICNP, if the type of the organism is lost from a culture collection, then you can’t rename the taxon. And this has turned out to be a real big problem, especially for fastidious organisms, because the technology for keeping the organisms in culture collections is not perfect. For instance, I worked with some people on the taxonomy of the sulfate-reducing bacteria, which are semifastidious anaerobic organisms that use sulfate the way people use oxygen, basically. The genomes were known, but the culture collections had lost some of the strains, so we couldn’t reclassify the organisms even though they were clearly in the wrong group. If you’re thinking [over] a period of 10 years, the ICNP makes sense because you can keep cultures alive for 10 years. But if you think in terms of centuries, it’s just not going to happen. There are probably millions of species; it’d be too expensive to have culture collections with a million species.
However, you could have a million genomes. In fact, we probably have that many already in databases. So, I think the ICNP was fine. The technology was developed in the 1970s, and it was fine for the 1970s. But it’s been 50 years; it’s time to grow up.
TS: And this dates back to when species were classified based on chemotaxonomy [the classification of organisms based on their chemical composition, such as the makeup of their cell walls and membrane lipids], right? Technology has progressed quite a bit.
WW: Right, exactly.
TS: I hadn’t realized that the organisms actually need to be maintained in culture. I thought it was just whether they could be cultured at all. Is that so they can be kept around for future reclassifications?
WW: Yeah, they have to be deposited in two international culture collections, and so on. And that means they have to remain viable. So, usually, they’re stored as frozen or dried samples. And to make it even more difficult, many countries prohibit the distribution of living organisms from their countries, like India and Brazil and South Africa, I think. And that actually means that about a quarter of the microbiologists on Earth can’t deposit [and therefore] can’t name new organisms from their countries, because they can’t distribute [the microbes] outside their country.
TS: Can you talk a little bit about how new submissions or entries to the SeqCode work?
WW: We’re creating the SeqCode registry. A lot of the basic bones have been made, but it’s not fully functional. We’re looking for funding to complete it. But the basic idea is if I do a study and I decide I want to name a new species or name any new group of organisms, I would do two things. One is I would get the genome sequence. And it could be an isolate or it could be a metagenome. Then I’d write a paper that I could submit to a journal. And what I’d do is deposit the sequence in a database like GenBank. When the paper’s accepted, I would go to the registry, enter the accession number for the sequence, and then enter the name I want to use. And then, when the effective publication is accepted, you enter the DOI. And that’s it.
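The registration order Whitman describes (genome accession first, then the proposed name, then the DOI of the effective publication) can be sketched as a small Python record. This is a minimal illustration under stated assumptions, not the SeqCode registry’s actual interface, which the interview notes was still incomplete: the class name `RegistryEntry`, its fields and methods, the accession string, the species name, and the placeholder DOI are all hypothetical. The only real-world rule encoded is that DOIs begin with the prefix “10.”.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegistryEntry:
    """Hypothetical record for one proposed name (not the real SeqCode schema)."""
    accession: str                       # accession of the deposited genome, e.g. in GenBank
    proposed_name: Optional[str] = None  # name entered after the accession
    doi: Optional[str] = None            # DOI of the effective publication

    def __post_init__(self) -> None:
        # Step 1 (assumed check): an entry starts from a deposited sequence.
        if not self.accession.strip():
            raise ValueError("a sequence accession is required")

    def register_name(self, name: str) -> None:
        # Step 2: enter the name to be used for the taxon.
        if not name.strip():
            raise ValueError("a proposed name is required")
        self.proposed_name = name.strip()

    def attach_publication(self, doi: str) -> None:
        # Step 3: add the DOI once the effective publication is accepted.
        # Assumption for this sketch: a name must already be registered.
        if self.proposed_name is None:
            raise ValueError("register a name before attaching a publication")
        if not doi.startswith("10."):
            raise ValueError("DOIs begin with the '10.' prefix")
        self.doi = doi

    @property
    def complete(self) -> bool:
        # An entry is complete once it has both a name and a publication.
        return self.proposed_name is not None and self.doi is not None


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    entry = RegistryEntry(accession="HYPOTHETICAL_0001")
    entry.register_name("Examplea hypothetica")
    entry.attach_publication("10.0000/placeholder")
    print(entry.complete)  # True
```

Enforcing the order in the record itself mirrors the sequence described in the interview; a real registry would add the automated and manually curated checks Whitman mentions, which this sketch does not attempt.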
This registry will have some checks built in and, ideally, we’ll have as many automated checks as possible. For the name, it’ll probably require manual curation, at least initially, although we’d like to automate that as well. So, the idea is people can just put their stuff in there and it’ll be in the registry. They’ll be able to search the registry, and it’ll be compatible, say, with the NCBI [National Center for Biotechnology Information], so that the names could immediately go into their database.
TS: I was going to ask about overlap with the NCBI database. How do you plan to handle any conflicts in nomenclature or names between the two?
WW: Any name in the ICNP automatically . . . is recognized by the SeqCode.
If there’s a conflict—so if I name a taxon in the SeqCode, and someone names the taxon in the ICNP—then the person who named it first would have priority and we would use the first name. Now, that works out well for species; when we get to the higher-level taxa, there are some differences in some of the rules. The SeqCode uses a different rule for claiming priority of, say, families and orders and the higher taxa. And that will be settled by negotiation. I think our rules are a little bit more sensible than the ICNP’s. And that’s why we changed it.
TS: I know this has been in the works for years now, but now that you’ve published in Nature [Microbiology], what has the response been from other researchers?
WW: You know, it just came out. I’ve gotten a few emails where people said they really liked it. But I’ve gotten maybe three emails like that. I don’t think people really notice it. . . . We’re kind of waiting for the shoe to drop.
TS: You mentioned that you’re now looking for funding. What else lies in the path ahead for you and getting the SeqCode registry off the ground?
WW: I think the most important thing is getting the registry off the ground and getting it working. I think we’ll need funding to get the registry going. We’ve explored a number of different sources. I think we’ll be able to get the money from one of those. In the long run, I think it really needs to be self-supported, where you might have a small fee to register names.
The ICNP gets some money from the publication of its journal. And the SeqCode won’t have a journal, so it doesn’t really have any other funding source. Historically, systematics is not a topic that gets a lot of respect from NSF [the National Science Foundation] or NIH [the National Institutes of Health], so you don’t have large sources of funding. Until very recently, even the funding for the ICNP was minimal. Until about 10 years ago, it didn’t have any funding, and was mostly a volunteer organization. I think, probably, in the long run, the same will be true for the SeqCode.
TS: I can imagine it’s been an uphill battle. When I wrote about phyla reclassifications earlier this year, there were researchers who study these organisms and had no idea that [name] changes were even in the works.
WW: That’s kind of bizarre because in the systematics community, it was widely known. And basically, it shows that people don’t really care. And there are reasons they don’t care: Because the biology itself is really fascinating, who cares what the name is? But the name is important. I mean, I believe the name is important. I really care about it, but I care about it because I really think the organisms are just fascinating.
I’ve been thinking about it, and I think this question of “Why is a name important?” is really fundamental. And I think it isn’t as appreciated as it should be. . . . First of all, you can’t create a database if you don’t have a stable nomenclature. And the second thing is that the types are fundamental to defining the name.
The only way you really know what the name [of a new organism] means is if your organism includes a type or is similar to a type. So like E. coli, you know, what the heck is E. coli? Nobody knows. But if you have an organism that’s similar to the type for E. coli, by whatever criteria you use, then your organism is E. coli. And that’s a key scientific point. So, [defining types is] not a trivial matter.
TS: To circle back to how SeqCode started as this alternative to your proposal to include uncultivated bacteria in the ICNP, is there any chance this leads to a single integrated code?
WW: I think our plan is that it will. At least our intention is that it will lead to a single integrated code. We originally went to the ICSP to make the change, and the members of the ICSP rejected it. We want it to be one code. In fact, [SeqCode] is much less useful if it’s only for uncultured organisms. Its best use is for everything.
We want to integrate, but that will depend upon the willingness of the people in the ICSP to allow it to happen.
TS: And that certainly won’t be tomorrow.
WW: Right. But it might not be too long. You know, some of the people there have indicated that they thought it was interesting. And I think a third voted for it, when we first approached it and tried to get the genomes as types. So, two-thirds voted against, but some people voted for it.
Editor’s note: This interview has been edited for brevity.