David Nelson, PhD
Next-generation sequencing is rapidly expanding the number of P450 sequences available from genomes and transcriptomes. This is good in that it uncovers the true biodiversity of the gene superfamily, but it also raises the challenge of how to process this much information. About 400 fungal genomes, 144 animal genomes, and 64 plant genomes have now been sequenced, along with more than 1,100 plant transcriptomes. More than 200,000 P450s have already been uncovered in these datasets. The problem is that over the past ~27 years only 26,000 P450 sequences have been annotated and assigned names. Databases like Ensembl tag obvious orthologs with the equivalent name from a reference genome such as human, but many gene clusters cannot be named that way and require human curation. I am currently working on the genome of the marmoset (Callithrix jacchus), a New World monkey, to name all of its P450s, both orthologs and non-orthologs. In Nov. 2014 I will travel to Hinxton Hall, UK, to meet with my collaborators in vertebrate P450 nomenclature, assess the current status, and plan the next year of a four-year NIH grant to name complex vertebrate gene families such as P450s and olfactory receptors. For that visit I am upgrading my database skills so that I can add manually annotated P450s as a track in the Ensembl genome browsers. This will greatly aid in sharing this information and in editing the gene models for accuracy. The immediate goal is to move on from human and marmoset to annotate 12 more primates (and, in my spare time, to annotate 172,000 plant P450s).