The soybean cyst nematode (Heterodera glycines) is a sedentary plant parasite that causes more than a billion dollars in yield losses annually. It has spread across the soybean-producing world, emerging as the primary pathogen of soybeans. This problem is exacerbated by H. glycines populations overcoming the limited sources of natural resistance in soybean and by the lack of effective and safe alternative treatments. Although there are genetic determinants that render soybean plants resistant to certain nematode genotypes, resistant soybean cultivars are increasingly ineffective because their multi-year usage has selected for virulent H. glycines populations. Successful H. glycines infection relies on the comprehensive re-engineering of soybean root cells into a syncytium, as well as the long-term suppression of host defenses to ensure syncytial viability. At the forefront of these complex molecular interactions are effectors, the proteins secreted by H. glycines into host root tissues. Understanding the mechanisms that control genomic effector acquisition, diversification, and selection is needed for the development of novel control strategies. As a foundation for this understanding, we developed a nine-scaffold, 158 Mb pseudomolecule assembly of the H. glycines genome using PacBio, Chicago, and Hi-C sequencing. An annotation of 22,465 genes was predicted using a Mikado pipeline informed by published short- and long-read expression data. Here we present results from our assembly and annotation of the H. glycines genome.
The mitochondrial gene cytochrome-c-oxidase subunit 1 (COI) is useful in many taxa for phylogenetics, population genetics, metabarcoding, and rapid species identifications. However, the phylum Ctenophora (comb jellies) has historically been difficult to study due to divergent mitochondrial sequences and the corresponding inability to amplify COI with degenerate and standard COI ‘barcoding’ primers. As a result, there are very few COI sequences available for ctenophores, despite over 200 described species in the phylum. Here, we designed new primers and amplified the COI fragment from members of all major groups of ctenophores, including many undescribed species. Phylogenetic analyses of the resulting COI sequences revealed high diversity within many groups that was not evident from more conserved 18S rDNA sequences, in particular among the Lobata. The COI phylogenetic results also revealed unexpected community structure within the genus Bolinopsis, suggested new species within the genus Bathocyroe, and supported the ecological and morphological differences of some species such as Lampocteis cruentiventer and similar lobates (Lampocteis sp. ‘V’ stratified by depth, and ‘A’ differentiated by color). The newly described primers reported herein provide important tools to enable researchers to illuminate the diversity of ctenophores worldwide via quick molecular identifications, improve the ability to analyze environmental DNA by improving reference libraries and amplifications, and enable a new breadth of population genetic studies.
Many model organisms have obtained a prominent status due to an advantageous combination of their life-history characteristics, genetic properties and also practical considerations. In non-crop plants, Arabidopsis thaliana is the most renowned model and has been used as a study system to elucidate numerous biological processes at the molecular level. Once a complete genome sequence was available, research accelerated markedly and further established A. thaliana as the reference to stimulate studies in other species with different biology. Within the Brassicaceae family, the arctic-alpine perennial Arabis alpina has become a model complementary to A. thaliana to study life-history evolution and ecological genomics in harsh environments. In this review, we provide an overview of the properties that facilitated the rapid emergence of A. alpina as a plant model. We summarize the evolutionary history of A. alpina, including the diversification of its mating system, and discuss recent progress in the molecular dissection of developmental traits that are related to its perennial life history and environmental adaptation. We indicate open questions from which future research might be developed in other Brassicaceae species or more distantly related plant families.
Identifying local adaptation in bottlenecked species is essential for conservation management. Selection detection methods have an important role in species management plans, assessments of adaptive capacity, and looking for responses to climate change. Yet, the allele frequency changes exploited in selection detection methods are similar to those caused by the strong neutral genetic drift expected during a bottleneck. Consequently, it is often unclear what accuracy selection detection methods have across bottlenecked populations. In this study, simulations were used to explore if signals of selection could be confidently distinguished from genetic drift across 23 bottlenecked and reintroduced populations of Alpine ibex (Capra ibex). The meticulously recorded demographic history of the Alpine ibex was used to generate comprehensive simulated SNP data. The simulated SNPs were then used to benchmark the confidence we could place in outliers identified in empirical Alpine ibex SNP data. Within the simulated dataset, the false positive rates were high for all selection detection methods but fell substantially when two or more methods were combined. True positive rates were consistently low and became negligible with increased stringency. Despite finding many outlier loci in the empirical Alpine ibex SNPs, none could be distinguished from genetic drift-driven false positives. Unfortunately, the low true positive rate also prevents the exclusion of recent local adaptation within the Alpine ibex. The baselines and stringent approach outlined here should be applied to other bottlenecked species to ensure the risk of false positive, or negative, signals of selection are accounted for in conservation management plans.
Mapping the genes underlying ecologically-relevant traits in natural populations is fundamental to develop a molecular understanding of species adaptation. Current sequencing technologies enable the characterisation of a species' genetic diversity across the landscape or even over its whole range. The relevant capture of the genetic diversity across the landscape is critical for a successful genetic mapping of traits and there are no clear guidelines on how to achieve an optimal sampling and which sequencing strategy to implement. Here we determine through simulation, the sampling scheme that maximises the power to map the genetic basis of a complex trait in an outbreeding species across an idealised landscape and draw genomic predictions for the trait, comparing individual and pool sequencing strategies. Our results show that QTL detection power and prediction accuracy are higher when more populations over the landscape are sampled and this is more cost-effectively done with pool sequencing than with individual sequencing. Additionally, we recommend sampling populations from areas of high genetic diversity. As progress in sequencing enables the integration of trait-based functional ecology into landscape genomics studies, these findings will guide study designs allowing direct measures of genetic effects in natural populations across the environment.
DNA metabarcoding is an important tool for molecular ecology. However, its effectiveness hinges on the quality of reference sequence databases and classification parameters employed. Here we evaluate the performance of MiFish 12S taxonomic assignments using a case study of California Current Large Marine Ecosystem fishes to determine best practices for metabarcoding. Specifically, we use a taxonomy cross-validation by identity framework to compare classification performance between a global database comprised of all available sequences and a curated database that only includes sequences of fishes from the California Current Large Marine Ecosystem. We demonstrate that the curated, regional database provides higher assignment accuracy than the comprehensive global database. We also document a tradeoff between accuracy and misclassification across a range of taxonomic cutoff scores, highlighting the importance of parameter selection for taxonomic classification. Furthermore, we compared assignment accuracy with and without the inclusion of additionally generated reference sequences. To this end, we sequenced tissue from 605 species using the MiFish 12S primers, adding 253 species to GenBank’s existing 550 California Current Large Marine Ecosystem fish sequences. We then compared species and reads identified from seawater environmental DNA samples using global databases with and without our generated references, and the regional database. The addition of new references allowed for the identification of 16 native taxa and 17.0% of total reads from eDNA samples, including species with vast ecological and economic value. Together these results demonstrate the importance of comprehensive and curated reference databases for effective metabarcoding and the need for locus-specific validation efforts.
Current knowledge on environmental distribution and taxon richness of free-living bacteria is mainly based on cultivation-independent investigations employing 16S rRNA gene sequencing methods. Yet, 16S rRNA genes are evolutionarily rather conserved, resulting in limited taxonomic and ecological resolutions provided by this marker. We used a faster evolving protein-encoding marker to reveal ecological patterns hidden within a single OTU defined by >99% 16S rRNA sequence similarity. The studied taxon, subcluster PnecC of the genus Polynucleobacter, represents a ubiquitous group of planktonic freshwater bacteria with cosmopolitan distribution, which is very frequently detected by diversity surveys of freshwater systems. Based on genome taxonomy and a large set of genome sequences, a sequence similarity threshold for delineation of species-like taxa could be established. In total, 600 species-like taxa were detected in 99 freshwater habitats scattered across three regions representing a latitudinal range of 3400 km (42°N to 71°N) and a pH gradient of 4.2 to 8.6. Besides the unexpectedly high richness, the increased taxonomic resolution revealed structuring of Polynucleobacter communities by a couple of macroecological trends, which was previously only demonstrated for phylogenetically much broader groups of bacteria. An unexpected pattern was the almost complete compositional separation of Polynucleobacter communities of Ca2+-rich and Ca2+-poor habitats, which strongly resembled the vicariance of plant species on silicate and limestone soils. The presented new cultivation-independent approach opened a window to an incredible, previously unseen diversity, and enables investigations aiming at a deeper understanding of how environmental conditions shape bacterial communities and drive evolution of free-living bacteria.
Scale insects are hemimetabolous, showing “incomplete” metamorphosis and no true pupal stage. Ericerus pela, commonly known as the white wax scale insect (hereafter, WWS), is a wax-producing insect found in Asia and Europe. WWS displays dramatic sexual dimorphism, with notably different metamorphic fates in males and females. Males develop into winged adults, while females are neotenic: they maintain a nymph-like appearance, are flightless, and remain stationary. Here we report the de novo assembly of the WWS genome, with a size of 638.30 Mb (scaffold N50 of 69.68 Mb), by PacBio sequencing and Hi-C. From these data, we constructed a robust phylogenetic analysis of 24,923 gene families from 16 representative insect genomes, which indicates that holometabola evolved from incomplete metamorphosis insects in the Late Carboniferous, about 50 million years earlier than previously thought. To study the distinct development of males and females, we analyzed the methylome landscape in each sex. Surprisingly, WWS displayed high levels of methylation (4.42% for males) when compared to other insects. We observed differential methylation patterns for genes involved in steroid and sesquiterpenoid production as well as related fatty acid metabolism pathways. We show here that males and females exhibit distinct titer profiles for ecdysone, the principal insect steroid hormone, and juvenile hormone (a sesquiterpenoid), suggesting that these hormones are the primary drivers of sexually dimorphic features. Our results provide a comprehensive genomic and epigenomic resource for scale insects that provides new insights into the evolution of metamorphosis and sexual dimorphism in insects.
Managing endangered species in fragmented landscapes requires estimating dispersal rates between populations over contemporary timescales. Here we develop a new method for quantifying recent dispersal using genetic pedigree data for close and distant kin. Specifically, we describe an approach that infers missing shared ancestors between pairs of kin in habitat patches across a fragmented landscape. We then apply a stepping-stone model to assign unsampled individuals in the pedigree to probable locations based on minimizing the number of movements required to produce the observed locations in sampled kin pairs. Finally, we use all pairs of reconstructed parent-offspring sets to estimate dispersal rates between habitat patches under a Bayesian model. Our approach measures connectivity over the timescale represented by the small number of generations contained within the pedigree and so is appropriate for estimating the impacts of recent habitat changes due to human activity. We used our method to estimate recent movement between newly discovered populations of threatened Eastern Massasauga Rattlesnakes (Sistrurus catenatus) using data from 2996 RAD-based genetic loci. Our pedigree analyses found no evidence for contemporary connectivity between five genetic groups, but, as validation of our approach, showed high dispersal rates between sample sites within a single genetic cluster. We conclude that these five genetic clusters of Eastern Massasauga Rattlesnakes have small numbers of resident snakes and are demographically isolated conservation units. More broadly, our methodology can be widely applied to determine contemporary connectivity rates, independent of bias from shared genetic similarity due to ancestry that impacts other approaches.
The hyper-diverse order Coleoptera comprises a staggering ~25% of known species on Earth. Despite recent breakthroughs in next generation sequencing, there remains a limited representation of beetle diversity in assembled genomes. Most notably, the ground beetle family Carabidae, comprising more than 40,000 described species, has not been studied in a comparative genomics framework using whole genome data. Here we generate a high-quality genome assembly for Nebria riversi, to examine sources of novelty in the genome evolution of beetles, as well as genetic changes associated with specialization to high elevation alpine habitats. In particular, this genome resource provides a foundation for expanding comparative molecular research into mechanisms of insect cold adaptation. Comparison to other beetles shows a strong signature of genome compaction, with N. riversi possessing a relatively small genome (~147 Mb) compared to other beetles, with associated reductions in repeat element content and intron length. Small genome size is not, however, associated with fewer protein-coding genes, and an analysis of gene family diversity shows significant expansions of genes associated with cellular membranes and membrane transport, as well as protein phosphorylation and muscle filament structure. Finally, our genomic analyses show that these high elevation beetles have endosymbiotic Spiroplasma, with several metabolic pathways (e.g. propanoate biosynthesis) that might complement N. riversi, although its role as a beneficial symbiont or as a reproductive parasite remains equivocal.
We used long read sequencing data generated from Knightia excelsa R.Br., a nectar producing Proteaceae tree endemic to Aotearoa New Zealand, to explore how sequencing data type, volume and workflows can impact final assembly accuracy and chromosome construction. Establishing a high-quality genome for this species has specific cultural importance to Māori, the indigenous people, as well as commercial importance to honey producers in Aotearoa New Zealand. Assemblies were produced by five long read assemblers using data subsampled based on read lengths, two polishing strategies, and two Hi-C mapping methods. Our results from subsampling the data by read length showed that each assembler tested performed differently depending on the coverage and the read length of the data. Assemblies that used longer read lengths (>30 kb) and lower coverage were the most contiguous, kmer and gene complete. The final genome assembly was constructed into pseudo-chromosomes using all available data assembled with FLYE, polished using Racon/Medaka/Pilon combined, scaffolded using SALSA2 and AllHiC, curated using Juicebox, and validated by synteny with Macadamia. We highlighted the importance of developing assembly workflows based on the volume and type of sequencing data and establishing a set of robust quality metrics for generating high quality assemblies. Scaffolding analyses highlighted that problems found in the initial assemblies could not be resolved accurately by utilizing Hi-C data and that scaffolded assemblies were more accurate when the underlying contig assembly was of higher accuracy. These findings provide insight into what is required for future high-quality de novo assemblies of non-model organisms.
The bean bug (Riptortus pedestris) causes great economic losses of soybeans by piercing and sucking pods and seeds. Although R. pedestris has become the focus of numerous studies associated with insect–microbe interactions, plant–insect interactions, and pesticide resistance, a lack of genomic resources has limited deeper insights. In this study, we report the first R. pedestris genome at the chromosomal level using PacBio, Illumina, and Hi-C technologies. The assembled genome was 1.193 Gb in size with a contig N50 of 13.97 Mb. More than 95.7% of the total genome bases were successfully anchored to 6 unique chromosomes, with the scaffold N50 reaching 181.34 Mb. Genome resequencing of male and female individuals and chromosomic staining demonstrated that the sex chromosome system of R. pedestris is XO, and the shortest chromosome is the X chromosome. In total, 21,562 protein-coding genes were predicted, 21,320 of which were validated as being expressed in different tissues or different developmental stages. Evolutionary analysis demonstrated that R. pedestris and Oncopeltus fasciatus formed a sister group and split ∼35 million years ago. Additionally, a 5.04 Mb complete genome of symbiotic Serratia marcescens Rip1 was assembled, and the virulence factors that account for successful colonization in the host midgut were identified. The high-quality R. pedestris genome provides a valuable resource for further research, as well as for the pest management of bug pests.
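The contig and scaffold N50 values quoted in assembly abstracts such as the one above follow a simple definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch of that calculation is given below; the function name `n50` and the toy lengths are illustrative assumptions, not data from any of the assemblies described here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembly length.
    """
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("cannot compute N50 of an empty collection")


# Hypothetical toy assembly (arbitrary units), for illustration only.
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_lengths)  # 6 + 5 = 11 covers half of 20, so N50 is 5
```

Because N50 depends only on the length distribution, it says nothing about correctness; a misjoined assembly can have a high N50, which is why it is usually reported alongside completeness metrics.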
Microbiome composition data collected through amplicon sequencing are count data on taxa in which the total count per sample (the library size) is an artifact of the sequencing platform and as a result such data are compositional. To avoid library size dependency, one common way of analyzing multivariate compositional data is to perform a principal component analysis (PCA) on data transformed with the centered log-ratio, hereafter called a log-ratio PCA. Two aspects typical of amplicon sequencing data are the large differences in library size and the large number of zeroes. In this paper we show on real data and by simulation that, applied to data that combines these two aspects, log-ratio PCA is nevertheless heavily dependent on the library size. This leads to a reduction in power when testing against any explanatory variable in log-ratio redundancy analysis. If there is additionally a correlation between the library size and the explanatory variable, then the type 1 error becomes inflated. We explore putative solutions to this problem.
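The library-size dependence described above can be illustrated with the centered log-ratio (clr) transform alone. The sketch below assumes zeros are handled by adding a pseudocount of 1 before taking logs; that is one common convention and an assumption here, not necessarily the treatment used in the paper. The counts are invented toy values.

```python
import math


def clr(counts, pseudocount=0.0):
    """Centered log-ratio: log of each part minus the mean log over all parts."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def max_abs_diff(a, b):
    """Largest absolute coordinate-wise difference between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


# Same composition at two sequencing depths, no zeros (toy values):
# the clr coordinates are identical up to rounding, so depth drops out.
no_zero_shift = max_abs_diff(clr([90, 10, 20]), clr([900, 100, 200]))

# Same composition with one unobserved taxon, pseudocount of 1 (assumed):
# the zero stays at log(1) = 0 while the other parts scale with depth,
# so the clr coordinates now differ between the two library sizes.
zero_shift = max_abs_diff(clr([90, 10, 0], 1.0), clr([900, 100, 0], 1.0))
```

In this toy case `no_zero_shift` is numerically zero while `zero_shift` is well above one log unit, which is the kind of depth-driven displacement that can then surface in the leading principal components.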
Sea Lettuce (Ulva spp.; Ulvophyceae, Ulvales, Ulvaceae) is an important ecological and economical entity, with a worldwide distribution and is a well-known source of near-shore blooms blighting many coastlines. Species of Ulva are frequently misidentified in public repositories, including herbaria and gene banks, making species identification based on traditional barcoding hazardous. We investigated the species distribution of 295 individual distromatic foliose strains from the North East Atlantic by traditional barcoding or next generation sequencing. We found seven distinct species, and compared our results with all worldwide Ulva spp sequences present in the NCBI database for the three barcodes rbcL, tufA and the ITS1. Our results demonstrate a large degree of species misidentification in the NCBI database. We estimate that 21% of the entries pertaining to foliose species are misannotated. In the extreme case of U. lactuca, 65% of the entries are erroneously labelled specimens of another Ulva species, typically U. fenestrata. In addition, 30% of U. rigida entries are misannotated, U. rigida being relatively rare and often misannotated U. laetevirens. Furthermore, U. armoricana and U. scandinavica present as being synonymous to U. laetevirens. An analysis of the global distribution of registered samples from foliose species also indicates possible geographical isolation for some species, and the absence of U. lactuca from Northern Europe. Altogether, exhaustive taxonomic clarification by aggregation of a library of barcode sequences highlights misannotations, and delivers an improved representation of Ulva species diversity and distribution. This approach could be easily adapted to other taxa.
Interactions of organisms with their environment are complex and environmental regulation at different levels of biological organization is often non-linear. Therefore, the genotype to phenotype continuum requires study at multiple levels of organization. While studies of transcriptome regulation are now common for many species, quantitative studies of environmental effects on proteomes are needed. Here we report the generation of a data-independent acquisition (DIA) assay library that enables simultaneous targeted proteomics of thousands of Oreochromis niloticus kidney proteins using a label- and gel-free workflow that is well suited for ecologically relevant field samples. We demonstrate the usefulness of this DIA assay library by discerning environmental effects on the kidney proteome of O. niloticus. Moreover, we demonstrate that the DIA assay library approach generates data that are complementary rather than redundant to transcriptomics data. Transcript and protein abundance differences in kidneys of tilapia acclimated to freshwater and brackish water (25 g/kg) were correlated for 2114 unique genes. A high degree of non-linearity in salinity-dependent regulation of transcriptomes and proteomes was revealed, suggesting that the regulation of O. niloticus renal function by environmental salinity relies heavily on post-transcriptional mechanisms. The application of functional enrichment analyses using STRING and KEGG to DIA assay datasets is demonstrated by identifying myo-inositol metabolism, antioxidant and xenobiotic functions, and signaling mechanisms as key elements controlled by salinity in tilapia kidneys. The DIA assay library resource presented here can be adopted for other tissues and other organisms to study proteome dynamics during changing ecological contexts.
The diploid Poropuntius huangchuchieni in the cyprinid family, which is widely distributed in the Mekong and Red River basins, is one of the most closely related diploid progenitor-like species of allotetraploid common carp, which was generated by the merging of two diploid genomes during evolution. Therefore, the P. huangchuchieni genome is essential for polyploidy evolution studies in Cyprinidae. Here, we report a high-quality chromosome-level genome assembly of P. huangchuchieni by integrating Oxford Nanopore and Hi-C technology. The assembled genome size was 1021.38 Mb, 895.66 Mb of which was anchored onto 25 chromosomes with an N50 of 32.93 Mb. The genome contained 486.28 Mb repetitive elements and 24,099 protein-coding genes. Approximately 95.9% of the complete BUSCOs were detected, suggesting a high completeness of the genome. Evolutionary analysis revealed that P. huangchuchieni diverged from Cyprinus carpio at approximately 12 Mya. Genome comparison between P. huangchuchieni and the B subgenome of C. carpio provided insights into chromosomal rearrangements during the allotetraploid speciation. With the complete gene set, 17,474 orthologous genes were identified between P. huangchuchieni and C. carpio, providing a broad view of the gene component in the allotetraploid genome, which is critical for future genetic analyses. The high-quality genomic dataset created for P. huangchuchieni provides a diploid progenitor-like reference for the evolution and adaptation of allotetraploid carps.
Fungi form diverse communities and play essential roles in many terrestrial ecosystems, yet there are methodological challenges in taxonomic and phylogenetic placement of fungi from environmental sequences. To address such challenges we investigated spatio-temporal structure of a fungal community using soil metabarcoding with four different sequencing strategies: short amplicon sequencing of the ITS2 region (300–400 bp) with Illumina MiSeq, Ion Torrent Ion S5, and PacBio RS II, all from the same PCR library, as well as long amplicon sequencing of the full ITS and partial LSU regions (1200–1600 bp) with PacBio RS II. Resulting community structure and diversity depended more on statistical method than sequencing technology. The use of long-amplicon sequencing enables construction of a phylogenetic tree from metabarcoding reads, which facilitates taxonomic identification of sequences. However, long reads present issues for denoising algorithms in diverse communities. We present a solution that splits the reads into shorter homologous regions prior to denoising, and then reconstructs the full denoised reads. In the choice between short and long amplicons, we suggest a hybrid approach using short amplicons for sampling breadth and depth, and long amplicons to characterize the local species pool for improved identification and phylogenetic analyses.
Characterization of microbial assemblages via environmental DNA metabarcoding is increasingly being used in routine monitoring programs due to its sensitivity and cost-effectiveness. Several programs have been developed recently which infer functional profiles from 16S rRNA gene data using hidden-state prediction (HSP) algorithms. These might offer an economic and scalable alternative to shotgun metagenomics. To date, HSP-based methods have seen limited use for benthic marine surveys and their performance in these environments remains unevaluated. In this study, 16S rRNA metabarcoding was applied to sediment samples collected at 0 and ≥ 1200 m from Norwegian salmon farms, and three metabolic inference approaches (PAPRICA, PICRUSt2 and TAX4FUN2) evaluated against metagenomics and environmental data. While metabarcoding and metagenomics recovered a comparable functional diversity, the taxonomic composition differed between approaches, with genera richness up to 20× higher for metabarcoding. Comparisons between the sensitivity (highest true positive rates) and specificity (lowest true negative rates) of HSP-based programs in detecting functions found in metagenomics data ranged, respectively, from 0.52 and 0.60 to 0.76 and 0.79. However, little correlation was observed between the relative abundance of their specific functions. Functional beta-diversity of HSP-based data was strongly associated with that of metagenomics (r ≥ 0.86 for PAPRICA and TAX4FUN2) and responded similarly to the impact of fish farm activities. Our results demonstrate that although HSP-based metabarcoding approaches provide a slightly different functional profile than metagenomics, partly due to recovering a distinct community, they represent a cost-effective and valuable tool for characterizing and assessing the effects of fish farming on benthic ecosystems.
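For readers unfamiliar with how sensitivity and specificity are obtained when predicted functions are scored against a reference, the sketch below uses the standard definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). It treats the metagenomic profile as the reference set and a fixed catalogue of candidate functions as the universe; both choices, the function name, and the toy function labels are assumptions for illustration and do not reproduce the study's actual procedure or data.

```python
def sensitivity_specificity(predicted, reference, universe):
    """Score a predicted set of functions against a reference set.

    predicted: functions inferred by a prediction tool
    reference: functions observed in the reference data
    universe:  every function that could have been predicted
    Returns (sensitivity, specificity) using the standard definitions.
    """
    predicted = set(predicted) & set(universe)
    reference = set(reference) & set(universe)
    tp = len(predicted & reference)                      # predicted and present
    fn = len(reference - predicted)                      # present but missed
    fp = len(predicted - reference)                      # predicted but absent
    tn = len(set(universe) - predicted - reference)      # correctly not predicted
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative function")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy catalogue of ten functions, labelled f1..f10.
universe = {f"f{i}" for i in range(1, 11)}
reference = {"f1", "f2", "f3", "f4"}
predicted = {"f1", "f2", "f3", "f5", "f6"}
sens, spec = sensitivity_specificity(predicted, reference, universe)
# Here TP = 3, FN = 1, FP = 2, TN = 4.
```

Both rates describe presence or absence only, so a tool can score well on them while still estimating relative abundances poorly.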