Populus has a wide ecogeographical range spanning the Northern Hemisphere, and exhibits abundant distinct species and hybrids globally. Populus tomentosa Carr. is widely distributed and cultivated in the eastern region of Asia, where it plays multiple important roles in forestry, agriculture, conservation, and urban horticulture. Reference genomes are available for several Populus species; however, our goal was to produce a very high quality de novo, chromosome-level assembly of the P. tomentosa genome that could serve as a reference for evolutionary and ecological studies of hybrid speciation. Here, combining long-read sequencing and Hi-C scaffolding, we present a high-quality, haplotype-resolved genome assembly. The genome size was 740.2 Mb, with a contig N50 size of 5.47 Mb and a scaffold N50 size of 46.68 Mb, consisting of 38 chromosomes, as expected from the known diploid chromosome number (2n=2x=38). A total of 59,124 protein-coding genes were identified. Phylogenomic analyses revealed that P. tomentosa comprises two distinct subgenomes, which we demonstrate are likely to have resulted from hybridization between Populus adenopoda as the female parent and Populus alba var. pyramidalis as the male parent, approximately 3.93 Mya. Although the two subgenomes are highly colinear, significant structural variation was also found between them. Our study provides a valuable resource for ecological genetics and forest biotechnology.
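The contig and scaffold N50 values reported for the P. tomentosa assembly follow the usual definition: the length L such that sequences of length L or longer together cover at least half of the total assembly. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the toy lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half the total length.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the assembly
            return length


# Toy example (hypothetical lengths, not data from the study):
# total = 30, half = 15; 10 + 8 = 18 >= 15, so the N50 is 8.
toy_lengths = [10, 8, 6, 4, 2]
toy_n50 = n50(toy_lengths)
```

Because N50 depends only on the sorted length distribution, the same function applies to contigs and scaffolds alike; joining contigs into chromosome-scale scaffolds is what raises the value from the megabase range to tens of megabases.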
Microbial diversity and community function are related, and can be highly specialized in different gut regions. The cloacal microbiome of Sceloporus virgatus provides antifungal protection to eggshells during oviposition – a specialized function that suggests a specialized microbial composition. Here, we describe the S. virgatus cloacal microbiome from tissue and swab samples, and compare it to tissue samples from the gastrointestinal (GI) tract and oviduct, adding to the growing body of evidence of microbiome localization in reptiles. We further assessed whether common methods of microbial sampling – cloacal swabs and feces – provide accurate representations of these microbial communities and whether feces might “seed” the cloacal microbiome or impact the accuracy of cloacal swab sampling. We found that different regions of the gut had unique microbial community structures. The cloacal community, in particular, showed extreme specialization, averaging 99% Proteobacteria (Phylum) and 83% Enterobacteriaceae (Family). Cloacal swabs recovered communities similar to those of lower intestine and cloacal tissues, but fecal samples had much higher diversity and a distinct composition (62% Firmicutes and 39% Lachnospiraceae) relative to all gut regions. Finally, we found that feces and cloacal swabs recover different communities, but cloacal swabs may be contaminated with fecal matter if taken immediately after defecation. These results caution against the assumption that fecal samples provide an accurate representation of the gut, and show that although cloacal swabs can reflect a portion of the lower GI tract microbiome, they may also recover a mixed community of gut and fecal microbes.
To associate specimens identified by molecular characters to other biological knowledge, we need reference sequences annotated by Linnaean taxonomy. In this paper, we 1) report the creation of a comprehensive reference library of DNA barcodes for the arthropods of an entire country (Finland), 2) publish this library, and 3) deliver a new identification tool based on this resource. The reference library contains mtDNA COI barcodes for 11,275 (43%) of 26,437 arthropod species known from Finland, including 10,811 (45%) of 23,956 insect species. To quantify the improvement in identification accuracy enabled by the current reference library, we ran 1,000 Finnish insect and spider species through the Barcode of Life Data system (BOLD) identification engine. Of these, 91% were correctly assigned to a unique species when compared to the new reference library alone, 85% were correctly identified when compared to BOLD with the new material included, and 75% with the new material excluded. To capitalize on this resource, we used the new reference material to train a probabilistic taxonomic assignment tool, FinPROTAX, scoring high success. For the full-length barcode region, the accuracy of taxonomic assignments at the level of classes, orders, families, subfamilies, tribes, genera, and species reached 99.9%, 99.9%, 99.8%, 99.7%, 99.4%, 96.8%, and 88.5%, respectively. The FinBOL arthropod reference library and FinPROTAX are available through the Finnish Biodiversity Information Facility (www.laji.fi). Overall, the FinBOL investment represents a massive capacity-transfer from the taxonomic community of Finland to all sectors of society.
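The coverage percentages quoted for the FinBOL reference library can be reproduced directly from the species counts given in the abstract. The short check below is only an arithmetic sketch; the helper name `percent_covered` is an assumption introduced for illustration.

```python
def percent_covered(barcoded, known):
    """Percentage of known species that have a reference barcode."""
    if known <= 0:
        raise ValueError("number of known species must be positive")
    return 100.0 * barcoded / known


# Counts taken from the abstract above.
arthropods = percent_covered(11_275, 26_437)  # all arthropods
insects = percent_covered(10_811, 23_956)     # insects only

print(f"arthropods: {arthropods:.1f}%")  # rounds to the reported 43%
print(f"insects:    {insects:.1f}%")     # rounds to the reported 45%
```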
Ark shells are commercially important clam species that inhabit muddy sediments of shallow coasts in East Asia. For a long time, the lack of genome resources has hindered scientific research on ark shells. Here, we report a high-quality chromosome-level genome assembly of Scapharca kagoshimensis, with the aim of unravelling the molecular basis of heme biosynthesis and developing genomic resources for genetic breeding and population genetics in ark shells. Nineteen scaffolds corresponding to 19 chromosomes were constructed from 938 contigs (contig N50=2.01 Mb) to produce a final high-quality assembly with a total length of 1.11 Gb and a scaffold N50 of 60.64 Mb. The genome assembly shows 93.4% completeness when matched against 303 core conserved eukaryotic genes. A total of 24,908 protein-coding genes were predicted, of which 24,551 (98.56%) were functionally annotated. Enrichment analyses suggested that genes in heme biosynthesis pathways were expanded, and positive selection on the hemoglobin genes was also found in the genome of S. kagoshimensis, which gives important insights into the molecular mechanisms and evolution of heme biosynthesis in Mollusca. This genome assembly provides a solid foundation for investigating the molecular mechanisms that underlie the diverse biological functions and evolutionary adaptations of S. kagoshimensis.
Metabarcoding of DNA extracted from community samples of whole organisms (whole organism community DNA, wocDNA) is increasingly being applied to terrestrial, marine and freshwater metazoan communities to provide rapid, accurate and high resolution data for novel molecular ecology research. The growth of this field has been accompanied by considerable development that builds on microbial metabarcoding methods to develop appropriate and efficient sampling and laboratory protocols for whole organism metazoan communities. However, considerably less attention has focused on ensuring bioinformatic methods are adapted and applied comprehensively in wocDNA metabarcoding. In this study we examined over 600 papers and identified 111 studies that performed COI metabarcoding of wocDNA. We then systematically reviewed the bioinformatic methods employed by these papers to identify the state-of-the-art. Our results show that the increasing use of wocDNA COI metabarcoding for metazoan diversity is characterised by a clear absence of bioinformatic harmonisation, and the temporal trends show little change in this situation. The reviewed literature showed (i) high heterogeneity across pipelines, tasks and tools used, (ii) limited or no adaptation of bioinformatic procedures to the nature of the COI fragment, and (iii) a worrying underreporting of tasks, software and parameters. Based upon these findings we propose a set of recommendations that we think the wocDNA metabarcoding community should consider to ensure that bioinformatic methods are appropriate, comprehensive and comparable. We believe that adhering to these recommendations will improve the long-term integrative potential of wocDNA COI metabarcoding for biodiversity science.
Metabarcoding of DNA extracted from environmental or bulk specimen samples is increasingly used to detect plant and animal taxa in basic and applied biodiversity research because of its targeted nature that allows sequencing of genetic markers from many samples in parallel. To achieve this, PCR amplification is carried out with primers designed to target a taxonomically informative marker within a taxonomic group, and sample-specific nucleotide identifiers are added to the amplicons prior to sequencing. This enables assignment of the sequences back to the samples they originated from. Nucleotide identifiers can be added during the metabarcoding PCR and/or during ‘library preparation’, i.e. when amplicons are prepared for sequencing. Different strategies to achieve this labelling exist. All have advantages, challenges and limitations, some of which can lead to misleading results, and in the worst case compromise the fidelity of the metabarcoding data. Given the range of questions addressed using metabarcoding, the importance of ensuring that data generation is robust and fit for purpose should be at the forefront of practitioners seeking to employ metabarcoding for biodiversity assessments. Here, we present an overview of the three main workflows for sample-specific labelling and library preparation in metabarcoding studies on Illumina sequencing platforms. Further, we distil the key considerations for researchers seeking to select an appropriate metabarcoding strategy for their specific study. Ultimately, by gaining insights into the consequences of different metabarcoding workflows, we hope to further consolidate the power of metabarcoding as a tool to assess biodiversity across a range of applications.
We present the chromosome-level genome assembly of Dysdera silvatica Schmidt, 1981, a nocturnal ground-dwelling spider endemic to the Canary Islands. The genus Dysdera has undergone a remarkable diversification in this archipelago, mostly associated with shifts in the level of trophic specialization, becoming an excellent model to study the genomic drivers of adaptive radiations. The new assembly (1.37 Gb; scaffold N50 of 174.2 Mb), produced using the chromosome conformation capture scaffolding technique, represents a continuity improvement of more than 4,500 times with respect to the previous version. The seven largest scaffolds or pseudochromosomes cover 87% of the total assembly size and match consistently with the seven chromosomes of the karyotype of this species, including the characteristic large X chromosome. To illustrate the value of this new resource we performed a comprehensive analysis of the two major arthropod chemoreceptor gene families (i.e., gustatory and ionotropic receptors). We identified 545 chemoreceptor sequences distributed across all pseudochromosomes, with a notable underrepresentation in the X chromosome. At least 54% of them localize in 83 genomic clusters, with significantly lower evolutionary distances between them than the average of the family, suggesting a recent origin of many of them. This chromosome-level assembly is the first high-quality genome representative of the Synspermiata clade, and just the third among spiders, representing a valuable new resource to gain insights into the structure and organization of chelicerate genomes, including the role that structural variants, repetitive elements and large gene families played in the extraordinary biology of spiders.
Arctium lappa has a long history of medicinal and edible use and is of great economic importance. We combined Illumina and PacBio sequences to generate the first high-quality chromosome-level draft genome of A. lappa. The assembled genome is approximately 1.79 Gb with an N50 contig size of 6.88 Mb. Approximately 1.70 Gb (95.4%) of the contig sequences were anchored onto 18 chromosomes using Hi-C data; the scaffold N50 was improved to 91.64 Mb. Furthermore, we obtained 1.12 Gb (68.46%) of repetitive sequences and 32,771 protein-coding genes; 616 positively selected candidate genes were identified. Additionally, we compared the transcriptomes of A. lappa roots at three different developmental stages and identified 8,943 differentially expressed genes (DEGs) in these tissues. Among candidate genes related to lignan biosynthesis, the following were found to be highly correlated with the accumulation of arctiin: 4-coumarate-CoA ligase (4CL), dirigent protein (DIR), and hydroxycinnamoyl transferase (HCT). These data can be utilized to identify genes related to A. lappa quality or provide a basis for molecular identification and comparative genomics among related species.
Dispersal abilities play a crucial role in shaping the extent of population genetic structure, with more mobile species being panmictic over large geographic ranges and less mobile ones organized in meta-populations exchanging migrants to different degrees. In turn, population structure directly influences the coalescence pattern of the sampled lineages, but the consequences on the estimated variation of the effective population size (Ne) over time obtained by means of unstructured demographic models remain poorly understood. However, this knowledge is crucial for biologically interpreting the observed Ne trajectory and further devising conservation strategies in endangered species. Here we investigated the demographic history of four shark species (Carcharhinus melanopterus, Carcharhinus limbatus, Carcharhinus amblyrhynchos, Galeocerdo cuvier) that differ in endangered status and in life history traits related to dispersal, all distributed in the Indo-Pacific and sampled off New Caledonia. We compared several evolutionary scenarios representing both structured (meta-population) and unstructured models and then inferred the Ne variation through time. By performing extensive coalescent simulations, we provided a general framework relating the underlying population structure and the observed Ne dynamics. On this basis, we concluded that the recent decline observed in three out of the four considered species when assuming unstructured demographic models can be explained by the presence of population structure. Furthermore, we also demonstrated the limits of inferences based solely on the site frequency spectrum and warn that statistics based on linkage disequilibrium will be needed to exclude recent demographic events affecting meta-populations.
Metabarcoding of environmental DNA (eDNA) is now widely used to build diversity profiles from DNA that has been shed by species into the environment. There is substantial interest in the expansion of eDNA approaches for improved detection of terrestrial vertebrates using invertebrate-derived DNA (iDNA) in which hematophagous, sarcophagous, and coprophagous invertebrates sample vertebrate blood, carrion, or feces. Here, we use metabarcoding and multiple iDNA samplers (carrion flies, sandflies, and mosquitos) to profile gamma and alpha diversity in a dry, tropical forest in the southern Amazon. Our main objectives were to (1) compare diversity found with iDNA to camera trapping, which is the conventional method of vertebrate diversity surveillance and (2) compare each of the iDNA samplers to assess the effectiveness, efficiency, and potential biases associated with each sampler. Carrion flies were the most effective sampler, despite the least amount of sampling effort and the fewest number of individuals captured for metabarcoding, in describing vertebrate biodiversity followed by sandflies. Camera traps had the highest median species richness at the site-level but showed strong bias towards carnivore and ungulate species and missed much of the diversity described by iDNA methods. Mosquitos showed a strong feeding preference for humans as did sandflies for armadillos, thus presenting potential utility to further study related to host-vector interactions.
Biodiversity inventory remains limited in marine systems due to unbalanced access to the three ocean dimensions. The use of environmental DNA (eDNA) for metabarcoding allows fast and effective biodiversity inventory and is forecast as a future biodiversity research and biomonitoring tool. However, in poorly understood ecosystems, eDNA results remain difficult to interpret due to large gaps in reference databases and PCR bias limiting the detection of some major phyla. Here, we aimed to circumvent these limitations by avoiding PCR and recollecting larger DNA fragments to improve assignment of detected taxa through phylogenetic reconstruction. We applied capture by hybridization (CBH) to enrich DNA from deep-sea sediment samples and compared the results with those obtained through an up-to-date metabarcoding PCR-based approach (MTB). Originally developed for bacterial communities by targeting 16S rDNA, the CBH approach was applied to 18S rDNA to improve the detection of species forming benthic communities of eukaryotes, with particular focus on metazoans. The results confirmed the possibility of extending CBH to metazoans with two major advantages: i) CBH revealed a broader spectrum of prokaryotic, eukaryotic, and particularly metazoan diversity, and ii) CBH allowed much more robust phylogenetic reconstructions of full-length barcodes with up to 1900 base pairs. This is particularly important for taxa whose assignment is hampered by gaps in reference databases. This study provides a database and probes to apply 18S CBH to diverse marine systems, confirming this promising new tool to improve biodiversity assessments in data-poor ecosystems like those in the deep sea.
Pollinators are in decline thanks to the combined stresses of disease, pesticides, habitat loss, and climate. Honey bees face numerous pests and pathogens but arguably none are as devastating as Deformed wing virus (DWV). Understanding host-pathogen interactions and virulence of DWV in honey bees is slowed by the lack of cost-effective high-throughput screening methods for viral infection. Currently, analysis of virus infection in bees and their colonies is tedious, requiring a well-equipped molecular biology laboratory and the use of hazardous chemicals. Here we describe cDNA clones of DWV tagged with green fluorescent protein (GFP) or nanoluciferase (nLuc), providing high-throughput detection and quantification of virus infections. GFP fluorescence is recorded non-invasively in living bees via commonly available long-wave UV light sources and a smartphone camera or a standard ultraviolet transilluminator gel imaging system. Nonlethal monitoring with GFP allows high-throughput screening and serves as a direct breeding tool for identifying honey bee parents with increased antivirus resistance. Expression using the nLuc reporter strongly correlates with virus infection levels and is especially sensitive. Using multiple reporters, it is also possible to visualize competition, differential virulence, and host tissue targeting by co-occurring pathogens. Finally, it is possible to directly assess the risk of cross-species ‘spillover’ from honey bees to other pollinators and vice versa.
Metabarcoding is an important tool for understanding fungal communities. The internal transcribed spacer (ITS) rDNA is the accepted fungal barcode but has known problems. The large subunit (LSU) rDNA has also been used to investigate fungal communities but available LSU metabarcoding primers were mostly designed to target Dikarya (Ascomycota + Basidiomycota) with little attention to early diverging fungi (EDF). However, evidence from multiple studies suggests that EDF comprise a large portion of unknown diversity in community sampling. Here we investigate how DNA marker choice and methodological biases impact recovery of EDF from environmental samples. We focused on one EDF lineage, Zoopagomycota, as an example. We evaluated three primer sets (ITS1F/ITS2, LROR/LR3, and LR3 paired with new primer LR22F) to amplify and sequence a Zoopagomycota mock community and a set of 146 environmental samples with Illumina MiSeq. We compared two taxonomy assignment methods and created an LSU reference database compatible with AMPtk software. The two taxonomy assignment methods recovered strikingly different communities of fungi and EDF. Target fragment length variation exacerbated PCR amplification biases and influenced downstream taxonomic assignments, but this effect was greater for EDF than Dikarya. To improve identification of LSU amplicons we performed phylogenetic reconstruction and illustrate the advantages of this critical tool for investigating identified and unidentified sequences. Our results suggest much of the EDF community may be missed or misidentified with “standard” metabarcoding approaches and modified techniques are needed to understand the role of these taxa in a broader ecological context.
Phylogenetic trees have been extensively used in community ecology. However, how the phylogenetic reconstruction affects ecological inferences is poorly understood. In this study, we reconstructed three different types of phylogenetic trees (a synthetic-tree generated using VPhylomaker, a barcode-tree generated using rbcL+matK+trnH-psbA and a genome-tree generated from plastid genomes) that represented an increasing level of phylogenetic resolution among 580 woody plant species from six dynamic plots in subtropical evergreen broadleaved forests of China. We then evaluated the performance of each phylogeny in estimations of community phylogenetic structure, turnover and phylogenetic signal in functional traits. As expected, the genome-tree was most resolved and most supported for relationships among species. For local phylogenetic structure, the three trees showed consistent results with Faith’s PD and MPD; however, only the synthetic-tree produced significant clustering patterns using MNTD for some plots. For phylogenetic turnover, contrasting results between the molecular trees and the synthetic-tree occurred only with nearest neighbor distance. The barcode-tree agreed more with the genome-tree than the synthetic-tree for both phylogenetic structure and turnover. For functional traits, both the barcode-tree and genome-tree detected phylogenetic signal in maximum height, but only the genome-tree detected signal in leaf width. This is the first study that uses plastid genomes in large-scale community phylogenetics. Our results highlight the outperformance of genome-trees over barcode-trees and synthetic-trees for the analyses studied here. Our results also point to the possibility of Type I and II errors in estimation of phylogenetic structure and turnover and detection of phylogenetic signal when using synthetic-trees.
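Two of the community phylogenetic metrics named in the abstract above, mean pairwise distance (MPD) and mean nearest taxon distance (MNTD), are simple summaries of the pairwise phylogenetic distances among co-occurring species. The sketch below shows the unweighted (presence/absence) forms only; it assumes the distances have already been extracted from a tree, omits the null-model standardization normally used to test for clustering, and uses a made-up four-species distance table. The function names are illustrative, not those of any particular package.

```python
from itertools import combinations


def mpd(community, dist):
    """Mean pairwise phylogenetic distance among species in a community."""
    pairs = list(combinations(community, 2))
    if not pairs:
        raise ValueError("community needs at least two species")
    return sum(dist[frozenset(pair)] for pair in pairs) / len(pairs)


def mntd(community, dist):
    """Mean distance from each species to its closest co-occurring relative."""
    if len(community) < 2:
        raise ValueError("community needs at least two species")
    nearest = [
        min(dist[frozenset((sp, other))] for other in community if other != sp)
        for sp in community
    ]
    return sum(nearest) / len(nearest)


# Hypothetical pairwise distances (e.g. summed branch lengths) for a toy tree
# in which A and B are sister species and C and D are sister species.
toy_dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("A", "D")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("B", "D")): 6.0,
    frozenset(("C", "D")): 4.0,
}

plot = ["A", "B", "C"]
plot_mpd = mpd(plot, toy_dist)    # (2 + 6 + 6) / 3
plot_mntd = mntd(plot, toy_dist)  # (2 + 2 + 6) / 3
```

MNTD depends only on the shortest distances near the tips of the tree, which is one plausible reason it is more sensitive than MPD to the poorly resolved terminal relationships of a synthetic tree.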
A high-quality reference genome is necessary to determine the molecular mechanisms underlying important biological phenomena; therefore, in the present study, a chromosome-level genome assembly of the Chinese shrimp Fenneropenaeus chinensis was performed. Muscle tissue of a male shrimp was sequenced using the PacBio platform, and the genome was assembled with the aid of Hi-C technology. The assembled F. chinensis genome was 1,465.32 Mb with a contig N50 of 472.84 Kb, including 57.73% repetitive sequences, and was anchored to 43 pseudochromosomes, with a scaffold N50 of 36.87 Mb. In total, 25,026 protein-coding genes were predicted. The genome size of F. chinensis showed significant contraction in comparison with that of other penaeid species, which is likely related to the migration observed in this species. However, the F. chinensis genome included several expanded gene families related to cellular processes and metabolic processes, and the contracted gene families were associated with the virus infection process. The findings signify the adaptation of F. chinensis to the selection pressure of migration and cold environment. Furthermore, the selection signature analysis identified genes associated with metabolism, phototransduction, and the nervous system in cultured shrimps when compared with the wild population, indicating targeted, artificial selection on growth, vision, and behavior during domestication. The genome of F. chinensis provides valuable information for further analysis of the genetic mechanisms underlying important biological processes, and will facilitate research on genetic changes during evolution.
Here we present an annotated, chromosome-anchored genome assembly for Lake Trout (Salvelinus namaycush) – a highly diverse salmonid species of notable conservation concern and an excellent model for research on adaptation and speciation. We leveraged Pacific Biosciences long-read sequencing, paired-end Illumina sequencing, proximity ligation (Hi-C), and a previously published linkage map to produce a highly contiguous assembly composed of 7,378 contigs (contig N50 = 1.8 Mb) assigned to 4,120 scaffolds (scaffold N50 = 44.975 Mb). A total of 84.7% of the genome was assigned to 42 chromosome-sized scaffolds and 93.2% of Benchmarking Universal Single Copy Orthologs were recovered, putting this assembly on par with the best currently available salmonid genomes. Estimates of genome size based on k-mer frequency analysis were highly similar to the total size of the finished genome, suggesting that the entirety of the genome was recovered. A mitome assembly was also produced. Self-vs-self synteny analysis allowed us to identify homeologs resulting from the salmonid-specific autotetraploid event (Ss4R), and alignment with three other salmonid species allowed us to identify homologous chromosomes in other species. We also generated multiple resources useful for future genomic research on Lake Trout, including a repeat library and a sex-averaged recombination map. A novel RNA sequencing dataset was also used to produce a publicly available set of gene annotations using the National Center for Biotechnology Information Eukaryotic Genome Annotation Pipeline. Potential applications of these resources to population genetics and the conservation of native populations are discussed.
The promotion of responsible and sustainable trade in biological resources is widely proposed as one solution to mitigate currently high levels of global biodiversity loss. Various molecular identification methods have been proposed as appropriate tools for monitoring global supply chains of commercialized animals and plants. We demonstrate the efficacy of target capture genomic barcoding in identifying and establishing the geographic origin of samples traded as Anacyclus pyrethrum, a medicinal plant assessed as globally vulnerable in the IUCN Red List. Samples collected from national and international supply chains were identified through target capture sequencing of 443 low-copy nuclear markers and compared to results derived from genome skimming of plastome, standard plastid barcoding regions and ITS. Both target capture and genome skimming provided approximately 3.4 million reads per sample, but target capture largely outperformed standard plant DNA barcodes and entire plastid genome sequences. Despite the difficulty of distinguishing among closely related species and infraspecific taxa of Anacyclus using conventional taxonomic methods, we succeeded in identifying 89 of 110 analysed samples to subspecies level without ambiguity through target capture. Furthermore, we were able to discern the geographical origin of Anacyclus samples collected in Moroccan, Indian and Sri Lankan markets, differentiating between plant materials originally harvested from diverse populations in Algeria and Morocco. With a recent drop in the cost of analysing samples, target capture offers the potential to routinely identify commercialized plant species and determine their geographic origin. It promises to play an important role in monitoring and regulation of plant species in trade, supporting biodiversity conservation efforts, and in ensuring that plant products are unadulterated, contributing to consumer protection.
Until recently many historical museum specimens were largely inaccessible to genomic inquiry, but high-throughput sequencing (HTS) approaches have allowed researchers to successfully sequence genomic DNA from dried and fluid-preserved museum specimens. In addition to preserved specimens, many museums contain large series of allozyme supernatant samples, but the amenability of these samples to HTS has not yet been assessed. Here, we compared the performance of a target-capture approach using alternative sources of genomic DNA from ten specimens of spring salamanders (Plethodontidae: Gyrinophilus porphyriticus) collected 1985–1990: allozyme supernatants, allozyme homogenate pellets, and formalin-fixed tissues. We designed capture probes based on loci derived from double-digest restriction-site associated DNA sequencing (RADseq) of seven of the specimens and assessed the success and consistency of capture and RADseq technical replicates. This study design enabled direct comparisons of data quality and potential biases among the different datasets for phylogenomic and population genomic analyses. We found that in phylogenetic analyses, all replicates for a given specimen clustered together, but in principal component space, RADseq replicates did not cluster with corresponding capture-based replicates. SNP calls were on average 18.3% different between technical replicates, but these discrepancies were primarily due to differences in heterozygous/homozygous SNP calls. We demonstrate that both allozyme supernatant and formalin-fixed samples can be successfully used for population genomic analyses and we discuss ways to identify and reduce biases associated with combining capture and RADseq data.
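The replicate comparison described above (the share of SNP calls that differ between technical replicates, and how many of those differences are heterozygous versus homozygous disagreements) can be sketched as follows. The abstract does not give the exact definition used in the study, so this is an assumed one: genotypes are coded as alternate-allele counts (0, 1, 2), `None` marks a missing call, and only sites called in both replicates are compared. All names and the toy genotypes are hypothetical.

```python
def replicate_discordance(calls_a, calls_b):
    """Compare two genotype vectors from technical replicates.

    Returns (discordance, het_hom_share):
      discordance   - fraction of jointly called sites with different calls
      het_hom_share - fraction of those mismatches in which exactly one
                      replicate is heterozygous (coded 1)
    """
    # Keep only sites genotyped in both replicates.
    shared = [
        (a, b) for a, b in zip(calls_a, calls_b)
        if a is not None and b is not None
    ]
    if not shared:
        raise ValueError("no sites called in both replicates")
    mismatches = [(a, b) for a, b in shared if a != b]
    discordance = len(mismatches) / len(shared)
    het_hom = sum((a == 1) != (b == 1) for a, b in mismatches)
    het_hom_share = het_hom / len(mismatches) if mismatches else 0.0
    return discordance, het_hom_share


# Toy genotypes (not data from the study): one het/hom mismatch at the
# second site, one site missing in replicate A.
rep_a = [0, 1, 2, 1, None, 0]
rep_b = [0, 2, 2, 1, 1, 0]
disc, het_share = replicate_discordance(rep_a, rep_b)
```

A high `het_hom_share` would point to allelic dropout or uneven coverage between library types rather than to wholesale genotyping conflict, which matches the interpretation given in the abstract.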