Research Article
Corresponding author: Christopher J. Glasby ( chris.glasby@nt.gov.au ) Corresponding author: Jiang‐Shiou Hwang ( jshwang@mail.ntou.edu.tw ) Corresponding author: Lizhe Cai ( cailizhe@xmu.edu.cn ) Academic editor: Greg Rouse
© 2024 Deyuan Yang, Sheng Zeng, Zhi Wang, Yanjie Zhang, Dazuo Yang, Christopher J. Glasby, Jiang‐Shiou Hwang, Lizhe Cai.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Yang D, Zeng S, Wang Z, Zhang Y, Yang D, Glasby CJ, Hwang J‐S, Cai L (2024) Molecular systematics of Perinereis and an investigation of the status and relationships of the cultured species Perinereis wilsoni Glasby & Hsieh, 2006 (Annelida, Nereididae). Zoosystematics and Evolution 100(4): 1297-1314. https://doi.org/10.3897/zse.100.127201
In this study, we conducted morphological and molecular analyses of Perinereis wilsoni, a species being considered for aquaculture in China. We found this species difficult to identify because of its close morphological similarity to the sympatric P. mictodonta and thus sought genetic markers to more easily distinguish it and to investigate its phylogenetic relationship to P. mictodonta and other nereidids. For the first time, we sequenced, assembled, and annotated the complete mitochondrial genome, nuclear ribosomal sequences (18S-ITS1-5.8S-ITS2-28S), and four nuclear histone genes (H3-H2A-H2B-H4) of P. wilsoni. Comprehensive bioinformatics methods were employed to assemble the genome-skimming data of P. wilsoni to ensure assembly quality. Phylogenetic analyses based on five datasets of the available mitochondrial genomes (32 taxa in Nereididae, including 8 taxa in Perinereis), using maximum likelihood and Bayesian analyses, provide support for the monophyly of the genus Perinereis. In contrast, the P. nuntia species group, a subgroup within Perinereis, is nonmonophyletic. Perinereis wilsoni has a closer phylogenetic relationship with P. vancaurica and P. nuntia. Our study serves as a baseline for future work on the cultivation, reproductive biology, and phylogeny of P. wilsoni.
Bioinformatic analyses, genome skimming, mitogenomes, Perinereis
Recently, a local Perinereis species intended for large-scale aquaculture in China by the Key Laboratory of Marine Bio-resource Restoration and Habitat Reparation in Liaoning Province, Dalian Ocean University, was identified as P. wilsoni Glasby & Hsieh, 2006. The genus Perinereis Kinberg, 1865, belongs to the Nereididae. This family comprises over 700 described species and 45 genera (
Perinereis represents the second-largest genus in Nereididae, with approximately 100 species (
Perinereis wilsoni, a member of the P. nuntia species group, is morphologically very similar to Perinereis mictodonta (Marenzeller, 1879), with no distinct morphological differences, although slight but statistically significant morphometric differences were found with respect to paragnath numbers and the relative length of the dorsal cirri (
Given the considerable economic importance of Perinereis species, accurate species identification is crucial because correct scientific names can facilitate the linking of subsequent physiological or reproductive studies, thus ensuring the reproducibility of these research efforts (
In this study, we used low-coverage whole genome sequencing, also known as genome skimming (
The specimens of P. wilsoni were sampled from Dalian, Liaoning Province, China (38.8732°N, 121.6767°E) and identified by Deyuan Yang. All specimens were fixed directly in 95% ethanol. Two specimens with a fully-everted pharynx were used for morphological and molecular studies. They were deposited at
Xiamen University (
Before DNA extraction, each individual was cleaned with 95% ethanol. To avoid gut contamination, two to seven parapodia were clipped from the specimens. Whole genomic DNA was extracted using the TIANamp Genomic DNA Kit (TIANGEN, Beijing, China). Genome skimming was conducted on the Illumina NovaSeq X Plus with a PE150 strategy. DNA extraction and sequencing were performed at Novogene Bioinformatics Technology Co., Ltd. (Beijing, China). Our initial sequencing target was 5 Gb of raw data for each sample. For voucher 23007-2, the sequencing output was increased to 25 Gb to recover more universal single-copy ortholog genes.
Sequence adapters were removed from the raw paired-end reads, and low-quality regions were trimmed using Fastp v.0.23.4 (
To ensure reliable assembly for the present data, various assemblers were used: 1) SPAdes v.3.15.5 (
For de novo assembly results (SPAdes, Megahit, Ray, and IDBA-UD), we used MitoFinder v.1.4 (
The mitogenome was annotated using MITOS2 (
The boundaries of each gene were determined by the following rules: 1) Protein-coding genes can overlap with each other but cannot overlap with tRNA genes. In Nereididae, only the ND4 and ND4L coding genes are allowed to overlap. 2) The boundaries of mitochondrial rRNA genes (12S and 16S) were defined by flanking genes. 3) The large non-coding region (or putative control region) was determined by the boundaries of neighboring genes and a low GC region.
The CGView Comparison Tool (CCT) (https://github.com/paulstothard/cgview_comparison_tool) (
The high-copy nuclear markers (18S to 28S genes and histone genes) were assembled using GetOrganelle v.1.7.6.1. Considering the lack of effective software or pipelines to annotate these genes, we manually annotated nuclear rRNA genes in Geneious using primer pairs to define gene boundaries: 18SA and 18SB (
BUSCO v.5.6 was employed to generate universal single-copy orthologs (USCOs) (
The phylogenetic analysis utilized two types of datasets: complete or near-complete mitogenomes of Nereididae and DNA barcoding sequences (COX1, 16S, and ITS) of Perinereis. All publicly available Nereididae mitogenomes from GenBank (as of 15 January 2024), along with two outgroup species (Craseoschema thyasiricola NC 060815 and Leocrates chinensis NC 066969, belonging to Chrysopetalidae and Hesionidae, respectively), were included in the phylogenetic analysis. The outgroup taxa were selected based on previous hypotheses that Chrysopetalidae and Hesionidae are sister groups to Nereididae (
To identify potential taxonomically mislabeled sequences in the mitogenomes dataset, we extracted the COX1 and 16S genes and conducted NCBI BLAST analysis. The results were downloaded as CSV files. For each sequence, the top 10 matching sequences were sorted based on identity using WEKits v.1.0 (https://github.com/GP-sir/wekits/releases). Next, we retrieved the source information (subjectAcc) for these matched sequences from NCBI. The species identification of each sequence was then manually verified in Microsoft Excel. Specifically, we focused on matched sequences with identity over 97% and checked whether they were supported by reliable taxonomic references (Suppl. material
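The screening step described above (keep the top 10 hits per query, then focus on hits with identity over 97%) can be illustrated with a minimal Python sketch. It is not the WEKits workflow used in the study. The column names `query` and `identity`, the function name `top_hits`, and the accession values in the usage example are assumptions made for illustration; only `subjectAcc` is taken from the text.

```python
import csv
import io
from collections import defaultdict


def top_hits(csv_text, n_top=10, min_identity=97.0):
    """For each query, keep the n_top hits with the highest identity,
    then retain only those whose identity exceeds min_identity.

    Assumes a CSV with hypothetical columns: query, subjectAcc, identity.
    """
    hits = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        hits[row["query"]].append((float(row["identity"]), row["subjectAcc"]))

    result = {}
    for query, rows in hits.items():
        rows.sort(reverse=True)  # highest identity first
        top = rows[:n_top]
        result[query] = [(acc, ident) for ident, acc in top if ident > min_identity]
    return result


# Placeholder data; the accession strings are not real records.
example = (
    "query,subjectAcc,identity\n"
    "q1,ACC1,99.5\n"
    "q1,ACC2,96.0\n"
    "q1,ACC3,98.1\n"
    "q2,ACC4,90.0\n"
)
screened = top_hits(example)
```

Queries with no hit above the threshold are returned with an empty list, so they remain visible for manual checking rather than being dropped.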
All manually curated mitogenomes were imported into PhyloSuite v.1.2.3. The extracted protein-coding genes (PCGs) and two rRNAs (2R) from these sequences were aligned using MAFFT in normal mode. MACSE v.2 (
ModelFinder v.2.2.0 was used to select the best substitution models for each partition of the maximum likelihood (ML) and Bayesian inference (BI) analyses (Suppl. material
Considering the mitogenomes of Perinereis are limited, shorter DNA sequences (DNA barcodes) were also used to explore the phylogenetic position of Perinereis wilsoni. All available Perinereis genes from NCBI (https://www.ncbi.nlm.nih.gov/nuccore/?term=Perinereis) were downloaded. The COX1, 16S, 18S, 28S, H3, and ITS genes were further extracted because these molecular markers are widely employed in Annelida. To organize the data more efficiently, we used a custom script (Suppl. material
Multiple alignment summary statistics were generated using the alignment summary function in BioKIT (
Pairwise tree structure comparison was conducted using the all.equal.phylo function in the ape v.5.7.1 package (
Strand asymmetries were calculated using the following formulae (
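As an illustration of how strand asymmetry is commonly computed, the sketch below assumes the widely used definitions AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C). These are assumed here and may differ in detail from the formulae cited in the study; the function name `composition_and_skew` is hypothetical.

```python
from collections import Counter


def composition_and_skew(seq):
    """Return A+T content, AT skew, and GC skew for a nucleotide sequence.

    Assumed definitions (not taken from the text):
      AT skew = (A - T) / (A + T)
      GC skew = (G - C) / (G + C)
    Characters other than A, T, G, and C are ignored.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c
    if total == 0:
        raise ValueError("sequence contains no A, T, G, or C")
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return {
        "AT_content": (a + t) / total,
        "AT_skew": at_skew,
        "GC_skew": gc_skew,
    }


stats = composition_and_skew("AATTTGCC")
```

A negative AT skew indicates more T than A on the analysed strand, and a negative GC skew indicates more C than G.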
To determine which molecular marker is reliable for accurate species delimitation, all COX1, 16S, and ITS sequences labeled P. wilsoni or P. mictodonta in NCBI were used. We applied the following criteria: 1) whether a barcode gap is present, i.e., the minimum interspecific genetic distance is greater than the maximum intraspecific genetic distance; 2) whether each species is recovered as monophyletic; and 3) whether there is a small overlap between intra- and interspecific distances and a large barcode gap, as recommended by
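The first criterion can be made concrete with a short Python sketch that computes uncorrected p-distances from aligned sequences and then the difference between the minimum interspecific and maximum intraspecific distance. This is an illustrative simplification, not the software used in the study; the function names and the toy sequences are assumptions, and positions with a gap or N in either sequence are skipped.

```python
from itertools import combinations


def p_distance(s1, s2):
    """Proportion of differing sites between two aligned sequences,
    ignoring positions where either sequence has a gap or N."""
    pairs = [
        (a, b)
        for a, b in zip(s1.upper(), s2.upper())
        if a not in "-N" and b not in "-N"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def barcode_gap(seqs, species):
    """Return min(interspecific) - max(intraspecific) p-distance.

    seqs: dict mapping sequence id -> aligned sequence
    species: dict mapping sequence id -> species label
    A positive value means a barcode gap is present.
    """
    intra, inter = [], []
    for i, j in combinations(seqs, 2):
        d = p_distance(seqs[i], seqs[j])
        (intra if species[i] == species[j] else inter).append(d)
    if not intra or not inter:
        raise ValueError("need both intra- and interspecific comparisons")
    return min(inter) - max(intra)


# Toy alignment with hypothetical labels, for illustration only.
seqs = {"a1": "ACGTACGT", "a2": "ACGTACGA", "b1": "TCGAACTT", "b2": "TCGAACTT"}
species = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
gap = barcode_gap(seqs, species)
```

A value at or below zero corresponds to overlapping intra- and interspecific distances, i.e., no barcode gap under this criterion.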
• 23007-1 and 23007-2, both collected from Dalian, Liaoning, China (38.87315°N, 121.676671°E), 08 August 2023, preserved in 95% ethanol.
Description based on 23007-1 and 23007-2. 23007-1 complete, 93 chaetigers, 7.0 cm in length, 2.9 mm wide at chaetiger 10 (excluding parapodia); and 4.0 mm wide at chaetiger 10 (including parapodia) (Fig.
Perinereis wilsoni Glasby & Hsieh, 2006. A–I. 23007-1; J, K. 23007-2. A. Entire body in dorsal view; B. Right jaw, ventral and dorsal views; C. Anterior region with pharynx everted, dorsal view; D. Maxillary ring, frontal view; E. Anterior end with pharynx everted, ventral view; F. Left parapodium, posterior view, chaetiger 10; G. Right parapodium, posterior view, chaetiger 30; H. Sub-acicular neuropodial heterogomph falciger, chaetiger 30; I. Right parapodium, posterior view, chaetiger 80; J. Anterior end with pharynx everted, dorsal view; K. Anterior end with pharynx everted, ventral view. All photos were taken by Deyuan Yang.
Prostomium and anterior dorsum with dark brown pigmentation. Prostomium anterior margin entire, pear-shaped, wider than long, with shallow longitudinal groove in central area. Antennae conical, about one-third length relative to prostomium length. Palps longer than prostomium, biarticulate, with palpophores and palpostyles (Fig.
Pharynx fully everted (Fig.
Paragnath counts (Fig.
Notopodia with 2 lobes, prechaetal lobe absent. Dorsal cirri longer than notopodial ligule, about 1.5 times length of the notopodial ligule throughout. Notopodial ligule similar to median ligule throughout. Neuropodial postchaetal lobe rounded, not projecting beyond acicular lobe. Ventral ligule similar in length to acicular ligule in all chaetigers (Fig.
Notopodia with homogomph spinigers only. Neuropodial heterogomph spinigers present throughout. At chaetiger 10, lower neurochaetae all heterogomph falcigers, upper neurochaetae with heterogomph falcigers and heterogomph spinigers. At chaetiger 30 and following chaetigers, lower neurochaetae include heterogomph spinigers.
Perinereis wilsoni was established by
The ITS genes may presently be the most effective and reliable method for accurately identifying these species, as these sequences were generated from the paratypes of P. wilsoni (
Japan; China (based on ITS genes). Other records require validation using at least ITS data.
The mitogenomes (23007-1 and 23007-2) generated from various assemblers yielded consistent results, except for the Ray assembler. They are 15,817 bp with an average coverage depth of 315×, and read mapping is 0.09% (Suppl. material
Gene map of Perinereis wilsoni (23007-1); the map of 23007-2 is identical, so only one is shown. A. Mitogenome; B. Nuclear rRNA cluster; C. Histone genes; D. CCT (CGView Comparison Tool) map comparing mitogenome sequence identity between P. wilsoni and the other nereidids. A–C. The green lines depict the distribution of coverage depth; D. Starting from the outermost ring, the feature rings depict: 1. COG (Clusters of Orthologous Groups of Proteins) functional categories for forward strand coding sequences; 2. Forward strand sequence features; 3. The remaining rings show regions of sequence similarity detected by BLAST comparisons between CDS translations from the reference genome and 31 comparison genomes. BLAST identities are organized from high to low, with higher values closer to the outer ring.
The NCBI BLAST results of the mitogenome revealed that the sequence identity with the published 16S sequences of P. wilsoni (LC482171–LC482183,
The nuclear rRNA contigs for the two specimens (23007-1 and 23007-2) yielded 10,983 bp and 10,820 bp with average coverages of 218.3× and 306×, respectively. Both results contained the full 18S, ITS1, 5.8S, ITS2, and 28S regions, with gene lengths of 1,849 bp, 358 bp, 160 bp, 305 bp, and 3,840 bp, respectively (Fig.
The NCBI BLAST analysis of nuclear rRNA and histone genes showed that the 18S sequence had over 99% sequence identity with eight species belonging to five genera, indicating that 18S may not be suitable for species identification. Sequence identities for the other markers were: 91.43% for 28S with Alitta virens (OW028578); 98.48%–98.54% for ITS with P. wilsoni (AF332158–AF332162,
The mitochondrial genes in P. wilsoni exhibit a high A + T content of 64.6% (34.3% T and 30.3% A) and lower levels of C and G at 21.6% and 13.8%, respectively (Suppl. material
The most frequently utilized amino acids in the mitogenome of P. wilsoni are Leu (14.83%), Ile (9.94%), Ser (8.60%), and Ala (7.84%). The least common amino acids are Cys (1.01%), Arg (1.78%), Asp (1.78%), and Gln (1.94%) (Fig.
Summary statistics for multiple alignments of various datasets are available in Suppl. material
A total of ten ML and BI trees were inferred from five mitogenome datasets of Nereididae (see Suppl. material
An analysis of phylogenies from five mitogenome datasets. A. Maximum likelihood (ML) and Bayesian inference (BI) tree of Nereididae based on the dataset 13PCGs. The GenBank accession numbers used are listed after the species names. The scale bar (0.5) corresponds to the estimated number of substitutions per site. Numbers at nodes are ML bootstrap support values. Asterisks denote 100% bootstrap support. “-” indicates no support value. Color-coded clades are four subfamilies within Nereididae. The gene order is shown to the right. “?” indicates the deletion of a gene; B. A two-dimensional MDS plot of eight trees (excluding 13PCGs12_ML and 13PCGs12_BI), colored by different clusters. C–F. For each cluster identified in B, a representative tree was selected. Note: Neanthes glandicincta (NC 035893) is an incorrectly identified taxon, which should be the genus Dendronereis. Neanthes acuminata (OQ729916) and Perinereis fayedensis (OQ729919) should be Perinereis suezensis and Perinereis damietta, respectively. Perinereis aibuhitensis (NC 023943) should be Perinereis linea (NC 063944).
Phylogenetic analyses based on five datasets of the available mitochondrial genomes (32 taxa in Nereididae, including 8 taxa in Perinereis) provide support for the monophyly of the genus Perinereis. Perinereis was either sister to the genus Nereis and the species Cheilonereis cyclurus (MF538532) (Fig.
Phylogenetic trees (ML and BI) based on COX1 genes recovered P. wilsoni (23007-1, 23007-2) as most closely related to P. mictodonta (KC800632, KC800630, KC800628) (BS = 83%, PP = 0.94). Neither the sequences labeled P. wilsoni nor those labeled P. mictodonta formed a monophyletic group; instead, they were divided into five distinct clades. This suggests either that cryptic species are present within P. wilsoni and P. mictodonta or that the specimens were sampled from geographically distant localities with some degree of isolation. These five clades were also supported by genetic distances, with a distinct barcode gap between them (Fig.
Heatmaps of p-distance for Perinereis wilsoni and P. mictodonta based on COX1 (A), 16S (B), and ITS (C), with the barcode gap and distance overlap shown on the left. Bayesian inference (BI) phylogenetic trees for the two species, based on sequences of COX1 (D), 16S (E), and ITS (F), are excerpted from Suppl. materials
The positions of Paraleonnates uschakovi (NC 032361) and Laeonereis culveri (KU992689) are unstable, jumping across different phylogenies (Fig.
There are two primary types of mitochondrial gene order in the known mitochondrial genomes of Nereididae, except for L. culveri (KU992689). The first type of gene order is observed in the subfamilies Gymnonereidinae, Dendronereinae, and P. uschakovi. The second type is found in the subfamily Nereidinae, except for L. culveri (see Fig.
The nucleotide diversity (Pi) analysis was conducted using concatenated alignments of 13 PCGs and two rRNAs of 32 Nereididae species. Nucleotide diversity among the Nereididae varies across the alignment, with Pi values for the 100 bp windows ranging from 0.061 to 0.563 (Fig.
Nucleotide diversity analysis (A) of 13 PCGs + two rRNAs and Ka/Ks rates (B) of 13 PCGs based on 32 Nereididae species. The Pi values for the 13 PCGs + two rRNAs are shown on the graph. The red line represents the value of nucleotide diversity (Pi) (window size = 100 bp, step size = 20 bp). The pink, purple, and green columns represent the values of Ka, Ks, and Ka/Ks, respectively.
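To make the sliding-window procedure concrete, the following Python sketch computes Pi as the mean proportion of pairwise differences per site within each window, using the window size (100 bp) and step size (20 bp) given in the caption as defaults. It is a simplified illustration with hypothetical function names: gaps are treated as ordinary characters, so values may differ from those produced by the software used in the study.

```python
from itertools import combinations


def pi_window(seqs):
    """Mean proportion of pairwise differences per site for aligned
    sequences of equal length (gaps treated as ordinary characters)."""
    length = len(seqs[0])
    pair_values = [
        sum(a != b for a, b in zip(s1, s2)) / length
        for s1, s2 in combinations(seqs, 2)
    ]
    return sum(pair_values) / len(pair_values)


def sliding_pi(alignment, window=100, step=20):
    """Return a list of (window midpoint, Pi) tuples along the alignment."""
    length = len(alignment[0])
    if len(alignment) < 2:
        raise ValueError("need at least two sequences")
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in alignment]
        results.append((start + window // 2, pi_window(chunk)))
    return results


# Toy alignment with a small window so that the result is easy to follow.
toy = ["AAAAAAAA", "AAAATTTT", "AAAATTTT"]
profile = sliding_pi(toy, window=4, step=2)
```

In the toy example, Pi rises from 0 in the first window to 2/3 in the last, because the variable sites are concentrated in the second half of the alignment.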
Before discussing this topic, it is crucial to specify the species concept employed in this study. Despite approximately 30 species concepts having been proposed (
Perinereis wilsoni and P. mictodonta were initially established as separate species based on statistically validated morphometric differences and the results of the ITS genes (
Partial mitochondrial genes, like COX1 and 16S, and nuclear genes, like 18S, 28S, ITS, and H3, are widely used in Nereididae for species discovery and phylogenetic studies (
Given that species within the P. nuntia complex are morphologically similar and not easily distinguishable based on morphology alone, molecular-based identification can offer a faster and more reliable method for species identification when reliable molecular references are available. Our analyses were unable to confirm the discriminative capability of the COX1 gene between P. wilsoni and P. mictodonta, as the sequences sourced from public databases were not linked to morphological and ITS gene data, thus limiting our further study. Additionally, the 16S gene was proven to be insufficient to distinguish these two species. Although ITS genes have been found effective in distinguishing cryptic species (
With advances in sequencing technologies, high-throughput sequencing (HTS) is more efficient than PCR-based Sanger sequencing in obtaining molecular markers. Recently, some new molecular markers have been proposed, such as nearly universal single-copy nuclear protein-coding genes (
Currently, a robust phylogenetic backbone of the genus Perinereis, based on phylogenomic methods and extensive taxon sampling encompassing major species in all five informal grouping schemes proposed by
Based on the available mitogenome datasets, Perinereis wilsoni is a sister group to P. vancaurica and P. nuntia, with high nodal support in all phylogenetic trees. In contrast, single-gene phylogenetic trees suggest that P. wilsoni is more closely related to P. mictodonta, with low nodal support. Although they include more taxa, the single-gene trees do not provide sufficient phylogenetic resolution. In summary, the phylogenetic relationships of P. wilsoni remain poorly understood due to the limited number of Perinereis species or other Nereididae for which genomic data are available.
A detailed discussion of the phylogeny of Nereididae is beyond the scope of our current work, so we provide only a brief one here. The positions of P. uschakovi and L. culveri (KU992689) were unstable in the trees inferred from different mitogenome datasets in this study.
In detail, L. culveri always nested within the subfamily Nereidinae, which was also found in previous studies using different datasets, such as COX1, 16S, and 18S (
In this study, we uncovered potential errors in assembly and annotation within GenBank. However, the absence of corresponding Sequence Read Archive (SRA) data in public databases hampers accurate confirmation of these errors. Consequently, we filtered these sequences based solely on our expertise. Therefore, I (Deyuan Yang) advocate for the uploading of original data (raw data) to public databases, such as NCBI. Even if some authors are unwilling to upload their data, various assembly methods should be employed to ensure the accuracy of the assemblies.
Additionally, we emphasize that carefully curated datasets (verifying the taxonomic identification of the species used in the phylogenetic study), especially those from public databases, are crucial before conducting phylogenetic studies, as taxonomic misidentifications can lead to incorrect conclusions. For example,
D.Y.Y. and S.Z. wrote the manuscript, and S.Z. wrote the part on mitochondrial genes, codon usage, nucleotide diversity, and evolutionary rate analyses. D.Y.Y. and S.Z. analyzed the data. Z.W., Y.J.Z., and D.Z.Y. participated in the discussion and reviewed the manuscript. C.J.G., J.S.H., and L.Z.C. conceived, designed, and supervised the work and reviewed drafts of the paper. S.Z. and D.Y.Y. contributed equally to this manuscript. All authors have read and agreed to the published version of the manuscript.
Many thanks to Yuanzheng Meng for helping us organize and format the literature for this paper. Thanks to Grammarly (https://www.grammarly.com/) for grammar correction while writing the first manuscript and ChatGPT 4.0 for generating some Python and R scripts during this study. Thanks to reviewer Robin Wilson for his constructive comments, especially focused on the phylogeny of the genus Perinereis, which have greatly contributed to the revision of our article. This work was supported by the Youth Fund of the National Natural Science Foundation of China (42306107) and the China Postdoctoral Science Foundation (2021M691866).
The reports from FastQC
Data type: pdf
Explanation note: (a) Basic information on clean data, including duplicate reads (%, Dups), average GC content (%, GC), and total sequences (millions, M Seqs). (b) Sequence counts for each sample. Duplicate read counts are an estimate only.
The report of Quast
Data type: pdf
Explanation note: (a) Basic information of various assemblers. (b) The cumulative length of each assembler. All statistics are based on contigs of size >= 500 bp, unless otherwise noted (e.g., "# contigs (>= 0 bp)" and "Total length (>= 0 bp)" include all contigs).
Additional information
Data type: xlsx
Explanation note: table S1. The Blast results of mitochondrial gene, 18S, 28S, ITS, and histone genes (Only 20 sequences are shown). table S2. List of 32 species and two outgroups used in this paper. table S3. Sequence Information from NCBI BLAST Analysis of the COX1 Gene. table S4. Sequence Information from NCBI BLAST Analysis of the 16S Gene. table S5. Original and Gblock lengths of the PCGs and PCGsAA sequences. table S6. Best partitioning schemes and models based on different datasets for maximum likelihood and Bayesian inference analysis. table S7. COI, 16S, 18S, 28S, ITS, and H3 gene sequences information of Perinereis. table S8. Nucleotide composition and skewness comparison of different elements of the mitochondrial genomes of P. wilsoni. table S9. Features of the P. wilsoni mitogenome. table S10. Codon numbers and relative synonymous codon usage (RSCU) of 13 PCGs in the P. wilsoni mitogenome. table S11. Summary statistics for multiple alignment of various datasets.
The Python script for categorizing data
Data type: docx
Heterogeneity of sequence composition of mitochondrial genomes for 5 different data sets
Data type: pdf
10 trees from different datasets and tree-building methods
Data type: pdf
Phylogenetic trees of Perinereis based on the COX1 dataset
Data type: pdf
Explanation note: (a) Bayesian inference (BI, left) and (b) maximum likelihood (ML, right) method
Phylogenetic trees of Perinereis based on the 16S dataset
Data type: pdf
Explanation note: (a) Bayesian inference (BI, left) and (b) maximum likelihood (ML, right) methods.
Phylogenetic trees of Perinereis based on the ITS dataset
Data type: pdf
Explanation note: (a) maximum likelihood (ML, left) and (b) Bayesian inference (BI, right) methods.