DNA barcoding of the genus Alburnoides Jeitteles, 1861 (Actinopterygii, Cyprinidae) from Anatolia, Turkey

The present study investigated the ability of DNA barcoding to reliably identify the endemic freshwater species in Turkey, known as biodiversity hotspots. The barcode region (652 bp) of the mitochondrial cytochrome c oxidase subunit I (COI) was used to barcode 153 individuals from 13 morphologically identified species of the genus Alburnoides . Based on the Kimura two-parameter (K2P) evolution model, the average interspecific distance (0.0595) was 31-fold higher than the average intraspecific distance (0.0019). There was a clear-cut barcode gap (0.0158–0.0187) between maximum intraspecific distance ( A. tzanevi and A. velioglui ) and minimum nearest-neighbour distance ( A. freyhofi and A. kurui ) for Anatolian Alburnoides species and a common genetic threshold of 0.0158 sequence divergence was defined for species delimitation. The multiple species delimitation methods (ABGD, ASAP, GMYC and bPTP) revealed a total of 11 molecular operational taxonomic units (MOTUs) for 13 morphospecies. Neighbour-joining (NJ), Maximum Likelihood (ML) and Bayesian Inference (BI) tree analysis indicated that all haplotypes were clustered into two major clades, which corresponded to eleven Alburnoides species clusters, with strong bootstrap support. Furthermore, all the specimens clustered in concurrence with the morpho-taxonomic status of the species, except for two species ( A. coskuncelebii and A. emineae ) that were morphologically differentiated, but showed overlap in variation for COI-based DNA barcode data with other species. Overall, present results identified that COI-based DNA barcoding is effective for species identification and cataloguing of genus Alburnoides in Turkey.


Introduction
Turkey is located at the intersection of three (Caucasus, the Irano-Anatolian and the Mediterranean) of the world's 35 biodiversity hotspots (Mittermeier et al. 2004;Fricke et al. 2007). Turkish freshwater wetlands contain numerous lakes, rivers and reservoirs, which host over 180 Cyprinid species Kuru et al. 2014;Çiçek et al. 2020). However, recent reports have shown a decline in the number of fish caught from most Turkish inland waters due to habitat degradation, pollution, climate change, inadequate management of fisheries, highly-regulated large river drainages, such as dams and hydropower plants and the introduction of non-native fishes (Fricke et al. 2007;Çiçek et al. 2015;Tarkan et al. 2015;Freyhof et al. 2020). Despite being faced with serious and increasing human activities-induced threats to fish biodiversity, especially in recent years, Turkey is not yet at the level it should be in terms of biodiversity and habitat conservation activities (Şekercioğlu et al. 2011). Phylogeny and phylogeography of Alburnoides species in Turkey were revealed by analysing the mitochondrial cytochrome b (Cyt b) gene sequences (Bektas et al. 2019a). However, COI-based DNA barcoding has not been performed for the detection of intraspecific diversity and species delimitations in the genus Alburnoides, most of which are endemic to Anatolia. Hebert et al. (2003) proposed DNA barcoding technology using mitochondrial cytochrome c oxidase subunit I (COI) gene sequence, for which the intraspecific diversity is significantly lower than the interspecific diversity, as a barcode for species identification and classification. For over a decade, the COI gene has been widely used for classifying and identifying Cyprinid fishes (Hubert et al. 2008(Hubert et al. , 2015Keskin and Atar 2013;Chakraborty and Ghosh 2014;Geiger et al. 2014;Karim et al. 2016;Lakra et al. 2016;Bayçelebi et al. 2018;Aksu and Bektas 2019;Bektas et al. 2019b;Rahman et al. 2019;Bahadır et al. 2020;Bektas et al. 2022). So far, the COI-based DNA barcode has been used to identify a limited number of cyprinid fish genera or species in Turkey (Keskin et al. 2012;Keskin and Atar 2013;Geiger et al. 2014;Bektas et al. 2019b) in Turkey. However, Alburnoides populations in Iran (Jouladeh-Roudbar et al. 2016;Eagderi et al. 2019) and the Caucasus (Levin et al. 2018), which are adjacent regions to Turkey, were barcoded, based on the COI sequences, but DNA barcodes of the Alburnoides species in Anatolia, an important centre of endemism, have not yet been generated.
This paper aims to determine intraspecific and interspecific genetic distances and to identify DNA barcodes of Alburnoides species from Anatolia.

Materials and methods
Sample collection and morphological identification One hundred and fifty-three freshwater fish samples belonging to 13 Alburnoides species were collected from 47 different sites (some from type localities) in Turkish inland waters between 2016 and 2020 ( Fig. 1, Suppl. material 1). Morphological identification was performed in situ by following the counts and measurements of Turan et al. (2016Turan et al. ( , 2017. After each specimen was morphologically identified, tail fin tissue was taken, preserved in 95% ethanol and stored at -20 °C. DNA extraction, PCR amplification and DNA sequencing Genomic DNA extraction was performed using the DNeasy Blood & Tissue Kit (Qiagen, Hilden, Germany) in an automated DNA extraction robot (QIAcube). DNA yield and purity were quantified using a NanoDrop 2000C spectrophotometer (Thermo Scientific). Extracted DNA were stored at -20 °C. Universal PCR primers FishF1 and FishR1 (Ward et al. 2005) were used for COI amplification. Polymerase chain reaction (PCR) was performed on BIORAD T100TM Gradient thermocycler (Biorad, Hercules, CA, USA) following the protocol of Ward et al (2005). Thermal cycling conditions were as follows: 3 min initial denaturation at 94 °C, followed by 35 cycles of 1 min at 94 °C, annealing for 40 s at 54 °C, elongation for 1 min at 72 °C and final extention for 7 min at 72 °C. PCR products were electrophoresed on a 1.4% agarose gel containing ethidium bromide and visualised under UV using a gel documentation system. PCR products were purified using the QIAquick kit (QIAGEN, Hilden, Germany) according to the manufacturer's instructions and purified PCR products were directly sequenced in both directions using an ABI PRISM 3730x1 automated DNA sequencer (Applied Biosystem, Foster City, USA) following the manufacturer's instruction.
Based on the COI barcode region, phylogenetic relationships amongst the Spirlin haplotypes were reconstructed using three methods, Neighbour-joining (NJ), Bayesian Inference (BI), and Maximum Likelihood (ML). The programme jModelTest 0.1.1 (Posada 2008) was used to obtain the appropriate model of evolution using the Akaike Information Criterion (AIC; Akaike (1973)) and the Bayesian Information Criterion (BIC; Schwarz (1978)). The NJ tree was generated using MEGA X (Kumar et al. 2018) with the Kimura-2 parameter (K2P). The node support was evaluated with 1000 bootstrap pseudo-replicates. Bayesian Inference analysis was conducted by using BEAST 1.8.2 (Drummond et al. 2012), setting a coalescent tree prior and an uncorrelated lognormal relaxed clock. Each Bayesian tree was run for 10 million generations with a sampling frequency of 1,000 and were combined with a burn-in set to 25% using LogCombiner 1.8.2 (Drummond et al. 2012). A majority rule consensus tree was constructed using TreeAnnotator 1.8.2 (Rambaut and Drummond 2015). In BEAST, the divergence date of Alburnoides and Rutilus (approximately 27 Mya; Levy et al. (2009);Perea et al. (2010)) was used as a calibration point for estimating the divergence times of Alburnoides species. The ML method was performed with RAxML v.8.1.21 (Stamatakis 2014). The support of each node was estimated using a rapid bootstrap analysis with 1,000 replicates. Rutilus rutilus (GenBank Accession No. MW473258) was used as an outgroup. The obtained trees were visualised using FigTree v.1.4.3 (Rambaut 2012) and modified using Adobe Photoshop CS6 software.
For the species delimitation, we used Automatic Barcode Gap Discovery (ABGD) (Puillandre et al. 2012) and Assemble Species by Automatic Partitioning (ASAP) (Puillandre et al. 2020) (both distance-based methods) on the alignment of CO1 sequences. The ABGD species delimitation was performed via the ABGD web server with default settings (P min = 0.001, P max = 0.1, Steps = 10, X relative gap width = 1.5, Nb bins = 20) and testing of two substitution models (JC69 and K2P) for calculation of the Kimura 2-parameter distances (https://bioinfo.mnhn.fr/abi/ public/abgd/abgdweb.html, accessed on 3 March 2022). ASAP species delimitation was performed via the ASAP web server with testing of all three substitution models for calculation of the distances (https://bioinfo.mnhn.fr/ abi/public/asap/asapweb.html, accessed on 19 December 2022) using the distance matrix, generated through MEGA X (Kumar et al. 2018).
The single-locus and tree-based delimitation approaches, the General Mixed Yule-coalescent (GMYC; Pons et al. (2006)) and Bayesian poisson tree process (bPTP; Zhang et al. (2013)) models, distinguish specimens in species level using coalescence theory. The ultrametric tree required for the GMYC method was obtained using BEAST 1.8.2 (Drummond et al. 2012) on COI dataset. The MCMC chain was run for 10 million generations with the best substitution model calculated in jModelTest 0.1.1 (Posada 2008) and the Yule speciation model. The MCMC log-on prior and posterior values was examined in Tracer 1.6 (Rambaut et al. 2014) and a burn-in of 1 million generations was used to avoid suboptimal trees in the final consensus tree. The single-threshold GMYC method was run using ultrametric haplotype BEAST trees on the GMYC web server (species.h-its.org/gmyc, accessed on 06 May 2022). For PTP-based MOTU estimation, the needed rooted phylogenetic input-tree was constructed with RAxML v.8.1.21 (Stamatakis 2014) with the best substitution model. The PTP model was implemented following the default parameters and 500,000 MCMC generations on the bPTP web server (http://species.h-its. org/ptp/).
We based species delimitation on a majority consensus of the results inferred from all four methods: two species were considered a single species if at least three out of four methods harmoniously merged two species as one.

Distance-based species assignment
Distance-based methods of species assignments in conjunction with computer simulations can determine the statistical significance of species identification success rates (Huang et al. 2013). To evaluate the potential of the COI dataset for correct species identification and to select the best threshold value, these were tested using the 'Best Close Match' (BCM), implemented in Species Identifier 1.8 programme of TaxonDNA software (Meier et al. 2006).

Sequence variation of morphospecies
A total of 36 distinct haplotypes for COI barcode region were identified (Suppl. material 1). The average nucleotide frequencies were 29.71 (T), 25.98 (C), 19.20 (G) and 25.11 (A)%. The mean ratio is approximately 6:1. In this study, the average intraspecific K2P distance was 0.0019, compared with the 0.0595 intraspecific value. The mean interspecific distance was found to be 31-fold higher than the mean intraspecific distance.

Phylogenetic relationships within Alburnoides
Phylogenetic relationships amongst Alburnoides species were investigated using TIM3+I+G, which was selected as the best evolution model in BI and ML approaches for our dataset. Neighbour-joining, Maximum Likelihood and Bayesian Inference analyses generated similar tree topologies and two main clades were detected consisting of eleven deeply divergent haplogroups within Alburnoides (posterior probabilities ≥ 0.95 and 68% ≤ bootstrap values for ML and NJ, Fig. 3). The dataset was analysed using the strict molecular clock in BEAST. The timing of splitting-events of Alburnoides lineages are shown in Fig. 3.

Species delimitation
The morphological species hypothesis was not supported by our analyses using ABGD, ASAP, bPTP and GMYC, which delimited 13 species. Nevertheless, 11 putative species were recognised by all the methods that we applied to the COI dataset. We detected a distinct gap between intra-and interspecific genetic distances for 11 hypothetical species of Alburnoides, ranging from 0.0158 to 0.0187 for K2P distances (Fig. 2). Results from the four species delimitation analyses, ABGD, ASAP, bPTP and GMYC are summarised in Fig. 3. Relationships amongst the major clades, identified using the GMYC and bPTP analyses (Fig. 3), were identical to each other and to the concatenated phylogenies. Nodal support was high (> 0.95) for most nodes of the BEAST consensus tree (Fig. 3). Results from GMYC and bPTP tests corroborated the presence of eleven independent lineages in Alburnoides (Fig. 3). According to the four methods of MOTU delimitation, there was an overlap between A. tzanevi-A. coskuncelebii and A. velioglui-A. emineae belonging to the genus Alburnoides. Therefore, the Alburnoides dataset was reduced to eleven groups, based on the results of four delimitation methods (Fig. 3).

Species identification with BCM
Due to the observed discrepancy between taxa delimited, based on the four MOTU delimitation methods described above and morphospecies, all Alburnoides species in Turkey were assigned into eleven of thirteen categories. Moreover, the DNA barcoding gap, which was the maximum intraspecific distance of each species against its minimum distance to the nearest neighbour, was calculated for all species. The maximum sequence divergence amongst individuals of some species (A. tzanevi: 0.01583 and A. velioglui: 0.01582, Table 1

Phylogenetic reconstructions
Morphological identification of Spirlin can be challenging due to their similar morphology and limited morphological information. As maternally-inherited mtDNA evolves about four times faster than that of the nuclear genome, mtDNA can be used to track differences in closely-related taxa and even within species (Stepien and Kocher 1997;Desalle et al. 2017). In this study, the phylogenetic relationships inferred using mitochondrial COI (Fig. 3) are consistent with previous studies using several mitochondrial markers (Jouladeh-Roudbar et al. 2016;Stierandová et al. 2016;Levin et al. 2018), but differ in some of the deepest nodes, with the currently accepted phylogenies of Alburnoides (Cyt b, Bektas et al. 2019a).
As seen in the phylogenies of Bektas et al. (2019a), the phylogenetic reconstructions strongly support the topology of two clades (western and eastern), containing 13 Alburnoides lineages, each corresponding to the species, with moderate-high posterior probability and bootstrap values (> 0.95 pp, > 86 BT; Fig. 3). On the contrary, the analysis of the present COI dataset supports the existence of 11 instead of 13 major lineages in Alburnoides. The phylogenetic trees constructed (Fig. 3) showed that the most conspecific haplotypes cluster together, which substantially confirmed previous taxonomic assignments, based on morphology (Turan et al. 2013(Turan et al. , 2014(Turan et al. , 2016(Turan et al. , 2017Kaya 2020). Nevertheless, our phylogenetic tree topologies are completely not congruent with the morphological data: A. emineae is located inside A. velioglui and A. coskuncelebii is located inside A. tzanevi.
As most closely-related species (A. cockuncelebii / A. tzanevi and A. emineae / A. velioglui) do not share haplotypes and both species pairs are living allopatrically, the low genetic distances between them may be a result of recent speciation, thus representing the most recently diverged species in the study area. Alburnoides tzanevi is found in some rivers that flow into the Black Sea from the Bulgarian-Turkish border to Sinop, the northernmost point of Turkey, while A. coskuncelebii has only been reported from the Büyükmelen River, which is also the type locality. As can be seen in Fig. 1, the geographic distributions of A. tzanevi and A. coskuncelebii overlap. Corrected divergence estimates between the species (Fig. 3) indicate that this phylogeographical structure was initiated by historical events of the Early to Middle Pleistocene. Current distributions of two species are restricted to the Thrace and south-western Anatolia coastal areas of the Black Sea (Fig. 1), where they, thus, most likely survived several later glacial cycles in separate allopatric refugia. The Tigris-Euphrates River system originated in the Late Miocene, developed from small and probably short-lived isolated channels in the Pliocene and became the main drainage system of the region at present (Stow et al. 2020). Alburnoides velioglui is found in the upper tributaries of the Euphrates, the Karasu and Murat and A. emineae is in the Beyazsu Stream, which is the upper tributary of the Khabur River in the Euphrates drainage system, within the borders of Turkey. According to our divergence time estimates (Fig. 3), these species were probably isolated in the distant tributaries of the Euphrates after the Pliocene (Trifonov et al. 2018), when the drainage system of the region evolved into its present form.

Species delimitations
In our study, the combination of four species delimitation methods (ABGD, ASAP, bPTP and GMYC) (Fig. 3) implemented, suggested that our results generally confirm the high taxonomic utility of the COI-barcoding region in Alburnoides and most closely correspond to the current morphology-based classification of Alburnoides. Contrary to the findings of Turan et al. (2014Turan et al. ( , 2019, which reported a total of 13 Alburnoides species, a total of eleven putative partitions were identified amongst Spirlin samples, based on COI-based analysis in this study. There was a strong consensus between the species delimitation results and the phylogenetic clustering, revealing the following lineages: A. smyrnae, A. tzanevi, A. eichwaldii, A. freyhofi, A. kurui, A. turani, A. manyasensis, A. velioglui, A. kosswigi, A. diclensis and A. fasciatus (Fig. 3).
Algorithms like bPTP/GMYC help us to objectivise species or as an argument to drop poorly-described morphotaxa. The reason why we use different approaches like bPTP and GYMC is that they may give different results. When we test the number of morphotaxa using both approaches for our dataset, we found that the results of bPTP and GYMC analyses are harmonised, but underestimate the number of species. We need to discuss these morphospecies when the pPTP and GMYC do not match the tip labels (phenotypic) or give a lower number. bPTP and GMYC give 11 species standing against the 13 annotated in ML and BI trees. Looking at the topology and branch-lengths distribution in NJ and BI trees, it shows 11 clades regarding intraclade coherence and interclade distinction (Fig. 3). This result tells us that the number of morphologically distinguished species is high. Most of the major clades include only one species, with only two including genetically quite similar species. These species (A. emineae and A. coskuncelebii) are poorly separated from other ones, so reducing the number of species makes sense. A. emineae falls into the A. velioglui group, which has a higher intra-species diversity. Similarly, A. coskuncelebii is nearly identical to some tzanevi, but the A. tzanevi's intra-species diversity is higher. Therefore, there is little reason molecular-wise to keep A. emineae and A. coskuncelebii as a different species. Following the bPTP/GMYC result, species epithets, A. velioglui and A. tzanevi, can be dropped as they have priority. Consequently, the species delimitation algoritm showed that A. tzanevi and A. coskuncelebii merged into a single putative species, while A. velioglui and A. emineae were also merged into a single putative species.
Our results revealed two cases of discordance between morphology-based classification and COI-barcoding-based species delimitation. These discordances resulted from the underestimation of possible species in which taxa, currently recognised as valid taxa, collapse in single barcode units. The first case when COI-barcoding failed to recognise morphologically distinct taxa as independent MOTUs includes the A. tzanevi and A. coskuncelebii from the Black Sea drainage in Thrace and western Anatolia. In our analyses, all methods recognised A. tzanevi and A. coskuncelebii, as a single unit (Fig. 3). Alburnoides coskuncelebii was recently recognised from the Büyük Melen, a catchment situated between the Sakarya and Kızılırmak Rivers, while A. tzanevi from small streams in Thrace and adjacent Anatolia (Turan et al. 2019). These results may be explained by allopatric speciation which led to initial differentiation in mtDNA and morphology, which was reported earlier for Spirlins (Bektas et al. 2019a). The second case is the allopatric A. velioglui and A. emineae, inhabiting Murat, Munzur, Sultansuyu and Beyazsu Rivers (Suppl. material 1) in the upper tributaries of the Euphrates River Basin. These two species are easily distinguished by several diagnostic morphological characters (Turan et al. 2014), yet all species delimitation methods applied herein failed to recognise them as independent groups (Fig. 3). Further studies, including examination of nuclear DNA markers, are required to clarify evolutionary relationships and taxonomic status of these species.

Barcoding gap and automated identification
Species identification, using DNA barcoding, is based on the principle that the genetic distance between two species is much greater than that within a species (Ge et al. 2021). A 10-fold sequence difference between the average interspecific and the average intraspecific differences has been suggested as the standard COI threshold for animal species identification (Hebert et al. 2004). In our present study, the mean interspecific distance was found to be 31-fold higher than the mean intraspecific distance, which was higher than the 23-fold difference observed in Caucasian Alburnoides species (Levin et al. 2018) and the 15-fold differences observed in Iranian Alburnoides species (Jouladeh-Roudbar et al. 2016). This difference is due to the fact that the average interspecies genetic distance calculated in this study (0.0595) is higher than those (0.0395 for 13 Caucasian species and 0.0386 for 11 Iranian species) in previous studies (Jouladeh-Roudbar et al. 2016;Levin et al. 2018). In other words, our results point to clear genetic differences between the Anatolian species of Alburnoides. The high species diversity and clear interspecies genetic differentiation detected for Alburnoides can be attributed to the presence of three biodiversity hotspots (Mediterranean, Caucasus and Irano-Anatolian) that coincide in Anatolia (Mittermeier et al. 2011) and the fact that Turkey served as the refugium for southern European taxa during the last glacial age (Reyjol et al. 2007;Bilgin 2011). Anatolia, indeed, exhibits high species diversity and endemic species richness compared to its neighbouring regions (Levy et al. 2009;Giannetto and Innal 2021). The families Nemacheilidae, Gobiidae, Cobitidae and Salmonidae provide evidence that Anatolia is such a centre of diversity (Kuru et al. 2014).
Based on mitochondrial Cyt b gene sequences, Stierandová et al. (2016) reported that Alburnoides lineages in Albania show low mitochondrial distances between 0.83% and 1.43%, which are more likely to be evaluated at an intraspecific level. On the other hand, Levin et al. (2018) reported a COI-based threshold value of 1.3% for Alburnoides species in the Caucasian and Middle East regions. Our results indicate that there is a clear barcoding gap (0.0158-0.0187) separating the intraspecific and interspecific distances. Consequently, a common genetic distance threshold of 0.0158 was observed for rapid and reliable identification of Alburnoides species. We think that our threshold value, which is higher than that reported by Levin et al. (2018), is due to the complex topography and geomorphology of Anatolia (EEA 2002).

Conclusion
Our study demonstrates the usefulness of DNA barcoding for the identification of Alburnoides species in Turkey. This study contributes to the construction of DNA reference barcode data for Turkish fish fauna. We also confirm that DNA barcoding could assist in resolving issues of ambiguity in morphological identification of species. Finally, it has been concluded that COI-based DNA barcoding is a reliable and useful approach for the genetic identification of morphospecies belonging to the genus Alburnoides.