Distribution pattern and phylogeography of tree rats Chiromyscus (Rodentia, Muridae) in eastern Indochina

The study combines available data on species distribution in eastern Indochina to investigate the phylogeographical genetic and morphological diversity of tree rats (Chiromyscus, Rodentia, Muridae) and to specify their natural ranges. We examined the diversity and distribution of tree rats over its range, based on recent molecular data for mitochondrial (Cyt b, COI) and nuclear (IRBP, RAG1 and GHR) genes. The study presents the most complete and up-to-date data on the distribution and phylogeography of the genus in eastern Indochina. As revealed by mitochondrial genes, C. langbianis splits into at least four coherent geographically-distributed clades, whereas C. thomasi and C. chiropus form two distinctive mitochondrial clades each. Chiromyscus langbianis and C. chiropus show significant inconsistency in nuclear genes, whereas C. thomasi shows the same segregation pattern as can be traced by mitochondrial markers. The Northern and Southern phylogroups of C. thomasi appear to be distributed sympatrically with northern phylogroups of C. langbianis in most parts of eastern Indochina. The mitochondrial clades discovered are geographically subdivided and divergent enough to suspect independent subspecies within C. langbianis and C. thomasi. However, due to the insufficiency of obvious morphological traits, a formal description is not carried out here. The processes of recent fauna formation, species distribution patterns, dispersion routes and possible natural history in Indochina are discussed.

The evolution of the genus Chiromyscus was affected by the natural history of this region. Continuous forest cover in Indochina existed during a considerable proportion of the Pliocene and Pleistocene (Meijaard and Groves 2006), enabling the direct contact, dispersion and genetic exchange of western and eastern Indochinese populations, which are currently mostly interrupted by the extensive deforestation in areas of central Thailand and southern Cambodia. During Pleistocene glaciations, the forest edge was located at lower elevations (Cox and Moore 2000) and substantial forest contraction happened during the last glacial periods, as evidenced in Peninsular Malaysia and Palawan (Wurster et al. 2010). Cranbrook (2000) and Gathorne-Hardy et al. (2002) stressed that most areas in this region were covered with savannah vegetation unfavourable to arboreal species during long periods of the Quaternary epoch. However, the most recent surveys on the biogeography and paleoenvironments of the Sunda Shelf have suggested that the interactions between climate and sea level and their effects on the distribution patterns of the fauna and flora are more complex (van den Bergh et al. 2001;Meijaard and Groves 2006) and environmental conditions changed repeatedly during the Pleistocene. These global processes of ecosystems change were likely to have had an impact on the recent genetic structure of Chiromyscus species. This group of rodents is still quite rare in museum collections, so little information is available about its natural diversity, distribution and phylogenetic relationships and only a few specimens have been genetically characterised. In this paper, we combine available data, including novel data, on species distribution in eastern Indochina to investigate the phylogeography and diversity of these species, both genetic and morphological and specify their natural ranges.

Specimens and samples
A great number of Chiromyscus specimens, obtained in Vietnam during a series of field expeditions of the Joint Russian-Vietnamese Tropical Research and Technological Centre between 2007 and 2018, were sampled for genetic analysis in full agreement with current Vietnamese regulations in the field of Nature Protection and Biodiversity Conservation. We followed the guidelines of the American Society of Mammalogists during the collection and handling of the animals used in this survey (Gannon et al. 2011). The museum specimens were kept in the Zoological Museum, Moscow State University, Moscow, Russia (ZMMU) and the Zoological Institute, Russian Academy of Sciences, Saint-Petersburg, Russia (ZIN); genetic samples are part of collection of Joint Russian-Vietnamese Tropical Research and Technological Centre, Hanoi, Vietnam.
New samples were combined with sequences available in GenBank, including our sequences previously submitted (Balakirev and Rozhnov 2010;Balakirev et al. 2011Balakirev et al. , 2012Balakirev et al. , 2014Rowe et al. 2008;Pages et al. 2010;Zhang et al. 2016). In total, 79 specimens were investigated (47 C. langbianis, 15 C. thomasi and 17 C. chiropus). The geographic scope of the survey included 23 localities (Fig. 1, Table 1; see also Suppl. material 1: Table S1) scattered over China, Laos, Vietnam and Cambodia and constitute the known species distribution. Most of these sample specimens were collected personally by the authors (AVA, AEB).

DNA extraction
Small pieces of liver or muscle tissue were sampled in the field and stored in 96% ethanol. Total genomic DNA was extracted using a routine phenol/chloroform/proteinase K protocol (Kocher et al. 1989;Sambrook et al. 1989). The DNA was further purified either by a DNA Purification Kit (Fermentas, Thermo Fisher Scientific Inc., Pittsburgh, PA, USA) or by direct ethanol precipitation. We targeted five genes that were previously used for the phylogenetic analysis of Chiromyscus and were available for comparative analyses in GenBank. These genes included a complete Cytochrome b (Cyt b, 1140 bp); the 5'-proximal portion (680 bp) of subunit I of the Cytochrome Oxidase (COI), which is generally used for species diagnoses and for DNA barcoding for a number of mammals (Hebert et al. 2003); a portion of the first exon of the Interphotoreceptor Retinoid Binding Protein gene (IRBP, also known as Rbp3, up to 1233 bp); the first exon of the Recombination Activation Factor gene (RAG1, 1244 bp); and a portion of exon 10 of the Growth Hormone Receptor gene (GHR, 815 bp).

PCR amplification and sequencing
Cyt b was amplified using H15915R, CytbRglu (Kocher et al. 1989;Irvin et al. 1991) and CytbRCb9H primers (Robins et al. 2007). The COI gene was amplified using the universal conservative primers BatL 5310 and R6036R (Kocher et al. 1989;Irwin et al. 1991). The following universal PCR protocol was used to amplify mtDNA fragments: initial denaturation for 1 min 30 sec at 95 °C, 35 cycles of denaturation for 30 sec at 95 °C, annealing for 1 min at 52 °C and elongation for 30 sec at 72 °C, followed by terminal elongation for 2 min at 72 °C. The PCR was performed in a 30-50 µl volume that contained 2.5-3 µl 10 x standard PCR buffer, 50 mM of each dNTP, 2 mM MgCl 2 , 10 pmol of each primer, 1 unit of Taq DNA polymerase (Fermentas, Thermo Fisher Scientific Inc., Pittsburgh, PA, USA) and 0.5 µl (25-50 ng) of total DNA template per tube. The reaction was performed using a Tercik (DNK-Tehnologia, Novosibirsk, Russia) thermocycler. IRBP (1000-1610 bp in length) was amplified using the IRBP125f, IRBP1435r, IRBP1125r and IRBP1801r primers according to Stanhope et al. (1992). A nested PCR technique was applied to amplify GHR in accordance with Jansa et al. (2009). An approximately 1.0-kb portion of exon 10 from the GHR gene was amplified using the primers GHRF1 and GHRendAlt. This PCR product was re-amplified using the nested GHRF1 primer paired with GHR750R and the GHRF50 primer paired with GHRendAlt. A 1244 bp portion of the RAG1 gene was obtained using primers S70 and S71, as described by Steppan et al. (2004).
The PCR products were purified using a DNA Purification Kit (Fermentas, Thermo Fisher Scientific Inc., Pittsburgh, PA, USA). The resulting double-stranded DNA products were directly sequenced in both directions using the Applied Biosystems 3130 Genetic Analyzer with the BigDye Terminator Cycle Sequencing Ready Reaction Kit (Applied Biosystems, Waltham, Massachusetts, USA). All obtained sequences were deposited in Gen-Bank (www.ncbi.nlm.nih.gov/Genbank; MK957014-

Molecular data analyses
Individual sequences were edited manually using BioEdit v. 7.1.11 (Hall 1999) and aligned by Clustal W software incorporated into BioEdit and MEGA 6. The basic sequence parameter calculations and the best-fitting evolution models and inter-and intrapopulation divergence evaluations were performed using MEGA 6 (Tamura et al. 2013). No pseudogenes were detected for the mitochondrial genes. The optimal substitution models and their parameters are summarised in Suppl. material 1: Table S2. Genetic distances (d) between groups under Tamuta-Nei gamma distributed invariant sites including (TN93+G+I) or Tamura 3-parameter (T3P) models (Tamura et al. 2004), (depending on the best model determined) were calculated, based on the Cytb and COI genes in MEGA 6. Bayesian phylogenetic trees were inferred using MrBayes v.3.2. (Huelsenbeck and Ronquist 2001;Ronquist and Huelsenbeck 2003), two MCMCs for four chains with the default heating value and with a burn-in parameter equal to 25% of the initial number of runs. We applied 6×10 6 generations for the Cyt b dataset, 2×10 6 for the COI, 4×10 6 for the RAG1 datasets and 5×10 6 for both GHR and IRBP datasets until the average standard deviation of split frequencies dropped below the level of 0.0025 after the runs for all datasets investigated. We used a flat Dirichlet prior for the relative nucleotide frequencies and for the relative rate parameters, a discrete uniform prior for the topologies and an exponential distribution for the gamma shape parameter and all branch lengths. The gamma shape parameters for Bayesian Inference were evaluated directly from a general dataset by MrBayes v.3.2. A burn-in period of one million generations was determined graphically using TRACER v.  Kimura et al. 2015). Discrete Gamma distribution was used to model evolutionary rate differences amongst sites (five categories (+G, parameter = 0.9185)). The rate variation model allowed for some sites to be evolutionarily invariable ([+I], 54.21% sites). The analysis involved all 82 nucleotide sequences and 5208 nucleotide positions.

Morphological data analyses
In total, 63 intact skulls of adult Chiromyscus (15 C. chiropus, 35 C. langbianis and 13 C. thomasi) obtained from 19 genetically-investigated localities in Vietnam (see Table 1 and Suppl. material 1: Tables S1 for references) were measured for morphometric comparison and analyses. Age was assessed by tooth wear and closure of cranial sutures. Due to the limited sampling, sexual differences were not especially tested; the possible sexual bias was compensated for by equalisation of representatives of different sexes in the sample. The sex ratio did not exceed 15 percentage points.
Principal components analysis (PCA) and canonical discriminant function analysis (DFA) were used to evaluate "distinctiveness" amongst the samples. A one-way analysis of variance (ANOVA) was performed to test the differences amongst groups on all cranial variables. The software programme Statistica 8.0 (StatSoft Inc., Tulsa, OK, USA) was used for all analytical procedures.

Phylogenetic subdivision and relationships
The most representative tree was constructed for 66 Cyt b sequences. The trees were well supported (PP = 1) (Fig. 2). Different geographical populations of  Fig. 1 and Suppl. material 1: Table S1.
C. chiropus formed a single homogeneous cluster, while C. thomasi and C. langbianis populations were represented by a few geographically-segregated clusters. C. thomasi split into two clusters, which we conventionally called the Northern and Southern phylogroups. The haplotypes forming the Northern cluster were distributed locally over the north-western part of Vietnam, whereas Southern haplotypes spread substantially more widely, stretching to the south to the Tay Nguyen Plateau (also known as the Central Highlands) and westwards to the extreme northwest of Laos. Chiromyscus langbianis formed three coherent geographically-distributed clusters. The first from the Dalat Plateau, the terra typica of this species; named hereafter as the Southern lineage. Samples originating from continental Indochina, southern Yunnan and north-eastern Vietnam in the north to the Tay Nguyen Plateau in the south formed a second cluster, the Northern lineage. Another cluster, which was sister to the Northern lineage, contained samples from Hainan Island ( Figure  2). The genetic divergence within and between these groups is shown in Tables 2, 3. The estimated genetic distances (d) for intraspecific C. langbianis and C. thomasi phylogroups were not very high and fell within the limits of 0.036 and 0.063.
Another tree constructed using the mitochondrial COI gene of 43 samples revealed another additional specific phylogenetic lineage (Suppl. material 1: Figure S2), with divergence levels as high as those for the groups described above. (Table 3). It was formed by a single sample from the Cardamom Mountains, Cambodia. Unfortunately, this was the only sample of this group and, so far, there are no data about the distribution of this genetic lineage in southern Indochina. The clustering pattern of C. thomasi samples was similar to that obtained using Cyt b, with C. chiropus demonstrating two reliable subclusters, combining the animals from the Dalat Plateau foothills (Lam Dong Province) and lowland populations (Tay Ninh, Dong Nai and Ba Ria-Vung Tau Provinces), respectively. Thus, mitochondrial genes supported C. langbianis split into four geographically-distributed phylogroups, whereas C. thomasi and C. chiropus formed two distinctive phylogroups each.
Phylogenetic reconstructions, based on nuclear genes, did not allow us to clarify the relationships and the taxonomic rank of the distinctive phylogroups identified. Thus, only species-level clusters were reliably traced by the RAG1 gene tree constructed for the 26 available samples (Suppl. material 1: Fig. S3). The overall level of divergence was low and genetic distances did not exceed 0.01. The same species-level groups were identified by the GHR gene (Suppl. material 1: Fig. S4), of which we included 52 samples. Within C. langbianis, complex soft polytomy without notable geographic segregation was traced, whereas within C. thomasi, two clusters corresponding to the mitochondrial phylogenetic lineages mentioned above were clearly demonstrated. At the same time, the considerable length of the branches was apparent for C. thomasi and for some specimens of C. langbianis from the Dalat Plateau. These branches were significantly longer than those characteristic of C. chiropus and most of the C. langbianis samples, a trait that may indicate a special pattern of its evolutionary history and, in particular, the longer evolutionary age of these populations. Species-level clusters may also be traced in the IRBP gene tree, of which we had 49 samples (Suppl. material 1: Fig. S5). Within the C. langbianis cluster, no geographical segregation was traced, which may indicate incomplete sorting of lineages; on the other hand, C. thomasi  showed deep trichotomy. The samples corresponding to the Southern cluster of mitochondrial lineages were represented here by two independent branches, which formed populations from the Tay Nguyen Plateau and populations distributed further to the north. Similar to the GHR gene tree (Suppl. material 1: Fig. S4), the inequality of the phylogenetic branch lengths should garner attention. However, in contrast to the GHR gene, all samples of C. langbianis from the Dalat Plateau recovered longer branches. In general, it can be concluded that C. langbianis and C. chiropus showed significant homogeneity in nuclear genes, whereas C. thomasi had the same pattern of nuclear gene variation as traced by mitochondrial markers.
In addition to low support levels for many nuclear gene clades, tree-bisection-reconnection branch-swapping (PBS) analysis indicates an existence of conflicting phylogenetic signals, especially for segments within C. langbianis. In general, the low posterior probability values for internal branches and the conflicting phylogenetic signals in many lineages can be explained by a significantly slower evolution rate of nuclear genes (generally weak phylogenetic signal) and incomplete lineage sorting that may be the result of symplesiomorphy. The tree which constructed the concatenated sequence (Fig. 3) is consistent with nuclear gene trees, but posterior probabilities values for some internal nodes are lower, mainly  Fig. 1 and Suppl. material 1: Table S1. inside C. langbianis. The Hainan cluster is not monophyletic, as one of the samples was recovered in the Northern continental cluster.

Morphological analyses
The descriptive statistics of the craniodental measurements for phylogenetic lineages of C. langbianis (two of four phylogroups discovered were available) and C. thomasi that were identified by the abovementioned analyses are summarised in Suppl. material 1: Tables S3, S4. Craniodental measurements for C. chiropus are given in Suppl. material 1: Table S5. As revealed by the intergroup F-test, the populations under study demonstrated notable peculiarities of cranial morphology. The samples were significantly different (p < 0.05 or lower) from each other in 18 and 7 cranial characters for C. langbianis and C. thomasi, respectively.
In a principal components analysis (PCA) drawing on 20 craniodental measurements, the first two axes captured 60.9% (mainly reflecting general size) and 6.4% of the total variation, respectively. ONL, ZB, IB, LD, PPL and CLM1-3 were the six measurements that had the highest correlations with PC 1 (Suppl. material 1: Table S6). In the PCA of skull measurements, all three species of Chiromyscus overlapped and C. langbianis showed the largest range of variation amongst these species (Fig. 4A). Discriminant function analysis (DFA), which drew on the same variables, provided another means of illuminating the morphometric distinctions and the first two axes captured 53.6% and 25.4% of the variation (Suppl. material 1: Table S3). The DFA yielded moderate to high discrimination amongst all species and genetic lineages (Fig. 4B).

Taxonomic implications
The concordance of morphological and genetic traits and a good separation of samples in 3D factor space indicate the morphological specificity of the studied populations. On the other hand, the concordant pattern of morphological, genetic and clear geographic subdivision of the mitochondrial phylogroups allow us to question the taxonomic status of these populations; in particular, they allow us to attribute the observed genetic lineages to distinct taxa.
The Northern genetic lineage of C. langbianis must be undoubtedly assigned to subspecies C. l. indosinicus Osgood, 1932. This taxon was described as Rattus indosinicus by Osgood (1932) from Sapa in northern Vietnam, Lao Cai Province. The Northern genetic lineage of C. thomasi has to be attributed to the nominotypical subspecies C. t. thomasi (this phylogroup includes the holotype of C. thomasi). The appropriate name for the Southern genetic lineage of C. thomasi is debatable. It may be associated with Rattus indosinicus vientianensis Bourret, 1942, described from the surroundings of Vientiane, Laos. However, Musser (1973) treated vientianensis as a younger synonym of langbianis and, in our previous survey where the most recent genus revision has been made (Balakirev et al. 2014), we also supposed that nomen vientianensis should be associated with C. langbianis. Unfortunately, we have no specimens from the neighbouring Vientiane and we cannot identify which of the species is distributed there. The holotype of vientianensis is unavailable.
The genetic distances between the two phylogroups of C. thomasi correspond minimally to the subspecific level (Baker 2006). However, despite their considerable ages, these groups have no visually remarkable morphological differences (see Suppl. material 1: Table S3), so we cannot give here formal description of a new subspecies and assert only that the southern populations of C. thomasi belong to a unique monophyletic genetic lineage.

Phylogeography and recent fauna formation
Tree rats are usually confined to forest environments and their dispersal is restricted by the forest edge. They usually do not spread beyond these limits and never cross wide deforested areas as they do not feel confident on open ground surface (Musser 1981;Corbet and Hill 1992;our personal observations). The main kind of natural event that contributed to the dispersal, population segregation and speciation of mammalian fauna is, most probably, repeated disjunction-reconnection events of natural populations that were associated with areas covered by tropical forest during the late Miocene-Holocene (Hall 1998). The distribution pattern and genetic diversification of the genetic lineages revealed in the genus Chiromyscus shed light on the natural history and range formation of these species. Based on genetic data, the population of C. langbianis inhabiting the Dalat Plateau is apparently older than other recent continental populations. It can be assumed that the Dalat Plateau served as its main refugium during the Holocene climate oscillations (see also Meschersky et al. 2016;Abramov et al. 2017 for another rodent species). Judging by observed genetic distances and the homogeneity of the Northern genetic lineage of C. langbianis, its expansion beyond the Plateau occurred fairly quickly. Moreover, there is a reason to suppose multiple refugia, as supported by the fact that the most northern Hainanese cluster occupies a basal position in relation to the continental lineages. Noteworthy, to date, the Hainanese haplogroup may occur amongst continental populations. This may indicate both incomplete lineage sorting in this pair of clusters and an ancient hybridisation event between insular and continental populations. Multiple reconnection events that occurred during the Pleistocene make the second scenario possible. This gives some reason to believe that colonisation of the Island initiated from a different population, not the one that inhabits the Dalat highlands. Instead, it probably originated from an additional northern refugium. The colonisation of Hainan by C. langbianis might have happened simultaneously with those of other Muridae (Niviventer and Rattus), which are currently represented by distinct insular populations (Pan et al. 2007;Li et al. 2008;Smith and Xie 2008). This event could be dated back to the Late Miocene. According to Voris et al. (2000), Hainan had been connected to the mainland when the sea level was 120-75 m below the current level, which has happened many times, with the longest connections occurring at approximately 0.25, 0.15 and 0.017 Mya. However, judging by the estimated time of species level genetic lineage divergence (over 1 Mya, Fig. 3, Table 4), all of them were formed much earlier than these dates and cannot be associated with recent insularisation. It should be noticed that estimates, evaluated for divergence time for Muridae, are slightly higher than proposed earlier (Rowe et al. 2011;Pages et al. 2016); however, the genus Chiromyscus was represented there by only a single individual. Our finding provides evidence in support of more complex patterns of its evolutionary history. In any case, even if our estimates are closer to the higher limit of generic age determined earlier (Fabre et al. 2013;Pages et al. 2016), these timings for group split points are significantly older than the last events of the Hainan-Mainland reconnection. The latter supports the hypothesis of their formation in the continental refugia during the Late Miocene. On the other hand, the occurrence of another original genetic lineage in southern Cambodia, which is an even more ancient separation than the Hainanese, indicates that there may have been several insularisation and resettlement events and that "distribution waves" originated from the Dalat and any other refugia during the Pleistocene.
The split of C. thomasi into the Northern and Southern phylogroups happened before the split of the corresponding C. langbianis phylogroups and apparently is associated with antecedent global natural factor fluctuations. However, the recent distribution pattern of these species indicates that their natural history differs significantly amongst the populations that originate from different dispersion centres/refugia. As far as can be traced by the data available, C. thomasi does not reach the Dalat Plateau and more southern regions inhabited by C. chiropus and the Southern lineage of C. langbianis. At the same time, C. thomasi (both Northern and Southern phylogroups) appears to be distributed sympatrically with the Northern phylogroup of C. langbianis in most of eastern Indochina. This indicates that their possible migration routes alongside the Annamite Range occurred in two opposite directions, with C. thomasi moving northwards and C. langbianis moving southwards. The fact that C. thomasi did not participate in mammal fauna formation on Hainan Island supports the recent natural area expansion of C. langbianis and are probably explained by ecological factors. Namely, these phenomena may reflect the ecological preferences of this species. C. thomasi is known to be more strictly associated with mountain forest formations than C. langbianis, showing greater habitat versatility, which apparently allowed the latter to spread much further eastwards along the plains of eastern Indochina.
On the other hand, the significant genetic homogeneity of C. chiropus, which inhabits forest formations everywhere in the extreme south of Indochina and its basal position in relation to the genetic lineages of C. langbianis, may indicate that these recent populations diverged significantly earlier. This finding also indicates that forest refugia remained at the southern part of Indochina throughout the Holocene and even earlier. They could be associated not only with the Dalat Plateau, but also with the Cardamom Mountains, Bolaven Plateau and probably also with some of the offshore islands on the shelf of the Gulf of Siam.
The distribution pattern of Chiromyscus species in the region also raises the problem of the initial intrusion and distribution of C. chiropus in eastern Indochina. The terra typica for this species is the Karen Mountains in eastern Myanmar, where the species inhabit mountainous forests. As we pointed out earlier (Balakirev et al. 2014), Burmese specimens do not demonstrate noticeable morphological differences from the southern Vietnamese populations and these are treated as conspecific. Unfortunately, there are still no genetic data from Burmese populations that would allow direct comparison of their genetic identity. However, the wide distribution of this species to the east in eastern Indochina through the Yunnan and Annamite Ranges is hampered by the wide distribution of another species, namely, C. thomasi, which populates these regions. No cases of sympatry are currently documented, which may suggest a competitive exclusion in this pair of species. At the same time, the existence of a direct connection between the Malacca and southern Indochina in the Holocene by a forest corridor cannot be excluded. Based on data of Meijaard et al. (2003) on tropical forest persistence and the distribution of forest-dependent species on islands of the South China Sea and a forest connection between southern Indochina and Malacca, a southern expansion route is probable. Nevertheless, there are no records on the current distribution of this species in the lowland areas in central Indochina to the west from 105°E.

Conclusions
We show that the genetic distances between phylogroups of C. langbianis and C. thomasi correspond to the subspecific level at least. However, these phylogenetic groups do not demonstrate obvious univocal diagnostic differences in cranial features suitable for species diagnoses without special statistical analysis. Our study shows that the recent phylogenetic structure of C. langbianis is the most recent within the genus and appears within several independent refugia that remained isolated throughout the Pleistocene. In turn, the phylogroups of C. thomasi are likely older than those of C. langbianis. Environmental factors and species preferences followed recent natural ecological shifts which drove allopatry. However, C. chiropus demonstrates the greatest age; the ways of formation of the area of this species still remain obscure and are likely to be associated with changes in forest cover in Indochina and Malacca Peninsula during the Pleistocene. The possibility of competitive interaction of these species in the process of formation of their recent natural areas also cannot be excluded.