The update of the phylogenetic structure of Q1b haplogroup based on full Y-chromosome sequencing
Received: May 10 2014; accepted: May 12 2014; published: May 20 2014. Correspondence: gurianov.vm@gmail.com acgt@yfull.com
Vladimir Gurianov, Roman Sychyev, Vladimir Tagankin, Vadim Urasin
1 Independent Researcher, Russia, 2 YFull Research Group, Russia.
Abstract
New full Y-chromosome sequencing data made it possible to update the structure of the Q1b (Q-L275) haplogroup and to identify new subclades: Q-Y2990 (downstream of Q-Y2250), Q-Y2225 (downstream of Q-Y2220) and Q-Y3030 (downstream of Q-Y2200). This lays the groundwork for further research into the inner structure of these subclades and for comparing their present ethno-population composition with the migrations of the Indo-European tribes.
Introduction
In the short period since the publication of the article by V. Gurianov et al. (2013) 1, several full Y-chromosome sequencing samples belonging to the Q1b (Q-L275) haplogroup and its downstream subclades have become publicly available.
The analysis of the new data made it possible to update the phylogenetic structure of the Q1b haplogroup, to identify new subclades, to perform more in-depth typing of a range of scientific samples, and to propose a plausible hypothesis on the prehistoric migration routes of the Indo-European tribes.
1 V. Gurianov et al. (2013) Phylogenetic Structure of Q-M378 Subclade Based on Full Y-Chromosome Sequencing. The Russian Journal of Genetic Genealogy, Volume 5, No. 1, 56-74.
Source Data and Methodology
Data Sets for Comparison
The data on the examined samples are summarised in the table below:
Table 1. Information on the researched samples of full Y-chromosome sequencing.
| Sample code | Population | Verified origin | Source of the information |
|---|---|---|---|
| HGDP00100 | Hazara | Pakistan | Lippold et al. (2014) 2 |
| HGDP00129 | Hazara | Pakistan | Lippold et al. (2014) |
| HGDP00165 | Sindhi | Pakistan | Lippold et al. (2014) |
| PGP193 | N/A | N/A 3 | The Personal Genome Project 4 |
| Eu1 | Italians | Sicilia, Italy | Provided by a volunteer 5 |
| Eu2 | Portugal | Azores, Portugal | Provided by a volunteer 5 |
2 Sebastian Lippold et al. (2014) Human paternal and maternal demographic histories: insights from high-resolution Y chromosome and mtDNA sequences, doi: 10.1101/001792
3 Current location: California, USA.
4 http://www.personalgenomes.org/
5 The test was performed by Full Genomes Corporation (FGC) at the Beijing Genomics Institute on an Illumina HiSeq 2000 sequencer, with the following parameters: 50x coverage at a read length of 100 base pairs.
The Russian Journal of Genetic Genealogy ( ): 6, 1, 2014 ISSN: 1920-2997 http://ru.rjgg.org RJGG
Genotyping
Data sets in BAM format (BAM/SAM Specification 6) and, in the case of PGP193, in TSV 7 format were used for the research. The parameters of the next-generation sequencing (NGS) of the Eu1 and Eu2 samples, performed by Full Genomes Corporation at the Beijing Genomics Institute, are the same as those described previously in the article by V. Gurianov et al. (2013).
Data Processing and Analysis
Full Y-chromosome sequencing data were processed and analysed using software developed by the YFull research group 8 and VCFtools 9.
Each sample was analysed both for SNPs discovered during the research and for SNPs included in the ISOGG list under the Q1b haplogroup and its downstream subclades.
A new SNP was considered discovered if the mutation was present in more than two unrelated male samples and the new SNP was consistent with the previously known phylogenetic structure of the respective subclade.
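The first part of the discovery criterion (presence of the mutation in more than two unrelated male samples) can be sketched roughly as follows. This is only an illustration and not the YFull software: the function name, the sample codes, the family labels and the positions are all hypothetical, and the second part of the criterion (consistency with the known phylogenetic structure) is not checked here.

```python
from collections import defaultdict


def shared_snp_candidates(derived_calls, family_of, min_unrelated=3):
    """Return positions whose derived allele is seen in enough unrelated samples.

    derived_calls: dict mapping sample id -> set of positions with a derived call.
    family_of: dict mapping sample id -> family label; relatives share a label.
    min_unrelated: 3 encodes "more than two" unrelated samples.
    """
    families_at = defaultdict(set)
    for sample, positions in derived_calls.items():
        # Samples without a family label are treated as unrelated to everyone.
        family = family_of.get(sample, sample)
        for position in positions:
            families_at[position].add(family)
    return sorted(p for p, fams in families_at.items() if len(fams) >= min_unrelated)


# Hypothetical data: S1 and S2 are relatives, so they count only once.
derived_calls = {
    "S1": {1000, 2000},
    "S2": {1000, 2000},
    "S3": {1000},
    "S4": {1000, 2000},
}
family_of = {"S1": "fam_A", "S2": "fam_A", "S3": "fam_B", "S4": "fam_C"}
candidates = shared_snp_candidates(derived_calls, family_of)
```

Counting families rather than samples is one simple way to keep relatives from inflating the support for a candidate SNP; the threshold is a parameter so that a different reading of the criterion can be tried.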
The research also specified the phylogenetic position of SNPs previously described in the article by V. Gurianov et al. (2013).
6 The specification in force is located here: https://github.com/samtools/hts-specs
7 TSV (Tab Separated Values) is a text format for presenting table values.
8 http://www.yfull.com/
9 http://sourceforge.net/projects/vcftools/
Results
Eu1 Data Analysis
The findings for the Eu1 sample were promptly submitted to ISOGG and, as of the date of this article, have already been included in the current version of the ISOGG SNP Tree. Nevertheless, we consider it necessary to describe in detail the SNPs revealed and the resulting changes in the structure of the subclades downstream of Q-Y2220.
The Q-Y2225 level and the SNPs common to AJ1, PGP193 and Eu1 were defined by comparing the Eu1 sample with the samples in the YFull database.
The SNPs defining this level are listed in the table below.
Table 2. SNPs of the Q-Y2225 level.
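| Position (hg19) | Ancestral value | Value positive for SNP | SNP name |
|---|---|---|---|
| 23646920 | C | T | Y2196 |
| 22471554 | A | T | Y2201 |
| 19425984 | G | A | Y2206 |
| 19053060 | C | T | Y2207 |
| 18207170 | A | G | Y2208 |
| 18046486 | T | C | Y2210 |
| 18043999 | G | A | Y2211 |
| 15834557 | G | A | Y2213 |
| 15658212 | C | T | Y2214 |
| 14385853 | T | G | Y2215 |
| 9892635 | C | T | Y2219 |
| 8662585 | C | A | Y2224 |
| 6949449 | C | T | Y2225 |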
Consequently, the Q-Y2200 subclade, which may now claim to correspond more accurately to the Jewish cluster of the Q-L245 branch, is currently defined by the following SNPs of a single level:
Table 3. SNPs of the Q-Y2200 level.
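| Position (hg19) | Ancestral value | Value positive for SNP | SNP name (Y) |
|---|---|---|---|
| 22953894 | A | G | Y2197 |
| 22825080 | A | G | Y2198 |
| 22588598 | C | T | Y2200 |
| 21277083 | G | A | Y2203 |
| 16994660 | T | A | Y2212 |
| 14353022 | A | C | Y2216 |
| 14184253 | C | A | Y2118 |
| 9401947 | C | A | Y2221 |
| 4606181 | C | T | Y2231 |
| 3995524 | G | A | Y2232 |
| 3148720 | A | G | Y2233 |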
Since, according to the phylogenetic structure built on the basis of STR markers, the Eu1 sample is located in the centre of the Q-L245 European cluster (and shows the typical value DYF395S1=15-17, which is ancestral), we may reasonably assume that many of its private SNPs will form branches of the tree once close samples become available for comparison. For this reason the private SNPs of the Eu1 sample are listed separately in Schedule 1.
Eu2 Data Analysis
The Eu2 sample was originally known to be positive for the private SNP L327. Comparison of this sample with others stored in the YFull database defined a new branch, Q-Y2990, downstream of Q-Y2250 and parallel to the Iranian branch Q-L301.
Table 4. SNPs of the Q-Y2990 branch.
| Position (hg19) | Ancestral value | Value positive for SNP | SNP name (Y) |
|---|---|---|---|
| 7929100 | A | C | Y2986 |
| 5398133 | A | T | Y2987 |
| 15540398 | G | A | Y2988 |
| 15656595 | A | C | Y2989 |
| 17455705 | C | G | Y2990 |
| 18205189 | C | A | Y2991 |
| 18427622 | C | T | Y2992 |
| 21794826 | T | C | Y2993 |
| 21824228 | C | T | Y2994 |
| 22779292 | G | A | Y2995 |
| 23574588 | G | T | Y2996 |
| 6675390 | A | G | Y2997 |
The stated branch currently includes two samples: Eu2 and Kz1.
It is worth noting that on the phylogenetic tree built from the values of 67 STR markers the Eu2 haplotype is located near the root of the tree; it was therefore possible to calculate the time when Q-M378 began to split actively into two subclades, Q-L245 and Q-Y2250. Two calculations, made with the MURKA software 10 and with the method of random pairs of STR haplotypes 11, put this time at 5000 years ago.
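The article does not reproduce the formulas behind these estimates. As a rough illustration of how a time estimate can be derived from a pair of STR haplotypes, the sketch below uses a generic average squared distance (ASD) estimator under a symmetric stepwise mutation model, in which the expected squared difference per marker is 2 * mu * t. It is not the MURKA algorithm and is not claimed to reproduce the procedure of Adamov et al.; the function name, the haplotypes, the mutation rate and the generation length are all hypothetical values chosen for the example.

```python
def asd_tmrca_years(hap_a, hap_b, mu_per_marker, years_per_generation):
    """Estimate the time to the common ancestor of two STR haplotypes.

    Generic ASD sketch: under a symmetric stepwise mutation model the
    expected squared difference in repeat counts at one marker is
    2 * mu * t, where t is the number of generations to the ancestor.
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotypes must be non-empty and of equal length")
    # Mean squared difference in repeat counts across all markers.
    mean_sq_diff = sum((a - b) ** 2 for a, b in zip(hap_a, hap_b)) / len(hap_a)
    generations = mean_sq_diff / (2.0 * mu_per_marker)
    return generations * years_per_generation


# Hypothetical four-marker haplotypes and hypothetical rate parameters.
hap_1 = [13, 24, 14, 11]
hap_2 = [13, 25, 14, 10]
estimate = asd_tmrca_years(hap_1, hap_2, mu_per_marker=0.002, years_per_generation=25)
```

In practice such estimates are averaged over many pairs and many markers, and they depend strongly on the assumed mutation rate, so a single pair as above is only a demonstration of the arithmetic.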
10 MURKA: http://sourceforge.net/projects/phylomurka/
11 Adamov et al. (2011) TMRCA assessment through the method of random pairs of STR haplotypes: http://rjgg.org/index.php/RJGGRE/article/view/83/102, http://www.semargl.me/ru/dna/ydna/tools/asd-pairs/
Data Analysis of Samples under Human Genome Diversity Project (HGDP) from Pakistan
The above-mentioned paper by Lippold et al. (2014) describes three samples from two Pakistani ethnic groups: Hazaras and Sindhis.
Sample HGDP00129, from a Hazara from Northern Pakistan, can be assigned to the Q-L245 level on the basis of the following SNPs:
Table 5. SNPs showing that HGDP00129 sample belongs to Q-L245 level.
| Position (hg19) | Ancestral value | Value positive for SNP | SNP name (Y) | SNP name (FGC) |
|---|---|---|---|---|
| 9382621 | G | T | Y2222 | FGC1902 |
| 17860015 | G | T | Y2139 | FGC1849 |
| 23733052 | A | G | Y2148 | FGC1879 |
______________________________________
For the avoidance of doubt, we note that no source known to us proves a connection between the Hazaras and the population of the Khazar Khaganate. The Hazaras (from Persian هزار, hezār, "thousand") are Shia Muslims of Mongol or Iranian origin who speak an Iranian language and live in central Afghanistan (8-10% of the country's total population). They speak the Hazara dialect, or a dialect of the Dari language; some of them speak Mongolian. The historical area of Hazara settlement in Afghanistan is the Hazarajat region, which is divided among several provinces of contemporary Afghanistan. Sengupta et al. (2006) 12 identify the following Y-chromosome haplogroup structure of the Hazaras: C-M217 40% (10/25), R1b-M73 32% (8/25), O-M122 8%. Q1b-M378 is reported in that study in only one person. A detailed analysis of the currently known Hazara haplotypes was presented by Sabitov in his article on the origin of the Hazaras from the point of view of DNA genealogy 13.
12 Sanghamitra Sengupta et al., Polarity and Temporality of High-Resolution Y-Chromosome Distributions in India Identify Both Indigenous and Exogenous Expansions and Reveal Minor Genetic Influence of Central Asian Pastoralists, Am J Hum Genet. 2006 February; 78(2): 202-221. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1380230/
13 Sabitov Zhaksalyk, Origin of the Hazara from the point of DNA genealogy, The Russian Journal of Genetic Genealogy, Volume 2, No. 1, 2010, page 38. rjgg.org/index.php/RJGGRE/article/download/42/53
SNPs Y2139 and Y2148 were found to be positive in samples PGP130 and PGP193, as well as in the tested samples of the Q-L245 level (AJ1, AJ2, Ar1). However, they are negative in all Q-L275 (xL245) samples, namely Ir1, Kz1, Eu2, HG03914, HG03652 and HG03864. The situation is similar for SNP Y2222 (except that it could not be called for PGP130). Therefore, the tested sample HGDP00129 very likely belongs to para-subclade Q-L245*. Unfortunately, the quality of sequencing does not allow the position of the sample on the phylogenetic tree to be defined more accurately. The same applies to the two other HGDP samples.
Sample HGDP00165, which belongs to a Sindhi from Southern Pakistan, can be assigned to the Q-Y2250 level on the basis of the following SNPs:
Table 6. SNPs showing that HGDP00165 sample belongs to Q-Y2250.
| Position (hg19) | Ancestral value | Value positive for SNP | SNP name (Y) | SNP name (FGC) |
|---|---|---|---|---|
| 6894323 | C | T | Y2245 | PR683 |
| 24452225 | G | C | Y2270 | FGC4676 |
SNPs Y2056, Y2091 and F1349 are common to HGDP00129 and HGDP00165 but negative in HGDP00100, which shows that both samples belong to subclade Q-M378. All SNPs of Q-Y2990 turned out to be negative in sample HGDP00165; the latter therefore belongs to para-subclade Q-Y2250 (xQ-Y2990).
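The placement logic used for the HGDP samples (positive for the SNPs of one level, negative for the SNPs of a downstream branch) can be illustrated with a small sketch. The SNP names are those listed in Tables 4, 5 and 6, which show only the markers reported in this article rather than every SNP defining each branch; the function, the dictionary layout and the status labels are assumptions made for this example and do not describe the authors' software.

```python
# Marker SNPs as listed in Tables 4, 5 and 6 of this article
# (a subset of each branch's defining SNPs, not a complete list).
BRANCH_SNPS = {
    "Q-L245": ["Y2222", "Y2139", "Y2148"],
    "Q-Y2250": ["Y2245", "Y2270"],
    "Q-Y2990": ["Y2986", "Y2987", "Y2988", "Y2989", "Y2990", "Y2991",
                "Y2992", "Y2993", "Y2994", "Y2995", "Y2996", "Y2997"],
}


def branch_status(calls, branch):
    """Summarise a sample's calls for one branch.

    calls maps SNP name -> True (derived), False (ancestral) or None (no call);
    SNPs missing from the dict are also treated as not called.
    """
    observed = [calls[s] for s in BRANCH_SNPS[branch] if calls.get(s) is not None]
    if not observed:
        return "no call"
    if all(observed):
        return "positive"
    if not any(observed):
        return "negative"
    return "mixed"


# Calls for HGDP00165 as stated in the text: positive for the Table 6 SNPs,
# negative for every Q-Y2990 SNP; its Q-L245 calls are not given in the text.
hgdp00165 = {"Y2245": True, "Y2270": True}
hgdp00165.update({snp: False for snp in BRANCH_SNPS["Q-Y2990"]})
statuses = {branch: branch_status(hgdp00165, branch) for branch in BRANCH_SNPS}
```

A sample that is positive for Q-Y2250 and negative for Q-Y2990 corresponds to the notation Q-Y2250 (xQ-Y2990) used above; a "mixed" or "no call" status would mean that the available reads do not support a placement.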
Sample HGDP00100, which belongs to a Hazara from Northern Pakistan, can be clearly assigned to the Q-L275 branch, both on the basis of the information given above on positive calls for previously identified SNPs of the same level as L275 and on the basis of the following:
1) All three samples are positive for SNPs L314, F1169, F1337 and F1528, which are of the same level as L275.
2) SNP F753 (hg19 3714320) is positive in all three tested samples and in all Q-L275 samples in the YFull database; the same is true of F1205 (hg19 8440399).
Similarly, SNPs Y1189, Y1209, Y1218, Y1232, Y1263, L68/S329/PF3781 (hg19 18700150) and YP505 (hg19 6388256) are positive in all Q-L275 Y1150+ samples and in HGDP00100.
The above-mentioned paper by Lippold et al. (2014) included a phylogenetic scheme of the Q haplogroup in which the relative placement of samples HGDP00100, HGDP00129 and HGDP00165 supports our conclusions. However, no in-depth SNP analysis of specific branches was made there; samples HGDP00129 and HGDP00165 were shown on the scheme as being of a single level.
We were thus able to specify the phylogenetic position of the three HGDP samples and to assign them to the following branches:
The high level of genetic diversity within a single population and geographic region indicates that the territory of contemporary Pakistan and Afghanistan played a key role in the past spread of the Q-L275 haplogroup.
We note here that an original presence in Central Asia of a population belonging to this haplogroup (a pre-Indo-European substrate) looks more probable than its arrival in the region together with the Indo-Europeans. However, the mixing of the indigenous population with the one that came from the north may have resulted in a new community in which the L275 haplogroup was a minor one; its further spread was connected with the migrations of the Indo-Europeans to India and Western Asia.
Given the presence of people belonging to the Q-L275 haplogroup in Central Asia by the close of the 1st millennium B.C., as shown by paleoDNA research, the territory of contemporary Pakistan and Afghanistan is considered a transit zone that carried the main migration routes of the Indo-European tribes (which also included representatives of the Q-L275 haplogroup) to Hindustan through the Hindu Kush (Q-Y1150), as well as in the direction of Western Asia (Q-Y2250 and Q-L245). PaleoDNA research performed by Chinese scientists on finds from archaeological excavations in Central Asia demonstrates the presence of Q haplogroup representatives in these lands; 6 Q1a and 4 Q1b were found in the Black Gouliang barrow to the east of the Barkol Basin, at the ruins of Hami (Kumul). 14 Judging by the location of the bodies in the barrow, it may be concluded that the representatives of the Q1b haplogroup were of a higher social status.
The Hami oasis was located on the Great Silk Road near Turfan and Khotan (Yarkend). The barrow is dated to the Early (Western) Han period (2nd-1st centuries B.C.).
Part of the contemporary Uyghur population may be direct descendants of Q1b carriers who settled in ancient Central Asia. The studies of Hua Zhong et al. (2010) 15 and Wenjuan Shan et al. (2014) 16 find the Q1b haplogroup only among the people of Xinjiang. Unfortunately, only 17-marker haplotypes are at our disposal, which prevents us from drawing any definite conclusions.
14 Li Hongjie, Y chromosome genetic diversity of ancient population in the Northern China, Jilin University, 2012. http://cdmd.cnki.com.cn/Article/CDMD-10183-1012365432.htm
15 Zhong et al., Extended Y-chromosome investigation suggests post-Glacial migrations of modern humans into East Asia via the northern route // Molecular Biology and Evolution, first published online: September 13, 2010, doi: 10.1093/molbev/msq247 (among four populations of Uyghurs from Xinjiang one such person was found in each of two populations: 1 out of 71, 1 out of 18).
16 Wenjuan Shan et al. (2014) Genetic polymorphism of 17 Y chromosomal STRs in Kazakh and Uighur populations from Xinjiang, China. http://link.springer.com/article/10.1007/s00414-013-0948-y
PGP193 Data Analysis
We assigned sample PGP193 to the Jewish cluster of the Q-L245 branch (Y2225+ Y2200+). All SNPs of the stated branches (except for several 17) were found to be positive for PGP193.
17 Y2114 not read (level Q-Y2225), Y2232, Y2233 and Y2212 (level Q-Y2200).
Unfortunately, this sample is anonymous, so it is impossible to confirm that the tested person is of Jewish origin.
Nevertheless, the sample was compared against the private SNPs of samples AJ1 and AJ2 described in the article by V. Gurianov et al. (2013). The results are summarised in the table below:
Table 7. SNPs of the Q-Y3030 branch.
| Position (hg19) | Ancestral value | Value positive for SNP | SNP name (Y) | SNP name (FGC) |
|---|---|---|---|---|
| 6985833 | G | C | Y2746 aka YFS028180 | FGC4836 |
| 7116693 | C | G | Y3026 aka YFS028187 | FGC4837 |
| 14683323 | G | A | Y3027 aka YFS028303 | |
| 17842405 | G | A | Y3028 aka YFS028379 | FGC4845 |
| 18697269 | A | G | Y2750 aka YFS028399 | FGC4846 |
| 22545510 | G | T | Y3029 aka YFS028485 | FGC4850 |
| 22989959 | T | C | Y3030 aka YFS028498 | FGC4853 |
| 23338485 | T | C | Y2751 aka YFS028509 | FGC4854 |
It is currently difficult to speculate on how the new SNP structure of the Q-L245 subclade overlaps with the earlier phylogenetic structures based on 67 Y-chromosome STR markers, not least because STR marker data for PGP193 are not available for the research. Moreover, sample AJ1 has DYF395S1=15-19, which is typical of the majority of Q1b Ashkenazi Jews, whereas AJ2 has a unique DYF395S1=15-15 (apparently a consequence of RecLOH).
Final Conclusions
The research resulted in an update of the Q1b (Q-L275) haplogroup structure and in the identification of new subclades: Q-Y2990 (downstream of Q-Y2250), Q-Y2225 (downstream of Q-Y2220) and Q-Y3030 (downstream of Q-Y2200).
It lays the groundwork for further research into the inner structure of these subclades and for comparing their present ethno-population composition with the migrations of the Indo-European tribes, which contributed to the formation of these ethnic groups.
The updated findings on the phylogenetic structure of the Q1b haplogroup are shown in the following scheme.
SNP Phylogenetic Tree of Q1b Haplogroup.
The changes made to the SNP scheme of Q1b haplogroup compared to the one published by V. Gurianov et al. (2013) are included in Schedule 2.
Acknowledgements
The authors of the article wish to thank the following people, who rendered their assistance in its preparation and conducting the research:
Alessandro Biondo (Italy)
Leon Kull (Israel)
Justin Allen Loe (USA)
Linda Magellan (USA)
Olga Vasilyeva (United Kingdom)
Schedule 1. Private SNPs for Sample Eu1.
| Position (hg19) | Ancestral value | Value positive for SNP | SNP name (YFull internal notation) |
|---|---|---|---|
| 3131205 | T | C | YFS068595 |
| 3232026 | C | T | YFS068596 |
| 3232027 | A | G | YFS068597 |
| 3403647 | C | A | YFS068599 |
| 3704060 | A | G | YFS068605 |
| 6702576 | C | G | YFS068611 |
| 6881382 | C | T | YFS068612 |
| 7139179 | A | T | YFS068614 |
| 7222827 | C | T | YFS068615 |
| 8356720 | T | A | YFS068618 |
| 8467849 | G | A | YFS068619 |
| 8592711 | T | C | YFS068620 |
| 8990561 | G | C | YFS068621 |
| 9415377 | T | G | YFS068622 |
| 13828699 | C | T | YFS068627 |
| 14545910 | T | C | YFS068629 |
| 15269498 | T | C | YFS068630 |
| 15455814 | T | C | YFS068631 |
| 16255444 | C | T | YFS068634 |
| 17402893 | A | T | YFS068639 |
| 17722084 | G | T | YFS068640 |
| 18148788 | G | A | YFS068641 |
| 18158679 | T | C | YFS068642 |
| 18394566 | C | G | YFS068643 |
| 19060348 | C | T | YFS068644 |
| 19130251 | A | G | YFS068645 |
| 19130253 | T | C | YFS068646 |
| 19166462 | C | T | YFS068647 |
| 21329851 | C | G | YFS068654 |
| 21555930 | T | C | YFS068655 |
| 22519498 | G | A | YFS068659 |
| 23064750 | C | T | YFS068660 |
| 24365889 | A | G | YFS068663 |
Schedule 2. Changes made to SNP scheme of the Q1b haplogroup. SNPs under research.
| SNP | Belonging to subclade | Notes |
|---|---|---|
| CTS4507 | Q-Y2250 | Reverse SNP of P paragroup level (under research). Updated in terms of specifying the reverse character of the mutation |
| L68 | Q-Y1150 | Added (Y-DNA Haplotree, FTDNA 2014) |
| F753 | Q-L275 | Added |
| F1205 | Q-L275 | Added |
| Y1193 | | Excluded from Q-Y1150 level (under research) |
| F2250 | Q-L275 | Added (Y-DNA Haplotree, FTDNA 2014) |
| Y1200 | Q-L275 | Revised: transfer from Q-Y1150 level |
| Y1220 | Q-Y1150 | Added (under research) |
| Y1228 | | Excluded from Q-Y1150 level (under research) |
| Y2118 | Q-L245 | Misprint correction: the position confirmed (see Y2218) |
| Y2218 | Q-Y2200 | Misprint correction: added in lieu of Y2118 |
| YP505 | Q-Y1150 | hg19: 6388256 (->T) |
| Z5901 | | Excluded from Q-Y1150 level (under research) |
Note: the SNPs under research are not included in the SNP scheme of Q1b haplogroup until their positions are clearly identified.