Original Research Article

The Korean Journal of Crop Science. 30 June 2015. 224-230
https://doi.org/10.7740/kjcs.2015.60.2.224

MAIN

Soybean (Glycine max) is one of the most important crops worldwide, providing oil and useful protein. Despite these advantages, soybean has been studied in less genomic detail than model plants because of its complex genome structure, high haploid chromosome number (n=20), and large genome size. The haploid chromosome number of soybean is considerably higher than that of rice (n=12), maize (n=10) and Arabidopsis thaliana (n=5). After the soybean genome sequence of Williams 82 was established (Schmutz et al., 2010), several whole-genome resequencing studies of Glycine max and Glycine soja were reported (Kim et al., 2010; Lam et al., 2010; Chung et al., 2014). The reference genome makes it possible to find more single nucleotide polymorphisms (SNPs) (Zhu et al., 2003; Hyten et al., 2006; Lam et al., 2010; Chung et al., 2014) and structural variations (SVs), such as copy number variations (CNVs), large insertions/deletions (InDels), inversions and translocations (McHale et al., 2012; Haun et al., 2011). Next generation sequencing (NGS) is also an important technology in genomic studies and molecular breeding. Together with the availability of the soybean genome sequence, NGS technologies and bioinformatics provide many opportunities to reveal genetic diversity, such as SNPs and InDels, among soybean accessions. These technologies are valuable resources that have facilitated whole-genome genotyping, genome-wide association studies, prediction of gene functions, and analysis of genetic diversity in plants (Lam et al., 2010; Chung et al., 2014; Li et al., 2013).

Genome re-sequencing provides an enormous amount of new information on SNPs. Over 3.8 million SNPs were identified through high-depth re-sequencing of 10 cultivated and 6 wild accessions (Chung et al., 2014). Recently, the SoySNP50K platform was developed and applied to over 19,000 accessions (Song et al., 2013). Genome-wide SNPs among soybean accessions have also been identified and developed as markers for breeding (Zhu et al., 2003; Van et al., 2005). In contrast to SNPs, which have been studied extensively, other forms of highly abundant genetic variation, such as InDels, have received relatively little attention. Small InDels are a common and functionally important type of polymorphism. In particular, InDels in coding regions can alter one or more amino acids or cause frameshift mutations, and thereby affect sequence conservation and the structure and function of proteins. They are easily detected on the basis of mismatches, as shown in common bean (Zou et al., 2014). However, many studies focus on single nucleotide variants (SNVs) or large structural variants (Wu et al., 2010; McHale et al., 2012; Lee et al., 2015). DNA polymorphisms positioned within coding regions are particularly important because they have the potential to change protein function itself. Discovering polymorphisms related to functional changes of genes is therefore important for investigating differences among accessions and the main causes of their divergence.

In this study, we obtained RNA-sequencing data from two soybean accessions (Daewon and Hwangkeum) using NGS. The sequence data were analyzed to discover a large number of DNA polymorphisms, including single nucleotide variants (SNVs), multiple nucleotide variants (MNVs), insertions/deletions (InDels) and replacements. These data can be used in future studies on genotyping and marker development for molecular breeding.

MATERIALS & METHODS

Plant materials and RNA-sequencing

To detect polymorphisms, two Korean soybean accessions, Daewon and Hwangkeum, were analyzed in this study. The soybean seeds were obtained from the National Institute of Crop Science, RDA, Suwon, Korea. The plants were grown until the V5 stage at 25℃ under 16 h light / 8 h dark conditions. For all samples, total RNA was prepared from leaf tissues using an RNeasy Plant Mini Kit (Qiagen, Germany). RNA integrity was checked before sequencing using a 2100 Bioanalyzer RNA Nanochip (Agilent Technologies). After the quality check, RNA sequencing on the Illumina HiSeq platform was performed by Theragen, Inc. (Korea).

Alignments and variants detection

Raw reads from the Illumina HiSeq 2000 sequencing platform were imported into CLC Genomics Workbench 7.0 (CLC Bio), and read statistics were assessed using sequencing data quality control, followed by read trimming for quality. The raw reads were trimmed with a quality score limit of 0.01. CLC Genomics Workbench was used to map the trimmed reads to the soybean genome sequence of Williams 82 from Phytozome v.9 (http://www.phytozome.net), which served as the reference sequence. The mapping process followed the default parameters and the global alignment setting. Consensus sequences were aligned to the reference sequence and analyzed for SNVs, MNVs, InDels and replacements. The variants were screened using CLC Genomics Workbench 7.0 with the following parameters: minimum count, 3; minimum coverage, 10; minimum variant frequency, 35%.
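
The screening thresholds above can be illustrated with a minimal Python sketch. This is not the CLC Genomics Workbench implementation; the function names are hypothetical, and reading the quality limit of 0.01 as a base-call error probability (equivalent to a Phred score of 20) is an assumption about how that setting is interpreted.

```python
import math

# Thresholds quoted in the text; all names below are hypothetical.
MIN_COUNT = 3          # reads supporting the variant allele
MIN_COVERAGE = 10      # total reads covering the position
MIN_FREQUENCY = 0.35   # variant reads divided by total reads


def phred_from_error_probability(p):
    """Convert a base-call error probability to a Phred quality score."""
    return -10 * math.log10(p)


def passes_filters(count, coverage):
    """Return True only if a candidate variant meets all three thresholds."""
    if coverage < MIN_COVERAGE or count < MIN_COUNT:
        return False
    return count / coverage >= MIN_FREQUENCY


# Assumed interpretation: a limit of 0.01 corresponds to about Phred 20.
print(phred_from_error_probability(0.01))

# Candidate variants as (supporting reads, coverage) pairs.
for count, coverage in [(5, 12), (4, 12), (5, 8)]:
    print(count, coverage, passes_filters(count, coverage))
```

In this sketch a site with 4 supporting reads out of 12 fails on frequency alone, and a site with coverage 8 fails regardless of its allele frequency, which shows why the three thresholds have to be applied together.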

RESULTS

RNA-Seq and assembly

A total of 79,003,236 reads were obtained from the two accessions: 27,543,208 from Daewon and 51,460,028 from Hwangkeum (Table 1). The total number of reads in Daewon was thus lower than in Hwangkeum. After quality filtering and trimming, 23,646,188 reads in Daewon (85.85%) and 45,715,040 reads in Hwangkeum (88.84%) were mapped in pairs to the reference genome, which contains 56,044 genes and 177,294 transcripts. Uniquely mapped reads accounted for 82.72% of the reads in Daewon and 86.23% in Hwangkeum.

Table 1.

RNA sequencing data assembled to soybean reference genome.

| Mapped reads | Daewon: number of sequences | Daewon: % | Hwangkeum: number of sequences | Hwangkeum: % |
|---|---|---|---|---|
| Reads mapped in pairs | 23,646,188 | 85.85 | 45,715,040 | 88.84 |
| - unique fragments | 11,391,332 | 82.72 | 22,186,184 | 86.23 |
| - non-specifically | 431,762 | 3.14 | 671,336 | 2.61 |
| Reads mapped in broken pairs | 2,495,225 | 9.06 | 3,288,716 | 6.39 |
| Reads not mapped | 1,401,795 | 5.09 | 2,456,272 | 4.77 |
| Total | 27,543,208 | 100 | 51,460,028 | 100 |
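
The percentages in Table 1 follow directly from the read counts. The short Python sketch below recomputes them as a consistency check; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Read counts copied from Table 1 (top-level categories only).
table1 = {
    "Daewon": {
        "mapped_in_pairs": 23_646_188,
        "broken_pairs": 2_495_225,
        "not_mapped": 1_401_795,
    },
    "Hwangkeum": {
        "mapped_in_pairs": 45_715_040,
        "broken_pairs": 3_288_716,
        "not_mapped": 2_456_272,
    },
}


def summarize(counts):
    """Return the total read count and each category's share in percent."""
    total = sum(counts.values())
    shares = {name: 100 * value / total for name, value in counts.items()}
    return total, shares


for accession, counts in table1.items():
    total, shares = summarize(counts)
    formatted = {name: f"{share:.2f}%" for name, share in shares.items()}
    print(accession, f"{total:,}", formatted)
```

The three top-level categories sum to the totals given in the last row of the table for each accession.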

Discovery of variants on transcriptome

Using the default parameters implemented in CLC Genomics Workbench, we identified a total of 89,955 genetic variants between the reference genome and the RNA sequences of the two soybean accessions. Daewon had 34,411 variants, comprising 29,777 SNVs, 1,311 MNVs, 1,754 insertions, 1,458 deletions and 111 replacements; Hwangkeum had 55,544 variants, comprising 47,418 SNVs, 2,293 MNVs, 3,230 insertions, 2,396 deletions and 207 replacements (Table 2).

Table 2.

Variants against reference genome.

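| Type | Daewon: homozygous | Daewon: heterozygous | Hwangkeum: homozygous | Hwangkeum: heterozygous |
|---|---|---|---|---|
| SNV | 19,946 | 9,831 | 31,543 | 15,875 |
| MNV | 593 | 718 | 950 | 1,343 |
| Insertion | 899 | 855 | 1,542 | 1,688 |
| Deletion | 1,080 | 378 | 1,705 | 691 |
| Replacement | 68 | 43 | 109 | 98 |
| Total | 22,586 | 11,825 | 35,849 | 19,695 |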
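
The homozygous and heterozygous counts in Table 2 can be tallied with a few lines of Python. This is only an arithmetic sketch over the published counts; the variable and function names are hypothetical.

```python
# (homozygous, heterozygous) counts copied from Table 2.
table2 = {
    "Daewon": {
        "SNV": (19_946, 9_831),
        "MNV": (593, 718),
        "Insertion": (899, 855),
        "Deletion": (1_080, 378),
        "Replacement": (68, 43),
    },
    "Hwangkeum": {
        "SNV": (31_543, 15_875),
        "MNV": (950, 1_343),
        "Insertion": (1_542, 1_688),
        "Deletion": (1_705, 691),
        "Replacement": (109, 98),
    },
}


def zygosity_summary(rows):
    """Return homozygous total, heterozygous total and homozygous share (%)."""
    homozygous = sum(hom for hom, _ in rows.values())
    heterozygous = sum(het for _, het in rows.values())
    share = 100 * homozygous / (homozygous + heterozygous)
    return homozygous, heterozygous, share


for accession, rows in table2.items():
    hom, het, share = zygosity_summary(rows)
    print(f"{accession}: {hom:,} homozygous, {het:,} heterozygous, {share:.1f}% homozygous")
```

The per-type counts add up to the totals in the last row of the table, and the homozygous share comes out at roughly two thirds in each accession.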

DNA polymorphisms were detected across the 20 soybean chromosomes and 21 scaffolds. The largest number of variants was observed on chromosome 18 in both accessions. However, the smallest number of variants was observed on different chromosomes: chromosome 17 in Daewon and chromosome 11 in Hwangkeum (Table 3).

Table 3.

Variants against each chromosome of reference genome.

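| Chromosome | Daewon SNV | Daewon MNV | Daewon Insertion | Daewon Deletion | Daewon Replacement | Hwangkeum SNV | Hwangkeum MNV | Hwangkeum Insertion | Hwangkeum Deletion | Hwangkeum Replacement |
|---|---|---|---|---|---|---|---|---|---|---|
| Chr01 | 1,249 | 32 | 65 | 64 | 4 | 1,870 | 70 | 124 | 108 | 4 |
| Chr02 | 1,552 | 58 | 103 | 94 | 11 | 2,581 | 148 | 201 | 174 | 18 |
| Chr03 | 1,561 | 63 | 89 | 76 | 5 | 2,627 | 147 | 158 | 143 | 18 |
| Chr04 | 1,039 | 28 | 60 | 55 | 2 | 1,742 | 82 | 139 | 98 | 5 |
| Chr05 | 1,756 | 82 | 130 | 83 | 3 | 2,616 | 102 | 167 | 120 | 10 |
| Chr06 | 1,559 | 63 | 93 | 78 | 8 | 2,897 | 154 | 172 | 144 | 18 |
| Chr07 | 1,566 | 68 | 97 | 59 | 3 | 2,833 | 138 | 209 | 143 | 14 |
| Chr08 | 1,872 | 87 | 121 | 110 | 3 | 2,865 | 156 | 217 | 153 | 10 |
| Chr09 | 1,653 | 61 | 94 | 77 | 7 | 2,333 | 88 | 152 | 118 | 8 |
| Chr10 | 1,427 | 63 | 85 | 61 | 4 | 2,379 | 110 | 166 | 108 | 8 |
| Chr11 | 1,246 | 57 | 83 | 67 | 3 | 1,348 | 76 | 108 | 95 | 3 |
| Chr12 | 1,265 | 68 | 98 | 49 | 7 | 2,018 | 94 | 170 | 94 | 9 |
| Chr13 | 1,974 | 79 | 129 | 108 | 8 | 2,946 | 147 | 202 | 134 | 10 |
| Chr14 | 1,195 | 85 | 58 | 61 | 3 | 1,957 | 111 | 126 | 76 | 8 |
| Chr15 | 1,575 | 70 | 82 | 72 | 7 | 2,258 | 110 | 151 | 101 | 5 |
| Chr16 | 1,405 | 70 | 71 | 62 | 8 | 3,019 | 143 | 163 | 139 | 18 |
| Chr17 | 986 | 30 | 41 | 73 | 5 | 1,469 | 81 | 117 | 103 | 7 |
| Chr18 | 2,062 | 98 | 107 | 80 | 7 | 3,488 | 160 | 188 | 130 | 14 |
| Chr19 | 1,403 | 72 | 72 | 60 | 6 | 1,962 | 81 | 157 | 115 | 9 |
| Chr20 | 1,245 | 64 | 67 | 61 | 7 | 2,052 | 88 | 139 | 96 | 11 |
| scaffold_21 | 95 | 7 | 6 | 6 | 0 | 41 | 5 | 0 | 1 | 0 |
| scaffold_27 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| scaffold_28 | 20 | 1 | 0 | 2 | 0 | 11 | 1 | 0 | 1 | 0 |
| scaffold_32 | 16 | 0 | 0 | 0 | 0 | 34 | 1 | 2 | 1 | 0 |
| scaffold_123 | 7 | 0 | 0 | 0 | 0 | 9 | 0 | 2 | 0 | 0 |
| scaffold_137 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| scaffold_165 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| scaffold_208 | 18 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| scaffold_216 | 7 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 |
| scaffold_263 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| scaffold_344 | 4 | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 |
| scaffold_614 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| scaffold_846 | 1 | 2 | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| scaffold_2080 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| scaffold_22 | 0 | 0 | 0 | 0 | 0 | 15 | 0 | 0 | 0 | 0 |
| scaffold_44 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 0 |
| scaffold_361 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| scaffold_633 | 0 | 0 | 0 | 0 | 0 | 7 | 0 | 0 | 0 | 0 |
| scaffold_843 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| scaffold_1241 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 |
| scaffold_1742 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 |
| Total | 29,777 | 1,311 | 1,754 | 1,458 | 111 | 47,418 | 2,293 | 3,230 | 2,396 | 207 |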
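
To see which chromosomes carry the most and fewest variants, the per-chromosome rows of Table 3 can be summed across variant types. The Python sketch below does this for the 20 chromosomes only (scaffolds are left out); the data layout and function names are illustrative.

```python
# Per-chromosome counts from Table 3, in the order
# (SNV, MNV, insertion, deletion, replacement). Scaffolds are omitted.
daewon = {
    "Chr01": (1249, 32, 65, 64, 4), "Chr02": (1552, 58, 103, 94, 11),
    "Chr03": (1561, 63, 89, 76, 5), "Chr04": (1039, 28, 60, 55, 2),
    "Chr05": (1756, 82, 130, 83, 3), "Chr06": (1559, 63, 93, 78, 8),
    "Chr07": (1566, 68, 97, 59, 3), "Chr08": (1872, 87, 121, 110, 3),
    "Chr09": (1653, 61, 94, 77, 7), "Chr10": (1427, 63, 85, 61, 4),
    "Chr11": (1246, 57, 83, 67, 3), "Chr12": (1265, 68, 98, 49, 7),
    "Chr13": (1974, 79, 129, 108, 8), "Chr14": (1195, 85, 58, 61, 3),
    "Chr15": (1575, 70, 82, 72, 7), "Chr16": (1405, 70, 71, 62, 8),
    "Chr17": (986, 30, 41, 73, 5), "Chr18": (2062, 98, 107, 80, 7),
    "Chr19": (1403, 72, 72, 60, 6), "Chr20": (1245, 64, 67, 61, 7),
}
hwangkeum = {
    "Chr01": (1870, 70, 124, 108, 4), "Chr02": (2581, 148, 201, 174, 18),
    "Chr03": (2627, 147, 158, 143, 18), "Chr04": (1742, 82, 139, 98, 5),
    "Chr05": (2616, 102, 167, 120, 10), "Chr06": (2897, 154, 172, 144, 18),
    "Chr07": (2833, 138, 209, 143, 14), "Chr08": (2865, 156, 217, 153, 10),
    "Chr09": (2333, 88, 152, 118, 8), "Chr10": (2379, 110, 166, 108, 8),
    "Chr11": (1348, 76, 108, 95, 3), "Chr12": (2018, 94, 170, 94, 9),
    "Chr13": (2946, 147, 202, 134, 10), "Chr14": (1957, 111, 126, 76, 8),
    "Chr15": (2258, 110, 151, 101, 5), "Chr16": (3019, 143, 163, 139, 18),
    "Chr17": (1469, 81, 117, 103, 7), "Chr18": (3488, 160, 188, 130, 14),
    "Chr19": (1962, 81, 157, 115, 9), "Chr20": (2052, 88, 139, 96, 11),
}


def totals_per_chromosome(table):
    """Sum all variant types for each chromosome."""
    return {chrom: sum(counts) for chrom, counts in table.items()}


def extremes(table):
    """Return the chromosomes with the most and the fewest variants."""
    totals = totals_per_chromosome(table)
    return max(totals, key=totals.get), min(totals, key=totals.get)


for name, table in (("Daewon", daewon), ("Hwangkeum", hwangkeum)):
    most, fewest = extremes(table)
    print(f"{name}: most variants on {most}, fewest on {fewest}")
```

Summing per chromosome in this way reproduces the pattern described in the text: chromosome 18 has the most variants in both accessions, while the minimum falls on chromosome 17 in Daewon and chromosome 11 in Hwangkeum.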

The highest SNV and MNV frequencies were found on chromosome 18 in both accessions, whereas the lowest SNV and MNV frequencies were found on different chromosomes in the two accessions. The other variant types were distributed differently across the chromosomes. A relatively high frequency of InDels was observed on chromosome 8 in both accessions compared with the lower InDel frequencies on most other chromosomes. A total of 390 variants were located on the 21 scaffolds of the Williams 82 genome. Replacements were distributed fairly evenly among the 20 chromosomes.

Annotation of variants

Some variants were detected in non-coding regions, introns and large repeat regions. These variants were excluded by the amino acid change test with default parameters in CLC Genomics Workbench 7. After filtering, 13,225 non-synonymous variants remained in CDS regions in Hwangkeum, comprising 11,522 SNVs, 665 MNVs, 481 insertions, 511 deletions and 46 replacements, and 9,080 non-synonymous variants remained in Daewon, comprising 7,908 SNVs, 449 MNVs, 316 insertions, 378 deletions and 29 replacements (Table 4). Non-synonymous variants were predicted to change protein function in 4,290 genes in Daewon and 5,672 genes in Hwangkeum. Additionally, non-synonymous variants were not uniformly distributed within the chromosomes. The distributions of non-synonymous variants within coding regions and the expression values of each accession are shown across the 20 soybean chromosomes (Fig. 1).

http://static.apub.kr/journalsite/sites/kjcs/2015-060-02/A0350600212/images/KJCS-60-224_F1.jpg
Fig. 1.
A Circos plot showing differentially expressed genes and non-synonymous variants from RNA-sequencing. The outermost circle represents the 20 soybean chromosomes (Chr 1-20) in different colours, and the two inner circles represent gene expression values in RPKM (black) and the distribution of variants (SNV and MNV: red, InDel: blue, Replacement: green) in Daewon (I) and Hwangkeum (II).
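
Expression in Fig. 1 is given as RPKM (reads per kilobase of transcript per million mapped reads). The article does not spell out the calculation, so the Python sketch below uses the standard definition; the function name and the example numbers are hypothetical and are not taken from the study.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    Standard definition: reads * 1e9 / (gene length in bp * total mapped reads).
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp gene in a library of
# 25 million mapped reads.
print(rpkm(500, 2_000, 25_000_000))
```

Normalizing by both gene length and library size is what allows expression values from Daewon and Hwangkeum, which differ in sequencing depth, to be shown on the same scale.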

Table 4.

Non-synonymous variants detected within coding regions.

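| Type | Daewon: non-synonymous | Daewon: synonymous | Hwangkeum: non-synonymous | Hwangkeum: synonymous |
|---|---|---|---|---|
| SNV | 7,908 | 11,517 | 11,522 | 16,030 |
| MNV | 449 | 204 | 665 | 285 |
| Insertion | 316 | 122 | 481 | 159 |
| Deletion | 378 | 0 | 511 | 0 |
| Replacement | 29 | 0 | 46 | 1 |
| Total | 9,080 | 11,843 | 13,225 | 16,475 |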
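
Because the direction of a ratio matters when comparing with published values, the SNV counts in Table 4 can be checked in both directions. The following Python sketch is only arithmetic on the tabulated counts; the names are illustrative.

```python
# (non-synonymous, synonymous) SNV counts copied from Table 4.
table4_snv = {
    "Daewon": (7_908, 11_517),
    "Hwangkeum": (11_522, 16_030),
}


def snv_ratios(non_synonymous, synonymous):
    """Return (non-synonymous / synonymous, synonymous / non-synonymous)."""
    return non_synonymous / synonymous, synonymous / non_synonymous


for accession, (nonsyn, syn) in table4_snv.items():
    ns_to_s, s_to_ns = snv_ratios(nonsyn, syn)
    print(f"{accession}: nonsyn/syn = {ns_to_s:.2f}, syn/nonsyn = {s_to_ns:.2f}")
```

With these counts, synonymous SNVs outnumber non-synonymous SNVs in both accessions, so the ratio exceeds 1 only when synonymous counts are placed in the numerator.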

DISCUSSION

This study reports non-synonymous variants within coding regions (CDS) of two Korean soybean accessions identified using RNA-seq. We analyzed the transcripts and obtained a total of 19,430 non-synonymous SNVs, 1,114 MNVs, 1,686 InDels and 75 replacements against the Williams 82 reference genome. These non-synonymous variants and expression data may provide information for marker development and functional genetics. The reads from each Korean soybean accession were mapped onto the reference genome. The proportion of mapped reads was high in both accessions, ranging from 85.85% in Daewon to 88.84% in Hwangkeum. This proportion is similar to the results of whole-genome sequencing studies of various soybean accessions (Zou et al., 2014; Chung et al., 2014).

In RNA-seq, short reads are mapped onto the reference genome. Short reads make it possible to assemble a large part of the DNA sequence, although the heterochromatic regions around centromeres and other highly repetitive regions remain problematic. Plant genomes contain large repetitive sequences, which account for more than 35% of the genome in rice and Arabidopsis (Arabidopsis Genome Initiative, 2000; International Rice Genome Sequencing Project, 2005). The soybean genome also consists of 57% heterochromatic and large repetitive regions (Schmutz et al., 2010). To select the non-synonymous variants in CDS, variants within repetitive regions were removed, following other genotyping studies in soybean (Wu et al., 2010; Zou et al., 2014). After the basic variants were detected with default parameters, we classified them into two types, homozygous and heterozygous. Daewon had 34,411 variants, comprising 22,586 homozygous and 11,825 heterozygous variants, and Hwangkeum had 35,849 homozygous and 19,695 heterozygous variants (Table 2). Because soybean reproduces by self-pollination, the detected variants were expected to be predominantly homozygous. However, only about 65% of the variants were homozygous in both accessions. According to Pinson and Rutger (1993), heterozygosity can arise from genetic doubling events, fusion of genotypically different cells in chimeric callus, and abnormal meiosis that results in heterozygous diploid microspores.

We also found that variants present in 4,290 genes in Daewon and 5,672 genes in Hwangkeum are likely to have a major impact on gene function. Similar results (approximately 4,600 genes) were reported from resequencing of wild and cultivated soybeans (Lam et al., 2010; Chung et al., 2014). The variants were classified as non-synonymous or synonymous by the amino acid change analysis in CLC Genomics Workbench 7. The distribution of non-synonymous variants was similar to that of the genome-wide variants. SNVs and MNVs were observed at the highest frequency on chromosome 18 in both accessions, whereas the other variant types were distributed differently across the chromosomes. Based on the SNV counts in Table 4, the ratio of synonymous to non-synonymous SNVs was 1.46 in Daewon and 1.39 in Hwangkeum. These values are close to the ratio reported for soybean (1.40), which is higher than those reported for Arabidopsis, maize, sorghum and rice (Clark et al., 2007; Hufford et al., 2012; Lam et al., 2010; McNally et al., 2009; Nelson et al., 2011).

Except for partial regions of chromosomes 4, 9 and 18, both accessions showed similar patterns of gene expression and non-synonymous polymorphisms against the reference genome. Compared with Hwangkeum, Daewon had regions of lower variant density on chromosome 4, whereas Hwangkeum had a lower density of variants on chromosomes 9 and 18. The large repetitive centromeric and pericentromeric regions of each chromosome showed low gene expression and few non-synonymous variants. As found in other studies, centromeric regions showed low gene expression, in contrast to telomeric regions with high gene expression (Lee et al., 2013). The distribution of variants corresponds to the non-centromeric regions, which is similar to the result from SNP genotyping research (Lee et al., 2015) (Fig. 1).

In conclusion, this study reveals a large number of polymorphisms and non-synonymous variants through RNA-seq of two Korean soybean accessions. The variants, including SNVs, MNVs, InDels and replacements, were detected across the genome and within coding regions. The genome-wide variants and non-synonymous variants will serve as valuable resources for genetic and genomic studies. In addition, expression profiles and non-synonymous variants within coding regions can be used, together with analysis of functional relevance, to identify genes underlying agronomic traits. We believe that this information will help to find genes associated with useful agronomic traits in future studies.

ACKNOWLEDGEMENTS

This work was carried out with the support of “Cooperative Research Program for Agriculture Science & Technology Development (Project No. PJ00924302)” Rural Development Administration, Republic of Korea.

REFERENCES

1
Arabidopsis Genome Initiative, Nature, Analysis of the genome sequence of the flowering plant Arabidopsis thaliana, 408(6814); 796-815 (2000)
2
W-H Chung, N Jeong, J Kim, W K Lee, Y-G Lee, S-H Lee, W Yoon, J-H Kim, I-Y Choi, H-K Choi, J-K Moon, N Kim and S-C Jeong, DNA Res, Population Structure and Domestication Revealed by High-Depth Resequencing of Korean Cultivated and Wild Soybean Genomes, 21(2); 153-167 (2014)
3
R M Clark, G Schweikert, C Toomajian, S Ossowski, G Zeller, P Shinn, N Warthmann, T T Hu, G Fu, D A Hinds, H Chen, K A Frazer, D H Huson, B Schölkopf, M Nordborg, G Rätsch, J R Ecker and D Weigel, Science, Common Sequence Polymorphisms Shaping Genetic Diversity in Arabidopsis thaliana, 317(5836); 338-342 (2007)
4
W J Haun, D L Hyten, W W Xu, D J Gerhardt, T J Albert, T Richmond, J A Jeddeloh, G Jia, N M Springer, C P Vance and R M Stupar, Plant Physiol, The composition and origins of genomic variation among individuals of the soybean reference cultivar Williams 82, 155(2); 645-655 (2011)
5
M B Hufford, X Xu, J van Heerwaarden, T Pyhäjärvi, J-M Chia, R A Cartwright, R J Elshire, J C Glaubitz, K E Guill, S M Kaeppler, J Lai, P L Morrell, L M Shannon, C Song, N M Springer, R A Swanson-Wagner, P Tiffin, J Wang, G Zhang, J Doebley, M D McMullen, D Ware, E S Buckler, S Yang and J Ross-Ibarra, Nat. Genet, Comparative population genomics of maize domestication and improvement, 44(7); 808-811 (2012)
6
D L Hyten, Q Song, Y Zhu, I-Y Choi, R L Nelson, J M Costa, J E Specht, R C Shoemaker and P B Cregan, Proc. Natl. Acad. Sci. U.S.A, Impacts of genetic bottlenecks on soybean genome diversity, 103(45); 16666-16671 (2006)
7
International Rice Genome Sequencing Project, Nature, The map-based sequence of the rice genome, 436(11); 793-800 (2005)
8
M Y Kim, S Lee, K Van, T-H Kim, S-C Jeong, I-Y Choi, D-S Kim, Y-S Lee, D Park, J Ma, W-Y Kim, B-C Kim, S Park, K-A Lee, D H Kim, K H Kim, J H Shin, Y E Jang, K D Kim, W X Liu, T Chaisan, Y J Kang, Y-H Lee, K-H Kim, J-K Moon, J Schmutz, S A Jackson, J Bhak and S-H Lee, Proc. Natl. Acad. Sci. U.S.A, Whole-genome sequencing and intensive analysis of the undomesticated soybean (Glycine soja Sieb Zucc) genome, 107(51); 22032-22037 (2010)
9
H-M Lam, X Xu, X Liu, W Chen, G Yang, F-L Wong, M-W Li, W He, N Qin, B Wang, J Li, M Jian, J Wang, G Shao, J Wang, S S-M Sun and G Zhang, Nat. Genet, Resequencing of 31 wild and cultivated soybean genomes identifies patterns of genetic diversity and selection, 42(12); 1053-1059 (2010)
10
W K Lee, N Kim, J Kim, J-K Moon, N Jeong, I-Y Choi, S C Kim, W-H Chung, H S Kim, S-H Lee and S-C Jeong, Theor. Appl. Genet, Dynamic genetic features of chromosomes revealed by comparison of soybean genetic and sequence-based physical maps, 126(4); 1103-1119 (2013)
11
Y-G Lee, N Jeong, J H Kim, K Lee, K H Kim, A Pirani, B-K Ha, S-T Kang, B-S Park, J-K Moon, N Kim and S-C Jeong, Plant J, Development, validation and genetic analysis of a large soybean SNP genotyping array, 81(4); 625-636 (2015)
12
Y Li, S Zhao, J Ma, D Li, L Yan, J Li, X Qi, X Guo, L Zhang, W He, R Chang, Q Liang, Y Guo, C Ye, X Wang, Y Tao, R Guan, J Wang, Y Liu, L Jin, X Zhang, Z Liu, L Zhang, J Chen, K Wang, R Nielsen, R Li, P Chen, W Li, J C Reif, M Purugganan, J Wang, M Zhang, J Wang and L Qiu, BMC Genomics, Molecular footprints of domestication and improvement in soybean revealed by whole genome re-sequencing, 14; 579 (2013)
13
L K McHale, W J Haun, W W Xu, P B Bhaskar, J E Anderson, D L Hyten, D J Gerhardt, J A Jeddeloh and R M Stupar, Plant Physiol, Structural variants in the soybean genome localize to clusters of biotic stress-response genes, 159(4); 1295-1308 (2012)
14
K L McNally, K L Childs, R Bohnert, R M Davidson, K Zhao, V J Ulat, G Zeller, R M Clark, D R Hoen, T Bureau, R E Stokowski, D G Ballinger, K A Frazer, D R Cox, B Padhukasahasram, C D Bustamante, D Weigel, D J Mackill, R M Bruskiewich, G Rätsch, C R Buell, H Leung and J E Leach, Proc. Natl. Acad. Sci. U.S.A, Genomewide SNP variation reveals relationships among landraces and modern varieties of rice, 106(30); 12273-12278 (2009)
15
J C Nelson, S Wang, Y Wu, X Li, G Antony, F F White and J Yu, BMC Genomics, Single-nucleotide polymorphism discovery by high-throughput sequencing in sorghum, 12; 352 (2011)
16
S R M Pinson and J N Rutger, In Vitro Cell. Dev. Biol. Plant, Heterozygous diploid plants regenerated from anther culture of F1 rice plants, 29(4); 174-179 (1993)
17
J Schmutz, S B Cannon, J Schlueter, J Ma, T Mitros, W Nelson, D L Hyten, Q Song, J J Thelen, J Cheng, D Xu, U Hellsten, G D May, Y Yu, T Sakurai, T Umezawa, M K Bhattacharyya, D Sandhu, B Valliyodan, E Lindquist, M Peto, D Grant, S Shu, D Goodstein, K Barry, M Futrell-Griggs, B Abernathy, J Du, Z Tian, L Zhu, N Gill, T Joshi, M Libault, A Sethuraman, X-C Zhang, K Shinozaki, H T Nguyen, R A Wing, P Cregan, J Specht, J Grimwood, D Rokhsar, G Stacey, R C Shoemaker and S A Jackson, Nature, Genome sequence of the palaeopolyploid soybean, 463(7278); 178-183 (2010)
18
Q Song, D L Hyten, G Jia, C V Quigley, E W Fickus, R L Nelson and P B Cregan, Plos One, Development and Evaluation of SoySNP50K, a High-Density Genotyping Array for Soybean, 8(1); e54985 (2013)
19
G K Subbaiyan, D L E Waters, S Katiyar, A R K Sadananda, S Vaddadi and R J Henry, Plant Biotechnol. J, Genome-wide DNA polymorphisms in elite indica rice inbreds discovered by whole-genome sequencing, 10(6); 623-634 (2012)
20
K Van, E-Y Hwang, M Y Kim, H J Park, S-H Lee and P B Cregan, J. Hered, Discovery of SNPs in Soybean Genotypes Frequently Used as the Parents of Mapping Populations in the United States and Korea, 96(5); 529-535 (2005)
21
X Wu, C Ren, T Joshi, T Vuong, D Xu and H T Nguyen, BMC Genomics, SNP discovery by high-throughput sequencing in soybean, 11; 469 (2010)
22
Y L Zhu, Q J Song, D L Hyten, C P Van Tassell, L K Matukumalli, D R Grimm, S M Hyatt, E W Fickus, N D Young and P B Cregan, Genetics, Single-nucleotide polymorphisms in soybean, 163(3); 1123-1134 (2003)
23
X Zou, C Shi, R S Austin, D Merico, S Munholland, F Marsolais, A Navabi, W L Crosby, K P Pauls, K Yu and Y Cui, Mol. Breeding, Genome-wide single nucleotide polymorphism and Insertion-Deletion discovery through next-generation sequencing of reduced representation libraries in common bean, 33(4); 769-778 (2014)