As for other major crops, achieving a complete wheat genome sequence is essential for the application of genomics to breeding new and improved varieties. To overcome the complexities of the large, highly repetitive and hexaploid wheat genome, the International Wheat Genome Sequencing Consortium established a chromosome-based strategy that was validated by the construction of the physical map of chromosome 3B. Here, we present improved strategies for the construction of highly integrated and ordered wheat physical maps, using chromosome 1BL as a template, and illustrate their potential for evolutionary studies and map-based cloning.
Using a combination of novel high throughput marker assays and an assembly program, we developed a high quality physical map representing 93% of wheat chromosome 1BL, anchored and ordered with 5,489 markers including 1,161 genes. Analysis of the gene space organization and evolution revealed that gene distribution and conservation along the chromosome result from the superimposition of the ancestral grass and recent wheat evolutionary patterns, leading to a peak of synteny in the central part of the chromosome arm and an increased density of non-collinear genes towards the telomere. With a density of about 11 markers per Mb, the 1BL physical map provides 916 markers, including 193 genes, for fine mapping the 40 QTLs mapped on this chromosome.
Here, we demonstrate that high marker density physical maps can be developed in complex genomes such as wheat to accelerate map-based cloning, gain new insights into genome evolution, and provide a foundation for reference sequencing.
Cereal crops, such as rice, maize, sorghum and wheat, are major caloric sources for humans and farm animals. While reference genome sequences are available and are already supporting crop improvement in a challenging environment for rice, sorghum and maize, wheat genomics and its application are lagging behind. The wheat genome has long been viewed as impossible to sequence because of the large amount of repetitive sequences (>80%), gigantic size (17 gigabases (Gb)) and the level of ploidy of bread wheat (2n = 6x = 42). Even with the rapid developments in DNA sequencing technologies that enable the production of gigabases of sequence within a few days, the short read lengths offered by these techniques and the large amount of repeated sequences present in the wheat genome make de novo assembly of non-genic regions extremely difficult. These difficulties can be circumvented by focusing only on the gene catalogue and ignoring the intergenic regions that mostly consist of transposable elements. However, this practice is not justified in light of the results of whole genome functional analyses, such as the characterization of 1% of the human genome in the ENCODE project and association studies performed in maize, that clearly indicate the importance of intergenic regions in the regulation of genome expression. Thus, a complete wheat genome sequence is needed to access the complete catalogue of genes and regulatory elements and to provide a framework for understanding the impact of genomic variation on phenotypes. While long read single molecule sequencing may in the future make it possible to tackle large and complex genomes using only whole genome shotgun (WGS) sequencing, the only feasible approach at this time to obtain a complete reference genome sequence of bread wheat is bacterial artificial chromosome (BAC) by BAC sequencing based on the construction of robust physical maps.
To reduce the complexity of physically mapping a 17 Gb hexaploid genome containing more than 80% similar or identical sequences, the International Wheat Genome Sequencing Consortium (IWGSC) has adopted a strategy based on the individual sorting and analysis of chromosomes or chromosome arms by flow cytometry to construct specific BAC libraries. The first BAC library was used successfully to establish a chromosome landing-ready physical map of chromosome 3B, the largest wheat chromosome (1 Gb). This physical map has been used in several studies to analyze the composition and organization of the wheat gene space, provide estimates of the gene number, and determine the relative proportion of transposable element families in the wheat genome [5, 15, 16]. In contrast to early cytogenetic studies based on expressed sequence tag (EST) mapping, which suggested that most of the genes are found in a few large, gene-rich regions, these analyses revealed the presence of numerous small gene islands dispersed along the chromosome and no geneless region larger than 800 kilobases (kb). Moreover, access to physical maps and sequences helped to refine collinearity relationships between wheat and the other grass genomes by providing a higher level of resolution than genetic or cytogenetic mapping [15, 16, 18]. The strategy used to build the physical map of wheat chromosome 3B was based on a high-information-content fingerprinting method and FingerPrinted Contigs (FPC) software [20, 21] for the assemblies. It resulted in 1,036 contigs with an N50 of 778 kb covering 82% of the chromosome. To improve physical assembly in complex genomes, new software, called Linear Topological Contig (LTC), has been developed recently as an attractive alternative to FPC. It enables longer, better ordered and more robust contigs to be built compared to FPC contigs. Physical maps are only useful when they are anchored to genetic maps and traits with markers.
PCR methods used to anchor the physical map of chromosome 3B resulted in a marker density of 1.4 markers per megabase (Mb) and 56% of the physical map anchored. While useful for many map-based cloning projects, this marker density is far from that obtained in rice or maize (8 and 12 markers per Mb, respectively) and should be increased for breeding purposes. High throughput anchoring platforms that increase the number of genes anchored to the physical maps have been developed in wheat recently but more anchoring resources and efforts are still needed. In addition to anchoring the physical map with markers, it is important to order the physical contigs along the chromosomes. Here, the wheat genome is again a challenge because of the uneven distribution and lack of recombination in more than half of the chromosomes.
In this work, we used a combination of new high throughput genotyping assays and synteny with other grass genomes to establish a physical map of the wheat chromosome 1BL with the highest marker density for a wheat physical map so far (11 markers per Mb), a high level of anchoring (74% in the deletion bins; 19% on the genetic map) and a good percentage (48%) of contigs ordered along the chromosome arm. This physical map allowed us to gain new insights into chromosome evolution and refine estimates of physical sizes of deletion bins. Further, it provides a powerful tool for chromosome landing and for sequencing chromosome 1BL in the near future. The new high throughput marker assays combined with the optimized assembly and ordering methodologies proposed here can be applied to other plant genomes with similar levels of redundancy and complexity.
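The marker density quoted above can be checked with a short calculation. The sketch below is illustrative only: it assumes the density is simply the number of anchored markers divided by the assembled map size, and it takes the 5,489 markers and the 502 Mb LTC assembly reported elsewhere in this paper as inputs. The function name `marker_density` is a hypothetical helper, not part of any tool used in this work.

```python
def marker_density(n_markers: int, map_size_mb: float) -> float:
    """Return the number of markers per megabase of physical map."""
    if map_size_mb <= 0:
        raise ValueError("map size must be positive")
    return n_markers / map_size_mb


# Inputs taken from this paper: 5,489 anchored markers, 502 Mb LTC assembly.
density_1bl = marker_density(5489, 502)

# Density reported for the chromosome 3B physical map (markers per Mb).
density_3b = 1.4

fold_increase = density_1bl / density_3b

print(f"1BL density: {density_1bl:.1f} markers per Mb")
print(f"Fold increase over 3B: {fold_increase:.1f}x")
```

Under these assumptions the result rounds to the 11 markers per Mb stated in the text, close to the densities reported for rice and maize.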
A 1BL-specific BAC library, containing 92,160 clones originating from sorted wheat chromosome 1BL of Chinese Spring and representing 15.4x coverage of the arm, was fingerprinted with the SNaPshot technology. A total of 65,413 high quality fingerprints (71%) was obtained and used to build a physical map. A first automated assembly was performed with the FPC software [20, 21] following the guidelines adopted by the IWGSC. This resulted in an assembly of 43,523 fingerprints into 3,030 contigs representing 807 Mb (151% of chromosome 1BL) with an N50 of 434 kb and an L50 of 391. A minimal tiling path (MTP) of 8,597 clones was designed and re-arrayed for further marker screening and analyses. Sixty-three three-dimensional (plate, row and column) pools from the MTP and 240 plate pools from the whole 1BL BAC library were produced. During the course of the project, a new software, LTC, specifically developed to build physical maps in complex genomes such as wheat, became available. To improve the assembly of the 1BL physical map for future sequencing, we performed an automated LTC assembly using the same 65,413 high quality fingerprints. It resulted in an assembly of 41,940 fingerprints (including 94.4% in common with the FPC assembly) into 694 contigs representing 502 Mb (94% of the chromosome arm) with an N50 value of 961 kb and an L50 of 162. The maximum contig size was 5,800 kb in the LTC map, about three times longer than the 1,780 kb in the FPC map. This improved LTC map was used as a template for adding the marker and order information and for building a final version of the map.
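For readers less familiar with the assembly statistics used above, the following sketch shows one common way to compute N50 and L50 from a list of contig sizes. It assumes the usual definitions (N50 is the size of the contig at which the cumulative length of contigs, sorted from largest to smallest, first reaches half of the total assembly length; L50 is the number of contigs needed to reach that point). The function name and the toy contig sizes are hypothetical and are not data from the 1BL assemblies.

```python
def n50_l50(contig_sizes_kb):
    """Return (N50, L50) for a collection of contig sizes.

    Contigs are sorted from largest to smallest and accumulated until
    the running total reaches half of the total assembly length.
    """
    sizes = sorted(contig_sizes_kb, reverse=True)
    if not sizes:
        raise ValueError("at least one contig is required")
    half_total = sum(sizes) / 2
    running_total = 0
    for count, size in enumerate(sizes, start=1):
        running_total += size
        if running_total >= half_total:
            return size, count


# Toy contig sizes in kb (hypothetical, for illustration only).
toy_contigs = [900, 700, 500, 400, 300, 200]
n50, l50 = n50_l50(toy_contigs)
print(f"N50 = {n50} kb, L50 = {l50}")
```

A higher N50 together with a lower L50, as seen when moving from the FPC to the LTC assembly, indicates that the same fingerprints were assembled into fewer and longer contigs.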
At the beginning of the project, there were only 171 1BL-specific PCR markers (114 simple sequence repeats (SSR) and 57 restriction fragment length polymorphisms (RFLP)) available publicly in the GrainGenes database. Thus, to develop a high density integrated physical map of chromosome 1BL, that is, a map comprising BAC contigs anchored to genetic and cytogenetic maps with a high number (>1,000) of molecular markers, we developed new molecular markers and anchored them to the 1BL physical contigs and genetic or cytogenetic maps.
Representation of the integrated physical and genetic map and distribution of recombination rate along wheat chromosome 1BL. (A) Representation of the 1BL deletion bin map. The centromere is represented as a grey circle and the nine deletion bins are represented by colored boxes as follows: C-1BL11-0.23 deletion bin in purple, 1BL11-0.23-0.32 in pink, 1BL6-0.32-0.47 in blue, 1BL1-0.47-0.61 in light blue, 1BL14-0.61-0.69 in green, 1BL2-0.69-0.74 in light green, 1BL8-0.74-0.85 in yellow, 1BL3-0.85-0.89 in red and 1BL4-0.89-1.00 in dark red. The number of physical contigs assigned to a bin and the cumulative size of these contigs are indicated. When a contig carried BACs that were assigned to two different consecutive bins, indicating that it likely lies at the junction between the bins, the contig was counted as 0.5 in each bin. (B) Representation of the 1BL neighbor genetic map. The map is divided into segments corresponding to the deletion bins except for deletion bins 1BL11-0.23-0.32 and 1BL6-0.32-0.47, which were merged. (C) Representation of the ratio between the genetic and the physical distances along the 1BL chromosome using physical contigs to estimate the bin sizes. The dotted line corresponds to the average ratio on the whole chromosome arm. Values are expressed in cM/Mb.
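The ratio plotted in panel C can be illustrated with a minimal sketch. It assumes that the recombination rate of a bin is its genetic length (cM) divided by its physical size (Mb), with the physical size estimated from the cumulative size of the contigs assigned to that bin, and that the arm-wide average is total cM divided by total Mb. The bin names and all numerical values below are hypothetical placeholders, not measurements from the 1BL map.

```python
def recombination_rate(genetic_cm: float, physical_mb: float) -> float:
    """Return the ratio of genetic to physical distance in cM/Mb."""
    if physical_mb <= 0:
        raise ValueError("physical size must be positive")
    return genetic_cm / physical_mb


# Hypothetical bins: name -> (genetic length in cM, cumulative contig size in Mb).
# These are placeholder values, not data from the 1BL physical map.
example_bins = {
    "proximal_bin": (2.0, 100.0),
    "central_bin": (10.0, 50.0),
    "distal_bin": (30.0, 50.0),
}

# Per-bin ratio, as drawn for each segment in panel C.
rates = {
    name: recombination_rate(cm, mb) for name, (cm, mb) in example_bins.items()
}

# Arm-wide average (the dotted line): total genetic length over total size.
total_cm = sum(cm for cm, _ in example_bins.values())
total_mb = sum(mb for _, mb in example_bins.values())
arm_average = recombination_rate(total_cm, total_mb)

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} cM/Mb")
print(f"arm average: {arm_average:.2f} cM/Mb")
```

Note that the arm-wide average is computed from the totals rather than as the mean of the per-bin ratios, so that large bins are weighted by their physical size.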