of high-throughput sequencing has allowed researchers to generate copious
amounts of sequence data, raising the possibility that whole-genome sequencing (WGS)
may become a viable method for cloning genes. For organisms such as C. elegans or Drosophila, genome coverage sufficient to guarantee that
every nucleotide in the genome is sampled at least once is easily attainable. 'galign' is software for identifying variants in C. elegans genomes from WGS data.
The newest and most streamlined version of galign can be downloaded here (follow the instructions in the short Readme file).
Source code is available here. A paper describing 'galign' in more detail is available here.
For a description of the older versions of the program, with some information still relevant to the current 1.0 version, go here.
galign compares the sequenced genome to the C. elegans genome release 195 annotation and produces several files, partitioned by whether the lesions fall in exons, introns, or intergenic regions. The csnp_exons (or csnp_intron or csnp_intergenic) files are the raw SNP comparisons to the 195 genome. To weed out additional mutations in the background, these files are then compared to 25 different WGS runs we previously generated in the lab. This results in three more files:
(1) exons_txt, which is the csnp_exons file in which SNPs common to the 25 sequenced genome files are marked with asterisks.
(2) exons_nr.txt, which is the exons_txt file from which the asterisk-marked SNPs have been removed.
(3) exons_ems.txt, which is a list of coding SNPs consistent with ethylmethanesulfonate (EMS)-induced mutational changes.
The _del files list predicted deletion sites based on the absence of reads (see the paper above for more details). We have found the exons_nr.txt and _del files to be the most useful.
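
The filtering steps described above can be illustrated with a minimal Python sketch. This is not galign's code, and it does not read galign's file formats: the `Snp` record, the function names, and the `min_runs` threshold are all hypothetical, chosen only to show the logic of marking SNPs that also occur in background runs, dropping them, and keeping changes typical of EMS. The sketch assumes that a SNP counts as "common" if it appears in at least `min_runs` of the background runs, and that EMS-type changes are the G/C to A/T transitions, which appear as G>A or C>T on the reference strand.

```python
from collections import Counter
from typing import NamedTuple


class Snp(NamedTuple):
    """Hypothetical SNP record; galign's real file layout may differ."""
    chrom: str
    pos: int
    ref: str
    alt: str


# EMS predominantly induces G/C -> A/T transitions; on the reference
# strand these show up as G>A or C>T.
EMS_CHANGES = {("G", "A"), ("C", "T")}


def mark_background(snps, background_runs, min_runs=1):
    """Pair each SNP with a flag that is True if it occurs in at least
    `min_runs` background runs (the analogue of the asterisk).
    The threshold is an assumption, not galign's documented rule."""
    counts = Counter(s for run in background_runs for s in set(run))
    return [(s, counts[s] >= min_runs) for s in snps]


def non_redundant(marked):
    """Drop flagged SNPs (the analogue of the _nr file)."""
    return [s for s, common in marked if not common]


def ems_like(snps):
    """Keep only SNPs whose base change matches an EMS-type transition."""
    return [s for s in snps if (s.ref.upper(), s.alt.upper()) in EMS_CHANGES]
```

In this sketch, a list of `Snp` records from a mutant strain would be passed to `mark_background` together with one collection of `Snp` records per background run; `non_redundant` then yields the strain-specific candidates, and `ems_like` narrows any list to EMS-type changes. Whether galign derives its EMS list from the raw or the non-redundant SNPs is not stated here, so the sketch leaves that choice to the caller.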