Back to SimWalk2 Overview

SimWalk2: What's New

If you are still using a version of SimWalk2 older than 2.60, please consider this a mandatory upgrade, since the fixes introduced in version 2.60 can affect the p-values of the statistics from the Non-Parametric Linkage analysis!

Changes from version 2.89 to 2.91


The Kinship Exchange File now includes the simple IBD probabilities and has therefore been renamed the IBD & Kinship Exchange File (IKEF).


Fixed a bug in the code to detect inbreeding. Fixed a bug in the options used to compile the Sun version, which should now work on more data sets. Other platforms' executables should now be more robust as well.

Improved coordination with Mendel version 5.7 or greater, particularly when using SimWalk's kinship coefficients for QTL analysis by Mendel.

Reduced the number of SNPs the simwalk2snp version can handle, to work around limitations in various systems. For most platforms the limit is now 768 SNPs, but for Windows it had to be reduced to 128.

Changes from version 2.86 to 2.89


Changed NPL scores. They are now scaled by subtracting the unconditional mean and dividing by the unconditional standard deviation. Here "unconditional" means without regard to any marker genotype data. These scaling factors are estimated by gene dropping at only one locus. Also, each NPL score is now weighted by the square root of the number of affecteds.
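The scaling described above can be sketched in a few lines of Python. This is only an illustration, not SimWalk2's own code: the names scale_npl, pedigree_weight, and null_scores are hypothetical, null_scores stands in for raw scores obtained by gene dropping without marker data, and the use of the population standard deviation is an assumption. How the square-root weight enters the overall statistic is not specified here, so the sketch only computes it.

```python
import math
import statistics


def scale_npl(raw_score, null_scores):
    """Standardize a raw NPL score against its unconditional distribution.

    null_scores is a hypothetical list of raw scores simulated by gene
    dropping, i.e. without regard to any marker genotype data.
    """
    mean = statistics.mean(null_scores)
    # Assumption: population standard deviation of the simulated scores.
    sd = statistics.pstdev(null_scores)
    if sd == 0:
        raise ValueError("unconditional standard deviation is zero")
    return (raw_score - mean) / sd


def pedigree_weight(n_affected):
    """Weight of a pedigree: square root of its number of affecteds."""
    return math.sqrt(n_affected)
```

For instance, scale_npl(5.0, [1.0, 2.0, 3.0, 4.0, 5.0]) subtracts the mean 3.0 and divides by the standard deviation of those five simulated values.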

Changed NPL output. The NPL statistics now have more descriptive names. Also, in the overall output file, only values that are based on all the pedigrees are now output.


When performing the Mistyping option, new pedigree files are produced that echo the original pedigrees, except that each "mistyped genotype" is blanked out. Here "mistyped genotype" is defined as a genotype with mistyping probability above the threshold for significant mistyping. This threshold has the default value 0.5 but can be set using batch item #42. These new output pedigree files are called PEDNU-nn.mmm.

Slightly changed the main mistyping output files, TYPNG-nn.mmm, by adding the pedigree name to each result line. Since each genotype with a significant probability of being a mistyping (see batch item #42) is marked with "##", one can now look at only these lines and locate each mistyping in the entire dataset. For example, on some platforms, "grep -h '##' TYP*" is an easy way to pull out all the significant results. (The inner quotes keep shells that treat # as a comment character from discarding the pattern.)
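On platforms without grep, the same filtering can be done with a short Python script. This is a minimal sketch: the function name flagged_mistypings and the default pattern "TYPNG-*" are illustrative choices, and the script assumes only what is stated above, namely that significant lines contain "##".

```python
import glob


def flagged_mistypings(pattern="TYPNG-*"):
    """Return every line containing '##' from the files matching pattern."""
    hits = []
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps the scan going if a file has odd bytes.
        with open(path, "r", errors="replace") as handle:
            for line in handle:
                if "##" in line:
                    hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # Print the flagged lines from the current directory, like grep -h.
    for line in flagged_mistypings():
        print(line)
```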

Added output files, AEF-nn.ALL and AEKEY-nn.TXT, reporting imputed, expected allele counts. These are always written under the Mistyping option.


New haplotype output file, HMNDL-nn.ALL, that lists the haplotypes in list-directed, comma-separated, MENDEL format.


Extends run length automatically if minimum genetic distance between flanking markers is below some threshold (0.2 cM) and the user has not already adjusted the run parameters accordingly.

Skips pedigrees with just one non-inbred affected when using any analysis that requires affecteds. (Pedigrees with no affecteds were already skipped.)

Added a second executable to the distribution package. The executable simwalk2snp is set up to analyze data sets containing many biallelic markers.

Changes from version 2.83 to 2.86


The IBD analysis should now be much faster for the vast majority of pedigrees. Intermediate values in the IBD calculations are now usually stored in RAM. For the largest pedigrees, the huge number of intermediate values is still stored on disk. Also, the IBD output files have been redesigned and expanded. These files now contain the theoretical kinship coefficients as well as the empirical (also known as conditional) kinship coefficients.

The empirical kinship coefficients are also output in a new file titled KEF-nn.ALL. KEF stands for Kinship Exchange Format. This file is designed to make it easy to export all the kinship data to QTL programs (that use variance component modeling for gene localization of quantitative traits) such as Mendel 5 or SOLAR. There is also a new file titled KEKEY-nn.TXT that contains a key to some of the labels in the KEF file.

There is a new batch item #39 that indicates which output files the IBD analysis should generate. This batch item takes one of three values: KIN (to output only the two kinship files), IBD (to output only the three IBD files for each pedigree), or ALL (to output all five types of files; the default value).

Batch item #49 has also been redesigned. This batch item determines the points within the marker region at which the IBD and kinship values will be calculated. As before, the first value is an integer that sets the number of evenly spaced points within each marker interval that will be used. This value may be 0 (zero), in which case only the values at the markers will be calculated.

If the first value in batch item #49 is set to -1, then the second value, which must be a positive real number, is used to set the increment in cM for a grid of points that starts at the first marker and extends up to the last marker. Note that the marker positions are always used, in addition to this grid of points. For example, setting batch item #49 using the three lines:


would cause SimWalk2 to calculate the IBD and kinship values at every 0.5 cM (on the sex-averaged map) starting at the first marker and continuing until the last marker is reached. Any marker position not falling on this grid would also be used as a calculation point.
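To make the two modes of batch item #49 concrete, here is a rough Python sketch of which positions would be used. It is not SimWalk2's code: the function calculation_points and its parameters are hypothetical, positions are assumed to be in cM on the sex-averaged map, and "evenly spaced" is read as dividing each interval into (n + 1) equal parts.

```python
def calculation_points(marker_positions, first_value, increment_cm=None):
    """Return the sorted positions (cM) at which values would be calculated."""
    markers = sorted(marker_positions)
    if not markers:
        raise ValueError("at least one marker position is required")
    # Marker positions are always used, in either mode.
    points = {round(m, 6) for m in markers}
    if first_value >= 0:
        # first_value evenly spaced points inside each marker interval.
        for left, right in zip(markers, markers[1:]):
            step = (right - left) / (first_value + 1)
            for k in range(1, first_value + 1):
                points.add(round(left + k * step, 6))
    elif first_value == -1:
        if increment_cm is None or increment_cm <= 0:
            raise ValueError("increment must be a positive real number")
        # Grid starting at the first marker, stopping at the last marker.
        k = 1
        while markers[0] + k * increment_cm < markers[-1]:
            points.add(round(markers[0] + k * increment_cm, 6))
            k += 1
    else:
        raise ValueError("first value must be -1 or a non-negative integer")
    return sorted(points)
```

With markers at 0.0, 1.2, and 2.0 cM, a first value of -1, and an increment of 0.5, the sketch yields the grid points 0.5, 1.0, and 1.5 together with the three marker positions, so the marker at 1.2 cM is kept even though it falls off the grid.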

Finally, recall that setting batch item #17 to -1 causes SimWalk2 to output the IBD calculations for all pairs of individuals.


You can now input multiple pairs of coordinated Locus and Pedigree files. This is useful, for example, if you have pedigrees from different ethnic groups, and thus with different allele frequencies. Another common occurrence is to have pedigrees that were genotyped in different laboratories, and thus have differing allele sets.

To use this feature, the number of loci, their names, and their map positions must be identical for all locus and pedigree files. Between any of the pairs of coordinated locus and pedigree files, at any marker locus, the number of alleles, their names, and their frequencies may differ. However, if there is a trait locus, its data must be identical in all the locus files. If you have, for example, three pairs of locus and pedigree files, then set the first value in batch items #10 and #11 to "**3" (without the quotes) and follow this with the file names in order. To be concrete, using the lines:



would cause SimWalk2 to analyze all pedigrees in the first pedigree file using the data in the first locus file, and would similarly coordinate between pedigree-2.dat and locus-2.dat, etc. Any overall analysis involving the pedigrees from all three files would be computed appropriately. (Note that strictly speaking, parametric LOD scores computed using different allele frequencies are not summable. However, as is common, we ignore this subtlety.)
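Before a run, one may want to confirm that the locus files really do agree on the number of loci, their names, and their map positions. The Python sketch below shows one way to do that check; it is purely illustrative. The function check_coordinated_maps is hypothetical, and it assumes each locus file has already been parsed (by code not shown) into a list of (locus name, map position) pairs.

```python
def check_coordinated_maps(locus_maps):
    """Compare every parsed locus list against the first one.

    locus_maps is a list with one entry per locus file; each entry is a
    list of (locus_name, map_position) tuples in file order.
    Returns a list of human-readable problems (empty if all agree).
    """
    reference = locus_maps[0]
    problems = []
    for index, current in enumerate(locus_maps[1:], start=2):
        if len(current) != len(reference):
            problems.append(
                f"file {index}: {len(current)} loci, expected {len(reference)}"
            )
            continue
        for (ref_name, ref_pos), (name, pos) in zip(reference, current):
            if name != ref_name:
                problems.append(f"file {index}: locus {name}, expected {ref_name}")
            elif pos != ref_pos:
                problems.append(f"file {index}: {name} at {pos}, expected {ref_pos}")
    return problems
```

Allele counts, names, and frequencies are deliberately not compared, since those may differ between the coordinated pairs.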

Finally, for all analysis options, all output tables that were previously tab-delimited have been reformatted to be comma-delimited (also known as comma-separated values, or CSV).

Changes from version 2.82 to 2.83


In version 2.83 a feature has been added for users interested primarily in the NPL statistics STAT-D (NPL_pairs) and STAT-E (NPL_all). If the user has Mega2 2.3R3 (or later) and Merlin 0.9.2 (or later), then SimWalk2 2.83 (or later) can use the exact statistics computed quickly by Merlin on small pedigrees. SimWalk2 will combine these exact scores with the estimates it obtains for any large pedigrees and then compute all empirical p-values, both per pedigree and overall. This great saving in time is coordinated using Mega2's option #24.

Changes from version 2.80 to 2.82


The mistyping analysis option that was introduced in version 2.80 is now greatly improved. The format of the output remains the same: at each observed genotype an overall posterior probability of mistyping at that genotype is produced. However, the MCMC sampling procedure is now much improved and the results now agree very well with exact analysis on all those pedigrees that can be examined using both methods.


It is now much easier to study possible locus heterogeneity. The degree of locus heterogeneity is commonly measured by alpha, defined as the a priori proportion of pedigrees linked at a location, i.e., if there is a trait locus in the region, alpha is the proportion of these pedigrees that contains an affected genotype at this trait locus.

Overall parametric linkage analysis results are now always reported in two ways: (1) for alpha = 1.0 (i.e., assuming no locus heterogeneity) and (2) maximized over a grid of alpha values: 0.00, 0.05, 0.10, ..., 0.90, 0.95, 1.00. In addition, if an alpha value less than 1.0 is specified in batch item #13.2, then several more results are listed. These additional results include for each pedigree and each position the posterior probability that that pedigree is one of the linked pedigrees, given the selected value for alpha. This enables one to separate the linked and unlinked pedigrees in an unbiased fashion.
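For readers who want to see the arithmetic, the following Python sketch uses the standard admixture formulation of locus heterogeneity at a single position. It is an illustration under stated assumptions, not a description of SimWalk2's internals: the function names are hypothetical, and the per-pedigree scores are assumed to be log10 likelihood ratios (LOD-like scores).

```python
import math

# The grid of alpha values named in the text: 0.00, 0.05, ..., 1.00.
ALPHA_GRID = [i / 20 for i in range(21)]


def hlod(lods, alpha):
    """Overall score at one position under the admixture model."""
    if alpha == 1.0:
        # No heterogeneity: the per-pedigree scores simply add.
        return sum(lods)
    return sum(math.log10(alpha * 10 ** lod + (1 - alpha)) for lod in lods)


def max_hlod(lods):
    """Return (alpha, score) maximizing the overall score over the grid."""
    best_alpha = max(ALPHA_GRID, key=lambda a: hlod(lods, a))
    return best_alpha, hlod(lods, best_alpha)


def posterior_linked(lod, alpha):
    """Posterior probability that a pedigree is linked, given alpha."""
    linked = alpha * 10 ** lod
    return linked / (linked + (1 - alpha))
```

A pedigree with a score of 0.0 carries no information, so posterior_linked(0.0, alpha) simply returns the prior alpha; positive scores raise the posterior and negative scores lower it.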

Changes from version 2.60 to 2.80


New option for SimWalk2 to run silently, i.e., with absolutely no screen output. See batch item #5 in the control file. This option is particularly useful for running SimWalk2 in the background.

New output file in all analysis options, VIDEO-nn.TXT. This file contains up-to-date progress messages describing the current state of the SimWalk2 run. This file is produced regardless of whether SimWalk2 is reporting messages to the screen.

New input data file for the locus map information. The order of the loci and the recombination frequencies should now be listed in the map data file, whose name is specified in batch item #9 in the control file. The map data file has a simple format that is easy to produce. Please note that all analysis is performed using only those loci that appear in both the map and locus data files.
(If there is a trait locus to be analyzed, then it must be the first locus listed in both the map and locus data files. Also, the previous method of specifying the locus map within the control file will continue to work, for now, but is no longer the preferred method.)


New parameter used to model complex traits. In batch item #13.2 in the control file one may now set the a priori proportion of pedigrees segregating an affected gene linked to the marker loci. That is, when one suspects the trait can be caused by more than one locus (locus heterogeneity), one can specify the likely proportion of the pedigrees in the data set that have an affected gene linked to the marker loci under study. The default value for this parameter is 1.0, i.e., all pedigrees have the linked gene. This parameter is used to weight the location scores, to appropriately take locus heterogeneity into account without "cherry picking" the pedigrees, which would bias the results.


New analysis option detects genotype errors. This analysis option is specified by setting batch item #1 to the value 5. Under this analysis option SimWalk2 reports the overall probability of mistyping at each observed genotype (in fact, at each observed allele). Unfortunately, genotype mistypings are common and can easily mask linkage. Some of these mistypings result in non-Mendelian inheritance and are reasonably easy to find; others are consistent with Mendelian inheritance and are revealed only by the decrease in pedigree likelihood due to the spurious excess recombinations the mistypings imply. By using a multipoint analysis that includes all the available data, SimWalk2 provides mistyping probabilities that take both types of errors into account. When genotypes are flagged with a significant probability of mistyping, the raw data should be re-evaluated and perhaps replicated. As a modicum of missing data is preferable to false data, removing data that is questionable should be considered as well.

Please let me know if you have any questions about SimWalk2 and in particular if there are any changes or additions that would make this program more useful to you.

Thank you,

Eric Sobel
