Copyright (C) Hani J. Doss and B. Narasimhan
--------------------------------------------

This file provides information on installing and using the BSA
(Bayesian Sensitivity Analysis) software. We expect any serious user
of the software to read our report (ddg.ps.gz), which is included in
the subdirectory ddg.

Requirements:

A) XLisp-Stat 3.52-5 or higher. Freely available from
   ftp://ftp.stat.umn.edu/pub/xlispstat/current

B) Windows or Unix. Windows includes 95 and NT. (A Mac version is in
   development.)

C) On Unix, you will also need a C compiler.

Step 1
------

You probably received the entire package as a compressed archive
named bsa.tgz. On Unix, the contents of the archive may be extracted
into a directory called bsa by executing the command:

    gunzip -c bsa.tgz | tar -xvf -

If you are using GNU tar, this can be done in one shot via:

    tar -xvzf bsa.tgz

To extract the files on Windows, you need an extractor such as
WinZip. See http://www.winzip.com for more details.

Step 2
------

On Windows, skip this step. Macs are not supported yet because the
authors do not know how to create a dynamic shared library on that
platform.

On Unix, you need to configure the software for your environment.
Change to the bsa directory and type

    ./configure
    make

This will compile the lisp files and create a shared library for your
platform.

Step 3
------

Start using the program. On Unix, the command

    xlispstat BreastCancerRadiationOnly

or

    xlispstat BreastCancerRadiationChemo

will fire up the ready-made examples. On Windows, start Lisp-Stat and
load the file BreastCancerRadiationOnly.lsp or
BreastCancerRadiationChemo.lsp to proceed.

If you prefer commands to mouse clicks, you can send ``messages'' to
the master object in the listener window, as in the following
examples:

    (send BreastCancerRadiationOnly-master
          :current-hyperparameter-values '(1 4 150 12))
    (send BreastCancerRadiationOnly-master :print-all-statistics)
    (send BreastCancerRadiationOnly-master
          :slot-value 'bsa::importance-weights)

Note how the names of the master objects are derived from the lisp
file names, and how internal slot names have to be prefixed by the
package name.

For dealing with a new problem, we provide a few points regarding the
software.

A number of inputs are required for running the program. These are
discussed in detail in the literate program (bsa.ps) under the
section titled ``Introduction.'' For convenience we repeat the
details here. The excerpted part is indented two spaces for easy
reference.

  First note that the software only does sensitivity analysis. No
  general facility is provided for generating observations from
  Markov chains. Indeed, since the range of models for which MCMC
  methods are applicable is large, and such methods most likely
  involve problem-specific issues, it is our opinion that building
  such a supertool, if it is possible at all, is a non-trivial task.
  However, the Fortran program used to generate the output for our
  example is included with this software and can be used for models
  similar to ours. Of course, any appropriate method may be used to
  generate the samples as long as the output is available in a form
  usable by our software.

  The requirements on the data that can be used with our software are
  spelt out below.

  Corresponding to each Markov chain output, there must be two files
  with the extensions ".in" (input file) and ".out" (output file).
  For example, "mc1.in" and "mc1.out".

  The input file must have the following structure. The first four
  items in the file can be anything, string or number, either on a
  single line or in any conceivable combination of lines.
  The next three items *must* be the shape a of the Gamma
  distribution on theta, the scale b of the Gamma distribution on
  theta, and M(R). The Gamma density with shape a and scale b is
  parametrized as proportional to x^{a-1} exp(-bx), so b enters the
  density as a rate. The three values following these quantities can
  be anything, but the one after them should be the number of data
  points, that is, the number of sets or intervals. In the Fortran
  program we use, -99 denotes infinity. Nothing else is read from the
  input file.

  The output file must have the following structure for each data
  point generated by the Markov chain. The value of theta must be
  followed by the number of distinct values of the data points, which
  must be followed by a frequency table giving each actual data value
  and the corresponding frequency. The layout of the values on lines
  does not matter as long as at least one white space delimits
  values. If this structure is violated, errors will result. A peek
  at the data files included with this software will help the reader.

It is assumed that a proper installation of XLisp-Stat is available.

For a new problem, you probably have several Markov chain output
files, although even one should work. (In the latter case,
reweighting reduces to simple Importance Sampling.)

a) It is best to create a new directory for your problem and keep
   your data files there. For example, the directory "BreastCancer"
   contains the relevant data files for our Breast Cancer data.

b) The only files you actually need to run the program are:

   1) Either one of bsa.fsl or bsa.lsp

   2) Either one of utility.fsl or utility.lsp

   3) Either one of call-by-reference.fsl or call-by-reference.lsp

   4) The shared library libbsa.so or libbsa.sl, as the case may be.
      On Windows, instead of the shared library, you need the whole
      subdirectory "win".

   5) The file new-problem.lsp.

   Copy these files/directories to where you have the data files and
   work there.

Invoke Lisp-Stat and load the file named "new-problem.lsp".
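
Before loading new-problem.lsp, it may help to check that each pair of
data files follows the layout described above. The following is a
minimal Python sketch of such a check; it is not part of BSA. The
function names parse_in and parse_out are hypothetical, and the sketch
assumes that the frequency table in a ".out" file is written as
alternating value/frequency pairs, which the description above does
not pin down.

```python
def parse_in(text):
    """Pick out the items the README says are read from a ".in" file.

    Items are whitespace-delimited; how they fall on lines is ignored.
    """
    tok = text.split()
    if len(tok) < 11:
        raise ValueError("expected at least 11 items, found %d" % len(tok))
    return {
        # items 1-4 are free-form and skipped
        "shape": float(tok[4]),
        "scale": float(tok[5]),
        "M(R)": float(tok[6]),
        # items 8-10 are free-form and skipped
        "n_data": int(float(tok[10])),
    }


def parse_out(text):
    """Split a ".out" file into (theta, frequency_table) records.

    Assumption: each table is k alternating (value, frequency) pairs,
    where k is the number of distinct values that follows theta.
    """
    tok = text.split()
    records = []
    i = 0
    while i < len(tok):
        if i + 1 >= len(tok):
            raise ValueError("record %d has no count" % len(records))
        theta = float(tok[i])
        k = int(float(tok[i + 1]))
        end = i + 2 + 2 * k
        if k < 0 or end > len(tok):
            raise ValueError("record %d is truncated" % len(records))
        table = []
        for j in range(i + 2, end, 2):
            table.append((float(tok[j]), int(float(tok[j + 1]))))
        records.append((theta, table))
        i = end
    return records


if __name__ == "__main__":
    # Made-up sample contents, for illustration only.
    sample_in = "run 1 a b\n2.0 0.5 10\nx y z 3\n"
    sample_out = "1.5 2 0.3 1 0.7 2\n0.9 1\n4.2 3\n"
    print(parse_in(sample_in))
    print(parse_out(sample_out))
```

To check real files, read the contents of, say, "mc1.in" and "mc1.out"
and pass them to the two functions; a ValueError points to a file that
does not match the assumed layout.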

The first time (and the first time only), the following inputs will
be needed.

Inputs
------

1) An identifier that uniquely names the run. Use a meaningful name
   here. Let us assume this is BreastCancer (the default) in the
   discussion below.

2) The number of Markov chain outputs that you want to use for
   reweighting. Must be >= 1, with 1 denoting straight Importance
   Sampling.

3) The names of the files containing output from Markov chains,
   *without the extensions*. The software will automatically tag on
   the extensions .in and .out when looking for files.

4) The number of points per chain to use in the dynamic reweighting.
   Thus if you specify 50 and have 8 chains, then 50 points from each
   of the eight chains (= 400 in all) will be used.

5) An initial guess for maximizing the log-quasilikelihood, which
   will provide an estimate of the constants of proportionality.

6) The range over which you want to vary the hyperparameters. If you
   use only one chain, you *must* specify the range; otherwise, the
   range will be a single point. If you specified many chains, the
   default range for each hyperparameter runs from the minimum to the
   maximum of the values used in all the Markov chains. The number of
   stops should be odd if you want to hit the middle of the interval.

7) The number of points to use in estimating the constants of
   proportionality. If you use all of the data, the estimation can
   take a while. It is almost always better to go with the default or
   fewer. (If you are really interested in using more points, start
   off with 10 and use the estimates thus obtained to start your
   larger optimization. This will save you a lot of time.)

Once you specify this, the maximization will take place. This is a
good point to go refill your coffee cup.

After the estimation, two files are created so that you are not
bombarded with questions in subsequent explorations.
For example,

    BreastCancer.lsp
  (and)
    BreastCancer.run

To repeat the exploration next time, you only need to load the file
BreastCancer.lsp into XLisp-Stat. This will bypass all the inputs
discussed above except for the question about ranges. The file
BreastCancer.run contains pre-processed information for faster
loading and is used when BreastCancer.lsp is loaded. All files are
text files and can be viewed with a text viewer.

Examples for Breast Cancer Data
-------------------------------

The two files

    BreastCancerRadiationOnly.lsp
  (and)
    BreastCancerRadiationChemo.lsp

and the corresponding run files are provided for experimentation.
These are in the main directory "bsa" itself and concern the dataset
on two treatments described in the report

    Dynamic Display of Changing Posterior in Bayesian Survival Analysis
    by Hani J. Doss and B. Narasimhan

By default, they use 8 Markov chains and 50 points from each.

We wish to note that an earlier version of the software was used to
produce the results in the report, and a bug was subsequently found.
This does not change any of the conclusions of the report, but the
numbers shown in table 1.1 of the report are off from the actual
values obtained using the software. A replacement is provided in the
file newtable.tex and shows that the agreement between estimates
obtained by reweighting and those obtained by actual runs of Markov
chains is, if anything, better than what the original table
indicated.

Just load the lisp files into XLisp-Stat to do the dynamic
exploration. If you have a sufficiently fast machine, you can use
more points.

To completely reproduce our work from scratch, you need to use the
data files in the subdirectory "BreastCancer".

The authors may be contacted via email at:

    doss@stat.ohio-state.edu (Hani J. Doss)
    naras@stat.Stanford.EDU (B. Narasimhan)

Note on the program itself
--------------------------

The programs are written in a literate style using the Noweb literate
programming tools.
We provide two utility packages that can be used independently of the
program: utility.lsp and call-by-reference.lsp. The former contains
functions we have found useful in writing Lisp-Stat programs; the
latter implements a call-by-reference glue between Lisp-Stat and C.

Enjoy!