##################################################
## Classic univariate statistics - R Tutorial   ##
##################################################

# Please read the tutorial's general instructions first, and prepare
# for loading an external datafile if you are going to use one.
# The content of this file can be pasted directly into the R console.
# It should throw errors only at the points where you
# are supposed to have an external dataset available.
# However, it is much handier to use the "display file" command
# within the R menu to look at this file, and paste command by command
# to the console, using control-V. Take your time to experiment a bit
# with the listed commands.

library(gnlm)   # provides fit.dist(), used further down

#----------------------------------------
# .Rclassics: Univariate Statistics in R.
#----------------------------------------

# --------------------------------
# | I. Probability Distributions |
# --------------------------------

# Hypothesis testing in statistics makes use of probability distributions.
# They are also an essential part of model checking.
# You can simulate probability distributions in R,
# calculate p-values from them, and fit them to data.
# Many distributions are available.

# Let's simulate a sample from a binomial distribution.
# We draw 100 values, each the number of successes among
# 20 individuals, and the probability of success per individual is 0.34.

help(Binomial)
x <- rbinom(100, 20, 0.34)
hist(x)

# For model checking, quantile plots are often used.
# I guess that everybody is familiar with normal probability plots.

qqnorm(x)

# However, you can make the same type of plots for other distributions
# as well. These always have 'quantiles' of a probability distribution
# on the horizontal axis, and the ordered datapoints on the vertical.
# If the distribution fits, you should observe a more or less straight line.

plot(qpois(ppoints(x), 9), sort(x))         # compare sample x to a Poisson with mean 9.
plot(qbinom(ppoints(x), 20, 0.34), sort(x)) # compare to a binomial.
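
# A small sketch of what such a quantile plot does, using base R only.
# The names xs, p and theo are made up for this illustration, and the
# seed is arbitrary; nothing here is needed by the rest of the tutorial.
# ppoints(n) returns n probabilities spread evenly over (0, 1).
# Passing them to a quantile function gives the theoretical quantiles
# that go on the horizontal axis.

set.seed(1)                     # arbitrary seed, only for reproducibility
xs <- rbinom(100, 20, 0.34)     # same kind of sample as x
p <- ppoints(length(xs))        # equals (i - 0.5)/n when n > 10
theo <- qbinom(p, 20, 0.34)     # theoretical binomial quantiles
plot(theo, sort(xs))            # ordered data against theoretical quantiles
abline(0, 1)                    # points close to the line y = x suggest a good fit
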
plot(qgamma(ppoints(x), 9), sort(x))        # compare to a gamma with shape 9.

# Take a look at the ppoints() function.

# We can also fit a distribution to a sample using likelihood theory.
# Now follows an example using the car accidents data discussed in
# Lindsey (1995).

f3 <- c(447, 132, 42, 21, 3, 2)   # car accidents: observed frequencies
y3 <- seq(0, 5)                   # categories: number of accidents
z3 <- fit.dist(y3, f3, "Poisson", plot=TRUE, xlab="Number of accidents", main="", bty="L")
# fit a Poisson
# Watch out: the AIC here is the deviance plus the number of parameters!

z3a <- fit.dist(y3, f3, "negative binomial", exact=FALSE, plot=TRUE, add=TRUE, lty=3)
z3b <- fit.dist(y3, f3, "negative binomial")

# fit.dist() allows us to fit a distribution either using exact
# integration or an approximation to the probability distribution.

# --------------------------
# | II. Hypothesis Testing |
# --------------------------

# Hypothesis testing on samples is also available in R.
# Please type in two samples of data.
# scan() reads numbers from the console until you enter an empty line.

sample1 <- scan()
sample2 <- scan()

# We now run an F test on them, to test whether they have equal variances.

var.test(sample1, sample2)

# Many tests are available in R. Please type the following to take a
# look at the available ones. The classical tests used to live in the
# package ctest; in current versions of R they are part of stats.

library(help=stats)

# Try to run some on samples of data.

# Another example from the help files:
# Under (the assumption of) simple Mendelian inheritance, a cross
# between plants of two particular genotypes produces progeny 1/4 of
# which are "dwarf" and 3/4 of which are "giant", respectively.
# In an experiment to determine if this assumption is reasonable, a
# cross results in progeny having 243 dwarf and 682 giant plants.
# If "giant" is taken as success, the null hypothesis is that p =
# 3/4 and the alternative that p != 3/4.

binom.test(c(682, 243), p = 3/4)
binom.test(682, 682 + 243, p = 3/4)   # Produces the same result.

# What is the conclusion?

# -------------------------------------------------------------
# When finished running this file, please continue with
# the more advanced tutorial files.
# There are more examples of specific graphics in those.
# -------------------------------------------------------------

# Much of the material in this short tutorial comes from

# Venables, W. N. and Ripley, B. D. (2002).
# Modern Applied Statistics with S.

# Lindsey, J. K. (1995).
# Introductory Statistics. A Modelling Approach.

# Tom Van Dooren, version 17/10/2002