##################################################
##  Classic univariate statistics - R Tutorial  ##
##################################################

# Please read the tutorial's general instructions first, and prepare
# to load an external data file if you are going to use one.
# The content of this file can be pasted directly into the R console.
# It should produce errors only at the points where you
# are supposed to have an external dataset available.
# However, it is much handier to use the "display file" command
# in the R menu to look at this file, and to paste it command by command
# into the console using Ctrl-V. Take your time to experiment a bit
# with the listed commands.

library(gnlm)  # provides fit.dist(), used in section I

#----------------------------------------


# .Rclassics: Univariate Statistics in R.


#----------------------------------------


# --------------------------------
# | I. Probability Distributions |
# --------------------------------


# Hypothesis testing in statistics makes use of probability distributions.
# They are also an essential part of model checking.
# You can simulate probability distributions in R,
# calculate p-values from them, and fit them to data.
# Many distributions are available.
# Let's simulate a sample from a binomial distribution.
# We draw 100 observations; each is the number of successes among
# 20 individuals, with a probability of success of 0.34 per individual.

help(Binomial)

x <- rbinom(100, 20, 0.34)
hist(x)
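
# A minimal sketch of the "calculate p-values" step mentioned above, using
# base R's d/p/q functions for the binomial. The observed count of 12 is a
# made-up value, and the names n.trials, p.success, obs and p.upper are
# introduced here for illustration only.

n.trials  <- 20     # individuals per observation, as in the simulation above
p.success <- 0.34   # success probability, as in the simulation above
obs       <- 12     # hypothetical observed number of successes

dbinom(obs, n.trials, p.success)    # P(X = 12)
# pbinom(q, lower.tail = FALSE) gives P(X > q), so use obs - 1 for P(X >= 12).
p.upper <- pbinom(obs - 1, n.trials, p.success, lower.tail = FALSE)
p.upper                             # one-sided p-value
qbinom(0.95, n.trials, p.success)   # 95% quantile of the distribution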

# For model checking, quantile plots are often used.
# The most familiar example is the normal probability plot.

qqnorm(x)

# However, you can make the same type of plot for other distributions
# as well. These always have the theoretical quantiles of a probability
# distribution on the horizontal axis, and the ordered data points on the
# vertical. If the distribution fits, you should observe a more or less
# straight line.

plot(qpois(ppoints(x), 9), sort(x))          # compare sample x to a Poisson with mean 9.
plot(qbinom(ppoints(x), 20, 0.34), sort(x))  # compare to a binomial.
plot(qgamma(ppoints(x), 9), sort(x))         # compare to a gamma with shape 9.

# Take a look at the ppoints() function.
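
# As a pointer: ppoints(n) returns n evenly spaced probabilities strictly
# between 0 and 1, computed as (i - a) / (n + 1 - 2a), with a = 3/8 when
# n <= 10 and a = 1/2 otherwise. Given a vector, it uses the vector's length
# as n. The sketch below checks the formula by hand; pp.small and pp.large
# are names introduced here.

ppoints(5)
pp.small <- (1:5 - 3/8) / (5 + 1 - 2 * 3/8)   # by hand, a = 3/8
all.equal(ppoints(5), pp.small)

pp.large <- (1:100 - 1/2) / 100               # by hand, a = 1/2
all.equal(ppoints(100), pp.large)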


# We can also fit a distribution to a sample using likelihood theory.
# Here is an example using the car accident data discussed in
# Lindsey (1995).

f3 <- c(447, 132, 42, 21, 3, 2)  # car accidents: observed frequencies
y3 <- seq(0, 5)                  # categories: number of accidents
z3 <- fit.dist(y3, f3, "Poisson", plot = TRUE, xlab = "Number of accidents",
               main = "", bty = "L")  # fit a Poisson

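# Watch out: the AIC reported here is the deviance plus the number of parameters!

# Negative binomial fitted with the approximation, added to the plot as a dotted line.
z3a <- fit.dist(y3, f3, "negative binomial", exact = FALSE, plot = TRUE, add = TRUE, lty = 3)
# The same distribution, without setting exact = FALSE.
z3b <- fit.dist(y3, f3, "negative binomial")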

# fit.dist() allows us to fit a distribution either using exact 
# integration or an approximation to the probability distribution.
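
# To make the likelihood idea concrete, here is a sketch in base R only.
# It is not a description of what fit.dist() does internally. The Poisson
# mean is estimated by maximising the log-likelihood of the grouped accident
# counts, treating the last category as exactly 5 accidents. The names
# acc.n, acc.freq, pois.loglik, lambda.num and lambda.hat are introduced
# here for illustration.

acc.n    <- 0:5                           # number of accidents
acc.freq <- c(447, 132, 42, 21, 3, 2)     # observed frequencies (same data as f3)

# Log-likelihood of the grouped data for a given Poisson mean.
pois.loglik <- function(lambda) sum(acc.freq * dpois(acc.n, lambda, log = TRUE))

lambda.num <- optimize(pois.loglik, interval = c(0.01, 5),
                       maximum = TRUE, tol = 1e-8)$maximum
lambda.hat <- sum(acc.n * acc.freq) / sum(acc.freq)  # closed form: the sample mean
c(numerical = lambda.num, closed.form = lambda.hat)

# Expected frequencies under the fitted Poisson, next to the observed ones.
round(cbind(observed = acc.freq,
            expected = sum(acc.freq) * dpois(acc.n, lambda.hat)), 1)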


# --------------------------
# | II. Hypothesis Testing |
# --------------------------

# Hypothesis testing on samples is also available in R.
# Please type in two samples of data. scan() reads numbers from the
# console until you enter an empty line.

sample1 <- scan()
sample2 <- scan()

# We now run an F test on them, to test whether they have equal variances.

var.test(sample1, sample2)
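
# If you have no data at hand, the same test can be tried on simulated
# samples. This is a sketch with made-up sample sizes and standard
# deviations; sim1, sim2, vt and f.by.hand are names introduced here.
# It also checks that the F statistic is simply the ratio of the two
# sample variances.

set.seed(1)                             # arbitrary seed, for reproducibility
sim1 <- rnorm(30, mean = 10, sd = 2)    # hypothetical sample 1
sim2 <- rnorm(25, mean = 10, sd = 3)    # hypothetical sample 2

vt <- var.test(sim1, sim2)
vt
f.by.hand <- var(sim1) / var(sim2)      # ratio of sample variances
all.equal(unname(vt$statistic), f.by.hand)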

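# Many tests are available in R. Please type the following to take a
# look at the available ones. (The classical tests used to live in the
# ctest package, which has since been merged into stats.)

library(help = "stats")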

# Try to run some on samples of data.

# Another example from the help files: 
# Under (the assumption of) simple Mendelian inheritance, a cross
# between plants of two particular genotypes produces progeny 1/4 of
# which are "dwarf" and 3/4 of which are "giant", respectively.
# In an experiment to determine if this assumption is reasonable, a
# cross results in progeny having 243 dwarf and 682 giant plants.
# If "giant" is taken as success, the null hypothesis is that p =
# 3/4 and the alternative that p != 3/4.

binom.test(c(682, 243), p = 3/4)
binom.test(682, 682 + 243, p = 3/4)   # Produces the same result.

# What is the conclusion?
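
# The result of a test is a list of class "htest", so the pieces needed
# for a conclusion can be extracted by name. A sketch (bt is a name
# introduced here):

bt <- binom.test(682, 682 + 243, p = 3/4)
bt$estimate    # observed proportion of giant plants
bt$conf.int    # 95% confidence interval for p
bt$p.value     # compare with your chosen significance level, e.g. 0.05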

# -------------------------------------------------------------
# When finished running this file, please continue with 
# the more advanced tutorial files.
# There are more examples of specific graphics in those.
# -------------------------------------------------------------


# Much of the material in this short tutorial comes from

# Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S.


# Lindsey, J. K. (1995). Introductory Statistics: A Modelling Approach.


# Tom Van Dooren, version 17/10/2002