Title: | Generalized Nonlinear Models |
---|---|
Description: | Functions to specify and fit generalized nonlinear models, including models with multiplicative interaction terms such as the UNIDIFF model from sociology and the AMMI model from crop science, and many others. Over-parameterized representations of models are used throughout; functions are provided for inference on estimable parameter combinations, as well as standard methods for diagnostics etc. |
Authors: | Heather Turner [aut, cre] , David Firth [aut] , Brian Ripley [ctb], Bill Venables [ctb], Douglas M. Bates [ctb], Martin Maechler [ctb] |
Maintainer: | Heather Turner <[email protected]> |
License: | GPL-2 | GPL-3 |
Version: | 1.1-5 |
Built: | 2024-11-07 04:27:41 UTC |
Source: | https://github.com/hturner/gnm |
Functions to specify, fit and evaluate generalized nonlinear models.
gnm
provides functions to fit generalized nonlinear models by
maximum likelihood. Such models extend the class of generalized linear
models by allowing nonlinear terms in the predictor.
Some special cases are models with multiplicative interaction terms, such as the UNIDIFF and row-column association models from sociology and the AMMI and GAMMI models from crop science; stereotype models for ordered categorical response, and diagonal reference models for dependence on a square two-way classification.
gnm
is a major re-working of an earlier Xlisp-Stat package,
"Llama". Over-parameterized representations of models are used
throughout; functions are provided for inference on estimable
parameter combinations, as well as standard methods for diagnostics
etc.
The following documentation provides further information on the
gnm
package:
vignette("gnmOverview", package = "gnm")
file.show(system.file("NEWS", package = "gnm"))
Heather Turner and David Firth
Maintainer: Heather Turner <[email protected]>
http://www.warwick.ac.uk/go/gnm
gnm
for the model fitting function, with links to
associated functions.
demo(gnm)
demo(gnm)
Compute an analysis of deviance table for one or more generalized nonlinear models
## S3 method for class 'gnm' anova(object, ..., dispersion = NULL, test = NULL)
## S3 method for class 'gnm' anova(object, ..., dispersion = NULL, test = NULL)
object |
an object of class |
... |
additional objects of class |
dispersion |
the dispersion parameter for the fitting family. By
default it is derived from |
test |
(optional) a character string, (partially) matching one
of |
Specifying a single object gives a sequential analysis of deviance table for that fit. The rows of the table show the reduction in the residual deviance and the current residual deviance as each term in the formula is added in turn.
If more than one object is specified, the rows of the table show the residual deviance of the current model and the change in the residual deviance from the previous model. (This only makes statistical sense if the models are nested.) It is conventional to list the models from smallest to largest, but this is up to the user.
If test
is specified, the table will include test statistics
and/or p values for the reduction in deviance. For models with known
dispersion (e.g., binomial and Poisson fits) the chi-squared test is
most appropriate, and for those with dispersion estimated by moments
(e.g., 'gaussian', 'quasibinomial' and 'quasipoisson' fits) the F test
is most appropriate. Mallows' Cp statistic is the residual deviance
plus twice the estimate of times the residual degrees of
freedom, which is closely related to AIC (and a multiple of it if the
dispersion is known).
An object of class "anova"
inheriting from class "data.frame"
.
The comparison between two or more models will only
be valid if they are fitted to the same dataset. This may be a problem
if there are missing values and R's default of na.action = na.omit
is used; an error will be given in this case.
Modification of anova.glm
by the R Core Team. Adapted
for "gnm"
objects by Heather Turner.
set.seed(1) ## Fit a uniform association model separating diagonal effects Rscore <- scale(as.numeric(row(occupationalStatus)), scale = FALSE) Cscore <- scale(as.numeric(col(occupationalStatus)), scale = FALSE) Uniform <- glm(Freq ~ origin + destination + Diag(origin, destination) + Rscore:Cscore, family = poisson, data = occupationalStatus) ## Fit an association model with homogeneous row-column effects RChomog <- gnm(Freq ~ origin + destination + Diag(origin, destination) + MultHomog(origin, destination), family = poisson, data = occupationalStatus) ## Fit an association model with separate row and column effects RC <- gnm(Freq ~ origin + destination + Diag(origin, destination) + Mult(origin, destination), family = poisson, data = occupationalStatus) anova(RC, test = "Chisq") anova(Uniform, RChomog, RC, test = "Chisq")
set.seed(1) ## Fit a uniform association model separating diagonal effects Rscore <- scale(as.numeric(row(occupationalStatus)), scale = FALSE) Cscore <- scale(as.numeric(col(occupationalStatus)), scale = FALSE) Uniform <- glm(Freq ~ origin + destination + Diag(origin, destination) + Rscore:Cscore, family = poisson, data = occupationalStatus) ## Fit an association model with homogeneous row-column effects RChomog <- gnm(Freq ~ origin + destination + Diag(origin, destination) + MultHomog(origin, destination), family = poisson, data = occupationalStatus) ## Fit an association model with separate row and column effects RC <- gnm(Freq ~ origin + destination + Diag(origin, destination) + Mult(origin, destination), family = poisson, data = occupationalStatus) anova(RC, test = "Chisq") anova(Uniform, RChomog, RC, test = "Chisq")
asGnm
is a generic function which coerces objects of class
"glm" or "lm" to an object of class "gnm".
asGnm(object, ...)
asGnm(object, ...)
object |
an object of class "glm" or "lm". |
... |
additional arguments for method functions. |
Components are added to or removed from object
to produce an
object of class "gnm". This can be useful in model building,
see examples.
An object of class "gnm" - see gnm
for full description.
Heather Turner
Vargas, M, Crossa, J, van Eeuwijk, F, Sayre, K D and Reynolds, M P (2001). Interpreting treatment by environment interaction in agronomy trials. Agronomy Journal 93, 949–960.
set.seed(1) ## Scale yields to reproduce analyses reported in Vargas et al (2001) yield.scaled <- wheat$yield * sqrt(3/1000) treatment <- interaction(wheat$tillage, wheat$summerCrop, wheat$manure, wheat$N, sep = "") ## Fit linear model mainEffects <- lm(yield.scaled ~ year + treatment, data = wheat) ## Convert to gnm object to allow addition of Mult() term svdStart <- residSVD(mainEffects, year, treatment, 3) bilinear1 <- update(asGnm(mainEffects), . ~ . + Mult(year, treatment), start = c(coef(mainEffects), svdStart[,1]))
set.seed(1) ## Scale yields to reproduce analyses reported in Vargas et al (2001) yield.scaled <- wheat$yield * sqrt(3/1000) treatment <- interaction(wheat$tillage, wheat$summerCrop, wheat$manure, wheat$N, sep = "") ## Fit linear model mainEffects <- lm(yield.scaled ~ year + treatment, data = wheat) ## Convert to gnm object to allow addition of Mult() term svdStart <- residSVD(mainEffects, year, treatment, 3) bilinear1 <- update(asGnm(mainEffects), . ~ . + Mult(year, treatment), start = c(coef(mainEffects), svdStart[,1]))
Data from a study of patients suffering from back pain. Prognostic variables were recorded at presentation and progress was categorised three weeks after treatment.
backPain
backPain
A data frame with 101 observations on the following 4 variables.
length of previous attack.
pain change.
lordosis.
an ordered factor describing the progress of each
patient with levels worse
< same
<
slight.improvement
< moderate.improvement
<
marked.improvement
< complete.relief
.
https://ideas.repec.org/c/boc/bocode/s419001.html
Anderson, J. A. (1984) Regression and Ordered Categorical Variables. J. R. Statist. Soc. B, 46(1), 1-30.
set.seed(1) summary(backPain) ### Re-express as count data backPainLong <- expandCategorical(backPain, "pain") ### Fit models described in Table 5 of Anderson (1984) ### Logistic family models noRelationship <- gnm(count ~ pain, eliminate = id, family = "poisson", data = backPainLong) ## stereotype model oneDimensional <- update(noRelationship, ~ . + Mult(pain, x1 + x2 + x3)) ## multinomial logistic threeDimensional <- update(noRelationship, ~ . + pain:(x1 + x2 + x3)) ### Models to determine distinguishability in stereotype model ## constrain scale of category-specific multipliers oneDimensional <- update(noRelationship, ~ . + Mult(pain, offset(x1) + x2 + x3)) ## obtain identifiable contrasts; id possibly indistinguishable slopes getContrasts(oneDimensional, pickCoef(oneDimensional, "[.]pain")) ## Not run: ## (this part not needed for package testing) ## fit simpler models and compare .pain <- backPainLong$pain levels(.pain)[2:3] <- paste(levels(.pain)[2:3], collapse = " | ") fiveGroups <- update(noRelationship, ~ . + Mult(.pain, x1 + x2 + x3)) levels(.pain)[4:5] <- paste(levels(.pain)[4:5], collapse = " | ") fourGroups <- update(fiveGroups) levels(.pain)[2:3] <- paste(levels(.pain)[2:3], collapse = " | ") threeGroups <- update(fourGroups) ### Grouped continuous model, aka proportional odds model library(MASS) sixCategories <- polr(pain ~ x1 + x2 + x3, data = backPain) ### Obtain number of parameters and log-likelihoods for equivalent ### multinomial models as presented in Anderson (1984) logLikMultinom <- function(model, size){ object <- get(model) if (inherits(object, "gnm")) { l <- sum(object$y * log(object$fitted/size)) c(nParameters = object$rank - nlevels(object$eliminate), logLikelihood = l) } else c(nParameters = object$edf, logLikelihood = -deviance(object)/2) } size <- tapply(backPainLong$count, backPainLong$id, sum)[backPainLong$id] models <- c("threeDimensional", "oneDimensional", "noRelationship", "fiveGroups", "fourGroups", "threeGroups", "sixCategories") t(sapply(models, logLikMultinom, size)) ## End(Not run)
set.seed(1) summary(backPain) ### Re-express as count data backPainLong <- expandCategorical(backPain, "pain") ### Fit models described in Table 5 of Anderson (1984) ### Logistic family models noRelationship <- gnm(count ~ pain, eliminate = id, family = "poisson", data = backPainLong) ## stereotype model oneDimensional <- update(noRelationship, ~ . + Mult(pain, x1 + x2 + x3)) ## multinomial logistic threeDimensional <- update(noRelationship, ~ . + pain:(x1 + x2 + x3)) ### Models to determine distinguishability in stereotype model ## constrain scale of category-specific multipliers oneDimensional <- update(noRelationship, ~ . + Mult(pain, offset(x1) + x2 + x3)) ## obtain identifiable contrasts; id possibly indistinguishable slopes getContrasts(oneDimensional, pickCoef(oneDimensional, "[.]pain")) ## Not run: ## (this part not needed for package testing) ## fit simpler models and compare .pain <- backPainLong$pain levels(.pain)[2:3] <- paste(levels(.pain)[2:3], collapse = " | ") fiveGroups <- update(noRelationship, ~ . + Mult(.pain, x1 + x2 + x3)) levels(.pain)[4:5] <- paste(levels(.pain)[4:5], collapse = " | ") fourGroups <- update(fiveGroups) levels(.pain)[2:3] <- paste(levels(.pain)[2:3], collapse = " | ") threeGroups <- update(fourGroups) ### Grouped continuous model, aka proportional odds model library(MASS) sixCategories <- polr(pain ~ x1 + x2 + x3, data = backPain) ### Obtain number of parameters and log-likelihoods for equivalent ### multinomial models as presented in Anderson (1984) logLikMultinom <- function(model, size){ object <- get(model) if (inherits(object, "gnm")) { l <- sum(object$y * log(object$fitted/size)) c(nParameters = object$rank - nlevels(object$eliminate), logLikelihood = l) } else c(nParameters = object$edf, logLikelihood = -deviance(object)/2) } size <- tapply(backPainLong$count, backPainLong$id, sum)[backPainLong$id] models <- c("threeDimensional", "oneDimensional", "noRelationship", "fiveGroups", "fourGroups", "threeGroups", "sixCategories") t(sapply(models, logLikMultinom, size)) ## End(Not run)
Incidence of R. secalis on the leaves of ten varieties of barley grown at nine sites.
barley
barley
A data frame with 90 observations on the following 3 variables.
the proportion of leaf affected (values in [0,1])
a factor with 9 levels A
to I
a factor with 10 levels c(1:9, "X")
This dataset was used in Wedderburn's original paper (1974) on quasi-likelihood.
Originally in an unpublished Aberystwyth PhD thesis by J F Jenkyn.
Gabriel, K R (1998). Generalised bilinear regression. Biometrika 85, 689–700.
McCullagh, P and Nelder, J A (1989) Generalized Linear Models (2nd ed). Chapman and Hall.
Wedderburn, R W M (1974). Quasilikelihood functions, generalized linear models and the Gauss-Newton method. Biometrika 61, 439–47.
set.seed(1) ### Fit Wedderburn's logit model with variance proportional to [mu(1-mu)]^2 logitModel <- glm(y ~ site + variety, family = wedderburn, data = barley) fit <- fitted(logitModel) print(sum((barley$y - fit)^2 / (fit * (1-fit))^2)) ## Agrees with the chi-squared value reported in McCullagh and Nelder ## (1989, p331), which differs slightly from Wedderburn's reported value. ### Fit the biplot model as in Gabriel (1998, p694) biplotModel <- gnm(y ~ -1 + instances(Mult(site, variety), 2), family = wedderburn, data = barley) barleySVD <- svd(matrix(biplotModel$predictors, 10, 9)) A <- sweep(barleySVD$v, 2, sqrt(barleySVD$d), "*")[, 1:2] B <- sweep(barleySVD$u, 2, sqrt(barleySVD$d), "*")[, 1:2] ## These are essentially A and B as in Gabriel (1998, p694), from which ## the biplot is made by plot(rbind(A, B), pch = c(levels(barley$site), levels(barley$variety))) ## Fit the double-additive model as in Gabriel (1998, p697) variety.binary <- factor(match(barley$variety, c(2,3,6), nomatch = 0) > 0, labels = c("rest", "2,3,6")) doubleAdditive <- gnm(y ~ variety + Mult(site, variety.binary), family = wedderburn, data = barley) ## It is unclear why Gabriel's chi-squared statistics differ slightly ## from the ones produced in these fits. Possibly Gabriel adjusted the ## data somehow prior to fitting?
set.seed(1) ### Fit Wedderburn's logit model with variance proportional to [mu(1-mu)]^2 logitModel <- glm(y ~ site + variety, family = wedderburn, data = barley) fit <- fitted(logitModel) print(sum((barley$y - fit)^2 / (fit * (1-fit))^2)) ## Agrees with the chi-squared value reported in McCullagh and Nelder ## (1989, p331), which differs slightly from Wedderburn's reported value. ### Fit the biplot model as in Gabriel (1998, p694) biplotModel <- gnm(y ~ -1 + instances(Mult(site, variety), 2), family = wedderburn, data = barley) barleySVD <- svd(matrix(biplotModel$predictors, 10, 9)) A <- sweep(barleySVD$v, 2, sqrt(barleySVD$d), "*")[, 1:2] B <- sweep(barleySVD$u, 2, sqrt(barleySVD$d), "*")[, 1:2] ## These are essentially A and B as in Gabriel (1998, p694), from which ## the biplot is made by plot(rbind(A, B), pch = c(levels(barley$site), levels(barley$variety))) ## Fit the double-additive model as in Gabriel (1998, p697) variety.binary <- factor(match(barley$variety, c(2,3,6), nomatch = 0) > 0, labels = c("rest", "2,3,6")) doubleAdditive <- gnm(y ~ variety + Mult(site, variety.binary), family = wedderburn, data = barley) ## It is unclear why Gabriel's chi-squared statistics differ slightly ## from the ones produced in these fits. Possibly Gabriel adjusted the ## data somehow prior to fitting?
Average heights for 15 genotypes of barley recorded over 9 years.
barleyHeights
barleyHeights
A data frame with 135 observations on the following 3 variables.
height
average height over 4 replicates (cm)
year
a factor with 9 levels 1974
to 1982
genotype
a factor with 15 levels 1:15
Aastveit, A. H. and Martens, H. (1986). ANOVA interactions interpreted by partial least squares regression. Biometrics, 42, 829–844.
Chadoeuf, J and Denis, J B (1991). Asymptotic variances for the multiplicative interaction model. J. App. Stat. 18(3), 331–353.
set.seed(1) ## Fit AMMI-1 model barleyModel <- gnm(height ~ year + genotype + Mult(year, genotype), data = barleyHeights) ## Get row and column scores with se's gamma <- getContrasts(barleyModel, pickCoef(barleyModel, "[.]y"), ref = "mean", scaleWeights = "unit") delta <- getContrasts(barleyModel, pickCoef(barleyModel, "[.]g"), ref = "mean", scaleWeights = "unit") ## Corresponding CI's similar to Chadoeuf and Denis (1991) Table 8 ## (allowing for change in sign) gamma[[2]][,1] + (gamma[[2]][,2]) %o% c(-1.96, 1.96) delta[[2]][,1] + (delta[[2]][,2]) %o% c(-1.96, 1.96) ## Multiplier of row and column scores height <- matrix(scale(barleyHeights$height, scale = FALSE), 15, 9) R <- height - outer(rowMeans(height), colMeans(height), "+") svd(R)$d[1]
set.seed(1) ## Fit AMMI-1 model barleyModel <- gnm(height ~ year + genotype + Mult(year, genotype), data = barleyHeights) ## Get row and column scores with se's gamma <- getContrasts(barleyModel, pickCoef(barleyModel, "[.]y"), ref = "mean", scaleWeights = "unit") delta <- getContrasts(barleyModel, pickCoef(barleyModel, "[.]g"), ref = "mean", scaleWeights = "unit") ## Corresponding CI's similar to Chadoeuf and Denis (1991) Table 8 ## (allowing for change in sign) gamma[[2]][,1] + (gamma[[2]][,2]) %o% c(-1.96, 1.96) delta[[2]][,1] + (delta[[2]][,2]) %o% c(-1.96, 1.96) ## Multiplier of row and column scores height <- matrix(scale(barleyHeights$height, scale = FALSE), 15, 9) R <- height - outer(rowMeans(height), colMeans(height), "+") svd(R)$d[1]
A 4-way contingency table of vote by class by religion in four French elections
cautres
cautres
A table of counts, with classifying factors vote
(levels
1:2
), class
(levels 1:6
) and religion
(levels 1:4
) and election
(levels 1:4
).
Bruno Cautres
Cautres, B, Heath, A F and Firth, D (1998). Class, religion and vote in Britain and France. La Lettre de la Maison Francaise 8.
set.seed(1) ## Fit a "double UNIDIFF" model with the religion-vote and class-vote ## interactions both modulated by nonnegative election-specific multipliers doubleUnidiff <- gnm(Freq ~ election*vote + election*class*religion + Mult(Exp(election), religion:vote) + Mult(Exp(election), class:vote), family = poisson, data = cautres) ## Deviance should be 133.04 ## Examine the multipliers of the class-vote log odds ratios ofInterest(doubleUnidiff) <- pickCoef(doubleUnidiff, "class:vote[).]") coef(doubleUnidiff) ## Coefficients of interest: ## Mult(Exp(.), class:vote).election1 ## -0.38357138 ## Mult(Exp(.), class:vote).election2 ## 0.29816599 ## Mult(Exp(.), class:vote).election3 ## 0.06580307 ## Mult(Exp(.), class:vote).election4 ## -0.02174104 ## Re-parameterize by setting Mult2.Factor1.election1 to zero getContrasts(doubleUnidiff, ofInterest(doubleUnidiff)) ## estimate SE ## Mult(Exp(.), class:vote).election1 0.0000000 0.0000000 ## Mult(Exp(.), class:vote).election2 0.6817374 0.2401644 ## Mult(Exp(.), class:vote).election3 0.4493745 0.2473521 ## Mult(Exp(.), class:vote).election4 0.3618301 0.2534754 ## quasiSE quasiVar ## Mult(Exp(.), class:vote).election1 0.22854401 0.052232363 ## Mult(Exp(.), class:vote).election2 0.07395886 0.005469913 ## Mult(Exp(.), class:vote).election3 0.09475938 0.008979340 ## Mult(Exp(.), class:vote).election4 0.10934798 0.011956981 ## Same thing but with election 4 as reference category: getContrasts(doubleUnidiff, rev(ofInterest(doubleUnidiff))) ## estimate SE ## Mult(Exp(.), class:vote).election4 0.00000000 0.0000000 ## Mult(Exp(.), class:vote).election3 0.08754436 0.1446833 ## Mult(Exp(.), class:vote).election2 0.31990727 0.1320022 ## Mult(Exp(.), class:vote).election1 -0.36183013 0.2534754 ## quasiSE quasiVar ## Mult(Exp(.), class:vote).election4 0.10934798 0.011956981 ## Mult(Exp(.), class:vote).election3 0.09475938 0.008979340 ## Mult(Exp(.), class:vote).election2 0.07395886 0.005469913 ## Mult(Exp(.), class:vote).election1 0.22854401 0.052232363
set.seed(1) ## Fit a "double UNIDIFF" model with the religion-vote and class-vote ## interactions both modulated by nonnegative election-specific multipliers doubleUnidiff <- gnm(Freq ~ election*vote + election*class*religion + Mult(Exp(election), religion:vote) + Mult(Exp(election), class:vote), family = poisson, data = cautres) ## Deviance should be 133.04 ## Examine the multipliers of the class-vote log odds ratios ofInterest(doubleUnidiff) <- pickCoef(doubleUnidiff, "class:vote[).]") coef(doubleUnidiff) ## Coefficients of interest: ## Mult(Exp(.), class:vote).election1 ## -0.38357138 ## Mult(Exp(.), class:vote).election2 ## 0.29816599 ## Mult(Exp(.), class:vote).election3 ## 0.06580307 ## Mult(Exp(.), class:vote).election4 ## -0.02174104 ## Re-parameterize by setting Mult2.Factor1.election1 to zero getContrasts(doubleUnidiff, ofInterest(doubleUnidiff)) ## estimate SE ## Mult(Exp(.), class:vote).election1 0.0000000 0.0000000 ## Mult(Exp(.), class:vote).election2 0.6817374 0.2401644 ## Mult(Exp(.), class:vote).election3 0.4493745 0.2473521 ## Mult(Exp(.), class:vote).election4 0.3618301 0.2534754 ## quasiSE quasiVar ## Mult(Exp(.), class:vote).election1 0.22854401 0.052232363 ## Mult(Exp(.), class:vote).election2 0.07395886 0.005469913 ## Mult(Exp(.), class:vote).election3 0.09475938 0.008979340 ## Mult(Exp(.), class:vote).election4 0.10934798 0.011956981 ## Same thing but with election 4 as reference category: getContrasts(doubleUnidiff, rev(ofInterest(doubleUnidiff))) ## estimate SE ## Mult(Exp(.), class:vote).election4 0.00000000 0.0000000 ## Mult(Exp(.), class:vote).election3 0.08754436 0.1446833 ## Mult(Exp(.), class:vote).election2 0.31990727 0.1320022 ## Mult(Exp(.), class:vote).election1 -0.36183013 0.2534754 ## quasiSE quasiVar ## Mult(Exp(.), class:vote).election4 0.10934798 0.011956981 ## Mult(Exp(.), class:vote).election3 0.09475938 0.008979340 ## Mult(Exp(.), class:vote).election2 0.07395886 0.005469913 ## Mult(Exp(.), class:vote).election1 0.22854401 0.052232363
For each of a specified set of linear combinations of parameters from a
gnm
model, checks numerically whether the combination's
estimate is invariant to re-parameterization of the model.
checkEstimable(model, combMatrix = diag(length(coef(model))), tolerance = NULL)
checkEstimable(model, combMatrix = diag(length(coef(model))), tolerance = NULL)
model |
a model object of class |
combMatrix |
numeric: either a vector of length the same as
|
tolerance |
numeric: a threshold value for detection of
non-estimability. If |
A logical vector of length equal to the number of parameter combinations
tested; NA
where a parameter combination is identically zero.
David Firth and Heather Turner
Catchpole, E.A. and Morgan, B.J.T. (1997). Detecting parameter redundancy. Biometrika, 84, 187–196.
set.seed(1) ## Fit the "UNIDIFF" mobility model across education levels unidiff <- gnm(Freq ~ educ*orig + educ*dest + Mult(Exp(educ), orig:dest), family = poisson, data = yaish, subset = (dest != 7)) ## Check whether multiplier contrast educ4 - educ5 is estimable ofInterest(unidiff) <- pickCoef(unidiff, "[.]educ") mycontrast <- numeric(length(coef(unidiff))) mycontrast[ofInterest(unidiff)[4:5]] <- c(1, -1) checkEstimable(unidiff, mycontrast) ## should be TRUE ## Check whether multiplier educ4 itself is estimable mycontrast[ofInterest(unidiff)[5]] <- 0 checkEstimable(unidiff, mycontrast) ## should be FALSE -- only *differences* are identified here
set.seed(1) ## Fit the "UNIDIFF" mobility model across education levels unidiff <- gnm(Freq ~ educ*orig + educ*dest + Mult(Exp(educ), orig:dest), family = poisson, data = yaish, subset = (dest != 7)) ## Check whether multiplier contrast educ4 - educ5 is estimable ofInterest(unidiff) <- pickCoef(unidiff, "[.]educ") mycontrast <- numeric(length(coef(unidiff))) mycontrast[ofInterest(unidiff)[4:5]] <- c(1, -1) checkEstimable(unidiff, mycontrast) ## should be TRUE ## Check whether multiplier educ4 itself is estimable mycontrast[ofInterest(unidiff)[5]] <- 0 checkEstimable(unidiff, mycontrast) ## should be FALSE -- only *differences* are identified here
Computes confidence intervals for one or more parameters in a generalized nonlinear model, based on the profiled deviance.
## S3 method for class 'gnm' confint(object, parm = ofInterest(object), level = 0.95, trace = FALSE, ...) ## S3 method for class 'profile.gnm' confint(object, parm = names(object), level = 0.95, ...)
## S3 method for class 'gnm' confint(object, parm = ofInterest(object), level = 0.95, trace = FALSE, ...) ## S3 method for class 'profile.gnm' confint(object, parm = names(object), level = 0.95, ...)
object |
an object of class |
parm |
(optional) either a numeric vector of indices or a
character vector of names, specifying the parameters for which
confidence intervals are to be estimated. If |
level |
the confidence level required. |
trace |
a logical value indicating whether profiling should be traced. |
... |
arguments passed to or from other methods |
These are methods for the generic function confint
in the
base
package.
For "gnm"
objects, profile.gnm
is first called to
profile the deviance over each parameter specified by parm
, or
over all parameters in the model if parm
is NULL
.
The method for "profile.gnm"
objects is then called, which
interpolates the deviance profiles to estimate the limits of the
confidence interval for each parameter, see profile.gnm
for more details.
If a "profile.gnm"
object is passed directly to confint
,
parameters specified by parm
must be a subset of the profiled
parameters.
For unidentified parameters a confidence interval cannot be calculated
and the limits will be returned as NA
. If the deviance curve
has an asymptote and a limit of the confidence interval cannot be
reached, the limit will be returned as -Inf
or Inf
appropriately. If the range of the profile does not extend far enough
to estimate a limit of the confidence interval, the limit will be
returned as NA
. In such cases, it may be desirable create a
profile object directly, see profile.gnm
for more
details.
A matrix (or vector) with columns giving lower and upper confidence limits for each parameter. These will be labelled as (1-level)/2 and 1 - (1-level)/2 in % (by default 2.5% and 97.5%).
Modification of MASS:::confint.glm
by W. N. Venables and
B. D. Ripley. Adapted for "gnm"
objects by Heather Turner.
profile.gnm
, gnm
,
confint.glm
, profile.glm
### Example in which profiling doesn't take too long count <- with(voting, percentage/100 * total) yvar <- cbind(count, voting$total - count) classMobility <- gnm(yvar ~ -1 + Dref(origin, destination), constrain = "delta1", family = binomial, data = voting) ## profile diagonal effects confint(classMobility, parm = 3:7, trace = TRUE) ## Not run: ### Profiling takes much longer here, but example more interesting! unidiff <- gnm(Freq ~ educ*orig + educ*dest + Mult(Exp(educ), orig:dest), ofInterest = "[.]educ", constrain = "[.]educ1", family = poisson, data = yaish, subset = (dest != 7)) ## Letting 'confint' compute profile confint(unidiff, trace = TRUE) ## 2.5 % 97.5 % ## Mult(Exp(.), orig:dest).educ1 NA NA ## Mult(Exp(.), orig:dest).educ2 -0.5978901 0.1022447 ## Mult(Exp(.), orig:dest).educ3 -1.4836854 -0.2362378 ## Mult(Exp(.), orig:dest).educ4 -2.5792398 -0.2953420 ## Mult(Exp(.), orig:dest).educ5 -Inf -0.7007616 ## Creating profile object first with user-specified stepsize prof <- profile(unidiff, trace = TRUE, stepsize = 0.1) confint(prof, ofInterest(unidiff)[2:5]) ## 2.5 % 97.5 % ## Mult(Exp(.), orig:dest).educ2 -0.5978324 0.1022441 ## Mult(Exp(.), orig:dest).educ3 -1.4834753 -0.2362138 ## Mult(Exp(.), orig:dest).educ4 NA -0.2950790 ## Mult(Exp(.), orig:dest).educ5 NA NA ## For 95% confidence interval, need to estimate parameters for which ## z = +/- 1.96. Profile has not gone far enough for last two parameters range(prof[[4]]$z) ## -1.566601 2.408650 range(prof[[5]]$z) ## -0.5751376 1.1989487 ## End(Not run)
### Example in which profiling doesn't take too long count <- with(voting, percentage/100 * total) yvar <- cbind(count, voting$total - count) classMobility <- gnm(yvar ~ -1 + Dref(origin, destination), constrain = "delta1", family = binomial, data = voting) ## profile diagonal effects confint(classMobility, parm = 3:7, trace = TRUE) ## Not run: ### Profiling takes much longer here, but example more interesting! unidiff <- gnm(Freq ~ educ*orig + educ*dest + Mult(Exp(educ), orig:dest), ofInterest = "[.]educ", constrain = "[.]educ1", family = poisson, data = yaish, subset = (dest != 7)) ## Letting 'confint' compute profile confint(unidiff, trace = TRUE) ## 2.5 % 97.5 % ## Mult(Exp(.), orig:dest).educ1 NA NA ## Mult(Exp(.), orig:dest).educ2 -0.5978901 0.1022447 ## Mult(Exp(.), orig:dest).educ3 -1.4836854 -0.2362378 ## Mult(Exp(.), orig:dest).educ4 -2.5792398 -0.2953420 ## Mult(Exp(.), orig:dest).educ5 -Inf -0.7007616 ## Creating profile object first with user-specified stepsize prof <- profile(unidiff, trace = TRUE, stepsize = 0.1) confint(prof, ofInterest(unidiff)[2:5]) ## 2.5 % 97.5 % ## Mult(Exp(.), orig:dest).educ2 -0.5978324 0.1022441 ## Mult(Exp(.), orig:dest).educ3 -1.4834753 -0.2362138 ## Mult(Exp(.), orig:dest).educ4 NA -0.2950790 ## Mult(Exp(.), orig:dest).educ5 NA NA ## For 95% confidence interval, need to estimate parameters for which ## z = +/- 1.96. Profile has not gone far enough for last two parameters range(prof[[4]]$z) ## -1.566601 2.408650 range(prof[[5]]$z) ## -0.5751376 1.1989487 ## End(Not run)
A symbolic wrapper to specify a constant in the predictor of a
"nonlin"
function.
Const(const)
Const(const)
const |
a numeric value. |
A call to rep
used to create a variable representing the
constant in the model frame.
Const
may only be used in the predictor of a "nonlin"
function. Use offset
to specify a constant in the model formula.
Heather Turner
## One way to fit the logistic function without conditional ## linearity as in ?nls library(gnm) set.seed(1) DNase1 <- subset(DNase, Run == 1) test <- gnm(density ~ -1 + Mult(1, Inv(Const(1) + Exp(Mult(1 + offset(-log(conc)), Inv(1))))), start = c(NA, 0, 1), data = DNase1, trace = TRUE) coef(test)
## One way to fit the logistic function without conditional ## linearity as in ?nls library(gnm) set.seed(1) DNase1 <- subset(DNase, Run == 1) test <- gnm(density ~ -1 + Mult(1, Inv(Const(1) + Exp(Mult(1 + offset(-log(conc)), Inv(1))))), start = c(NA, 0, 1), data = DNase1, trace = TRUE) coef(test)
Converts two or more factors into a new factor whose value is 0 where the original factors are not all equal, and nonzero otherwise.
Diag(..., binary = FALSE)
Diag(..., binary = FALSE)
... |
One or more factors |
binary |
Logical |
Used mainly in regression models for data classified by
two or more factors with the same levels. By default,
operates on k-level factors to produce a new factor having k+1 levels;
if binary = TRUE
is specified, the result is a coarser binary
variable equal to 1 where all of the input factors are equal and 0
otherwise.
If the original levels are identical the levels of the factor created
in the binary = FALSE
case will be in the same order, with
"."
added as the first level. Otherwise the levels of the new
factor will be "."
followed by the sorted combined levels.
Either a factor (if binary = FALSE
) or a 0-1 numeric vector
(if binary = TRUE
).
David Firth and Heather Turner
rowfac <- gl(4, 4, 16) colfac <- gl(4, 1, 16) diag4by4 <- Diag(rowfac, colfac) matrix(Diag(rowfac, colfac, binary = TRUE), 4, 4)
rowfac <- gl(4, 4, 16) colfac <- gl(4, 1, 16) diag4by4 <- Diag(rowfac, colfac) matrix(Diag(rowfac, colfac, binary = TRUE), 4, 4)
Dref is a function of class "nonlin"
to specify a diagonal
reference term in the formula argument to gnm
.
Dref(..., delta = ~ 1)
Dref(..., delta = ~ 1)
... |
a comma-separated list of two or more factors. |
delta |
a formula with no left-hand-side specifying the model for each factor weight. |
Dref
specifies diagonal reference terms as introduced by
Sobel (1981, 1985). Such terms comprise an additive component for
each factor of the form
where is the weight for factor
,
is the diagonal effect for level
and
is the level of factor
for the given data point.
The weights are constrained to be nonnegative and to sum to one as follows
and the are modelled as specified by the
delta
argument (constant weights by default). The
returned parameters are those in the model for
, rather than the implied weights
. The
DrefWeights
function will take a fitted gnm
model and return the weights , along with their standard
errors.
If the factors passed to Dref
do not have exactly the same
levels, the set of levels in the diagonal reference term is taken to
be the union of the factor levels, sorted into increasing order.
A list with the anticipated components of a "nonlin" function:
predictors |
the factors passed to |
common |
an index to specify that common effects are to be estimated across the factors. |
term |
a function to create a deparsed mathematical expression of the term, given labels for the predictors. |
start |
a function to generate starting values for the parameters. |
call |
the call to use as a prefix for parameter labels. |
Heather Turner
Sobel, M. E. (1981), Diagonal mobility models: A substantively motivated class of designs for the analysis of mobility effects. American Sociological Review 46, 893–906.
Sobel, M. E. (1985), Social mobility and fertility revisited: Some new models for the analysis of the mobility effects hypothesis. American Sociological Review 50, 699–712.
Clifford, P. and Heath, A. F. (1993) The Political Consequences of Social Mobility. J. Roy. Stat. Soc. A, 156(1), 51-61.
Van der Slik, F. W. P., De Graaf, N. D and Gerris, J. R. M. (2002) Conformity to Parental Rules: Asymmetric Influences of Father's and Mother's Levels of Education. European Sociological Review 18(4), 489 – 502.
### Examples from Clifford and Heath paper ### (Results differ slightly - possible transcription error in ### published data?) set.seed(1) ## reconstruct counts voting Labour/non-Labour count <- with(voting, percentage/100 * total) yvar <- cbind(count, voting$total - count) ## fit diagonal reference model with constant weights classMobility <- gnm(yvar ~ -1 + Dref(origin, destination), family = binomial, data = voting) DrefWeights(classMobility) ## create factors indicating movement in and out of salariat (class 1) upward <- with(voting, origin != 1 & destination == 1) downward <- with(voting, origin == 1 & destination != 1) ## fit separate weights for the "socially mobile" groups socialMobility <- gnm(yvar ~ -1 + Dref(origin, destination, delta = ~ 1 + downward + upward), family = binomial, data = voting) DrefWeights(socialMobility) ## fit separate weights for downwardly mobile groups only downwardMobility <- gnm(yvar ~ -1 + Dref(origin, destination, delta = ~ 1 + downward), family = binomial, data = voting) DrefWeights(downwardMobility) ## Not run: ### Examples from Van der Slik paper ### For illustration only - data not publically available ### Using data in data.frame named 'conformity', with variables ### MCFM - mother's conformity score ### FCFF - father's conformity score ### MOPLM - a factor describing the mother's education with 7 levels ### FOPLF - a factor describing the father's education with 7 levels ### AGEM - mother's birth cohort ### MRMM - mother's traditional role model ### FRMF - father's traditional role model ### MWORK - mother's employment ### MFCM - mother's family conflict score ### FFCF - father's family conflict score set.seed(1) ## Models for mothers' conformity score as specified in Figure 1 A <- gnm(MCFM ~ -1 + AGEM + MRMM + FRMF + MWORK + MFCM + Dref(MOPLM, FOPLF), family = gaussian, data = conformity, verbose = FALSE) A ## Call: ## gnm(formula = MCFM ~ -1 + AGEM + MRMM + FRMF + MWORK + MFCM + ## Dref(MOPLM, FOPLF), family = gaussian, data = conformity, ## verbose = FALSE) ## ## Coefficients: ## AGEM MRMM ## 0.06363 -0.32425 ## FRMF MWORK ## -0.25324 -0.06430 ## MFCM Dref(MOPLM, FOPLF)delta1 ## -0.06043 -0.33731 ## Dref(MOPLM, FOPLF)delta2 Dref(., .).MOPLM|FOPLF1 ## -0.02505 4.95121 ## Dref(., .).MOPLM|FOPLF2 Dref(., .).MOPLM|FOPLF3 ## 4.86329 4.86458 ## Dref(., .).MOPLM|FOPLF4 Dref(., .).MOPLM|FOPLF5 ## 4.72343 4.43516 ## Dref(., .).MOPLM|FOPLF6 Dref(., .).MOPLM|FOPLF7 ## 4.18873 4.43378 ## ## Deviance: 425.3389 ## Pearson chi-squared: 425.3389 ## Residual df: 576 ## Weights as in Table 4 DrefWeights(A) ## Refitting with parameters of first Dref weight constrained to zero ## $MOPLM ## weight se ## 0.4225636 0.1439829 ## ## $FOPLF ## weight se ## 0.5774364 0.1439829 F <- gnm(MCFM ~ -1 + AGEM + MRMM + FRMF + MWORK + MFCM + Dref(MOPLM, FOPLF, delta = ~1 + MFCM), family = gaussian, data = conformity, verbose = FALSE) F ## Call: ## gnm(formula = MCFM ~ -1 + AGEM + MRMM + FRMF + MWORK + MFCM + ## Dref(MOPLM, FOPLF, delta = ~1 + MFCM), family = gaussian, ## data = conformity, verbose = FALSE) ## ## ## Coefficients: ## AGEM ## 0.05818 ## MRMM ## -0.32701 ## FRMF ## -0.25772 ## MWORK ## -0.07847 ## MFCM ## -0.01694 ## Dref(MOPLM, FOPLF, delta = ~ . + MFCM).delta1(Intercept) ## 1.03515 ## Dref(MOPLM, FOPLF, delta = ~ 1 + .).delta1MFCM ## -1.77756 ## Dref(MOPLM, FOPLF, delta = ~ . + MFCM).delta2(Intercept) ## -0.03515 ## Dref(MOPLM, FOPLF, delta = ~ 1 + .).delta2MFCM ## 2.77756 ## Dref(., ., delta = ~ 1 + MFCM).MOPLM|FOPLF1 ## 4.82476 ## Dref(., ., delta = ~ 1 + MFCM).MOPLM|FOPLF2 ## 4.88066 ## Dref(., ., delta = ~ 1 + MFCM).MOPLM|FOPLF3 ## 4.83969 ## Dref(., ., delta = ~ 1 + MFCM).MOPLM|FOPLF4 ## 4.74850 ## Dref(., ., delta = ~ 1 + MFCM).MOPLM|FOPLF5 ## 4.42020 ## Dref(., ., delta = ~ 1 + MFCM).MOPLM|FOPLF6 ## 4.17957 ## Dref(., ., delta = ~ 1 + MFCM).MOPLM|FOPLF7 ## 4.40819 ## ## Deviance: 420.9022 ## Pearson chi-squared: 420.9022 ## Residual df: 575 ## ## ## Standard error for MFCM == 1 lower than reported by Van der Slik et al DrefWeights(F) ## Refitting with parameters of first Dref weight constrained to zero ## $MOPLM ## MFCM weight se ## 1 1 0.02974675 0.2277711 ## 2 0 0.74465224 0.2006916 ## ## $FOPLF ## MFCM weight se ## 1 1 0.9702532 0.2277711 ## 2 0 0.2553478 0.2006916 ## End(Not run)
### Examples from Clifford and Heath paper ### (Results differ slightly - possible transcription error in ### published data?) set.seed(1) ## reconstruct counts voting Labour/non-Labour count <- with(voting, percentage/100 * total) yvar <- cbind(count, voting$total - count) ## fit diagonal reference model with constant weights classMobility <- gnm(yvar ~ -1 + Dref(origin, destination), family = binomial, data = voting) DrefWeights(classMobility) ## create factors indicating movement in and out of salariat (class 1) upward <- with(voting, origin != 1 & destination == 1) downward <- with(voting, origin == 1 & destination != 1) ## fit separate weights for the "socially mobile" groups socialMobility <- gnm(yvar ~ -1 + Dref(origin, destination, delta = ~ 1 + downward + upward), family = binomial, data = voting) DrefWeights(socialMobility) ## fit separate weights for downwardly mobile groups only downwardMobility <- gnm(yvar ~ -1 + Dref(origin, destination, delta = ~ 1 + downward), family = binomial, data = voting) DrefWeights(downwardMobility) ## Not run: ### Examples from Van der Slik paper ### For illustration only - data not publically available ### Using data in data.frame named 'conformity', with variables ### MCFM - mother's conformity score ### FCFF - father's conformity score ### MOPLM - a factor describing the mother's education with 7 levels ### FOPLF - a factor describing the father's education with 7 levels ### AGEM - mother's birth cohort ### MRMM - mother's traditional role model ### FRMF - father's traditional role model ### MWORK - mother's employment ### MFCM - mother's family conflict score ### FFCF - father's family conflict score set.seed(1) ## Models for mothers' conformity score as specified in Figure 1 A <- gnm(MCFM ~ -1 + AGEM + MRMM + FRMF + MWORK + MFCM + Dref(MOPLM, FOPLF), family = gaussian, data = conformity, verbose = FALSE) A ## Call: ## gnm(formula = MCFM ~ -1 + AGEM + MRMM + FRMF + MWORK + MFCM + ## Dref(MOPLM, FOPLF), family = gaussian, data = conformity, ## verbose = FALSE) ## ## Coefficients: ## AGEM MRMM ## 0.06363 -0.32425 ## FRMF MWORK ## -0.25324 -0.06430 ## MFCM Dref(MOPLM, FOPLF)delta1 ## -0.06043 -0.33731 ## Dref(MOPLM, FOPLF)delta2 Dref(., .).MOPLM|FOPLF1 ## -0.02505 4.95121 ## Dref(., .).MOPLM|FOPLF2 Dref(., .).MOPLM|FOPLF3 ## 4.86329 4.86458 ## Dref(., .).MOPLM|FOPLF4 Dref(., .).MOPLM|FOPLF5 ## 4.72343 4.43516 ## Dref(., .).MOPLM|FOPLF6 Dref(., .).MOPLM|FOPLF7 ## 4.18873 4.43378 ## ## Deviance: 425.3389 ## Pearson chi-squared: 425.3389 ## Residual df: 576 ## Weights as in Table 4 DrefWeights(A) ## Refitting with parameters of first Dref weight constrained to zero ## $MOPLM ## weight se ## 0.4225636 0.1439829 ## ## $FOPLF ## weight se ## 0.5774364 0.1439829 F <- gnm(MCFM ~ -1 + AGEM + MRMM + FRMF + MWORK + MFCM + Dref(MOPLM, FOPLF, delta = ~1 + MFCM), family = gaussian, data = conformity, verbose = FALSE) F ## Call: ## gnm(formula = MCFM ~ -1 + AGEM + MRMM + FRMF + MWORK + MFCM + ## Dref(MOPLM, FOPLF, delta = ~1 + MFCM), family = gaussian, ## data = conformity, verbose = FALSE) ## ## ## Coefficients: ## AGEM ## 0.05818 ## MRMM ## -0.32701 ## FRMF ## -0.25772 ## MWORK ## -0.07847 ## MFCM ## -0.01694 ## Dref(MOPLM, FOPLF, delta = ~ . + MFCM).delta1(Intercept) ## 1.03515 ## Dref(MOPLM, FOPLF, delta = ~ 1 + .).delta1MFCM ## -1.77756 ## Dref(MOPLM, FOPLF, delta = ~ . + MFCM).delta2(Intercept) ## -0.03515 ## Dref(MOPLM, FOPLF, delta = ~ 1 + .).delta2MFCM ## 2.77756 ## Dref(., ., delta = ~ 1 + MFCM).MOPLM|FOPLF1 ## 4.82476 ## Dref(., ., delta = ~ 1 + MFCM).MOPLM|FOPLF2 ## 4.88066 ## Dref(., ., delta = ~ 1 + MFCM).MOPLM|FOPLF3 ## 4.83969 ## Dref(., ., delta = ~ 1 + MFCM).MOPLM|FOPLF4 ## 4.74850 ## Dref(., ., delta = ~ 1 + MFCM).MOPLM|FOPLF5 ## 4.42020 ## Dref(., ., delta = ~ 1 + MFCM).MOPLM|FOPLF6 ## 4.17957 ## Dref(., ., delta = ~ 1 + MFCM).MOPLM|FOPLF7 ## 4.40819 ## ## Deviance: 420.9022 ## Pearson chi-squared: 420.9022 ## Residual df: 575 ## ## ## Standard error for MFCM == 1 lower than reported by Van der Slik et al DrefWeights(F) ## Refitting with parameters of first Dref weight constrained to zero ## $MOPLM ## MFCM weight se ## 1 1 0.02974675 0.2277711 ## 2 0 0.74465224 0.2006916 ## ## $FOPLF ## MFCM weight se ## 1 1 0.9702532 0.2277711 ## 2 0 0.2553478 0.2006916 ## End(Not run)
Intergenerational class mobility among the male populations of England and Wales; France, and Sweden.
erikson
erikson
A table of counts, with classifying factors origin
(father's
class; levels I
, II
, III
, IVa
,
IVb
, IVc
, V/VI
,
VIIa
, VIIb
) destination
(son's
class; levels as before), and country
(son's country of
residence; levels EW
, F
, S
).
Hauser, R. M. (1984) Vertical Class Mobility in England, France and Sweden. Acta Sociol., 27(2), 87-110.
Erikson, R., Goldthorpe, J. H. and Portocarero, L. (1982) Social Fluidity in Industrial Nations: England, France and Sweden. Brit. J. Sociol. 33(1), 1-34.
Xie, Y. (1992) The Log-multiplicative Layer Effect Model for Comparing Mobility Tables. Am. Sociol. Rev. 57(3), 380-395.
set.seed(1) ### Collapse to 7 by 7 table as in Erikson (1982) erikson <- as.data.frame(erikson) lvl <- levels(erikson$origin) levels(erikson$origin) <- levels(erikson$destination) <- c(rep(paste(lvl[1:2], collapse = " + "), 2), lvl[3], rep(paste(lvl[4:5], collapse = " + "), 2), lvl[6:9]) erikson <- xtabs(Freq ~ origin + destination + country, data = erikson) ### Fit the models given in first half of Table 3 of Xie (1992) ## Null association between origin and destination nullModel <- gnm(Freq ~ country*origin + country*destination, family = poisson, data = erikson) ## Full interaction, common to all countries commonInteraction <- update(nullModel, ~ . + origin:destination) ## Full Interaction, different multiplier for each country multInteraction <- update(nullModel, ~ . + Mult(Exp(country), origin:destination)) ### Create array of interaction levels as in Table 2 of Xie (1992) levelMatrix <- matrix(c(2, 3, 4, 6, 5, 6, 6, 3, 3, 4, 6, 4, 5, 6, 4, 4, 2, 5, 5, 5, 5, 6, 6, 5, 1, 6, 5, 2, 4, 4, 5, 6, 3, 4, 5, 5, 4, 5, 5, 3, 3, 5, 6, 6, 5, 3, 5, 4, 1), 7, 7, byrow = TRUE) ### Fit models in second half of Table 3 in Xie (1992) ## Interaction specified by levelMatrix, common to all countries commonTopo <- update(nullModel, ~ . + Topo(origin, destination, spec = levelMatrix)) ## Interaction specified by levelMatrix, different multiplier for ## each country multTopo <- update(nullModel, ~ . + Mult(Exp(country), Topo(origin, destination, spec = levelMatrix))) ## Interaction specified by levelMatrix, different effects for ## each country separateTopo <- update(nullModel, ~ . + country:Topo(origin, destination, spec = levelMatrix))
set.seed(1) ### Collapse to 7 by 7 table as in Erikson (1982) erikson <- as.data.frame(erikson) lvl <- levels(erikson$origin) levels(erikson$origin) <- levels(erikson$destination) <- c(rep(paste(lvl[1:2], collapse = " + "), 2), lvl[3], rep(paste(lvl[4:5], collapse = " + "), 2), lvl[6:9]) erikson <- xtabs(Freq ~ origin + destination + country, data = erikson) ### Fit the models given in first half of Table 3 of Xie (1992) ## Null association between origin and destination nullModel <- gnm(Freq ~ country*origin + country*destination, family = poisson, data = erikson) ## Full interaction, common to all countries commonInteraction <- update(nullModel, ~ . + origin:destination) ## Full Interaction, different multiplier for each country multInteraction <- update(nullModel, ~ . + Mult(Exp(country), origin:destination)) ### Create array of interaction levels as in Table 2 of Xie (1992) levelMatrix <- matrix(c(2, 3, 4, 6, 5, 6, 6, 3, 3, 4, 6, 4, 5, 6, 4, 4, 2, 5, 5, 5, 5, 6, 6, 5, 1, 6, 5, 2, 4, 4, 5, 6, 3, 4, 5, 5, 4, 5, 5, 3, 3, 5, 6, 6, 5, 3, 5, 4, 1), 7, 7, byrow = TRUE) ### Fit models in second half of Table 3 in Xie (1992) ## Interaction specified by levelMatrix, common to all countries commonTopo <- update(nullModel, ~ . + Topo(origin, destination, spec = levelMatrix)) ## Interaction specified by levelMatrix, different multiplier for ## each country multTopo <- update(nullModel, ~ . + Mult(Exp(country), Topo(origin, destination, spec = levelMatrix))) ## Interaction specified by levelMatrix, different effects for ## each country separateTopo <- update(nullModel, ~ . + country:Topo(origin, destination, spec = levelMatrix))
A utility function to print information on final iteration in
gnm
fit, intended for use when gnm
has not converged.
exitInfo(object)
exitInfo(object)
object |
a |
If gnm
has not converged within the pre-specified maximum
number of iterations, it may be because the algorithm has converged to
a non-solution of the likelihood equations. In order to determine
appropriate action, it is necessary to differentiate this case from
one of near-convergence to the solution.
exitInfo
prints the absolute score and the corresponding
convergence criterion for all parameters which failed to meet the
convergence criterion at the last iteration. Clearly a small number of
parameters with scores close to the criterion suggests
near-convergence.
Heather Turner
Vargas, M, Crossa, J, van Eeuwijk, F, Sayre, K D and Reynolds, M P (2001). Interpreting treatment by environment interaction in agronomy trials. Agronomy Journal 93, 949–960.
## Fit a "double UNIDIFF" model with low iterMax for illustration! set.seed(1) doubleUnidiff <- gnm(Freq ~ election*vote + election*class*religion + Mult(Exp(election), religion:vote) + Mult(Exp(election), class:vote), family = poisson, data = cautres, iterMax = 10) exitInfo(doubleUnidiff)
## Fit a "double UNIDIFF" model with low iterMax for illustration! set.seed(1) doubleUnidiff <- gnm(Freq ~ election*vote + election*class*religion + Mult(Exp(election), religion:vote) + Mult(Exp(election), class:vote), family = poisson, data = cautres, iterMax = 10) exitInfo(doubleUnidiff)
A function of class "nonlin"
to specify the exponential
of a predictor in the formula argument to gnm
.
Exp(expression, inst = NULL)
Exp(expression, inst = NULL)
expression |
a symbolic expression representing the (possibly nonlinear) predictor. |
inst |
(optional) an integer specifying the instance number of the term. |
The expression
argument is interpreted as the right hand side
of a formula in an object of class "formula"
, except that an
intercept term is not added by default. Any function of class
"nonlin"
may be used in addition to the usual operators and
functions.
A list with the components required of a "nonlin"
function:
predictors |
the |
term |
a function to create a deparsed mathematical expression of the term, given a label for the predictor. |
call |
the call to use as a prefix for parameter labels. |
Heather Turner and David Firth
set.seed(1) ## Using 'Mult' with 'Exp' to constrain the first constituent multiplier ## to be non-negative ## Fit the "UNIDIFF" mobility model across education levels unidiff <- gnm(Freq ~ educ*orig + educ*dest + Mult(Exp(educ), orig:dest), family = poisson, data = yaish, subset = (dest != 7))
set.seed(1) ## Using 'Mult' with 'Exp' to constrain the first constituent multiplier ## to be non-negative ## Fit the "UNIDIFF" mobility model across education levels unidiff <- gnm(Freq ~ educ*orig + educ*dest + Mult(Exp(educ), orig:dest), family = poisson, data = yaish, subset = (dest != 7))
Expands the rows of a data frame by re-expressing observations of a
categorical variable specified by catvar
, such that the
column(s) corresponding to catvar
are replaced by a factor
specifying the possible categories for each observation and a vector
of 0/1 counts over these categories.
expandCategorical(data, catvar, sep = ".", countvar = "count", idvar = "id", as.ordered = FALSE, group = TRUE)
expandCategorical(data, catvar, sep = ".", countvar = "count", idvar = "id", as.ordered = FALSE, group = TRUE)
data |
a data frame. |
catvar |
a character vector specifying factors in |
sep |
a character string used to separate the concatenated
values of |
countvar |
(optional) a character string to be used for the name of the new count variable. |
idvar |
(optional) a character string to be used for the name of the new factor identifying the original rows (cases). |
as.ordered |
logical - whether the new interaction factor should
be of class |
group |
logical: whether or not to group individuals with common values over all covariates. |
Each row of the data frame is replicated times, where
is the number of levels of the interaction of the factors specified by
catvar
. In the expanded data frame, the columns specified by
catvar
are replaced by a factor specifying the possible
categories for each case, named by the concatenated values of
catvar
separated by sep
. The ordering of factor levels
will be preserved in the creation of the new factor, but this factor
will not be of class "ordered"
unless the argument
as.ordered = TRUE
. A variable with name countvar
is added
to the data frame which is equal to 1 for the observed category in each
case and 0 elsewhere. Finally a factor with name idvar
is added
to index the cases.
The expanded data frame as described in Details.
Re-expressing categorical data in this way allows a multinomial response to be modelled as a poisson response, see examples.
Heather Turner
Anderson, J. A. (1984) Regression and Ordered Categorical Variables. J. R. Statist. Soc. B, 46(1), 1-30.
### Example from help(multinom, package = "nnet") library(MASS) example(birthwt) library(nnet) bwt.mu <- multinom(low ~ ., data = bwt) ## Equivalent using gnm - include unestimable main effects in model so ## that interactions with low0 automatically set to zero, else could use ## 'constrain' argument. bwtLong <- expandCategorical(bwt, "low", group = FALSE) bwt.po <- gnm(count ~ low*(. - id), eliminate = id, data = bwtLong, family = "poisson") summary(bwt.po) # same deviance; df reflect extra id parameters ### Example from ?backPain set.seed(1) summary(backPain) backPainLong <- expandCategorical(backPain, "pain") ## Fit models described in Table 5 of Anderson (1984) noRelationship <- gnm(count ~ pain, eliminate = id, family = "poisson", data = backPainLong) oneDimensional <- update(noRelationship, ~ . + Mult(pain, x1 + x2 + x3))
### Example from help(multinom, package = "nnet") library(MASS) example(birthwt) library(nnet) bwt.mu <- multinom(low ~ ., data = bwt) ## Equivalent using gnm - include unestimable main effects in model so ## that interactions with low0 automatically set to zero, else could use ## 'constrain' argument. bwtLong <- expandCategorical(bwt, "low", group = FALSE) bwt.po <- gnm(count ~ low*(. - id), eliminate = id, data = bwtLong, family = "poisson") summary(bwt.po) # same deviance; df reflect extra id parameters ### Example from ?backPain set.seed(1) summary(backPain) backPainLong <- expandCategorical(backPain, "pain") ## Fit models described in Table 5 of Anderson (1984) noRelationship <- gnm(count ~ pain, eliminate = id, family = "poisson", data = backPainLong) oneDimensional <- update(noRelationship, ~ . + Mult(pain, x1 + x2 + x3))
Cross-classification of the occupation of respondent and that of their closest friend. Data taken from wave 10 (year 2000) of the British Household Panel Survey.
friend
friend
A table of counts, with classifying factors r
(respondent's
occupational category; levels 1:31
) and c
(friend's
occupational category; levels 1:31
).
Chan, T.W. and Goldthorpe, J.H. (2004) Is there a status order in contemporary British society: Evidence from the occupational structure of friendship, European Sociological Review, 20, 383–401.
set.seed(1) ### Fit an association model with homogeneous row-column effects rc1 <- gnm(Freq ~ r + c + Diag(r,c) + MultHomog(r, c), family = poisson, data = friend) rc1 ## Not run: ### Extend to two-component interaction rc2 <- update(rc1, . ~ . + MultHomog(r, c, inst = 2), etastart = rc1$predictors) rc2 ## End(Not run)
set.seed(1) ### Fit an association model with homogeneous row-column effects rc1 <- gnm(Freq ~ r + c + Diag(r,c) + MultHomog(r, c), family = poisson, data = friend) rc1 ## Not run: ### Extend to two-component interaction rc2 <- update(rc1, . ~ . + MultHomog(r, c, inst = 2), etastart = rc1$predictors) rc2 ## End(Not run)
Computes contrasts or scaled contrasts for a set of (non-eliminated)
parameters from a gnm
model, and computes standard errors
for the estimated contrasts. Where possible, quasi standard errors are
also computed.
getContrasts(model, set = NULL, ref = "first", scaleRef = "mean", scaleWeights = NULL, dispersion = NULL, check = TRUE, ...)
getContrasts(model, set = NULL, ref = "first", scaleRef = "mean", scaleWeights = NULL, dispersion = NULL, check = TRUE, ...)
model |
a model object of class |
set |
a vector of indices (numeric) or coefficient names
(character). If |
ref |
either a single numeric index, or a vector
of real numbers which sum to 1, or one of the character
strings |
scaleRef |
as for |
scaleWeights |
either |
dispersion |
either |
check |
|
... |
arguments to pass to other functions. |
The indices in set
must all be in 1:length(coef(object))
. If
set = NULL
, a dialog is presented for the selection
of indices (model coefficients).
For the set of coefficients selected, contrasts and their standard
errors are computed. A check is performed first on the estimability of
all such contrasts (if check = TRUE
) or on a specified subset
(if check
is a numeric index vector). The specific
contrasts to be computed are controlled by the choice of ref
:
this may be "first"
(the default), for contrasts with the first
of the selected coefficients, or "last"
for contrasts with the
last, or "mean"
for contrasts with the arithmetic mean of the
coefficients in the selected set; or it may be an arbitrary vector of
weights (summing to 1, not necessarily all non-negative) which specify
a weighted mean against which contrasts are taken; or it may be a
single index specifying one of the coefficients with which all
contrasts should be taken. Thus, for example, ref = 1
is
equivalent to ref = "first"
, and ref = c(1/3, 1/3, 1/3)
is equivalent to ref = "mean"
when there are three coefficients
in the selected set
.
The contrasts may be scaled by
where is a contrast of the r'th coefficient in
set
with the reference level specified by scaleRef
and is a
vector of weights (of the same length as
set
)
specified by scaleWeights
. If
scaleWeights
is NULL
(the default), scaleRef
is ignored and no scaling is performed. Other options for
scaleWeights
are "unit"
for weights equal to one and
"setLength"
for weights equal to the reciprocal of
length(set)
. If scaleRef
is the same as ref
, these options constrain the sum of squared
contrasts to 1 and length(set)
respectively.
Quasi-variances (and corresponding quasi standard errors) are
reported for unscaled contrasts where possible. These statistics are
invariant to the choice of ref
, see Firth (2003) or Firth and
Menezes (2004) for more details.
An object of class
qv
— see qvcalc
.
David Firth and Heather Turner
Firth, D (2003). Overcoming the reference category problem in the presentation of statistical models. Sociological Methodology 33, 1–18.
Firth, D and Menezes, R X de (2004). Quasi-variances. Biometrika 91, 65–80.
gnm
, se.gnm
,
checkEstimable
, qvcalc
,
ofInterest
### Unscaled contrasts ### set.seed(1) ## Fit the "UNIDIFF" mobility model across education levels -- see ?yaish unidiff <- gnm(Freq ~ educ*orig + educ*dest + Mult(Exp(educ), orig:dest), ofInterest = "[.]educ", family = poisson, data = yaish, subset = (dest != 7)) ## Examine the education multipliers (differences on the log scale): unidiffContrasts <- getContrasts(unidiff, ofInterest(unidiff)) plot(unidiffContrasts, main = "Unidiff multipliers (log scale): intervals based on quasi standard errors", xlab = "Education level", levelNames = 1:5) ### Scaled contrasts (elliptical contrasts) ### set.seed(1) ## Goodman Row-Column association model fits well (deviance 3.57, df 8) mentalHealth$MHS <- C(mentalHealth$MHS, treatment) mentalHealth$SES <- C(mentalHealth$SES, treatment) RC1model <- gnm(count ~ SES + MHS + Mult(SES, MHS), family = poisson, data = mentalHealth) ## Row scores and column scores are both unnormalized in this ## parameterization of the model ## The scores can be normalized as in Agresti's eqn (9.15): rowProbs <- with(mentalHealth, tapply(count, SES, sum) / sum(count)) colProbs <- with(mentalHealth, tapply(count, MHS, sum) / sum(count)) mu <- getContrasts(RC1model, pickCoef(RC1model, "[.]SES"), ref = rowProbs, scaleRef = rowProbs, scaleWeights = rowProbs) nu <- getContrasts(RC1model, pickCoef(RC1model, "[.]MHS"), ref = colProbs, scaleRef = colProbs, scaleWeights = colProbs) all.equal(sum(mu$qv[,1] * rowProbs), 0) all.equal(sum(nu$qv[,1] * colProbs), 0) all.equal(sum(mu$qv[,1]^2 * rowProbs), 1) all.equal(sum(nu$qv[,1]^2 * colProbs), 1)
### Unscaled contrasts ### set.seed(1) ## Fit the "UNIDIFF" mobility model across education levels -- see ?yaish unidiff <- gnm(Freq ~ educ*orig + educ*dest + Mult(Exp(educ), orig:dest), ofInterest = "[.]educ", family = poisson, data = yaish, subset = (dest != 7)) ## Examine the education multipliers (differences on the log scale): unidiffContrasts <- getContrasts(unidiff, ofInterest(unidiff)) plot(unidiffContrasts, main = "Unidiff multipliers (log scale): intervals based on quasi standard errors", xlab = "Education level", levelNames = 1:5) ### Scaled contrasts (elliptical contrasts) ### set.seed(1) ## Goodman Row-Column association model fits well (deviance 3.57, df 8) mentalHealth$MHS <- C(mentalHealth$MHS, treatment) mentalHealth$SES <- C(mentalHealth$SES, treatment) RC1model <- gnm(count ~ SES + MHS + Mult(SES, MHS), family = poisson, data = mentalHealth) ## Row scores and column scores are both unnormalized in this ## parameterization of the model ## The scores can be normalized as in Agresti's eqn (9.15): rowProbs <- with(mentalHealth, tapply(count, SES, sum) / sum(count)) colProbs <- with(mentalHealth, tapply(count, MHS, sum) / sum(count)) mu <- getContrasts(RC1model, pickCoef(RC1model, "[.]SES"), ref = rowProbs, scaleRef = rowProbs, scaleWeights = rowProbs) nu <- getContrasts(RC1model, pickCoef(RC1model, "[.]MHS"), ref = colProbs, scaleRef = colProbs, scaleWeights = colProbs) all.equal(sum(mu$qv[,1] * rowProbs), 0) all.equal(sum(nu$qv[,1] * colProbs), 0) all.equal(sum(mu$qv[,1]^2 * rowProbs), 1) all.equal(sum(nu$qv[,1]^2 * colProbs), 1)
gnm
fits generalised nonlinear models using an
over-parameterized representation. Nonlinear terms are specified by
calls to functions of class "nonlin"
.
gnm(formula, eliminate = NULL, ofInterest = NULL, constrain = numeric(0), constrainTo = numeric(length(constrain)), family = gaussian, data = NULL, subset, weights, na.action, method = "gnmFit", checkLinear = TRUE, offset, start = NULL, etastart = NULL, mustart = NULL, tolerance = 1e-06, iterStart = 2, iterMax = 500, trace = FALSE, verbose = TRUE, model = TRUE, x = TRUE, termPredictors = FALSE, ridge = 1e-08, ...)
gnm(formula, eliminate = NULL, ofInterest = NULL, constrain = numeric(0), constrainTo = numeric(length(constrain)), family = gaussian, data = NULL, subset, weights, na.action, method = "gnmFit", checkLinear = TRUE, offset, start = NULL, etastart = NULL, mustart = NULL, tolerance = 1e-06, iterStart = 2, iterMax = 500, trace = FALSE, verbose = TRUE, model = TRUE, x = TRUE, termPredictors = FALSE, ridge = 1e-08, ...)
formula |
a symbolic description of the nonlinear predictor. |
eliminate |
a factor to be included as the first term in the
model. |
ofInterest |
optional coefficients of interest, specified by a
regular expression, a numeric vector of indices, a character vector of
names, or "[?]" to select from a Tk dialog. If |
constrain |
(non-eliminated) coefficients to constrain, specified by a regular expression, a numeric vector of indices, a logical vector, a character vector of names, or "[?]" to select from a Tk dialog. |
constrainTo |
a numeric vector of the same length as
|
family |
a specification of the error distribution and link function
to be used in the model. This can be a character string naming
a family function; a family function, or the result of a call
to a family function. See |
data |
an optional data frame containing the variables in the model.
If not found in |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
weights |
an optional vector of weights to be used in the fitting process. |
na.action |
a function which indicates what should happen when the data
contain |
method |
the method to be used: either |
checkLinear |
logical: if |
offset |
this can be used to specify an a priori known component to
be added to the predictor during fitting. |
start |
a vector of starting values for the parameters in the
model; if a starting value is |
etastart |
starting values for the linear predictor. |
mustart |
starting values for the vector of means. |
tolerance |
a positive numeric value specifying the tolerance level for convergence. |
iterStart |
a positive integer specifying the number of start-up iterations to perform. |
iterMax |
a positive integer specifying the maximum number of main iterations to perform. |
trace |
a logical value indicating whether the deviance should be printed after each iteration. |
verbose |
logical: if |
model |
logical: if |
x |
logical: if |
termPredictors |
logical: if |
ridge |
numeric, a positive value for the ridge constant to be used in the fitting algorithm |
... |
further arguments passed to fitting function. |
Models for gnm
are specified by giving a symbolic description
of the nonlinear predictor, of the form response ~ terms
. The
response
is typically a numeric vector, see later in this
section for alternatives. The usual symbolic language may be used to
specify any linear terms, see formula
for details.
Nonlinear terms may be specified by calls to functions of class
"nonlin". There are several "nonlin" functions in the gnm
package. Some of these specify simple
mathematical functions of predictors: Exp
, Mult
, and
Inv
. Others specify more specialised nonlinear terms, in
particular MultHomog
specifies homogeneous multiplicative
interactions and Dref
specifies diagonal reference terms. Users
may also define their own "nonlin" functions, see
nonlin.function
for details.
The eliminate
argument may be used to specify a factor that
is to be included as the first term in the model (since an intercept
is then redundant, none is fitted). The structure of the factor is
exploited to improve computational efficiency — substantially so if
the eliminate
d factor has a large number of levels. Use of
eliminate
is designed for factors that are required in the
model but are not of direct interest (e.g., terms needed to fit
multinomial-response models as conditional Poisson models). See
backPain
for an example.
The ofInterest
argument may be used to specify coefficients of
interest, the indices of which are returned in the ofInterest
component of the model object. print()
displays of the model
object or its components obtained using accessor functions such as
coef()
etc, will only show these coefficients. In addition
methods for "gnm"
objects which may be applied to a subset of
the parameters are by default applied to the coefficients of interest.
See ofInterest
for accessor and replacement functions.
For contingency tables, the data may be provided as an object of class
"table"
from which the frequencies will be extracted to use
as the response. In this case, the response should be specified as
Freq
in the model formula. The "predictors"
,
"fitted.values"
, "residuals"
, "prior.weights"
,
"weights"
, "y"
and "offset"
components of
the returned gnm
fit will be tables with the same format as the
data, completed with NA
s where necessary.
For binomial models, the response
may be specified as a factor
in which the first level denotes failure and all other levels denote
success, as a two-column matrix with the columns giving the numbers
of successes and failures, or as a vector of the proportions of
successes.
The gnm
fitting algorithm consists of two stages. In the start-up
iterations, any nonlinear parameters that are not specified by either the
start
argument of gnm
or a plug-in function are
updated one parameter at a time, then the linear parameters are
jointly updated before the next iteration. In the main iterations, all
the parameters are jointly updated, until convergence is reached or
the number or iterations reaches iterMax
. To solve the
(typically rank-deficient) least squares problem at the heart of the
gnm
fitting algorithm, the design matrix is standardized and
regularized (in the Levenberg-Marquardt sense) prior to solving; the
ridge
argument provides a degree of control over the
regularization performed (smaller values may sometimes give faster
convergence but can lead to numerical instability).
Convergence is judged by comparing the squared components of the score vector
with corresponding elements of the diagonal of the Fisher information
matrix. If, for all components of the score vector, the ratio is less
than tolerance^2
, or the corresponding diagonal element of the
Fisher information matrix is less than 1e-20, iterations cease. If the
algorithm has not converged by iterMax
iterations,
exitInfo
can be used to print information on the
parameters which failed the convergence criteria at the last iteration.
By default, gnm
uses an over-parameterized representation of
the model that is being fitted. Only minimal identifiability constraints
are imposed, so that in general a random parameterization is obtained.
The parameter estimates are ordered so that those for any linear terms
appear first.
getContrasts
may be used to obtain estimates of
specified scaled contrasts, if these contrasts are identifiable. For
example, getContrasts
may be used to estimate the contrasts
between the first level of a factor and the rest, and obtain standard
errors.
If appropriate constraints are known in advance, or have been
determined from a gnm
fit, the model may be (re-)fitted using
the constrain
argument to specify coefficients which should be
set to values specified by constrainTo
. Constraints should only
be specified for non-eliminated parameters. update
provides a convenient way of re-fitting a gnm
model with new
constraints.
If method = "gnmFit"
, gnm
returns NULL
if the
algorithm has failed and an object of class "gnm"
otherwise. A
"gnm"
object inherits first from "glm"
then "lm"
and is a list containing the following components:
call |
the matched call. |
formula |
the formula supplied. |
constrain |
a numeric vector specifying any coefficients that were constrained in the fitting process. |
constrainTo |
a numeric vector of the same length as
|
family |
the |
prior.weights |
the case weights initially supplied. |
terms |
the |
data |
the |
na.action |
the |
xlevels |
a record of the levels of the factors used in fitting. |
y |
the response used. |
offset |
the offset vector used. |
coefficients |
a named vector of non-eliminated coefficients,
with an attribute |
eliminate |
the |
ofInterest |
a named numeric vector of indices corresponding
to non-eliminated coefficients, or |
predictors |
the fitted values on the link scale. |
fitted.values |
the fitted mean values, obtained by transforming the predictors by the inverse of the link function. |
deviance |
up to a constant, minus twice the maximised log-likelihood. Where sensible, the constant is chosen so that a saturated model has deviance zero. |
aic |
Akaike's An Information Criterion, minus twice the maximized log-likelihood plus twice the number of parameters (so assuming that the dispersion is known). |
iter |
the number of main iterations. |
conv |
logical indicating whether the main iterations
converged, with an attribute for use by |
weights |
the working weights, that is, the weights used in the last iteration. |
residuals |
the working residuals, that is, the residuals from the last iteration. |
df.residual |
the residual degrees of freedom. |
rank |
the numeric rank of the fitted model. |
The list may also contain the components model
, x
,
or termPredictors
if requested in the arguments to gnm
.
If a table was passed to data
and the default for
na.action
was not overridden, the list will also contain a
table.attr
component, for use by the extractor functions.
If a binomial gnm
model is specified by giving a two-column
response, the weights returned by prior.weights
are the total
numbers of cases (factored by the supplied case weights) and the
component y
of the result is the proportion of successes.
The function summary.gnm
may be used to obtain and print
a summary of the results, whilst plot.gnm
may be used
for model diagnostics.
The generic functions formula
, family
,
terms
, coefficients
,
fitted.values
, deviance
,
extractAIC
, weights
,
residuals
, df.residual
,
model.frame
, model.matrix
,
vcov
and termPredictors
maybe used to
extract components from the object returned by gnm
or to
construct the relevant objects where necessary.
Note that the generic functions weights
and
residuals
do not act as straight-forward accessor
functions for gnm
objects, but return the prior weights and
deviance residuals respectively, as for glm
objects.
Regular expression matching is performed using grep
with
default settings.
Heather Turner and David Firth
Cautres, B, Heath, A F and Firth, D (1998). Class, religion and vote in Britain and France. La Lettre de la Maison Francaise 8.
formula
for the symbolic language used to specify
formulae.
Diag
and Symm
for specifying special types
of interaction.
Exp
, Mult
, Inv
, MultHomog
,
Dref
and nonlin.function
for incorporating
nonlinear terms in the formula
argument to gnm
.
residuals.glm
and the generic functions
coef
, fitted
, etc. for extracting
components from gnm
objects.
exitInfo
to print more information on last iteration
when gnm
has not converged.
getContrasts
to estimate (identifiable) scaled contrasts
from a gnm
model.
### Analysis of a 4-way contingency table set.seed(1) print(cautres) ## Fit a "double UNIDIFF" model with the religion-vote and class-vote ## interactions both modulated by nonnegative election-specific ## multipliers. doubleUnidiff <- gnm(Freq ~ election:vote + election:class:religion + Mult(Exp(election), religion:vote) + Mult(Exp(election), class:vote), family = poisson, data = cautres) ## Examine the multipliers of the class-vote log odds ratios ofInterest(doubleUnidiff) <- pickCoef(doubleUnidiff, "class:vote[).]") coef(doubleUnidiff) ## Coefficients of interest: ## Mult(Exp(.), class:vote).election1 ## -0.38357138 ## Mult(Exp(.), class:vote).election2 ## 0.29816599 ## Mult(Exp(.), class:vote).election3 ## 0.06580307 ## Mult(Exp(.), class:vote).election4 ## -0.02174104 ## Re-parameterize by setting first multiplier to zero getContrasts(doubleUnidiff, ofInterest(doubleUnidiff)) ## estimate SE ## Mult(Exp(.), class:vote).election1 0.0000000 0.0000000 ## Mult(Exp(.), class:vote).election2 0.6817374 0.2401644 ## Mult(Exp(.), class:vote).election3 0.4493745 0.2473521 ## Mult(Exp(.), class:vote).election4 0.3618301 0.2534754 ## quasiSE quasiVar ## Mult(Exp(.), class:vote).election1 0.22854401 0.052232363 ## Mult(Exp(.), class:vote).election2 0.07395886 0.005469913 ## Mult(Exp(.), class:vote).election3 0.09475938 0.008979340 ## Mult(Exp(.), class:vote).election4 0.10934798 0.011956981 ## Same thing but with last multiplier as reference category: getContrasts(doubleUnidiff, rev(ofInterest(doubleUnidiff))) ## estimate SE ## Mult(Exp(.), class:vote).election4 0.00000000 0.0000000 ## Mult(Exp(.), class:vote).election3 0.08754436 0.1446833 ## Mult(Exp(.), class:vote).election2 0.31990727 0.1320022 ## Mult(Exp(.), class:vote).election1 -0.36183013 0.2534754 ## quasiSE quasiVar ## Mult(Exp(.), class:vote).election4 0.10934798 0.011956981 ## Mult(Exp(.), class:vote).election3 0.09475938 0.008979340 ## Mult(Exp(.), class:vote).election2 0.07395886 0.005469913 ## Mult(Exp(.), class:vote).election1 0.22854401 0.052232363 ## Re-fit model with first multiplier set to zero doubleUnidiffConstrained <- update(doubleUnidiff, constrain = ofInterest(doubleUnidiff)[1]) ## Examine the multipliers of the class-vote log odds ratios coef(doubleUnidiffConstrained)[ofInterest(doubleUnidiff)] ## ...as using 'getContrasts' (to 4 d.p.).
### Analysis of a 4-way contingency table set.seed(1) print(cautres) ## Fit a "double UNIDIFF" model with the religion-vote and class-vote ## interactions both modulated by nonnegative election-specific ## multipliers. doubleUnidiff <- gnm(Freq ~ election:vote + election:class:religion + Mult(Exp(election), religion:vote) + Mult(Exp(election), class:vote), family = poisson, data = cautres) ## Examine the multipliers of the class-vote log odds ratios ofInterest(doubleUnidiff) <- pickCoef(doubleUnidiff, "class:vote[).]") coef(doubleUnidiff) ## Coefficients of interest: ## Mult(Exp(.), class:vote).election1 ## -0.38357138 ## Mult(Exp(.), class:vote).election2 ## 0.29816599 ## Mult(Exp(.), class:vote).election3 ## 0.06580307 ## Mult(Exp(.), class:vote).election4 ## -0.02174104 ## Re-parameterize by setting first multiplier to zero getContrasts(doubleUnidiff, ofInterest(doubleUnidiff)) ## estimate SE ## Mult(Exp(.), class:vote).election1 0.0000000 0.0000000 ## Mult(Exp(.), class:vote).election2 0.6817374 0.2401644 ## Mult(Exp(.), class:vote).election3 0.4493745 0.2473521 ## Mult(Exp(.), class:vote).election4 0.3618301 0.2534754 ## quasiSE quasiVar ## Mult(Exp(.), class:vote).election1 0.22854401 0.052232363 ## Mult(Exp(.), class:vote).election2 0.07395886 0.005469913 ## Mult(Exp(.), class:vote).election3 0.09475938 0.008979340 ## Mult(Exp(.), class:vote).election4 0.10934798 0.011956981 ## Same thing but with last multiplier as reference category: getContrasts(doubleUnidiff, rev(ofInterest(doubleUnidiff))) ## estimate SE ## Mult(Exp(.), class:vote).election4 0.00000000 0.0000000 ## Mult(Exp(.), class:vote).election3 0.08754436 0.1446833 ## Mult(Exp(.), class:vote).election2 0.31990727 0.1320022 ## Mult(Exp(.), class:vote).election1 -0.36183013 0.2534754 ## quasiSE quasiVar ## Mult(Exp(.), class:vote).election4 0.10934798 0.011956981 ## Mult(Exp(.), class:vote).election3 0.09475938 0.008979340 ## Mult(Exp(.), class:vote).election2 0.07395886 0.005469913 ## Mult(Exp(.), class:vote).election1 0.22854401 0.052232363 ## Re-fit model with first multiplier set to zero doubleUnidiffConstrained <- update(doubleUnidiff, constrain = ofInterest(doubleUnidiff)[1]) ## Examine the multipliers of the class-vote log odds ratios coef(doubleUnidiffConstrained)[ofInterest(doubleUnidiff)] ## ...as using 'getContrasts' (to 4 d.p.).
The voting record of every representative in the 2001 House, on 20 roll calls selected by Americans for Democratic Action. Each row is the record of one representative; the first column records the representative's registered party allegiance.
House2001
House2001
A data frame with 439 observations on the following 21 variables.
party
a factor with levels D
I
N
R
HR333.BankruptcyOverhaul.Yes
a numeric vector
SJRes6.ErgonomicsRuleDisapproval.No
a numeric vector
HR3.IncomeTaxReduction.No
a numeric vector
HR6.MarriageTaxReduction.Yes
a numeric vector
HR8.EstateTaxRelief.Yes
a numeric vector
HR503.FetalProtection.No
a numeric vector
HR1.SchoolVouchers.No
a numeric vector
HR1836.TaxCutReconciliationBill.No
a numeric vector
HR2356.CampaignFinanceReform.No
a numeric vector
HJRes36.FlagDesecration.No
a numeric vector
HR7.FaithBasedInitiative.Yes
a numeric vector
HJRes50.ChinaNormalizedTradeRelations.Yes
a numeric vector
HR4.ANWRDrillingBan.Yes
a numeric vector
HR2563.PatientsRightsHMOLiability.No
a numeric vector
HR2563.PatientsBillOfRights.No
a numeric vector
HR2944.DomesticPartnerBenefits.No
a numeric vector
HR2586.USMilitaryPersonnelOverseasAbortions.Yes
a numeric vector
HR2975.AntiTerrorismAuthority.No
a numeric vector
HR3090.EconomicStimulus.No
a numeric vector
HR3000.TradePromotionAuthorityFastTrack.No
a numeric vector
Coding of the votes is as described in ADA (2002).
Originally printed in ADA (2002). Kindly supplied in electronic format by Jan deLeeuw, who used the data to illustrate methods developed in deLeeuw (2006).
Americans for Democratic Action, ADA (2002). 2001 voting record: Shattered promise of liberal progress. ADA Today 57(1), 1–17.
deLeeuw, J (2006). Principal component analysis of binary data by iterated singular value decomposition. Computational Statistics and Data Analysis 50, 21–39.
## Not run: ## This example takes some time to run! summary(House2001) ## Put the votes in a matrix, and discard members with too many NAs etc: House2001m <- as.matrix(House2001[-1]) informative <- apply(House2001m, 1, function(row){ valid <- !is.na(row) validSum <- if (any(valid)) sum(row[valid]) else 0 nValid <- sum(valid) uninformative <- (validSum == nValid) || (validSum == 0) || (nValid < 10) !uninformative}) House2001m <- House2001m[informative, ] ## Make a vector of colours, blue for Republican and red for Democrat: parties <- House2001$party[informative] partyColors <- rep("black", length(parties)) partyColors <- ifelse(parties == "D", "red", partyColors) partyColors <- ifelse(parties == "R", "blue", partyColors) ## Expand the data for statistical modelling: House2001v <- as.vector(House2001m) House2001f <- data.frame(member = rownames(House2001m), party = parties, rollCall = factor(rep((1:20), rep(nrow(House2001m), 20))), vote = House2001v) ## Now fit an "empty" model, in which all members vote identically: baseModel <- glm(vote ~ -1 + rollCall, family = binomial, data = House2001f) ## From this, get starting values for a one-dimensional multiplicative term: Start <- residSVD(baseModel, rollCall, member) ## ## Now fit the logistic model with one multiplicative term. ## For the response variable, instead of vote=0,1 we use 0.03 and 0.97, ## corresponding approximately to a bias-reducing adjustment of p/(2n), ## where p is the number of parameters and n the number of observations. ## voteAdj <- 0.5 + 0.94*(House2001f$vote - 0.5) House2001model1 <- gnm(voteAdj ~ Mult(rollCall, member), eliminate = rollCall, family = binomial, data = House2001f, na.action = na.exclude, trace = TRUE, tolerance = 1e-03, start = -Start) ## Deviance is 2234.847, df = 5574 ## ## Plot the members' positions as estimated in the model: ## memberParameters <- pickCoef(House2001model1, "member") plot(coef(House2001model1)[memberParameters], col = partyColors, xlab = "Alphabetical index (Abercrombie 1 to Young 301)", ylab = "Member's relative position, one-dimensional model") ## Can do the same thing with two dimensions, but gnm takes around 40 ## slow iterations to converge (there are more than 600 parameters): Start2 <- residSVD(baseModel, rollCall, member, d = 2) House2001model2 <- gnm( voteAdj ~ instances(Mult(rollCall - 1, member - 1), 2), eliminate = rollCall, family = binomial, data = House2001f, na.action = na.exclude, trace = TRUE, tolerance = 1e-03, start = Start2, lsMethod = "qr") ## Deviance is 1545.166, df = 5257 ## memberParameters1 <- pickCoef(House2001model2, "1).member") memberParameters2 <- pickCoef(House2001model2, "2).member") plot(coef(House2001model2)[memberParameters1], coef(House2001model2)[memberParameters2], col = partyColors, xlab = "Dimension 1", ylab = "Dimension 2", main = "House2001 data: Member positions, 2-dimensional model") ## ## The second dimension is mainly due to rollCall 12, which does not ## correlate well with the rest -- look at the coefficients of ## House2001model1, or at the 12th row of cormat <- cor(na.omit(House2001m)) ## End(Not run)
## Not run: ## This example takes some time to run! summary(House2001) ## Put the votes in a matrix, and discard members with too many NAs etc: House2001m <- as.matrix(House2001[-1]) informative <- apply(House2001m, 1, function(row){ valid <- !is.na(row) validSum <- if (any(valid)) sum(row[valid]) else 0 nValid <- sum(valid) uninformative <- (validSum == nValid) || (validSum == 0) || (nValid < 10) !uninformative}) House2001m <- House2001m[informative, ] ## Make a vector of colours, blue for Republican and red for Democrat: parties <- House2001$party[informative] partyColors <- rep("black", length(parties)) partyColors <- ifelse(parties == "D", "red", partyColors) partyColors <- ifelse(parties == "R", "blue", partyColors) ## Expand the data for statistical modelling: House2001v <- as.vector(House2001m) House2001f <- data.frame(member = rownames(House2001m), party = parties, rollCall = factor(rep((1:20), rep(nrow(House2001m), 20))), vote = House2001v) ## Now fit an "empty" model, in which all members vote identically: baseModel <- glm(vote ~ -1 + rollCall, family = binomial, data = House2001f) ## From this, get starting values for a one-dimensional multiplicative term: Start <- residSVD(baseModel, rollCall, member) ## ## Now fit the logistic model with one multiplicative term. ## For the response variable, instead of vote=0,1 we use 0.03 and 0.97, ## corresponding approximately to a bias-reducing adjustment of p/(2n), ## where p is the number of parameters and n the number of observations. ## voteAdj <- 0.5 + 0.94*(House2001f$vote - 0.5) House2001model1 <- gnm(voteAdj ~ Mult(rollCall, member), eliminate = rollCall, family = binomial, data = House2001f, na.action = na.exclude, trace = TRUE, tolerance = 1e-03, start = -Start) ## Deviance is 2234.847, df = 5574 ## ## Plot the members' positions as estimated in the model: ## memberParameters <- pickCoef(House2001model1, "member") plot(coef(House2001model1)[memberParameters], col = partyColors, xlab = "Alphabetical index (Abercrombie 1 to Young 301)", ylab = "Member's relative position, one-dimensional model") ## Can do the same thing with two dimensions, but gnm takes around 40 ## slow iterations to converge (there are more than 600 parameters): Start2 <- residSVD(baseModel, rollCall, member, d = 2) House2001model2 <- gnm( voteAdj ~ instances(Mult(rollCall - 1, member - 1), 2), eliminate = rollCall, family = binomial, data = House2001f, na.action = na.exclude, trace = TRUE, tolerance = 1e-03, start = Start2, lsMethod = "qr") ## Deviance is 1545.166, df = 5257 ## memberParameters1 <- pickCoef(House2001model2, "1).member") memberParameters2 <- pickCoef(House2001model2, "2).member") plot(coef(House2001model2)[memberParameters1], coef(House2001model2)[memberParameters2], col = partyColors, xlab = "Dimension 1", ylab = "Dimension 2", main = "House2001 data: Member positions, 2-dimensional model") ## ## The second dimension is mainly due to rollCall 12, which does not ## correlate well with the rest -- look at the coefficients of ## House2001model1, or at the 12th row of cormat <- cor(na.omit(House2001m)) ## End(Not run)
A symbolic wrapper, for use in the formula argument to
gnm
, to specify multiple instances of a term specified
by a function with an inst
argument.
instances(term, instances = 1)
instances(term, instances = 1)
term |
a call to a function with an inst argument, which specifies some term. |
instances |
the desired number of instances of the term. |
A deparsed expression representing the summation of term
specified with inst = 1
, inst = 2
, ..., inst =
instances
, which is used to create an expanded formula.
Heather Turner
gnm
, formula
,
nonlin.function
, Mult
,
MultHomog
## Not run: ## (this example can take quite a while to run) ## ## Fitting two instances of a multiplicative interaction (i.e. a ## two-component interaction) yield.scaled <- wheat$yield * sqrt(3/1000) treatment <- factor(paste(wheat$tillage, wheat$summerCrop, wheat$manure, wheat$N, sep = "")) bilinear2 <- gnm(yield.scaled ~ year + treatment + instances(Mult(year, treatment), 2), family = gaussian, data = wheat) ## End(Not run)
## Not run: ## (this example can take quite a while to run) ## ## Fitting two instances of a multiplicative interaction (i.e. a ## two-component interaction) yield.scaled <- wheat$yield * sqrt(3/1000) treatment <- factor(paste(wheat$tillage, wheat$summerCrop, wheat$manure, wheat$N, sep = "")) bilinear2 <- gnm(yield.scaled ~ year + treatment + instances(Mult(year, treatment), 2), family = gaussian, data = wheat) ## End(Not run)
A function of class "nonlin"
to specify the reciprocal
of a predictor in the formula argument to gnm
.
Inv(expression, inst = NULL)
Inv(expression, inst = NULL)
expression |
a symbolic expression representing the (possibly nonlinear) predictor. |
inst |
(optional) an integer specifying the instance number of the term. |
The expression
argument is interpreted as the right hand side
of a formula in an object of class "formula"
, except that an
intercept term is not added by default. Any function of class
"nonlin"
may be used in addition to the usual operators and
functions.
A list with the components required of a "nonlin"
function:
predictors |
the |
term |
a function to create a deparsed mathematical expression of the term, given a label for the predictor. |
call |
the call to use as a prefix for parameter labels. |
Heather Turner
## One way to fit the logistic function without conditional ## linearity as in ?nls library(gnm) set.seed(1) DNase1 <- subset(DNase, Run == 1) test <- gnm(density ~ -1 + Mult(1, Inv(Const(1) + Exp(Mult(1 + offset(-log(conc)), Inv(1))))), start = c(NA, 0, 1), data = DNase1, trace = TRUE) coef(test)
## One way to fit the logistic function without conditional ## linearity as in ?nls library(gnm) set.seed(1) DNase1 <- subset(DNase, Run == 1) test <- gnm(density ~ -1 + Mult(1, Inv(Const(1) + Exp(Mult(1 + offset(-log(conc)), Inv(1))))), start = c(NA, 0, 1), data = DNase1, trace = TRUE) coef(test)
Computes the mean working residuals from a model fitted using Iterative Weighted Least Squares for each level of a factor or interaction of factors.
meanResiduals(object, by, standardized=TRUE, as.table=TRUE, ...)
meanResiduals(object, by, standardized=TRUE, as.table=TRUE, ...)
object |
model object for which |
by |
either a formula specifying a factor or interaction of factors (recommended), or a list of factors (the elements of which must correspond exactly to observations in the model frame). When a list of factors is specified, their interaction is used to specify the grouping factor. |
standardized |
logical: if |
as.table |
logical: logical: if |
... |
currently ignored |
For level of the grouping factor
the mean working
residual is defined as
where is the
'th residual for level
,
is the corresponding working weight and
is the number of observations for level
. The denominator gives
the weight corresponding to mean residual.
For non-aggregated residuals, i.e. when the factor has one level per observation, the residuals are the same as Pearson residuals.
An object of class "meanResiduals"
, for which print
and summary
methods are provided. A "meanResiduals"
object is a list containing the following elements:
call |
the call used to create the model object from which the mean residuals are derived. |
by |
a label for the grouping factor. |
residuals |
the mean residuals. |
df |
the degrees of freedom associated with the mean residuals. |
standardized |
the |
weights |
the weights corresponding to the mean residuals. |
Heather Turner
## Fit a conditional independence model, leaving out ## the uninformative subtable for dest == 7: CImodel <- gnm(Freq ~ educ*orig + educ*dest, family = poisson, data = yaish, subset = (dest != 7)) ## compute mean residuals over origin and destination meanRes <- meanResiduals(CImodel, ~ orig:dest) meanRes summary(meanRes) ## Not run: ## requires vcdExtra package ## display mean residuals for origin and destination library(vcdExtra) mosaic(CImodel, ~orig+dest) ## End(Not run) ## non-aggregated residuals res1 <- meanResiduals(CImodel, ~ educ:orig:dest) res2 <- residuals(CImodel, type = "pearson") all.equal(as.numeric(res1), as.numeric(res2))
## Fit a conditional independence model, leaving out ## the uninformative subtable for dest == 7: CImodel <- gnm(Freq ~ educ*orig + educ*dest, family = poisson, data = yaish, subset = (dest != 7)) ## compute mean residuals over origin and destination meanRes <- meanResiduals(CImodel, ~ orig:dest) meanRes summary(meanRes) ## Not run: ## requires vcdExtra package ## display mean residuals for origin and destination library(vcdExtra) mosaic(CImodel, ~orig+dest) ## End(Not run) ## non-aggregated residuals res1 <- meanResiduals(CImodel, ~ educ:orig:dest) res2 <- residuals(CImodel, type = "pearson") all.equal(as.numeric(res1), as.numeric(res2))
A 2-way contingency table from a sample of residents of Manhattan.
Classifying variables are child's mental impairment (MHS
) and
parents' socioeconomic status (SES
).
mentalHealth
mentalHealth
A data frame with 24 observations on the following 3 variables.
count
a numeric vector
SES
an ordered factor with levels A
< B
< C
< D
< E
< F
MHS
an ordered factor with levels well
< mild
< moderate
< impaired
From Agresti (2002, p381); originally in Srole et al. (1978, p289).
Agresti, A. (2002). Categorical Data Analysis (2nd edn). New York: Wiley.
Srole, L, Langner, T. S., Michael, S. T., Opler, M. K. and Rennie, T. A. C. (1978), Mental Health in the Metropolis: The Midtown Manhattan Study. New York: NYU Press.
set.seed(1) ## Goodman Row-Column association model fits well (deviance 3.57, df 8) mentalHealth$MHS <- C(mentalHealth$MHS, treatment) mentalHealth$SES <- C(mentalHealth$SES, treatment) RC1model <- gnm(count ~ SES + MHS + Mult(SES, MHS), family = poisson, data = mentalHealth) ## Row scores and column scores are both unnormalized in this ## parameterization of the model ## The scores can be normalized as in Agresti's eqn (9.15): rowProbs <- with(mentalHealth, tapply(count, SES, sum) / sum(count)) colProbs <- with(mentalHealth, tapply(count, MHS, sum) / sum(count)) mu <- getContrasts(RC1model, pickCoef(RC1model, "[.]SES"), ref = rowProbs, scaleRef = rowProbs, scaleWeights = rowProbs) nu <- getContrasts(RC1model, pickCoef(RC1model, "[.]MHS"), ref = colProbs, scaleRef = colProbs, scaleWeights = colProbs) all.equal(sum(mu$qv[,1] * rowProbs), 0) all.equal(sum(nu$qv[,1] * colProbs), 0) all.equal(sum(mu$qv[,1]^2 * rowProbs), 1) all.equal(sum(nu$qv[,1]^2 * colProbs), 1)
set.seed(1) ## Goodman Row-Column association model fits well (deviance 3.57, df 8) mentalHealth$MHS <- C(mentalHealth$MHS, treatment) mentalHealth$SES <- C(mentalHealth$SES, treatment) RC1model <- gnm(count ~ SES + MHS + Mult(SES, MHS), family = poisson, data = mentalHealth) ## Row scores and column scores are both unnormalized in this ## parameterization of the model ## The scores can be normalized as in Agresti's eqn (9.15): rowProbs <- with(mentalHealth, tapply(count, SES, sum) / sum(count)) colProbs <- with(mentalHealth, tapply(count, MHS, sum) / sum(count)) mu <- getContrasts(RC1model, pickCoef(RC1model, "[.]SES"), ref = rowProbs, scaleRef = rowProbs, scaleWeights = rowProbs) nu <- getContrasts(RC1model, pickCoef(RC1model, "[.]MHS"), ref = colProbs, scaleRef = colProbs, scaleWeights = colProbs) all.equal(sum(mu$qv[,1] * rowProbs), 0) all.equal(sum(nu$qv[,1] * colProbs), 0) all.equal(sum(mu$qv[,1]^2 * rowProbs), 1) all.equal(sum(nu$qv[,1]^2 * colProbs), 1)
This method extracts or evaluates a local design matrix for a generalized nonlinear model
## S3 method for class 'gnm' model.matrix(object, coef = NULL, ...)
## S3 method for class 'gnm' model.matrix(object, coef = NULL, ...)
object |
an object of class |
coef |
if specified, the vector of (non-eliminated) coefficients at which the local design matrix is evaluated. |
... |
further arguments. |
If coef = NULL
, the local design matrix with columns
corresponding to the non-eliminated parameters evaluated at
coef(object)
(extracted from object
if possible).
Otherwise, the local design matrix evaluated at coef
.
Heather Turner
example(mentalHealth) model.matrix(RC1model) model.matrix(RC1model, coef = seq(coef(RC1model)))
example(mentalHealth) model.matrix(RC1model) model.matrix(RC1model, coef = seq(coef(RC1model)))
Computes the Moore-Penrose generalized inverse.
MPinv(mat, tolerance = 100*.Machine$double.eps, rank = NULL, method = "svd")
MPinv(mat, tolerance = 100*.Machine$double.eps, rank = NULL, method = "svd")
mat |
a real matrix. |
tolerance |
A positive scalar which determines the tolerance for detecting zeroes among the singular values. |
rank |
Either |
method |
Character, one of |
Real-valuedness is not checked, neither is symmetry when method
= "chol"
.
A matrix, with an additional attribute named "rank"
containing
the numerically determined rank of the matrix.
David Firth and Heather Turner
Harville, D. A. (1997). Matrix Algebra from a Statistician's Perspective. New York: Springer.
Courrieu, P. (2005). Fast computation of Moore-Penrose inverse matrices. Neural Information Processing 8, 25–29
A <- matrix(c(1, 1, 0, 1, 1, 0, 2, 3, 4), 3, 3) B <- MPinv(A) A %*% B %*% A - A # essentially zero B %*% A %*% B - B # essentially zero attr(B, "rank") # here 2 ## demonstration that "svd" and "chol" deliver essentially the same ## results for symmetric matrices: A <- crossprod(A) MPinv(A) - MPinv(A, method = "chol") ## (essentially zero)
A <- matrix(c(1, 1, 0, 1, 1, 0, 2, 3, 4), 3, 3) B <- MPinv(A) A %*% B %*% A - A # essentially zero B %*% A %*% B - B # essentially zero attr(B, "rank") # here 2 ## demonstration that "svd" and "chol" deliver essentially the same ## results for symmetric matrices: A <- crossprod(A) MPinv(A) - MPinv(A, method = "chol") ## (essentially zero)
A function of class "nonlin"
to specify a multiplicative
interaction with homogeneous effects in the formula argument to
gnm
.
MultHomog(..., inst = NULL)
MultHomog(..., inst = NULL)
... |
a comma-separated list of two or more factors. |
inst |
(optional) an integer specifying the instance number of the term. |
MultHomog
specifies instances of a multiplicative interaction
in which the constituent multipliers are the effects of two or more
factors and the effects of these factors are constrained to be equal
when the factor levels are equal. Thus the interaction effect would
be
for an observation with level of the first factor, level
of the second factor and so on, where
is the effect for level
of the homogeneous multiplicative
factor.
If the factors passed to MultHomog
do not have exactly the same
levels, the set of levels is taken to be the union of the factor
levels, sorted into increasing order.
A list with the anticipated components of a "nonlin"
function:
predictors |
the factors passed to |
common |
an index to specify that common effects are to be estimated across the factors |
term |
a function to create a deparsed mathematical expression of the term, given labels for the predictors. |
call |
the call to use as a prefix for parameter labels. |
Currently, MultHomog
can only be used to specify a
one-dimensional interaction. See examples for a workaround to specify
interactions with more than one dimension.
Heather Turner
Goodman, L. A. (1979) Simple Models for the Analysis of Association in Cross-Classifications having Ordered Categories. J. Am. Stat. Assoc., 74(367), 537-552.
gnm
, formula
, instances
,
nonlin.function
, Mult
set.seed(1) ### Fit an association model with homogeneous row-column effects rc1 <- gnm(Freq ~ r + c + Diag(r,c) + MultHomog(r, c), family = poisson, data = friend) rc1 ## Not run: ### Extend to two-component interaction rc2 <- update(rc1, . ~ . + MultHomog(r, c, inst = 2), etastart = rc1$predictors) rc2 ## End(Not run) ### For factors with a large number of levels, save time by ### setting diagonal elements to NA rather than fitting exactly; ### skipping start-up iterations may also save time dat <- as.data.frame(friend) id <- with(dat, r == c) dat[id,] <- NA rc2 <- gnm(Freq ~ r + c + instances(MultHomog(r, c), 2), family = poisson, data = dat, iterStart = 0)
set.seed(1) ### Fit an association model with homogeneous row-column effects rc1 <- gnm(Freq ~ r + c + Diag(r,c) + MultHomog(r, c), family = poisson, data = friend) rc1 ## Not run: ### Extend to two-component interaction rc2 <- update(rc1, . ~ . + MultHomog(r, c, inst = 2), etastart = rc1$predictors) rc2 ## End(Not run) ### For factors with a large number of levels, save time by ### setting diagonal elements to NA rather than fitting exactly; ### skipping start-up iterations may also save time dat <- as.data.frame(friend) id <- with(dat, r == c) dat[id,] <- NA rc2 <- gnm(Freq ~ r + c + instances(MultHomog(r, c), 2), family = poisson, data = dat, iterStart = 0)
A function of class "nonlin"
to specify a multiplicative interaction in
the formula argument to gnm
.
Mult(..., inst = NULL)
Mult(..., inst = NULL)
... |
a comma-separated list of two or more symbolic expressions representing the constituent multipliers in the interaction. |
inst |
a positive integer specifying the instance number of the term. |
Mult
specifies instances of a multiplicative interaction,
i.e. a product of the form
where the constituent multipliers are linear
or nonlinear predictors.
Models for the constituent multipliers are specified symbolically
as unspecified arguments to Mult
. These symbolic expressions
are interpreted in the same way as the right hand side of a formula in
an object of class "formula"
, except that an intercept term
is not added by default. Offsets can be added to constituent
multipliers, using offset
.
The family of multiplicative interaction models include row-column association models for contingency tables (e.g., Agresti, 2002, Sec 9.6), log-multiplicative or UNIDIFF models (Erikson and Goldthorpe, 1992; Xie, 1992), and GAMMI models (van Eeuwijk, 1995).
A list with the required components of a "nonlin"
function:
predictors |
the expressions passed to |
term |
a function to create a deparsed mathematical expression of the term, given labels for the predictors. |
call |
the call to use as a prefix for parameter labels. |
Heather Turner
Agresti, A (2002). Categorical Data Analysis (2nd ed.) New York: Wiley.
Erikson, R and Goldthorpe, J H (1992). The Constant Flux. Oxford: Clarendon Press.
van Eeuwijk, F A (1995). Multiplicative interaction in generalized linear models. Biometrics 51, 1017-1032.
Vargas, M, Crossa, J, van Eeuwijk, F, Sayre, K D and Reynolds, M P (2001). Interpreting treatment by environment interaction in agronomy trials. Agronomy Journal 93, 949–960.
Xie, Y (1992). The log-multiplicative layer effect model for comparing mobility tables. American Sociological Review 57, 380-395.
gnm
, formula
, instances
,
nonlin.function
, MultHomog
set.seed(1) ## Using 'Mult' with 'Exp' to constrain the first constituent multiplier ## to be non-negative ## Fit the "UNIDIFF" mobility model across education levels unidiff <- gnm(Freq ~ educ*orig + educ*dest + Mult(Exp(educ), orig:dest), family = poisson, data = yaish, subset = (dest != 7)) ## Not run: ## (this example can take quite a while to run) ## ## Fitting two instances of a multiplicative interaction (i.e. a ## two-component interaction)) yield.scaled <- wheat$yield * sqrt(3/1000) treatment <- factor(paste(wheat$tillage, wheat$summerCrop, wheat$manure, wheat$N, sep = "")) bilinear2 <- gnm(yield.scaled ~ year + treatment + instances(Mult(year, treatment), 2), family = gaussian, data = wheat) formula(bilinear2) ## yield.scaled ~ year + treatment + Mult(year, treatment, inst = 1) + ## Mult(year, treatment, inst = 2) ## End(Not run)
set.seed(1) ## Using 'Mult' with 'Exp' to constrain the first constituent multiplier ## to be non-negative ## Fit the "UNIDIFF" mobility model across education levels unidiff <- gnm(Freq ~ educ*orig + educ*dest + Mult(Exp(educ), orig:dest), family = poisson, data = yaish, subset = (dest != 7)) ## Not run: ## (this example can take quite a while to run) ## ## Fitting two instances of a multiplicative interaction (i.e. a ## two-component interaction)) yield.scaled <- wheat$yield * sqrt(3/1000) treatment <- factor(paste(wheat$tillage, wheat$summerCrop, wheat$manure, wheat$N, sep = "")) bilinear2 <- gnm(yield.scaled ~ year + treatment + instances(Mult(year, treatment), 2), family = gaussian, data = wheat) formula(bilinear2) ## yield.scaled ~ year + treatment + Mult(year, treatment, inst = 1) + ## Mult(year, treatment, inst = 2) ## End(Not run)
Nonlinear terms maybe be specified in the formula argument to gnm by
a call to a function of class "nonlin"
. A "nonlin"
function takes a list of arguments and returns a list of arguments for
the internal nonlinTerms
function.
... |
arguments required to define the term, e.g. symbolic representations of predictors in the term. |
inst |
(optional) an integer specifying the instance number of
the term - for compatibility with |
The function should return a list with the following components:
predictors |
a list of symbolic expressions or formulae with no left hand side which represent (possibly nonlinear) predictors that form part of the term. Intercepts will be added by default to predictors specified by formulae. If predictors are named, these names will be used as a prefix for parameter labels or the parameter label itself in the single parameter case (in either case, prefixed by the call if supplied.) Predictors that may include an intercept should always be named or matched to a call. |
variables |
an optional list of expressions representing variables in the term. |
term |
a function which takes the arguments |
common |
an optional numeric index of |
call |
an optional call to be used as a prefix for parameter labels, specified as an R expression. |
match |
(if |
start |
an optional function which takes a named vector of parameters corresponding to the predictors and returns a vector of starting values for those parameters. This function is ignored if the term is nested within another nonlinear term. |
Heather Turner
Const
to specify a constant,
Dref
to specify a diagonal reference term,
Exp
to specify the exponential of a predictor,
Inv
to specify the reciprocal of a predictor,
Mult
to specify a multiplicative interaction,
MultHomog
to specify a homogeneous multiplicative
interaction,
### Equivalent of weighted.MM function in ?nls weighted.MM <- function(resp, conc){ list(predictors = list(Vm = substitute(conc), K = 1), variables = list(substitute(resp), substitute(conc)), term = function(predictors, variables) { pred <- paste("(", predictors[1], "/(", predictors[2], " + ", variables[2], "))", sep = "") pred <- paste("(", variables[1], " - ", pred, ")/sqrt(", pred, ")", sep = "") }) } class(weighted.MM) <- "nonlin" ## use to fitted weighted Michaelis-Menten model Treated <- Puromycin[Puromycin$state == "treated", ] Pur.wt.2 <- gnm( ~ -1 + weighted.MM(rate, conc), data = Treated, start = c(Vm = 200, K = 0.1), verbose = FALSE) Pur.wt.2 ## ## Call: ## gnm(formula = ~-1 + weighted.MM(rate, conc), data = Treated, ## start = c(Vm = 200, K = 0.1), verbose = FALSE) ## ## Coefficients: ## Vm K ## 206.83477 0.05461 ## ## Deviance: 14.59690 ## Pearson chi-squared: 14.59690 ## Residual df: 10 ### The definition of MultHomog MultHomog <- function(..., inst = NULL){ dots <- match.call(expand.dots = FALSE)[["..."]] list(predictors = dots, common = rep(1, length(dots)), term = function(predictors, ...) { paste("(", paste(predictors, collapse = ")*("), ")", sep = "") }, call = as.expression(match.call())) } class(MultHomog) <- "nonlin" ## use to fit homogeneous multiplicative interaction set.seed(1) RChomog <- gnm(Freq ~ origin + destination + Diag(origin, destination) + MultHomog(origin, destination), ofInterest = "MultHomog", family = poisson, data = occupationalStatus, verbose = FALSE) RChomog ## ## Call: ## ## gnm(formula = Freq ~ origin + destination + Diag(origin, destination) + ## MultHomog(origin, destination), ofInterest = "MultHomog", family = poisson, ## data = occupationalStatus, verbose = FALSE) ## ## Coefficients of interest: ## MultHomog(origin, destination)1 ## -1.50089 ## MultHomog(origin, destination)2 ## -1.28260 ## MultHomog(origin, destination)3 ## -0.68443 ## MultHomog(origin, destination)4 ## -0.10055 ## MultHomog(origin, destination)5 ## -0.08338 ## MultHomog(origin, destination)6 ## 0.42838 ## MultHomog(origin, destination)7 ## 0.84452 ## MultHomog(., .).`origin|destination`8 ## 1.08809 ## ## Deviance: 32.56098 ## Pearson chi-squared: 31.20716 ## Residual df: 34 ## ## the definition of Exp Exp <- function(expression, inst = NULL){ list(predictors = list(substitute(expression)), term = function(predictors, ...) { paste("exp(", predictors, ")", sep = "") }, call = as.expression(match.call()), match = 1) } class(Exp) <- "nonlin" ## use to fit exponentional model x <- 1:100 y <- exp(- x / 10) set.seed(4) exp1 <- gnm(y ~ Exp(1 + x), verbose = FALSE) exp1 ## ## Call: ## gnm(formula = y ~ Exp(1 + x), verbose = FALSE) ## ## Coefficients: ## (Intercept) Exp(. + x).(Intercept) ## 1.549e-11 -7.934e-11 ## Exp(1 + .).x ## -1.000e-01 ## ## Deviance: 9.342418e-20 ## Pearson chi-squared: 9.342418e-20 ## Residual df: 97
### Equivalent of weighted.MM function in ?nls weighted.MM <- function(resp, conc){ list(predictors = list(Vm = substitute(conc), K = 1), variables = list(substitute(resp), substitute(conc)), term = function(predictors, variables) { pred <- paste("(", predictors[1], "/(", predictors[2], " + ", variables[2], "))", sep = "") pred <- paste("(", variables[1], " - ", pred, ")/sqrt(", pred, ")", sep = "") }) } class(weighted.MM) <- "nonlin" ## use to fitted weighted Michaelis-Menten model Treated <- Puromycin[Puromycin$state == "treated", ] Pur.wt.2 <- gnm( ~ -1 + weighted.MM(rate, conc), data = Treated, start = c(Vm = 200, K = 0.1), verbose = FALSE) Pur.wt.2 ## ## Call: ## gnm(formula = ~-1 + weighted.MM(rate, conc), data = Treated, ## start = c(Vm = 200, K = 0.1), verbose = FALSE) ## ## Coefficients: ## Vm K ## 206.83477 0.05461 ## ## Deviance: 14.59690 ## Pearson chi-squared: 14.59690 ## Residual df: 10 ### The definition of MultHomog MultHomog <- function(..., inst = NULL){ dots <- match.call(expand.dots = FALSE)[["..."]] list(predictors = dots, common = rep(1, length(dots)), term = function(predictors, ...) { paste("(", paste(predictors, collapse = ")*("), ")", sep = "") }, call = as.expression(match.call())) } class(MultHomog) <- "nonlin" ## use to fit homogeneous multiplicative interaction set.seed(1) RChomog <- gnm(Freq ~ origin + destination + Diag(origin, destination) + MultHomog(origin, destination), ofInterest = "MultHomog", family = poisson, data = occupationalStatus, verbose = FALSE) RChomog ## ## Call: ## ## gnm(formula = Freq ~ origin + destination + Diag(origin, destination) + ## MultHomog(origin, destination), ofInterest = "MultHomog", family = poisson, ## data = occupationalStatus, verbose = FALSE) ## ## Coefficients of interest: ## MultHomog(origin, destination)1 ## -1.50089 ## MultHomog(origin, destination)2 ## -1.28260 ## MultHomog(origin, destination)3 ## -0.68443 ## MultHomog(origin, destination)4 ## -0.10055 ## MultHomog(origin, destination)5 ## -0.08338 ## MultHomog(origin, destination)6 ## 0.42838 ## MultHomog(origin, destination)7 ## 0.84452 ## MultHomog(., .).`origin|destination`8 ## 1.08809 ## ## Deviance: 32.56098 ## Pearson chi-squared: 31.20716 ## Residual df: 34 ## ## the definition of Exp Exp <- function(expression, inst = NULL){ list(predictors = list(substitute(expression)), term = function(predictors, ...) { paste("exp(", predictors, ")", sep = "") }, call = as.expression(match.call()), match = 1) } class(Exp) <- "nonlin" ## use to fit exponentional model x <- 1:100 y <- exp(- x / 10) set.seed(4) exp1 <- gnm(y ~ Exp(1 + x), verbose = FALSE) exp1 ## ## Call: ## gnm(formula = y ~ Exp(1 + x), verbose = FALSE) ## ## Coefficients: ## (Intercept) Exp(. + x).(Intercept) ## 1.549e-11 -7.934e-11 ## Exp(1 + .).x ## -1.000e-01 ## ## Deviance: 9.342418e-20 ## Pearson chi-squared: 9.342418e-20 ## Residual df: 97
Retrieve or set the "ofInterest"
component of a "gnm"
(generalized nonlinear model) object.
ofInterest(object) ofInterest(object) <- value
ofInterest(object) ofInterest(object) <- value
object |
an object of class |
value |
a numeric vector of indices specifying the subset of
(non-eliminated) coefficients of interest, or |
The "ofInterest"
component of a "gnm"
object is a named
numeric vector of indices specifying a subset of the non-eliminated
coefficients which are of specific interest.
If the "ofInterest"
component is non-NULL, printed summaries of
the model only show the coefficients of interest. In addition
methods for "gnm"
objects which may be applied to a subset of
the parameters are by default applied to the coefficients of interest.
These functions provide a way of extracting and replacing the
"ofInterest"
component. The replacement function prints the
replacement value to show which parameters have been specified by
value
.
A named vector of indices, or NULL
.
Regular expression matching is performed using grep
with
default settings.
Heather Turner
grep
, gnm
, se.gnm
,
getContrasts
,profile.gnm
, confint.gnm
set.seed(1) ## Fit the "UNIDIFF" mobility model across education levels unidiff <- gnm(Freq ~ educ*orig + educ*dest + Mult(Exp(educ), orig:dest), ofInterest = "[.]educ", family = poisson, data = yaish, subset = (dest != 7)) ofInterest(unidiff) ## Get all of the contrasts with educ1 in the UNIDIFF multipliers getContrasts(unidiff, ofInterest(unidiff)) ## Get estimate and se for the contrast between educ4 and educ5 in the ## UNIDIFF multiplier mycontrast <- numeric(length(coef(unidiff))) mycontrast[ofInterest(unidiff)[4:5]] <- c(1, -1) se(unidiff, mycontrast)
set.seed(1) ## Fit the "UNIDIFF" mobility model across education levels unidiff <- gnm(Freq ~ educ*orig + educ*dest + Mult(Exp(educ), orig:dest), ofInterest = "[.]educ", family = poisson, data = yaish, subset = (dest != 7)) ofInterest(unidiff) ## Get all of the contrasts with educ1 in the UNIDIFF multipliers getContrasts(unidiff, ofInterest(unidiff)) ## Get estimate and se for the contrast between educ4 and educ5 in the ## UNIDIFF multiplier mycontrast <- numeric(length(coef(unidiff))) mycontrast[ofInterest(unidiff)[4:5]] <- c(1, -1) se(unidiff, mycontrast)
A function to extract non-eliminated parameters from a "gnm"
object, including parameters that were constrained.
parameters(object)
parameters(object)
object |
an object of class |
parameters
acts like coefficients
except that for
constrained parameters, the value at which the parameter was
constrained is returned instead of NA
.
A vector of parameters.
Heather Turner
RChomog <- gnm(Freq ~ origin + destination + Diag(origin, destination) + MultHomog(origin, destination), family = poisson, data = occupationalStatus, ofInterest = "MultHomog", constrain = "MultHomog.*1") coefficients(RChomog) parameters(RChomog)
RChomog <- gnm(Freq ~ origin + destination + Diag(origin, destination) + MultHomog(origin, destination), family = poisson, data = occupationalStatus, ofInterest = "MultHomog", constrain = "MultHomog.*1") coefficients(RChomog) parameters(RChomog)
Get the indices or values of a subset of non-eliminated coefficients selected via a Tk dialog or by pattern matching.
pickCoef(object, pattern = NULL, value = FALSE, ...)
pickCoef(object, pattern = NULL, value = FALSE, ...)
object |
a model object. |
pattern |
character string containing a regular expression or
(with |
value |
if |
... |
arguments to pass on to pickFrom if
|
If value = FALSE
(the default), a named vector of indices,
otherwise the values of the selected coefficients. If no coefficients
are selected the returned value will be NULL
.
Heather Turner
regexp
, grep
,
pickFrom
, ofInterest
set.seed(1) ### Extract indices for use with ofInterest ## fit the "UNIDIFF" mobility model across education levels unidiff <- gnm(Freq ~ educ*orig + educ*dest + Mult(Exp(educ), orig:dest), family = poisson, data = yaish, subset = (dest != 7)) ## set coefficients in first constituent multiplier as 'ofInterest' ## using regular expression ofInterest(unidiff) <- pickCoef(unidiff, "[.]educ") ## summarise model, only showing coefficients of interest summary(unidiff) ## get contrasts of these coefficients getContrasts(unidiff, ofInterest(unidiff)) ### Extract coefficients to use as starting values ## fit diagonal reference model with constant weights set.seed(1) ## reconstruct counts voting Labour/non-Labour count <- with(voting, percentage/100 * total) yvar <- cbind(count, voting$total - count) classMobility <- gnm(yvar ~ -1 + Dref(origin, destination), family = binomial, data = voting) ## create factors indicating movement in and out of salariat (class 1) upward <- with(voting, origin != 1 & destination == 1) downward <- with(voting, origin == 1 & destination != 1) ## extract diagonal effects from first model to use as starting values diagCoef <- pickCoef(classMobility, "Dref(., .)", fixed = TRUE, value = TRUE) ## fit separate weights for the "socially mobile" groups ## -- there are now 3 parameters for each weight socialMobility <- gnm(yvar ~ -1 + Dref(origin, destination, delta = ~ 1 + downward + upward), family = binomial, data = voting, start = c(rep(NA, 6), diagCoef))
set.seed(1) ### Extract indices for use with ofInterest ## fit the "UNIDIFF" mobility model across education levels unidiff <- gnm(Freq ~ educ*orig + educ*dest + Mult(Exp(educ), orig:dest), family = poisson, data = yaish, subset = (dest != 7)) ## set coefficients in first constituent multiplier as 'ofInterest' ## using regular expression ofInterest(unidiff) <- pickCoef(unidiff, "[.]educ") ## summarise model, only showing coefficients of interest summary(unidiff) ## get contrasts of these coefficients getContrasts(unidiff, ofInterest(unidiff)) ### Extract coefficients to use as starting values ## fit diagonal reference model with constant weights set.seed(1) ## reconstruct counts voting Labour/non-Labour count <- with(voting, percentage/100 * total) yvar <- cbind(count, voting$total - count) classMobility <- gnm(yvar ~ -1 + Dref(origin, destination), family = binomial, data = voting) ## create factors indicating movement in and out of salariat (class 1) upward <- with(voting, origin != 1 & destination == 1) downward <- with(voting, origin == 1 & destination != 1) ## extract diagonal effects from first model to use as starting values diagCoef <- pickCoef(classMobility, "Dref(., .)", fixed = TRUE, value = TRUE) ## fit separate weights for the "socially mobile" groups ## -- there are now 3 parameters for each weight socialMobility <- gnm(yvar ~ -1 + Dref(origin, destination, delta = ~ 1 + downward + upward), family = binomial, data = voting, start = c(rep(NA, 6), diagCoef))
Five plots are available: a plot of residuals against fitted values, a
Scale-Location plot of
against fitted values, a Normal Q-Q plot, a plot of Cook's distances
versus row labels, and a plot of residuals against leverages. By
default, all except the fourth are produced.
## S3 method for class 'gnm' plot(x, which = c(1:3, 5), caption = c("Residuals vs Fitted", "Normal Q-Q", "Scale-Location", "Cook's distance", "Residuals vs Leverage"), panel = if (add.smooth) panel.smooth else points, sub.caption = NULL, main = "", ask = prod(par("mfcol")) < length(which) && dev.interactive(), ..., id.n = 3, labels.id = names(residuals(x)), cex.id = 0.75, qqline = TRUE, cook.levels = c(0.5, 1), add.smooth = getOption("add.smooth"), label.pos = c(4, 2), cex.caption = 1)
## S3 method for class 'gnm' plot(x, which = c(1:3, 5), caption = c("Residuals vs Fitted", "Normal Q-Q", "Scale-Location", "Cook's distance", "Residuals vs Leverage"), panel = if (add.smooth) panel.smooth else points, sub.caption = NULL, main = "", ask = prod(par("mfcol")) < length(which) && dev.interactive(), ..., id.n = 3, labels.id = names(residuals(x)), cex.id = 0.75, qqline = TRUE, cook.levels = c(0.5, 1), add.smooth = getOption("add.smooth"), label.pos = c(4, 2), cex.caption = 1)
x |
a |
which |
a subset of the numbers 1:5 specifying which plots to produce (out of those listed in Description section). |
caption |
captions to appear above the plots. |
panel |
panel function. The useful alternative to |
sub.caption |
common title - above figures if there are
multiple; used as |
main |
title to each plot - in addition to the above |
ask |
logical; if |
... |
other parameters to be passed through to plotting functions. |
id.n |
number of points to be labelled in each plot starting with the most extreme. |
labels.id |
vector of labels, from which the labels for extreme
points will be chosen. |
cex.id |
magnification of point labels. |
qqline |
logical indicating if a |
cook.levels |
levels of Cook's distance at which to draw contours. |
add.smooth |
logical indicating if a smoother should be added to
most plots; see also |
label.pos |
positioning of labels, for the left half and right half of the graph respectively, for plots 1-3. |
cex.caption |
controls the size of 'caption'. |
sub.caption
- by default the function call - is shown as a subtitle
(under the x-axis title) on each plot when plots are on separate
pages, or as a subtitle in the outer margin (if any) when there
are multiple plots per page.
The "Scale-Location" plot, also called "Spread-Location" or "S-L"
plot, takes the square root of the absolute residuals in order to
diminish skewness ( is much less skewed
than
for Gaussian zero-mean
).
The S-L, the Q-Q, and the Residual-Leverage plot, use
standardized residuals which have identical variance (under the
hypothesis). They are given as where
are the diagonal
entries of the hat matrix,
influence()$hat
, see also
hat
.
The Residual-Leverage plot shows contours of equal Cook's
distance, for values of cook.levels
(by default 0.5 and 1) and
omits cases with leverage one. If the leverages are constant, as
typically in a balanced aov
situation, the plot uses factor
level combinations instead of the leverages for the x-axis.
Modification of plot.lm
by the R Core Team. Adapted
for "gnm"
objects by Heather Turner.
set.seed(1) ## Fit an association model with homogeneous row-column effects RChomog <- gnm(Freq ~ origin + destination + Diag(origin, destination) + MultHomog(origin, destination), family = poisson, data = occupationalStatus) ## Plot model diagnostics plot(RChomog) ## Put 4 plots on 1 page; allow room for printing model formula in outer margin: par(mfrow = c(2, 2), oma = c(0, 0, 3, 0)) title <- paste(deparse(RChomog$formula, width.cutoff = 50), collapse = "\n") plot(RChomog, sub.caption = title) ## Fit smoother curves plot(RChomog, sub.caption = title, panel = panel.smooth) plot(RChomog, sub.caption = title, panel = function(x,y) panel.smooth(x, y, span = 1))
set.seed(1) ## Fit an association model with homogeneous row-column effects RChomog <- gnm(Freq ~ origin + destination + Diag(origin, destination) + MultHomog(origin, destination), family = poisson, data = occupationalStatus) ## Plot model diagnostics plot(RChomog) ## Put 4 plots on 1 page; allow room for printing model formula in outer margin: par(mfrow = c(2, 2), oma = c(0, 0, 3, 0)) title <- paste(deparse(RChomog$formula, width.cutoff = 50), collapse = "\n") plot(RChomog, sub.caption = title) ## Fit smoother curves plot(RChomog, sub.caption = title, panel = panel.smooth) plot(RChomog, sub.caption = title, panel = function(x,y) panel.smooth(x, y, span = 1))
Obtains predictions and optionally estimates standard errors of those predictions from a fitted generalized nonlinear model object.
## S3 method for class 'gnm' predict(object, newdata = NULL, type = c("link", "response", "terms"), se.fit = FALSE, dispersion = NULL, terms = NULL, na.action = na.exclude, ...)
## S3 method for class 'gnm' predict(object, newdata = NULL, type = c("link", "response", "terms"), se.fit = FALSE, dispersion = NULL, terms = NULL, na.action = na.exclude, ...)
object |
a fitted object of class inheriting from |
newdata |
optionally, a data frame in which to look for variables with which to predict. If omitted, the fitted predictors are used. |
type |
the type of prediction required. The default is on the scale
of the predictors; the alternative The value of this argument can be abbreviated. |
se.fit |
logical switch indicating if standard errors are required. |
dispersion |
the dispersion of the fit to be assumed in computing
the standard errors. If omitted, that returned by |
terms |
with |
na.action |
function determining what should be done with missing values
in |
... |
further arguments passed to or from other methods. |
If newdata
is omitted the predictions are based on the data used
for the fit. In that case how cases with missing values in the
original fit is determined by the na.action
argument of that
fit. If na.action = na.omit
omitted cases will not appear in
the residuals, whereas if na.action = na.exclude
they will
appear (in predictions and standard errors), with residual value
NA
. See also napredict
.
If se = FALSE
, a vector or matrix of predictions. If se =
TRUE
, a list with components
fit |
predictions. |
se.fit |
estimated standard errors. |
residual.scale |
a scalar giving the square root of the dispersion used in computing the standard errors. |
Variables are first looked for in 'newdata' and then searched for in the usual way (which will include the environment of the formula used in the fit). A warning will be given if the variables found are not of the same length as those in 'newdata' if it was supplied.
Heather Turner
Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S
set.seed(1) ## Fit an association model with homogeneous row-column effects RChomog <- gnm(Freq ~ origin + destination + Diag(origin, destination) + MultHomog(origin, destination), family = poisson, data = occupationalStatus) ## Fitted values (expected counts) predict(RChomog, type = "response", se.fit = TRUE) ## Fitted values on log scale predict(RChomog, type = "link", se.fit = TRUE)
set.seed(1) ## Fit an association model with homogeneous row-column effects RChomog <- gnm(Freq ~ origin + destination + Diag(origin, destination) + MultHomog(origin, destination), family = poisson, data = occupationalStatus) ## Fitted values (expected counts) predict(RChomog, type = "response", se.fit = TRUE) ## Fitted values on log scale predict(RChomog, type = "link", se.fit = TRUE)
For one or more parameters in a generalized nonlinear model, profile the deviance over a range of values about the fitted estimate.
## S3 method for class 'gnm' profile(fitted, which = ofInterest(fitted), alpha = 0.05, maxsteps = 10, stepsize = NULL, trace = FALSE, ...)
## S3 method for class 'gnm' profile(fitted, which = ofInterest(fitted), alpha = 0.05, maxsteps = 10, stepsize = NULL, trace = FALSE, ...)
fitted |
an object of class |
which |
(optional) either a numeric vector of indices or a
character vector of names, specifying the parameters over which the
deviance is to be profiled. If |
alpha |
the significance level of the z statistic, indicating the range that the profile must cover (see details). |
maxsteps |
the maximum number of steps to take either side of the fitted parameter. |
stepsize |
(optional) a numeric vector of length two, specifying
the size of steps to take when profiling down and up respectively,
or a single number specifying the step size in both directions. If
|
trace |
logical, indicating whether profiling should be traced. |
... |
further arguments. |
This is a method for the generic function profile
in the
base
package.
For a given parameter, the deviance is profiled by constraining that parameter to certain values either side of its estimate in the fitted model and refitting the model.
For each updated model, the following "z statistic" is computed
where is the constrained value of the parameter;
is the original fitted value;
is the deviance when the parameter is equal
to
, and
is the dispersion
parameter.
When the deviance is quadratic in , z will be
linear in
. Therefore departures from quadratic
behaviour can easily be identified by plotting z against
using
plot.profile.gnm
.
confint.profile.gnm
estimates confidence intervals for the
parameters by interpolating the deviance profiles and identifying the
parameter values at which z is equal to the relevant percentage points
of the normal distribution. The alpha
argument to
profile.gnm
specifies the significance level of z which must be
covered by the profile. In particular, the profiling in a given
direction will stop when maxsteps
is reached or two steps have
been taken in which
By default, the stepsize is
where is the standard error of
. Strong asymmetry is detected and
the stepsize is adjusted accordingly, to try to ensure that the range
determined by
alpha
is adequately covered. profile.gnm
will also attempt to detect if the deviance is asymptotic such that
the desired significance level cannot be reached. Each profile has an
attribute "asymptote"
, a two-length logical vector specifying
whether an asymptote has been detected in either direction.
For unidentified parameters the profile will be NA
, as such
parameters cannot be profiled.
A list of profiles, with one named component for each parameter profiled. Each profile is a data.frame: the first column, "z", contains the z statistics and the second column "par.vals" contains a matrix of parameter values, with one column for each parameter in the model.
The list has two attributes: "original.fit" containing fitted
and "summary" containing summary(fitted)
.
Modification of profile.glm
from the MASS
package. Originally D. M. Bates and W. N. Venables, ported to R by
B. D. Ripley, adapted for "gnm"
objects by Heather Turner.
Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S
confint.gnm
, gnm
,
profile.glm
, ofInterest
set.seed(1) ### Example in which deviance is near quadratic count <- with(voting, percentage/100 * total) yvar <- cbind(count, voting$total - count) classMobility <- gnm(yvar ~ -1 + Dref(origin, destination), constrain = "delta1", family = binomial, data = voting) prof <- profile(classMobility, trace = TRUE) plot(prof) ## confint similar to MLE +/- 1.96*s.e. confint(prof, trace = TRUE) coefData <- se(classMobility) cbind(coefData[1] - 1.96 * coefData[2], coefData[1] + 1.96 * coefData[2]) ## Not run: ### These examples take longer to run ### Another near quadratic example RChomog <- gnm(Freq ~ origin + destination + Diag(origin, destination) + MultHomog(origin, destination), ofInterest = "MultHomog", constrain = "MultHomog.*1", family = poisson, data = occupationalStatus) prof <- profile(RChomog, trace = TRUE) plot(prof) ## confint similar to MLE +/- 1.96*s.e. confint(prof) coefData <- se(RChomog) cbind(coefData[1] - 1.96 * coefData[2], coefData[1] + 1.96 * coefData[2]) ## Another near quadratic example, with more complex constraints count <- with(voting, percentage/100 * total) yvar <- cbind(count, voting$total - count) classMobility <- gnm(yvar ~ -1 + Dref(origin, destination), family = binomial, data = voting) wts <- prop.table(exp(coef(classMobility))[1:2]) classMobility <- update(classMobility, constrain = "delta1", constrainTo = log(wts[1])) sum(exp(parameters(classMobility))[1:2]) #=1 prof <- profile(classMobility, trace = TRUE) plot(prof) ## confint similar to MLE +/- 1.96*s.e. confint(prof, trace = TRUE) coefData <- se(classMobility) cbind(coefData[1] - 1.96 * coefData[2], coefData[1] + 1.96 * coefData[2]) ### An example showing asymptotic deviance unidiff <- gnm(Freq ~ educ*orig + educ*dest + Mult(Exp(educ), orig:dest), ofInterest = "[.]educ", constrain = "[.]educ1", family = poisson, data = yaish, subset = (dest != 7)) prof <- profile(unidiff, trace = TRUE) plot(prof) ## clearly not quadratic for Mult1.Factor1.educ4 or Mult1.Factor1.educ5! confint(prof) ## 2.5 % 97.5 % ## Mult(Exp(.), orig:dest).educ1 NA NA ## Mult(Exp(.), orig:dest).educ2 -0.5978901 0.1022447 ## Mult(Exp(.), orig:dest).educ3 -1.4836854 -0.2362378 ## Mult(Exp(.), orig:dest).educ4 -2.5792398 -0.2953420 ## Mult(Exp(.), orig:dest).educ5 -Inf -0.7006889 coefData <- se(unidiff) cbind(coefData[1] - 1.96 * coefData[2], coefData[1] + 1.96 * coefData[2]) ### A far from quadratic example, also with eliminated parameters backPainLong <- expandCategorical(backPain, "pain") oneDimensional <- gnm(count ~ pain + Mult(pain, x1 + x2 + x3), eliminate = id, family = "poisson", constrain = "[.](painworse|x1)", constrainTo = c(0, 1), data = backPainLong) prof <- profile(oneDimensional, trace = TRUE) plot(prof) ## not quadratic for any non-eliminated parameter confint(prof) coefData <- se(oneDimensional) cbind(coefData[1] - 1.96 * coefData[2], coefData[1] + 1.96 * coefData[2]) ## End(Not run)
set.seed(1) ### Example in which deviance is near quadratic count <- with(voting, percentage/100 * total) yvar <- cbind(count, voting$total - count) classMobility <- gnm(yvar ~ -1 + Dref(origin, destination), constrain = "delta1", family = binomial, data = voting) prof <- profile(classMobility, trace = TRUE) plot(prof) ## confint similar to MLE +/- 1.96*s.e. confint(prof, trace = TRUE) coefData <- se(classMobility) cbind(coefData[1] - 1.96 * coefData[2], coefData[1] + 1.96 * coefData[2]) ## Not run: ### These examples take longer to run ### Another near quadratic example RChomog <- gnm(Freq ~ origin + destination + Diag(origin, destination) + MultHomog(origin, destination), ofInterest = "MultHomog", constrain = "MultHomog.*1", family = poisson, data = occupationalStatus) prof <- profile(RChomog, trace = TRUE) plot(prof) ## confint similar to MLE +/- 1.96*s.e. confint(prof) coefData <- se(RChomog) cbind(coefData[1] - 1.96 * coefData[2], coefData[1] + 1.96 * coefData[2]) ## Another near quadratic example, with more complex constraints count <- with(voting, percentage/100 * total) yvar <- cbind(count, voting$total - count) classMobility <- gnm(yvar ~ -1 + Dref(origin, destination), family = binomial, data = voting) wts <- prop.table(exp(coef(classMobility))[1:2]) classMobility <- update(classMobility, constrain = "delta1", constrainTo = log(wts[1])) sum(exp(parameters(classMobility))[1:2]) #=1 prof <- profile(classMobility, trace = TRUE) plot(prof) ## confint similar to MLE +/- 1.96*s.e. confint(prof, trace = TRUE) coefData <- se(classMobility) cbind(coefData[1] - 1.96 * coefData[2], coefData[1] + 1.96 * coefData[2]) ### An example showing asymptotic deviance unidiff <- gnm(Freq ~ educ*orig + educ*dest + Mult(Exp(educ), orig:dest), ofInterest = "[.]educ", constrain = "[.]educ1", family = poisson, data = yaish, subset = (dest != 7)) prof <- profile(unidiff, trace = TRUE) plot(prof) ## clearly not quadratic for Mult1.Factor1.educ4 or Mult1.Factor1.educ5! confint(prof) ## 2.5 % 97.5 % ## Mult(Exp(.), orig:dest).educ1 NA NA ## Mult(Exp(.), orig:dest).educ2 -0.5978901 0.1022447 ## Mult(Exp(.), orig:dest).educ3 -1.4836854 -0.2362378 ## Mult(Exp(.), orig:dest).educ4 -2.5792398 -0.2953420 ## Mult(Exp(.), orig:dest).educ5 -Inf -0.7006889 coefData <- se(unidiff) cbind(coefData[1] - 1.96 * coefData[2], coefData[1] + 1.96 * coefData[2]) ### A far from quadratic example, also with eliminated parameters backPainLong <- expandCategorical(backPain, "pain") oneDimensional <- gnm(count ~ pain + Mult(pain, x1 + x2 + x3), eliminate = id, family = "poisson", constrain = "[.](painworse|x1)", constrainTo = c(0, 1), data = backPainLong) prof <- profile(oneDimensional, trace = TRUE) plot(prof) ## not quadratic for any non-eliminated parameter confint(prof) coefData <- se(oneDimensional) cbind(coefData[1] - 1.96 * coefData[2], coefData[1] + 1.96 * coefData[2]) ## End(Not run)
This function uses the first d
components of the singular value
decomposition in order to approximate a vector of model residuals by a
sum of d
multiplicative terms, with the multiplicative
structure determined by two specified factors.
residSVD(model, fac1, fac2, d = 1)
residSVD(model, fac1, fac2, d = 1)
model |
a model object with |
fac1 |
a factor |
fac2 |
a factor |
d |
integer, the number of multiplicative terms to use in the approximation |
This function operates on the matrix of mean residuals, with rows
indexed by fac1
and columns indexed by fac2
. For
glm
and glm
models, the matrix entries are weighted
working residuals. The primary use of residSVD
is to
generate good starting values for the parameters in Mult
terms
in models to be fitted using gnm
.
If d = 1
, a numeric vector; otherwise a numeric
matrix with d
columns.
David Firth and Heather Turner
set.seed(1) ## Goodman RC1 association model fits well (deviance 3.57, df 8) mentalHealth$MHS <- C(mentalHealth$MHS, treatment) mentalHealth$SES <- C(mentalHealth$SES, treatment) ## independence model indep <- gnm(count ~ SES + MHS, family = poisson, data = mentalHealth) mult1 <- residSVD(indep, SES, MHS) ## Now use mult1 as starting values for the RC1 association parameters RC1model <- update(indep, . ~ . + Mult(SES, MHS), start = c(coef(indep), mult1), trace = TRUE) ## Similarly for the RC2 model: mult2 <- residSVD(indep, SES, MHS, d = 2) RC2model <- update(indep, . ~ . + instances(Mult(SES, MHS), 2), start = c(coef(indep), mult2), trace = TRUE) ## ## See also example(House2001), where good starting values matter much more! ##
set.seed(1) ## Goodman RC1 association model fits well (deviance 3.57, df 8) mentalHealth$MHS <- C(mentalHealth$MHS, treatment) mentalHealth$SES <- C(mentalHealth$SES, treatment) ## independence model indep <- gnm(count ~ SES + MHS, family = poisson, data = mentalHealth) mult1 <- residSVD(indep, SES, MHS) ## Now use mult1 as starting values for the RC1 association parameters RC1model <- update(indep, . ~ . + Mult(SES, MHS), start = c(coef(indep), mult1), trace = TRUE) ## Similarly for the RC2 model: mult2 <- residSVD(indep, SES, MHS, d = 2) RC2model <- update(indep, . ~ . + instances(Mult(SES, MHS), 2), start = c(coef(indep), mult2), trace = TRUE) ## ## See also example(House2001), where good starting values matter much more! ##
Computes approximate standard errors for (a selection of) individual
parameters or one or more linear combinations of the parameters in a
gnm
(generalized nonlinear model) object. By default, a
check is made first on the estimability of each specified combination.
## S3 method for class 'gnm' se(object, estimate = NULL, checkEstimability = TRUE, Vcov = NULL, dispersion = NULL, ...)
## S3 method for class 'gnm' se(object, estimate = NULL, checkEstimability = TRUE, Vcov = NULL, dispersion = NULL, ...)
object |
a model object of class |
estimate |
(optional) specifies parameters or linear
combinations of parameters for which to find standard errors. In the
first case either a character vector of names, a
numeric vector of indices or |
checkEstimability |
logical: should the estimability of all specified combinations be checked? |
Vcov |
either NULL, or a matrix |
dispersion |
either NULL, or a positive number |
... |
possible further arguments for
|
A data frame with two columns:
Estimate |
The estimated parameter combinations |
Std. Error |
Their estimated standard errors |
If available, the column names of coefMatrix
will be used to name
the rows.
In the case where estimate
is a numeric vector, se
will
assume that indices have been specified if all the values of
estimate
are in seq(length(coef(object))
.
Where both Vcov
and dispersion
are supplied, the
variance-covariance matrix of estimated model coefficients is taken to
be Vcov * dispersion
.
David Firth and Heather Turner
gnm
, getContrasts
,
checkEstimable
, ofInterest
set.seed(1) ## Fit the "UNIDIFF" mobility model across education levels unidiff <- gnm(Freq ~ educ*orig + educ*dest + Mult(Exp(educ), orig:dest), ofInterest = "[.]educ", family = poisson, data = yaish, subset = (dest != 7)) ## Deviance is 200.3 ## Get estimate and se for the contrast between educ4 and educ5 in the ## UNIDIFF multiplier mycontrast <- numeric(length(coef(unidiff))) mycontrast[ofInterest(unidiff)[4:5]] <- c(1, -1) se(unidiff, mycontrast) ## Get all of the contrasts with educ5 in the UNIDIFF multipliers getContrasts(unidiff, rev(ofInterest(unidiff)))
set.seed(1) ## Fit the "UNIDIFF" mobility model across education levels unidiff <- gnm(Freq ~ educ*orig + educ*dest + Mult(Exp(educ), orig:dest), ofInterest = "[.]educ", family = poisson, data = yaish, subset = (dest != 7)) ## Deviance is 200.3 ## Get estimate and se for the contrast between educ4 and educ5 in the ## UNIDIFF multiplier mycontrast <- numeric(length(coef(unidiff))) mycontrast[ofInterest(unidiff)[4:5]] <- c(1, -1) se(unidiff, mycontrast) ## Get all of the contrasts with educ5 in the UNIDIFF multipliers getContrasts(unidiff, rev(ofInterest(unidiff)))
summary
method for objects of class "gnm"
## S3 method for class 'gnm' summary(object, dispersion = NULL, correlation = FALSE, symbolic.cor = FALSE, with.eliminate = FALSE, ...) ## S3 method for class 'summary.gnm' print(x, digits = max(3, getOption("digits") - 3), signif.stars = getOption("show.signif.stars"), symbolic.cor = x$symbolic.cor, ...)
## S3 method for class 'gnm' summary(object, dispersion = NULL, correlation = FALSE, symbolic.cor = FALSE, with.eliminate = FALSE, ...) ## S3 method for class 'summary.gnm' print(x, digits = max(3, getOption("digits") - 3), signif.stars = getOption("show.signif.stars"), symbolic.cor = x$symbolic.cor, ...)
object |
an object of class |
x |
an object of class |
dispersion |
the dispersion parameter for the fitting family. By
default it is obtained from |
correlation |
logical: if |
digits |
the number of significant digits to use when printing. |
symbolic.cor |
logical: if |
signif.stars |
logical. If |
with.eliminate |
Logical. If |
... |
further arguments passed to or from other methods. |
print.summary.gnm
prints the original call to gnm
; a
summary of the deviance residuals from the model fit; the coefficients
of the model; the residual deviance; the Akaike's Information
Criterion value, and the number of main iterations performed.
Standard errors, z-values and p-values are printed alongside the
coefficients, with "significance stars" if signif.stars
is
TRUE
.
When the "summary.gnm"
object has a "correlation"
component, the lower triangle of this matrix is also printed, to two
decimal places (or symbolically); to see the full matrix of
correlations print summary(object, correlation =
TRUE)$correlation
directly.
The standard errors returned by summary.gnm
are scaled by
sqrt(dispersion)
. If the dispersion is not specified, it is
taken as 1
for the binomial
and Poisson
families,
and otherwise estimated by the residual Chi-squared statistic divided
by the residual degrees of freedom. For coefficients that have been
constrained or are not estimable, the standard error is returned as
NA
.
summary.gnm
returns an object of class "summary.gnm"
,
which is a list with components
call |
the |
ofInterest |
the |
family |
the |
deviance |
the |
aic |
the |
df.residual |
the |
iter |
the |
deviance.resid |
the deviance residuals, see
|
coefficients |
the matrix of coefficients, standard errors, z-values and p-values. |
elim.coefs |
if |
dispersion |
either the supplied argument or the estimated dispersion if
the latter is |
df |
a 3-vector of the rank of the model; the number of residual degrees of freedom, and number of unconstrained coefficients. |
cov.scaled |
the estimated covariance matrix scaled by
|
correlation |
(only if |
symbolic.cor |
(only if |
The gnm
class includes generalized linear models, and it
should be noted that summary.gnm
differs from
summary.glm
in that it does not omit coefficients which
are NA
from the objects it returns. (Such coefficients are
NA
since they have been fixed at 0
either by use of the
constrain
argument to gnm
or by a convention to handle
linear aliasing).
Modification of summary.glm
by the R Core Team. Adapted
for "gnm"
objects by Heather Turner.
### First example from ?Dref set.seed(1) ## reconstruct counts voting Labour/non-Labour count <- with(voting, percentage/100 * total) yvar <- cbind(count, voting$total - count) ## fit diagonal reference model with constant weights classMobility <- gnm(yvar ~ -1 + Dref(origin, destination), family = binomial, data = voting) ## summarize results - note diagonal weights are over-parameterised summary(classMobility) ## refit setting first weight to zero (as DrefWeights() does) classMobility <- gnm(yvar ~ -1 + Dref(origin, destination), family = binomial, data = voting, constrain = "delta1") summary(classMobility)
### First example from ?Dref set.seed(1) ## reconstruct counts voting Labour/non-Labour count <- with(voting, percentage/100 * total) yvar <- cbind(count, voting$total - count) ## fit diagonal reference model with constant weights classMobility <- gnm(yvar ~ -1 + Dref(origin, destination), family = binomial, data = voting) ## summarize results - note diagonal weights are over-parameterised summary(classMobility) ## refit setting first weight to zero (as DrefWeights() does) classMobility <- gnm(yvar ~ -1 + Dref(origin, destination), family = binomial, data = voting, constrain = "delta1") summary(classMobility)
Symm
codes the symmetric interaction of factors having
the same set of levels, for use in regression models of symmetry or
quasi-symmetry.
Symm(..., separator = ":")
Symm(..., separator = ":")
... |
one or more factors. |
separator |
a character string of length 1 or more, to be used in naming the levels of the resulting interaction factor. |
A factor whose levels index the symmetric interaction of all factors supplied as input.
David Firth and Heather Turner
# square table rowfac <- gl(4, 4, 16) colfac <- gl(4, 1, 16) symm4by4 <- Symm(rowfac, colfac) matrix(symm4by4, 4, 4) # 3 x 3 x 3 table ind <- expand.grid(A = 1:3, B = 1:3, C = 1:3) symm3cubed <- with(ind, Symm(A, B, C)) array(symm3cubed, c(3, 3, 3))
# square table rowfac <- gl(4, 4, 16) colfac <- gl(4, 1, 16) symm4by4 <- Symm(rowfac, colfac) matrix(symm4by4, 4, 4) # 3 x 3 x 3 table ind <- expand.grid(A = 1:3, B = 1:3, C = 1:3) symm3cubed <- with(ind, Symm(A, B, C)) array(symm3cubed, c(3, 3, 3))
termPredictors
is a generic function which extracts the
contribution of each term to the predictor from a fitted model object.
termPredictors(object, ...)
termPredictors(object, ...)
object |
a fitted model object. |
... |
additional arguments for method functions. |
The default method assumes that the predictor is linear and calculates
the contribution of each term from the model matrix and fitted
coefficients. A method is also available for gnm
objects.
A matrix with the additive components of the predictor in labelled columns.
Heather Turner
## Linear model G <- gl(4, 6) x <- 1:24 y <- rnorm(24, 0, 1) lmGx <- lm(y ~ G + x) contrib <- termPredictors(lmGx) contrib all.equal(as.numeric(rowSums(contrib)), as.numeric(lmGx$fitted)) #TRUE ## Generalized linear model y <- cbind(rbinom(24, 10, 0.5), rep(10, 24)) glmGx <- glm(y ~ G + x, family = binomial) contrib <- termPredictors(glmGx) contrib all.equal(as.numeric(rowSums(contrib)), as.numeric(glmGx$linear.predictors)) #TRUE ## Generalized nonlinear model A <- gl(4, 6) B <- gl(6, 1, 24) y <- cbind(rbinom(24, 10, 0.5), rep(10, 24)) set.seed(1) gnmAB <- gnm(y ~ A + B + Mult(A, B), family = binomial) contrib <- termPredictors(gnmAB) contrib all.equal(as.numeric(rowSums(contrib)), as.numeric(gnmAB$predictors)) #TRUE
## Linear model G <- gl(4, 6) x <- 1:24 y <- rnorm(24, 0, 1) lmGx <- lm(y ~ G + x) contrib <- termPredictors(lmGx) contrib all.equal(as.numeric(rowSums(contrib)), as.numeric(lmGx$fitted)) #TRUE ## Generalized linear model y <- cbind(rbinom(24, 10, 0.5), rep(10, 24)) glmGx <- glm(y ~ G + x, family = binomial) contrib <- termPredictors(glmGx) contrib all.equal(as.numeric(rowSums(contrib)), as.numeric(glmGx$linear.predictors)) #TRUE ## Generalized nonlinear model A <- gl(4, 6) B <- gl(6, 1, 24) y <- cbind(rbinom(24, 10, 0.5), rep(10, 24)) set.seed(1) gnmAB <- gnm(y ~ A + B + Mult(A, B), family = binomial) contrib <- termPredictors(gnmAB) contrib all.equal(as.numeric(rowSums(contrib)), as.numeric(gnmAB$predictors)) #TRUE
Given two or more factors Topo
creates an interaction factor
as specified by an array of levels, which may be arbitrarily
structured.
Topo(..., spec = NULL)
Topo(..., spec = NULL)
... |
two or more factors |
spec |
an array of levels, with dimensions corresponding to the number of levels of each factor in the interaction |
A factor of levels extracted from the levels array given in
spec
, using the given factors as index variables.
David Firth
Erikson, R., Goldthorpe, J. H. and Portocarero, L. (1982) Social Fluidity in Industrial Nations: England, France and Sweden. Brit. J. Sociol. 33(1), 1-34.
Xie, Y. (1992) The Log-multiplicative Layer Effect Model for Comparing Mobility Tables. Am. Sociol. Rev. 57(3), 380-395.
Symm
and Diag
for special cases
set.seed(1) ### Collapse to 7 by 7 table as in Erikson (1982) erikson <- as.data.frame(erikson) lvl <- levels(erikson$origin) levels(erikson$origin) <- levels(erikson$destination) <- c(rep(paste(lvl[1:2], collapse = " + "), 2), lvl[3], rep(paste(lvl[4:5], collapse = " + "), 2), lvl[6:9]) erikson <- xtabs(Freq ~ origin + destination + country, data = erikson) ### Create array of interaction levels as in Table 2 of Xie (1992) levelMatrix <- matrix(c(2, 3, 4, 6, 5, 6, 6, 3, 3, 4, 6, 4, 5, 6, 4, 4, 2, 5, 5, 5, 5, 6, 6, 5, 1, 6, 5, 2, 4, 4, 5, 6, 3, 4, 5, 5, 4, 5, 5, 3, 3, 5, 6, 6, 5, 3, 5, 4, 1), 7, 7, byrow = TRUE) ### Fit the levels models given in Table 3 of Xie (1992) ## Null association between origin and destination nullModel <- gnm(Freq ~ country:origin + country:destination, family = poisson, data = erikson) ## Interaction specified by levelMatrix, common to all countries commonTopo <- update(nullModel, ~ . + Topo(origin, destination, spec = levelMatrix)) ## Interaction specified by levelMatrix, different multiplier for ## each country multTopo <- update(nullModel, ~ . + Mult(Exp(country), Topo(origin, destination, spec = levelMatrix))) ## Interaction specified by levelMatrix, different effects for ## each country separateTopo <- update(nullModel, ~ . + country:Topo(origin, destination, spec = levelMatrix))
set.seed(1) ### Collapse to 7 by 7 table as in Erikson (1982) erikson <- as.data.frame(erikson) lvl <- levels(erikson$origin) levels(erikson$origin) <- levels(erikson$destination) <- c(rep(paste(lvl[1:2], collapse = " + "), 2), lvl[3], rep(paste(lvl[4:5], collapse = " + "), 2), lvl[6:9]) erikson <- xtabs(Freq ~ origin + destination + country, data = erikson) ### Create array of interaction levels as in Table 2 of Xie (1992) levelMatrix <- matrix(c(2, 3, 4, 6, 5, 6, 6, 3, 3, 4, 6, 4, 5, 6, 4, 4, 2, 5, 5, 5, 5, 6, 6, 5, 1, 6, 5, 2, 4, 4, 5, 6, 3, 4, 5, 5, 4, 5, 5, 3, 3, 5, 6, 6, 5, 3, 5, 4, 1), 7, 7, byrow = TRUE) ### Fit the levels models given in Table 3 of Xie (1992) ## Null association between origin and destination nullModel <- gnm(Freq ~ country:origin + country:destination, family = poisson, data = erikson) ## Interaction specified by levelMatrix, common to all countries commonTopo <- update(nullModel, ~ . + Topo(origin, destination, spec = levelMatrix)) ## Interaction specified by levelMatrix, different multiplier for ## each country multTopo <- update(nullModel, ~ . + Mult(Exp(country), Topo(origin, destination, spec = levelMatrix))) ## Interaction specified by levelMatrix, different effects for ## each country separateTopo <- update(nullModel, ~ . + country:Topo(origin, destination, spec = levelMatrix))
This method extracts or computes a variance-covariance matrix for use in approximate inference on estimable parameter combinations in a generalized nonlinear model.
## S3 method for class 'gnm' vcov(object, dispersion = NULL, with.eliminate = FALSE, ...)
## S3 method for class 'gnm' vcov(object, dispersion = NULL, with.eliminate = FALSE, ...)
object |
a model object of class |
dispersion |
the dispersion parameter for the fitting family. By
default it is obtained from |
with.eliminate |
logical; should parts of the variance-covariance matrix corresponding to eliminated coefficients be computed? |
... |
as for |
The resultant matrix does not itself necessarily
contain variances and covariances, since gnm
typically works
with over-parameterized model representations in which parameters are
not all identified. Rather, the resultant matrix is to be used as
the kernel of quadratic forms which are the variances or
covariances for estimable parameter combinations.
The matrix values are scaled by dispersion
. If the dispersion
is not specified, it is taken as 1
for the binomial
and
Poisson
families, and otherwise estimated by the residual
Chi-squared statistic divided by the residual degrees of freedom. The
dispersion used is returned as an attribute of the matrix.
The dimensions of the matrix correspond to the non-eliminated
coefficients of the "gnm"
object. If use.eliminate =
TRUE
then setting can sometimes give appreciable
speed gains; see gnm
for details of the eliminate
mechanism. The use.eliminate
argument is currently ignored if the
model has full rank.
A matrix with number of rows/columns equal to
length(coef(object))
. If there are eliminated coefficients and
use.eliminate = TRUE
, the matrix will have the following
attributes:
covElim |
a matrix of covariances between the eliminated and non-eliminated parameters. |
varElim |
a vector of variances corresponding to the eliminated parameters. |
The gnm
class includes generalized linear models, and it
should be noted that the
behaviour of vcov.gnm
differs from that of
vcov.glm
whenever any(is.na(coef(object)))
is
TRUE
. Whereas vcov.glm
drops all rows and columns which
correspond to NA
values in coef(object)
, vcov.gnm
keeps those columns (which are full of zeros, since the NA
represents a parameter which is fixed either by use of the
constrain
argument to gnm
or by a convention to handle
linear aliasing).
David Firth
Turner, H and Firth, D (2005). Generalized nonlinear models in R: An overview of the gnm package. At https://cran.r-project.org
set.seed(1) ## Fit the "UNIDIFF" mobility model across education levels unidiff <- gnm(Freq ~ educ*orig + educ*dest + Mult(Exp(educ), orig:dest), family = poisson, data = yaish, subset = (dest != 7)) ## Examine the education multipliers (differences on the log scale): ind <- pickCoef(unidiff, "[.]educ") educMultipliers <- getContrasts(unidiff, rev(ind)) ## Now get the same standard errors using a suitable set of ## quadratic forms, by calling vcov() directly: cmat <- contr.sum(ind) sterrs <- sqrt(diag(t(cmat) %*% vcov(unidiff)[ind, ind] %*% cmat)) all(sterrs == (educMultipliers$SE)[-1]) ## TRUE
set.seed(1) ## Fit the "UNIDIFF" mobility model across education levels unidiff <- gnm(Freq ~ educ*orig + educ*dest + Mult(Exp(educ), orig:dest), family = poisson, data = yaish, subset = (dest != 7)) ## Examine the education multipliers (differences on the log scale): ind <- pickCoef(unidiff, "[.]educ") educMultipliers <- getContrasts(unidiff, rev(ind)) ## Now get the same standard errors using a suitable set of ## quadratic forms, by calling vcov() directly: cmat <- contr.sum(ind) sterrs <- sqrt(diag(t(cmat) %*% vcov(unidiff)[ind, ind] %*% cmat)) all(sterrs == (educMultipliers$SE)[-1]) ## TRUE
Voting data from the 1987 British general election, cross-classified by the class of the head of household and the class of their father.
voting
voting
A data frame with 25 observations on the following 4 variables.
percentage
the percentage of the cell voting Labour.
total
the cell count.
origin
a factor describing the father's class with
levels 1:5
.
destination
a factor describing the head of
household's class with levels 1:5
.
Clifford, P. and Heath, A. F. (1993) The Political Consequences of Social Mobility. J. Roy. Stat. Soc. A, 156(1), 51-61.
### Examples from Clifford and Heath paper ### (Results differ slightly - possible transcription error in ### published data?) set.seed(1) ## reconstruct counts voting Labour/non-Labour count <- with(voting, percentage/100 * total) yvar <- cbind(count, voting$total - count) ## fit diagonal reference model with constant weights classMobility <- gnm(yvar ~ -1 + Dref(origin, destination), family = binomial, data = voting) DrefWeights(classMobility) ## create factors indicating movement in and out of salariat (class 1) upward <- with(voting, origin != 1 & destination == 1) downward <- with(voting, origin == 1 & destination != 1) ## fit separate weights for the "socially mobile" groups socialMobility <- gnm(yvar ~ -1 + Dref(origin, destination, delta = ~ 1 + downward + upward), family = binomial, data = voting) DrefWeights(socialMobility) ## fit separate weights for downwardly mobile groups only downwardMobility <- gnm(yvar ~ -1 + Dref(origin, destination, delta = ~ 1 + downward), family = binomial, data = voting) DrefWeights(downwardMobility)
### Examples from Clifford and Heath paper ### (Results differ slightly - possible transcription error in ### published data?) set.seed(1) ## reconstruct counts voting Labour/non-Labour count <- with(voting, percentage/100 * total) yvar <- cbind(count, voting$total - count) ## fit diagonal reference model with constant weights classMobility <- gnm(yvar ~ -1 + Dref(origin, destination), family = binomial, data = voting) DrefWeights(classMobility) ## create factors indicating movement in and out of salariat (class 1) upward <- with(voting, origin != 1 & destination == 1) downward <- with(voting, origin == 1 & destination != 1) ## fit separate weights for the "socially mobile" groups socialMobility <- gnm(yvar ~ -1 + Dref(origin, destination, delta = ~ 1 + downward + upward), family = binomial, data = voting) DrefWeights(socialMobility) ## fit separate weights for downwardly mobile groups only downwardMobility <- gnm(yvar ~ -1 + Dref(origin, destination, delta = ~ 1 + downward), family = binomial, data = voting) DrefWeights(downwardMobility)
Creates a family
object for use with glm
,
gnm
, etc., for the variance function
introduced by Wedderburn (1974) for response values in
[0,1].
wedderburn(link = "logit")
wedderburn(link = "logit")
link |
The name of a link function. Allowed are "logit", "probit" and "cloglog". |
An object of class family
.
The reported deviance involves an arbitrary constant (see McCullagh and Nelder, 1989, p330); for estimating dispersion, use the Pearson chi-squared statistic instead.
Modification of binomial
by the R Core Team. Adapted
for the Wedderburn quasi-likelihood family by David Firth.
Gabriel, K R (1998). Generalised bilinear regression. Biometrika 85, 689–700.
McCullagh, P and Nelder, J A (1989). Generalized Linear Models (2nd ed). Chapman and Hall.
Wedderburn, R W M (1974). Quasilikelihood functions, generalized linear models and the Gauss-Newton method. Biometrika 61, 439–47.
set.seed(1) ### Use data from Wedderburn (1974), see ?barley ### Fit Wedderburn's logit model with variance proportional to the ### square of mu(1-mu) logitModel <- glm(y ~ site + variety, family = wedderburn, data = barley) fit <- fitted(logitModel) print(sum((barley$y - fit)^2 / (fit * (1-fit))^2)) ## Agrees with the chi-squared value reported in McCullagh and Nelder ## (1989, p331), which differs slightly from Wedderburn's reported value. ### Fit the biplot model as in Gabriel (1998, p694) biplotModel <- gnm(y ~ -1 + instances(Mult(site, variety), 2), family = wedderburn, data = barley) barleySVD <- svd(matrix(biplotModel$predictors, 10, 9)) A <- sweep(barleySVD$v, 2, sqrt(barleySVD$d), "*")[, 1:2] B <- sweep(barleySVD$u, 2, sqrt(barleySVD$d), "*")[, 1:2] ## These are essentially A and B as in Gabriel (1998, p694), from which ## the biplot is made by plot(rbind(A, B), pch = c(LETTERS[1:9], as.character(1:9), "X")) ### Fit the double-additive model as in Gabriel (1998, p697) variety.binary <- factor(match(barley$variety, c(2,3,6), nomatch = 0) > 0, labels = c("Rest", "2,3,6")) doubleAdditive <- gnm(y ~ variety + Mult(site, variety.binary), family = wedderburn, data = barley)
set.seed(1) ### Use data from Wedderburn (1974), see ?barley ### Fit Wedderburn's logit model with variance proportional to the ### square of mu(1-mu) logitModel <- glm(y ~ site + variety, family = wedderburn, data = barley) fit <- fitted(logitModel) print(sum((barley$y - fit)^2 / (fit * (1-fit))^2)) ## Agrees with the chi-squared value reported in McCullagh and Nelder ## (1989, p331), which differs slightly from Wedderburn's reported value. ### Fit the biplot model as in Gabriel (1998, p694) biplotModel <- gnm(y ~ -1 + instances(Mult(site, variety), 2), family = wedderburn, data = barley) barleySVD <- svd(matrix(biplotModel$predictors, 10, 9)) A <- sweep(barleySVD$v, 2, sqrt(barleySVD$d), "*")[, 1:2] B <- sweep(barleySVD$u, 2, sqrt(barleySVD$d), "*")[, 1:2] ## These are essentially A and B as in Gabriel (1998, p694), from which ## the biplot is made by plot(rbind(A, B), pch = c(LETTERS[1:9], as.character(1:9), "X")) ### Fit the double-additive model as in Gabriel (1998, p697) variety.binary <- factor(match(barley$variety, c(2,3,6), nomatch = 0) > 0, labels = c("Rest", "2,3,6")) doubleAdditive <- gnm(y ~ variety + Mult(site, variety.binary), family = wedderburn, data = barley)
Data from a 10-year experiment at the CIMMYT experimental station located in the Yaqui Valley near Ciudad Obregon, Sonora, Mexico — factorial design using 24 treatments in all. In each of the 10 years the experiment was arranged in a randomized complete block design with three replicates.
wheat
wheat
A data frame with 240 observations on the following 33 variables.
numeric, mean yield in kg/ha for 3 replicates
a factor with levels 1988:1997
a factor with levels T
t
a factor with levels S
s
a factor with levels M
m
a factor with levels 0
N
n
numeric, mean max temp sheltered (deg C) in December
same for January
same for February
same for March
same for April
numeric, mean min temp sheltered (deg C) in December
same for January
same for February
same for March
same for April
numeric, mean min temp unsheltered (deg C)in December
same for January
same for February
same for March
same for April
numeric, total precipitation (mm) in December
same for January
same for February
same for March
numeric, mean sun hours in December
same for January
same for February
numeric, total evaporation (mm) in December
same for January
same for February
same for March
same for April
Tables A1 and A3 of Vargas, M, Crossa, J, van Eeuwijk, F, Sayre, K D and Reynolds, M P (2001). Interpreting treatment by environment interaction in agronomy trials. Agronomy Journal 93, 949–960.
set.seed(1) ## Scale yields to reproduce analyses reported in Vargas et al (2001) yield.scaled <- wheat$yield * sqrt(3/1000) ## Reproduce (up to error caused by rounding) Table 1 of Vargas et al (2001) aov(yield.scaled ~ year*tillage*summerCrop*manure*N, data = wheat) treatment <- interaction(wheat$tillage, wheat$summerCrop, wheat$manure, wheat$N, sep = "") mainEffects <- lm(yield.scaled ~ year + treatment, data = wheat) svdStart <- residSVD(mainEffects, year, treatment, 3) bilinear1 <- update(asGnm(mainEffects), . ~ . + Mult(year, treatment), start = c(coef(mainEffects), svdStart[,1])) bilinear2 <- update(bilinear1, . ~ . + Mult(year, treatment, inst = 2), start = c(coef(bilinear1), svdStart[,2])) bilinear3 <- update(bilinear2, . ~ . + Mult(year, treatment, inst = 3), start = c(coef(bilinear2), svdStart[,3])) anova(mainEffects, bilinear1, bilinear2, bilinear3) ## Examine the extent to which, say, mTF explains the first bilinear term bilinear1mTF <- gnm(yield.scaled ~ year + treatment + Mult(1 + mTF, treatment), family = gaussian, data = wheat) anova(mainEffects, bilinear1mTF, bilinear1) ## How to get the standard SVD representation of an AMMI-n model ## ## We'll work with the AMMI-2 model, which here is called "bilinear2" ## ## First, extract the contributions of the 5 terms in the model: ## wheat.terms <- termPredictors(bilinear2) ## ## That's a matrix, whose 4th and 5th columns are the interaction terms ## ## Combine those two interaction terms, to get the total estimated ## interaction effect: ## wheat.interaction <- wheat.terms[, 4] + wheat.terms[, 5] ## ## That's a vector, so we need to re-shape it as a 24 by 10 matrix ## ready for calculating the SVD: ## wheat.interaction <- matrix(wheat.interaction, 24, 10) ## ## Now we can compute the SVD: ## wheat.interaction.SVD <- svd(wheat.interaction) ## ## Only the first two singular values are nonzero, as expected ## (since this is an AMMI-2 model, the interaction has rank 2) ## ## So the result object can be simplified by re-calculating the SVD with ## just two dimensions: ## wheat.interaction.SVD <- svd(wheat.interaction, nu = 2, nv = 2)
set.seed(1) ## Scale yields to reproduce analyses reported in Vargas et al (2001) yield.scaled <- wheat$yield * sqrt(3/1000) ## Reproduce (up to error caused by rounding) Table 1 of Vargas et al (2001) aov(yield.scaled ~ year*tillage*summerCrop*manure*N, data = wheat) treatment <- interaction(wheat$tillage, wheat$summerCrop, wheat$manure, wheat$N, sep = "") mainEffects <- lm(yield.scaled ~ year + treatment, data = wheat) svdStart <- residSVD(mainEffects, year, treatment, 3) bilinear1 <- update(asGnm(mainEffects), . ~ . + Mult(year, treatment), start = c(coef(mainEffects), svdStart[,1])) bilinear2 <- update(bilinear1, . ~ . + Mult(year, treatment, inst = 2), start = c(coef(bilinear1), svdStart[,2])) bilinear3 <- update(bilinear2, . ~ . + Mult(year, treatment, inst = 3), start = c(coef(bilinear2), svdStart[,3])) anova(mainEffects, bilinear1, bilinear2, bilinear3) ## Examine the extent to which, say, mTF explains the first bilinear term bilinear1mTF <- gnm(yield.scaled ~ year + treatment + Mult(1 + mTF, treatment), family = gaussian, data = wheat) anova(mainEffects, bilinear1mTF, bilinear1) ## How to get the standard SVD representation of an AMMI-n model ## ## We'll work with the AMMI-2 model, which here is called "bilinear2" ## ## First, extract the contributions of the 5 terms in the model: ## wheat.terms <- termPredictors(bilinear2) ## ## That's a matrix, whose 4th and 5th columns are the interaction terms ## ## Combine those two interaction terms, to get the total estimated ## interaction effect: ## wheat.interaction <- wheat.terms[, 4] + wheat.terms[, 5] ## ## That's a vector, so we need to re-shape it as a 24 by 10 matrix ## ready for calculating the SVD: ## wheat.interaction <- matrix(wheat.interaction, 24, 10) ## ## Now we can compute the SVD: ## wheat.interaction.SVD <- svd(wheat.interaction) ## ## Only the first two singular values are nonzero, as expected ## (since this is an AMMI-2 model, the interaction has rank 2) ## ## So the result object can be simplified by re-calculating the SVD with ## just two dimensions: ## wheat.interaction.SVD <- svd(wheat.interaction, nu = 2, nv = 2)
A 3-way contingency table of father/son pairs, classified by father's
social class (orig
), son's social class (dest
) and son's
education level (educ
).
yaish
yaish
A table of counts, with classifying factors educ
(levels 1:5
),
orig
(levels 1:7
) and dest
(levels 1:7
).
David Firth
Originally in Yaish (1998), see also Yaish (2004, p316).
Yaish, M (1998). Opportunities, Little Change. Class Mobility in Israeli Society: 1974-1991. D.Phil. Thesis, Nuffield College, University of Oxford.
Yaish, M (2004). Class Mobility Trends in Israeli Society, 1974-1991. Lewiston: Edwin Mellen Press.
set.seed(1) ## Fit the "UNIDIFF" mobility model across education levels, leaving out ## the uninformative subtable for dest == 7: ## unidiff <- gnm(Freq ~ educ*orig + educ*dest + Mult(Exp(educ), orig:dest), family = poisson, data = yaish, subset = (dest != 7)) ## Deviance should be 200.3, 116 d.f. ## ## Look at the multipliers of the orig:dest association: ofInterest(unidiff) <- pickCoef(unidiff, "[.]educ") coef(unidiff) ## ## Coefficients of interest: ## Mult(Exp(.), orig:dest).educ1 Mult(Exp(.), orig:dest).educ2 ## -0.5513258 -0.7766976 ## Mult(Exp(.), orig:dest).educ3 Mult(Exp(.), orig:dest).educ4 ## -1.2947494 -1.5902644 ## Mult(Exp(.), orig:dest).educ5 ## -2.8008285 ## ## Get standard errors for the contrasts with educ1: ## getContrasts(unidiff, ofInterest(unidiff)) ## estimate SE quasiSE ## Mult(Exp(.), orig:dest).educ1 0.0000000 0.0000000 0.09757438 ## Mult(Exp(.), orig:dest).educ2 -0.2253718 0.1611874 0.12885847 ## Mult(Exp(.), orig:dest).educ3 -0.7434236 0.2335083 0.21182123 ## Mult(Exp(.), orig:dest).educ4 -1.0389386 0.3434256 0.32609380 ## Mult(Exp(.), orig:dest).educ5 -2.2495026 0.9453764 0.93560643 ## quasiVar ## Mult(Exp(.), orig:dest).educ1 0.00952076 ## Mult(Exp(.), orig:dest).educ2 0.01660450 ## Mult(Exp(.), orig:dest).educ3 0.04486823 ## Mult(Exp(.), orig:dest).educ4 0.10633716 ## Mult(Exp(.), orig:dest).educ5 0.87535940 ## ## Table of model residuals: ## residuals(unidiff)
set.seed(1) ## Fit the "UNIDIFF" mobility model across education levels, leaving out ## the uninformative subtable for dest == 7: ## unidiff <- gnm(Freq ~ educ*orig + educ*dest + Mult(Exp(educ), orig:dest), family = poisson, data = yaish, subset = (dest != 7)) ## Deviance should be 200.3, 116 d.f. ## ## Look at the multipliers of the orig:dest association: ofInterest(unidiff) <- pickCoef(unidiff, "[.]educ") coef(unidiff) ## ## Coefficients of interest: ## Mult(Exp(.), orig:dest).educ1 Mult(Exp(.), orig:dest).educ2 ## -0.5513258 -0.7766976 ## Mult(Exp(.), orig:dest).educ3 Mult(Exp(.), orig:dest).educ4 ## -1.2947494 -1.5902644 ## Mult(Exp(.), orig:dest).educ5 ## -2.8008285 ## ## Get standard errors for the contrasts with educ1: ## getContrasts(unidiff, ofInterest(unidiff)) ## estimate SE quasiSE ## Mult(Exp(.), orig:dest).educ1 0.0000000 0.0000000 0.09757438 ## Mult(Exp(.), orig:dest).educ2 -0.2253718 0.1611874 0.12885847 ## Mult(Exp(.), orig:dest).educ3 -0.7434236 0.2335083 0.21182123 ## Mult(Exp(.), orig:dest).educ4 -1.0389386 0.3434256 0.32609380 ## Mult(Exp(.), orig:dest).educ5 -2.2495026 0.9453764 0.93560643 ## quasiVar ## Mult(Exp(.), orig:dest).educ1 0.00952076 ## Mult(Exp(.), orig:dest).educ2 0.01660450 ## Mult(Exp(.), orig:dest).educ3 0.04486823 ## Mult(Exp(.), orig:dest).educ4 0.10633716 ## Mult(Exp(.), orig:dest).educ5 0.87535940 ## ## Table of model residuals: ## residuals(unidiff)