Spatial analyses represents an optimal tool for post-hoc correction of field trials information. Heterogeneous factors that were not full corrected by the plotting techniques (i.e.Β soil fertility, diseases, and water holding capacity) can cause some disturbance in the prediction of BLUP values and, at the end, in the selection of the genotypes.
Below, we implement a set of models that is known as spatial analyses. In addition to demonstrate each step of the analysis, we also cover several parameters of interest for estimation/prediction purposes.
The goodness-of-fit of a model to the data can be measured by different parameters. Below, we describe the most useful for spatial analyses. It is worth to mentioning that the spatial analysesβ objective is to select the most accurately and parsimoniously model model that captures the spatial effects in the field. This will provides the most acurate and precise estimation of variance components and prediction of the genotypes values (Isik et al.Β 2018)
The heritability represents how much of the phenotypic variation came from a genetic source. We can measure it based on the the variance components estimated by the model. The simplest representation of heritability (\(h^2\)) follows the equation:
\[ h^2 = \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_{res}} \]
where \(\sigma^2_g\) is represents the genotypic variance and \(\sigma^2_{res}\) represents the residual variance.
The prediction accuracy represents a measure of how close a prediction is to the real genotypic value of a genotype for a target trait. It can be estimated by:
\[ r_{\tilde{g}g} = cor(predicted.value,real.value) \]
One simple way to get an estimation of accuracy is using the square root of the heritability:
\[ r_{\tilde{g}g} = {\sqrt{h^2}} \]
In mixed models, one alternative is to estimated the heritability is to use the prediction error variance (PEV) from the prediction. It can be easily estimated by:
\[ PEV = std.error^2 \]
where the \(std.error\) represents the standard error from the genotypes predictions. After, we can use the following equation (Mrode, 2014):
\[ r_{\tilde{g}g} = \sqrt{1-\frac{PEV}{\sigma^2_g}} \]
The accuracy values can vary from 0 to 1. Values close to 1 represents the best accuracy from a prediction of a target trait.
The likelihood information test (LRT - Wilks, 1938) is indicated to compared models. However, the comparison take into account that the models share the same fixed effects and one model (reduced model) has a subset of the random effects of the other model (full model). The LRT formulae is given as follow:
\[ LRT = -2(logL_{full}-logL_{reduced}) \] where \(logL_{full}\) represents the log likelihood of the full model (i.e.Β all random effects), and \(logL_{reduced}\) represents the log likelihood of the reduced model (i.e.Β missing one random effect). The difference in-between models should be higher than 3.84 (\(\chi^2\) test, with 5% of probability and 1 degree of freedom). At this level (higher than 3.84) we consider that we have enough evidence to reject the null hypothesis, where the models are similar.
The most indicated parameters to measure the goodness-of-fit of one model to the data are Akaike Information Criteria (AIC - Akaike, 1974) and Schwarzβs Bayesian Information Criteria (BIC - Schwarz, 1978). They are indicated for models that are not nested and share the same fixed effects. The formulae definition for AIC and BIC are the following:
\[ AIC = -2*logL + 2t \]
\[ BIC = -2*logL + 2t*log(v) \]
where LogL represents the log likelihood of a model, \(t\) represents the number of covariance parameters in the model, \(v\) is the residual degrees of freedom, where \(v = n-p\), being \(n\) the number of observations and \(p\) the number of parameters in fixed effect factors. If the difference in-between two models is superior to 2, we can consider that the model with lower value of AIC/BIC is the best fit model to the data. In addition, if the difference is lower than 2, we should choose the most parsimoniously model, in other words, the model with lower number of parameters.
require(asreml)
## Online License checked out Thu Oct 26 11:57:57 2023
require(dplyr)
## Warning: package 'dplyr' was built under R version 4.2.3
rm(list=ls())
data = read.table("../data/data_spatial.txt", h=TRUE)
This dataset refers to an evaluation of 78 maize genotypes, in one location, and the trait measured was Yield.
Factor | Levels |
---|---|
Location | One |
Genotypes | 78 |
Checks | 6 |
Blocks | 3 |
Ranges | 18 |
Rows | 14 |
# Setting as factors
data$Row = data$Row %>% as.factor()
data$Range = data$Range %>% as.factor()
data$Block = data$Block %>% as.factor()
data$Genotype = data$Genotype %>% as.factor()
data$Check = data$Check %>% as.factor()
Exploring the data set by graphs
boxplot(Yield~Block,
data=data,
main="Trait performance over Blocks",
xlab="Block Number",
ylab="Yield (kg/ha)",
col="orange",
border="brown"
)
Exploring the data set by graphs
boxplot(Yield~Range,
data=data,
main="Trait performance over ranges",
xlab="Range Number",
ylab="Yield (kg/ha)",
col="orange",
border="brown"
)
Exploring the data set by graphs
boxplot(Yield~Row,
data=data,
main="Trait performance over rows",
xlab="Row number",
ylab="Yield (kg/ha)",
col="orange",
border="brown"
)
For the spatial analyses we list in table below the possible models for spatial analyses. We implemented each one of the models below. We used the before mentioned parameters to choose the best model (AIC, BIC, accuracy, heritability). The given model is as follow:
\[ y = Xb + Zg + e \]
where \(y\) is the vector of phenotypic data; \(b\) is the vector of block-check combination (assumed to be fixed), which comprises the effects of block and checks within the blocks; \(g\) is the vector of genotype effects (assumed to be random), where \(g~N(0,\sigma^2_g)\); and \(e\) is the vector of residuals (random), where \(e~N(0,\sigma^2_{res})\). The letters \(X\) and \(Z\) refers to the incidence matrices for \(b\) and \(g\), respectively.
Model | Action | Fitted model for residual |
---|---|---|
Mod1 | Only residual effect | \(\sigma^2_{res}I_rβI_c\) |
Mod2 | Includes AR1 for row | \(\sigma^2_{π}β_r(πΊ_r)βI_c\) |
Mod3 | Includes AR1 for range | \(\sigma^2_{π}I_rββ_c(πΊ_c)\) |
Mod4 | Includes AR1 for row/range | \(\sigma^2_{π}β_r(πΊ_r)ββ_c(πΊ_c)\) |
Mod5 | Includes AR1 for row/range and nugget effect | \(\sigma^2_{π}β_r(πΊ_r)ββ_c(πΊ_c), \sigma^2_πΌ\) |
# Toy model
fm = asreml(fixed = Yield ~ Block + at(Check):Block,
random = ~ Genotype,
na.action = na.method(x = "include"),
data = data)
## Model fitted using the gamma parameterization.
## ASReml 4.1.0 Thu Oct 26 11:57:58 2023
## LogLik Sigma2 DF wall cpu
## 1 -1666.765 759353.7 227 11:57:58 0.0
## 2 -1666.345 737700.9 227 11:57:58 0.0
## 3 -1666.026 712706.0 227 11:57:58 0.0
## 4 -1665.932 692796.8 227 11:57:58 0.0
## 5 -1665.931 691635.9 227 11:57:58 0.0
## Terms with zero df listed in attribute 'zerodf' of the wald table.
fm1 = asreml(fixed = Yield ~ Block + Check:Block,
random = ~ Genotype,
na.action = na.method(x = "include"),
data = data)
## Model fitted using the gamma parameterization.
## ASReml 4.1.0 Thu Oct 26 11:57:58 2023
## LogLik Sigma2 DF wall cpu
## 1 -1666.765 759353.7 227 11:57:58 0.0
## 2 -1666.345 737700.9 227 11:57:58 0.0
## 3 -1666.026 712706.0 227 11:57:58 0.0
## 4 -1665.932 692796.8 227 11:57:58 0.0
## 5 -1665.931 691635.9 227 11:57:58 0.0
summary(fm)$varcomp
## component std.error z.ratio bound %ch
## Genotype 157043.4 68856.25 2.280743 P 0
## units!R 691635.9 79886.29 8.657755 P 0
plot(fm)
# Model 1 Same model, but with structure
mod1 = asreml(fixed = Yield~Block+at(Check):Block,
random = ~Genotype,
na.action = na.method(x = "include"),
residual = ~id(Row):id(Range),
data = data)
## Model fitted using the gamma parameterization.
## ASReml 4.1.0 Thu Oct 26 11:57:59 2023
## LogLik Sigma2 DF wall cpu
## 1 -1666.765 759353.7 227 11:57:59 0.0
## 2 -1666.345 737700.9 227 11:57:59 0.0
## 3 -1666.026 712706.0 227 11:57:59 0.0
## 4 -1665.932 692796.8 227 11:57:59 0.0
## 5 -1665.931 691635.9 227 11:57:59 0.0
## Terms with zero df listed in attribute 'zerodf' of the wald table.
summary(mod1)$varcomp
## component std.error z.ratio bound %ch
## Genotype 157043.4 68856.25 2.280743 P 0
## Row:Range!R 691635.9 79886.29 8.657755 P 0
plot(varioGram(mod1))
Model outcomes
#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod1,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])
# variance components
summary(mod1)$varcomp
## component std.error z.ratio bound %ch
## Genotype 157043.4 68856.25 2.280743 P 0
## Row:Range!R 691635.9 79886.29 8.657755 P 0
# Traditional definition
(H2.mod1<-vpredict(mod1,H2~V1/(V1+V2)))
## Estimate SE
## H2 0.1850445 12047.14
# Model accuracy
# Estimation via heritability
(Acc.mod1 = sqrt(H2.mod1$Estimate[1]))
## [1] 0.430168
# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod1)$varcomp[1,1]
(Acc.PEV.mod1<-sqrt(1-mean.PEV/sigma2g))
## [1] 0.6286996
# Akaike information criterion
aic.mod1 = summary(mod1)$aic[1]
# Bayesian information criterion
bic.mod1 = summary(mod1)$bic[1]
# Model 2 - Trends in Rows
mod2 = asreml(fixed = Yield~Block+at(Check):Block,
random = ~Genotype,
na.action = na.method(x = "include"),
residual = ~ar1(Row):id(Range),
data = data)
## Model fitted using the gamma parameterization.
## ASReml 4.1.0 Thu Oct 26 11:57:59 2023
## LogLik Sigma2 DF wall cpu
## 1 -1663.337 743638.1 227 11:57:59 0.0
## 2 -1661.662 717589.6 227 11:57:59 0.0
## 3 -1660.231 694389.3 227 11:57:59 0.0
## 4 -1659.666 684226.9 227 11:57:59 0.0
## 5 -1659.648 684909.9 227 11:57:59 0.0
## 6 -1659.648 685030.1 227 11:57:59 0.0
## Terms with zero df listed in attribute 'zerodf' of the wald table.
plot(varioGram.asreml(mod2))
Model outcomes
#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod2,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])
# variance components
summary(mod2)$varcomp
## component std.error z.ratio bound %ch
## Genotype 1.781998e+05 6.653420e+04 2.678319 P 0.1
## Row:Range!R 6.850301e+05 8.125227e+04 8.430904 P 0.0
## Row:Range!Row!cor 3.016325e-01 7.659184e-02 3.938181 U 0.1
# Traditional definition
(H2.mod2<-vpredict(mod2,H2~V1/(V1+V2)))
## Estimate SE
## H2 0.2064338 13310.69
# Model accuracy
# Estimation via heritability
(Acc.mod2 = sqrt(H2.mod2$Estimate[1]))
## [1] 0.4543498
# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod2)$varcomp[1,1]
(Acc.PEV.mod2<-sqrt(1-mean.PEV/sigma2g))
## [1] 0.6797007
# Akaike information criterion
aic.mod2 = summary(mod2)$aic[1]
# Bayesian information criterion
bic.mod2 = summary(mod2)$bic[1]
# Model 3
mod3 = asreml(fixed = Yield~Block+at(Check):Block,
random = ~Genotype,
na.action = na.method(x = "include"),
residual = ~Row:ar1(Range),
data = data)
## Model fitted using the gamma parameterization.
## ASReml 4.1.0 Thu Oct 26 11:57:59 2023
## LogLik Sigma2 DF wall cpu
## 1 -1662.870 740546.6 227 11:57:59 0.0
## 2 -1660.921 715343.2 227 11:57:59 0.0
## 3 -1659.143 695081.4 227 11:57:59 0.0
## 4 -1658.444 692113.0 227 11:57:59 0.0
## 5 -1658.300 697117.2 227 11:57:59 0.0
## 6 -1658.293 699444.7 227 11:57:59 0.0
## 7 -1658.292 699926.4 227 11:57:59 0.0
## Terms with zero df listed in attribute 'zerodf' of the wald table.
plot(varioGram(mod3))
Model output
#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod3,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])
# variance components
summary(mod3)$varcomp
## component std.error z.ratio bound %ch
## Genotype 1.609634e+05 6.219464e+04 2.588059 P 0.0
## Row:Range!R 6.999264e+05 8.315938e+04 8.416686 P 0.0
## Row:Range!Range!cor 3.313993e-01 6.971045e-02 4.753941 U 0.1
# Traditional definition
(H2.mod3<-vpredict(mod3,H2~V1/(V1+V2)))
## Estimate SE
## H2 0.1869733 12641.45
# Model accuracy
# Estimation via heritability
(Acc.mod3 = sqrt(H2.mod3$Estimate[1]))
## [1] 0.4324041
# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod3)$varcomp[1,1]
(Acc.PEV.mod3<-sqrt(1-mean.PEV/sigma2g))
## [1] 0.66369
# Akaike information criterion
aic.mod3 = summary(mod3)$aic[1]
# Bayesian information criterion
bic.mod3 = summary(mod3)$bic[1]
# Model 4
mod4 = asreml(fixed = Yield~Block+at(Check):Block,
random = ~Genotype,
na.action = na.method(x = "include"),
residual = ~ar1(Row):ar1(Range),
data = data)
## Model fitted using the gamma parameterization.
## ASReml 4.1.0 Thu Oct 26 11:58:00 2023
## LogLik Sigma2 DF wall cpu
## 1 -1659.945 728475.8 227 11:58:00 0.0
## 2 -1657.932 701226.7 227 11:58:00 0.0
## 3 -1656.075 676716.1 227 11:58:00 0.0
## 4 -1655.177 666701.7 227 11:58:00 0.0
## 5 -1655.115 668546.7 227 11:58:00 0.0
## 6 -1655.111 668926.2 227 11:58:00 0.0
## Terms with zero df listed in attribute 'zerodf' of the wald table.
plot(varioGram(mod4))
#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod4,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])
# variance components
summary(mod4)$varcomp
## component std.error z.ratio bound %ch
## Genotype 1.872567e+05 6.418097e+04 2.917635 P 0.3
## Row:Range!R 6.689262e+05 7.892529e+04 8.475436 P 0.0
## Row:Range!Row!cor 2.332094e-01 8.463097e-02 2.755604 U 0.4
## Row:Range!Range!cor 2.752116e-01 7.895243e-02 3.485790 U 0.4
# Traditional definition
(H2.mod4<-vpredict(mod4,H2~V1/(V1+V2)))
## Estimate SE
## H2 0.218711 13486.51
# Model accuracy
# Estimation via heritability
(Acc.mod4 = sqrt(H2.mod4$Estimate[1]))
## [1] 0.4676655
# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod4)$varcomp[1,1]
(Acc.PEV.mod4<-sqrt(1-mean.PEV/sigma2g))
## [1] 0.703319
# Akaike information criterion
aic.mod4 = summary(mod4)$aic[1]
# Bayesian information criterion
bic.mod4 = summary(mod4)$bic[1]
# Model 5
mod5 = asreml(fixed = Yield~Block+at(Check):Block,
random = ~Genotype + idv(units),
na.action = na.method(x = "include"),
residual = ~ar1(Row):ar1(Range),
data = data)
## Model fitted using the gamma parameterization.
## ASReml 4.1.0 Thu Oct 26 11:58:00 2023
## LogLik Sigma2 DF wall cpu
## 1 -1660.438 668054.9 227 11:58:00 0.0
## 2 -1658.461 357818.0 227 11:58:00 0.0 (2 restrained)
## 3 -1649.911 267626.5 227 11:58:00 0.0 (2 restrained)
## 4 -1644.965 195589.2 227 11:58:00 0.0
## 5 -1644.295 215099.0 227 11:58:00 0.0 (1 restrained)
## 6 -1645.041 311746.8 227 11:58:00 0.0
## 7 -1644.480 177117.6 227 11:58:00 0.0
## 8 -1643.940 283254.1 227 11:58:00 0.0
## 9 -1644.023 247082.3 227 11:58:00 0.0
## 10 -1644.743 311395.6 227 11:58:00 0.0
## 11 -1644.294 191800.4 227 11:58:00 0.0
## 12 -1643.950 282599.9 227 11:58:00 0.0
## 13 -1644.052 242821.2 227 11:58:00 0.0
## Warning in asreml(fixed = Yield ~ Block + at(Check):Block, random = ~Genotype +
## : Oscillating parameter(s) reset to average value (iteration 13).
## Warning in asreml(fixed = Yield ~ Block + at(Check):Block, random = ~Genotype +
## : Log-likelihood not converged
## Warning in asreml(fixed = Yield ~ Block + at(Check):Block, random = ~Genotype +
## : Some components changed by more than 1% on the last iteration.
## Terms with zero df listed in attribute 'zerodf' of the wald table.
plot(varioGram(mod5))
Model output
#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod5,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])
# variance components
summary(mod5)$varcomp
## component std.error z.ratio bound %ch
## Genotype 9.878827e+04 5.617063e+04 1.758718 P 21.0
## units!units 3.797216e+05 8.308721e+04 4.570157 P 19.3
## Row:Range!R 2.428212e+05 1.085612e+05 2.236722 P 0.0
## Row:Range!Row!cor 8.124105e-01 2.067444e-01 3.929542 U 12.3
## Row:Range!Range!cor 8.904518e-01 6.925935e-02 12.856775 U 0.9
# Traditional definition
(H2.mod5<-vpredict(mod5,H2~V1/(V1+V2+V3)))
## Estimate SE
## H2 0.1369527 5004.948
# Model accuracy
# Estimation via heritability
(Acc.mod5 = sqrt(H2.mod5$Estimate[1]))
## [1] 0.3700713
# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod5)$varcomp[1,1]
(Acc.PEV.mod5<-sqrt(1-mean.PEV/sigma2g))
## [1] 0.4870271
# Akaike information criterion
aic.mod5 = summary(mod5)$aic[1]
# Bayesian information criterion
bic.mod5 = summary(mod5)$bic[1]
h2 = data.frame(Model = c("Mod1","Mod2", "Mod3", "Mod4","Mod5"),
h2 = c(H2.mod1$Estimate[1],H2.mod2$Estimate[1],H2.mod3$Estimate[1],H2.mod4$Estimate[1],H2.mod5$Estimate[1]))
h2
## Model h2
## 1 Mod1 0.1850445
## 2 Mod2 0.2064338
## 3 Mod3 0.1869733
## 4 Mod4 0.2187110
## 5 Mod5 0.1369527
Acc.h2 = data.frame(Model = c("Mod1","Mod2", "Mod3", "Mod4","Mod5"),
Acc.h2 = c(Acc.mod1,Acc.mod2,Acc.mod3,Acc.mod4,Acc.mod5))
Acc.h2
## Model Acc.h2
## 1 Mod1 0.4301680
## 2 Mod2 0.4543498
## 3 Mod3 0.4324041
## 4 Mod4 0.4676655
## 5 Mod5 0.3700713
Acc.PEV = data.frame(Model = c("Mod1","Mod2", "Mod3", "Mod4","Mod5"),
Acc.PEV = c(Acc.PEV.mod1,Acc.PEV.mod2,Acc.PEV.mod3,Acc.PEV.mod4,Acc.PEV.mod5))
Acc.PEV
## Model Acc.PEV
## 1 Mod1 0.6286996
## 2 Mod2 0.6797007
## 3 Mod3 0.6636900
## 4 Mod4 0.7033190
## 5 Mod5 0.4870271
mod.AIC = data.frame(Model = c("Mod1","Mod2", "Mod3", "Mod4","Mod5"),
AIC = c(aic.mod1,aic.mod2,aic.mod3,aic.mod4,aic.mod5))
mod.AIC
## Model AIC
## 1 Mod1 3335.863
## 2 Mod2 3325.295
## 3 Mod3 3322.585
## 4 Mod4 3318.222
## 5 Mod5 3298.103
mod.BIC = data.frame(Model = c("Mod1","Mod2", "Mod3", "Mod4","Mod5"),
BIC = c(bic.mod1,bic.mod2,bic.mod3,bic.mod4,bic.mod5))
mod.BIC
## Model BIC
## 1 Mod1 3342.713
## 2 Mod2 3335.570
## 3 Mod3 3332.860
## 4 Mod4 3331.921
## 5 Mod5 3315.228
out = data.frame(Model = c("Mod1","Mod2", "Mod3", "Mod4","Mod5"),
h2 = h2$h2,
Acc.h2 = Acc.h2$Acc.h2,
Acc.PEV = Acc.PEV$Acc.PEV,
AIC = mod.AIC$AIC,
BIC = mod.BIC$BIC)
out
## Model h2 Acc.h2 Acc.PEV AIC BIC
## 1 Mod1 0.1850445 0.4301680 0.6286996 3335.863 3342.713
## 2 Mod2 0.2064338 0.4543498 0.6797007 3325.295 3335.570
## 3 Mod3 0.1869733 0.4324041 0.6636900 3322.585 3332.860
## 4 Mod4 0.2187110 0.4676655 0.7033190 3318.222 3331.921
## 5 Mod5 0.1369527 0.3700713 0.4870271 3298.103 3315.228
Based on AIC and BIC from those models, we can conclude that the goodness-of-fit came from the model 4, where we modeled AR1 for rows and AR1 for columns. The following comparisons will be based on this model.
After the modelling of the effects of spatial correlated errors, we can include the information of row and range as an covariate in the model using the function \(lin()\). In this case, we are accounting for an single linear trend for range and another for row effect. In addition, we check the possibility of inclusion of the row and range effects as random effects in the model. We described the models below:
Model | Global/Extraneous | Natural | Random parameter | Fixed parameter |
---|---|---|---|---|
Mod5 | \(\sigma^2_{π}β_r(πΊ_r)ββ_c(πΊ_c)\) | 4 | 21 | |
Mod6 | \(\beta_r Γ R\) | \(\sigma^2_{π}β_r(πΊ_r)ββ_c(πΊ_c)\) | 4 | 22 |
Mod7 | \(\beta_r Γ R,\beta_c Γ C\) | \(\sigma^2_{π}β_r(πΊ_r)ββ_c(πΊ_c)\) | 4 | 23 |
Mod8 | \(\beta_r Γ R,\beta_c Γ C, \sigma^2_rI_r\) | \(\sigma^2_{π}β_r(πΊ_r)ββ_c(πΊ_c)\) | 5 | 23 |
Mod9 | \(\beta_r Γ R,\beta_c Γ C, \sigma^2_rI_r, \sigma^2_cI_c\) | \(\sigma^2_{π}β_r(πΊ_r)ββ_c(πΊ_c)\) | 6 | 23 |
For the models above mentioned, we face an specific situation where the number of fixed effects vary among models. In this case, we are not allowed to use the AIC/BIC, as proposed. For such, we can incorporate a correction proposed by Verbyla (2019). With this correction, we will be able to compare those models. The calculation of the information criteria is an adaption of the code supplied in File S1 of Verbyla (2019). The log-likelihood is calculated as:
\[ {loglik = log(REML) - log(|C|)/2}, \]
where C is log.determinant \((LogDet((X'H^{-1X})^{-1}))\) .
The AIC and BIC are calculated as:
\[ AIC = {- 2 * loglik + 2 * (rand.par + fixed.par)} \] \[ BIC = {- 2 * loglik + (fixed.par + rand.par) * log(n - r + fixed.par)}, \]
where rand.par represents the number of variances estimated by the full model , fixed.par represents the fixed degrees of freedom estimated by the full model, n is the number of observations, and r is the rank of the fixed effects design matrix.
For comparison purposes, we will borrow the best residual model from the previous step to be the baseline structure for the next step. Therefore, we use the Model 4 and we will fit it again to calculate the corrected AIC/BIC.
# Model
mod4 = asreml(fixed = Yield~Block+at(Check):Block,
random = ~Genotype,
na.action = na.method(x = "include"),
residual = ~ar1(Row):ar1(Range),
data = data)
## Model fitted using the gamma parameterization.
## ASReml 4.1.0 Thu Oct 26 11:58:00 2023
## LogLik Sigma2 DF wall cpu
## 1 -1659.945 728475.8 227 11:58:00 0.0
## 2 -1657.932 701226.7 227 11:58:00 0.0
## 3 -1656.075 676716.1 227 11:58:00 0.0
## 4 -1655.177 666701.7 227 11:58:00 0.0
## 5 -1655.115 668546.7 227 11:58:00 0.0
## 6 -1655.111 668926.2 227 11:58:00 0.0
## Terms with zero df listed in attribute 'zerodf' of the wald table.
#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod4,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])
# variance components
summary(mod4)$varcomp
## component std.error z.ratio bound %ch
## Genotype 1.872567e+05 6.418097e+04 2.917635 P 0.3
## Row:Range!R 6.689262e+05 7.892529e+04 8.475436 P 0.0
## Row:Range!Row!cor 2.332094e-01 8.463097e-02 2.755604 U 0.4
## Row:Range!Range!cor 2.752116e-01 7.895243e-02 3.485790 U 0.4
# Traditional definition
(H2.mod4<-vpredict(mod4,H2~V1/(V1+V2)))
## Estimate SE
## H2 0.218711 13486.51
# Model accuracy
# Estimation via heritability
(Acc.mod4 = sqrt(H2.mod4$Estimate[1]))
## [1] 0.4676655
# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod4)$varcomp[1,1]
(Acc.PEV.mod4<-sqrt(1-mean.PEV/sigma2g))
## [1] 0.703319
# Information criteria (Based on Verbyla (2019))
log.det = 267.6646
logL = summary(mod4)$loglik
rand.par = 4
fixed.par = 21
df = summary(mod4)$nedf
# Akaike information criterion
(aic.mod4 = (-2*(logL-(0.5*log.det)))+(2*(rand.par+fixed.par)))
## [1] 3627.886
# Bayesian information criterion
(bic.mod4 = (-2*(logL-(0.5*log.det)))+((rand.par+fixed.par)*log(df)))
## [1] 3713.51
# Model 6
mod6 = asreml(fixed = Yield~Block+at(Check):Block+lin(Row),
random = ~Genotype,
na.action = na.method(x = "include"),
residual = ~ar1(Row):ar1(Range),
data = data)
## Model fitted using the gamma parameterization.
## ASReml 4.1.0 Thu Oct 26 11:58:00 2023
## LogLik Sigma2 DF wall cpu
## 1 -1643.773 650168.6 226 11:58:00 0.0
## 2 -1642.842 624438.2 226 11:58:00 0.0
## 3 -1642.093 596251.2 226 11:58:00 0.0
## 4 -1641.828 575033.8 226 11:58:01 0.0
## 5 -1641.822 573771.8 226 11:58:01 0.0
## Terms with zero df listed in attribute 'zerodf' of the wald table.
#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod6,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])
# variance components
summary(mod6)$varcomp
## component std.error z.ratio bound %ch
## Genotype 1.668735e+05 6.017612e+04 2.773085 P 0.3
## Row:Range!R 5.737718e+05 6.697486e+04 8.566973 P 0.0
## Row:Range!Row!cor 1.382740e-01 8.777011e-02 1.575411 U 0.5
## Row:Range!Range!cor 1.575960e-01 8.350556e-02 1.887251 U 0.7
# Traditional definition
(H2.mod6<-vpredict(mod6,H2~V1/(V1+V2)))
## Estimate SE
## H2 0.2253082 11690.13
# Model accuracy
# Estimation via heritability
(Acc.mod6 = sqrt(H2.mod6$Estimate[1]))
## [1] 0.4746664
# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod6)$varcomp[1,1]
(Acc.PEV.mod6<-sqrt(1-mean.PEV/sigma2g))
## [1] 0.686594
# Information criterion (Based on Verbyla (2019))
log.det = 271.634
logL = summary(mod6)$loglik
rand.par = 4
fixed.par = 22
df = summary(mod6)$nedf
# Akaike information criterion
(aic.mod6 = (-2*(logL-(0.5*log.det)))+(2*(rand.par+fixed.par)))
## [1] 3607.278
# Bayesian information criterion
(bic.mod6 = (-2*(logL-(0.5*log.det)))+((rand.par+fixed.par)*log(df)))
## [1] 3696.212
# Model 7
mod7 = asreml(fixed = Yield~Block+at(Check):Block+lin(Row)+lin(Range),
random = ~Genotype,
na.action = na.method(x = "include"),
residual = ~ar1(Row):ar1(Range),
data = data)
## Model fitted using the gamma parameterization.
## ASReml 4.1.0 Thu Oct 26 11:58:01 2023
## LogLik Sigma2 DF wall cpu
## 1 -1640.058 652210.0 225 11:58:01 0.0
## 2 -1639.190 627645.7 225 11:58:01 0.0
## 3 -1638.481 600543.0 225 11:58:01 0.0
## 4 -1638.222 579934.4 225 11:58:01 0.0
## 5 -1638.215 578493.7 225 11:58:01 0.0
## Terms with zero df listed in attribute 'zerodf' of the wald table.
#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod7,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])
# variance components
summary(mod7)$varcomp
## component std.error z.ratio bound %ch
## Genotype 1.654901e+05 6.062632e+04 2.729674 P 0.3
## Row:Range!R 5.784937e+05 6.790972e+04 8.518570 P 0.0
## Row:Range!Row!cor 1.448583e-01 8.778909e-02 1.650072 U 0.6
## Row:Range!Range!cor 1.571447e-01 8.361204e-02 1.879450 U 0.7
# Traditional definition
(H2.mod7<-vpredict(mod7,H2~V1/(V1+V2)))
## Estimate SE
## H2 0.2224378 11745.65
# Model accuracy
# Estimation via heritability
(Acc.mod7 = sqrt(H2.mod7$Estimate[1]))
## [1] 0.4716331
# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod7)$varcomp[1,1]
(Acc.PEV.mod7<-sqrt(1-mean.PEV/sigma2g))
## [1] 0.683342
# Information criterion
log.det = 278.967179871277
logL = summary(mod7)$loglik
rand.par = 4
fixed.par = 23
df = summary(mod7)$nedf
# Akaike information criterion
(aic.mod7 = (-2*(logL-(0.5*log.det)))+(2*(rand.par+fixed.par)))
## [1] 3609.397
# Bayesian information criterion
(bic.mod7 = (-2*(logL-(0.5*log.det)))+((rand.par+fixed.par)*log(df)))
## [1] 3701.631
# Model
mod8 = asreml(fixed = Yield~Block+at(Check):Block+lin(Row)+lin(Range),
random = ~Genotype + Row,
na.action = na.method(x = "include"),
residual = ~ar1(Row):ar1(Range),
data = data)
## Model fitted using the gamma parameterization.
## ASReml 4.1.0 Thu Oct 26 11:58:01 2023
## LogLik Sigma2 DF wall cpu
## 1 -1641.575 631771.2 225 11:58:01 0.0
## 2 -1639.503 616049.3 225 11:58:01 0.0
## 3 -1638.490 595863.3 225 11:58:01 0.0
## 4 -1638.221 578956.6 225 11:58:01 0.0
## 5 -1638.214 577575.4 225 11:58:01 0.0
## Warning in asreml(fixed = Yield ~ Block + at(Check):Block + lin(Row) +
## lin(Range), : Some components changed by more than 1% on the last iteration.
## Terms with zero df listed in attribute 'zerodf' of the wald table.
#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod8,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[15:(78+14),])
# variance components
summary(mod8)$varcomp
## component std.error z.ratio bound %ch
## Row 8.681562e+02 2.214727e+04 0.03919924 P 5.0
## Genotype 1.656823e+05 6.070327e+04 2.72938047 P 0.3
## Row:Range!R 5.775754e+05 7.103582e+04 8.13076316 P 0.0
## Row:Range!Row!cor 1.452191e-01 8.887367e-02 1.63399467 U 0.5
## Row:Range!Range!cor 1.558451e-01 8.970111e-02 1.73738240 U 0.7
# Traditional definition
(H2.mod8<-vpredict(mod8,H2~V2/(V1+V2+V3)))
## Estimate SE
## H2 0.2226536 12276.39
# Model accuracy
# Estimation via heritability
(Acc.mod8 = sqrt(H2.mod8$Estimate[1]))
## [1] 0.4718619
# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod8)$varcomp[2,1]
(Acc.PEV.mod8<-sqrt(1-mean.PEV/sigma2g))
## [1] 0.6836402
# Information criterion
log.det = 278.9775
logL = summary(mod8)$loglik
rand.par = 5
fixed.par = 23
df = summary(mod8)$nedf
# Akaike information criterion
(aic.mod8 = (-2*(logL-(0.5*log.det)))+(2*(rand.par+fixed.par)))
## [1] 3611.405
# Bayesian information criterion
(bic.mod8 = (-2*(logL-(0.5*log.det)))+((rand.par+fixed.par)*log(df)))
## [1] 3707.056
# Model
mod9 = asreml(fixed = Yield~Block+at(Check):Block+lin(Row)+lin(Range),
random = ~Genotype + Row + Range,
na.action = na.method(x = "include"),
residual = ~ar1(Row):ar1(Range),
data = data)
## Model fitted using the gamma parameterization.
## ASReml 4.1.0 Thu Oct 26 11:58:01 2023
## LogLik Sigma2 DF wall cpu
## 1 -1644.918 622375.5 225 11:58:01 0.0 (1 restrained)
## 2 -1639.723 613377.8 225 11:58:01 0.0 (1 restrained)
## 3 -1638.487 594908.4 225 11:58:01 0.0 (1 restrained)
## 4 -1638.221 578788.0 225 11:58:01 0.0 (1 restrained)
## 5 -1638.214 577570.4 225 11:58:01 0.0 (1 restrained)
## 6 -1638.214 577513.7 225 11:58:01 0.0
## Terms with zero df listed in attribute 'zerodf' of the wald table.
#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod9,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[33:110,])
# variance components
summary(mod9)$varcomp
## component std.error z.ratio bound %ch
## Row 8.621175e+02 2.216525e+04 0.03889501 P 0.6
## Range 5.844029e-02 NA NA B 0.0
## Genotype 1.657439e+05 6.078922e+04 2.72653512 P 0.0
## Row:Range!R 5.775137e+05 7.105233e+04 8.12800575 P 0.0
## Row:Range!Row!cor 1.453659e-01 8.890685e-02 1.63503588 U 0.1
## Row:Range!Range!cor 1.560400e-01 8.973279e-02 1.73894093 U 0.1
# Traditional definition
(H2.mod9<-vpredict(mod9,H2~V3/(V1+V2+V3+V4)))
## Estimate SE
## H2 0.2227382 12282.7
# Model accuracy
# Estimation via heritability
(Acc.mod9 = sqrt(H2.mod9$Estimate[1]))
## [1] 0.4719515
# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod9)$varcomp[3,1]
(Acc.PEV.mod9<-sqrt(1-mean.PEV/sigma2g))
## [1] 0.6832604
# Information criterion
log.det = 278.97809
logL = summary(mod9)$loglik
rand.par = 6
fixed.par = 23
df = summary(mod9)$nedf
# Akaike information criterion
(aic.mod9 = (-2*(logL-(0.5*log.det)))+(2*(rand.par+fixed.par)))
## [1] 3613.406
# Bayesian information criterion
(bic.mod9 = (-2*(logL-(0.5*log.det)))+((rand.par+fixed.par)*log(df)))
## [1] 3712.473
h2 = data.frame(Model = c("Mod4","Mod6","Mod7", "Mod8", "Mod9"),
h2 = c(H2.mod4$Estimate[1],H2.mod6$Estimate[1],H2.mod7$Estimate[1],
H2.mod8$Estimate[1],H2.mod9$Estimate[1]))
h2
## Model h2
## 1 Mod4 0.2187110
## 2 Mod6 0.2253082
## 3 Mod7 0.2224378
## 4 Mod8 0.2226536
## 5 Mod9 0.2227382
Acc.h2 = data.frame(Model = c("Mod4","Mod6","Mod7", "Mod8", "Mod9"),
Acc.h2 = c(Acc.mod4,Acc.mod6,Acc.mod7,Acc.mod8,Acc.mod9))
Acc.h2
## Model Acc.h2
## 1 Mod4 0.4676655
## 2 Mod6 0.4746664
## 3 Mod7 0.4716331
## 4 Mod8 0.4718619
## 5 Mod9 0.4719515
Acc.PEV = data.frame(Model = c("Mod4","Mod6","Mod7", "Mod8", "Mod9"),
Acc.PEV = c(Acc.PEV.mod4,Acc.PEV.mod6,Acc.PEV.mod7,Acc.PEV.mod8,Acc.PEV.mod9))
Acc.PEV
## Model Acc.PEV
## 1 Mod4 0.7033190
## 2 Mod6 0.6865940
## 3 Mod7 0.6833420
## 4 Mod8 0.6836402
## 5 Mod9 0.6832604
mod.AIC = data.frame(Model = c("Mod4","Mod6","Mod7", "Mod8", "Mod9"),
AIC = c(aic.mod4,aic.mod6,aic.mod7,aic.mod8,aic.mod9))
mod.AIC
## Model AIC
## 1 Mod4 3627.886
## 2 Mod6 3607.278
## 3 Mod7 3609.397
## 4 Mod8 3611.405
## 5 Mod9 3613.406
mod.BIC = data.frame(Model = c("Mod4","Mod6","Mod7", "Mod8", "Mod9"),
BIC = c(bic.mod4,bic.mod6,bic.mod7,bic.mod8,bic.mod9))
mod.BIC
## Model BIC
## 1 Mod4 3713.510
## 2 Mod6 3696.212
## 3 Mod7 3701.631
## 4 Mod8 3707.056
## 5 Mod9 3712.473
output = data.frame(Model = c("Mod4","Mod6","Mod7", "Mod8", "Mod9"),
h2 = round(h2$h2,3),
Acc.h2 = round(Acc.h2$Acc.h2,3),
Acc.PEV = round(Acc.PEV$Acc.PEV,3),
AIC = mod.AIC$AIC,
BIC = mod.BIC$BIC)
output
## Model h2 Acc.h2 Acc.PEV AIC BIC
## 1 Mod4 0.219 0.468 0.703 3627.886 3713.510
## 2 Mod6 0.225 0.475 0.687 3607.278 3696.212
## 3 Mod7 0.222 0.472 0.683 3609.397 3701.631
## 4 Mod8 0.223 0.472 0.684 3611.405 3707.056
## 5 Mod9 0.223 0.472 0.683 3613.406 3712.473
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Automat Contr 19:716β723. https://doi.org/10.1109/TAC.1974.1100705
Coelho, I., Peixoto, M. A., Marcal, T. D. S., Bernardeli, A., Alves, R., Lima, R. O., & Bhering, L. L. (2021). Accounting for spatial trends in multi-environment diallel analysis in maize breeding. PloS one, 16(10), e0258473.
Cullis, B.R., A.B. Smith, and N.E. Coombes. 2006. On the design of early generation variety trials with correlated data. J. Agric. Biol. Environ. Stat. 11(4): 381β393. doi: 10.1198/108571106X15444
Gilmour, A.R.,Cullis, B.R. & Verbyla, A.P. (1997). Accounting for natural and extraneous variationin the analysis of field experiments.Journal of Agricultural, Biological and Environmental Statistics2,269β293
Isik et al., 2017. Genetic Data Analysis for Plant and Animal Breeding.
Mrode, R. A. (2014). Linear models for the prediction of animal breeding values. Cabi.
Schwarz G. Estimating the dimension of a model. Ann Stat. 1978; 6: 461β464.
Verbyla, A. P. (2019). A note on model selection using information criteria for general linear models estimated using REML. Australian & New Zealand Journal of Statistics, 61, 39β50. doi: 10.1111/anzs.12254.
Wilks SS. The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses. Ann Math Stat. 1938; 9: 60β62. https://doi.org/10.1214/aoms/1177732360