1. Introduction

Spatial analyses represents an optimal tool for post-hoc correction of field trials information. Heterogeneous factors that were not full corrected by the plotting techniques (i.e. soil fertility, diseases, and water holding capacity) can cause some disturbance in the prediction of BLUP values and, at the end, in the selection of the genotypes.

Below, we implement a set of models that is known as spatial analyses. In addition to demonstrate each step of the analysis, we also cover several parameters of interest for estimation/prediction purposes.

2. How to measure the best fit of a model to the data

The goodness-of-fit of a model to the data can be measured by different parameters. Below, we describe the most useful for spatial analyses. It is worth to mentioning that the spatial analyses’ objective is to select the most accurately and parsimoniously model model that captures the spatial effects in the field. This will provides the most acurate and precise estimation of variance components and prediction of the genotypes values (Isik et al. 2018)

Heritability (\(h^2\))

The heritability represents how much of the phenotypic variation came from a genetic source. We can measure it based on the the variance components estimated by the model. The simplest representation of heritability (\(h^2\)) follows the equation:

\[ h^2 = \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_{res}} \]

where \(\sigma^2_g\) is represents the genotypic variance and \(\sigma^2_{res}\) represents the residual variance.

Prediction accuracy (\(r_{\tilde{g}g}\))

The prediction accuracy represents a measure of how close a prediction is to the real genotypic value of a genotype for a target trait. It can be estimated by:

\[ r_{\tilde{g}g} = cor(predicted.value,real.value) \]

One simple way to get an estimation of accuracy is using the square root of the heritability:

\[ r_{\tilde{g}g} = {\sqrt{h^2}} \]

In mixed models, one alternative is to estimated the heritability is to use the prediction error variance (PEV) from the prediction. It can be easily estimated by:

\[ PEV = std.error^2 \]

where the \(std.error\) represents the standard error from the genotypes predictions. After, we can use the following equation (Mrode, 2014):

\[ r_{\tilde{g}g} = \sqrt{1-\frac{PEV}{\sigma^2_g}} \]

The accuracy values can vary from 0 to 1. Values close to 1 represents the best accuracy from a prediction of a target trait.

LRT test

The likelihood information test (LRT - Wilks, 1938) is indicated to compared models. However, the comparison take into account that the models share the same fixed effects and one model (reduced model) has a subset of the random effects of the other model (full model). The LRT formulae is given as follow:

\[ LRT = -2(logL_{full}-logL_{reduced}) \] where \(logL_{full}\) represents the log likelihood of the full model (i.e. all random effects), and \(logL_{reduced}\) represents the log likelihood of the reduced model (i.e. missing one random effect). The difference in-between models should be higher than 3.84 (\(\chi^2\) test, with 5% of probability and 1 degree of freedom). At this level (higher than 3.84) we consider that we have enough evidence to reject the null hypothesis, where the models are similar.

Information criteria (AIC and BIC)

The most indicated parameters to measure the goodness-of-fit of one model to the data are Akaike Information Criteria (AIC - Akaike, 1974) and Schwarz’s Bayesian Information Criteria (BIC - Schwarz, 1978). They are indicated for models that are not nested and share the same fixed effects. The formulae definition for AIC and BIC are the following:

\[ AIC = -2*logL + 2t \]

\[ BIC = -2*logL + 2t*log(v) \]

where LogL represents the log likelihood of a model, \(t\) represents the number of covariance parameters in the model, \(v\) is the residual degrees of freedom, where \(v = n-p\), being \(n\) the number of observations and \(p\) the number of parameters in fixed effect factors. If the difference in-between two models is superior to 2, we can consider that the model with lower value of AIC/BIC is the best fit model to the data. In addition, if the difference is lower than 2, we should choose the most parsimoniously model, in other words, the model with lower number of parameters.

3. Dataset

Packages

require(asreml)

## Online License checked out Thu Oct 26 11:57:57 2023

require(dplyr)

## Warning: package 'dplyr' was built under R version 4.2.3

rm(list=ls())

Dataset

data = read.table("../data/data_spatial.txt", h=TRUE)

This dataset refers to an evaluation of 78 maize genotypes, in one location, and the trait measured was Yield.

Factor	Levels
Location	One
Genotypes	78
Checks	6
Blocks	3
Ranges	18
Rows	14

 # Setting as factors
 
 data$Row = data$Row %>% as.factor()
 data$Range = data$Range %>% as.factor()
 data$Block = data$Block %>% as.factor()
 data$Genotype = data$Genotype %>% as.factor()
 data$Check  = data$Check %>% as.factor()

Exploring the data set by graphs

boxplot(Yield~Block,
data=data,
main="Trait performance over Blocks",
xlab="Block Number",
ylab="Yield (kg/ha)",
col="orange",
border="brown"
)

Exploring the data set by graphs

boxplot(Yield~Range,
data=data,
main="Trait performance over ranges",
xlab="Range Number",
ylab="Yield (kg/ha)",
col="orange",
border="brown"
)

Exploring the data set by graphs

boxplot(Yield~Row,
data=data,
main="Trait performance over rows",
xlab="Row number",
ylab="Yield (kg/ha)",
col="orange",
border="brown"
)

4. Model implementation

For the spatial analyses we list in table below the possible models for spatial analyses. We implemented each one of the models below. We used the before mentioned parameters to choose the best model (AIC, BIC, accuracy, heritability). The given model is as follow:

\[ y = Xb + Zg + e \]

where \(y\) is the vector of phenotypic data; \(b\) is the vector of block-check combination (assumed to be fixed), which comprises the effects of block and checks within the blocks; \(g\) is the vector of genotype effects (assumed to be random), where \(g~N(0,\sigma^2_g)\); and \(e\) is the vector of residuals (random), where \(e~N(0,\sigma^2_{res})\). The letters \(X\) and \(Z\) refers to the incidence matrices for \(b\) and \(g\), respectively.

Models fitted for spatial analyses
Model	Action	Fitted model for residual
Mod1	Only residual effect	\(\sigma^2_{res}I_r⊗I_c\)
Mod2	Includes AR1 for row	\(\sigma^2_{𝜉}∑_r(𝞺_r)⊗I_c\)
Mod3	Includes AR1 for range	\(\sigma^2_{𝜉}I_r⊗∑_c(𝞺_c)\)
Mod4	Includes AR1 for row/range	\(\sigma^2_{𝜉}∑_r(𝞺_r)⊗∑_c(𝞺_c)\)
Mod5	Includes AR1 for row/range and nugget effect	\(\sigma^2_{𝜉}∑_r(𝞺_r)⊗∑_c(𝞺_c), \sigma^2_𝜼\)

Model 1: No spatial structures

# Toy model
fm = asreml(fixed = Yield ~ Block + at(Check):Block,
              random = ~ Genotype,
              na.action = na.method(x = "include"),
              data = data)

## Model fitted using the gamma parameterization.
## ASReml 4.1.0 Thu Oct 26 11:57:58 2023
##           LogLik        Sigma2     DF     wall    cpu
##  1     -1666.765      759353.7    227 11:57:58    0.0
##  2     -1666.345      737700.9    227 11:57:58    0.0
##  3     -1666.026      712706.0    227 11:57:58    0.0
##  4     -1665.932      692796.8    227 11:57:58    0.0
##  5     -1665.931      691635.9    227 11:57:58    0.0

## Terms with zero df listed in attribute 'zerodf' of the wald table.

fm1 = asreml(fixed = Yield ~ Block + Check:Block,
              random = ~ Genotype,
              na.action = na.method(x = "include"),
              data = data)

## Model fitted using the gamma parameterization.
## ASReml 4.1.0 Thu Oct 26 11:57:58 2023
##           LogLik        Sigma2     DF     wall    cpu
##  1     -1666.765      759353.7    227 11:57:58    0.0
##  2     -1666.345      737700.9    227 11:57:58    0.0
##  3     -1666.026      712706.0    227 11:57:58    0.0
##  4     -1665.932      692796.8    227 11:57:58    0.0
##  5     -1665.931      691635.9    227 11:57:58    0.0

summary(fm)$varcomp

##          component std.error  z.ratio bound %ch
## Genotype  157043.4  68856.25 2.280743     P   0
## units!R   691635.9  79886.29 8.657755     P   0

plot(fm)

# Model 1 Same model, but with structure
mod1 = asreml(fixed = Yield~Block+at(Check):Block,
              random = ~Genotype,
              na.action = na.method(x = "include"),
              residual = ~id(Row):id(Range),
              data = data)

## Model fitted using the gamma parameterization.
## ASReml 4.1.0 Thu Oct 26 11:57:59 2023
##           LogLik        Sigma2     DF     wall    cpu
##  1     -1666.765      759353.7    227 11:57:59    0.0
##  2     -1666.345      737700.9    227 11:57:59    0.0
##  3     -1666.026      712706.0    227 11:57:59    0.0
##  4     -1665.932      692796.8    227 11:57:59    0.0
##  5     -1665.931      691635.9    227 11:57:59    0.0

## Terms with zero df listed in attribute 'zerodf' of the wald table.

summary(mod1)$varcomp

##             component std.error  z.ratio bound %ch
## Genotype     157043.4  68856.25 2.280743     P   0
## Row:Range!R  691635.9  79886.29 8.657755     P   0

plot(varioGram(mod1))

Model outcomes

#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod1,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])

# variance components
summary(mod1)$varcomp

##             component std.error  z.ratio bound %ch
## Genotype     157043.4  68856.25 2.280743     P   0
## Row:Range!R  691635.9  79886.29 8.657755     P   0

# Traditional definition
(H2.mod1<-vpredict(mod1,H2~V1/(V1+V2)))

##     Estimate       SE
## H2 0.1850445 12047.14

# Model accuracy

# Estimation via heritability
(Acc.mod1 = sqrt(H2.mod1$Estimate[1]))

## [1] 0.430168

# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod1)$varcomp[1,1]
(Acc.PEV.mod1<-sqrt(1-mean.PEV/sigma2g))

## [1] 0.6286996

# Akaike information criterion
aic.mod1 = summary(mod1)$aic[1]

# Bayesian information criterion
bic.mod1 = summary(mod1)$bic[1]

Model 2: Inclusion of AR1 for row

# Model 2 - Trends in Rows
mod2 = asreml(fixed = Yield~Block+at(Check):Block,
              random = ~Genotype,
              na.action = na.method(x = "include"),
              residual = ~ar1(Row):id(Range),
              data = data)

## Model fitted using the gamma parameterization.
## ASReml 4.1.0 Thu Oct 26 11:57:59 2023
##           LogLik        Sigma2     DF     wall    cpu
##  1     -1663.337      743638.1    227 11:57:59    0.0
##  2     -1661.662      717589.6    227 11:57:59    0.0
##  3     -1660.231      694389.3    227 11:57:59    0.0
##  4     -1659.666      684226.9    227 11:57:59    0.0
##  5     -1659.648      684909.9    227 11:57:59    0.0
##  6     -1659.648      685030.1    227 11:57:59    0.0

## Terms with zero df listed in attribute 'zerodf' of the wald table.

plot(varioGram.asreml(mod2))

Model outcomes

#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod2,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])

# variance components
summary(mod2)$varcomp

##                      component    std.error  z.ratio bound %ch
## Genotype          1.781998e+05 6.653420e+04 2.678319     P 0.1
## Row:Range!R       6.850301e+05 8.125227e+04 8.430904     P 0.0
## Row:Range!Row!cor 3.016325e-01 7.659184e-02 3.938181     U 0.1

# Traditional definition
(H2.mod2<-vpredict(mod2,H2~V1/(V1+V2)))

##     Estimate       SE
## H2 0.2064338 13310.69

# Model accuracy

# Estimation via heritability
(Acc.mod2 = sqrt(H2.mod2$Estimate[1]))

## [1] 0.4543498

# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod2)$varcomp[1,1]
(Acc.PEV.mod2<-sqrt(1-mean.PEV/sigma2g))

## [1] 0.6797007

# Akaike information criterion
aic.mod2 = summary(mod2)$aic[1]

# Bayesian information criterion
bic.mod2 = summary(mod2)$bic[1]

Model 3: inclusion of AR1 for range

# Model 3
mod3 = asreml(fixed = Yield~Block+at(Check):Block,
              random = ~Genotype,
              na.action = na.method(x = "include"),
              residual = ~Row:ar1(Range),
              data = data)

## Model fitted using the gamma parameterization.
## ASReml 4.1.0 Thu Oct 26 11:57:59 2023
##           LogLik        Sigma2     DF     wall    cpu
##  1     -1662.870      740546.6    227 11:57:59    0.0
##  2     -1660.921      715343.2    227 11:57:59    0.0
##  3     -1659.143      695081.4    227 11:57:59    0.0
##  4     -1658.444      692113.0    227 11:57:59    0.0
##  5     -1658.300      697117.2    227 11:57:59    0.0
##  6     -1658.293      699444.7    227 11:57:59    0.0
##  7     -1658.292      699926.4    227 11:57:59    0.0

## Terms with zero df listed in attribute 'zerodf' of the wald table.

plot(varioGram(mod3))

Model output

#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod3,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])

# variance components
summary(mod3)$varcomp

##                        component    std.error  z.ratio bound %ch
## Genotype            1.609634e+05 6.219464e+04 2.588059     P 0.0
## Row:Range!R         6.999264e+05 8.315938e+04 8.416686     P 0.0
## Row:Range!Range!cor 3.313993e-01 6.971045e-02 4.753941     U 0.1

# Traditional definition
(H2.mod3<-vpredict(mod3,H2~V1/(V1+V2)))

##     Estimate       SE
## H2 0.1869733 12641.45

# Model accuracy

# Estimation via heritability
(Acc.mod3 = sqrt(H2.mod3$Estimate[1]))

## [1] 0.4324041

# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod3)$varcomp[1,1]
(Acc.PEV.mod3<-sqrt(1-mean.PEV/sigma2g))

## [1] 0.66369

# Akaike information criterion
aic.mod3 = summary(mod3)$aic[1]

# Bayesian information criterion
bic.mod3 = summary(mod3)$bic[1]

Model 4: Inclusion of AR1 for row and range

# Model 4
mod4 = asreml(fixed = Yield~Block+at(Check):Block,
              random = ~Genotype,
              na.action = na.method(x = "include"),
              residual = ~ar1(Row):ar1(Range),
              data = data)

## Model fitted using the gamma parameterization.
## ASReml 4.1.0 Thu Oct 26 11:58:00 2023
##           LogLik        Sigma2     DF     wall    cpu
##  1     -1659.945      728475.8    227 11:58:00    0.0
##  2     -1657.932      701226.7    227 11:58:00    0.0
##  3     -1656.075      676716.1    227 11:58:00    0.0
##  4     -1655.177      666701.7    227 11:58:00    0.0
##  5     -1655.115      668546.7    227 11:58:00    0.0
##  6     -1655.111      668926.2    227 11:58:00    0.0

## Terms with zero df listed in attribute 'zerodf' of the wald table.

plot(varioGram(mod4))

#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod4,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])

# variance components
summary(mod4)$varcomp

##                        component    std.error  z.ratio bound %ch
## Genotype            1.872567e+05 6.418097e+04 2.917635     P 0.3
## Row:Range!R         6.689262e+05 7.892529e+04 8.475436     P 0.0
## Row:Range!Row!cor   2.332094e-01 8.463097e-02 2.755604     U 0.4
## Row:Range!Range!cor 2.752116e-01 7.895243e-02 3.485790     U 0.4

# Traditional definition
(H2.mod4<-vpredict(mod4,H2~V1/(V1+V2)))

##    Estimate       SE
## H2 0.218711 13486.51

# Model accuracy

# Estimation via heritability
(Acc.mod4 = sqrt(H2.mod4$Estimate[1]))

## [1] 0.4676655

# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod4)$varcomp[1,1]
(Acc.PEV.mod4<-sqrt(1-mean.PEV/sigma2g))

## [1] 0.703319

# Akaike information criterion
aic.mod4 = summary(mod4)$aic[1]

# Bayesian information criterion
bic.mod4 = summary(mod4)$bic[1]

Model 5: Inclusion of nugget effect

# Model 5
mod5 = asreml(fixed = Yield~Block+at(Check):Block,
              random = ~Genotype + idv(units),
              na.action = na.method(x = "include"),
              residual = ~ar1(Row):ar1(Range),
              data = data)

## Model fitted using the gamma parameterization.
## ASReml 4.1.0 Thu Oct 26 11:58:00 2023
##           LogLik        Sigma2     DF     wall    cpu
##  1     -1660.438      668054.9    227 11:58:00    0.0
##  2     -1658.461      357818.0    227 11:58:00    0.0 (2 restrained)
##  3     -1649.911      267626.5    227 11:58:00    0.0 (2 restrained)
##  4     -1644.965      195589.2    227 11:58:00    0.0
##  5     -1644.295      215099.0    227 11:58:00    0.0 (1 restrained)
##  6     -1645.041      311746.8    227 11:58:00    0.0
##  7     -1644.480      177117.6    227 11:58:00    0.0
##  8     -1643.940      283254.1    227 11:58:00    0.0
##  9     -1644.023      247082.3    227 11:58:00    0.0
## 10     -1644.743      311395.6    227 11:58:00    0.0
## 11     -1644.294      191800.4    227 11:58:00    0.0
## 12     -1643.950      282599.9    227 11:58:00    0.0
## 13     -1644.052      242821.2    227 11:58:00    0.0

## Warning in asreml(fixed = Yield ~ Block + at(Check):Block, random = ~Genotype +
## : Oscillating parameter(s) reset to average value (iteration 13).

## Warning in asreml(fixed = Yield ~ Block + at(Check):Block, random = ~Genotype +
## : Log-likelihood not converged

## Warning in asreml(fixed = Yield ~ Block + at(Check):Block, random = ~Genotype +
## : Some components changed by more than 1% on the last iteration.

## Terms with zero df listed in attribute 'zerodf' of the wald table.

plot(varioGram(mod5))

Model output

#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod5,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])

# variance components
summary(mod5)$varcomp

##                        component    std.error   z.ratio bound  %ch
## Genotype            9.878827e+04 5.617063e+04  1.758718     P 21.0
## units!units         3.797216e+05 8.308721e+04  4.570157     P 19.3
## Row:Range!R         2.428212e+05 1.085612e+05  2.236722     P  0.0
## Row:Range!Row!cor   8.124105e-01 2.067444e-01  3.929542     U 12.3
## Row:Range!Range!cor 8.904518e-01 6.925935e-02 12.856775     U  0.9

# Traditional definition
(H2.mod5<-vpredict(mod5,H2~V1/(V1+V2+V3)))

##     Estimate       SE
## H2 0.1369527 5004.948

# Model accuracy

# Estimation via heritability
(Acc.mod5 = sqrt(H2.mod5$Estimate[1]))

## [1] 0.3700713

# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod5)$varcomp[1,1]
(Acc.PEV.mod5<-sqrt(1-mean.PEV/sigma2g))

## [1] 0.4870271

# Akaike information criterion
aic.mod5 = summary(mod5)$aic[1]

# Bayesian information criterion
bic.mod5 = summary(mod5)$bic[1]

5. Model comparision

Heritabilities

h2 = data.frame(Model = c("Mod1","Mod2", "Mod3", "Mod4","Mod5"),
                h2 = c(H2.mod1$Estimate[1],H2.mod2$Estimate[1],H2.mod3$Estimate[1],H2.mod4$Estimate[1],H2.mod5$Estimate[1]))

h2

##   Model        h2
## 1  Mod1 0.1850445
## 2  Mod2 0.2064338
## 3  Mod3 0.1869733
## 4  Mod4 0.2187110
## 5  Mod5 0.1369527

Accuracy via heritability

Acc.h2 = data.frame(Model = c("Mod1","Mod2", "Mod3", "Mod4","Mod5"),
                Acc.h2 = c(Acc.mod1,Acc.mod2,Acc.mod3,Acc.mod4,Acc.mod5))

Acc.h2

##   Model    Acc.h2
## 1  Mod1 0.4301680
## 2  Mod2 0.4543498
## 3  Mod3 0.4324041
## 4  Mod4 0.4676655
## 5  Mod5 0.3700713

Accuracy via PEV

Acc.PEV = data.frame(Model = c("Mod1","Mod2", "Mod3", "Mod4","Mod5"),
                Acc.PEV = c(Acc.PEV.mod1,Acc.PEV.mod2,Acc.PEV.mod3,Acc.PEV.mod4,Acc.PEV.mod5))

Acc.PEV

##   Model   Acc.PEV
## 1  Mod1 0.6286996
## 2  Mod2 0.6797007
## 3  Mod3 0.6636900
## 4  Mod4 0.7033190
## 5  Mod5 0.4870271

AIC

mod.AIC = data.frame(Model = c("Mod1","Mod2", "Mod3", "Mod4","Mod5"),
                AIC = c(aic.mod1,aic.mod2,aic.mod3,aic.mod4,aic.mod5))

mod.AIC

##   Model      AIC
## 1  Mod1 3335.863
## 2  Mod2 3325.295
## 3  Mod3 3322.585
## 4  Mod4 3318.222
## 5  Mod5 3298.103

BIC

mod.BIC = data.frame(Model = c("Mod1","Mod2", "Mod3", "Mod4","Mod5"),
                BIC = c(bic.mod1,bic.mod2,bic.mod3,bic.mod4,bic.mod5))

mod.BIC

##   Model      BIC
## 1  Mod1 3342.713
## 2  Mod2 3335.570
## 3  Mod3 3332.860
## 4  Mod4 3331.921
## 5  Mod5 3315.228

All information for the residual effect modelling

out = data.frame(Model = c("Mod1","Mod2", "Mod3", "Mod4","Mod5"),
                 h2 = h2$h2,
                 Acc.h2 = Acc.h2$Acc.h2,
                 Acc.PEV = Acc.PEV$Acc.PEV,
                 AIC = mod.AIC$AIC,
                 BIC = mod.BIC$BIC)
out

##   Model        h2    Acc.h2   Acc.PEV      AIC      BIC
## 1  Mod1 0.1850445 0.4301680 0.6286996 3335.863 3342.713
## 2  Mod2 0.2064338 0.4543498 0.6797007 3325.295 3335.570
## 3  Mod3 0.1869733 0.4324041 0.6636900 3322.585 3332.860
## 4  Mod4 0.2187110 0.4676655 0.7033190 3318.222 3331.921
## 5  Mod5 0.1369527 0.3700713 0.4870271 3298.103 3315.228

Based on AIC and BIC from those models, we can conclude that the goodness-of-fit came from the model 4, where we modeled AR1 for rows and AR1 for columns. The following comparisons will be based on this model.

6. Modelling the global effects

After the modelling of the effects of spatial correlated errors, we can include the information of row and range as an covariate in the model using the function \(lin()\). In this case, we are accounting for an single linear trend for range and another for row effect. In addition, we check the possibility of inclusion of the row and range effects as random effects in the model. We described the models below:

Model	Global/Extraneous	Natural	Random parameter	Fixed parameter
Mod5		\(\sigma^2_{𝜉}∑_r(𝞺_r)⊗∑_c(𝞺_c)\)	4	21
Mod6	\(\beta_r × R\)	\(\sigma^2_{𝜉}∑_r(𝞺_r)⊗∑_c(𝞺_c)\)	4	22
Mod7	\(\beta_r × R,\beta_c × C\)	\(\sigma^2_{𝜉}∑_r(𝞺_r)⊗∑_c(𝞺_c)\)	4	23
Mod8	\(\beta_r × R,\beta_c × C, \sigma^2_rI_r\)	\(\sigma^2_{𝜉}∑_r(𝞺_r)⊗∑_c(𝞺_c)\)	5	23
Mod9	\(\beta_r × R,\beta_c × C, \sigma^2_rI_r, \sigma^2_cI_c\)	\(\sigma^2_{𝜉}∑_r(𝞺_r)⊗∑_c(𝞺_c)\)	6	23

Verbyla information

For the models above mentioned, we face an specific situation where the number of fixed effects vary among models. In this case, we are not allowed to use the AIC/BIC, as proposed. For such, we can incorporate a correction proposed by Verbyla (2019). With this correction, we will be able to compare those models. The calculation of the information criteria is an adaption of the code supplied in File S1 of Verbyla (2019). The log-likelihood is calculated as:

\[ {loglik = log(REML) - log(|C|)/2}, \]

where C is log.determinant \((LogDet((X'H^{-1X})^{-1}))\) .

The AIC and BIC are calculated as:

\[ AIC = {- 2 * loglik + 2 * (rand.par + fixed.par)} \] \[ BIC = {- 2 * loglik + (fixed.par + rand.par) * log(n - r + fixed.par)}, \]

where rand.par represents the number of variances estimated by the full model , fixed.par represents the fixed degrees of freedom estimated by the full model, n is the number of observations, and r is the rank of the fixed effects design matrix.

Model 4: Ar1xAr1

For comparison purposes, we will borrow the best residual model from the previous step to be the baseline structure for the next step. Therefore, we use the Model 4 and we will fit it again to calculate the corrected AIC/BIC.

# Model
mod4 = asreml(fixed = Yield~Block+at(Check):Block,
              random = ~Genotype,
              na.action = na.method(x = "include"),
              residual = ~ar1(Row):ar1(Range),
              data = data)

## Model fitted using the gamma parameterization.
## ASReml 4.1.0 Thu Oct 26 11:58:00 2023
##           LogLik        Sigma2     DF     wall    cpu
##  1     -1659.945      728475.8    227 11:58:00    0.0
##  2     -1657.932      701226.7    227 11:58:00    0.0
##  3     -1656.075      676716.1    227 11:58:00    0.0
##  4     -1655.177      666701.7    227 11:58:00    0.0
##  5     -1655.115      668546.7    227 11:58:00    0.0
##  6     -1655.111      668926.2    227 11:58:00    0.0

## Terms with zero df listed in attribute 'zerodf' of the wald table.

#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod4,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])

# variance components
summary(mod4)$varcomp

##                        component    std.error  z.ratio bound %ch
## Genotype            1.872567e+05 6.418097e+04 2.917635     P 0.3
## Row:Range!R         6.689262e+05 7.892529e+04 8.475436     P 0.0
## Row:Range!Row!cor   2.332094e-01 8.463097e-02 2.755604     U 0.4
## Row:Range!Range!cor 2.752116e-01 7.895243e-02 3.485790     U 0.4

# Traditional definition
(H2.mod4<-vpredict(mod4,H2~V1/(V1+V2)))

##    Estimate       SE
## H2 0.218711 13486.51

# Model accuracy

# Estimation via heritability
(Acc.mod4 = sqrt(H2.mod4$Estimate[1]))

## [1] 0.4676655

# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod4)$varcomp[1,1]
(Acc.PEV.mod4<-sqrt(1-mean.PEV/sigma2g))

## [1] 0.703319

# Information criteria (Based on Verbyla (2019))
log.det = 267.6646
logL = summary(mod4)$loglik
rand.par = 4
fixed.par = 21
df = summary(mod4)$nedf

# Akaike information criterion
(aic.mod4 = (-2*(logL-(0.5*log.det)))+(2*(rand.par+fixed.par)))

## [1] 3627.886

# Bayesian information criterion
(bic.mod4 = (-2*(logL-(0.5*log.det)))+((rand.par+fixed.par)*log(df)))

## [1] 3713.51

Model 6: Linear effect of row

# Model 6
mod6 = asreml(fixed = Yield~Block+at(Check):Block+lin(Row),
              random = ~Genotype,
              na.action = na.method(x = "include"),
              residual = ~ar1(Row):ar1(Range),
              data = data)

## Model fitted using the gamma parameterization.
## ASReml 4.1.0 Thu Oct 26 11:58:00 2023
##           LogLik        Sigma2     DF     wall    cpu
##  1     -1643.773      650168.6    226 11:58:00    0.0
##  2     -1642.842      624438.2    226 11:58:00    0.0
##  3     -1642.093      596251.2    226 11:58:00    0.0
##  4     -1641.828      575033.8    226 11:58:01    0.0
##  5     -1641.822      573771.8    226 11:58:01    0.0

## Terms with zero df listed in attribute 'zerodf' of the wald table.

#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod6,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])

# variance components
summary(mod6)$varcomp

##                        component    std.error  z.ratio bound %ch
## Genotype            1.668735e+05 6.017612e+04 2.773085     P 0.3
## Row:Range!R         5.737718e+05 6.697486e+04 8.566973     P 0.0
## Row:Range!Row!cor   1.382740e-01 8.777011e-02 1.575411     U 0.5
## Row:Range!Range!cor 1.575960e-01 8.350556e-02 1.887251     U 0.7

# Traditional definition
(H2.mod6<-vpredict(mod6,H2~V1/(V1+V2)))

##     Estimate       SE
## H2 0.2253082 11690.13

# Model accuracy

# Estimation via heritability
(Acc.mod6 = sqrt(H2.mod6$Estimate[1]))

## [1] 0.4746664

# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod6)$varcomp[1,1]
(Acc.PEV.mod6<-sqrt(1-mean.PEV/sigma2g))

## [1] 0.686594

# Information criterion (Based on Verbyla (2019))
log.det = 271.634
logL = summary(mod6)$loglik
rand.par = 4
fixed.par = 22
df = summary(mod6)$nedf

# Akaike information criterion
(aic.mod6 = (-2*(logL-(0.5*log.det)))+(2*(rand.par+fixed.par)))

## [1] 3607.278

# Bayesian information criterion
(bic.mod6 = (-2*(logL-(0.5*log.det)))+((rand.par+fixed.par)*log(df)))

## [1] 3696.212

Model 7: Linear effect of row and range

# Model 7
mod7 = asreml(fixed = Yield~Block+at(Check):Block+lin(Row)+lin(Range),
              random = ~Genotype,
              na.action = na.method(x = "include"),
              residual = ~ar1(Row):ar1(Range),
              data = data)

## Model fitted using the gamma parameterization.
## ASReml 4.1.0 Thu Oct 26 11:58:01 2023
##           LogLik        Sigma2     DF     wall    cpu
##  1     -1640.058      652210.0    225 11:58:01    0.0
##  2     -1639.190      627645.7    225 11:58:01    0.0
##  3     -1638.481      600543.0    225 11:58:01    0.0
##  4     -1638.222      579934.4    225 11:58:01    0.0
##  5     -1638.215      578493.7    225 11:58:01    0.0

## Terms with zero df listed in attribute 'zerodf' of the wald table.

#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod7,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])

# variance components
summary(mod7)$varcomp

##                        component    std.error  z.ratio bound %ch
## Genotype            1.654901e+05 6.062632e+04 2.729674     P 0.3
## Row:Range!R         5.784937e+05 6.790972e+04 8.518570     P 0.0
## Row:Range!Row!cor   1.448583e-01 8.778909e-02 1.650072     U 0.6
## Row:Range!Range!cor 1.571447e-01 8.361204e-02 1.879450     U 0.7

# Traditional definition
(H2.mod7<-vpredict(mod7,H2~V1/(V1+V2)))

##     Estimate       SE
## H2 0.2224378 11745.65

# Model accuracy

# Estimation via heritability
(Acc.mod7 = sqrt(H2.mod7$Estimate[1]))

## [1] 0.4716331

# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod7)$varcomp[1,1]
(Acc.PEV.mod7<-sqrt(1-mean.PEV/sigma2g))

## [1] 0.683342

# Information criterion
log.det = 278.967179871277 
logL = summary(mod7)$loglik
rand.par = 4
fixed.par = 23
df = summary(mod7)$nedf

# Akaike information criterion
(aic.mod7 = (-2*(logL-(0.5*log.det)))+(2*(rand.par+fixed.par)))

## [1] 3609.397

# Bayesian information criterion
(bic.mod7 = (-2*(logL-(0.5*log.det)))+((rand.par+fixed.par)*log(df)))

## [1] 3701.631

Model 8: Linear effect of row/range + random effect for row

# Model
mod8 = asreml(fixed = Yield~Block+at(Check):Block+lin(Row)+lin(Range),
              random = ~Genotype + Row,
              na.action = na.method(x = "include"),
              residual = ~ar1(Row):ar1(Range),
              data = data)

## Model fitted using the gamma parameterization.
## ASReml 4.1.0 Thu Oct 26 11:58:01 2023
##           LogLik        Sigma2     DF     wall    cpu
##  1     -1641.575      631771.2    225 11:58:01    0.0
##  2     -1639.503      616049.3    225 11:58:01    0.0
##  3     -1638.490      595863.3    225 11:58:01    0.0
##  4     -1638.221      578956.6    225 11:58:01    0.0
##  5     -1638.214      577575.4    225 11:58:01    0.0

## Warning in asreml(fixed = Yield ~ Block + at(Check):Block + lin(Row) +
## lin(Range), : Some components changed by more than 1% on the last iteration.

## Terms with zero df listed in attribute 'zerodf' of the wald table.

#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod8,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[15:(78+14),])

# variance components
summary(mod8)$varcomp

##                        component    std.error    z.ratio bound %ch
## Row                 8.681562e+02 2.214727e+04 0.03919924     P 5.0
## Genotype            1.656823e+05 6.070327e+04 2.72938047     P 0.3
## Row:Range!R         5.775754e+05 7.103582e+04 8.13076316     P 0.0
## Row:Range!Row!cor   1.452191e-01 8.887367e-02 1.63399467     U 0.5
## Row:Range!Range!cor 1.558451e-01 8.970111e-02 1.73738240     U 0.7

# Traditional definition
(H2.mod8<-vpredict(mod8,H2~V2/(V1+V2+V3)))

##     Estimate       SE
## H2 0.2226536 12276.39

# Model accuracy

# Estimation via heritability
(Acc.mod8 = sqrt(H2.mod8$Estimate[1]))

## [1] 0.4718619

# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod8)$varcomp[2,1]
(Acc.PEV.mod8<-sqrt(1-mean.PEV/sigma2g))

## [1] 0.6836402

# Information criterion
log.det = 278.9775
logL = summary(mod8)$loglik
rand.par = 5
fixed.par = 23
df = summary(mod8)$nedf

# Akaike information criterion
(aic.mod8 = (-2*(logL-(0.5*log.det)))+(2*(rand.par+fixed.par)))

## [1] 3611.405

# Bayesian information criterion
(bic.mod8 = (-2*(logL-(0.5*log.det)))+((rand.par+fixed.par)*log(df)))

## [1] 3707.056

Model 9: Linear effect of row/range + random for row/range

# Model
mod9 = asreml(fixed = Yield~Block+at(Check):Block+lin(Row)+lin(Range),
              random = ~Genotype + Row + Range,
              na.action = na.method(x = "include"),
              residual = ~ar1(Row):ar1(Range),
              data = data)

## Model fitted using the gamma parameterization.
## ASReml 4.1.0 Thu Oct 26 11:58:01 2023
##           LogLik        Sigma2     DF     wall    cpu
##  1     -1644.918      622375.5    225 11:58:01    0.0 (1 restrained)
##  2     -1639.723      613377.8    225 11:58:01    0.0 (1 restrained)
##  3     -1638.487      594908.4    225 11:58:01    0.0 (1 restrained)
##  4     -1638.221      578788.0    225 11:58:01    0.0 (1 restrained)
##  5     -1638.214      577570.4    225 11:58:01    0.0 (1 restrained)
##  6     -1638.214      577513.7    225 11:58:01    0.0

## Terms with zero df listed in attribute 'zerodf' of the wald table.

#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod9,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[33:110,])

# variance components
summary(mod9)$varcomp

##                        component    std.error    z.ratio bound %ch
## Row                 8.621175e+02 2.216525e+04 0.03889501     P 0.6
## Range               5.844029e-02           NA         NA     B 0.0
## Genotype            1.657439e+05 6.078922e+04 2.72653512     P 0.0
## Row:Range!R         5.775137e+05 7.105233e+04 8.12800575     P 0.0
## Row:Range!Row!cor   1.453659e-01 8.890685e-02 1.63503588     U 0.1
## Row:Range!Range!cor 1.560400e-01 8.973279e-02 1.73894093     U 0.1

# Traditional definition
(H2.mod9<-vpredict(mod9,H2~V3/(V1+V2+V3+V4)))

##     Estimate      SE
## H2 0.2227382 12282.7

# Model accuracy

# Estimation via heritability
(Acc.mod9 = sqrt(H2.mod9$Estimate[1]))

## [1] 0.4719515

# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod9)$varcomp[3,1]
(Acc.PEV.mod9<-sqrt(1-mean.PEV/sigma2g))

## [1] 0.6832604

# Information criterion
log.det = 278.97809
logL = summary(mod9)$loglik
rand.par = 6
fixed.par = 23
df = summary(mod9)$nedf

# Akaike information criterion
(aic.mod9 = (-2*(logL-(0.5*log.det)))+(2*(rand.par+fixed.par)))

## [1] 3613.406

# Bayesian information criterion
(bic.mod9 = (-2*(logL-(0.5*log.det)))+((rand.par+fixed.par)*log(df)))

## [1] 3712.473

7. Model Comparision

Heritabilities

h2 = data.frame(Model = c("Mod4","Mod6","Mod7", "Mod8", "Mod9"),
                h2 = c(H2.mod4$Estimate[1],H2.mod6$Estimate[1],H2.mod7$Estimate[1],
                       H2.mod8$Estimate[1],H2.mod9$Estimate[1]))

h2

##   Model        h2
## 1  Mod4 0.2187110
## 2  Mod6 0.2253082
## 3  Mod7 0.2224378
## 4  Mod8 0.2226536
## 5  Mod9 0.2227382

Accuracy via heritabilities

Acc.h2 = data.frame(Model = c("Mod4","Mod6","Mod7", "Mod8", "Mod9"),
                Acc.h2 = c(Acc.mod4,Acc.mod6,Acc.mod7,Acc.mod8,Acc.mod9))

Acc.h2

##   Model    Acc.h2
## 1  Mod4 0.4676655
## 2  Mod6 0.4746664
## 3  Mod7 0.4716331
## 4  Mod8 0.4718619
## 5  Mod9 0.4719515

Accuracy via PEV

Acc.PEV = data.frame(Model = c("Mod4","Mod6","Mod7", "Mod8", "Mod9"),
                Acc.PEV = c(Acc.PEV.mod4,Acc.PEV.mod6,Acc.PEV.mod7,Acc.PEV.mod8,Acc.PEV.mod9))

Acc.PEV

##   Model   Acc.PEV
## 1  Mod4 0.7033190
## 2  Mod6 0.6865940
## 3  Mod7 0.6833420
## 4  Mod8 0.6836402
## 5  Mod9 0.6832604

AIC

mod.AIC = data.frame(Model = c("Mod4","Mod6","Mod7", "Mod8", "Mod9"),
                AIC = c(aic.mod4,aic.mod6,aic.mod7,aic.mod8,aic.mod9))

mod.AIC

##   Model      AIC
## 1  Mod4 3627.886
## 2  Mod6 3607.278
## 3  Mod7 3609.397
## 4  Mod8 3611.405
## 5  Mod9 3613.406

BIC

mod.BIC = data.frame(Model = c("Mod4","Mod6","Mod7", "Mod8", "Mod9"),
                BIC = c(bic.mod4,bic.mod6,bic.mod7,bic.mod8,bic.mod9))

mod.BIC

##   Model      BIC
## 1  Mod4 3713.510
## 2  Mod6 3696.212
## 3  Mod7 3701.631
## 4  Mod8 3707.056
## 5  Mod9 3712.473

All information for the fixed/random terms

output = data.frame(Model = c("Mod4","Mod6","Mod7", "Mod8", "Mod9"),
                 h2 = round(h2$h2,3),
                 Acc.h2 = round(Acc.h2$Acc.h2,3),
                 Acc.PEV = round(Acc.PEV$Acc.PEV,3),
                 AIC = mod.AIC$AIC,
                 BIC = mod.BIC$BIC)


output

##   Model    h2 Acc.h2 Acc.PEV      AIC      BIC
## 1  Mod4 0.219  0.468   0.703 3627.886 3713.510
## 2  Mod6 0.225  0.475   0.687 3607.278 3696.212
## 3  Mod7 0.222  0.472   0.683 3609.397 3701.631
## 4  Mod8 0.223  0.472   0.684 3611.405 3707.056
## 5  Mod9 0.223  0.472   0.683 3613.406 3712.473

References

Akaike H (1974) A new look at the statistical model identification. IEEE Trans Automat Contr 19:716–723. https://doi.org/10.1109/TAC.1974.1100705

Coelho, I., Peixoto, M. A., Marcal, T. D. S., Bernardeli, A., Alves, R., Lima, R. O., & Bhering, L. L. (2021). Accounting for spatial trends in multi-environment diallel analysis in maize breeding. PloS one, 16(10), e0258473.

Cullis, B.R., A.B. Smith, and N.E. Coombes. 2006. On the design of early generation variety trials with correlated data. J. Agric. Biol. Environ. Stat. 11(4): 381–393. doi: 10.1198/108571106X15444

Gilmour, A.R.,Cullis, B.R. & Verbyla, A.P. (1997). Accounting for natural and extraneous variationin the analysis of field experiments.Journal of Agricultural, Biological and Environmental Statistics2,269–293

Isik et al., 2017. Genetic Data Analysis for Plant and Animal Breeding.

Mrode, R. A. (2014). Linear models for the prediction of animal breeding values. Cabi.

Schwarz G. Estimating the dimension of a model. Ann Stat. 1978; 6: 461–464.

Verbyla, A. P. (2019). A note on model selection using information criteria for general linear models estimated using REML. Australian & New Zealand Journal of Statistics, 61, 39–50. doi: 10.1111/anzs.12254.

Wilks SS. The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses. Ann Math Stat. 1938; 9: 60–62. https://doi.org/10.1214/aoms/1177732360

Spatial analyses

Marco Antônio Peixoto, Marcio F. Resende

2022-11-04

1. Introduction

2. How to measure the best fit of a model to the data

Heritability (\(h^2\))

Prediction accuracy (\(r_{\tilde{g}g}\))

LRT test

Information criteria (AIC and BIC)

3. Dataset

Packages

Dataset

4. Model implementation

Model 1: No spatial structures

Model 2: Inclusion of AR1 for row

Model 3: inclusion of AR1 for range

Model 4: Inclusion of AR1 for row and range

Model 5: Inclusion of nugget effect

5. Model comparision

Heritabilities

Accuracy via heritability

Accuracy via PEV

AIC

BIC

All information for the residual effect modelling

6. Modelling the global effects

Verbyla information

Model 4: Ar1xAr1

Model 6: Linear effect of row

Model 7: Linear effect of row and range

Model 8: Linear effect of row/range + random effect for row

Model 9: Linear effect of row/range + random for row/range

7. Model Comparision

Heritabilities

Accuracy via heritabilities

Accuracy via PEV

AIC

BIC

All information for the fixed/random terms

References