The effect of temperature on Podosphera xanthii germination and branching

The statistical method undertaken in analysing the effect of temperature on P. xanthii germination and germ-tube production.

Paul Melloy https://PaulMelloy.github.io (The University of Queensland)https://uq.edu.au
2022-10-19

Compendium sections

  1. Title page
  2. Data preparation and formatting
  3. The influence of temperatures on P. xanthii conidia germination
  4. The influence of VPD P. xanthii conidia germination and germtube formation




Load R libraries required for analysis

library(data.table) # data structure, cleaning and formatting
library(mgcv) # gam statistical models
library(rstanarm) # Bayesian statistical models
library(ggplot2) # graphing
library(MASS) # logistical ordinal regression functions, among other things
library(flextable) # formatting data.frames as word tables
library(marginaleffects) # produce predicted probabilities for polr()
source("R/p_star.R") # function to show visually the level of significance


Read in data

g_tm <- fread("cache/germination_temperature.csv")




Impact of Temperature on P. xanthii germination

Peek at the data

head(g_tm)
   Tm i_period leaf germ_conidia non_germ_conidia
1:  8       12    1            0              100
2:  8       12    1            0              100
3:  8       12    2            0              100
4:  8       12    2            0              100
5:  8       24    1            0              100
6:  8       24    1            0              100
str(g_tm)
Classes 'data.table' and 'data.frame':  128 obs. of  5 variables:
 $ Tm              : int  8 8 8 8 8 8 8 8 8 8 ...
 $ i_period        : int  12 12 12 12 24 24 24 24 36 36 ...
 $ leaf            : int  1 1 2 2 1 1 2 2 1 1 ...
 $ germ_conidia    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ non_germ_conidia: int  100 100 100 100 100 100 100 100 100 100 ...
 - attr(*, ".internal.selfref")=<externalptr> 


Summary table of the data.

g_tm_table <- 
   g_tm[, list(Mean = mean(germ_conidia),
               Std_dev = round(sd(germ_conidia),3),
               Median = median(germ_conidia),
            Range = paste0(range(germ_conidia), collapse = " - " ),
            Replicates = .N),
     by = .(Tm, i_period)] |>
   flextable() |>
   align(align = "center") |>
   set_header_labels(Tm = "Temperature",
                     i_period = "Incubation (hours)",
                     Std_dev = "Standard deviation") |>
   fontsize(size = 8, part = "body") |>
   fontsize(size = 10, part = "header") |>
   bg(i = seq(2,32,2),bg = "grey80",part = "body")|>
   set_caption("Podosheria xanthii germination percentage over a range of temperatures and incubation periods")
g_tm_table

How many leaves were inoculated?

length(unique(g_tm$Tm)) *           # Temperature incubators
   length(unique(g_tm$leaf)) *      # individual leaves
   length(unique(g_tm$i_period))    # Sampling times
[1] 64

Visualise the raw data

Let’s plot the data over time, we will need to omit temperatures 8 and 35 where no germination was observed.

g_tm[Tm != 8 &
        Tm != 35,] |>
   ggplot(aes(x = i_period, y = germ_conidia, colour = as.factor(Tm))) + 
   geom_point()+
   geom_smooth(method = "gam",formula = y ~ s(x, k = 3))+
   scale_color_viridis_d()+
   xlab("Time in hours")+
   ylab("Germinated conidia (%)")+
   labs(colour = "Temperature")+
   theme_classic()+
   scale_x_continuous(limits = c(12,48), breaks = c(12,18,24,30,36,42,48))+
   scale_y_continuous(limits = c(0,100), n.breaks = 5)



Now lets plot germination over temperature separating by infection period.

g_tm |>
   ggplot(aes(x = Tm, y = germ_conidia, colour = as.factor(i_period))) + 
   geom_point()+
   geom_smooth(method = "gam",formula = y ~ s(x, k = 8), alpha = 0.2)+
   scale_color_viridis_d(direction = -1)+
   xlab(expression("Temperature " ( degree*C)))+
   ylab("Germinated conidia (%)")+
   labs(colour = "Time (h)")+
   theme_classic()+
   scale_x_continuous(breaks = seq(5,35,5))+
   scale_y_continuous(limits = c(0,100), n.breaks = 5)+
   annotate("text", x = 8, y = 100, label = "(a)")

This plot shows that the optimum temperature for rapid P. xanthii germination is around 28\(^\circ\)C. Also, in the first plot, if we assume at time zero there was no germinated conidia, 50% of the conidia germinated within the first 12 hours. In all temperature treatments, between 12 and 36 hours after infection conidial germination increased at a linear rate. After 36 hours the rate of conidia germinating declined to almost zero. Curves were fit using a generalised additive model with a smooth spline containing 6 knots.

Lets further evaluate this fit. However instead of fitting a temperature as a separate random effect, as depicted in the above plot, we will define temperature as a separate continuous predictor that may or may not interact with infection period.

Model analysis

Germination is a binomial response so we need to provide the number of germinated to the number of non-germinated. In the first model we will fit a smooth spline to each predictor and specify that each predictor influences germination independently of the other.

I have used only 2 and 4 basis spline ‘knots’ for infection period and temperature respectively.

m1 <- gam(cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k=3) + s(Tm, k =4), 
          data = g_tm,
          family = "binomial")
summary(m1)

Family: binomial 
Link function: logit 

Formula:
cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k = 3) + 
    s(Tm, k = 4)

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.73642    0.04417  -39.31   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
              edf Ref.df Chi.sq p-value    
s(i_period) 1.974  1.999  633.4  <2e-16 ***
s(Tm)       2.996  3.000 2114.0  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =   0.93   Deviance explained = 93.2%
UBRE = 2.0049  Scale est. = 1         n = 128
plot(m1)
vis.gam(m1,
        theta = -50, # horizontal rotation
        phi = 30, # vertical rotation
        n.grid = max(g_tm$Tm) - min(g_tm$Tm) # grid cells
        )



Model two we will investigate whether there is an interaction between temperature and infection period using a tensor.

m2 <- gam(cbind(germ_conidia, non_germ_conidia) ~ te(i_period, Tm, k = 4),
          data = g_tm,
          family = "binomial")
summary(m2)

Family: binomial 
Link function: logit 

Formula:
cbind(germ_conidia, non_germ_conidia) ~ te(i_period, Tm, k = 4)

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.70973    0.04499     -38   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                  edf Ref.df Chi.sq p-value    
te(i_period,Tm) 10.51  11.61   2424  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.954   Deviance explained = 95.4%
UBRE = 1.1466  Scale est. = 1         n = 128
plot(m2)
vis.gam(m2, 
        theta = -50, # horizontal rotation
        phi = 30, # vertical rotation
        n.grid = max(g_tm$Tm) - min(g_tm$Tm) # grid cells
        )

This model shows slightly higher R squared and ‘explained deviance’ values, with a lower UBRE value indicating that model two (m2) is the better model.

Lets compare the models against each other.

anova(m1,m2, test = "Chisq")
Analysis of Deviance Table

Model 1: cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k = 3) + 
    s(Tm, k = 4)
Model 2: cbind(germ_conidia, non_germ_conidia) ~ te(i_period, Tm, k = 4)
  Resid. Df Resid. Dev     Df Deviance  Pr(>Chi)    
1    122.00     372.68                              
2    115.39     251.74 6.6071   120.94 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This confirms that model two is the better model and explains more of the deviance.


Yet lets see if we can fit a better model to represent the raw data does not use an interaction to be sure this is the best fit. The plots of the fitted model indicate that 28\(^\circ\)C showed the highest conidial germination. This might be due to the default type of basis splines, and number of knots. We have exhausted our options for improving the interaction, so let us investigate improving the additive model by fitting a better spline to the temperature predictor.

P-splines use a penalised least-square method to fit the model and are better in non-normal distributions. Lets try p-splines to see if we can improve the fit.

m3 <- gam(cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k=3) + s(Tm, k =4, bs = "ps"),
          data = g_tm,
          family = "binomial")
summary(m3)

Family: binomial 
Link function: logit 

Formula:
cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k = 3) + 
    s(Tm, k = 4, bs = "ps")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.94630    0.05067  -38.41   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
              edf Ref.df Chi.sq p-value    
s(i_period) 1.973  1.999  630.6  <2e-16 ***
s(Tm)       2.997  3.000 1896.7  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.933   Deviance explained = 93.7%
UBRE = 1.7889  Scale est. = 1         n = 128
plot(m3)
vis.gam(m3,
        theta = -50, # horizontal rotation
        phi = 30, # vertical rotation
        n.grid = max(g_tm$Tm) - min(g_tm$Tm) # grid cells
        )

The P-splines have not improved the predictive ability of the model with a lower UBRE compared with model 2, improved the R squared value and explained deviance. m3 however is a better model compared to m1 , We can also see that the curve has been shifted slightly up towards 28\(^\circ\)C.

Lets compare the models two and three to against each other.

anova(m2,m3, test = "Chisq")
Analysis of Deviance Table

Model 1: cbind(germ_conidia, non_germ_conidia) ~ te(i_period, Tm, k = 4)
Model 2: cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k = 3) + 
    s(Tm, k = 4, bs = "ps")
  Resid. Df Resid. Dev      Df Deviance  Pr(>Chi)    
1    115.39     251.74                               
2    122.00     345.03 -6.6071  -93.294 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The models show no significant difference therefore we should use the simpler model, model three (m3). Model three also shows a better fit to the data.


Next lets see if cyclic versions of the p-splines "cp" improve this fit further.

m4 <- gam(cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k=3) + s(Tm, k =4, bs = "cp"),
          data = g_tm,
          family = "binomial")
summary(m4)

Family: binomial 
Link function: logit 

Formula:
cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k = 3) + 
    s(Tm, k = 4, bs = "cp")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.7148     0.0412  -41.62   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
              edf Ref.df Chi.sq p-value    
s(i_period) 1.975  1.999    639  <2e-16 ***
s(Tm)       2.997  3.000   2204  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.945   Deviance explained = 94.6%
UBRE = 1.4262  Scale est. = 1         n = 128
plot(m4)
vis.gam(m4,
        theta = -50, # horizontal rotation
        phi = 30, # vertical rotation
        n.grid = max(g_tm$Tm) - min(g_tm$Tm) # grid cells
        )

While not an improvement on m2, the cyclic P-splines seem to have a very good predictive ability with a very low UBRE, also a high R squared value and explaining a large amount of deviance. We can also see that the curve has been shifted further towards 28\(^\circ\)C.


Lets compare models three and four.

anova(m2,m4, test = "Chisq")
Analysis of Deviance Table

Model 1: cbind(germ_conidia, non_germ_conidia) ~ te(i_period, Tm, k = 4)
Model 2: cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k = 3) + 
    s(Tm, k = 4, bs = "cp")
  Resid. Df Resid. Dev      Df Deviance  Pr(>Chi)    
1    115.39     251.74                               
2    122.00     298.62 -6.6071  -46.875 3.886e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m3,m4, test = "Chisq")
Analysis of Deviance Table

Model 1: cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k = 3) + 
    s(Tm, k = 4, bs = "ps")
Model 2: cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k = 3) + 
    s(Tm, k = 4, bs = "cp")
  Resid. Df Resid. Dev         Df Deviance  Pr(>Chi)    
1       122     345.03                                  
2       122     298.61 8.0648e-05   46.419 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Lets use the p splines again however increase the number of knots for temperature in attempt to get a better fit to the raw data.

m5 <- gam(cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k=3) + s(Tm, k =6, bs = "cs"),
          data = g_tm,
          family = "binomial")
summary(m5)

Family: binomial 
Link function: logit 

Formula:
cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k = 3) + 
    s(Tm, k = 6, bs = "cs")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -2.7266     0.1014  -26.88   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
              edf Ref.df Chi.sq p-value    
s(i_period) 1.974  1.999  637.3  <2e-16 ***
s(Tm)       4.992  5.000 1794.3  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.973   Deviance explained = 97.3%
UBRE = 0.27067  Scale est. = 1         n = 128
plot(m5)
vis.gam(m5,
        theta = -50, # horizontal rotation
        phi = 30, # vertical rotation
        n.grid = max(g_tm$Tm) - min(g_tm$Tm) # grid cells
        )

This model plot looks contrived, however if we focus the plot and ignore large negative standardised values created by zero observations at 8 and 35\(^\circ\), the curve seems to reflect the raw data very well.

plot(m5,ylim = c(-3,4))
abline(v=28)

This model also shows the highest R-squared, explained more deviance and has the lowest UBRE score, compared to all other models.

Lets compare the models two and five, then four and five to against each other.

anova(m2,m5, test = "Chisq")
Analysis of Deviance Table

Model 1: cbind(germ_conidia, non_germ_conidia) ~ te(i_period, Tm, k = 4)
Model 2: cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k = 3) + 
    s(Tm, k = 6, bs = "cs")
  Resid. Df Resid. Dev      Df Deviance Pr(>Chi)
1    115.39     251.74                          
2    120.00     146.71 -4.6071   105.03         
anova(m4,m5, test = "Chisq")
Analysis of Deviance Table

Model 1: cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k = 3) + 
    s(Tm, k = 4, bs = "cp")
Model 2: cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k = 3) + 
    s(Tm, k = 6, bs = "cs")
  Resid. Df Resid. Dev     Df Deviance  Pr(>Chi)    
1       122     298.62                              
2       120     146.71 1.9999    151.9 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The cyclic p-spline model with more knots explained the most deviance and provides a significantly better fit.

And look at the AIC values of all the models.

AIC(m1,m2,m3,m4,m5)
          df      AIC
m1  5.970342 832.0036
m2 11.510000 722.1389
m3  5.969895 804.3527
m4  5.971520 757.9365
m5  7.965804 610.0241

This shows that model five is the best fit, with the lowest AIC and an R squared of 0.9732. Temperature and infection period are significant predictors and the model produces a low UBRE score or 0.271, indicating good predictive ability and less affected by drop one, tests.


Visualise model against the raw data

To simulate data from the model using predict we need to provide some values which to estimate over.

# Make a data.frame of test values to feed in the model
test_dat <- data.table(Tm = rep(seq(from = 0, to = 40, by = 0.1), 4), # between 5 and 8 degrees for each incubation period
                        i_period = rep(c(12,24,36,48), each = length(seq(from = 0, to = 40, by = 0.1))))

# create a data.frame with predicted values and reference them to the respective temperature
m5_pred_list <-
   predict(m5,
           test_dat,
           type = "response",
           se.fit = TRUE)
m5_pred <- data.table(Tm = test_dat$Tm,
                      i_period = as.factor(test_dat$i_period),
                      germinated = m5_pred_list$fit,
                      germ_se = m5_pred_list$se.fit,
                      CI_low = m5_pred_list$fit + (m5_pred_list$se.fit * -1.94),
                      CI_high = m5_pred_list$fit + (m5_pred_list$se.fit * 1.94))

Figure_2a <- 
   m5_pred |>
   ggplot(aes(x = Tm, y = germinated, colour = i_period))+
   geom_line(size = 1)+
   scale_color_grey(start = 0.75,end = 0)+
   geom_point(data = g_tm, aes(x = Tm, y = (germ_conidia/100), colour = as.factor(i_period)))+
   theme_bw()+
   geom_line(aes(x = Tm, y = CI_low), linetype = "dashed", size = 0.6)+
   geom_line(aes(x = Tm, y = CI_high), linetype = "dashed", size = 0.6)+
   xlab(expression("Temperature " ( degree*C)))+
   ylab("Germinated conidia (%)")+
   labs(colour = "Time (h)")


Figure_2a


Temperature effect on germ tube production

Import cleaned data.

br_tm <- fread("cache/branching_temperature.csv")

Peek at the data

head(br_tm)
   Tm germtubes rep conidia
1:  8         0   1     100
2:  8         0   2     100
3:  8         1   1       0
4:  8         1   2       0
5:  8         2   1       0
6:  8         2   2       0
str(br_tm)
Classes 'data.table' and 'data.frame':  64 obs. of  4 variables:
 $ Tm       : int  8 8 8 8 8 8 8 8 17 17 ...
 $ germtubes: int  0 0 1 1 2 2 3 3 0 0 ...
 $ rep      : int  1 2 1 2 1 2 1 2 1 2 ...
 $ conidia  : int  100 100 0 0 0 0 0 0 70 70 ...
 - attr(*, ".internal.selfref")=<externalptr> 
summary(br_tm)
       Tm          germtubes         rep         conidia     
 Min.   : 8.00   Min.   :0.00   Min.   :1.0   Min.   :  0.0  
 1st Qu.:18.50   1st Qu.:0.75   1st Qu.:1.0   1st Qu.:  0.0  
 Median :23.50   Median :1.50   Median :1.5   Median : 15.0  
 Mean   :23.12   Mean   :1.50   Mean   :1.5   Mean   : 25.0  
 3rd Qu.:28.75   3rd Qu.:2.25   3rd Qu.:2.0   3rd Qu.: 37.5  
 Max.   :35.00   Max.   :3.00   Max.   :2.0   Max.   :100.0  

This experiment used the same incubation conditions as was used in the germination experiment. Following 36 hours of incubation at the respective temperatures, four leaf discs were removed from the growth chamber stained and fixed for microscopy using the leaf clearing method. Then one hundred conidia were selected at random to record the state of germination or germ tube production.


Visualise the raw data

Summary table of the data.

br_tm[, list(Mean = mean(conidia),
            Range = paste0(range(conidia), collapse = " - " ),
            Std_dev = round(sd(conidia),3),
            Replicates = .N),
     by = .(Tm, germtubes)] |>
   flextable() |>
   align(align = "center") |>
   set_header_labels(Tm = "Temperature",
                     Std_dev = "Standard deviation") |>
   fontsize(size = 8, part = "body") |>
   fontsize(size = 10, part = "header") |>
   bg(i = seq(2,32,2),bg = "grey80",part = "body")|>
   set_caption("Podosheria xanthii germination percentage over a range of temperatures and incubation periods")

Let’s plot the number of germ tubes for each temperature treatment. We can use stacked bars to show not only the number of germ tubes, but the fact that a germinated conidia with three germ tubes, also has two germ tubes ect.

Figure_3a <-
   br_tm |>
   ggplot(aes(x = as.factor(Tm), y = conidia/2 , 
              fill = factor(germtubes,levels = c(1,2,3,0)))) + 
   geom_col(position = position_stack(reverse = TRUE))+
   scale_fill_manual(values = c("0" = "grey90",
                                "1" = "grey70",
                                "2" = "grey55",
                                "3" = "grey40"), 
                     name = "Germ tubes")+
   xlab(expression("Temperature " ( degree*C)))+
   ylab("Germ tube production (%)")+
   labs(colour = "Germ tubes")+
   theme_classic()+
   annotate("text", x = 0.8, y = 99, label = "(a)", size = 5)

Figure_3a   

This plot shows that the optimum temperature for rapid P. xanthii germination is around 28\(^\circ\)C with 53 % of conidia presenting three or more germ tubes. 25\(^\circ\)C and 22\(^\circ\)C presented 51 and 49 % respectively. Although there were only fewer conidia with tertiary germtubes in the 22°C treatment, combined means of conidia with two or more germtubes were 63, 63, and 63 in temperature treatments 22\(^\circ\)C, 25\(^\circ\)C, and 28\(^\circ\)C. Only 4.5% of germinated conidia had produced tertiary germtubes in the 19\(^\circ\)C treatment, with 0 tertiary germ tubes observed in either 17\(^\circ\)C or 31\(^\circ\)C treatments.

Also, no germination was observed at temperatures 8\(^\circ\)C , or at 35\(^\circ\)C.


Ordinal logistic regression

Lets build a model to help us predict germ tube production after 36 hours when incubated at each respective temperature.

Trecate (2019) claims if there are two or more germ tubes, successful infection is likely to have occurred. Here we have a response variable, germtubes which describes the progress of a potential infection by a single conidia.
- 0 which represents a non-germinated conidia,
- 1 a germinated conidia with one germ tube,
- 2 a germinated conidia with two germ tubes,
- 3 a germinated conidia with three or more germ tubes.

The response variable might be thought of as continuous. However, given our aim is to estimate successful infection by the pathogen by proxy of germ tube production, estimating the number of germ tubes and hyphae beyond three is irrelevant and provide inaccurate results, given the difference between the number of germtubes may not be equal or consistent between levels. germtubes would be better characterised as ordinal and therefore leads us to using ordinal logistic regression methods.

The aim of this analysis is to predict the likelihood of a conidia germinating and branching under different temperature conditions.
The data is count data and our response is not binomial but ordered multinomial, which leads us to use an ordinal logistic regression.

First we need to convert the count data into a longer format, so each row represents an observation.

br_tm_lng <-
   rbindlist(apply(br_tm, 1 , function(x) {
      rbindlist(lapply(seq_len(x["conidia"]), function(x2) {
         as.data.frame(t(x[c(1:3)]))
      }))
   }))

# classify each column
br_tm_lng[, c("Tm", "germtubes", "rep") := list(Tm,
                                               ordered(germtubes), 
                                               as.factor(rep))]

Model 1 - temperature effect on germ-tube production

Now we have prepared our data lets try it in our first model assuming a linear and additive relationship between germ tube number and temperature and leaf rep.

m1 <- polr(germtubes ~ Tm + rep, 
           data = br_tm_lng,
           Hess = TRUE)

summary(m1)
Call:
polr(formula = germtubes ~ Tm + rep, data = br_tm_lng, Hess = TRUE)

Coefficients:
       Value Std. Error t value
Tm   0.02208   0.006384  3.4580
rep2 0.01755   0.100010  0.1755

Intercepts:
    Value   Std. Error t value
0|1  1.0022  0.1657     6.0473
1|2  1.4227  0.1670     8.5171
2|3  1.9288  0.1701    11.3379

Residual Deviance: 3372.54 
AIC: 3382.54 

The output is in proportional log odds and presents Coefficients which represent the change in log odds for each unit of the predictor. For example for each degree temperature increases the log odds increases 0.022.

The coefficient rep is not significant and can probably be dropped from the model.

Lets plot the model predictions for temperatures between 8\(^\circ\) C and 35\(^\circ\) C to determine if the result is reasonable compared to the raw data.

# Make a data.frame of test values to feed in the model
test_dat <- data.frame(Tm = seq(from = 5, to = 36, by = 0.1), # between 5 and 8 degrees
                       rep = as.factor(rbinom(length(seq(from = 5, to = 36, by = 0.1)),1,0.5)+1)) # randomly use either the first or second rep

# create a data.frame with predicted values and reference them to the respective temperature
m1_pred <-
   as.data.table(cbind(Tm = test_dat$Tm,
                       round(predict(m1,
                                     test_dat,
                                     type = "p"),
                             3)))

# format to long 
m1_pred <-
   melt(
      m1_pred,
      id.vars = "Tm",
      measure.vars = c("0", "1", "2", "3"),
      variable.name = "germtubes",
      value.name = "prob"
   )

m1_pred |>
   ggplot(aes(Tm, prob, colour = germtubes))+
   geom_line(size = 1)+
   theme_minimal()

The result is rather linear and predicts more germ tubes at 40\(^\circ\)C when the raw data does not see any from 35C or presumably above.

Model 2 - Temperature effect on germ-tube production

Lets try applying some splines to temperature and dropping rep.

We need to read in the splines package to fit splines as they are not supported in the polr() function.

library(splines)
m2 <- polr(germtubes ~ bs(Tm, Boundary.knots = c(8,35)),
           data = br_tm_lng, 
           Hess = TRUE)

summary(m2)
Call:
polr(formula = germtubes ~ bs(Tm, Boundary.knots = c(8, 35)), 
    data = br_tm_lng, Hess = TRUE)

Coefficients:
                                     Value Std. Error t value
bs(Tm, Boundary.knots = c(8, 35))1  0.1975     1.6404  0.1204
bs(Tm, Boundary.knots = c(8, 35))2 16.6471     0.8822 18.8701
bs(Tm, Boundary.knots = c(8, 35))3 -2.8168     1.1639 -2.4202

Intercepts:
    Value   Std. Error t value
0|1  5.0417  0.8499     5.9319
1|2  5.6921  0.8513     6.6867
2|3  6.4421  0.8529     7.5529

Residual Deviance: 2539.384 
AIC: 2551.384 

The residual deviance and AIC on this model is much lower, indicating this might be a better fit. Also the t value is higher hinting this model shows clearer differences between the variables, especially in the middle of the studied temperature range.

Now lets try predicting from this model.

m2_pred <- as.data.table(round(predict(m2,
                         test_dat,
                         type = "p"),
                 3))[, Tm := test_dat$Tm]

We get a warning indicating the the predicted values are outside the range of the spline boundary knots. This is ok as we know there should be a zero probability of spore germination outside this temperature range of 8\(^\circ\)C and 35\(^\circ\)C

# format to long 
m2_pred <-
   melt(
      m2_pred,
      id.vars = "Tm",
      measure.vars = c("0", "1", "2", "3"),
      variable.name = "germtubes",
      value.name = "prob"
   )

m2_pred |>
   ggplot(aes(Tm, prob, colour = germtubes))+
   geom_line(size = 1)+
   theme_minimal()

This plot looks like it would fit the raw data much better than m1. However, the model estimates 25\(^\circ\)C as the temperature likely to produce a greater proportion of germinated conidia with three germ-tubes. This does not reflect the raw data we plotted in figure 2b.

Model 3 - Temperature effect on germ-tube production

Lets see if we can improve the model by changing the number and placement of the spline knots.

m3 <- polr(germtubes ~ bs(Tm, knots = c(19,23,27,30)), 
           data = br_tm_lng,
           Hess = TRUE)
summary(m3)
Call:
polr(formula = germtubes ~ bs(Tm, knots = c(19, 23, 27, 30)), 
    data = br_tm_lng, Hess = TRUE)

Coefficients:
                                    Value Std. Error  t value
bs(Tm, knots = c(19, 23, 27, 30))1 15.009    40.5865  0.36979
bs(Tm, knots = c(19, 23, 27, 30))2  8.925    38.3956  0.23246
bs(Tm, knots = c(19, 23, 27, 30))3 15.293    38.8312  0.39383
bs(Tm, knots = c(19, 23, 27, 30))4 12.718    38.6486  0.32907
bs(Tm, knots = c(19, 23, 27, 30))5 16.338    38.8583  0.42044
bs(Tm, knots = c(19, 23, 27, 30))6  1.130    39.5542  0.02856
bs(Tm, knots = c(19, 23, 27, 30))7 -7.676     0.9026 -8.50526

Intercepts:
    Value   Std. Error t value
0|1 12.6119 38.7288     0.3256
1|2 13.2755 38.7289     0.3428
2|3 14.0390 38.7289     0.3625

Residual Deviance: 2501.665 
AIC: 2521.665 

The residual deviance and AIC on this model is lower, indicating this might be a better fit. Also the t value is higher hinting this model shows clearer differences between the temperature variables.

Now lets try predicting from this model.

m3_pred <- as.data.table(round(predict(m3,
                         test_dat,
                         type = "p"),
                 3))[, Tm := test_dat$Tm]

We get a warning indicating the the predicted values are outside the range of the spline boundary knots. This is ok as we know there should be a zero probability of spore germination outside this temperature range of 8\(^\circ\)C and 35\(^\circ\)C

# format to long 
m3_pred <-
   melt(
      m3_pred,
      id.vars = "Tm",
      measure.vars = c("0", "1", "2", "3"),
      variable.name = "germtubes",
      value.name = "prob"
   )

m3_pred |>
   ggplot(aes(Tm, prob, colour = germtubes))+
   geom_line(size = 1)+
   theme_minimal()

The plot shows that this model might have too many spline knots. Lets compare the models using the anova() function.

anova(m2,m3)
Likelihood ratio tests of ordinal regression models

Response: germtubes
                              Model Resid. df Resid. Dev   Test    Df
1 bs(Tm, Boundary.knots = c(8, 35))      1594   2539.384             
2 bs(Tm, knots = c(19, 23, 27, 30))      1590   2501.665 1 vs 2     4
  LR stat.      Pr(Chi)
1                      
2  37.7193 1.280351e-07

While the anova says m3 is a significantly better model, the residual deviance informs us it is not by much.

Model 4 - Temperature effect on germ-tube production

Based on the plot of model 3, we will try another model with fewer knots to reduce the wiggliness.

m4 <- polr(germtubes ~ bs(Tm, knots = c(23,28)), 
           data = br_tm_lng, 
           Hess = TRUE)
summary(m4)
Call:
polr(formula = germtubes ~ bs(Tm, knots = c(23, 28)), data = br_tm_lng, 
    Hess = TRUE)

Coefficients:
                             Value Std. Error t value
bs(Tm, knots = c(23, 28))1   2.455      2.366   1.037
bs(Tm, knots = c(23, 28))2   8.255      1.780   4.638
bs(Tm, knots = c(23, 28))3   7.719      2.017   3.827
bs(Tm, knots = c(23, 28))4   8.419      2.539   3.317
bs(Tm, knots = c(23, 28))5 -33.997     11.786  -2.884

Intercepts:
    Value    Std. Error t value 
0|1   6.5111   1.8145     3.5883
1|2   7.1687   1.8151     3.9496
2|3   7.9200   1.8154     4.3626

Residual Deviance: 2517.106 
AIC: 2533.106 

The residual deviance and AIC on this model is higher than the overfit model m3, however lower than the under fit model m2 indicating this model would better reflect the raw data. Also the t value is higher hinting this model shows clearer differences between the variables.

Now lets try predicting from this model, we will use the marginaleffects package so we can extract standard errors.

test_dat <- data.table(Tm = seq(from = 10,
                                to = 35,
                                by = 0.1))
m4_pred <- marginaleffects::predictions(m4,
                                        newdata = test_dat,
                                        type = "probs")
# set data.table format
setDT(m4_pred)

# calculate approximate confidence intervals
m4_pred[, c("CI_low","CI_high") := .(predicted - 1.96*std.error,
                                   predicted + 1.96*std.error)]

# change the column names
setnames(m4_pred,
         old = c("group","Tm.x"),
         new = c("germtubes", "Tm"))

We get a warning indicating the the predicted values are outside the range of the spline boundary knots. This is ok as we know there should be a zero probability of spore germination outside this temperature range of 8\(^\circ\)C and 35\(^\circ\)C

# raw data overlay
Figure_3b <- 
   m4_pred |>
   ggplot(aes(Tm, predicted *100, colour = germtubes))+
   geom_line(size = 1)+
   geom_line(data = m4_pred, aes(x = Tm, y = CI_low*100, #colour = group,
                                 ), linetype = "dotted", size = 0.7)+
   geom_line(data = m4_pred, aes(x = Tm, y = CI_high*100, #colour = germtubes,
                                 ), linetype = "dotted", size = 0.7)+
   scale_color_grey(start = 0.75,end = 0)+
   theme_bw()+
   xlab(expression("Temperature " ( degree*C)))+
   ylab("Gserm tube production (%)")+
   labs(colour = "Germ tubes")+
   annotate(geom = "text",
            x = 6,
            y = 96,
            label = "(b)",size = 5)
Figure_3b

The plot shows a smoother fit with the knots we have tried. Lets compare the model using the anova() function.

anova(m3,m4)
Likelihood ratio tests of ordinal regression models

Response: germtubes
                              Model Resid. df Resid. Dev   Test    Df
1         bs(Tm, knots = c(23, 28))      1592   2517.106             
2 bs(Tm, knots = c(19, 23, 27, 30))      1590   2501.665 1 vs 2     2
  LR stat.     Pr(Chi)
1                     
2 15.44083 0.000443676

While the anova says m3 is a significantly better model, the residual deviance informs us it is not by much. Because the m3 plot looks over-fit and m4 plot seems a more reasonable fit to the raw data, we will use this.

Lets examine the coefficients.

coef(m4)
bs(Tm, knots = c(23, 28))1 bs(Tm, knots = c(23, 28))2 
                  2.454674                   8.255239 
bs(Tm, knots = c(23, 28))3 bs(Tm, knots = c(23, 28))4 
                  7.718719                   8.419468 
bs(Tm, knots = c(23, 28))5 
                -33.996879 

There are five coefficients for the Temperature variable because the b-spline breaks the data into different ranges depending on knot placement. Here we see there are four knots at the following temperatures

tm_bs <- bs(br_tm_lng$Tm, knots = c(23,28))
sort(c(attr(tm_bs, which = "knots"),
  attr(tm_bs, which = "Boundary.knots")))
[1]  8 23 28 35

As can be seen in the summary above it is a little hard to make sense of these values. The spline breaks up the temperature predictor into five predictors, each explaining the difference between temperatures for a particular range. The value gives us a scaled representation of the mean magnitude of each temperature range on a log odds scale, and therefore the only inference we can make from these values is some are greater some are less and some have a large negative influence on spore germination state.

The significance of each predictor can be observed by the standard error or the ‘t value’ \(t value = \frac{coef}{se}\). We can also obtain P-values from this output, however these methods are not advised given they assume a normal distribution and infinite degrees of freedom.

# get table of coefficients
coef_table <- data.table(coef(summary(m4)), keep.rownames = TRUE)

# calculate p-values
coef_table[, p_value := pnorm(abs(`t value`),lower.tail = FALSE) *2][, sig := p_star(p_value)]
coef_table
                           rn      Value Std. Error   t value
1: bs(Tm, knots = c(23, 28))1   2.454674   2.366050  1.037456
2: bs(Tm, knots = c(23, 28))2   8.255239   1.779860  4.638138
3: bs(Tm, knots = c(23, 28))3   7.718719   2.016666  3.827465
4: bs(Tm, knots = c(23, 28))4   8.419468   2.538598  3.316582
5: bs(Tm, knots = c(23, 28))5 -33.996879  11.786364 -2.884425
6:                        0|1   6.511094   1.814542  3.588284
7:                        1|2   7.168750   1.815077  3.949557
8:                        2|3   7.919989   1.815447  4.362556
        p_value sig
1: 2.995232e-01    
2: 3.515620e-06 ***
3: 1.294697e-04 ***
4: 9.112572e-04 ***
5: 3.921296e-03  **
6: 3.328613e-04 ***
7: 7.829585e-05 ***
8: 1.285514e-05 ***

Perhaps a better method for inferring statistical significance is to consider the confidence intervals and if the CI includes zero.

m4_ci <- confint(m4)
m4_ci
                                 2.5 %    97.5 %
bs(Tm, knots = c(23, 28))1  -0.9261495  10.34610
bs(Tm, knots = c(23, 28))2   5.6446504  14.10850
bs(Tm, knots = c(23, 28))3   4.9189731  14.53761
bs(Tm, knots = c(23, 28))4   3.6682267  14.58982
bs(Tm, knots = c(23, 28))5 -56.5696386 -10.20560

So for temperatures < 8\(^\circ\)C the coefficient was 2.4546742, with CIs that include zero indicating little difference between the number of conidial germ-tubes at temperature treatments in this range.

For temperatures 8\(^\circ\)C < Tm < 23\(^\circ\)C the coefficient was 8.2552385, with CIs that don’t include zero indicating a likely significant difference between the proportions of conidia with varying numbers of germ-tubes due to temperatures in this range.

For temperatures 23\(^\circ\)C < Tm < 28\(^\circ\)C the coefficient was 7.7187186, with CIs that don’t include zero indicating a likely significant difference between the proportions of conidia with varying numbers of germ-tubes due to temperatures in this range.

For temperatures 28\(^\circ\)C < Tm < 35\(^\circ\)C the coefficient was 8.4194678, with CIs that don’t include zero indicating a likely significant difference between the proportions of conidia with varying numbers of germ-tubes due to temperatures in this range.

For temperatures 35\(^\circ\)C < Tm the coefficient was -33.9968787, with CIs that don’t include zero indicating a likely significant difference and negative association between the proportions of conidia with varying numbers of germ-tubes due to temperatures in this range.

\(\frac{t}{13.3}I(0<=t<13.3)+\frac{38.83333-t}{38.83333-13.3}I(13.3<=t<38.83333)\)

R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_Australia.utf8 
[2] LC_CTYPE=English_Australia.utf8   
[3] LC_MONETARY=English_Australia.utf8
[4] LC_NUMERIC=C                      
[5] LC_TIME=English_Australia.utf8    

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets 
[7] methods   base     

other attached packages:
 [1] marginaleffects_0.6.0 flextable_0.7.1       MASS_7.3-56          
 [4] ggplot2_3.3.6         rstanarm_2.21.3       Rcpp_1.0.8.3         
 [7] mgcv_1.8-40           nlme_3.1-157          data.table_1.14.2    
[10] devtools_2.4.3        usethis_2.1.5        

loaded via a namespace (and not attached):
  [1] minqa_1.2.4          colorspace_2.0-3     ellipsis_0.3.2      
  [4] ggridges_0.5.3       rprojroot_2.0.3      markdown_1.1        
  [7] base64enc_0.1-3      fs_1.5.2             rstudioapi_0.13     
 [10] farver_2.1.0         rstan_2.21.5         remotes_2.4.2       
 [13] DT_0.23              fansi_1.0.3          xml2_1.3.3          
 [16] codetools_0.2-18     downlit_0.4.0        cachem_1.0.6        
 [19] knitr_1.39           shinythemes_1.2.0    pkgload_1.2.4       
 [22] bayesplot_1.9.0      jsonlite_1.8.0       nloptr_2.0.1        
 [25] shiny_1.7.1          compiler_4.2.0       backports_1.4.1     
 [28] Matrix_1.4-1         fastmap_1.1.0        cli_3.3.0           
 [31] later_1.3.0          htmltools_0.5.2      prettyunits_1.1.1   
 [34] tools_4.2.0          igraph_1.3.1         gtable_0.3.0        
 [37] glue_1.6.2           reshape2_1.4.4       dplyr_1.0.9         
 [40] jquerylib_0.1.4      vctrs_0.4.1          crosstalk_1.2.0     
 [43] insight_0.18.0       xfun_0.31            stringr_1.4.0       
 [46] ps_1.7.0             brio_1.1.3           testthat_3.1.4      
 [49] lme4_1.1-29          mime_0.12            miniUI_0.1.1.1      
 [52] lifecycle_1.0.1      gtools_3.9.2         zoo_1.8-11          
 [55] scales_1.2.0         colourpicker_1.1.1   promises_1.2.0.1    
 [58] parallel_4.2.0       inline_0.3.19        shinystan_2.6.0     
 [61] yaml_2.3.5           memoise_2.0.1        gridExtra_2.3       
 [64] gdtools_0.2.4        loo_2.5.1            StanHeaders_2.21.0-7
 [67] sass_0.4.1           distill_1.4          stringi_1.7.6       
 [70] highr_0.9            dygraphs_1.1.1.6     desc_1.4.1          
 [73] checkmate_2.1.0      zip_2.2.0            boot_1.3-28         
 [76] pkgbuild_1.3.1       systemfonts_1.0.4    rlang_1.0.2         
 [79] pkgconfig_2.0.3      matrixStats_0.62.0   evaluate_0.15       
 [82] lattice_0.20-45      purrr_0.3.4          labeling_0.4.2      
 [85] rstantools_2.2.0     htmlwidgets_1.5.4    processx_3.5.3      
 [88] tidyselect_1.1.2     plyr_1.8.7           magrittr_2.0.3      
 [91] R6_2.5.1             generics_0.1.2       pillar_1.7.0        
 [94] withr_2.5.0          xts_0.12.1           survival_3.3-1      
 [97] tibble_3.1.7         crayon_1.5.1         uuid_1.1-0          
[100] utf8_1.2.2           officer_0.4.2        rmarkdown_2.14      
[103] grid_4.2.0           callr_3.7.0          threejs_0.3.3       
[106] digest_0.6.29        xtable_1.8-4         httpuv_1.6.5        
[109] RcppParallel_5.1.5   stats4_4.2.0         munsell_0.5.0       
[112] viridisLite_0.4.0    bslib_0.3.1          sessioninfo_1.2.2   
[115] shinyjs_2.1.0