

Behav Brain Res. Author manuscript; available in PMC 2012 Aug 1.

*Published in final edited form as:*

Behav Brain Res. 2011 Aug; 221(1): 271–275.

Published online 2011 Mar 17. doi:10.1016/j.bbr.2011.03.007

PMCID: PMC3103828

NIHMSID: NIHMS285590

PMID: 21397635

Antje Jahn-Eimermacher,^{a,}^{*,}^{1} Irina Lasarzik,^{b} and Jacob Raber^{c}



## Abstract

In experimental designs of animal models, memory is often assessed by the time for a performance measure to occur (latency). Depending on the cognitive test, this may be the time it takes an animal to escape to a hidden platform (water maze), to an escape tunnel (Barnes maze), or to enter a dark compartment (passive avoidance test). Latency outcomes are usually analyzed statistically using ANOVAs. Besides resting on strong distributional assumptions, ANOVA cannot properly deal with animals that do not show the performance measure within the trial time, potentially causing biased and misleading results. We propose an alternative approach for the statistical analysis of latency outcomes. These analyses make fewer distributional assumptions and adequately handle trials in which the performance measure did not occur within the trial time. The proposed method is well known from survival analysis, provides comprehensible statistical results and allows the generation of meaningful graphs. Experiments from behavioral neuroscience and anesthesiology are used to illustrate this method.

**Keywords:** Statistical analysis, Latency, Barnes maze, Morris water maze, Passive avoidance

## 1. Introduction

In analyzing cognitive tests, the time for a performance measure to occur (latency) is often used. In some cases, such as the Barnes maze and water maze, this refers to the time it takes the animal to find the escape tunnel or hidden platform, respectively. In other cognitive tests, such as the passive avoidance test, latency refers to the time the animal takes to enter or re-enter the dark compartment. As an aversive stimulus is received in the dark compartment during training, a lower latency during the test is associated with reduced cognitive performance or memory. In all these kinds of tests, there is a possibility that the animal does not show this behavior within the trial time. Thus, an animal might not locate the escape tunnel in the Barnes maze, locate the escape platform in the water maze, or enter the dark compartment of the passive avoidance test within the maximum trial time. In such cases, the maximum trial time is typically used to calculate the latency. However, there is obviously a large difference between animals that do and those that do not show the pertinent performance measure within the trial time. In this study, we will refer to these events as failures. If failures are treated as though the animals reached the performance measure at the maximum trial time, results will potentially be biased and conclusions misleading.

Having maximum-trial-time data in a particular experiment also calls into question whether the data are normally distributed with equal variances per group. This in turn affects the assumptions of the standard ANOVA tests typically used to analyze this kind of latency data. It is well established that violations of these assumptions can have a severe impact on both type I error and power [1]. Still, ANOVA *F*-tests are broadly applied, in most situations without checking for any deviation from the distributional assumptions [2]. We will demonstrate why extreme caution is warranted in assuming a normal distribution with equal variances for latency data. As nonparametric methods follow naturally, we will show that these may be preferable to ANOVA, but still overcome only some of the challenges arising in analyzing latency data. We will propose an appropriate and easy-to-apply alternative statistical method to analyze and graphically present latency data, which provides well comprehensible results. The method is typically applied in survival data analysis, and thus standard statistical software packages can be employed in most situations. All methods are illustrated using two examples from behavioral neuroscience.

The paper is organized as follows: In Section 2, the current practice of analyzing latency outcomes and its shortcomings are described, and an alternative approach overcoming these shortcomings is proposed in Section 2.2. This approach is illustrated on data examples, including sample size considerations, in Section 3. The paper closes with a discussion in Section 4.

## 2. Methods

A PubMed search using the term “Barnes Maze” was conducted to sketch the current practice of statistical analysis of latency data, using Barnes maze experiments as an example. Fifty-four articles were published between January 2008 and November 2010; two of them were excluded because latency was not evaluated, and two were not available to the authors. In forty-five of the remaining fifty articles (90%), *t*-tests were applied for the comparison of two groups in one trial, ANOVA *F*-tests for the comparison of more than two groups or the evaluation of more than one factor in one trial, and repeated-measurements ANOVA for the analysis of repeated-measurement designs. In all of these publications, results were graphically illustrated by mean plots with error bars. Distributional assumptions were checked in only four of these papers. In three articles, ANOVA methods were corrected for deviations from the distributional assumptions. Only two groups of authors applied nonparametric methods so as not to rely on any specific distribution; however, one of these groups still applied graphical presentations based on a normal distribution.

### 2.1. Shortcomings of the current practice for statistical analysis

#### 2.1.1. Handling of failures

Experiments focusing on latency are stopped after a specific amount of time, say *C* seconds. For example, an animal in the Morris water maze test that does not find the platform within *C* = 60 s is guided to the platform. An animal in the Barnes maze experiment that does not find the escape tunnel within *C* = 300 s is guided to the tunnel. The passive avoidance test is stopped after *C* = 300 s even if the animal has not yet entered the dark chamber. If latency data are analyzed with ANOVA-type methods, a stopped experiment is treated as though the animal reached the event of interest after *C* seconds, which is not the case. Thus, ANOVA methods do not differentiate between a failure within *C* seconds and a latency of *C* or slightly less than *C* seconds, causing potentially biased results.
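The bias introduced by substituting *C* for failures can be made concrete with a small simulation. The sketch below is illustrative only (the distribution and its parameters are our assumptions, not from any experiment): hypothetical true latencies are drawn from an exponential distribution and any latency above the trial time *C* = 300 s is recorded as *C*, as in the current practice.

```python
import random

random.seed(1)

C = 300  # maximum trial time in seconds
# Hypothetical true latencies: exponential with mean 250 s, so a
# substantial fraction exceeds the trial time and becomes a "failure"
true_latencies = [random.expovariate(1 / 250) for _ in range(1000)]

# Current practice: record a failure as if the event occurred at C
recorded = [min(t, C) for t in true_latencies]

true_mean = sum(true_latencies) / len(true_latencies)
naive_mean = sum(recorded) / len(recorded)
n_failures = sum(t > C for t in true_latencies)

print(f"failures (> {C} s): {n_failures}/1000")
print(f"true mean latency:     {true_mean:6.1f} s")
print(f"recorded mean latency: {naive_mean:6.1f} s  (biased downward)")
```

Whenever a non-negligible share of latencies exceeds the trial time, the recorded mean underestimates the true mean, and the bias grows with the failure rate.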

#### 2.1.2. Distributional assumptions

The ANOVA *F*-test relies on a normal distribution with equal variances per group (homoscedasticity). However, extreme caution is warranted in assuming a homoscedastic normal distribution for latency data, for the following reasons:

- Latencies are positive, whereas a normal distribution also allows for negative values.
- Latencies that would have been larger than the maximum time *C* are replaced by the value *C*. Thus, the data are right-truncated, contradicting a normal distribution.
- Latency data usually come from small samples, in which deviations from the assumption of equal variances are often observed.

If there is only a small number of data points, nonnormality and heteroscedasticity (unequal variances) can be hard to detect. Some researchers rely on a homoscedastic normal distribution based on non-significant results of normality or homoscedasticity tests such as the Shapiro–Wilk test, the Kolmogorov–Smirnov test or the Levene test. However, the power of these tests to detect a deviation from a homoscedastic normal distribution can be small owing to the small sample sizes. Thus, a non-significant result should never be considered proof of normality or homoscedasticity.

If data are suspected not to be normally distributed, nonparametric methods using only the ordering of the latencies in the groups (ranks) follow naturally. The distributions of the data in the groups to be compared no longer have to be normal, but they have to be similar in shape and, in case of group effects, differ by a location shift. Because they rely on fewer assumptions, nonparametric methods are more robust. For the same reason they lose precision and thus are less powerful than appropriate parametric methods. Power considerations, however, are of particular interest in cognitive experiments, as these are often performed in relatively small samples. Furthermore, effect estimates from nonparametric analyses are difficult to interpret, as will be described in the following section. Nonparametric models with more than one factor (sometimes called “ANOVA on the ranks”), for example to evaluate treatment × sex interactions, are not implemented in most commonly used software packages.

#### 2.1.3. Effect estimates and graphical presentation

If failures are present in the dataset and are treated as latencies of size *C*, the mean latency estimates and standard errors of the mean (SEM) are biased. Additionally, standard errors are calculated based on the normal distribution, which is questionable for the reasons described above. Thus, neither means nor mean plots with error bars are appropriate to describe effect estimates or provide a useful graphical summary of the data.

Effect estimates corresponding to nonparametric methods are to be interpreted as “the probability that an animal of group A has a higher latency than an animal of group B” (relative effects). This might be difficult to interpret and thus less useful for reporting to a non-statistician audience.
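Such a relative effect can be estimated directly by counting pairs across the two groups. The following sketch (with made-up latencies, not the study data) computes the probability that an animal from group A outlasts an animal from group B, the quantity behind the Mann–Whitney *U* statistic:

```python
def relative_effect(group_a, group_b):
    """Estimate P(latency in A > latency in B), counting ties as 1/2.

    This is the 'relative effect' estimated by rank-based methods.
    """
    wins = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(group_a) * len(group_b))

# Made-up latencies in seconds (300 = trial maximum, i.e. a failure)
treated = [120, 250, 300, 300]
placebo = [40, 90, 150, 280]
print(relative_effect(treated, placebo))  # 0.8125
```

A value near 1 means treated animals almost always show the longer latency, while 0.5 means no separation. Note that failures recorded as the maximum time enter as ties whose true ordering is unknown, which is one source of the interpretational difficulty mentioned above.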

#### 2.1.4. Data example 1: Passive avoidance memory retention

Effects of androgen treatment on passive avoidance memory retention were evaluated in twenty-eight 2-year-old female mice receiving testosterone (*n* = 10), dihydrotestosterone (*n* = 10) or placebo (*n* = 8) [3]. In the passive avoidance test, performance is measured with a step-through box consisting of a brightly lit compartment and an identical dark compartment connected by a sliding door (Kinder Scientific, Poway, CA). In the experimental design used, mice were placed individually in the bright compartment and, after a habituation period of 5 s, the door to the darkened compartment was opened. Mice, being averse to bright light, were naturally inclined to enter the darkened compartment. When they did so, the door quickly shut and a slight foot-shock was delivered (0.3 mA for 3 s). Each mouse was trained until it met a learning criterion of three consecutive 120-second trials without entering the darkened compartment, or up to ten trials, whichever came first. Twenty-four hours later, each mouse was again placed in the bright compartment, and the latency to re-enter the dark compartment was recorded up to 300 s. The number of trials to criterion was used to measure acquisition, and the time before entering the dark chamber 24 h after training was used to measure memory retention.

In the memory retention trial, 2/8 animals of the placebo group, 6/10 animals of the testosterone group and 8/10 animals of the dihydrotestosterone (DHT) group did not enter the dark chamber within 300 s. The mean latency estimate in the DHT group is 273 s. However, a latency below the 300 s trial time was observed for only 2/10 animals, whereas for 8/10 animals the latency was replaced by a value of 300 s. It is evident that an estimate based on truncated values in 8 of 10 animals will be severely biased and thus not meaningful. If means with standard error of the mean (SEM) bars are plotted anyway, one gets Fig. 1. The more failures (animals not entering the chamber within 300 s) in a group, the smaller the bars. In fact, substituting the same value of 300 s for all failures artificially reduces the variability of the data. The variability of the true latencies, however, might be much higher.

Fig. 1

Error bar plot of passive avoidance memory retention data (example 1).

Testing for normality with Shapiro–Wilk tests confirms a deviation from a normal distribution, with *p*-values less than 0.001 for the testosterone and DHT groups. In this case, nonparametric Mann–Whitney *U* tests indicate an effect of DHT on the latency compared to placebo (*p* = 0.013), whereas the *p*-value for a difference in latencies between the testosterone and the placebo group is greater than 5% (*p* = 0.104). However, the size of a *p*-value depends on the sample size and only provides information about statistical significance [4]. To judge the relevance of results or to compare results of different experiments, effect sizes are required. Comparing *p*-values, or “significant” to “not significant”, might just be comparing sample sizes.

### 2.2. Proposal for the analysis of latency outcomes

As described earlier, latency is the time to some event of interest, such as the time to find the escape tunnel in Barnes maze tests, the time to find the hidden platform in Morris water maze tests or the time to enter the dark chamber in passive avoidance tests. Survival methods, in particular Cox regression modelling and Kaplan–Meier estimates, are the most appropriate analyses for time-to-event data, and we recommend them for the analysis of latency data for the following reasons:

- Failures can be handled: the information that a latency was at least of size *C*, without knowing its exact size because the experiment was stopped, is treated as a censored observation in survival methodology.
- The underlying data distribution is not relevant. The only assumption is proportional hazards, which will be explained below.

Cumulative incidence plots illustrate the cumulative incidence of locating the target over the course of the trial separately for each considered group and are derived from Kaplan–Meier estimates. Cox proportional hazards models can be used to analyze the effect of one or more covariates on the latency. The effect of a categorical covariate such as treatment is estimated as a relative increase or decrease in the instantaneous rate of reaching the target for an animal in the treatment group still looking for the target, compared to an animal in the reference group (hazard ratio). The hazard ratio is assumed to stay constant over the course of the trial (proportional hazards) and thus refers to any point in time up to the maximum time *C*. If a specific survival distribution can be assumed, e.g. a Weibull or exponential density function, that particular function can be fitted to obtain more precise inferences. If the sample size is too small to check the fit of the particular distribution, however, the distributional assumption must be based on previous knowledge, which might not be available.
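The Kaplan–Meier product-limit estimate behind such cumulative incidence plots can be computed in a few lines. The following is a generic sketch with invented latencies, not the study's analysis code; a latency of 300 s with `located=False` marks a censored trial (failure):

```python
def kaplan_meier(times, located):
    """Kaplan-Meier estimate of S(t) = P(still searching at time t).

    times:   latency in seconds for each animal
    located: True if the animal reached the target, False if the trial
             was stopped at the maximum time (censored, i.e. a failure)
    Returns a list of (time, survival) steps; the cumulative incidence
    of locating the target is 1 - survival.
    """
    data = sorted(zip(times, located))
    n_at_risk = len(data)
    surv = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        # count target locations and censorings at this time point
        events = sum(1 for time, loc in data[i:] if time == t and loc)
        total = sum(1 for time, _ in data[i:] if time == t)
        if events:
            surv *= 1 - events / n_at_risk
            steps.append((t, surv))
        n_at_risk -= total
        i += total
    return steps

# Invented latencies; the two 300 s trials are censored failures
times = [45, 80, 120, 300, 300, 210]
located = [True, True, True, False, False, True]
for t, s in kaplan_meier(times, located):
    print(f"t = {t:3d} s  cumulative incidence = {1 - s:.2f}")
```

Because the two failures never enter the product as events, the estimated cumulative incidence levels off at 2/3 instead of being dragged to 100% by artificial 300 s "latencies".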

To analyze repeated-measurement designs, more complex survival methods are required to account for a possible correlation between the trials within the same animal. The marginal model and the conditional frailty model have been proposed for analyzing correlated survival data [5]. In the marginal model, an average group effect in the underlying population is estimated, and a robust variance estimate corrects for a possible correlation between repeated measurements within the same animal. Different robust variance estimates have been proposed, but the jackknife estimate should be used for its good performance even in small samples [6]. In the conditional frailty model, a random term is introduced to model the correlation, and effect estimates are to be interpreted as conditional effects within an animal. The model of choice depends on the objective of the analysis (population-averaged vs. conditional effects) and on the sample size, as marginal models have been shown to be more robust in very small samples than frailty models [6].

## 3. Results

We will illustrate the method for analyzing latency data proposed in Section 2.2 on two examples, each evaluating a treatment effect.

### 3.1. Data example 1: Passive avoidance memory retention

The cumulative incidence plot for the passive avoidance memory retention trial described in Section 2.1.4 presents the cumulative incidences of escaping into the dark chamber over the course of the trial (Fig. 2). For example, approximately 65% of the animals treated with placebo escaped into the dark chamber within the first 150 s of the experiment, as opposed to only 10% and 20% of the animals treated with DHT and testosterone, respectively. After 300 s, only 20% and 40% of the animals treated with DHT and testosterone, respectively, had entered the dark chamber, as opposed to 80% in the placebo group. Dots represent the observed latencies, whereas the stars at 300 s represent the animals that avoided the chamber during the whole experiment. Table 1 gives the results of a Cox proportional hazards model including group as a fixed covariate. Animals treated with testosterone have a 68% smaller rate of entering the dark chamber than animals treated with placebo (*p* = 0.083). Animals treated with DHT have an even smaller, 86% reduced, rate of entering the dark chamber than animals treated with placebo (*p* = 0.019).

Fig. 2

Cumulative incidence plot of passive avoidance memory retention data (example 1).

### Table 1

Cox regression results of passive avoidance memory retention data (example 1).

| | Hazard ratio [95% CI] | *p*-value |
|---|---|---|
| Testosterone vs. placebo | 0.32 [0.09; 1.16] | 0.083 |
| DHT vs. placebo | 0.14 [0.03; 0.73] | 0.019 |


### 3.2. Data example 2: Spatial memory retention

Effects of a treatment on spatial memory after forebrain ischemia were evaluated in 9–12-week-old male rats receiving high-dose treatment, low-dose treatment or vehicle solution (*n* = 12/group) (unpublished data). In the Barnes maze test, performance is measured by the ability to locate a single escape tunnel below one of 8 equally spaced holes on a platform. In the experimental design used, rats were tested on post-ischemic days 3, 5, 7 and 9. The animals were individually accustomed to the escape tunnel for 1 min before each trial, and the trial was started by placing the rat in a start box in the middle of the maze and removing the start box. Rats were tested using bright light and white noise as aversive stimuli. The latency to the escape tunnel was recorded when all four legs were in the escape tunnel. If a rat did not locate the escape tunnel, it was directed to the tunnel after 5 min. After each trial, the maze was cleaned with 70% ethanol to eliminate olfactory cues.

The initial training trial on post-ischemic day 3 is excluded from the statistical analyses, as spatial memory affects only the experiments after the first random search. From the cumulative incidence plots (Fig. 3), a learning effect can be observed for all treatment groups, increasing the cumulative incidence of escaping from the maze with each post-ischemic trial. In each trial, the animals receiving the high-dose treatment are faster in escaping the maze than animals receiving the low-dose treatment or vehicle. Applying a Cox proportional hazards model with group as a categorical covariate and post-ischemic day as a continuous covariate, with robust jackknife variance estimates, gives the results summarized in Table 2.

Fig. 3

Cumulative incidence plots of spatial memory retention data in the Barnes maze (example 2).

### Table 2

Cox regression results of spatial memory retention data in the Barnes maze (example 2).

| | Hazard ratio [95% CI] | *p*-value |
|---|---|---|
| High dose vs. vehicle | 2.58 [0.91; 7.35] | 0.076 |
| Low dose vs. vehicle | 0.87 [0.28; 2.69] | 0.810 |
| Trial | 1.44 [1.20; 1.72] | <0.001 |


Animals receiving the high-dose treatment have a 2.58-fold higher rate of reaching the escape tunnel than animals receiving the vehicle substance (*p* = 0.076), whereas the rate of reaching the tunnel was slightly decreased, by 13%, by administration of the low-dose treatment (*p* = 0.81). In all groups, the rate of reaching the escape tunnel increased by 44% with each trial (*p* < 0.001).

### 3.3. Sample size requirements

If a treatment is assumed to affect the time to locate the target and is compared to a placebo by the proposed Cox regression in a 1:1 randomized experiment, the sample size required for a certain power depends on the expected treatment effect, expressed as a relative change in the instantaneous rate of locating the target (hazard ratio). The number of animals locating the target within the trial time is the main determinant of the efficiency of the Cox regression. Therefore, the sample size has to be increased by the number of animals expected to fail to locate the target. Table 3 gives exemplary sample size calculations for some situations using Schoenfeld's formula. In a repeated-measurement design, the power can even be higher than anticipated, as each animal contributes more than one observation to the analysis.

### Table 3

Sample size requirements for a two-group comparison in a balanced randomized design.

| Two-sided level α | Power | Hazard ratio ^{a} | Total failure rate ^{b} | Required no. of animals succeeding ^{b} | Required total no. of animals |
|---|---|---|---|---|---|
| 5% | 80% | 3 | 20% | 26 | 34 |
| 5% | 80% | 3 | 40% | 26 | 44 |
| 5% | 80% | 5 | 20% | 12 | 16 |
| 5% | 80% | 5 | 40% | 12 | 20 |


^{a}A hazard ratio of 3 or 5 indicates a 3-fold or 5-fold higher instantaneous rate of locating the target in group A compared to group B, respectively.

^{b}Success/failure is defined as locating/not locating the target within trial time.
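Schoenfeld's formula for a balanced two-group comparison can be sketched in a few lines of Python. The rounding conventions below (nearest integer for the event count, each of the two equal groups rounded up to a whole animal) are our assumptions, chosen because they reproduce Table 3:

```python
from math import ceil, log
from statistics import NormalDist

def schoenfeld_events(hazard_ratio, alpha=0.05, power=0.80):
    """Required number of events (animals locating the target) for a
    two-sided level-alpha test in a 1:1 randomized design."""
    z = NormalDist().inv_cdf
    # Schoenfeld: d = (z_{1-a/2} + z_{power})^2 / (p(1-p) ln(HR)^2), p = 1/2
    needed = (z(1 - alpha / 2) + z(power)) ** 2 / (0.25 * log(hazard_ratio) ** 2)
    return round(needed)

def total_animals(n_events, failure_rate):
    """Inflate the event count for the expected share of failures,
    rounding each of the two equal groups up to a whole animal."""
    return 2 * ceil(n_events / (2 * (1 - failure_rate)))

for hr in (3, 5):
    d = schoenfeld_events(hr)
    for fail in (0.2, 0.4):
        print(f"HR = {hr}, failure rate = {fail:.0%}: "
              f"{d} events, {total_animals(d, fail)} animals in total")
```

This reproduces the four rows of Table 3 (26/34, 26/44, 12/16 and 12/20 animals).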

Sample sizes might be smaller than required to achieve a certain power if the latency experiment is not the primary objective or if ethical concerns limit the sample size. Statistical analyses can still provide useful information even if no statistically significant results can be expected. However, if the sample size becomes too small, statistical methods might fail to provide sound results and should therefore not be applied. For single-trial experiments, Vittinghoff and McCulloch [7] demonstrated that a minimum of five animals locating the target within the trial time per factor of the Cox model is sufficient to limit the risk of severely invalid results. Continuous covariates such as age contribute one factor to the model, whereas categorical covariates such as treatment or sex contribute *K* − 1 factors to the model, with *K* being the number of categories. To illustrate how to check the minimum sample size requirements, the passive avoidance memory retention trial of Section 2.1.4 is used. A Cox model with two factors was applied, since three treatments were compared. Escaping into the chamber was observed in 12 animals, corresponding to 6 animals per factor and thus exceeding the minimum sample size requirement. For repeated-measurement Cox models, Jahn-Eimermacher [6] suggests that 8 animals succeeding at least once in locating the target are required per factor to limit the risk of severely invalid results.
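This events-per-factor check is simple to automate. A minimal sketch (the function name and interface are our invention, not from the paper):

```python
def enough_events(n_events, continuous=0, category_levels=(), per_factor=5):
    """Rule-of-thumb check: at least `per_factor` events per model factor.

    n_events:        animals locating the target within the trial time
    continuous:      number of continuous covariates (1 factor each)
    category_levels: number of levels K of each categorical covariate,
                     contributing K - 1 factors each
    per_factor:      5 for single-trial designs [7], 8 for
                     repeated-measurement designs [6]
    """
    factors = continuous + sum(k - 1 for k in category_levels)
    return n_events >= per_factor * factors

# Passive avoidance example: three treatment groups -> 2 factors,
# 12 animals entered the chamber -> 6 events per factor
print(enough_events(12, category_levels=(3,)))  # True
```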

## 4. Discussion

We have proposed a statistical approach for the analysis of latency data that can deal with failures within the maximum duration of the experiment and makes fewer distributional assumptions than ANOVA-type methods. It also provides illustrative plots and meaningful effect sizes corresponding to its *p*-values. In contrast, nonparametric methods typically recommended for non-normally distributed data provide *p*-values that are broadly presented without corresponding effect sizes, mainly owing to interpretational difficulties. However, it is well known that *p*-values on their own are meaningless [4]. The Cox regression proposed here was originally developed to analyze survival data and thus is implemented in most statistical software packages. Only if repeated-measurement designs are to be evaluated do more sophisticated packages such as R or SAS have to be applied. Cox regression modeling strategies can also handle complex designs, as was done by Faes et al. [8] on Morris water maze data.

ANOVA methods do not handle failures within the maximum trial time properly. If there is only a negligible number of failures in the dataset to be analyzed, ANOVA methodology might be applicable. However, the distributional assumptions are still questionable, and generalizations correcting for deviations from these assumptions should be considered.

We have illustrated how to present latency data using cumulative incidence plots. Cumulative incidence plots are not appropriate for presenting learning curves; there, median plots should be provided to overcome the distributional requirements of the broadly used mean and error bar plots. A nice example is given in [9].

A minimum sample size should be ensured to obtain valid results from the Cox regression. Certainly, larger samples are preferable and may even be required for a confirmatory analysis with latency as the primary objective and/or to achieve a satisfactory power of, e.g., 80%. However, latency experiments are typically performed in an exploratory manner to transfer promising results into clinical approaches, and thus ethical concerns limit the sample size. In particular, small datasets should be used most efficiently by choosing a well-fitting approach for the statistical analysis.

## Acknowledgments

The first author thanks the Department of Public Health and Preventive Medicine of the Oregon Health & Science University, USA, for hosting the author during her research fellowship. This work was supported by JA1821/2 DFG grant (Deutsche Forschungsgemeinschaft), IIRG-05-14021 (Alzheimer’s Association), NNJ05HE63G (NASA) grants, and MH77647 (NIH).

## References

1. Gamage J, Weerahandi S. Size performance of some tests in one-way ANOVA. Commun Stat: Simulat Comput. 1998;27:625–640. [Google Scholar]

2. Keselman H, Huberty C, Lix L, Olejnik S, Cribbie R, Donahue B, et al. Statistical practices of educational researchers: an analysis of their ANOVA, Manova, and Ancova analyses. Rev Educ Res. 1998;68:350–386. [Google Scholar]

3. Benice T, Raber J. Testosterone and dihydrotestosterone differentially improve cognition in aged female mice. Learn Mem. 2009;16:479–485. [PMC free article] [PubMed] [Google Scholar]

4. du Prel J, Hommel G, Roehrig B, Blettner M. Confidence interval or *p*-value? Part 4 of a series on evaluation of scientific publications. Dtsch Arztebl Int. 2009;106:336–339. [PMC free article] [PubMed] [Google Scholar]

5. Therneau T, Grambsch P. Modeling survival data. New York: Springer; 2000. [Google Scholar]

6. Jahn-Eimermacher A. Robustness of semiparametric methods in small samples of clustered survival data with an application to maze experiments. Comput Stat Data Anal. submitted for publication. [Google Scholar]

7. Vittinghoff E, McCulloch C. Relaxing the rule of ten events per variable in logistic and Cox regression. Am J Epidemiol. 2006;165:710–718. [PubMed] [Google Scholar]

8. Faes C, Aerts M, Geys H, De Schaepdrijver L. Modeling spatial learning in rats based on Morris water maze experiments. Pharm Stat. 2010;9:10–20. [PubMed] [Google Scholar]

9. Goertz N, Lewejohann L, Tomm M, Ambree R, Keyvani K, Paulus W, et al. Effects of environmental enrichment on exploration, anxiety, and memory in female tgcrnd8 Alzheimer mice. Behav Brain Res. 2008;191:43–48. [PubMed] [Google Scholar]