ABSTRACT

The first edition of Design and Analysis of Cross-Over Trials quickly became the standard reference on the subject and has remained so for more than 12 years. In that time, however, the use of cross-over trials has grown rapidly, particularly in the pharmaceutical arena, and researchers have made a number of advances in both the theory and methods


CHAPTER 7


the ‘Area Under the Curve’ or AUC. The AUC is taken as a measure of exposure of the drug to the subject. The peak or maximum concentration is referred to as Cmax and is an important safety measure. For regulatory approval of bioequivalence it is necessary to show from the trial results that the mean values of AUC and Cmax for T and R are not significantly different. The AUC is calculated by adding up the areas of the regions identified by the vertical lines under the plot in Figure 7.1 using an arithmetic technique such as the trapezoidal rule (see, for example, Welling, 1986, 145–149; Rowland and Tozer, 1995, 469–471). Experience (e.g., FDA Guidance, 1992, 1997, 1999b, 2001) has dictated that AUC and Cmax need to be transformed to the natural logarithmic scale prior to analysis if the usual assumptions of normally distributed errors are to be made. Each of AUC and Cmax is analyzed separately and there is no adjustment to significance levels to allow for multiple testing (Hauck et al., 1995). We will refer to the derived variates as log(AUC) and log(Cmax), respectively.

In bioequivalence trials there should be a wash-out period of at least five half-lives of the drugs between the active treatment periods. If this is the case, and there are no detectable pre-dose drug concentrations, there is no need to assume that carry-over effects are present and so it is not necessary to test for a differential carry-over effect (FDA Guidance, 2001). The model that is fitted to the data will be the one used in Section 5.3 of Chapter 5, which contains terms for subjects, periods and treatments. Following common practice we will also fit a sequence or group effect and consider subjects as a random effect nested within sequence. An example of fitting this model will be given in the next section.

In the following sections we will consider three forms of bioequivalence: average (ABE), population (PBE) and individual (IBE).
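The derivation of AUC and Cmax from a single concentration-time profile, as described above, can be sketched in a few lines of Python. This is only an illustration of the linear trapezoidal rule: the function name `auc_trapezoid` and the sampling times and concentrations are hypothetical and are not taken from the trial data in this chapter.

```python
import math

def auc_trapezoid(times, conc):
    """Linear trapezoidal AUC from the first to the last sampling time."""
    if len(times) != len(conc) or len(times) < 2:
        raise ValueError("need at least two (time, concentration) pairs")
    auc = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("sampling times must be strictly increasing")
        # Area of the trapezoid between two adjacent sampling times.
        auc += 0.5 * dt * (conc[i] + conc[i - 1])
    return auc

# Hypothetical profile for one volunteer (time in hours, arbitrary concentration units).
times = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
conc = [0.0, 4.0, 6.0, 5.0, 3.0, 1.0]

auc = auc_trapezoid(times, conc)
cmax = max(conc)

# The derived variates that are analysed: natural logarithms of AUC and Cmax.
log_auc = math.log(auc)
log_cmax = math.log(cmax)
```

In practice the AUC is often extrapolated beyond the last sampling time; that step is omitted from this sketch.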
To simplify the following discussion we will refer only to log(AUC); the discussion for log(Cmax) is identical. To show that T and R are average bioequivalent it is only necessary to show that the mean log(AUC) for T is not significantly different from the mean log(AUC) for R. In other words we need to show that, ‘on average’, in the population of intended patients, the two drugs are bioequivalent. This measure does not take into account the variability of T and R. It is possible for one drug to be much more variable than the other, yet be similar in terms of mean log(AUC). It was for this reason that PBE was introduced. As we will see in Section 7.5, the measure of PBE that has been recommended by the regulators is a mixture of the mean and variance of the log(AUC) values (FDA Guidance, 1997, 1999a,b, 2000, 2001).

Of course, two drugs could be similar in mean and variance over the population of potential patients, but be such that they produce different effects when a patient is switched from formulation T to formulation R or vice versa. In other words there is a significant subject-by-formulation interaction. To show that this is not the case T and R have to be shown to be IBE, i.e., individually bioequivalent. The measure of IBE that has been suggested by the regulators is an aggregate measure involving the means and variances of T and R and the subject-by-formulation interaction. We will describe this measure in Section 7.4.

In simple terms PBE can be considered as a measure that permits patients who have not yet been treated with T or R to be safely prescribed either. IBE, on the other hand, is a measure which permits a patient who is currently being treated with R to be safely switched to T (FDA Guidance, 1997, 1999a,b, 2000, 2001). It is worth noting that if T is IBE to R it does not imply that R is IBE to T. The same can be said for PBE.

An important practical implication of testing for IBE is that the 2×2 cross-over trial is no longer adequate. As will be seen, the volunteers in the study will have to receive at least one repeat dose of R or T. In other words, three- or four-period designs with sequences such as [RTR,TRT] and [RTRT,TRTR], respectively, must be used.

The measures of ABE, PBE and IBE that will be described in Sections 7.2, 7.5 and 7.4 are those suggested by the regulators. Dragalin and Fedorov (1999) and Dragalin et al. (2002) have pointed out some drawbacks of these measures and suggested alternatives which have more attractive properties. We will consider these alternatives in Section 7.7.

All the analyses considered in Sections 7.2 to 7.4 are based on summary measures (AUC and Cmax) obtained from the concentration-time profiles. If testing for bioequivalence is all that is of interest, then these measures are adequate and have been extensively used in practice.
However, there is often a need to obtain an understanding of the absorption and elimination processes to which the drug is exposed once it has entered the body, e.g., when bioequivalence is not demonstrated. This can be done by fitting compartmental models to the drug concentrations obtained from each volunteer. These models not only provide insight into the mechanisms of action of the drugs, but can also be used to calculate the AUC and Cmax values. In Section 7.8 we describe how such models can be fitted using the methods proposed by Lindsey et al. (2000a).

The history of bioequivalence testing dates back to the late 1960s and early 1970s. Two excellent review articles written by Patterson (2001a, 2001b) give a more detailed description of the history, as well as a more extensive discussion of the points raised in this section. The regulatory implications of bioequivalence testing are also described by Patterson and so we do not repeat these here.

At present, average bioequivalence (see Section 7.2) serves as the international standard for bioequivalence testing using a 2×2 cross-over design. Alternative designs (e.g., replicate cross-over designs) may also be utilized for drug products to improve power (see Section 7.6). We will consider population and individual bioequivalence testing as these utilize cross-over study designs and were the subject of extensive debate in the 1990s (see Patterson, 2001b, for a summary), but these may not currently be used for access to the marketplace (FDA Guidance, 2002).

7.2 Testing for average bioequivalence

The now generally accepted method of testing for ABE is the two-one-sided-tests procedure (TOST) proposed by Schuirmann (1987). It is conveniently done using a confidence interval calculation. Let $\mu_T$ and $\mu_R$ be the (true) mean values of log(AUC) (or log(Cmax)) when subjects are treated with T and R, respectively. ABE is demonstrated if the 90% two-sided confidence interval for $\mu_T - \mu_R$ falls within the acceptance limits of $-\ln 1.25 = -0.2231$ and $+\ln 1.25 = 0.2231$. These limits are set by the regulator (FDA Guidance, 1992, 2001, 2002) and when exponentiated give limits of 0.80 and 1.25. That is, on the natural scale ABE is demonstrated if there is good evidence that:

$$0.80 \le \exp(\mu_T - \mu_R) \le 1.25.$$

We note that symmetry of the confidence interval is on the logarithmic scale, not the natural scale.

The method gets its name (TOST) because the process of deciding if the 90% confidence interval lies within the acceptance limits is equivalent to rejecting both of the following one-sided hypotheses at the 5% significance level:

$$H_{01}: \mu_T - \mu_R \le -\ln 1.25$$
$$H_{02}: \mu_T - \mu_R \ge \ln 1.25.$$

Example 7.1

The derived data given in Tables 7.1 and 7.2 are from a pharmacokinetic study that compared a test drug (T) with a known reference drug (R).
The design used was a 2×2 cross-over with 24 healthy volunteers in the RT sequence group and 25 in the TR sequence group. Each volunteer should have provided both an AUC and Cmax value. However, as can
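The equivalence between the 90% confidence-interval check and the two one-sided tests described in Section 7.2 can be illustrated with a minimal Python sketch. It assumes that an estimate of the difference in means on the log scale and its standard error are already available from the fitted model; the function name `tost_abe` and all numerical inputs below (including the critical value 1.68, an approximate upper 5% point of a t-distribution with about 47 degrees of freedom) are hypothetical and are not results from Example 7.1.

```python
import math

LOG_LIMIT = math.log(1.25)  # acceptance limits are -ln 1.25 and +ln 1.25

def tost_abe(diff_hat, se, t_crit):
    """Assess ABE from an estimated log-scale difference and its standard error.

    t_crit is the upper 5% point of the relevant t-distribution, so that
    diff_hat +/- t_crit * se is the 90% two-sided confidence interval.
    """
    lower = diff_hat - t_crit * se
    upper = diff_hat + t_crit * se
    ci_inside = (-LOG_LIMIT <= lower) and (upper <= LOG_LIMIT)

    # The same decision expressed as two one-sided tests at the 5% level.
    t_lower = (diff_hat + LOG_LIMIT) / se   # evidence against diff <= -ln 1.25
    t_upper = (LOG_LIMIT - diff_hat) / se   # evidence against diff >= +ln 1.25
    both_rejected = (t_lower >= t_crit) and (t_upper >= t_crit)

    return (lower, upper), ci_inside, both_rejected

# Hypothetical inputs: one case well inside the limits, one that fails.
ci_a, inside_a, rejected_a = tost_abe(diff_hat=-0.05, se=0.04, t_crit=1.68)
ci_b, inside_b, rejected_b = tost_abe(diff_hat=-0.15, se=0.05, t_crit=1.68)

# Limits on the natural (ratio) scale for the first case.
ratio_ci_a = (math.exp(ci_a[0]), math.exp(ci_a[1]))
```

The two decision rules agree in both cases, which is the sense in which the confidence-interval calculation is a convenient way of carrying out TOST.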


APBC 355 310 295


switched to T. It is a criterion that takes into account within-subject variability and subject-by-formulation interaction. (See FDA Guidance, 1997, 1999b, and final Guidance, 2000, 2001.) In order to define it we need some additional notation. Let $\sigma_{WT}^2$ and $\sigma_{WR}^2$ denote the within-subject variances for T and R, respectively, let $\sigma_{BT}^2$ and $\sigma_{BR}^2$ denote the between-subject variances for T and R, respectively, and let $\sigma_D^2 = \sigma_{BT}^2 + \sigma_{BR}^2 - 2\rho\sigma_{BT}\sigma_{BR}$ denote the subject-by-formulation interaction, where $\rho$ is the between-subject correlation of T and R. The means of T and R are denoted by $\mu_T$ and $\mu_R$, respectively, and $\delta = \mu_T - \mu_R$.

IBE is assessed using the following aggregate metric (FDA Guidance, 1997):

$$\frac{(\mu_T - \mu_R)^2 + \sigma_D^2 + \sigma_{WT}^2 - \sigma_{WR}^2}{\max(0.04, \sigma_{WR}^2)} \qquad (7.2)$$

which tests the following linearized null hypotheses. If $\hat{\sigma}_{WR}^2 > 0.04$:

$$H_{01}: \nu_{IBE} = \delta^2 + \sigma_D^2 + \sigma_{WT}^2 - (1 + c_{FDA})\sigma_{WR}^2 \ge 0. \qquad (7.3)$$

If $\hat{\sigma}_{WR}^2 \le 0.04$:

$$H_{02}: \nu_{C.IBE} = \delta^2 + \sigma_D^2 + \sigma_{WT}^2 - \sigma_{WR}^2 - 0.04(c_{FDA}) \ge 0. \qquad (7.4)$$

Note that the metric in (7.3) is scaled using the within-subject variance $\hat{\sigma}_{WR}^2$ from the trial and (7.4) is scaled using the constant value of 0.04. We note that when calculating metrics here, and in the following, each parameter is estimated from the trial data; hence we use hatted values (e.g., $\hat{\sigma}_{WR}^2$) for the calculations. When the metric is scaled using $\hat{\sigma}_{WR}^2$, we will refer to it as reference-scaled. When it is scaled using 0.04, we refer to it as constant-scaled. Note that if $\hat{\sigma}_{WT}^2 - \hat{\sigma}_{WR}^2 < 0$ then the contribution to the numerator of (7.2) by $\hat{\delta}^2$ and $\hat{\sigma}_D^2$ will be reduced. This is a potentially undesirable property and it is known such trade-offs have occurred in practice (Zariffa et al., 2000). Constant-scaling was introduced (FDA Guidance, 1997) as a means of keeping low-variability products from being held to what was felt to be an unreasonably strict standard for bioequivalence. The value $c_{FDA}$ in $H_{01}$ and $H_{02}$ is a regulatory goalpost equal to 2.49 (FDA Guidance, 2000).
It assumes a within-subject variance for R of 0.04 and is calculated as follows:

$$c_{FDA} = \frac{(\ln(1.25))^2 + (0.03) + (0.02)}{0.04} = 2.49 \qquad (7.5)$$

allowing for a difference in means of ln(1.25) and a variance allowance of 0.03 in the numerator for the subject-by-formulation interaction and an allowance of 0.02 for the difference in within-subject variances under the procedure proposed by the FDA (FDA Guidance, 1997). To demonstrate IBE the upper bound of a 90% confidence interval for the above aggregate metric must fall below 2.49.

The required upper bound can be calculated in at least three different ways: (1) method-of-moments estimation with a Cornish-Fisher approximation (Hyslop et al., 2000; FDA Guidance, 2001), (2) bootstrapping (FDA Guidance, 1997), and (3) by asymptotic approximations to the mean and variance of $\nu_{IBE}$ and $\nu_{C.IBE}$ (Patterson, 2003; Patterson and Jones, 2002b,c).

Method (1) derives from theory that assumes the independence of chi-squared variables and is more appropriate to the analysis of a parallel group design. Hence it does not fully account for the within-subject correlation that is present in data obtained from cross-over trials. Moreover, the approach is potentially sensitive to bias introduced by missing data and imbalance in the study data (Patterson and Jones, 2002c).

Method (2), which uses the nonparametric percentile bootstrap method (Efron and Tibshirani, 1993), was the earliest suggested method of calculating the upper bound (FDA Guidance, 1997), but it has several disadvantages. Among these are that it is computationally intensive and it introduces randomness into the final calculated upper bound. Recent modifications to ensure consistency of the bootstrap (Shao et al., 2000) do not appear to protect the Type I error rate (Patterson and Jones, 2002c) around the mixed-scaling cut-off (0.04) unless calibration (Efron and Tibshirani, 1993) is used. Use of such a calibration technique is questionable if one is making a regulatory submission.

Hence, we prefer to use method (3) and will illustrate its use shortly. We note that this method appears to protect against inflation of the Type I error rate in IBE and PBE testing, and the use of REML ensures unbiased estimates (Patterson and Jones, 2002c) in data sets with missing data and imbalance, a common occurrence in cross-over designs (Patterson and Jones, 2002a,b).
In general (Patterson and Jones, 2002a), cross-over trials that have been used to test for IBE and PBE have used sample sizes in excess of 20 to 30 subjects, so asymptotic testing is not unreasonable, and there is a precedent for the use of such procedures in the study of pharmacokinetics (Machado et al., 1999). We present findings here based on asymptotic normal theory using REML and not taking into account shrinkage (Patterson and Jones, 2002b,c). It is possible to account for this factor using the approach of Harville and Jeske (1992); see also Kenward and Roger (1997). However, this approach is not considered here in the interests of space and because the approach described below appears to control the Type I error rate for sample sizes as low as 16 (Patterson and Jones, 2002c).

In a 2×2 cross-over trial it is not possible to estimate separately the within- and between-subject variances and hence a replicate design, in which subjects receive each formulation more than once, is required.
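The arithmetic behind the regulatory constant in (7.5), and the way the linearized criterion switches between reference scaling and constant scaling, can be sketched as follows. The function names `regulatory_constant` and `linearized_ibe` and their argument names are hypothetical; the sketch simply evaluates (7.3) or (7.4) at supplied parameter values and does not estimate anything from data.

```python
import math

def regulatory_constant(ratio_limit=1.25, var_d_allow=0.03,
                        var_diff_allow=0.02, var_wr_ref=0.04):
    """Goalpost c_FDA of (7.5): allowances divided by the assumed variance for R."""
    return (math.log(ratio_limit) ** 2 + var_d_allow + var_diff_allow) / var_wr_ref

c_fda = regulatory_constant()

def linearized_ibe(delta, var_d, var_wt, var_wr, c=c_fda, cutoff=0.04):
    """Linearized IBE criterion; negative values are in the direction of IBE."""
    if var_wr > cutoff:
        # Reference-scaled form, as in (7.3).
        return delta ** 2 + var_d + var_wt - (1.0 + c) * var_wr
    # Constant-scaled form, as in (7.4).
    return delta ** 2 + var_d + var_wt - var_wr - cutoff * c

# Hypothetical parameter values on either side of the 0.04 cut-off.
nu_ref = linearized_ibe(delta=0.05, var_d=0.01, var_wt=0.06, var_wr=0.07)
nu_const = linearized_ibe(delta=0.05, var_d=0.01, var_wt=0.03, var_wr=0.03)
```

One design point worth noting is that the two forms coincide when the within-subject variance for R equals 0.04, so the criterion is continuous at the cut-off.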


BPAC 323 300 440


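variances and covariances obtained from REML are normally distributed with expectation vector and variance-covariance matrix equal to the following, respectively:

$$\begin{pmatrix} \sigma_{BT}^2 \\ \sigma_{BR}^2 \\ \omega \\ \sigma_{WT}^2 \\ \sigma_{WR}^2 \end{pmatrix}, \qquad \begin{pmatrix} l_{BT} & l_{BT\times BR} & l_{BT\times\omega} & l_{BT\times WT} & l_{BT\times WR} \\ l_{BT\times BR} & l_{BR} & l_{BR\times\omega} & l_{BR\times WT} & l_{BR\times WR} \\ l_{BT\times\omega} & l_{BR\times\omega} & l_{\omega} & l_{\omega\times WT} & l_{\omega\times WR} \\ l_{BT\times WT} & l_{BR\times WT} & l_{\omega\times WT} & l_{WT} & l_{WT\times WR} \\ l_{BT\times WR} & l_{BR\times WR} & l_{\omega\times WR} & l_{WT\times WR} & l_{WR} \end{pmatrix}.$$

When $\hat{\sigma}_{WR}^2 > 0.04$, let

$$\hat{\nu}_{IBE} = \hat{\delta}^2 + \hat{\sigma}_{BT}^2 + \hat{\sigma}_{BR}^2 - 2\hat{\omega} + \hat{\sigma}_{WT}^2 - (1 + c_{FDA})\hat{\sigma}_{WR}^2 \qquad (7.6)$$

be an estimate for the (7.3) reference-scaled metric in accordance with FDA Guidance (2001) and using a REML UN model. Then (Patterson, 2003; Patterson and Jones, 2002b), this estimate is asymptotically normally distributed and unbiased with

$$E[\hat{\nu}_{IBE}] = \delta^2 + \sigma_D^2 + \sigma_{WT}^2 - (1 + c_{FDA})\sigma_{WR}^2$$

and

$$\begin{aligned} \mathrm{Var}[\hat{\nu}_{IBE}] = {} & 4\sigma_{\delta}^2\delta^2 + l_{BT} + l_{BR} + 4l_{\omega} + l_{WT} + (1 + c_{FDA})^2 l_{WR} \\ & + 2l_{BT\times BR} - 4l_{BT\times\omega} + 2l_{BT\times WT} - 2(1 + c_{FDA})l_{BT\times WR} \\ & - 4l_{BR\times\omega} + 2l_{BR\times WT} - 2(1 + c_{FDA})l_{BR\times WR} \\ & - 4l_{\omega\times WT} + 4(1 + c_{FDA})l_{\omega\times WR} - 2(1 + c_{FDA})l_{WT\times WR}. \end{aligned}$$

Similarly, for the constant-scaled metric, when $\hat{\sigma}_{WR}^2 \le 0.04$,

$$\hat{\nu}_{C.IBE} = \hat{\delta}^2 + \hat{\sigma}_{BT}^2 + \hat{\sigma}_{BR}^2 - 2\hat{\omega} + \hat{\sigma}_{WT}^2 - \hat{\sigma}_{WR}^2 - 0.04(c_{FDA}) \qquad (7.7)$$

$$E[\hat{\nu}_{C.IBE}] = \delta^2 + \sigma_D^2 + \sigma_{WT}^2 - \sigma_{WR}^2 - 0.04(c_{FDA})$$

$$\begin{aligned} \mathrm{Var}[\hat{\nu}_{C.IBE}] = {} & 4\sigma_{\delta}^2\delta^2 + l_{BT} + l_{BR} + 4l_{\omega} + l_{WT} + l_{WR} \\ & + 2l_{BT\times BR} - 4l_{BT\times\omega} + 2l_{BT\times WT} - 2l_{BT\times WR} \\ & - 4l_{BR\times\omega} + 2l_{BR\times WT} - 2l_{BR\times WR} \\ & - 4l_{\omega\times WT} + 4l_{\omega\times WR} - 2l_{WT\times WR}. \end{aligned}$$

The required asymptotic upper bound of the 90% confidence interval can then be calculated as $\hat{\nu}_{IBE} + 1.645 \times \sqrt{\widehat{\mathrm{Var}}[\hat{\nu}_{IBE}]}$ or $\hat{\nu}_{C.IBE} + 1.645 \times \sqrt{\widehat{\mathrm{Var}}[\hat{\nu}_{C.IBE}]}$, where the variances are obtained by ‘plugging in’ the estimated values of the variances and covariances obtained from SAS proc mixed into the formulae for $\mathrm{Var}[\hat{\nu}_{IBE}]$ or $\mathrm{Var}[\hat{\nu}_{C.IBE}]$. The necessary SAS code to do this is given in Appendix B. The output reveals that $\hat{\sigma}_{WR}^2 = 0.0714$ and the upper bound is −0.060 for log(AUC). For log(Cmax), $\hat{\sigma}_{WR}^2 = 0.1060$ and the upper bound is −0.055. As both of these upper bounds are below zero, IBE can be claimed.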
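The ‘plug-in’ calculation of the asymptotic upper bound can be sketched in Python as a quadratic form in the coefficients of the linearized metric. This is an illustration only and is not the SAS code of Appendix B: the function name `ibe_upper_bound`, the assumed ordering of the variance components (between-subject T, between-subject R, covariance, within-subject T, within-subject R) and all numerical inputs are hypothetical, and the estimate of the mean difference is treated as independent of the variance-component estimates.

```python
import math

def ibe_upper_bound(delta, var_delta, theta, cov, c=2.49, cutoff=0.04, z=1.645):
    """Point estimate and asymptotic 90% upper bound for the linearized IBE metric.

    theta: estimates ordered as (s2_BT, s2_BR, omega, s2_WT, s2_WR)
    cov:   5x5 asymptotic variance-covariance matrix of theta (list of lists)
    """
    s2_wr = theta[4]
    if s2_wr > cutoff:
        coef = [1.0, 1.0, -2.0, 1.0, -(1.0 + c)]   # reference-scaled
        offset = 0.0
    else:
        coef = [1.0, 1.0, -2.0, 1.0, -1.0]         # constant-scaled
        offset = -cutoff * c
    nu_hat = delta ** 2 + sum(a * t for a, t in zip(coef, theta)) + offset

    # Variance of the linear combination of variance components: coef' * cov * coef.
    quad = sum(coef[i] * coef[j] * cov[i][j] for i in range(5) for j in range(5))
    # Delta-method variance of delta**2 is 4 * delta**2 * Var(delta_hat).
    var_nu = 4.0 * var_delta * delta ** 2 + quad
    return nu_hat, nu_hat + z * math.sqrt(var_nu)

# Hypothetical inputs: a diagonal covariance matrix keeps the example easy to check.
theta = (0.10, 0.10, 0.09, 0.05, 0.07)
cov = [[1e-4 if i == j else 0.0 for j in range(5)] for i in range(5)]
nu_hat, upper = ibe_upper_bound(delta=-0.05, var_delta=0.002, theta=theta, cov=cov)
ibe_claimed = upper < 0.0
```

Expanding the quadratic form term by term reproduces the pattern of coefficients in the variance expressions above, for example the factor of 4 on the covariance term and the factor of (1 + c)² on the within-subject variance for R.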


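7.5 Population bioequivalence

As noted in Section 7.1, population bioequivalence (PBE) is concerned with assessing whether a patient who has not yet been treated with R or T can be prescribed either formulation. It can be assessed using the following aggregate metric (FDA Guidance, 1997):

$$\frac{(\mu_T - \mu_R)^2 + \sigma_T^2 - \sigma_R^2}{\max(0.04, \sigma_R^2)} \qquad (7.8)$$

where $\sigma_T^2 = \sigma_{WT}^2 + \sigma_{BT}^2$ and $\sigma_R^2 = \sigma_{WR}^2 + \sigma_{BR}^2$. As long as an appropriate mixed model is fitted to the data, this metric can be calculated using data from a 2×2 design or from a replicate design. Using data from Sections 7.2 and 7.4, we will illustrate the calculation of the metric in each of the two designs.

7.5.1 PBE using a 2×2 design

As in the previous section we will test for equivalence using a linearized version of the metric and test the null hypotheses:

$$H_{01}: \nu_{PBE} = \delta^2 + \sigma_T^2 - (1 + c_{FDA})\sigma_R^2 \ge 0, \qquad (7.9)$$

when $\sigma_R^2 > 0.04$, or

$$H_{02}: \nu_{C.PBE} = \delta^2 + \sigma_T^2 - \sigma_R^2 - c_{FDA}(0.04) \ge 0, \qquad (7.10)$$

when $\sigma_R^2 \le 0.04$, where $\sigma_T^2$ and $\sigma_R^2$ are the between-subject variances of T and R, respectively. Let $\omega$ denote the between-subject covariance of T and R and $\sigma_{\delta}^2$ denote the variance of $\hat{\delta} = \hat{\mu}_T - \hat{\mu}_R$. The REML estimates of $\sigma_T^2$, $\omega$ and $\sigma_R^2$, obtained from using the SAS code in Appendix B, are asymptotically normally distributed with expectation vector

$$\begin{pmatrix} \sigma_T^2 \\ \omega \\ \sigma_R^2 \end{pmatrix}$$

and variance-covariance matrix

$$\begin{pmatrix} l_T & l_{T\times\omega} & l_{T\times R} \\ l_{T\times\omega} & l_{\omega} & l_{R\times\omega} \\ l_{T\times R} & l_{R\times\omega} & l_R \end{pmatrix}.$$

Then

$$\hat{\nu}_{PBE} = \hat{\delta}^2 + \hat{\sigma}_T^2 - (1 + c_{FDA})\hat{\sigma}_R^2 \qquad (7.11)$$

is an estimate for the reference-scaled PBE metric in accordance with FDA Guidance (2001) when $\hat{\sigma}_R^2 > 0.04$ and using a REML UN model. This estimate is asymptotically normally distributed and unbiased (Patterson, 2003; Patterson and Jones, 2002b) with $E[\hat{\nu}_{PBE}] = \delta^2 + \sigma_T^2 - (1 + c_{FDA})\sigma_R^2$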


FDA Guidance (2001) using a REML UN model. Then, this estimate is asymptotically normally distributed and unbiased, with

$$E[\hat{\nu}_{C.PBE}] = \delta^2 + \sigma_T^2 - (\sigma_R^2) - 0.04(c_{FDA})$$

and has variance

$$\begin{aligned} \mathrm{Var}[\hat{\nu}_{C.PBE}] = {} & 4\sigma_{\delta}^2\delta^2 + l_{BT} + l_{WT} + l_{BR} + l_{WR} + 2l_{BT\times WT} \\ & - 2l_{BT\times BR} - 2l_{BT\times WR} - 2l_{BR\times WT} - 2l_{WT\times WR} + 2l_{BR\times WR}. \end{aligned}$$

To assess PBE we ‘plug in’ estimates of $\delta$ and the variance components and calculate the upper bound of an asymptotic 90% confidence interval. If this upper bound is below zero we declare that PBE has been shown. Using the code in Appendix B and the data in Section 7.4, we obtain the value −0.24 for log(AUC) and the value −0.19 for log(Cmax). As both of these are below zero, we can declare that T and R are PBE.

7.6 ABE for a replicate design

Although ABE can be assessed using a 2×2 design, it can also be assessed using a replicate design. If a replicate design is used the number of subjects can be reduced to as little as half that required for a 2×2 design. In addition it permits the estimation of $\sigma_{WT}^2$ and $\sigma_{WR}^2$. The SAS code to assess ABE for a replicate design is given in Appendix B. Using the data from Section 7.4, the 90% confidence interval for $\mu_T - \mu_R$ is (−0.1697, −0.0155) for log(AUC) and (−0.2474, −0.0505) for log(Cmax). Exponentiating the limits to obtain confidence limits for $\exp(\mu_T - \mu_R)$ gives (0.8439, 0.9846) for AUC and (0.7808, 0.9508) for Cmax. Only the first of these intervals is contained within the limits of 0.8 to 1.25, therefore T cannot be considered average bioequivalent to R.

To calculate the power for a replicate design with four periods and with a total of n subjects we can still use the SAS code given in Section 7.3, if we alter the formula for the variance of a difference of two observations from the same subject. This will now be $\sigma_D^2 + \sigma_W^2$ instead of $2\sigma_W^2$, where $\sigma_D^2$ is the subject-by-formulation interaction. Note the use of $\sigma_W^2$ rather than $2\sigma_W^2$ as used in the RT/TR design. This is a result of the estimator using the average of two measurements on each treatment on each subject.

One advantage of using a replicate design is that the number of subjects needed can be much smaller than that needed for a 2×2 design. As an example, suppose that $\sigma_D^2 = 0$, and we take $\sigma_W = 0.355$ and $\alpha = 0.05$, as done in Section 7.3. Then a power of 90.5% can be achieved with only 30 subjects, which is about half the number (58) needed for the 2×2 design.
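The back-transformation of the replicate-design confidence limits quoted in Section 7.6, and the comparison with the limits of 0.8 and 1.25, can be reproduced with a short Python sketch. The function name `abe_on_ratio_scale` is hypothetical; the log-scale limits are the ones reported in the text.

```python
import math

def abe_on_ratio_scale(ci_log, limits=(0.80, 1.25)):
    """Exponentiate a log-scale confidence interval and check it against the limits."""
    lower, upper = math.exp(ci_log[0]), math.exp(ci_log[1])
    return (lower, upper), (limits[0] <= lower and upper <= limits[1])

# 90% confidence intervals on the log scale, as reported in Section 7.6.
auc_ci, auc_ok = abe_on_ratio_scale((-0.1697, -0.0155))
cmax_ci, cmax_ok = abe_on_ratio_scale((-0.2474, -0.0505))

# ABE requires both AUC and Cmax to satisfy the criterion.
abe_shown = auc_ok and cmax_ok
```

The interval for Cmax has a lower limit just under 0.80, which is why ABE is not concluded even though the interval for AUC lies comfortably inside the limits.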


    * for a replicate design account for VarD;
    * for example here, set VarD=0;
    varD=0.0;
    s = sqrt(varD + sigmaW*sigmaW);

It is worth noting here that REML modelling in replicate designs and the resulting ABE assessments are sensitive to the way in which the variance-covariance matrix is constructed (Patterson and Jones, 2002a). The recommended FDA procedure (FDA Guidance, 2001) provides biased variance estimates (Patterson and Jones, 2002c) in certain situations; however, it also constrains the Type I error rate to be less than 5% for average bioequivalence due to the constraints placed on the variance-covariance parameter space, which is a desirable property for regulators reviewing such data.

7.7 Kullback–Leibler divergence

Dragalin and Fedorov (1999) and Dragalin et al. (2002) pointed out some disadvantages of using the metrics for ABE, PBE and IBE that we have described in the previous sections, and proposed a unified approach to equivalence testing based on the Kullback–Leibler divergence (KLD) (Kullback and Leibler, 1951). In this approach bioequivalence testing is regarded as evaluating the distance between two distributions of selected pharmacokinetic statistics or parameters for T and R. For example, the selected statistics might be log(AUC) or log(Cmax), as used in the previous sections. To demonstrate bioequivalence, the following hypotheses are tested:

$$H_0: d(f_T, f_R) > d_0 \quad \text{vs.} \quad H_a: d(f_T, f_R) \le d_0, \qquad (7.15)$$

where $f_T$ and $f_R$ are the appropriate density functions of the observations from T and R, respectively, and $d_0$ is a pre-defined boundary or goalpost. Equivalence is concluded if the null hypothesis is rejected. For convenience the decision is based on the upper bound of a 90% confidence interval, $d_U$. If $d_U < d_0$ then bioequivalence is accepted; otherwise it is rejected.

Under the assumption that T and R have the same variance, i.e., $\sigma_T^2 = \sigma_R^2 = \sigma^2$, the KLD for ABE becomes

$$d(f_T, f_R) = \frac{(\mu_T - \mu_R)^2}{\sigma^2}$$

which differs from the (unscaled) measure defined in Section 4.2.
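The equal-variance result for ABE can be checked numerically from the standard closed-form Kullback–Leibler divergence between two normal densities. The sketch below assumes that the KLD used here is the symmetrized version, i.e., the sum of the divergences taken in both directions; that reading, and the function names `kl_normal` and `symmetric_kld`, are assumptions made for illustration.

```python
import math

def kl_normal(mu1, var1, mu2, var2):
    """KL divergence from N(mu1, var1) to N(mu2, var2), in closed form."""
    return 0.5 * (math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def symmetric_kld(mu_t, var_t, mu_r, var_r):
    """Symmetrized divergence: the sum of the two directed KL divergences."""
    return kl_normal(mu_t, var_t, mu_r, var_r) + kl_normal(mu_r, var_r, mu_t, var_t)

# Hypothetical values with a common variance: the divergence reduces to
# (mu_T - mu_R)^2 / sigma^2 = 0.1^2 / 0.04 = 0.25.
d_equal = symmetric_kld(mu_t=0.1, var_t=0.04, mu_r=0.0, var_r=0.04)

# Hypothetical values with unequal variances, for comparison.
d_unequal = symmetric_kld(mu_t=0.1, var_t=0.09, mu_r=0.0, var_r=0.04)
```

Because the log terms cancel in the sum, the symmetrized divergence depends on the variances only through their ratios and on the mean difference only through its square scaled by the variances.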
If the statistics (e.g., log(AUC)) for T and R are normally distributed with means $\mu_T$ and $\mu_R$, respectively, and variances $\sigma_T^2 = \sigma_{WT}^2 + \sigma_{BT}^2$ and $\sigma_R^2 = \sigma_{WR}^2 + \sigma_{BR}^2$, respectively, the KLD for PBE is

$$d(f_T, f_R) = \frac{1}{2}\left\{(\mu_T - \mu_R)^2 + \sigma_T^2 + \sigma_R^2\right\}\left(\frac{1}{\sigma_T^2} + \frac{1}{\sigma_R^2}\right) - 2. \qquad (7.17)$$

For IBE, the KLD is

$$d(f_T, f_R) = \frac{1}{2}\left\{(\mu_T - \mu_R)^2 + \sigma_D^2 + \sigma_{WT}^2 + \sigma_{WR}^2\right\}\left(\frac{1}{\sigma_{WT}^2} + \frac{1}{\sigma_{WR}^2}\right) - 2, \qquad (7.18)$$

where $\sigma_D^2 = \mathrm{Var}(s_T - s_R) = \sigma_{BT}^2 + \sigma_{BR}^2 - 2\rho\sigma_{BT}\sigma_{BR}$.

Advantages of using the KLD are that it: (1) possesses a natural hierarchical property such that IBE implies PBE and PBE implies ABE, (2) satisfies the properties of a true distance metric, (3) is invariant to monotonic transformations of the data, (4) generalizes easily to the multivariate case where equivalence on more than one parameter (e.g., AUC, Cmax and Tmax) is required and (5) is applicable over a wide range of distributions of the response variable (e.g., those in the exponential family).

Patterson et al. (2001) and Dragalin et al. (2002) described the results of a simulation study and a retrospective analysis of 22 replicate design datasets, to compare testing for IBE using the KLD with testing based on the FDA-recommended metric defined earlier in Section 7.4. One notable finding of these studies was that the KLD metric identified more datasets as being of concern than the FDA-recommended metric. This appeared to be due to the ability of the FDA-recommended metric to reward novel formulations for which the within-subject variance is decreased relative to the reference.

7.8 Modelling pharmacokinetic data

Although AUC and Cmax are adequate for testing for bioequivalence, there is sometimes a need to model the drug concentrations over time. Fitting such models aids the understanding of how the drug is absorbed and eliminated from the body, as well as allowing model-based estimates of AUC and Cmax to be obtained. A popular type of model that is fitted in these circumstances is the compartmental model, which considers the body as made up of a number of compartments through which the drug circulates. For example, the circulating blood might be considered as the single compartment in a one-compartment model.
If a drug is taken orally as a tablet, say, the drug is absorbed into this compartment as the tablet dissolves in the stomach and is eliminated from this compartment by (among other things) the actions of the liver and kidneys. While the tablet is still being dissolved in the stomach, the rate of absorption of the drug into the circulating blood is greater than the rate that is eliminated


concentration, the concentrations are usually analysed on the logarithmic scale. In other words, it is assumed that the log-transformed data are normally distributed. Lindsey et al. (2000a) argue that this may not be reasonable and suggest that the raw concentrations might be modelled using a distribution from a particular family of skewed distributions that include the gamma and Weibull. They further suggested that the dispersion parameter of the chosen distribution be modelled by the function k − −k = ′ ′ − ke ) where t is time, V and k correspond to V, k, respectively, and δ is an additional parameter.

Lindsey et al. (2000a) also showed how censored data can be included in the model. Censoring usually occurs when the later concentrations of drug in the blood cannot be measured because they fall below some limit of detection in the measuring device. (Censoring can also occur if samples fall above some upper limit of detection.) A common practice for dealing with censored data is to replace the missing value with a fixed value, such as half the limit of detection. Lindsey et al. (2000a) give an example of the analysis of some PK data and show that the gamma distribution allowing for censoring is a better fit than using the lognormal distribution and replacing the censored data with a fixed value.

An interesting feature of the example considered by Lindsey et al. (2000a) is that the concentration of a metabolite was also measured on each subject. Lindsey et al. (2000b) show how the drug and its metabolite can be simultaneously modelled. Further analyses of this example are described in Lindsey et al. (1999), Lindsey and Jones (2000) and Lindsey et al. (2001).