Rumination 21. Experimental Design 101
posted Thursday, 5 March 2009
By Thomas P. Vogl, March 5, 2009
I have just spent a week in Boston, arriving late on Tuesday, 2/25, and finally getting home on Tuesday, 3/3, because Monday's nor'easter canceled both buses and planes (I spent six hours at the airport, after which Cape Air canceled the delayed flight). Wednesday and Thursday were devoted to scans: an echocardiogram, a CT scan, and two PET scans, one with FDG and the other with FLT. FDG is the usual PET tracer, a form of glucose tagged with a radioactive fluorine atom; FLT is fluorothymidine tagged with a radioactive fluorine atom, which identifies high thymidine kinase-1 activity. FDG allows visualization of glucose metabolism, which indicates how active the cell is; FLT allows visualization of DNA turnover, a measure of cell division (proliferation). If you think of cells as rabbits, FDG PET tells you how much they are running around and FLT PET tells you how fast they are breeding. FDG PET is a recognized and approved procedure; FLT is still experimental, and it looks as if it is more useful for blood cancers, such as the lymphomas, than for solid tumors like carcinomas, melanomas, etc.
On Friday I had the first infusion of the new drug, AUY922, an Hsp90 inhibitor, followed by endless hourly EKGs. (I suspect the plethora of EKGs is the contribution of the lawyers, who may have had more input into the experimental design than the scientists.) The only side effect was a tolerable level of diarrhea. The scans showed that in the five weeks since my previous scan my metastases had grown only slightly, despite the fact that there had been no treatment during that time and that the treatment in the preceding three to six weeks had been ineffective.
Looking back over my shoulder at the whole sequence of events since my initial surgery in July 2005, it is clear that my mucosal melanoma is exceptionally indolent, considering that mucosal melanoma is known to be especially aggressive and fast growing, even compared with dermal melanoma. Whether this is due to particular strengths in my immune system, a genetic peculiarity of my melanoma, or the effect of some combination of the Celebrex and the fish and flax oils, all of which actively reduce inflammation, is not known and cannot be determined. However, I'll take what I can get.
Why it cannot be determined is relevant to the primary topic of this rumination. An obvious experiment would be for me to stop taking the Celebrex and fish/flax oils for four to six weeks and see whether the disease progresses faster; then, as a check, resume the three 'drugs' (technically the oils are not drugs) and see whether progression slows down again. What is wrong with this scenario? There are three possible outcomes: (1) progression increases when the drugs are withdrawn and decreases when they are resumed; (2) progression increases when the drugs are withdrawn and continues to increase when they are resumed; (3) progression does not change throughout. In case 1, the conclusion that the drugs are of benefit is reasonable; in case 3, the conclusion that the drugs are of no benefit is reasonable; but in case 2, the conclusion that the drugs are of no benefit is not reasonable, because the growth while the drugs were withdrawn may have overwhelmed the immune system, so that the drugs, though effective before, are no longer up to the task. In that case we have learned nothing. It is a fundamental principle of experimental design that a well-designed experiment yields useful information irrespective of its outcome; this experiment fails that test, so it is a poorly designed experiment.
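The design test above (does every possible outcome lead to a usable conclusion?) can be made concrete with a small sketch. The outcome labels and the `conclusion` and `informative_for_every_outcome` helpers are hypothetical, written only to encode the three cases exactly as described; they are an illustration, not a formal statistical method.

```python
from typing import Optional

# The three outcomes discussed above, encoded as (change in progression when the
# drugs are withdrawn, change when they are resumed). Labels are hypothetical.
OUTCOMES = [
    ("increases", "decreases"),  # case 1
    ("increases", "increases"),  # case 2
    ("unchanged", "unchanged"),  # case 3
]


def conclusion(on_withdrawal: str, on_resumption: str) -> Optional[str]:
    """Return what can reasonably be concluded, or None if nothing is learned."""
    if on_withdrawal == "increases" and on_resumption == "decreases":
        return "drugs probably of benefit"
    if on_withdrawal == "unchanged" and on_resumption == "unchanged":
        return "drugs probably of no benefit"
    # Case 2 (and any combination not listed above): the growth during withdrawal
    # may have overwhelmed the immune system, so nothing can be concluded.
    return None


def informative_for_every_outcome(outcomes) -> bool:
    """A well-designed experiment yields a usable conclusion whatever happens."""
    return all(conclusion(w, r) is not None for w, r in outcomes)


for w, r in OUTCOMES:
    print(f"withdrawn: {w:<9}  resumed: {r:<9}  ->  {conclusion(w, r)}")
print("well designed:", informative_for_every_outcome(OUTCOMES))
```

Because case 2 maps to no conclusion, the withdraw-and-resume experiment fails the check, which is the point of the paragraph above.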
It is a dictum as fundamental to experimental design as 'Above all, do no harm' is to medicine. So, in that spirit, let us examine the design of clinical trials. To make the discussion concrete, consider an (imaginary) drug, TPV001, that animal experiments have shown to be an effective treatment (of some disease) at a dose of 60 units (U). The first step is to state the objectives of the experiment clearly. I submit that the appropriate objective is: 'Is this drug effective in humans without producing unacceptable side effects?' Put more formally: can we disprove the statement (the null hypothesis) that TPV001 is ineffective or that it causes unacceptable side effects? Let us further suppose that we can afford to test the drug on 40 individuals and that this number is sufficient to satisfy the statistical requirements.
What the drug companies have elected to do is to divorce the primary objective (does it work?) from the secondary objective (can it do so without unacceptable side effects?) and give primacy to the secondary objective. In fact, they ignore the primary objective until they have established whether the secondary objective can be met. This is called a phase I dose escalation trial. So they take their 40 patients and give the drug to the first 10 patients at a dose of 15 U. These patients exhibit no side effects and no therapeutic benefit. What has been learned from this experiment? Essentially nothing: no side effects (that's nice) and no therapeutic benefit (not a surprise at this dose). Had the drug companies been honest with these patients, they would have told them from the beginning that the chance of this dose doing them any good at all was minimal. The next group of 10 patients gets 30 U. Of the 10, two have mild side effects, say one has fatigue lasting less than 48 hours and the other mild diarrhea lasting less than a day. One of these patients shows a slight therapeutic benefit. What does this experiment show? Not much.
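A minimal sketch of the fixed-cohort schedule being described, assuming four cohorts of 10 patients with doses rising in 15 U steps from 15 U (the cohort size, patient count, and starting dose come from the example; the function name `fixed_cohort_doses` is hypothetical). It makes explicit that each patient receives a single dose level for the whole trial, so only the last cohort ever reaches the 60 U suggested by the animal work.

```python
def fixed_cohort_doses(n_patients: int = 40, cohort_size: int = 10,
                       start: int = 15, step: int = 15) -> list[int]:
    """Dose (U) each patient receives under fixed-cohort dose escalation.

    Patient i (counting from 0) belongs to cohort i // cohort_size, and every
    patient in a cohort gets that cohort's single dose for the whole trial.
    """
    return [start + step * (i // cohort_size) for i in range(n_patients)]


COHORT_SIZE = 10
doses = fixed_cohort_doses(cohort_size=COHORT_SIZE)

for c in range(len(doses) // COHORT_SIZE):
    first = c * COHORT_SIZE
    print(f"cohort {c + 1}: patients {first + 1}-{first + COHORT_SIZE} get {doses[first]} U")

# Only the last cohort ever receives the 60 U dose suggested by the animal work.
print("patients reaching 60 U:", sum(d >= 60 for d in doses))
```

The dose is a function of cohort membership alone, never of how the individual patient responded, which is the feature the rest of the argument takes issue with.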
It suggests that the drug may work in people and that at this dosage the side effects are mild and the therapeutic effect milder. The next group of 10 gets 45 U. Four have mild side effects, one has more severe but still acceptable side effects, and one has side effects severe enough to cause concern. Three patients demonstrate some therapeutic benefit, say, lack of progression. While some information can be gleaned from knowing whether the patients who had side effects were the ones who benefited from the treatment, the most likely (and usual) result is that some of the patients who benefited had no side effects. Undaunted, because the FDA has approved the protocol and it is therefore set in concrete, they forge ahead to a dose of 60 U for the last group of 10 patients. Now there are 4 severe and 4 moderate side effects and some therapeutic response in 6 patients. They conclude that four severe reactions are too many and go on to the phase II trial (therapeutic efficacy) at 45 U. At 45 U, the phase II trial fails to show adequate efficacy in a sufficient number of patients, and the drug is abandoned. This is the fate of about 80% or more of the drugs entering phase I trials. What an incredible waste of time and money! All that has really been shown is the well-known fact that people differ greatly in their response to drugs and that the average response of a group of ten patients totally fails to capture the extent of the variation.
With the same number of subjects, there is a much better way, i.e., a way that yields far more information without any added danger to the patients. In fact, compared with the current system, it diminishes the danger to the most at-risk group (the last ten subjects) and enhances the likelihood of benefit for the group least likely to benefit under the current system (the first ten subjects). As before, start the first ten patients at 15 U.
If they show no adverse reaction, escalate the dose to 30 U for each of those patients. Continue the dose escalation until the criteria used before to establish that side effects are unacceptable are reached, but on each patient individually. Note that this does not subject any patient to more risk than they would have been exposed to as part of the 40-patient trial described above (and much less risk for the last 10). But it has huge advantages: it immediately relates the dose, side effects, and therapeutic efficacy for each individual patient; it assures each patient that if there is therapeutic potential for him or her, it will be reached; and it allows an immediate initial quantitative determination not only of the average maximum tolerated dose (MTD) but also of its variance. Assume that from this first group of 10 it is determined that the MTD is 42 +/- 16 U. Then the next group of 10 patients can be started at, say, 1 SD below the mean, i.e., 42 - 16 = 26 U, and the dose escalation might be in 10 U steps instead of the 15 U steps used for the first group. By the time all four groups of 10 have been through this study, much more will be known: the MTD of each of the 40 subjects individually and its variance; the therapeutic response of each of the 40 at his or her individual MTD; the side effects of each individual at that MTD and whether they are correlated with therapeutic efficacy; and even dose-response curves. Last, but not least, each of the 40 patients will have had the same optimal opportunity to benefit from the trial. Little, if any, of such data are available from phase I studies as currently carried out.
The pharmaceutical companies are not incompetent, and they have highly skilled scientists and statisticians in their employ. The same can be said of the FDA, though possibly not quite as enthusiastically. Consequently, I find it difficult to believe that the current system is the result of negligence or stupidity.
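The per-patient escalation proposed above can be sketched as a toy model under loudly stated assumptions: each patient is given a hypothetical `tolerance` (the highest dose they can take without unacceptable side effects), any dose above it is treated as unacceptable, and `ceiling` is an assumed safety cap. The tolerance values are invented for illustration, not trial data, so the resulting estimate is not the 42 +/- 16 U of the example. The sketch escalates each patient individually, summarizes the individual MTDs by their mean and sample standard deviation, and places the next cohort's starting dose 1 SD below the mean.

```python
import statistics
from typing import Optional


def escalate(tolerance: float, start: float, step: float, ceiling: float) -> Optional[float]:
    """Raise one patient's dose step by step until side effects become unacceptable.

    `tolerance` is a hypothetical stand-in for the patient's unknown limit: any
    dose above it is treated as producing unacceptable side effects, so
    escalation stops there. `ceiling` is an assumed absolute safety cap.
    Returns the highest dose tolerated (the patient's individual MTD), or None
    if even the starting dose was not tolerated.
    """
    mtd = None
    dose = start
    while dose <= ceiling and dose <= tolerance:
        mtd = dose      # tolerated: record it and try the next step up
        dose += step
    return mtd


def next_cohort_start(mtds: list[Optional[float]]) -> tuple[float, float, int]:
    """Mean and sample SD of the individual MTDs, plus a start dose 1 SD below the mean.

    Patients who did not tolerate the starting dose (None) are simply dropped
    here; a real analysis would have to account for them.
    """
    tolerated = [m for m in mtds if m is not None]
    mean = statistics.mean(tolerated)
    sd = statistics.stdev(tolerated)
    return mean, sd, round(mean - sd)


# Hypothetical tolerances (U) for the first cohort of 10; invented, not trial data.
cohort1_tolerance = [20, 35, 50, 50, 65, 35, 20, 50, 80, 35]
mtds = [escalate(t, start=15, step=15, ceiling=90) for t in cohort1_tolerance]
mean, sd, next_start = next_cohort_start(mtds)

print("individual MTDs (U):", mtds)
print(f"MTD = {mean:.0f} +/- {sd:.0f} U; next cohort starts at {next_start} U in 10 U steps")
```

Running the next cohort would be the same `escalate` call with `start=next_start` and `step=10`; pooling the individual MTDs of all four cohorts gives the per-patient MTDs and their spread that the text describes. A real trial would also record side effects and therapeutic response at each step, which this sketch omits.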
That leaves the question of how the current state of affairs came about and why it continues. It is a mystery. As the detectives would have it, cui bono? I cannot figure it out. Maybe a reader can enlighten me; I would appreciate it. The collected Ruminations may be found at http://upislandeggs.com/Ruminations.htm, and my e-mail address is at the bottom of the page at http://upislandeggs.com/
tags: mucosal melanoma