Making Sense of Antidepressant Outcome Studies
Do Antidepressants Really Work?
I have written previously about some issues regarding antidepressant medication. This prompted some questions about the effectiveness of this kind of medication. It has occurred to me that the best way to respond would be to explain the apparent paradox: antidepressant medications (ADMs) are very widely used, yet many studies show only modest benefit. Some studies show no benefit at all. So if the drugs don't work very well, why are so many people taking them?
To understand this apparent paradox, it is first necessary to understand where medical knowledge comes from, how it is evaluated for usefulness, and how useful knowledge is applied to patient care. This process has been formalized into what is known as Evidence-Based Medicine (EBM). A good review, as concise as any could be, can be found here. To borrow their opening:
EBM is the integration of clinical expertise, patient values, and the best evidence into the decision making process for patient care. Clinical expertise refers to the clinician's cumulated experience, education and clinical skills. The patient brings to the encounter his or her own personal and unique concerns, expectations, and values. The best evidence is usually found in clinically relevant research that has been conducted using sound methodology. (Sackett, D.)
[link added]
(A more technical explanation of EBM can be found at the Centre for Evidence-Based Medicine.)
Much of what has been written about EBM is focused on the interpretation of clinical evidence. Most good, relevant clinical evidence comes from treatment studies. Often, the results of the studies appear to be fairly easy to interpret. However, the outcome of a study can only be understood fully by placing it in a clinical context. That is what Sackett means when he talks about integrating clinical expertise, patient values, and the best evidence. The evidence by itself does not mean much. It has to be interpreted before it can be used. That is where the challenge comes from: if you do not have clinical experience, how do you interpret the research in a clinical context?
I will try to explain how a patient can place the clinical research data into a clinical perspective. When trying to decide whether to take a medication, the question of greatest interest is this: what will happen to me, in the future, if I take this medication? The research does not answer this question directly. Rather, it shows what happened to other people, in the past, when they took the medication. The problem here is obvious. How do you use information about other people, in the past, to try to predict what will happen to you, in the future? To do this, you have to understand something about how research is done, and how it differs from clinical practice. This still does not enable you to predict the future, but it can help you assess what is likely to happen (as opposed to what will happen) if you take the medication.
Studies are always done according to a protocol. The protocols are designed to make it hard to show a positive treatment effect. The reason for this is that research always starts with the assumption that the medication has no effect. The burden of proof is on the researcher to show that the medication does, in fact, have an effect. In practical terms, this means that several aspects of the design of the study make it hard to show a positive effect from medication. The idea is that if the researcher can show an effect -- even with the cards stacked against him or her -- then it probably is a real effect.
In order to show a positive medication effect, the group of patients in the study is divided into two groups. This is done by some method of randomization, in order to control for variables. One group gets the medication; the other gets a placebo. Otherwise, both groups are treated exactly alike. The patients do not know whether they are getting active drug or placebo, and the researchers who are assessing them do not know who is getting the drug.
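To make the mechanics concrete, here is a minimal sketch of random assignment, in Python. Everything in it -- the patient IDs, the even split, the function itself -- is my own illustration, not any actual trial's software; real trials use dedicated randomization systems, often with stratification and blocking.

```python
import random

def randomize(patient_ids, seed=42):
    """Randomly assign patients to drug or placebo arms.

    Illustrative only. The assignment list is held separately from
    the people doing the assessments, so neither patients nor raters
    know who is in which arm (double-blinding).
    """
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"drug": shuffled[:half], "placebo": shuffled[half:]}

# Hypothetical patient IDs for a twenty-person study:
arms = randomize([f"patient-{i:02d}" for i in range(1, 21)])
print(len(arms["drug"]), len(arms["placebo"]))  # 10 10
```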
The first method of stacking the deck is this: the research protocol gives the doctor who prescribes the medication no choice about which medication to give. Then, after the medication has been started, any changes in the dose have to be made according to a pre-established set of guidelines. In order for the study to be valid, it is necessary to eliminate as many variables as possible. The rigidity of the dosing schedule helps to minimize the variables. However, it also gives rise to the biggest difference between a research study and routine clinical practice. In a clinical setting, the doctor evaluates the patient, decides whether or not to give medication, selects the best medication, and adjusts it according to outcome. To use an analogy, the research study is like a WWII bazooka: you pull the trigger, and whatever happens, happens. Maybe you hit the target, maybe not. The use of medication in a clinical setting is more like an optically-tracked, wire-guided missile. You get to make adjustments after you pull the trigger, so you are more likely to hit the target. The rigidity of the design of the research study helps make sure the results are interpretable, but it makes it harder to show a positive outcome.
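The contrast can be sketched in code. This is a toy, with a made-up fixed schedule and made-up adjustment rules; it is meant only to show the structural difference between protocol-driven and feedback-driven dosing, not to describe real prescribing practice.

```python
# Hypothetical fixed titration schedule: mg at each pre-specified week.
PROTOCOL_SCHEDULE = {0: 20, 4: 40}

def protocol_dose(week):
    """Study dosing: the dose changes only at pre-specified weeks,
    no matter how the individual patient is doing."""
    latest_change = max(w for w in PROTOCOL_SCHEDULE if w <= week)
    return PROTOCOL_SCHEDULE[latest_change]

def clinical_dose(current_dose, response, side_effects):
    """Feedback-driven dosing, as in routine practice. The rules and
    thresholds here are invented for illustration only."""
    if side_effects == "intolerable":
        return current_dose / 2   # back off
    if response == "poor":
        return current_dose * 2   # push higher
    return current_dose           # stay the course

print(protocol_dose(week=3))              # 20 -- the protocol ignores how week 3 went
print(clinical_dose(20, "poor", "none"))  # 40 -- adjusted to this patient's response
```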
The second method of stacking the deck also is done in order to minimize variables. Patients are selected for the study according to rigid guidelines. This factor is more important than it may seem at first. Most studies exclude everyone with multiple medical problems, active or recent substance abuse, or anyone who is too young or too old. They exclude people who are already taking certain medications. Often there are so many selection criteria that only a small percentage of applicants will be enrolled. I once read an article in which the authors determined that only about 15% of the people who applied to participate in a study actually met the criteria to be included. This means that most studies are done on a highly selected group of people. This group will not be representative of the general population. There actually are two selection processes: first, patients select themselves, by deciding whether to volunteer for the study. As a result, the group of applicants already differs from the general population in a systematic way. Then, the applicants are screened according to the study design. This results in a relatively homogeneous group, but one that differs from the general population in some important ways.
Often, people who volunteer for studies are at least a little bit desperate. Enrolling in a study entails a willingness to accept an unknown risk. This means that people with mild cases of an illness are not likely to apply to participate. Likewise, people who already have had a good response to an existing treatment are not likely to sign up for the study. What this means is that the group of people in the study will, on average, consist of people who are harder to treat than the average patient would be. In most kinds of illness, milder forms of the illness are more common than severe forms. The mild cases tend to respond best to treatment, but they tend not to be included in the studies. As a result, the process of selection makes it harder to show a positive outcome.
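To see how much this double selection can matter, here is a toy simulation with completely made-up numbers: each person in a fictional population has some probability of responding to treatment, and the easy responders rarely make it into the study.

```python
import random
import statistics

rng = random.Random(1)

# Fictional population: each value is one person's probability of
# responding to treatment; mild, highly responsive cases are common.
population = [rng.betavariate(4, 2) for _ in range(10_000)]

# Self-selection plus entry criteria screen out the easy responders:
# mild cases, and people already doing well on an existing treatment.
study_sample = [p for p in population if p < 0.6]

print(round(statistics.mean(population), 2))    # about 0.67
print(round(statistics.mean(study_sample), 2))  # noticeably lower, about 0.46
```

The drug is the same in both groups; only the selection differs, and the study sample is measurably harder to treat.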
So far, we have seen that there are two factors that make it harder to show a good result with the medication, when a research study is done. There are two more to consider.
The third method of stacking the deck is to use a statistical method called last-observation-carried-forward analysis (LOCF). This means that if a patient drops out of the study, or is disqualified in the middle of the study, for any reason, the amount of progress the patient had made at the time of the last observation is the amount used in the final analysis of the data. This tends to make it harder to show a good treatment outcome, because often the patients who dropped out would have shown a better response if they had stayed on the medication longer.
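Here is what LOCF amounts to in practice: a minimal sketch, with invented weekly symptom scores (lower is better) and None marking the visits after a dropout.

```python
def locf(scores):
    """Last observation carried forward: fill each missing visit
    with the most recent observed value."""
    filled, last = [], None
    for score in scores:
        if score is not None:
            last = score
        filled.append(last)
    return filled

# A patient improving on the drug who drops out after week 2:
# the week-2 score (18) is frozen in place, so the analysis never
# sees the further improvement weeks 3-5 might have shown.
print(locf([24, 21, 18, None, None, None]))  # [24, 21, 18, 18, 18, 18]
```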
The fourth method of stacking the deck is to provide active treatment to the patients who get placebo. Most people assume this does not happen, and it applies more in psychiatry than in other fields. Placebo is supposed to mimic the absence of treatment, right? No. Because the studies are done on humans, and humans have to be treated ethically, it is necessary to educate all study participants about their diagnosis, the methods of treatment available, why certain treatments might be chosen over others, and so forth. In psychiatric illness, this educational process has a positive treatment effect by itself. Also, as patients are assessed at regular intervals throughout the study, they are questioned about their symptoms. This process tends to improve their insight. Thus, the very act of asking questions has a therapeutic effect. This works because the more people look inward to understand what is happening to them, the more effectively they can devise coping strategies. A discussion of this is embedded in the article, Finding the Signal through the Noise: The Use of Surrogate Markers (by Sheldon Preskorn; for masochists only). You might think that this would not affect the outcome of the study, since the same factors apply to the patients getting placebo and the patients getting active medication. What Preskorn points out in his article is that the introduction of an active treatment into the placebo group increases the statistical noise, which makes it harder to demonstrate a treatment effect.
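The noise argument is easy to demonstrate with a toy simulation (my sketch, not Preskorn's analysis; all the numbers are invented): hold the true drug effect constant, increase the outcome variability, and watch how often a simulated study detects the effect.

```python
import random
import statistics

def detection_rate(effect, noise_sd, n=50, trials=1000, seed=0):
    """Fraction of simulated studies in which a true drug effect is
    statistically detectable (rough two-sided z-test at p < 0.05)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        placebo = [rng.gauss(0, noise_sd) for _ in range(n)]
        drug = [rng.gauss(effect, noise_sd) for _ in range(n)]
        diff = statistics.mean(drug) - statistics.mean(placebo)
        se = ((statistics.variance(drug) + statistics.variance(placebo)) / n) ** 0.5
        hits += abs(diff) / se > 1.96
    return hits / trials

# Same true effect in both runs; only the noise changes.
print(detection_rate(effect=3, noise_sd=5))   # detected most of the time
print(detection_rate(effect=3, noise_sd=12))  # detected far less often
```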
So far, I have written about four factors in the design of research studies that can introduce a bias in the study results. In most cases, the bias is going to make it harder to show that the medication has a positive effect. Let me give some examples. This should make the article more interesting and, I hope, help clarify the topic.
My first day after medical school, I began specialty training. I was assigned to a unit in a veterans' hospital. At the end of three months there, before going on to my next placement, I sat down with my supervisor to review the experience. I had counted up all my patients and divided them into three groups. I figured that about a third of them had gotten a lot better, a third had gotten somewhat better, and the remaining third really had not seemed to get much benefit. To my surprise, my supervisor said that those results were pretty good. He pointed out that people don't come to the hospital, in general, unless they already have shown a lack of progress with less intensive treatment. So my entire patient population had been pre-selected to include only people who had a fair to poor prognosis.
In contrast, when I finished training, I spent part of my time in a college counseling service. Only a small proportion of the students at the clinic were referred to me. Most of them were people who had seen a therapist for a while, had not shown much improvement with psychotherapy alone, and were sent to me as a result. In that way, the student group also had been pre-selected, but the starting population was a much healthier group overall. They were young, usually did not have any medical problems, did not have to worry about homelessness or hunger, and had not had anyone shoot at them. Few had any substance abuse problems; those who did tended to have a short history of binge drinking, not a long history of daily drinking. Also, as students at a competitive college, they were relatively high-achieving people. They tended to be intelligent, and to be good candidates for psychotherapy. Furthermore, because only the ones who stuck with psychotherapy were referred to me, I tended to see people who were more likely than average to stick with their treatment. Also, because they continued to see a therapist while I was prescribing medication, they had frequent contact and more incentive to remain on the medication, and any problems that arose could be spotted quickly. In short, this was an ideal setting in which to demonstrate that the medication had a positive effect. Although I did not formally keep track of the outcomes, I had the impression that almost everyone who took an antidepressant got significantly better.
These examples represent extreme ends of a spectrum. The veteran population probably was, on average, more seriously ill than the population in a typical drug study. The college kids were much healthier. A drug company could do a study in a population of college kids, and they could easily show a good result. They don't do that, though, because nobody would take the study very seriously. When doctors read the results of the study, they try to see how closely the study population matches their own patient population. Then they interpret the results in the context of their own practice. It would be difficult for a patient to do this, because it takes a lot of clinical experience to comprehend the full spectrum of severity of illness.
At this point, we have seen that antidepressant drug studies are designed to make it difficult to show a positive effect, and that the probability of a medication response, in a group, depends greatly on the nature of the group. This helps to explain why so many studies show a relatively modest treatment effect. We also have seen that doctors who treat patients in a clinical setting (as opposed to a research setting) are free to select medications and make treatment adjustments as they see fit. This gives the patients treated in a clinical setting a better chance of getting a good result.
There is one more factor to consider. When a drug is first released, no one really knows how best to use it. Over time, doctors learn more about which medications are better for which patients, what adverse effects to look for, and how to manage those adverse effects. They also learn more about how to adjust dosages.
When a drug is first released, the manufacturer declares a certain dosage range to be the recommended range. It seems as though the initial recommendations are always wrong. This is because there has not yet been enough experience with the drug to establish the ideal range. Also, the manufacturer will always recommend a certain starting dose. This is usually correct for the type of patient who was in the study. But clinicians often find, with experience, that it is better to start some patients at higher doses, and some at lower doses. This kind of clinical experience adds to the effectiveness of the medication. It is still the same medication, but the person prescribing it is able to prescribe it in a way that is more effective and that lowers the risk of adverse effects.
A case in point involves the popular antidepressant, Prozac (fluoxetine). What I am about to say is anecdotal, and may not be 100% accurate, but it illustrates the point. When Prozac was introduced, the only other antidepressants had much higher probabilities of causing unacceptable adverse effects. Eli Lilly (the company) knew that their product had a lower adverse effect burden. They also knew that the most frequent cause of treatment failure with antidepressants was the use of inadequate doses. So they calculated an initial recommended dose of 20mg. This was intended to be enough for the majority of patients. They made only a 20mg capsule. It was rumored that they knew this was more than some people would need, but they did not want to make it easy to give too small a dose. This, they thought, would improve the results -- on average. They probably were right. It is likely that millions of people got a good result from the medication because of this strategy. Otherwise, they might have started at a lower dose, and given up before ever getting to an adequate dose.
The problem was that some patients really need to start at a lower dose. They get unacceptable adverse effects if started at a higher dose. This tends to occur mostly in patients who have a lot of anxiety. Even though Prozac can reduce anxiety, some people get a transient worsening if they start at the full dose. Consequently, some people ended up doing poorly with Prozac when they would have done better if started at 5 or 10mg. I believe that the number of people who benefited from starting right at 20mg was higher than the number who had problems because no smaller doses were available. Still, that is no comfort to the people who had problems at the 20mg starting dose.
Doctors figured this out pretty quickly. For a while, if someone had a lot of anxiety, I would have them open up the capsule, pour the contents back and forth between the two halves until they were approximately equal, plug the half-capsules with peanut butter, and take the smaller dose that way. I also had some people dissolve the capsule contents in apple juice (it has to be slightly acidic in order to dissolve). One cup of apple juice would yield four 2-ounce doses, with 5mg in each dose. Some years later, Lilly came out with a scored 10mg tablet. This made it simpler to start people at lower doses.
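For anyone checking the arithmetic of the apple-juice trick, here it is spelled out (the function and its names are my own illustration):

```python
def dose_per_serving(capsule_mg, total_oz, serving_oz):
    """Milligrams of drug in each serving of a uniform dilution."""
    return capsule_mg * serving_oz / total_oz

# One 20mg capsule dissolved in one cup (8 ounces) of apple juice:
print(dose_per_serving(capsule_mg=20, total_oz=8, serving_oz=2))  # 5.0mg per 2-ounce dose
```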
I do not mean to blame Lilly for this. It is likely that their premarketing studies screened out people with depression and panic disorder, or depression and generalized anxiety disorder, or depression and posttraumatic stress disorder. This would have made sense, as you want a homogeneous group in a study. But such a population is not typical of a routine clinical setting. In routine office practice, quite a lot of depressed patients have an anxiety disorder as well. There is no way Lilly could have known about this problem, because of the way the studies are constructed. Furthermore, Lilly did not have any choice about the way they did the early studies. If they had not screened the patients to be a "pure culture" of depression, the FDA would have rejected the study. Another complication is that, even after it was learned that a smaller pill was needed, Lilly had to go through a complex and expensive process of getting FDA approval for the smaller dose.
The next antidepressant to come out had two strengths right away. Zoloft was released with a 50mg scored tablet and a 100mg scored tablet. The next one, Paxil, came out with 20mg, 30mg, and 40mg tablets. Pfizer later started making a 25mg scored Zoloft tablet. SmithKline Beecham (now part of GlaxoSmithKline, or GSK) came out with a 10mg Paxil tablet. All three now are available in a liquid form that permits any dosage to be given easily. No more apple juice.
This has turned out to be longer than I had hoped, but I think that some of the issues are complex enough that there isn't any quick way to explain them. Even at this length, the post is really an oversimplification. In particular, the details of research design and statistical evaluation of results are a lot more complicated than what I presented here. I am hopeful that I hit the right balance between simplicity and technical detail to be useful.