New Data Reveal the Full Extent of STAR*D Failure

The initial study, which has been used to promote antidepressants, employed outcome switching to hide poor results.


Psychiatrists tout the STAR*D trial as strong evidence for the use of antidepressant drugs. It was a real-world study of over 4,000 people with depression who were able to receive up to four different trials of antidepressant drugs. The STAR*D researchers reported that over the course of the study, more than two-thirds of the patients had remitted (no longer had depression).

Since the 2006 publication of the STAR*D results in the American Journal of Psychiatry (AJP), however, those researchers have been criticized for misleading the public about the true remission rates in the study.

Now, in a new study, researchers were able to obtain the patient-level data used in STAR*D. Based on their analysis, the true remission rate over the year-long trial was a little more than one-third.

“In contrast to the 67% cumulative remission rate reported in AJP, the actual rate was 35.0% when using the protocol-specified HRSD,” the researchers write.

STAR*D was an open-label, real-world trial of antidepressants conducted at 41 different treatment centers. The study was meant to show outcomes after one year. There was no placebo control group with which to compare the results.

The study included 4,041 patients with major depressive disorder. They were started on the SSRI citalopram (Celexa), but if they did not respond to that drug, there were three more treatment levels, individualized for each patient. There were 11 drug combinations offered in the study. This was meant to reflect real-world practice, in which patients who don’t respond to a drug are (theoretically) given a different drug until they find something that works.

The 2010 Reanalysis

In a 2010 study, researchers H. Edmund Pigott, Allan M. Leventhal, Gregory S. Alter, and John J. Boren reanalyzed the published results of STAR*D, combining data from various tables and other reported figures, and found that the original publication had misled the public.

Before conducting the study, the STAR*D researchers submitted a protocol outlining exactly which measures would be used and how they would be reported. In that protocol, remission on the Hamilton Rating Scale for Depression (HRSD) was listed as the primary outcome measure: the main way to tell whether the treatment was successful or not.

However, in their AJP publication of the STAR*D results, the researchers did not include the primary outcome of remission on the HRSD. They simply left this out of the publication entirely. Instead, they reported on a different measure, one that they themselves created: the Quick Inventory of Depressive Symptomatology-Self Report (QIDS-SR).

Crucially, the HRSD was delivered by a third party to ensure that the researchers were blinded to the outcomes, which guards against their biases and the placebo effect. Unlike the HRSD, though, the QIDS-SR was unblinded, meaning that researcher biases and the placebo effect likely enhanced the scores.

However, without access to the original patient-level data, it was not possible to see exactly how much this outcome switching affected the results. That’s why the new study, with its finding that only 35% counted as “remitted” on the HRSD, is so important.

In 2010, Pigott, Leventhal, Alter, and Boren documented that 607 of the STAR*D patients had an HRSD score of less than 14 and thus were ineligible to be in the trial because they weren’t very depressed to begin with. Yet many in this group subsequently scored as remitted during one of the four stages of active treatment, inflating the remission rates.

Moreover, those who remitted and entered the year-long follow-up would not be scored as having “relapsed” during the follow-up unless their scores rose back up to 14 or higher on the HRSD scale. Thus, patients in this group of 607 who weren’t eligible for the trial in the first place could be counted as remitted and non-relapsed at the end of one year, even though, at that point, they were worse than when they entered the study.

And finally, Pigott, Leventhal, Alter, and Boren found that the actual number of people who stayed remitted and continued to the end of the trial was dismal: 108 of the 4,041 in the trial, or about 2.7%.

A huge percentage of the STAR*D participants dropped out of the trial. Almost 10% dropped out within two weeks, and over a thousand participants dropped out during their first trial of antidepressants. Many of them were counted as having “remitted,” despite the fact that it’s usually people who do poorly or have adverse effects that drop out of studies.
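As a quick check on that last figure, the arithmetic can be reproduced in a few lines of Python. This is only an illustrative sketch: the variable names are ours, and the two counts are simply the ones quoted above from the 2010 reanalysis.

```python
# Counts as quoted above from the 2010 reanalysis (variable names are illustrative).
enrolled = 4041             # patients in the trial
sustained_remission = 108   # stayed remitted and continued to the end of the trial

# Share of all enrolled patients who stayed remitted through follow-up.
share = sustained_remission / enrolled
print(f"{share:.1%}")
```

Dividing by all 4,041 enrolled patients, rather than only by those who completed the trial, is what produces the "about 2.7%" figure.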

The 2018 Reanalysis

In 2018, Pigott and other researchers (led by Irving Kirsch, a renowned Harvard researcher in placebo studies, along with Tania B. Huedo-Medina and Blair T. Johnson) were able to access the patient-level data. They analyzed this data, focusing on just the first antidepressant trial in STAR*D.

Kirsch, Huedo-Medina, Pigott, and Johnson compared these outcomes with those of comparator trials of antidepressants (studies that compare antidepressant drugs against one another, rather than against a placebo, since STAR*D did not have a placebo group).

In comparator trials, the average improvement in HRSD score is 14.8 points. In STAR*D, it was 6.6 points.

In comparator trials, the average remission rate is 48.4%. In STAR*D, it was 25.6%.

In comparator trials, the average response rate is 65.2%. In STAR*D, it was 32.5%.

They add that the antidepressants in STAR*D performed worse than what is typically seen from a placebo group in clinical trials.
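To put those three comparisons on a common footing, one can express each STAR*D figure as a fraction of the corresponding comparator-trial average. The sketch below does only that; the dictionary layout and function name are our own illustration, and the numbers are the ones quoted above.

```python
# Figures quoted above: (comparator-trial average, STAR*D first-step result).
outcomes = {
    "HRSD improvement (points)": (14.8, 6.6),
    "remission rate (%)": (48.4, 25.6),
    "response rate (%)": (65.2, 32.5),
}

def relative_to_comparators(comparator, stard):
    """Return the STAR*D figure as a fraction of the comparator-trial figure."""
    return stard / comparator

ratios = {name: relative_to_comparators(c, s) for name, (c, s) in outcomes.items()}

for name, ratio in ratios.items():
    print(f"{name}: STAR*D reached {ratio:.0%} of the comparator average")
```

On these numbers, each STAR*D result comes out at roughly half of what comparator trials typically report.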

The New Reanalysis

In this context, the new reanalysis of patient-level data shows that the original STAR*D publication used outcome switching to nearly double the reported cumulative remission rate (from 35% to 67%). It confirms that the original study results misled the public.

The reanalysis was conducted by Pigott and Kirsch, along with Thomas Kim, Colin Xu, and Jay Amsterdam.

According to Pigott, Kim, Xu, Kirsch, and Amsterdam, the highly publicized, inflated outcomes presented in the original STAR*D publication have, for over 15 years, left the public with the incorrect assumption that antidepressant drugs are effective. They argue that this misleading data has led to a failure to search for interventions that could be more effective.

“Bias in the clinical literature is commonly associated with industry-funded RCTs, not publicly funded ones. Our RIAT reanalysis though documents scientific errors in this NIMH-funded study. These errors inflated STAR*D investigators’ report of positive outcomes,” they write.

“The STAR*D summary article’s claim of a 67% cumulative remission rate was published in 2006. If STAR*D’s outcomes had been reported as prespecified, its model of care would likely have faced much stronger criticism 16 years ago and fuelled a more vigorous search for evidence-based treatment alternatives,” they add.




Pigott, H. E., Kim, T., Xu, C., Kirsch, I., & Amsterdam, J. (2023). What are the treatment remission, response and extent of improvement rates after up to four trials of antidepressant therapies in real-world depressed patients? A reanalysis of the STAR*D study’s patient-level data with fidelity to the original research protocol. BMJ Open, 13(7), e063095. doi:10.1136/bmjopen-2022-063095


  1. Did the STAR*D study use “add-on” medications, like benzos or Z-drugs, etc., during the study? Because I remember the kitchen sink approach in a “treatment-resistant” study recently published that got around similar(!?) rates of “got better” as this reanalysis?
    How does the use of add-ons compare to the reference rates of improvement, remission and response in the “comparator” studies?
    How would that affect the interpretation of this reanalysis?
    So, assuming the comparator studies and the STAR*D, and maybe(!?) the kitchen sink approach, are not THAT different I guess I’ll imprint in my mind, that as for depression and SSRIs you get between ONLY 50% to 40% the “benefit” in more real life-like conditions, relative to what you could expect from published RCTs.
    Assuming you reach the end of the “trial”, and the depression is “bad enough” at the beginning. Alternatively or complementarily, certainly not contradictorily, there indeed is/may be a strong placebo effect in SSRIs use, since this was “open-label”, and a strong, maybe, “kindly leave because of side effects” effect. This last one more noticeable, not less (another irony), in an open-label fashion? (am I wrong?). Like a “nocebo get better” effect?.
    Funny, a clinical psychologist over 20yrs ago told me something like that: half the people stopped taking SSRIs within 30 days of starting them because “they felt better”, despite that it takes 4 weeks for patients to actually START to “look better”, talk about patient insight beating the clinical eye!. Or was it “foresight”? Precognition?. Nah, maybe just banal nocebo lived experience, perhaps…
    His words more or less, both statements not at the same time though. And he was a smart, educated, even cultured, although aggressive and a little antisocial fellow, to my mind, that described himself as a “people’s person”, another irony. But yeah, conceding, he didn’t call himself a people’s pleaser. My cynic inside made me write this: “How could he? Honestly?”.
    Am I wrong? Surely I am, somehow, somewhere with something. And by analogy or extension, the kitchen sink approach “got better” rate would be less than half the around 25% that I think I misremember?
    Was there any correlation between the leaving rate and the “severity” of depression at the start of the trial or during it? Like because of side effects or drug switching?
    Could that explain the “got better” rate recently published, also in some BMJ mag, that suggested that MOST of the SSRI “got better” rates occur on the top raters in the severity scales of depression? Like, they have no better choice than to stick with the treatment and “wait” for the “got better” either way kicks in? Weren’t these last rates also around the same 30% range?
    So these “new” results could or would(?!) bring down the clinically “unnoticeable” depression improvement with SSRI treatment above (or was it below?!) the mythical expert consensus 3 points in depression scale from RCTs to less than half than it was before? Or was it 2 points? Or was it 7 points?
    I mean, explain, me trying to make a collage, not a puzzle since the pieces seem not to fit, that is understandable out of so much discordant info…
    And from other reviews here in MIA, one could get over 400% the side effects and risks, relative to what is published, originally, in the RCTs, if I’m remembering correctly. What a pickle and ordeal from trials, clinical ones that is. Even reading about them sounds excruciating, I imagine going through at least one, medication or trial, either or both?
    Note to self, I got to start taking notes, and I have a first good one…

      • Could be that these “brilliant” shrinks paid no attention to your physical condition. Pernicious anemia is treated with B12, which has antidepressant (and sometimes antipsychotic) properties, while depression is common in the anemic, for obvious reasons. Are you under the care of diagnostic morons?

    • Completely agree.

      I was diagnosed with bipolar II at 17, was swapped to borderline (which I vehemently resisted due to the amount of stigma I was facing from all sorts of clinicians), and then back to bipolar II.

      Fifteen years, 31 different psychotropic medications, and two extremely long courses of ECT later, and I finally found the language and strength to describe my observations/frustrations to psychiatrists who were willing to listen, and even agreed that the language of the borderline diagnosis is stigmatizing and invalidating in and of itself. Now people are coming around to what in my heart I always knew was the root issue: Complex PTSD.

      Every time a borderline diagnosis was given to me, it was always with a sense of futility, dismissal, invalidation, judgment, and in the case of two psychiatrists, a visible sneer.

      If their prescribed therapies or treatments don’t work, I’m resisting or sabotaging my treatment. If the coping skills to manage the unbearable emotional-agony prove ineffective, I’m manipulative and attention seeking. If I have a legitimate concern, I’m being combative. What simplistic, “black-and-white” thinking on their part.

      People labeled borderline do not like being in the ridiculous emotional turmoil they experience. No one would. People with borderline are people. People are more complex than the simple, surface-level reasoning the mental health community has provided for a phenomenon they are too lazy or uncreative to accurately describe or treat.

      I mean, god forbid a clinician’s ego gets bruised! Most are so obsessed with being right and retaining their power/status, that they disdain being truly helpful. They’d rather us suffer, than their theory prove false.

      (In case anyone is curious, psychedelic assisted psychotherapy helped me immensely, although I’m still not quite where I want to be yet. I’m going to try schema therapy next. At least I never have to take another blasted psych med again!)

      • But there is an over 80% spontaneous remission reported on a “self limited” “spontaneously self remitting condition”, like depression “used” to be considered and evidence NOW suggests somehow is for the MOST part the case, without medication, I assume.

    Does anyone know about this town in Washington state, Wenatchee, where a large percentage of the population was prescribed Prozac? Apparently there was a clinical psychologist there who got prescribers to put his patients on Prozac and then later on maybe he lost his license?

    If it’s true that a lot of people in this town were put on Prozac, I wonder if there were any follow-up studies. I saw one thing that said that the rates of crime went up after this happened.

