THE UTILITY OF FUTILITY

Companies can stop studies before reaching projected enrollment numbers. The decision may follow a planned interim analysis (IA), or it may be made for 'administrative' reasons. The latter does not necessarily mean that a company has fallen on hard times financially; it could be the result of operational problems with recruitment, a response to regulatory changes, or an outside event like Covid.

Usually a DSMB handles the unblinded data review and IA. A safety review is always done, with particular attention to deaths, premature discontinuations, SAEs and AESIs, Hy's Law cases, etc. DSMB members also need access to efficacy data to weigh risks and benefits, i.e., the therapeutic index of a new drug. This can be particularly difficult for NCEs undergoing their first efficacy testing.

The purpose of a futility analysis is to stop a trial if a benefit seems unlikely to materialize upon further enrollment.  Clearly, a trial should not run to completion when early test results indicate that patients may not benefit.  

The timing of the first IA needs to be carefully selected to ensure sufficient information on outcomes, enough to capture an emerging efficacy trend [1],[2],[3].

The topic of our blog today centers on the operational aspects of conducting an IA in the course of an ongoing study. We would argue not to stop a study outright, even if the first IA indicates futility. We would rather put a study on temporary hold, especially one with brisk enrollment, since at small sample sizes a single patient outcome can change the assessment.

"Timing of the 1st IA is extremely important as it sets the smallest possible trial size and requires the most extreme results."

K Viele. Interpretation of Trials That Stop Early. JAMA 2016;315:1646.

Let me explain. Stopping a study often means stopping an entire program, and a lot depends on getting it right. A determination of futility may be based on very few cases [4].

More information accumulates between the time of data cut-off and the time the DSMB provides its verdict. Why not look at the totality of accrued efficacy data, including supporting and secondary endpoints, and run a per-protocol (PP) analysis in addition to the ITT analysis? Data from patients with incomplete treatment should not be ignored.

The operational aspects of the IA process should be taken into account. Too often it is assumed that data cut-off, data analysis, and the declaration of futility all happen instantaneously and simultaneously. This is never the case.

Once a certain cut-off date is reached, several operational steps still need to be completed, all of which take time. The study continues recruiting patients, as no one wants to pause a trial while awaiting the outcome of the IA. The longer the data cleaning continues, the more patients and information accrue after data cut-off.

Verification is time-consuming but necessary before a DSMB can be convened and a company can make a decision. The IA population therefore becomes a subset of what is actually available at the time when a company, based on the DSMB recommendation, accepts or rejects the futility analysis and stops a trial for good.

The operational work takes time, effort and money even for small trials.  Here are a few examples:

  • 30-day post-treatment survival/mortality is the key endpoint in sepsis studies and is relatively easy to determine. Not all endpoints are categorical and as easy to verify.
  • Take the EASI or PASI scores in dermatology studies, which may need to be confirmed with before-and-after photos.
  • The completeness of all the MACE components in CV trials needs to be verified.
  • How about the PANSS total score in schizophrenia studies, which is so complex that special training is usually required for reproducible scoring?
  • Batched analysis of microbiology outcomes is not an option if the patient response endpoint combines clinical and microbiological data. If a central lab cannot provide these data at the time of the IA, should one rely on local lab microbiology reports?
  • Special issues exist when treatment effects materialize relatively late, like in oncology or neurology trials.
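The composite endpoints in the examples above can be sketched as simple responder rules: a patient counts toward the IA only once every component is verified. The sketch below uses a combined clinical-plus-microbiological endpoint; the field names and patient records are hypothetical, chosen only to illustrate why one pending central-lab result blocks adjudication.

```python
# Hypothetical responder rule for a combined clinical + microbiological endpoint.
# Field names are illustrative; None marks a component not yet verified.

def composite_responder(clinical_cure, micro_eradication):
    """True/False once both components are verified; None while either is pending."""
    if clinical_cure is None or micro_eradication is None:
        return None  # cannot adjudicate until both components are in
    return clinical_cure and micro_eradication

patients = [
    {"id": "001", "clinical_cure": True,  "micro_eradication": True},
    {"id": "002", "clinical_cure": True,  "micro_eradication": None},  # central lab pending
    {"id": "003", "clinical_cure": False, "micro_eradication": True},
]

for p in patients:
    p["responder"] = composite_responder(p["clinical_cure"], p["micro_eradication"])

evaluable = [p for p in patients if p["responder"] is not None]
print(f"{len(evaluable)} of {len(patients)} patients evaluable at the IA cut")
```

Patient 002 is enrolled and treated, yet contributes nothing to the futility test until the lab result arrives, which is exactly the verification lag described above.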

Bottom line: Collecting outcome events and scores requires cross-checks and data verification. This takes time and diligence. Meanwhile, the trial is running its course. 

Remember: No company is compelled to call off a trial based on a futility analysis.

Data cleaning is mostly delegated by the sponsor to CROs. However, CROs are not in the habit of cleaning data in synchrony with enrollment; instead they batch up this work for site visits as convenient. Working under time pressure toward the IA cut-off date makes their work processes less efficient and more costly. In a moderately fast-enrolling trial, clean-up and preparation for an IA can take weeks, during which enrollment continues.

If the IA was planned to capture the midway point of data accumulation (50% of enrollment), the trial may be close to 75% enrolled by the time the data are clean. In such cases, there is little benefit in stopping for futility, as the trial is pretty much over.
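To put rough numbers on this, here is a minimal sketch of the overshoot; all figures (trial size, recruitment rate, cleaning time) are hypothetical, not taken from any specific study.

```python
# Hypothetical illustration of enrollment overshoot during IA data cleaning.

def enrollment_at_decision(n_total, ia_fraction, rate_per_week, cleaning_weeks):
    """Return (patients enrolled, fraction of target) when the DSMB decides."""
    n_at_cutoff = ia_fraction * n_total          # patients at the IA data cut
    n_accrued = rate_per_week * cleaning_weeks   # enrolled while data are cleaned
    n_at_decision = min(n_total, n_at_cutoff + n_accrued)
    return n_at_decision, n_at_decision / n_total

# 200-patient trial, IA cut at 50%, 6 patients/week, 8 weeks of cleaning:
n, frac = enrollment_at_decision(n_total=200, ia_fraction=0.50,
                                 rate_per_week=6, cleaning_weeks=8)
print(f"{n:.0f} patients enrolled ({frac:.0%} of target) at decision time")
# -> 148 patients enrolled (74% of target) at decision time
```

Under these assumptions the "50% interim" is actually decided at 74% enrollment, which is why the cost argument for stopping largely evaporates.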

If a company wants to stop enrollment in order to avoid costs, it would plan for an early futility test using a large efficacy delta and perform it with 20-35% of patients enrolled. This way, data cleaning and the futility assessment can be completed before half the enrollment has occurred, resulting in considerable cost savings if the trial is stopped.
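One common way to operationalize such a futility rule is conditional power: project the chance of final success given the interim trend, and flag futility if it falls below a threshold. The sketch below uses the standard B-value formulation for a one-sided normal test under the current-trend assumption; the interim z-statistic, information fraction, and 10% threshold are illustrative assumptions, not values from any particular trial.

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def conditional_power(z_interim, info_frac, z_final=1.96):
    """Conditional power under the current trend (B-value formulation).

    z_interim : interim z-statistic for the treatment effect
    info_frac : information fraction t (0 < t < 1), e.g. fraction of enrollment
    z_final   : final critical value (1.96 ~ one-sided alpha of 0.025)
    """
    b = z_interim * sqrt(info_frac)   # B-value at information time t
    drift = b / info_frac             # effect drift estimated from the interim trend
    return norm_cdf((b + drift * (1 - info_frac) - z_final) / sqrt(1 - info_frac))

# Hypothetical interim: z = 0.5 at 50% information -- a weak trend.
cp = conditional_power(z_interim=0.5, info_frac=0.5)
print(f"conditional power = {cp:.3f}")  # well below a 10% futility threshold
```

A large planned delta corresponds to demanding a strong interim trend; weak trends like the one above fall far short and trigger the futility flag.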

Indeed, futility can be established using rather small sample sizes. In some instances, only 10 to 15 patients per arm are needed, if the statistical criteria for futility testing represent high hurdles to justify study continuation [4]. Sponsors may elect to use more stringent criteria if there is interest in phasing out a therapeutic area or a competing funding need.
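Simon's two-stage design [4] makes this concrete: enroll n1 patients in stage 1 and stop for futility if at most r1 respond. The sketch below computes the probability of early termination (PET) for a hypothetical boundary (n1 = 15, r1 = 2) under an uninteresting response rate p0 = 0.10 and a promising rate p1 = 0.30; the boundary values are illustrative, not taken from published design tables.

```python
from math import comb

def binom_cdf(r, n, p):
    """P(X <= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r + 1))

# Hypothetical stage-1 boundary: stop if <= r1 responses among n1 patients.
n1, r1 = 15, 2
pet_null = binom_cdf(r1, n1, 0.10)  # early-stop probability if the drug is inactive
pet_alt  = binom_cdf(r1, n1, 0.30)  # risk of wrongly stopping an active drug

print(f"PET under p0=0.10: {pet_null:.3f}")  # high: inactive drugs usually stop early
print(f"PET under p1=0.30: {pet_alt:.3f}")   # low: active drugs usually continue
```

With only 15 patients, an inactive drug stops at stage 1 about 82% of the time, while an active one would be wrongly stopped only about 13% of the time, which is how such small samples can justify a futility call.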

In the absence of precedent data, the choice of alpha and delta is based on judgment. Corporate finance and marketing will weigh in here; it is no longer a purely statistical or scientific exercise that frames the conditions for a futility analysis.

By setting the bar for efficacy too high or making decisions on futility too early, sponsors can make drugs 'fail' or appear ineffective. In the language of press releases, we are told that a program was scuttled because it did not "meet our high standards or expectations." Often the real reason why a study was stopped will never be revealed.


REFERENCES
[1] Chang Y. Futility stopping in clinical trials, optimality and practical considerations. J Biopharm Stat 2020;30:1050
[2] Ciolino J. Guidance on interim analysis methods in clinical trials. J Clin Transl Sci 2023;7:e124
[3] Togo K. Optimal timing for interim analyses in clinical trials. J Biopharm Stat 2013;23:1067
[4] Simon R. Optimal two-stage designs for phase II clinical trials. Controlled Clinical Trials 1989;10:1

ABBREVIATIONS
AESI        adverse event of special interest
CRO        contract research organization
DSMB     data and safety monitoring board
EASI        Eczema Area and Severity Index
IA             interim analysis
ITT           intent-to-treat analysis
MACE     major adverse cardiovascular events
NCE        new chemical entity
PANSS   Positive and Negative Syndrome Scale
PASI        Psoriasis Area and Severity Index
PP            per protocol analysis
SAE         serious adverse event
