HOMO SAPIENS DISEASES - B-CELL ACUTE LYMPHOBLASTIC OR LYMPHOCYTIC LEUKEMIA (B-ALL)

Table of contents :


  • Epidemiology
  • Aetiology
  • Pathogenesis
  • Symptoms & signs
  • Laboratory examinations
  • Therapy 
  • Prognosis

  • Epidemiology : B-ALL accounts for 80% of all ALLs; ALL afflicts 4 out of every 100,000 children worldwide. About 3,930 new cases and 1,490 deaths were expected in the USA in 2006. In the UK there are approximately 450 cases per year. B-ALL represents 80% of leukemias before age 15 (75% of patients are < 6 years of age). It is a highly aggressive, high-grade malignancy.
    Aetiology :

    Pathogenesis : translocations :

    Symptoms & signs : leukocytosis (> 100,000/µl in 10%), lymphadenomegaly, hepatosplenomegaly (50%), mediastinal involvement (15), anemia, fatigue, weight loss, easy bruising, thrombocytopenia (< 25,000/µl in 30%), granulocytopenia with bacterial infections, involvement of bones (1-2%), skin, kidneys, and lungs, and sometimes spread to the central nervous system (meningismus : 5% of adults and < 10% of children at diagnosis; prophylaxis reduces incidence from 75% to 10%; especially for mature B forms (10.1%))
    Laboratory examinations :

    Prognosis : better than T-ALL; survival rates have increased from 4% to > 80% during the past 40 years. Currently, 20% of children with ALL do not respond to the same drug therapy that cures the remaining 80%. Children who undergo chemotherapy and survive ALL show a roughly 200-fold increase in the frequency of somatic mutations in their DNA (estimated by the number of HPRT mutations in peripheral blood T lymphocytes) and carry a 5-20 times greater risk of developing second malignancies and other diseases later in life. Pediatricians continue to monitor these children as they live beyond 5, 10, and more recently, 15 years after their ALL goes into remission. We now need to be proactive about studying any long-term genetic ramifications that these children may face because of the treatment they received. In a study of 45 children diagnosed with ALL at an average age of 5.5 yr, the blood of patients at the time of diagnosis contained an average of 1.4 cells with HPRT mutations per million T cells. By the time the patients completed the consolidation phase of treatment, an average of 52 T cells per million contained HPRT mutations. By the final stage of treatment, an average of 93 of every million T cells had mutations in HPRT. After treatment was stopped, an average of 271 of every million T cells contained HPRT mutations, a roughly 200-fold increase. The post-treatment rate of HPRT mutations in the ALL survivors paralleled the number of new gene alterations observed in healthy children of similar age. Because children have larger numbers of replicating cell populations during growth and development than adults have, they are more susceptible than adults to the genotoxicity of chemotherapyref. The prognosis for children with acute lymphoblastic leukemia (ALL) has improved dramatically over the past four decades.
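The HPRT mutant frequencies quoted above can be turned into fold changes with a few lines of arithmetic. The following is a minimal sketch; the dictionary keys and the function name are illustrative labels, not terms from the study, and the inputs are the rounded averages given in the text. With these averages the final ratio comes out at about 194, in line with the approximately 200-fold figure.

```python
# Mean number of HPRT-mutant T cells per million T cells, as quoted in the text.
# The stage labels are illustrative names chosen for this sketch.
MUTANTS_PER_MILLION = {
    "diagnosis": 1.4,
    "end of consolidation": 52.0,
    "final stage of treatment": 93.0,
    "after treatment stopped": 271.0,
}


def fold_changes(freqs, baseline_key="diagnosis"):
    """Return each stage's mutant frequency divided by the baseline frequency."""
    baseline = freqs[baseline_key]
    return {stage: value / baseline for stage, value in freqs.items()}


if __name__ == "__main__":
    for stage, fold in fold_changes(MUTANTS_PER_MILLION).items():
        print(f"{stage}: {fold:.1f}-fold relative to diagnosis")
```

Because the inputs are group averages, the ratios describe the cohort as a whole rather than any individual patient.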
Breakthroughs in therapy have been achieved in a stepwise fashion through carefully controlled, cooperative group clinical protocols, the hallmark of care within the childhood cancer community. Contemporary therapy has focused on intensification using established agents rather than the introduction of new drugs. Despite these improvements, many children are being overtreated, while subgroups of children still do poorly despite recent therapeutic advances. Risk-adapted therapy tailors treatment based on the predicted risk of relapse—augmenting therapy for those whose tumors require this approach while avoiding the more toxic side effects of augmented therapy in children who can be cured with treatment of standard intensity. Currently, risk-adapted therapy is used for almost all pediatric tumors. Treatment outcome is dependent not only on the therapy applied, but importantly, also on the underlying biology of the tumor and the host. Each of these variables must be factored into initial treatment decisions, as well as later refinements based on initial response, and several biological features. This review will discuss the most important variables that are currently used to design therapy for children with ALL as well as emerging data from transcript and protein profiling that might be applied to risk assignment in the future. It is recognized that with improvements in therapy, certain variables might lose their prognostic value; therefore, risk assignment plans should be routinely reassessed. Finally an optimal system should allow for comparison of the outcomes of similar, or identical patients, treated on different protocols. The contribution of acquired genetic changes in ALL blasts to the long-term outcome of treatment has been widely studied, and genetic subtype of ALL blasts (e.g., presence of the TEL/AML1 translocation, MLL rearrangements, hyperdiploidy) is well accepted as one of the features that is used to "individualize" therapyref. 
Although many of the mechanisms by which these acquired changes affect prognosis and response to therapy are unknown, their strong prognostic significance has led to use of these somatically acquired genetic variations to intensify (e.g., for MLL rearrangements) or to deintensify (e.g., TEL/AML1) therapy. Much less attention has been given to the role of germline, inherited genetic variation to the outcome of ALL therapy. It has been known that inheritance affects interindividual variability in response to specific drugs for almost 50 yearsref (Carson PE, Flanagan CL, Ickes CE, Alving AS. Enzymatic deficiency in primaquine-sensitive erythrocytes. Science. 1956;124:484–485; Evans DAP, Manley KA, McKusick VA. Genetic control of isoniazid metabolism in man. BMJ. 1960;2:485–491). Driven by phenotypic variation, the field of pharmacogenetics first developed in the absence of molecular biology. Pharmacogenetics is the study of how interindividual genetic variability affects interindividual differences in drug response. Based on a phenotype-to-genotype approach, it is understandable that the first important examples of pharmacogenetics were monogenic, relatively penetrant traits, and molecular biology eventually defined the molecular genetic basis of phenotypic variabilityref. Pharmacogenetics can now be conducted using a genotype-to-phenotype approach. The private initiative to sequence "the" human genome involved the sequencing of germline DNA from 5 individualsref. Related initiatives, from the Single Nucleotide Polymorphism (SNP) Consortium and other groupsref, indicate that there is no justification for the article "the" when referencing our genomes, and that each of us may differ from other human individuals on average every 300–1500 nucleotidesref. 
These interindividual differences in human genomes may have important functional consequences, and partly account for the ways in which individuals differ from one another in the risk of disease (e.g., in the risk of cancer) and in probability of favorable versus unfavorable outcomes for treatment of cancer (e.g., relapse versus remission; adverse effects versus none). With the technical improvements in assessing genomic variation, a genotype-to-phenotype approach may facilitate the elucidation of effects of multigenic variation on drug-induced phenotypes. Thus, there is increased interest in determining which of the millions of human genetic variations are functionally important, and which, if any, may be important for individualizing therapy for a number of diseases. Childhood ALL represents a disease that theoretically can benefit tremendously from individualizing dosages. Medications alone can cure the disease, otherwise uniformly fatal, in over 75% of patients; the medications have a narrow therapeutic range, with death from drug toxicity or second tumors being a significant cause of mortality (in addition to relapse); drug-induced adverse events can be dose-limiting in many cases; dose intensity is an important determinant of outcome; there is significant interpatient variability in systemic exposure to most of the antileukemic agents that have been examined; and there is proof of principle that adjusting dosages based on drug clearance improves ALL outcomesref. Therefore, genetic variants that affect the probability of cure versus adverse effects of antileukemic agents are likely to have an important impact on ALL outcomes. Differences in outcomes may be influenced by population polymorphisms in genes that influence the disposition of chemotherapy drugs (pharmacokinetics), or influence the response to these drugs (pharmacodynamics). 
It should also be noted that germline genetic variation may influence the probability of or the nature of the acquired genetic changes in ALL, thus influencing directly or indirectly the intrinsic sensitivity of the blasts. Approaches to establishing genotype/phenotype associations include genome-wide approaches and target gene approaches, in which a small number of genes are very "deeply" sequenced or a somewhat larger number of functionally important genotypes are assessed, haplotypes determined, and associations with phenotypes explored. Several genetic polymorphisms have been studied in childhood ALL.

    Approaches to genotype/phenotype association studies :

    One of the key medications for treatment of ALL is 6-mercaptopurine (6MP). Thiopurine methyltransferase (TPMT) is a key enzyme in the metabolism of 6MP. TPMT activity is inherited as an autosomal codominant trait, and activity is polymorphic in all tissues and in all large populations studied to date. About 1 in 300 people are TPMT-deficient, and approximately 10% inherit intermediate TPMT activity due to heterozygosity at the TPMT locusref1, ref2, ref3, ref4. We and others have shown that, in over 90% of the cases, defective TPMT activity is due to inheritance of TPMT alleles containing at least 1 of 3 single nucleotide polymorphisms (SNPs)ref1, ref2. These SNPs have been shown to lead to enhanced protein degradation as the mechanism underlying low TPMT activityref. Individuals with both alleles carrying inactivating mutations (homozygous mutant) cannot methylate (inactivate) 6MP base, accumulate extremely high levels of active thioguanine nucleotides, and thus have unacceptable, life-threatening toxicity from normal doses of 6MP. The fate of TPMT heterozygotes was less clearly defined. In an analysis of 182 children (St Jude Children’s Research Hospital Protocol Total XII) receiving antimetabolite based therapy for ALL, we examined in detail the impact of 6-MP dosing and metabolism on outcome of treatment for ALLref1, ref2. The cumulative incidence of 6-mercaptopurine dose reductions for myelosuppression was highest among patients homozygous for TPMT deficiency (100% of patients), intermediate among heterozygous patients (35%), and lowest among wild-type patients (7%) (P < .001)—indicating that heterozygosity (present in 10% of the population) will have an impact on the optimal dose of 6MP. Importantly, in a further analysis of the same patient population, we showed that a higher dose intensity of 6-mercaptopurine was associated with improved event-free survival. 
In agreement with this, we also demonstrated a trend toward better survival in patients with at least 1 defective TPMT allele (who would be expected to have greater efficacy if treated with an equivalent or lesser dose of 6-mercaptopurine) compared with wild-type cases.

    Probability of complete remission vs thiopurine methyltransferase (TPMT) genotype on St Jude Children’s Research Hospital Protocol Total XII (SJCRH Total XII) :
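The two population figures quoted for TPMT (about 1 in 300 deficient, about 10% with intermediate activity) are consistent with each other under Hardy-Weinberg equilibrium. The sketch below checks this. It rests on two assumptions that are not stated in the text: that the population is in Hardy-Weinberg equilibrium, and that all low-activity TPMT alleles can be pooled into a single variant allele. The function name is illustrative.

```python
import math


def hardy_weinberg_from_deficient(deficient_freq):
    """Genotype frequencies for a two-allele locus, given the frequency of
    homozygous-deficient individuals (q squared).

    Assumes Hardy-Weinberg equilibrium and pools all low-activity alleles
    into one variant allele; both are simplifications made for this sketch.
    """
    if not 0.0 <= deficient_freq <= 1.0:
        raise ValueError("deficient_freq must be a probability")
    q = math.sqrt(deficient_freq)  # frequency of the pooled low-activity allele
    p = 1.0 - q                    # frequency of the wild-type allele
    return {
        "wild_type": p * p,
        "heterozygous": 2.0 * p * q,
        "deficient": q * q,
    }


genotypes = hardy_weinberg_from_deficient(1.0 / 300.0)
for name, freq in genotypes.items():
    print(f"{name}: {100.0 * freq:.1f}%")
```

With a deficient frequency of 1 in 300, the expected heterozygote frequency is close to 11%, which agrees with the approximately 10% of individuals reported to have intermediate activity.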

    This same polymorphism has been linked to the occurrence of drug-induced second cancers among children with ALL. The incidence of malignant brain tumors is increased as much as 6- to 30-fold in survivors of ALL, occurring almost exclusively in those who have received craniospinal irradiationref. In an analysis of patients enrolled on the St Jude Total XII treatment protocol, we reported a 12.8% incidence of brain tumors in irradiated patientsref. Importantly, the incidence of brain tumors differed significantly by TPMT genotype (42% versus 8.3% in defective and wild-type TPMT genotypes, respectively; P = .0077). Among children with ALL, defective TPMT has also been associated with the risk of topoisomerase II inhibitor–induced secondary myeloid malignancies by 2 independent groupsref1, ref2. Follow-up laboratory studies indicate that thioguanine incorporation into an oligonucleotide DNA substrate, which would be higher in TPMT-defective patients, affects the avidity of topoisomerase II–stabilized DNA cleavage, with and without etoposide presentref. Moreover, thioguanine substitution for guanine in DNA creates a structural modification in DNAref, which affects the interactions of multiple DNA-directed enzymesref1, ref2. Thus, there are multiple mechanisms whereby a pharmacogenetic polymorphism in TPMT could affect the disposition of 1 antileukemic drug (6MP), which could then in turn have profound effects on the adverse effects of other elements of therapy (e.g., topoisomerase II inhibitors, cranial irradiation).

    Thymidylate synthase (TS) catalyzes the intracellular conversion of deoxyuridylate monophosphate to deoxythymidylate monophosphate, which makes it an essential enzyme in proliferating cellsref. TS is the target of several anticancer drugs, including the widely used antileukemic agent methotrexateref1, ref2, ref3.
TS expression has been related to a germline polymorphism in the number of tandem repeats in its enhancer, with the triple repeat associated with increased expression of thymidylate synthase, which has been linked to poor antitumor response to the TS inhibitor 5-fluorouracil in adult gastrointestinal tumorsref. The TS enhancer polymorphism has been studied in children with ALLref and found to be associated with outcome in 205 children with ALL treated with a number of different methotrexate regimens. Individuals who were homozygous for the triple repeat had a poorer outlook than those with other genotypes (odds ratio 4.1, 95% confidence interval [CI] 1.9-9.0, P = .001). In a follow-up of this study, Lauten et al reported a case-control study of the frequency of thymidylate synthase polymorphisms in 40 children who relapsed and 40 children with ALL successfully treated on Berlin-Frankfurt-Munster (BFM) protocolsref. This study found that the thymidylate synthase polymorphism had no impact on relapse of disease. The discrepant results from these 2 studies could reflect heterogeneity in the ALL cases, as both included subsets of patients enrolled on the respective treatment protocols, and the effect of the thymidylate synthase polymorphism might be specific to particular molecular and immunophenotypic subsets of ALL. In addition, these studies included cases treated quite differently, and it is possible that the polymorphism is important only in a specific therapeutic context. For example, higher doses of methotrexate were used in the BFM studies than in the treatment protocols used in the study of Krajinovic et alref, and it is possible that the use of high doses of methotrexate can overcome the adverse impact of the TS polymorphism. Methotrexate plays a central role in treatment of ALL, and several studies have indicated the importance of achieving high intracellular concentrations of this drugref1, ref2, ref3, ref4.
Resistance to methotrexate can be a result of altered cellular uptakeref, and in vivo accumulation of methotrexate is related to the expression of the reduced folate carrier (RFC1)ref. A common G(80)A polymorphism has been described in the RFC1 gene, which encodes the major transporter for MTX influx into ALL blasts. In a study of 204 children with ALL treated on a heterogeneous group of treatment regimens, children with the A allele variant had worse event-free survival than patients with the GG genotype (P = .04)ref. However, patients homozygous for the A allele had higher levels of MTX (P = .004) than the other genotype groups. Thus, the role of this RFC polymorphism remains unclear, and it may interact with other folate-related polymorphisms.

    A simplified diagram of methotrexate-related targets and enzymes illustrates the fact that multiple gene polymorphisms might interact to affect the pharmacodynamics of this critically important agent for ALL. Genes (italicized) whose products interact with methotrexate include DHFR (dihydrofolate reductase), GGH (gamma glutamyl hydrolase), MTHFR (methylenetetrahydrofolate reductase), and FPGS (folylpolyglutamate synthetase). All are subject to common genetic polymorphisms :

    The enzyme 5,10-methylenetetrahydrofolate reductase (MTHFR), which catalyzes the reduction of 5,10-methylenetetrahydrofolate to 5-methyltetrahydrofolate, is crucial to folate metabolism. A common polymorphism, a C-to-T substitution at nucleotide 677 (replacing alanine with valine)ref, reduces the activity of MTHFR but results in a much less severe phenotype than the rare mutations that cause severe MTHFR deficiency. Approximately 10% of Caucasians and 1.0% of African Americans are homozygous for the lower-activity alleleref. The 677T/T genotype has been associated with hyperhomocysteinemiaref1, ref2, especially in patients with low folateref1, ref2; a lesser effect of a second common SNP at 1298 (A > C) of MTHFR has also been demonstratedref. A higher incidence of gastrointestinal or hepatic toxicity following chronic, low-dose methotrexate has been noted among patients with the 677T alleleref1, ref2 although our preliminary analysis in a relatively small group (53) of children with ALL did not link MTHFR genotypes with MTX-associated neurotoxicityref. Thus, the role of MTHFR genotypes on MTX-related toxicity and efficacy is still somewhat unclear, and may depend upon the context (high-dose, low-dose, chronicity) of methotrexate therapy in the ALL trials of interest. Common polymorphisms have also been demonstrated in cystathionine beta synthaseref1, ref2 and dihydrofolate reductaseref. Conjugation of electrophilic compounds to glutathione, mediated by the family of glutathione S-transferase enzymes (GSTs), is an important detoxifying pathway for mutagens such as organophosphates (including pesticides), alkylating agents, epoxides, and polycyclic aromatic hydrocarbonsref1, ref2. The glutathione S-transferase mu(µ)1 (GSTM1) and the glutathione S-transferase theta 1 (GSTT1) genes are polymorphic in humans, and the phenotypic absence of enzyme activity is due to a homozygous inherited deletion of the generef. 
The frequency of the null phenotype varies by race, with approximately 50% of whites and 28% of blacks having the GSTM1 null phenotyperef. The frequency of the GSTT1 null phenotype is 15% in whites and 24% in blacks. It has been suggested that GST expression may play an important role in the outcome of therapy of leukemia as GSTs detoxify many of the drugs used to treat leukemia, and are involved more generally in "protecting the genome" from electrophilic oxidative damageref. In an immunohistochemical study of 71 cases of childhood ALL, ALL blast samples from 44 were negative for µ class GST; of these, 39 (82%) remained in remission. Of 27 patients who were positive for µ class GST, only 14 (52%) remained in remissionref, so that expression of µ class GST predicts a 3-fold increased risk of relapse (95% CI 1.25-7.26). Pharmacogenetic studies from St Jude Children’s Research Hospital in 197 children with ALL demonstrated that the null genotype for GSTM1, GSTT1, or both was not found to be a prognostic factor for disease-free survival or probability of hematologic remissionref. CNS relapse tended to be less common in those with the GSTM1 null genotype (P = .054), with similar borderline significant findings reported by the BFM group in a case-control designref. In an investigation from Children’s Cancer Group (CCG), we analyzed GST genotypes in 710 children with ALLref. Stratification of cases by age at diagnosis, sex, white blood cell count at diagnosis, B or T lineage, or cytogenetics revealed no differences in genotype frequencies. There were no differences in treatment outcomes according to GST genotype. Varying results from these clinical epidemiologic studies could again be because only subsets of patients enrolled on the respective treatment protocols were genotyped, and that the treatment regimens may differ in their dependence on glutathione conjugation. 
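The "3-fold increased risk of relapse" quoted above for µ class GST expression can be approximated from the remission proportions given in the same passage. This is a minimal sketch that works from the quoted percentages (82% and 52%) rather than from raw counts, and it assumes that every patient who did not remain in remission is counted as a relapse; the function and variable names are illustrative. The confidence interval cannot be reproduced from these two percentages alone, so it is not attempted here.

```python
def relative_risk(risk_exposed, risk_unexposed):
    """Ratio of two event probabilities (exposed group / reference group)."""
    if not 0.0 <= risk_exposed <= 1.0 or not 0.0 < risk_unexposed <= 1.0:
        raise ValueError("risks must be probabilities, with a non-zero reference risk")
    return risk_exposed / risk_unexposed


# Remission proportions quoted in the text.
REMISSION_MU_GST_POSITIVE = 0.52
REMISSION_MU_GST_NEGATIVE = 0.82

# Assumption: patients not remaining in remission are all treated as relapses.
relapse_positive = 1.0 - REMISSION_MU_GST_POSITIVE
relapse_negative = 1.0 - REMISSION_MU_GST_NEGATIVE

rr = relative_risk(relapse_positive, relapse_negative)
print(f"relative risk of relapse, mu-GST positive vs negative: {rr:.2f}")
```

The ratio is about 2.7, which rounds to the 3-fold figure reported.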
Cytokines modify the proliferation and activation of normal hematopoietic cells, and can also stimulate or inhibit growth in hematological malignancies. Plasma levels of TNF and IL-10 have been associated with therapy outcome in hematological malignancies and are influenced by germline polymorphisms within the TNF and IL-10 genes. TNF and IL-10 genetic polymorphisms might therefore also influence clinical outcome in childhood ALL. In 214 childhood ALL patientsref, those with a high-risk TNF haplotype were older than those with a low-risk haplotype (P = .024). No statistically significant associations were found between TNF haplotype and sex, white blood cell (WBC) count, central nervous system involvement, immunophenotype, response to chemotherapy, or event-free survival. In contrast, Lauten et alref analyzed the association of TNF and IL-10 polymorphisms with response to initial treatment and risk of relapse in 135 children with ALL treated according to BFM protocols. BFM trials use clearance of peripheral blood blasts in response to an 8-day course of prednisone for treatment stratification, and have shown rapid clearance of blasts to be a powerful prognostic indicator. The data showed that a poor prednisone response was less frequent in patients with the IL10GG genotype, whereas no association between the risk of relapse and IL-10 genotype was found. In the total study group, patients expressing the TNF2 allele showed no statistically significant general association with either prednisone response or risk of relapse compared to subjects homozygous for the TNF1 allele. Nevertheless, there was a higher risk of relapse in poor prednisone responders expressing the TNF2 allele compared to poor prednisone responders not expressing the TNF2 allele.
The authors concluded that IL-10 genotype might influence prednisone response in patients with childhood ALL, whereas TNF genotype was associated with the risk of relapse in high-risk ALL patients. The authors note that the number of cases in this study was small (n = 135) and the cases heterogeneous, and that further investigation in a larger, more homogeneous population is necessary.
    Infections remain a serious and common complication of ALL therapy, and (after relapse) are the second most common cause of death among children with ALL. Infection risk may be increased due to polymorphisms involved in the pharmacokinetics of myelosuppressive antileukemic agents (resulting in abnormally high systemic exposure to active drug), or to polymorphisms in genes whose products are involved in protective immunity from pathogens. Polymorphisms in TNF, IL-10, and mannose binding protein have been linked to the risk of infection in other populationsref1, ref2, ref3, ref4 (Ackerman H, Usen S, Mott R, et al. Haplotypic analysis of the TNF locus by association efficiency and entropy. Genome Biol. 2003;4:R24.1–R24.13) but haven’t been fully evaluated in children with ALL. Many prior studies suffer from relatively small sizes, the possibility of selection bias because only subsets of patients have been studied, lack of multivariate analyses including other known prognostic factors, and lack of accounting for race and population substructure. The ability to genotype at multiple polymorphic loci, many of which display remarkable racial/ethnic diversity in the frequencies of variant alleles, complicates the use of multivariate analyses. For example, perhaps some of the inferior outcome in blacks compared to whites, which has been reported by several groupsref1, ref2, ref3, is in fact due to different polymorphic allele frequencies. Thus, adjusting or stratifying for race might obscure an important relationship between allele frequency and outcome. Additional analyses are necessary to determine the association of prognostically important, acquired genetic abnormalities in the ALL blasts (e.g., TEL-AML1, t(4;11), t(9;22) etc) with the frequency of specific germline genetic polymorphismsref1, ref2. 
As is true for the associations of race with genotypes, associations of acquired molecular defects with germline genotype frequencies will greatly complicate the handling of data in genotype/outcome analyses. The optimal methods for analyzing large genotype/phenotype association studies have not yet been demonstrated.

    III. EXPRESSION PROFILING IN ACUTE LEUKEMIA
    In most contemporary treatment protocols the different genetic subtypes of pediatric acute leukemia are treated using so-called risk adapted therapy—that is, therapy in which the intensity of treatment is tailored to a patient’s relative risk of relapse. Critical to the success of this approach is the accurate assignment of individual patients to specific risk groups. Unfortunately, this is a difficult and expensive process requiring a variety of laboratory studies including morphology, immunophenotyping, cytogenetics, and molecular diagnostics. With the recent development of expression microarrays it should now be possible to take a genome-wide approach to leukemia classification. This approach not only offers the potential of an efficient diagnostic platform for identifying the known prognostic subtypes of leukemia, but should also help us to identify specific gene signatures that will allow us to more accurately identify those individual patients who are at a high risk of relapse. In addition, this approach offers the potential of providing unique insights into the altered biology underlying the growth of the leukemic cells. However, before this methodology can be applied in the clinical setting significant developmental work remains to be done. Importantly, a number of methodological issues must be considered in both the design and analysis of these studies. In this lecture, I will address some of the more important methodological issues and will then summarize the gene expression data that has been generated in my own laboratory on pediatric acute leukemias.
    Methodological Considerations : expression microarray platforms, either cDNA- or oligonucleotide-based, yield expression values for a large number of genes, varying from several hundred up to 33,000 genes depending on the specific microarray platform being used. For leukemias, analysis is typically performed on leukemic cells isolated from either a diagnostic bone marrow aspirate or a peripheral blood sampleref1, ref2, ref3, ref4, ref5, ref6. Typically, the leukemic cells are partially purified away from more mature hematopoietic cells by density gradient centrifugation prior to analysis. The leukemic cells are then either processed immediately to isolate total RNA, or frozen as viable cell suspensions and the RNA isolated at a later time. A number of variables affect the expression profile obtained from a clinical sample. These include, but are not limited to, the percentage of leukemic cells, the time between obtaining the sample from a patient and either freezing or isolating RNA, the quality of RNA extracted, and the methods used for labeling the RNA and detecting the hybridized signals. One of the most important variables is the percentage of leukemic blasts within the sample. Since our goal is to obtain the expression profile of the leukemic cells, we strive to ensure that the sample being analyzed consists of a majority of leukemic blasts. For our initial exploratory studies in pediatric ALL, our inclusion criterion has been that a sample contain a minimum of 70% lymphoblasts. A second critical variable is the time between obtaining the sample and isolating RNA. Experimental data have demonstrated that the expression profile of leukemic blasts placed on ice or stored at room temperature for extended periods of time differs from that of freshly isolated blasts. The longer a sample sits prior to RNA extraction, the greater the change in the expression profile.
Moreover, the extent of change in the expression profile can vary significantly between leukemia subtypes. This is a confounding variable that for many retrospective studies cannot be controlled. Thus, it is important to know that it exists and to ensure that the interpretation of the results of an expression profiling study takes this into account. It is also important to use an RNA extraction procedure that provides high-quality RNA and to rigorously assess not only the quality and purity of the RNA, but also the efficiency of labeling and hybridization to the microarray. Last, variation in expression profiling can result from a variety of technical issues. Therefore, to minimize these variations it is important to assess the reproducibility of data acquisition throughout an experiment. This can easily be done by analyzing replicate samples at multiple points during the experiment.
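The sample inclusion rule described above (at least 70% lymphoblasts, with attention to the delay before RNA isolation) can be expressed as a simple filter. The following is a minimal sketch: the `Sample` record, its field names, and the optional `max_delay_hours` cutoff are hypothetical and introduced only for illustration. The text gives a blast threshold but no numeric limit on processing delay, so the delay filter is off by default.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_BLAST_FRACTION = 0.70  # minimum lymphoblast fraction stated in the text


@dataclass
class Sample:
    sample_id: str         # hypothetical identifier
    blast_fraction: float  # lymphoblasts as a fraction of cells, 0.0 to 1.0
    hours_to_rna: float    # delay from collection to RNA isolation or freezing


def select_samples(samples: List[Sample],
                   min_blasts: float = MIN_BLAST_FRACTION,
                   max_delay_hours: Optional[float] = None) -> List[Sample]:
    """Keep samples that meet the blast threshold and, if given, the delay limit."""
    kept = []
    for s in samples:
        if s.blast_fraction < min_blasts:
            continue  # too few leukemic blasts to represent the leukemia
        if max_delay_hours is not None and s.hours_to_rna > max_delay_hours:
            continue  # processing delay exceeds the caller's (hypothetical) limit
        kept.append(s)
    return kept


cohort = [
    Sample("S1", 0.92, 2.0),
    Sample("S2", 0.55, 1.0),   # below the blast threshold
    Sample("S3", 0.70, 30.0),  # meets the blast threshold exactly
]
print([s.sample_id for s in select_samples(cohort)])
print([s.sample_id for s in select_samples(cohort, max_delay_hours=24)])
```

Recording the delay for every sample, even when it is not used as a filter, keeps it available as a covariate when interpreting retrospective data.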

    Prognosis prediction by expression profiling : the data presented above demonstrate that expression profiling can provide prognostic information by accurately identifying known prognostically important subtypes of both ALL and AML. What remains to be proven, however, is whether expression profiling can also provide independent prognostic information. Studies on a variety of other types of cancer, including breast, colon, and prostate cancer and melanoma, suggest that this should be possible. In fact, early work on ALL suggests that expression signatures can be identified within specific genetic subtypes of ALL that predict whether a patient will have a high risk of relapsingref1, ref2, ref3, ref4. These data, however, should be considered preliminary. For these types of studies to be considered "validated," it will be essential first to confirm the results on a blinded test set. Beyond this, it will also be necessary to show that the expression signatures accurately predict prognosis in an independent dataset that has been generated in a second laboratory. The latter requirement is necessary to make sure that no unrecognized confounding variables are inappropriately influencing the interpretation of the data. Last, for a prognosis-associated expression profile to be of clinical value, it will need to be determined whether it is specific for a particular therapeutic regimen or, alternatively, predicts prognosis irrespective of the specific therapy being used. Although a number of groups are pursuing these types of studies, it is likely to be years before this type of analysis moves into the clinic.
    Gene expression profiling is yielding a view of the leukemia cells that is not only providing insights into pathogenesis, but is also providing new diagnostic markers and therapeutic targets. In the not too distant future, this information should begin to have a major impact on the way we diagnose and treat leukemia patients. Although considerable work remains to be done before these predictions are realized, our ability to acquire and appropriately analyze this type of data continues to mature at a rapid pace. Thus, the fruits of gene expression profiling should soon help us to accurately identify specific leukemia subtypes, and to select therapies targeted to the underlying molecular lesions or their altered downstream consequences.
    Over the past 3 decades, remarkable advances have been made in the treatment of ALL in children. Yet significant challenges remain. Although the use of modern combination chemotherapy and post-induction therapeutic intensification now yields long-term remissions in nearly 75% of children affected by ALL, 25% ultimately relapse with disease that is highly refractory to current therapyref. Conversely, another 25% of children with ALL who now receive dose intensification are likely "overtreated" and may well be cured using less intensive regimens resulting in fewer toxicities and long-term side effects. Thus, a major challenge for the treatment of children with ALL in the next decade is to improve and refine ALL diagnosis and risk classification schemes in order to precisely tailor therapeutic approaches to the biology of the tumor and the genotype of the host. Current risk classification schemes in pediatric ALL use clinical and laboratory parameters such as patient age, initial white blood cell count (WBC), and the presence of specific ALL-associated cytogenetic or molecular genetic abnormalities to stratify patients into groups at increasing risk for relapse or treatment failureref1, ref2, ref3, ref4, ref5, ref6, ref7, ref8, ref9, ref10. NCI risk criteria are first applied to all children with ALL, dividing them into categories based on age and initial WBC at disease presentation. In addition to these more general NCI criteria, classic cytogenetic analysis or molecular genetic detection of more frequently recurring cytogenetic abnormalities has been used to stratify B precursor ALL patients more precisely into low, standard, high, and very high risk categories. These chromosomal aberrations primarily involve structural rearrangements (translocations) or numerical imbalances (hyperdiploidy, now assessed as specific chromosome trisomies, or hypodiploidy).
Alternatively, the rate of disappearance of both B precursor and T ALL leukemic cells during induction chemotherapy (assessed morphologically or by other quantitative measures of residual disease) has also been used as an assessment of early therapeutic response and as a means of targeting children for therapeutic intensificationref1, ref2, ref3, ref4, ref5, ref6, ref7. In new risk classification schemes employing all of these factors in the Children’s Oncology Group, children with B precursor ALL with "low-risk" disease (22% of all B precursor ALL cases) are defined as having standard NCI risk criteria, the presence of low-risk cytogenetic abnormalities (t(12;21)/TEL/AML1 or trisomies of chromosomes 4, 10, and 17), and a rapid early clearance of bone marrow blasts during induction chemotherapy. Children with "standard-risk" disease (50% of ALL cases) are NCI standard risk without "low-risk" or unfavorable cytogenetic features, or are children with low-risk cytogenetic features who have NCI high-risk criteria or slow clearance of blasts during induction. Although therapeutic intensification has yielded significant improvements in outcome in these two risk groups, it is likely that a significant number of these children are currently "overtreated" and could be cured with less intensive regimens that cause fewer toxicities and long-term side effects. Conversely, a significant number of children even in these good-risk categories still relapse, and a precise means to identify them prospectively has remained elusive. "Standard-risk" disease in particular is highly heterogeneous in both clinical and molecular genetic features. Nearly 30% of children with ALL have "high-risk" disease, defined by NCI high-risk criteria and the presence of specific cytogenetic abnormalities; again, precise measures to distinguish the children more prone to relapse in this heterogeneous group have not been established. 
Finally, in a minority (approximately 3%) of children with B precursor ALL, a very poor outcome has been associated with certain "poor prognosis" cytogenetic abnormalities (t(9;22), hypodiploid DNA content < 45 chromosomes). While T ALL cases have not been traditionally divided into distinct risk groupings similar to B ALL, recent gene expression profiling studies published by others (Weiss SM and Kulikowski CA. Computer systems that learn. San Francisco, CA: Morgan Kaufmann Publishers, Inc; 1991) indicate that distinct intrinsic biologic clusters of T ALL cases can be defined. Recurrent genetic subtypes of B- and T-cell ALL :
    Subtype | Associated genetic abnormalities | Frequency in children | Risk category
    B-precursor ALL | hyperdiploid DNA content; trisomies of chromosomes 4, 10, 17 | 25% of B precursor cases | low
    B-precursor ALL | t(12;21)(p13;q22): TEL/AML1 | 28% of B precursor cases | low
    B-precursor ALL | 11q23/MLL rearrangements; particularly t(4;11)(q21;q23) | 4% of B precursor cases; < 80% of infant ALL | high
    B-precursor ALL | t(1;19)(q23;p13): E2A/PBX1 | 6% of B precursor cases | high
    B-precursor ALL | t(9;22)(q34;q11): BCR/ABL | 2% of B precursor cases | very high
    B-precursor ALL | hypodiploidy | relatively rare | very high
    B-ALL | t(8;14)(q24;q32): IgH/MYC | 5% of all B lineage ALL cases | high
    T-ALL | numerous translocations involving the TCR β (7q35) or TCR α/δ (14q11) loci | 7% of ALL cases | not clearly defined
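    The stratification logic described above for B precursor ALL can be sketched in a few lines of Python. This is only an illustration of how the three inputs (NCI risk category, cytogenetics, early response) combine, not a clinical tool and not the Children's Oncology Group's actual algorithm. The function name, the string labels for the cytogenetic groups, and the fallback rule for cases without an informative cytogenetic feature are assumptions made for this example; the NCI category is taken as an input because its age and WBC cut-offs are not given in the text.

```python
from typing import Optional

# Cytogenetic groups named in the text and table above.
# The string labels themselves are illustrative, not a standard vocabulary.
LOW_RISK = {"t(12;21)", "trisomies 4/10/17"}
HIGH_RISK = {"11q23 rearrangement", "t(1;19)"}
VERY_HIGH_RISK = {"t(9;22)", "hypodiploidy"}


def risk_group(nci_standard: bool,
               cytogenetics: Optional[str],
               rapid_early_response: bool) -> str:
    """Return 'low', 'standard', 'high', or 'very high' for one case.

    nci_standard: True if the case meets NCI standard-risk criteria.
    cytogenetics: one of the labels above, or None if none applies.
    rapid_early_response: True if marrow blasts cleared rapidly in induction.
    """
    if cytogenetics in VERY_HIGH_RISK:
        return "very high"
    if cytogenetics in LOW_RISK:
        # Low risk requires all three favorable features; a low-risk
        # karyotype with NCI high risk or slow clearance is standard risk.
        if nci_standard and rapid_early_response:
            return "low"
        return "standard"
    if cytogenetics in HIGH_RISK:
        return "high"
    # Assumption: with no informative cytogenetic feature, the NCI
    # category alone decides between standard and high risk.
    return "standard" if nci_standard else "high"


if __name__ == "__main__":
    print(risk_group(True, "t(12;21)", True))    # low
    print(risk_group(True, "t(12;21)", False))   # standard
    print(risk_group(True, "t(9;22)", True))     # very high
```

    Checking the very-high-risk abnormalities first reflects the statement that t(9;22) and hypodiploidy carry a very poor outcome regardless of the other features.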
    Thus, despite the refinement of risk classification schemes employing cytogenetics and the rate of clearance of leukemic blasts or other measures of minimal residual disease, current diagnosis and risk classification schemes in pediatric ALL remain imprecise. They do not adequately identify either the children more prone to relapse, who require more intensive approaches, or the children with low-risk disease who could be cured with less intensive therapies. Both kinds of patients are distributed among all currently defined risk groups, and a precise means to identify them prospectively has remained elusive. As striking differences in therapeutic response and outcome may still be observed in ALL patients with the same cytogenetic profile or within the same risk classification group, it is likely that other molecular genetic abnormalities and the functional activation or inactivation of critical cellular pathways (cell signaling, cell cycle regulation, adhesion, DNA repair, apoptosis, drug resistance) in leukemic cells also affect disease biology and therapeutic response. For this reason, many investigators in this field are applying large-scale genomic technologies that measure global patterns of gene expression in leukemic cells, in order to acquire systematic gene expression profiles and sets of genes that can be used for improved diagnosis and risk classification in pediatric ALL and for the prediction of therapeutic response or resistance in individual patients.
    Funded under the NCI Director’s Challenge Program: Toward a Molecular Classification of Tumors (NCI CA88361: Molecular Taxonomy of Adult and Pediatric Acute Leukemia; PI: CL Willman, Co-PI: WM Carrollref), our investigative team has recently completed comprehensive gene expression profiling in two large retrospective cohorts of pediatric ALL patients registered to clinical trials previously coordinated by the Pediatric Oncology Group (POG). The cohorts, statistically designed by Dr. Jonathan Shuster, are: (1) a cohort of 127 infant leukemias; and (2) a case-control study of 254 pediatric B-precursor and T-cell ALL cases. We used unsupervised learning tools and novel data visualization techniques to discover intrinsic biologic clusters of ALL, and supervised machine learning algorithms and statistical methods to model gene expression profiles associated with clinical characteristics, cytogenetics, and therapeutic response. With these methods we have identified novel intrinsic biologic clusters of ALL and novel genes that are strongly predictive of outcome. These discoveries are providing us with new tools and approaches to refine and improve molecular diagnosis and risk classification in pediatric ALL that will be implemented and tested prospectively in the context of Children’s Oncology Group (COG) clinical trials within the next 5 years.
    Gene expression studies in infant acute leukemia: novel biologic clusters and genes predictive of outcome. Over the past 3 years, many investigative teams have developed reproducible methods for leukemia blast purification, RNA isolation, linear amplification, and hybridization to oligonucleotide and printed cDNA microarrays. Our approach is a modification of a double amplification method originally developed by Ihor Lemischka and colleagues from Princeton University (protocols available at the NCI Director’s Challenge). Using Affymetrix (U-95A.v2) oligonucleotide arrays, we have obtained gene expression profiles from ALL patient cohorts in the KUGR (Keck University of New Mexico [UNM] Genomics Resource) housed in the UNM Cancer Research Facility (http://hsc.unm.edu/som/micro/genomics). We have used powerful, multidimensional unsupervised learning algorithms and data visualization tools (VxInsight, principal component analysis)ref1, ref2 for class discovery and for the identification of intrinsic biologic clusters of pediatric leukemia. Supervised computer learning methods (primarily Bayesian analysis of gene expression networks, support vector machines [SVM], and neuro fuzzy logic)ref  (Bishop C. Neural Networks for Pattern Recognition. New York, NY: Oxford University Press; 1995; Guyon I, Weston, J, Barnhill S, Vapnik V. 2002. Gene selection for cancer classification using support vector machines. Machine Learning. In press) were used to identify genes and groups of genes that were significantly associated with various parameters (outcome, specific cytogenetic abnormalities, etc) by our collaborators at UNM and Sandia National Laboratory.
    Infant leukemia cohort studies : from the 2 POG infant trials (9407 for infant ALL; 9421 for infant AML), 142 retrospective cases were initially chosen for analysis in our infant leukemia cohort. Infants were defined as < 365 days in age and had extremely poor overall survival rates (< 25%). Of the 142 cases, 127 were ultimately retained in the study; 15 cases were excluded from the final analysis because of poor-quality total RNA, cRNA amplification, or hybridization. Of the final 127 cases analyzed, 79 were considered traditional ALL by morphology and immunophenotyping and 48 were considered AML. Of the 127 cases, 59 had rearrangements of the MLL gene. Unsupervised learning tools for hierarchical clustering of gene expression data and other clustering approaches are most useful for the discovery of intrinsic biology in patient cohorts and of coincident patterns of gene expression. However, most unsupervised hierarchical clustering algorithms are not powerful enough to resolve multiple clusters in very large datasets (> 12,000 genes in > 100 cases) unless the investigator first selects a more limited subset of expressed genes on which to perform the clustering (usually < 100), which may introduce significant bias and limit the analysis. In the retrospective infant leukemia cohort, nearly 7000 of the 12,625 genes and ESTs on the Affymetrix U95A.v2 chip were expressed at significant levels in at least 1 of the 127 infant leukemia cases. To avoid the bias introduced by limiting gene selection, and to use higher-dimensional methods for the discovery of inherent clusters of patients based on common gene expression patterns, we turned to 2 methods: (1) principal component analysis (PCA), and (2) VxInsight, a new and very powerful tool for unsupervised clustering and visualization of genomic data developed by our collaborators at Sandia National Laboratory. 
VxInsight has the capacity to cluster patients or genes, using all of the gene expression data without having to select smaller subsets of genes for actual clustering, in a novel and intuitive way. When VxInsight or PCA was applied to the infant leukemia dataset, we discovered that there were 3 statistically significant, intrinsic biologic groups of infant leukemia and that these intrinsic biologic groups could not simply be predicted by ALL versus AML labels or by the presence or absence of cytogenetic abnormalities involving MLL as these labels were distributed among each of the intrinsic biologic clusters. Importantly, when an alternative multidimensional clustering method (PCA) was used on this dataset (data not shown), we also identified 3 distinct clusters. And as the membership between the clusters defined by VxInsight and PCA was 99% correlated, it was satisfying that 2 highly different multidimensional clustering algorithms yielded a highly similar result. In each cluster, an individual patient is represented by a pyramid (highly similar patients in each cluster will be overlapping in this "high level" view). By querying VxInsight in "real-time," we could determine which cases had been assigned an ALL versus an AML label or which cases had rearrangements of the MLL gene. In addition, we performed the ANOVA (analysis of variance) function in VxInsight to provide the most statistically significant genes that distinguish each cluster. The top cluster of cases, which we refer to as cluster A, contained 21 infant leukemia cases, 16 of which had been labeled ALL and 5 AML. The shared gene expression profile of these cases is unique when compared with the other clusters and is highly similar to recent reports of gene expression profiles in very primitive hematopoietic stem cells (HSC)ref. 
The gene expression profile shared by cases in cluster A reflects the earliest hematopoietic antigens (high EPOR, AML1, KIT, CD34, FLK1, and HOX family members) as well as a number of genes associated with the development of endothelial cells, leading us to speculate that this distinct group of infant leukemias may have arisen by transformation of very primitive HSCs or even of the HSC-endothelial cell precursor, the hemangioblast. Perturbations of the TGF-ß/bone morphogenetic protein/SKI oncogene pathway involved in early mesoderm development are unique to this cluster and may provide new insights into novel therapeutic approaches. Interestingly, the majority of MLL-containing cases in this group were t(4;11) variants. The leftmost cluster contains 52 cases, 51 of which were ALL. Many of these cases contained a t(4;11) or other MLL variant, but the gene expression profile of the t(4;11)-containing cases in this cluster was quite distinct from that of the t(4;11) cases in cluster A and quite similar to the gene expression profiles obtained by Armstrong and colleaguesref. The cases in this relatively homogeneous cluster, which we refer to as cluster B, share a gene expression profile reflective of a committed B lymphocyte precursor, more differentiated than the cases in cluster A. Finally, the third distinct cluster of infant cases (Figure 8A, blue, bottom right) is quite heterogeneous, containing 54 cases, 42 with AML and 12 with ALL morphology. The MLL variants seen in this group were more frequently t(9;11) and other MLL rearrangements. The shared gene expression profiles distinguishing these interesting cases include expression of many members of the RAS family and of genes that impinge upon, regulate, or are regulated by RAS. 
In addition to RAS signaling pathways, cases in this cluster are also characterized by expression of several DNA repair and GST genes, leading us to speculate that it is this group of infant leukemia cases that might uniquely result from environmental exposures. Clearly the 3 intrinsic biologic groups of infant leukemia that we have identified through gene expression profiling are not predicted by ALL or AML labels or MLL-containing cytogenetic abnormalities. If validated, the distinct sets of genes that can be used to identify each biologic group represent potentially important diagnostic and therapeutic targets. Using supervised learning techniques, we have also identified genes that are predictive of outcome at initial diagnosis in this infant cohort. While we could not statistically model outcome using all of the cases combined (ALL and AML) or when cases were divided into morphologically-defined groups (AML versus ALL), we could best model genes predictive of outcome when cases were grouped (conditioned) on which VxInsight (or PCA) cluster that they were assigned to (P = .01), providing further evidence that the VxInsight cluster assignment has biologic and clinical validity.
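    Two of the quantitative statements above lend themselves to a short worked check: the cluster sizes should add up to the cohort totals, and the agreement between two clusterings can only be measured after their arbitrary cluster labels have been matched. The Python sketch below illustrates both. The per-case label vectors are invented for the example (they are not the study data), the single non-ALL case in cluster B is assumed to be AML, and the permutation-matching function is one simple way to score agreement; the text does not say how the reported 99% figure was computed.

```python
from itertools import permutations

# Cluster composition as (ALL, AML) counts reported in the text.
# Assumption: the one non-ALL case in cluster B is AML.
composition = {"A": (16, 5), "B": (51, 1), "third": (12, 42)}

total_all = sum(n_all for n_all, _ in composition.values())
total_aml = sum(n_aml for _, n_aml in composition.values())
total_cases = total_all + total_aml


def best_match_agreement(labels_a, labels_b):
    """Fraction of cases on which two clusterings agree.

    Cluster names are arbitrary, so every one-to-one renaming of the
    clusters in labels_a onto those in labels_b is tried and the best
    score is kept. Practical only for a handful of clusters.
    """
    names_a = sorted(set(labels_a))
    names_b = sorted(set(labels_b))
    if len(labels_a) != len(labels_b) or len(names_a) != len(names_b):
        raise ValueError("clusterings must cover the same cases and "
                         "have the same number of clusters")
    best_hits = 0
    for perm in permutations(names_b):
        mapping = dict(zip(names_a, perm))
        hits = sum(mapping[a] == b for a, b in zip(labels_a, labels_b))
        best_hits = max(best_hits, hits)
    return best_hits / len(labels_a)


# Hypothetical label vectors: the same 127 cases clustered by two methods
# that use different cluster names and disagree on a single case.
method_1 = ["A"] * 21 + ["B"] * 52 + ["third"] * 54
method_2 = [2] * 21 + [0] * 52 + [1] * 54
method_2[0] = 0  # one case assigned differently by the second method

if __name__ == "__main__":
    print(total_all, total_aml, total_cases)
    print(round(best_match_agreement(method_1, method_2), 3))
```

    The totals reproduce the 79 ALL and 48 AML cases (127 in all) given for the cohort, which confirms that the three clusters account for every case.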
    Gene expression studies in pediatric ALL : novel biologic clusters and new genes strongly predictive of outcome at initial diagnosis. To obtain gene expression profiles associated with outcome in a statistically significant fashion, we developed a case-control cohort design that could compare and contrast gene expression profiles in distinct cytogenetic subgroups of ALL patients who either did or did not achieve a long-term remission (for example, comparing children with t(4;11) who failed versus those who achieved long-term remission). The design, developed by Dr. Jonathan Shuster, was constructed to look at a number of small independent case-control studies within B precursor ALL. These included t(4;11), t(9;22), t(1;19), monosomy 7, monosomy 21, female, male, African American, Hispanic, and POG AlinC15 arm A. Cases were selected from several completed POG clinical treatment trials, but the majority came from the POG 9000 series. As standard cytogenetic analysis of the samples from patients registered to these older trials would not usually have detected the t(12;21), we performed RT-PCR studies on a large cohort of these cases to select ALL cases with t(12;21) who either failed therapy (n = 8) or achieved long-term remissions (n = 22). Cases who "failed" had failed within 4 years, while "controls" had achieved a complete continuous remission of 4 or more years. A case-control study of induction failures (cases) versus complete remissions (CRs; controls) was also included in this cohort design, as was a T-cell cohort. It is very important to recognize that the study was designed for efficiency and maximum overlap, without adversely affecting the random sampling assumptions for the individual case-control studies. As for the infant leukemia cases, gene expression arrays were completed using 2.5 µg of RNA per case (all samples had > 90% blasts) with double linear amplification. All amplified RNAs were hybridized to Affymetrix U95A.v2 chips.
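    The case and control definitions used in this design reduce to a simple rule on outcome and follow-up time, sketched below in Python. The function and variable names are hypothetical, and two points are assumptions rather than statements from the text: "within 4 years" is read as strictly less than 4 years, and patients who meet neither definition are treated as ineligible.

```python
from typing import Optional


def study_label(failed: bool, years: float) -> Optional[str]:
    """Assign a patient to the case-control design.

    failed: True if therapy failed (for example, relapse).
    years:  time to failure if failed, otherwise the duration of
            complete continuous remission, in years.
    Returns 'case', 'control', or None if the patient fits neither
    definition (late failure or remission shorter than 4 years).
    """
    if failed:
        # Assumption: "within 4 years" means strictly less than 4.
        return "case" if years < 4 else None
    return "control" if years >= 4 else None


# Group sizes reported for the RT-PCR-selected t(12;21) sub-study.
t12_21_cases = 8
t12_21_controls = 22

if __name__ == "__main__":
    print(study_label(True, 1.5))    # case
    print(study_label(False, 6.0))   # control
    print(study_label(False, 2.0))   # None
    print(t12_21_cases + t12_21_controls)
```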
    Excellent studies previously published by Yeoh et alref and Ross et alref have found that the gene expression profiles of pediatric ALL cases cluster according to the recurrent cytogenetic abnormalities associated with this disease, and thus that cytogenetics essentially define the intrinsic biologic groups of disease. However, Yeoh et al first used supervised learning algorithms (primarily support vector machines) to identify expressed genes that were associated with each recurrent cytogenetic abnormality in ALL. Hierarchical clustering or PCA was then performed using the highly selected set of 271 genes that resulted from this supervised learning approach. As would be expected from this approach, distinct ALL clusters could be defined based on shared gene expression profiles, and each cluster was associated with a specific cytogenetic abnormality. In a similar way, we also first used supervised learning methods for class prediction (Bayesian networks, support vector machines) to identify a set of 147 genes from the POG ALL case-control study that could predict the presence of the most frequent cytogenetic abnormalities seen in ALL; clustering with this limited set of genes did indeed yield clusters that correlated relatively well with specific karyotypes. However, when we performed a fully unsupervised learning approach (VxInsight, principal component analysis), we discovered 9 novel biologic clusters of ALL (2 distinct T ALL clusters and 7 distinct B precursor ALL clusters), each with distinguishing gene expression profiles (Mosquera-Caro MP, Helman P, Veroff RV, et al. Identification, validation, and cloning of a novel gene (opal1) and associated genes highly predictive of outcome in pediatric acute lymphoblastic leukemia using gene expression profiling [plenary session abstract]. Blood. In press). Two distinct clusters of T lineage ALL (S1 and S2) and 7 distinct B precursor ALL clusters (A, B, C, X, Y, Z) were identified (Figure 9). 
Using ANOVA, we identified over 100 statistically significant genes uniquely distinguishing each of these cohorts; review of these lists of genes reveals many interesting signaling molecules and transcription factors. While there were some trends, no cytogenetic abnormality precisely defined any specific cluster. Cases with a t(12;21) or hyperdiploidy, both of which confer low risk and good outcomes, tended to cluster together and were seen primarily in clusters C and Z as well as in the top component of the X cluster. On the terrain map from VxInsight (Figure 9, top), these 3 cluster regions (C, Z, and X) lie fairly close together, indicating that they are more closely related to one another than, for example, cluster C is to cluster S2. Similarly, the t(1;19) cases clustered in Y had a poorer outcome than those in clusters A and B. Finally,