Technical outline of SAGE
Starting from purified cell populations, mRNA will be isolated using magnetically labelled oligo(dT) nucleotides and transcribed into double-stranded cDNA (Figure 1A, B). The cDNA molecules will be digested by the NlaIII enzyme, which recognizes a CATG motif. Only the 3’ cDNA fragments will be retained within a magnetic field (owing to the magnetic oligo(dT) label), while all other fragments will be washed out. As CATG motifs are spaced on average 256 (4^4) bp apart, the retained 3’ESTs have a length centered around 250 bp. Two aliquots of 3’ESTs will be ligated to double-stranded linker molecules of known sequence (Figure 1C). Thereafter, the linker-ligated 3’ESTs will be cut by the BsmFI enzyme, which also recognizes CATG but cuts 10 bp downstream of the recognition site (Figure 1D). This digestion yields 51 bp fragments, which consist of a 37 bp linker, the CATG motif and 10 bp of unknown cDNA sequence, the so-called SAGE-tag. The linker-ligated fragments from both aliquots will be ligated together to form ditags, which are flanked by known linker sequences (Figure 1F). Linker-ligated ditags (102 bp) can be amplified using linker-specific primers in a low-cycle PCR (Figure 1G). After a further NlaIII digestion, the linkers are removed and 28 bp ditags are released and gel-purified (Figure 1H). Owing to their complementary 5’ and 3’ ends, ditag molecules can be ligated to form large concatemer molecules, which carry the information of up to 60 expressed genes. Concatemers are also gel-purified (Figure 1I) and cloned into a vector. Plasmids containing a concatemer insert of sufficient length (Figure 1J) are subsequently sequenced.
Figure 1: Schematic overview of the SAGE technique
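The fragment arithmetic and the tag definition in the outline above can be sketched in a few lines of Python. This is an illustrative in-silico sketch, not part of the laboratory protocol or of any SAGE software: the function name `extract_sage_tag` and the constant names are hypothetical, and the sketch assumes that the tag is simply the 10 bp immediately 3’ of the 3’-most CATG in a cDNA sequence written in 5’-to-3’ orientation.

```python
# Illustrative sketch only; names are hypothetical and not taken from SAGE2000.
ANCHOR = "CATG"        # NlaIII recognition motif (from the text)
TAG_LEN = 10           # length of the SAGE-tag in bp (from the text)
LINKER_LEN = 37        # linker length in bp (from the text)

# Expected spacing of a 4-bp motif in random sequence: 4^4 = 256 bp
MEAN_SPACING = 4 ** len(ANCHOR)
# Linker + CATG + tag = 51 bp after the second digestion
LINKER_FRAGMENT_LEN = LINKER_LEN + len(ANCHOR) + TAG_LEN
# Two such fragments joined tail to tail = 102 bp linker-flanked ditag
LINKED_DITAG_LEN = 2 * LINKER_FRAGMENT_LEN


def extract_sage_tag(cdna):
    """Return the 10 bp following the 3'-most CATG, or None if unavailable."""
    seq = cdna.upper()
    pos = seq.rfind(ANCHOR)              # 3'-most anchoring site
    if pos == -1:
        return None                      # no NlaIII site: no tag
    start = pos + len(ANCHOR)
    tag = seq[start:start + TAG_LEN]
    # Assumption: a site closer than 10 bp to the 3' end gives no usable tag.
    return tag if len(tag) == TAG_LEN else None


if __name__ == "__main__":
    # Made-up sequence with two CATG sites and a short poly(A) tail.
    example = "GGG" + "CATG" + "AAACCCGGGTTTA" + "CATG" + "TTTTGGGGCC" + "AAAAAA"
    print(MEAN_SPACING, LINKER_FRAGMENT_LEN, LINKED_DITAG_LEN)
    print(extract_sage_tag(example))     # TTTTGGGGCC
```

Only the 3’-most site is used because the upstream fragments are washed out after the NlaIII digestion, so each transcript contributes one defined tag position.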

Following the SAGE-analysis, SAGE-tags will be extracted using the SAGE2000 software (www.sagenet.org). SAGE-tags can be matched to the reference database UniGene (www.ncbi.nlm.nih.gov/UniGene/) in order to assign expressed genes to SAGE-tag sequences. On average, about 10 percent of SAGE-tags have multiple matches in UniGene and do not allow unambiguous identification of an expressed gene (see Table 2). In addition, about one third of all SAGE-tags have no UniGene match at all, suggesting that these SAGE-tags are derived from novel transcripts (novel isoforms or novel genes; Table 2). In the case of multiple matches or no match, SAGE-tags can be extended to longer 3’ESTs by a 3’EST PCR strategy (Chen et al., 2000).
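The three matching outcomes described above (one match, multiple matches, no match) can be illustrated with a small sketch. It does not reproduce SAGE2000 or the UniGene lookup: the function `classify_tags`, the plain dictionary used as a reference, and the cluster names are hypothetical placeholders.

```python
from collections import Counter


def classify_tags(tag_counts, reference):
    """Sort observed tags by how many reference clusters they match.

    tag_counts: mapping tag -> observed count in the SAGE library
    reference:  mapping tag -> list of cluster identifiers (hypothetical
                stand-in for a tag-to-UniGene lookup table)
    """
    result = {"unique": [], "multiple": [], "none": []}
    for tag in sorted(tag_counts):
        hits = reference.get(tag, [])
        if len(hits) == 1:
            result["unique"].append(tag)      # unambiguous gene assignment
        elif len(hits) > 1:
            result["multiple"].append(tag)    # candidate for 3'EST extension
        else:
            result["none"].append(tag)        # possibly a novel transcript
    return result


if __name__ == "__main__":
    observed = Counter(["AAAAAAAAAA", "AAAAAAAAAA", "CCCCCCCCCC", "GGGGGGGGGG"])
    lookup = {
        "AAAAAAAAAA": ["cluster_A"],
        "CCCCCCCCCC": ["cluster_B", "cluster_C"],
    }
    print(classify_tags(observed, lookup))
```

Tags in the "multiple" and "none" groups are the ones that would be passed on to the 3’EST extension step.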
PCR-strategy to extend SAGE-tags to longer 3’ESTs
In a previous study (Müschen et al., 2002), we have extended 96 SAGE-tags matching multiple genes to 3’ESTs. Using the SAGE-tag as a forward and an anchored oligo (dT) nucleotide as a reverse primer, longer 3’ESTs can be amplified (see below).
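As a rough illustration of the primer pair, the sketch below builds a forward primer from a SAGE-tag and a set of anchored oligo(dT) reverse primers. Several details are assumptions that the text does not specify: the forward primer is taken to include the CATG motif in front of the 10 bp tag, the oligo(dT) stretch is given an arbitrary length of 18, and the anchor is a single non-T base (A, C or G) at the 3’ end. The function name `design_extension_primers` is hypothetical.

```python
def design_extension_primers(tag, dt_length=18, anchors="ACG"):
    """Return (forward_primer, list_of_anchored_oligo_dT_primers).

    Assumptions (not specified in the text): CATG is prepended to the tag,
    dt_length is arbitrary, and one non-T anchor base follows the T stretch.
    """
    tag = tag.upper()
    if len(tag) != 10 or set(tag) - set("ACGT"):
        raise ValueError("expected a 10-bp tag made of A, C, G and T")
    forward = "CATG" + tag                      # tag-specific sense primer
    # One reverse primer per anchor base, written 5'->3': T...T + anchor
    reverse = ["T" * dt_length + base for base in anchors]
    return forward, reverse


if __name__ == "__main__":
    fwd, rev = design_extension_primers("TTTTGGGGCC")
    print(fwd)
    for primer in rev:
        print(primer)
```

The anchor base forces the reverse primer to sit at the junction between the transcript and its poly(A) tail, which keeps the amplified 3’EST at a defined length.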

In the follow-up of 96 pre-B cell-specific SAGE-tags with multiple matches in UniGene, PCR amplification was successful in 83 cases and yielded a correct identification of the expressed genes. The collection of these SAGE-tags is available at:
http://www.pnas.org/cgi/content/full/152327399/DC1/1
This strategy is also suitable for extending SAGE-tags without any match to a known gene. It will be applied to SAGE-tags that lack a match in UniGene and are differentially expressed by more than 25-fold. The approach was established using 62 SAGE-tags lacking a UniGene match that are more than 25-fold differentially expressed between pre-B cells and HSC. The length of the generated 3’ESTs ranged between 100 and 300 bp. After extension to longer 3’ESTs, SAGE-tags become accessible to a functional screen by RNA interference, which is based on 23 bp interfering RNA molecules.
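The selection rule stated above (no UniGene match and more than 25-fold differential expression) can be sketched as follows. The text does not say how fold changes are computed, so the normalization to library size and the pseudocount for tags absent from one library are assumptions, and the function names `fold_change` and `select_candidates` are hypothetical.

```python
def fold_change(count_a, total_a, count_b, total_b, pseudocount=0.5):
    """Symmetric fold change between two libraries, scaled to library size.

    Assumption: a pseudocount is added so that tags with zero counts in one
    library still give a finite ratio; with pseudocount=0 both counts must
    be greater than zero.
    """
    if total_a <= 0 or total_b <= 0:
        raise ValueError("library totals must be positive")
    # Cross-multiplied form of (a / total_a) / (b / total_b)
    ratio = ((count_a + pseudocount) * total_b) / ((count_b + pseudocount) * total_a)
    return ratio if ratio >= 1 else 1 / ratio


def select_candidates(lib_a, lib_b, reference, min_fold=25.0, pseudocount=0.5):
    """Return (tag, fold) pairs without a reference match and above min_fold."""
    total_a = sum(lib_a.values())
    total_b = sum(lib_b.values())
    selected = []
    for tag in sorted(set(lib_a) | set(lib_b)):
        if tag in reference:
            continue                              # known gene: not a candidate
        fold = fold_change(lib_a.get(tag, 0), total_a,
                           lib_b.get(tag, 0), total_b, pseudocount)
        if fold > min_fold:
            selected.append((tag, fold))
    return selected


if __name__ == "__main__":
    pre_b = {"AAAAAAAAAA": 60, "CCCCCCCCCC": 40}
    hsc = {"AAAAAAAAAA": 1, "CCCCCCCCCC": 40, "GGGGGGGGGG": 59}
    known = {"GGGGGGGGGG"}                        # tags with a reference match
    print(select_candidates(pre_b, hsc, known))
```

The threshold is a parameter, so the same sketch covers other cut-offs by changing `min_fold`.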
Functional screening of 3’ESTs by RNA interference (RNAi)
‘RNA interference’ (RNAi) describes the specific suppression of genes by complementary dsRNA (for review, see Sharp 2001). Although the mechanism by which dsRNA suppresses gene expression is not entirely understood, RNAi can provide phenotypic knockouts of target genes on a large scale. It appears that longer dsRNA is processed into 23 bp dsRNA (called small interfering RNA or siRNA) by the Dicer enzyme, which contains RNase III motifs (Bernstein et al. 2001). The siRNA then apparently acts as a guide sequence within a multicomponent nuclease complex (RNA-induced silencing complex, RISC; Hammond et al. 2001) that targets complementary mRNA for degradation. For siRNA studies in human cells, two 21-mer RNAs with 19 complementary nucleotides and 3' terminal noncomplementary dimers of thymidine or uridine (Elbashir et al. 2001) are typically used. The antisense siRNA strand is the reverse complement of the mRNA target sequence and is therefore fully complementary to it. The sense strand of the siRNA has the same sequence as the target mRNA, except that it lacks the 5' AA sequence (see below).
Construction of small interfering RNA molecules
| 5' | - | A | A | C | G | A | U | U | G | A | C | A | G | C | G | G | A | U | U | G | C | C |   |   | - | 3' | target mRNA sequence |
| 5' | - |   |   | C | G | A | U | U | G | A | C | A | G | C | G | G | A | U | U | G | C | C | U | U | - | 3' | sense strand siRNA |
| 3' | - | U | U | G | C | U | A | A | C | U | G | U | C | G | C | C | U | A | A | C | G | G |   |   | - | 5' | antisense strand siRNA |
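The construction shown in the table can be expressed as a short Python sketch. It assumes a 21-nt target of the form AA followed by 19 nucleotides and UU overhangs, as in the example above; the function name `design_sirna` is hypothetical, and no further design criteria (such as GC content or specificity checks) are considered.

```python
# RNA complement table: A<->U, C<->G
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def design_sirna(target, overhang="UU"):
    """Return (sense, antisense) siRNA strands, both written 5'->3'.

    target: 21-nt mRNA target sequence of the form AA + 19 nt.
    """
    target = target.upper().replace("T", "U")   # accept DNA-style input
    if len(target) != 21 or not target.startswith("AA"):
        raise ValueError("expected a 21-nt target starting with AA")
    if set(target) - set("ACGU"):
        raise ValueError("target contains characters other than A, C, G, U/T")
    core = target[2:]                           # 19 nt shared by both strands
    sense = core + overhang                     # target without the 5' AA
    # Reverse complement of the 19-nt core, plus the 3' overhang
    antisense = core.translate(COMPLEMENT)[::-1] + overhang
    return sense, antisense


if __name__ == "__main__":
    sense, antisense = design_sirna("AACGAUUGACAGCGGAUUGCC")
    print("sense     5'-" + sense + "-3'")
    print("antisense 3'-" + antisense[::-1] + "-5'")   # shown 3'->5' as in the table
```

Printing the antisense strand reversed reproduces the 3'-to-5' orientation used in the table, so the two strands can be compared position by position.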
Gene expression profiling of normal and malignant B cell development
Unlike many other cell types, the development of B cells can be studied at a number of distinct checkpoints. These checkpoints are defined by the presence and pattern of immunoglobulin gene rearrangements, somatic mutations within Ig V region genes, class switch recombination and a growing number of surface molecules, which may be used for the distinction of developmental stages.
Previous work: Using immunomagnetic beads, we have purified CD34+ hematopoietic stem cells (HSC), common lymphoid progenitor cells (CLP; CD10+ CD7- CD19-) and pre-B cells (CD10+ CD19+) from human bone marrow, CD5+ CD19+ B-1 cells from umbilical cord blood, and CD19+ CD27- naïve B cells, CD19+ CD27+ memory B cells and CD138+ plasma cells from tonsils. mRNA was extracted from these B cell subsets and subjected to SAGE-analysis in an attempt to generate a genome-wide portrait of normal B cell development. With the exception of plasma cells, the SAGE-profiles have been completed. The profiles of normal B cell subsets should also serve as a baseline for further comparison with B cell-derived leukemia and lymphoma entities. Recently, SAGE-profiles from two cases each of t(9;22)+ [BCR-ABL] pre-B cell leukemia and t(8;14)+ [c-myc-IgH] Burkitt’s lymphoma have been completed and are currently being compared with their normal counterparts.
Specific aims:
I. Using SAGE, we generate genome-wide gene expression profiles covering major steps of B cell development. The comparison of these SAGE-profiles will identify known and novel genes that define specific checkpoints during B cell differentiation, including
- B cell lineage commitment,
- selection and affinity maturation within the germinal center,
- establishment of memory,
- terminal plasma cell differentiation.
II. The comparison of SAGE-profiles of B cell lineage leukemia and lymphoma cases with their normal counterparts aims at the identification of novel transforming events that initiate and/or promote malignant progression.
Research program: We are using SAGE in combination with gene identification techniques to screen for candidate genes that are involved in normal and/or malignant B cell development. From the SAGE-data, we select SAGE-tags for gene identification that (i) do not match a known UniGene cluster and (ii) are more than 10-fold differentially expressed as compared to a reference SAGE-profile. Using the GLGI technique (Chen et al., 2000), we extend SAGE-tags to 200-300 bp 3’ESTs that can serve as a template for 5’RACE to ultimately identify the full coding sequence of a putatively novel gene. The differential expression of the selected genes will be verified using a custom-made microarray that carries PCR-derived probes from the cDNAs of interest (microarray facility at the Institute for Genetics, Cologne). Among the newly identified genes for which differential expression could be verified, we select those that show indications of functional relevance in a high-throughput RNA interference approach (RNAi; Elbashir et al., 2001) and in bioinformatic protein domain prediction (Conserved Domain Database, CDD). Further functional studies include the generation of specific antibodies using a novel high-throughput in vitro translation system (Expressway; Invitrogen) to perform stainings of lymph node and tonsillar tissue sections and, eventually, the generation of a mouse model.