Main
RNA polymerase II (RNAPII) extensively transcribes mammalian genomes, yielding a wide range of unadenylated and polyadenylated RNAs1,2,3,4. Moreover, individual transcription units that generate standard full-length transcripts also give rise to an array of shorter isoforms9,10. Thus, functional RNAs are produced alongside a wealth of futile RNAPII products. Whereas mature functional pA+ RNAs, such as protein-coding mRNAs, are exported from the nucleus to the cytoplasm, their non-functional counterparts are typically retained and degraded3,4. This is primarily achieved by the nucleoplasmic PAXT connection, which consists of a heterodimeric core of the RNA helicase MTR4 and the Zn-finger protein ZFC3H18,11. Additional, and less well-described, interactions with the nuclear pA+ RNA-binding protein PABPN1 and other transiently interacting RNP components, sometimes referred to as extended PAXT components, may aid in directing transcript turnover by the 3′–5′ exonucleolytic exosome complex12,13,14,15. However, how these interactions provide a biochemical basis by which PAXT distinguishes non-functional pA+ RNAs remains a major unresolved question.
Prior to their nuclear export, pA+ RNAs are packaged with proteins into pA+ RNPs. Central to this process is the export factor and DExD-box ATPase UAP56, which is recruited to pA+ RNPs in preparation for their nuclear export5,16. At the nuclear envelope, the activity of the nuclear pore complex (NPC)-associated GANP–PCID2–SEM1 (GANP–PS) trimer of TREX-26 facilitates the release of RNA from UAP56, enabling export7,17. Again, how pA+ RNP sorting is orchestrated to favour the selective export of functional pA+ RNAs is unknown.
Here we interrogate two TREX-2-like human complexes, SAC3D1–PCID2–SEM1 (SAC3D1–PS) and LENG8–PCID2–SEM1 (LENG8–PS), in which the conserved SAC3D1 and LENG8 proteins, respectively, replace GANP. The GANP–PS, SAC3D1–PS and LENG8–PS complexes are structurally similar and share the ability to release UAP56 from RNA. Notably, we show that LENG8–PS provides PAXT with a module that acts on UAP56 to promote transcript turnover, in contrast to the RNA export activity of TREX-2. Our findings reveal that nuclear pA+ RNA export and decay use a shared biochemical mechanism to act on pA+ RNPs, but with fundamentally different outcomes. Based on the substrate preference of PAXT and its nuclear localization, which is distinct from that of TREX-2, we propose a general model for pA+ RNP fate determination.
TREX-2-like complexes release RNA from UAP56
To mediate the docking of export-competent pA+ RNPs at the NPC, UAP56 binds the five-subunit TREX-2 complex7 (GANP, PCID2, SEM1, CETN2 or CETN3, and ENY2). Within this complex, UAP56 contacts the TREX-2 complex core (TREX-2M), which comprises PCID2, SEM1 and the SAC3 domain of the scaffolding subunit GANP7 (Fig. 1a, left). The ability of TREX-2M to release UAP56 from the pA+ RNP depends on the conserved ‘wedge loop’ within the SAC3 domain7,17 (Fig. 1b). Notably, similar SAC3 domains are found in the UAP56-interacting LENG8 and SAC3D1 proteins7. Although they are broadly conserved amongst eukaryotes, these proteins share no sequence features with GANP or with each other apart from the SAC3 domain (Fig. 1a,b). Moreover, proteome-wide AlphaFold2 screens suggested that the SAC3 domains of LENG8 or SAC3D1 form complexes with PCID2 and SEM118,19,20, thus mimicking TREX-2M. Finally, and central to the present study, LENG8 co-immunoprecipitated with the PAXT core components ZFC3H1 and MTR48,21 (see also Fig. 2) and was shown to interact with PCID2 and SEM1 in both human and yeast cells22,23. Collectively, these observations prompted us to investigate these TREX-2M-like complexes in more detail.
Given the critical role of UAP56 in pA+ RNP export via TREX-2, we hypothesized that LENG8 and SAC3D1 might direct UAP56-bound RNPs to different cellular fates. To address this, we first explored the structure–function relationships of LENG8 and SAC3D1 with UAP56 in vitro. As previously achieved for GANP7, we purified stable recombinant complexes of the SAC3 domain-containing constructs LENG8(491–800) or SAC3D1(48–404) with PCID2–SEM1 (constituting LENG8–PSM or SAC3D1–PSM; Supplementary Fig. 2a,b). Both complexes bound UAP56 in the presence of the non-hydrolysable ATP analogue adenylyl-imidodiphosphate (AMP-PNP) and a 15-nucleotide poly(U) RNA substrate (Extended Data Fig. 1a,b, lanes 1–5). TREX-2M facilitates the release of ADP and Pi from UAP56, thus accelerating the rate-limiting step in the disassembly of UAP56–RNA complexes; this frees UAP56 for RNA re-binding and results in an increased apparent ATPase activity7. Similarly, the LENG8–PSM and SAC3D1–PSM complexes stimulated the apparent ATPase rate of UAP56 in the presence of RNA and ATP in vitro, by approximately 290-fold and 60-fold, respectively (Extended Data Fig. 1c,d). Moreover, substituting a highly conserved arginine residue in the LENG8 or SAC3D1 wedge loops with alanine7 (LENG8(R563A) or SAC3D1(R102A); Fig. 1b, Extended Data Fig. 1e and Supplementary Fig. 2a,b) did not affect UAP56 binding (Extended Data Fig. 1f,g and Supplementary Fig. 2c) but largely abrogated the ATPase stimulatory activity on UAP56 (Extended Data Fig. 1h,i). Finally, to test whether LENG8–PSM and SAC3D1–PSM, like TREX-2M (ref. 7), promote the release of RNA from UAP56, we incubated UAP56 with RNA and ATP to form UAP56–ADP-Pi–RNA complexes, which we immobilized on streptavidin beads via the biotinylated RNA7. Challenging these complexes with either LENG8–PSM or SAC3D1–PSM revealed that both efficiently released UAP56 (Fig. 1c,d, compare lanes 4 and 5), whereas the respective wedge loop mutants did not (Fig. 1c,d, lane 6). Of note, mutating three LENG8 residues in its interfaces with the UAP56 N-terminal domain (NTD) and the UAP56 RecA2 lobe (LENG8(TRR)) diminished the LENG8–PSM–UAP56 interaction (Extended Data Fig. 1j,k and Supplementary Fig. 2a) and led to parallel declines in both the apparent ATPase activity (Extended Data Fig. 1h, lane 7) and the release of UAP56 from RNA (Extended Data Fig. 1l). We conclude that TREX-2-like complexes, like TREX-2, bind UAP56 and trigger the release of its bound RNA through a shared mechanism.
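Why accelerating product release raises the apparent ATPase rate follows from a simple steady-state argument. The sketch below assumes a hypothetical, minimal two-step cycle for RNA-bound UAP56: ATP hydrolysis with rate constant `k_hyd`, followed by rate-limiting release of ADP, Pi and RNA with rate constant `k_rel`. For such an irreversible sequential cycle, the time per turnover is 1/k_hyd + 1/k_rel, so the turnover number is k_hyd·k_rel/(k_hyd + k_rel). The rate constants are illustrative placeholders rather than values measured in this study, and the real UAP56 cycle involves more steps.

```python
def apparent_turnover(k_hyd: float, k_rel: float) -> float:
    """Steady-state turnover (s^-1) of a two-step irreversible cycle.

    Step 1: ATP hydrolysis on the RNA-bound enzyme (rate constant k_hyd).
    Step 2: release of ADP, Pi and RNA, regenerating free enzyme (k_rel).
    Time per cycle is 1/k_hyd + 1/k_rel; turnover is its inverse.
    """
    if k_hyd <= 0 or k_rel <= 0:
        raise ValueError("rate constants must be positive")
    return (k_hyd * k_rel) / (k_hyd + k_rel)


def fold_stimulation(k_hyd: float, k_rel_basal: float, k_rel_stimulated: float) -> float:
    """Ratio of apparent turnover when only the release step is accelerated."""
    return apparent_turnover(k_hyd, k_rel_stimulated) / apparent_turnover(k_hyd, k_rel_basal)


if __name__ == "__main__":
    # Hypothetical rate constants (s^-1), chosen only to illustrate the trend.
    k_hyd = 10.0
    for k_rel in (0.01, 0.1, 1.0, 10.0, 100.0):
        print(f"k_rel={k_rel:>6}: turnover={apparent_turnover(k_hyd, k_rel):.4f} s^-1")
    print("fold stimulation (k_rel 0.01 -> 5 s^-1):",
          round(fold_stimulation(k_hyd, 0.01, 5.0), 1))
```

When release is much slower than hydrolysis, turnover is approximately `k_rel`, so speeding up release translates almost proportionally into apparent ATPase stimulation until hydrolysis itself becomes limiting.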
Although previous structural studies of UAP56–TREX-2M complexes had revealed their protein–protein interfaces, it remained unclear how the wedge loop releases UAP56 from RNA. To investigate the molecular basis for this function, we analysed LENG8–PSM and SAC3D1–PSM complexes with UAP56 in the presence of 15-nucleotide poly(U) RNA and ATP or AMP-PNP using cryo-electron microscopy (cryo-EM). A fraction of the complexes lacked UAP56, enabling us to solve the structures of apo LENG8–PSM and SAC3D1–PSM at 3.5 Å and 3.6 Å resolution, respectively (Extended Data Table 1). Both complexes showed the same V-shaped architecture previously observed for TREX-2M and for a yeast LENG8–PSM complex23,24 (Extended Data Figs. 2a and 3 and Supplementary Figs. 3a–c and 4a–c). Unexpectedly, two-dimensional class averages of the UAP56-engaged fractions of LENG8–PSM and SAC3D1–PSM suggested that UAP56 could be in a closed, RNA-bound state prior to its release via the wedge loop (Extended Data Fig. 2b). Together with our previously reported UAP56–TREX-2M structure7, in which UAP56 was captured after RNA release, this enabled us to investigate the RNA-releasing mechanism of SAC3 domain-containing complexes. We determined the cryo-EM structures of the UAP56–LENG8–PSM and UAP56–SAC3D1–PSM complexes in the pre-RNA-release state (Fig. 1e,f, Extended Data Fig. 3, Supplementary Figs. 3d–g and 4f,g and Extended Data Table 1). A severe bias in particle orientation limited the resolution to 6–12 Å for UAP56–LENG8–PSM in the RNA-clamped pre-release state. Reconstitution of the complex with ATP yielded a higher-resolution structure at 4.9 Å, but with density only for the UAP56 NTD (Supplementary Fig. 4d,e). We could, however, resolve the structure of UAP56–RNA–SAC3D1–PSM to 2.6 Å, enabling a detailed structural analysis.
The structure of the pre-release state shared key architectural features with UAP56–TREX-2M, including the anchoring of the UAP56 NTD at the base of the SAC3D1–PS complex. Truncating the UAP56 NTD reduced the affinity of UAP56 for both SAC3D1–PSM and LENG8–PSM by more than 30-fold, as measured by grating-coupled interferometry (Extended Data Fig. 2c) and supported by in vitro pulldown assays (Extended Data Fig. 1a,b, lanes 6 and 7). Thus, the UAP56 NTD is as important for binding TREX-2-like complexes as it is for binding TREX-2 (ref. 7). In addition, the UAP56–SAC3D1–PSM structure provided insights into the action of the wedge loop. In the structure, this region (residues Y100–P111; Fig. 1b and Extended Data Fig. 1e) binds near the two RecA lobes through largely electrostatic interactions between its peptide backbone and UAP56 residues R135 in RecA1 and K334 in RecA2 (Fig. 1g and Extended Data Fig. 2d). The critical wedge loop residue R102 of SAC3D1 forms a hydrogen bond with UAP56 E354, positioning R102 close to F381 in the RecA2 lobe of UAP56. By contrast, in the post-release state observed for UAP56–TREX-2M, the equivalent central wedge loop arginine (R678 in GANP, corresponding to R102 in SAC3D1) replaced UAP56 F381 in the nucleotide-binding site (Fig. 1g, right). The positioning of the wedge loop arginine in the clamped state might prime it to replace UAP56 F381 in a subsequent step, releasing RNA from UAP56 (Fig. 1g,h and ref. 7).
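To make the ">30-fold" affinity change concrete: under a 1:1 binding model, the equilibrium dissociation constant is K_D = k_off/k_on, and the fold reduction in affinity is K_D(variant)/K_D(reference). The sketch below assumes simple 1:1 (Langmuir) binding and uses hypothetical kinetic constants, not the values fitted from the grating-coupled interferometry data. It also shows how such a change lowers fractional occupancy at a given partner concentration.

```python
def dissociation_constant(k_on: float, k_off: float) -> float:
    """K_D (M) from association (M^-1 s^-1) and dissociation (s^-1) rate constants."""
    if k_on <= 0 or k_off <= 0:
        raise ValueError("rate constants must be positive")
    return k_off / k_on


def fraction_bound(concentration: float, k_d: float) -> float:
    """Equilibrium occupancy for 1:1 binding with ligand in excess: C / (C + K_D)."""
    return concentration / (concentration + k_d)


def affinity_fold_reduction(k_d_variant: float, k_d_reference: float) -> float:
    """How many times weaker the variant binds (ratio of K_D values)."""
    return k_d_variant / k_d_reference


if __name__ == "__main__":
    # Hypothetical constants for full-length versus NTD-truncated UAP56.
    kd_full = dissociation_constant(k_on=1e5, k_off=1e-3)   # about 10 nM
    kd_trunc = dissociation_constant(k_on=5e4, k_off=2e-2)  # about 400 nM
    print("fold reduction:", affinity_fold_reduction(kd_trunc, kd_full))
    for conc in (1e-8, 1e-7, 1e-6):
        print(f"{conc:.0e} M: full {fraction_bound(conc, kd_full):.2f}, "
              f"truncated {fraction_bound(conc, kd_trunc):.2f}")
```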
The RNA-clamped RecA lobes of UAP56 are bound between PCID2, the wedge loop and the SAC3 domain in these SAC3 domain-containing complexes. Notably, the protein–protein interfaces between UAP56 and PCID2 in both TREX-2M and the TREX-2M-like complexes involve only a few specific interactions, except for the UAP56 NTD7, suggesting that PCID2–SEM1 has an architectural role in ensuring specificity for UAP56. Indeed, superposition of the evolutionarily related DExD-box ATPase EIF4A316, in its RNA-bound form, onto the UAP56–SAC3D1–PSM structure revealed clashes between EIF4A3 and PCID2 (Extended Data Fig. 2e). Consistent with this, LENG8–PSM bound UAP56, but not the closely related DExD-box proteins EIF4A3 and DDX19, in vitro (Extended Data Fig. 2f and Supplementary Fig. 2d) and did not stimulate the EIF4A3 ATPase (Extended Data Fig. 2g).
We conclude that human cells contain three structurally and biochemically equivalent SAC3 domain-containing complexes. Aided by the architecture of these complexes and the unique UAP56 NTD, they all target UAP56 specifically, and their conserved wedge loops can dislodge UAP56 from RNA.
LENG8–PS provides a physical module for PAXT
Our biochemical and structural analyses suggested that the GANP–PS, LENG8–PS and SAC3D1–PS complexes can individually act on UAP56. To address where these complexes act in vivo, we generated HeLa cell lines25 stably expressing C-terminally 3×Flag-tagged versions of endogenous GANP, LENG8 or SAC3D1 as well as the common subunit PCID2 (Extended Data Fig. 4a, lanes 3–6), and analysed them by immunofluorescence microscopy using a Flag antibody. As previously reported, GANP was found primarily at the nuclear envelope, consistent with its NPC association6,26 (Fig. 2a, top row). By contrast, LENG8 and SAC3D1 localized to the nucleoplasm and the cytoplasm, respectively (Fig. 2a, second and third rows). In agreement with its presumed presence in all three complexes, PCID2 was distributed between the nucleoplasm, the nuclear envelope and the cytoplasm (Fig. 2a, bottom row).
With our focus on nuclear RNA sorting, we examined the TREX-2 and LENG8–PS complexes in more detail, performing immunoprecipitation–mass spectrometry (IP–MS) analyses of the 3×Flag-tagged GANP, LENG8 and PCID2 proteins. Stringent immunoprecipitation conditions were used to enrich for high-affinity interactors (Methods). In the GANP–3×Flag immunoprecipitation, this yielded large amounts of PCID2, in addition to the ENY2 and CETN2 or CETN3 subunits of TREX-26, and the nuclear pore basket protein TPR26 (Fig. 2b, left). Although LENG8–3×Flag also precipitated PCID2, this immunoprecipitation was instead enriched for the PAXT core components ZFC3H1 and MTR4 along with nuclear exosome subunits (Fig. 2b, middle; note that SEM1 was also detected using an alternative protocol (Extended Data Fig. 5l)). Finally, reflecting its dual interaction with GANP and LENG8, PCID2–3×Flag precipitated both proteins and their respective interactors (Fig. 2b, right). Parallel, low-stringency immunoprecipitation–western blotting analyses qualitatively recapitulated these interaction patterns (Extended Data Fig. 4a, lanes 9–12), and RNase treatment in the IP–MS experiments showed that the interactions were not mediated by RNA (Extended Data Fig. 4b). Moreover, mean enrichments over background across immunoprecipitation experiments revealed near-identical interaction levels of GANP with TREX-2 components; likewise, LENG8 precipitated similar amounts of PCID2 and of the core PAXT factors ZFC3H1 and MTR4 (Fig. 2c and Supplementary Table 1). This indicated that LENG8–PS complexes constitute a major module for PAXT. To test this notion further, we generated ZFC3H1–3×Flag cells (Extended Data Fig. 4a, lane 2) and conducted IP–MS experiments at both low and high stringency, which revealed RNase-resistant interactions with LENG8 and PCID2 (Extended Data Fig. 4c,d).
Mean enrichment calculations showed that, along with exosome subunits, LENG8 and PCID2 were less abundant than MTR4, consistent with the reported presence of inactive nuclear ZFC3H1–MTR4 dimers12 (Extended Data Fig. 4e, columns 1 and 2). Nonetheless, the ZFC3H1 immunoprecipitations returned LENG8 and PCID2 in 5- to 10-fold excess over extended PAXT components, and reciprocal LENG8 immunoprecipitations recovered abundant amounts of the ZFC3H1–MTR4 dimer (Extended Data Fig. 4e, columns 3 and 4, f). We further reproduced key high-stringency interactions in HCT116 cells expressing endogenous LENG8–2×HA–dTAG or ZFC3H1–2×HA–dTAG proteins (Extended Data Fig. 4g–i and Supplementary Table 1). Finally, the association of LENG8 with ZFC3H1 was mirrored by their nuclear colocalization, as revealed by immunostaining of LENG8–3×Flag cells with Flag- and ZFC3H1-specific antibodies. Here, LENG8 and PCID2, but not GANP, displayed weighted colocalization coefficients with ZFC3H1 in the nucleoplasm of more than 0.90 (Fig. 2d and Extended Data Fig. 4j).
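This section does not define the weighted colocalization coefficient. A common, Manders-type definition (used by several confocal analysis packages) is the summed intensity of channel-A pixels that are above threshold in both channels, divided by the summed intensity of all above-threshold channel-A pixels. The sketch below assumes that definition, flattened pixel lists, user-chosen thresholds and an optional nucleoplasm mask; it is an illustration, not the exact image-analysis pipeline used here.

```python
from typing import Optional, Sequence


def weighted_colocalization(
    channel_a: Sequence[float],
    channel_b: Sequence[float],
    threshold_a: float,
    threshold_b: float,
    mask: Optional[Sequence[bool]] = None,
) -> float:
    """Manders-type weighted colocalization coefficient for channel A.

    sum(A where A > thr_a and B > thr_b) / sum(A where A > thr_a),
    restricted to pixels where mask is True (e.g. a nucleoplasm mask).
    """
    if len(channel_a) != len(channel_b):
        raise ValueError("channels must have the same number of pixels")
    if mask is None:
        mask = [True] * len(channel_a)
    if len(mask) != len(channel_a):
        raise ValueError("mask must match the number of pixels")
    total = 0.0
    colocalized = 0.0
    for a, b, keep in zip(channel_a, channel_b, mask):
        if not keep or a <= threshold_a:
            continue  # outside the region of interest or below background
        total += a
        if b > threshold_b:
            colocalized += a  # channel-A intensity shared with channel B
    return colocalized / total if total > 0 else 0.0
```

A value of 0.90 in this formulation would mean that 90% of the above-threshold channel-A signal inside the mask falls in pixels where channel B is also above threshold.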
Having established a physical link between LENG8 and ZFC3H1, we asked whether LENG8 interacts with MTR4 and the exosome via ZFC3H1. Indeed, rapid ZFC3H1 depletion using the FKBP12(F36V) degron (dTAG)27 (Extended Data Fig. 5a, left) prevented interactions of MTR4 and the exosome component EXOSC10 with LENG8–3×Flag (Extended Data Fig. 5a, right). To identify a possible ZFC3H1 interaction site on LENG8, we used AlphaFold2, which predicted a conserved motif of two α-helices (residues 288–342) in the otherwise unstructured region N-terminal to the SAC3 domain (Extended Data Fig. 5b–d). A direct interaction was predicted between this helical region of LENG8 and a single α-helix (residues 730–747) of ZFC3H1 (Fig. 2e and Extended Data Fig. 5e), which we verified using an in vitro pulldown assay comparing relevant wild-type and mutant versions of recombinant LENG8 and ZFC3H1 peptides (Extended Data Fig. 5f and Supplementary Fig. 2e). Further validating the interaction, IP–MS analysis comparing ZFC3H1(Δ730–747) with wild-type ZFC3H1 3×Flag constructs demonstrated a specific loss of LENG8 and PCID2, but not of exosome subunits or extended PAXT components (Fig. 2f and Extended Data Fig. 5g), which persisted upon nuclease treatment (Extended Data Fig. 5h). Similarly, mutation of the central LENG8 phenylalanine at the LENG8–ZFC3H1 interface (Fig. 2e) to alanine (F301A) led to the selective loss of ZFC3H1, MTR4 and exosome subunits in a comparative IP–MS analysis of wild-type LENG8 versus LENG8(F301A) 3×Flag constructs (Fig. 2g, Extended Data Fig. 5i–l and Supplementary Table 1).
Together, we conclude that the TREX-2-like LENG8–PS complex constitutes a physical module of the nucleoplasmic PAXT connection (Fig. 2h).
LENG8–PS works with PAXT
Given the physical link between ZFC3H1 and LENG8, we next probed its functional relevance. Expression of wild-type ZFC3H1–3×Flag, but not of the LENG8-binding-deficient ZFC3H1(Δ730–747) variant (Extended Data Fig. 6a), prevented selected, previously established PAXT substrates8,14 from accumulating after rapid depletion of endogenous ZFC3H1 (Fig. 3a). Similarly, PAXT substrate accumulation following rapid depletion of endogenous LENG8 was suppressed by expression of wild-type LENG8–3×Flag, but not of the ZFC3H1-binding mutant LENG8(F301A) (Fig. 3b and Extended Data Fig. 6b). As these analyses corroborated a role for LENG8 in PAXT function, we obtained a transcriptome-wide view of this relationship by sequencing pA+ RNA from cells rapidly depleted of either LENG8 or ZFC3H1 (Extended Data Fig. 6c,d). PAXT substrates include numerous prematurely terminated transcripts (PTTs) deriving from transcription start site (TSS)-proximal regions of protein-coding genes8,14. To incorporate these in our analysis, we identified transcription units displaying increased TSS-proximal pA+ RNA coverage upon depletion of ZFC3H1 or LENG8 (Extended Data Fig. 6e) and intersected these data with pA+ RNA 3′ end peaks previously identified upon depletion of ZFC3H128 (Methods). As a result, 1,202 pA+ PTTs were defined (see Extended Data Fig. 6f for an example) and included in our HeLa transcriptome annotation29 (Extended Data Fig. 6g).
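The PTT definition combines two interval sets: TSS-proximal regions with increased pA+ RNA coverage upon depletion, and previously mapped pA+ 3′ end peaks. A minimal, strand-aware sketch of such an intersection follows. It assumes 0-based half-open coordinates, one TSS per gene, and a hypothetical rule that each PTT spans from the TSS to the 3′ end of an overlapping peak; the study's actual selection criteria are given in Methods and are not reproduced here. At genome scale, a dedicated tool such as `bedtools intersect` would normally be used instead of this quadratic loop.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def overlaps(a: Interval, b: Interval) -> bool:
    """Same chromosome and strand, and the half-open intervals intersect."""
    return a.chrom == b.chrom and a.strand == b.strand and a.start < b.end and b.start < a.end


def define_ptts(
    upregulated: Dict[str, Interval],  # gene -> TSS-proximal region with increased coverage
    tss: Dict[str, int],               # gene -> TSS position (0-based)
    peaks: List[Interval],             # pA+ 3' end peaks
) -> List[Tuple[str, Interval]]:
    """Hypothetical rule: one PTT per peak overlapping a gene's upregulated
    region, spanning from the TSS to the peak's 3' end (strand-aware)."""
    ptts = []
    for gene, region in upregulated.items():
        start_site = tss[gene]
        for peak in peaks:
            if not overlaps(region, peak):
                continue
            if region.strand == "+":
                ptt = Interval(region.chrom, start_site, peak.end, "+")
            else:
                # On the minus strand, the 3' end is the lowest coordinate.
                ptt = Interval(region.chrom, peak.start, start_site + 1, "-")
            if ptt.start >= ptt.end:
                continue  # peak lies upstream of the TSS; not a PTT
            ptts.append((gene, ptt))
    return ptts
```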
Although the number of affected RNAs and the magnitude of their changes were greater upon ZFC3H1 depletion, changes in individual transcripts upon LENG8 and ZFC3H1 depletion were strongly correlated (Fig. 3c), consistent with a shared pathway. This was substantiated by sequencing pA+ RNA from ZFC3H1-depleted cells exogenously expressing either wild-type ZFC3H1 or the LENG8-binding mutant ZFC3H1(Δ730–747) (Extended Data Fig. 6h): only wild-type ZFC3H1 efficiently suppressed the accumulation of ZFC3H1 substrates, whereas ZFC3H1(Δ730–747) retained only partial activity (Extended Data Fig. 6i). Moreover, differential expression analysis using DESeq2 (ref. 30) with two different cut-offs revealed an extensive overlap of RNAs that were increased upon depletion of ZFC3H1 or LENG8 (Extended Data Fig. 6j and Supplementary Table 2). We speculated that the stronger effect size with ZFC3H1 depletion might, at least in part, reflect the interaction of ZFC3H1 with the early RNA-processing factor ARS214, possibly allowing residual LENG8-independent ZFC3H1 activity. To test this, we compared ARS2- and LENG8-dependent PAXT targets by intersecting RNA-sequencing (RNA-seq) data from a ZFC3H1 complementation assay using the ZFC3H1(ARM) mutant, which cannot bind ARS214, with RNA-seq data from the LENG8-binding-deficient ZFC3H1(Δ730–747) mutant. The significant overlap between LENG8- and ARS2-dependent ZFC3H1 substrates suggested partially redundant LENG8 and ARS2 activities on a shared substrate pool (Extended Data Fig. 6k).
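The "extensive" and "significant" overlaps reported above are not tied to a named test in this section. One common choice, assumed here, is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), with the universe taken to be all transcripts tested by DESeq2. The set sizes in the example are hypothetical and are not taken from the study.

```python
from math import comb


def expected_overlap(universe: int, size_a: int, size_b: int) -> float:
    """Overlap expected by chance if set B were drawn at random from the universe."""
    return size_a * size_b / universe


def overlap_p_value(universe: int, size_a: int, size_b: int, observed: int) -> float:
    """One-sided P(X >= observed) for X ~ Hypergeometric(universe, size_a, size_b)."""
    if not (0 <= size_a <= universe and 0 <= size_b <= universe):
        raise ValueError("set sizes must lie within the universe")
    upper = min(size_a, size_b)
    if observed > upper:
        return 0.0
    # Exact integer arithmetic; math.comb returns 0 when k exceeds n.
    hits = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(max(observed, 0), upper + 1)
    )
    return hits / comb(universe, size_b)


if __name__ == "__main__":
    # Hypothetical sizes: 12,000 tested transcripts, 900 ARS2-dependent and
    # 600 LENG8-dependent ZFC3H1 substrates, 250 of them shared.
    n, a, b, k = 12_000, 900, 600, 250
    print("expected by chance:", expected_overlap(n, a, b))
    print("one-sided p-value:", overlap_p_value(n, a, b, k))
```

The choice of universe matters: restricting it to transcripts detectable in both experiments avoids inflating significance with transcripts that could never have been called.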
In agreement with previous studies on PAXT8,11,13,14, the ZFC3H1- and LENG8-depleted samples revealed upregulation of promoter upstream transcripts (PROMPTs)31, PTTs, other noncoding RNAs (ncRNAs) and a minor fraction of full-length mRNAs, both when each sample was evaluated individually (Extended Data Fig. 6l) and when interrogating the common substrate cohort (Extended Data Fig. 6m, left). In line with this result, the majority of ZFC3H1 and LENG8 substrates contained only one or a few exons (Fig. 3d and Extended Data Fig. 6m, right), and for both depletion conditions RNA accumulation levels decreased with increasing exon number (Fig. 3e and Extended Data Fig. 6n) and mature RNA length (Extended Data Fig. 6o,p).
In addition to decay, ZFC3H1 also contributes to nuclear retention of pA+ RNA11,12,15, and LENG8 has been suggested to have a similar capacity32. To analyse the effect of ZFC3H1 or LENG8 on transcript fate, we therefore performed pA+ RNA-seq on nuclear and cytoplasmic fractions of HeLa cells collected before or after rapid depletion of these factors (Extended Data Fig. 7a,b). Consistent with the RNA-retention capacities of ZFC3H1 and LENG8, PAXT-sensitive RNAs, defined by their increased abundance upon either ZFC3H1 or LENG8 depletion (Methods), displayed higher nuclear-to-cytoplasmic ratios in unperturbed HeLa cells than the remaining pA+ transcriptome (Fig. 3f). Clustering all transcripts by their depletion-dependent changes in nuclear or cytoplasmic content showed that more than half of the displayed RNAs were insensitive to ZFC3H1 or LENG8 depletion (Extended Data Fig. 7c, left, cluster 1, Supplementary Table 2 and Methods). Although most of these transcripts were spliced mRNAs with multiple exons (Extended Data Fig. 7c, biotypes), a closer examination of the low-exon-count RNAs in cluster 1 revealed mild, but detectable, sensitivity to ZFC3H1 depletion (Extended Data Fig. 7d). Outside cluster 1, the remaining transcripts were, to variable extents, upregulated in both compartments upon ZFC3H1 and LENG8 depletion, indicating impaired nuclear retention (Extended Data Fig. 7a, lanes 1–4 and 6–9 and Extended Data Fig. 7c, clusters 2–4). Repeating the same depletion experiments in HCT116 cells (Extended Data Fig. 7e,f) recapitulated these trends (Extended Data Fig. 7g). It therefore appears that PAXT generally retains and mediates decay of short pA+ RNAs with few exons, whereas transcripts that escape these fates are commonly longer and more exon-rich.
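The nuclear-to-cytoplasmic comparison can be summarized per transcript as a log2 ratio of fraction-specific abundances and then compared between PAXT-sensitive transcripts and the rest of the transcriptome. The sketch below assumes that abundances have already been normalized so the two fractions are comparable (how this was done is described in Methods, not here), uses a hypothetical pseudocount of 1 and compares medians only; in practice a rank-based test would accompany such a comparison.

```python
import math
from statistics import median
from typing import Dict, Set, Tuple


def log2_nc_ratio(nuclear: float, cytoplasmic: float, pseudocount: float = 1.0) -> float:
    """log2((N + pc) / (C + pc)); positive values indicate nuclear enrichment."""
    return math.log2((nuclear + pseudocount) / (cytoplasmic + pseudocount))


def compare_nc(
    abundances: Dict[str, Tuple[float, float]],  # transcript -> (nuclear, cytoplasmic)
    sensitive: Set[str],                          # e.g. PAXT-sensitive transcript IDs
) -> Tuple[float, float]:
    """Median log2 N/C ratio for the sensitive set and for all other transcripts."""
    in_set = [log2_nc_ratio(n, c) for t, (n, c) in abundances.items() if t in sensitive]
    others = [log2_nc_ratio(n, c) for t, (n, c) in abundances.items() if t not in sensitive]
    if not in_set or not others:
        raise ValueError("both groups need at least one transcript")
    return median(in_set), median(others)
```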
Although the protein-coding fraction of analysed transcripts was largely insensitive to PAXT, a minor subset of sensitive mRNAs was still detectable (Extended Data Fig. 7c, note biotypes of clusters 2–4). Similar to noncoding PAXT substrates, these were primarily low-exon (1–4 exons) transcripts (Fig. 3g) that were enriched in nuclei of unperturbed cells (Extended Data Fig. 7h). However, in the same condition, these short mRNAs were present at higher levels than their ncRNA counterparts (Extended Data Fig. 7i), suggesting that they might have acquired means to fend off nuclear turnover (see Discussion). DESeq2 also identified 105 longer, multi-exonic mRNA outliers (more than 4 exons) that were PAXT-sensitive despite their higher exon number and length. Like PAXT targets in general (Fig. 3f), these transcripts showed increased nuclear-to-cytoplasmic ratios in unperturbed cells compared with PAXT-insensitive controls (Extended Data Fig. 7j), suggesting that prolonged nuclear residence may drive their sensitivity to PAXT. Notably, the LENG8 mRNA belongs to this transcript category, implying autoregulation of the PAXT pathway (see below). Because introns contribute to nuclear RNA retention33,34,35, we assessed PAXT-sensitive multi-exonic mRNAs (more than 4 exons) for reads spanning both the 5′ and 3′ splice junctions of an intron (Methods). Indeed, compared with a control population, these transcripts were enriched for retained introns, including previously described detained introns36 (Extended Data Fig. 7k). Moreover, upon ZFC3H1 or LENG8 depletion, PAXT-sensitive multi-exonic transcripts accumulated largely as incompletely spliced precursors in the nucleus and as fully spliced mRNAs in the cytoplasm (Extended Data Fig. 7l). Thus, upon PAXT impairment, this group of transcripts can be post-transcriptionally spliced and exported.
Finally, we also identified an additional 3,209 PAXT-sensitive introns, including 161 detained introns (Supplementary Table 3), for which the corresponding spliced mRNAs were PAXT-insensitive as measured by DESeq2. We therefore propose that, in addition to short RNAs, PAXT can also target multi-exonic transcripts, which are retained in the nucleus due to incomplete splicing. In certain cases, this alters the cellular levels of the mature transcript, the extent of which might differ between cell types.
Together, these cellular data indicate that LENG8–PS constitutes a functional module of PAXT, which primarily targets nuclear pA+ RNAs with no, or only a few, introns as well as a minor pool of longer intron-containing transcripts. This overall retention–decay regime may be exploited for the regulation of selected mRNAs.
PAXT and TREX-2 govern UAP56-bound RNP fate
Having defined LENG8–PS as a central physical and functional module of PAXT, we investigated how the ability of LENG8 to release UAP56 from RNA in vitro (Fig. 1c and Extended Data Fig. 1l) affects its function in vivo. Similar to the ZFC3H1-binding-defective LENG8(F301A) mutant, the LENG8(TRR) and LENG8(R563A) variants (Extended Data Fig. 8a), which were compromised in vitro for UAP56 binding (Extended Data Fig. 1j,k) and for UAP56 release from RNA (Fig. 1c), respectively, were unable to suppress selected PAXT targets that were upregulated upon depletion of endogenous LENG8 (Fig. 4a). Notably, compared with the wild-type and F301A constructs, LENG8(R563A) displayed reduced RNA binding in vivo (Extended Data Fig. 8b), despite its ability to bind major interactors (Extended Data Fig. 8c and Supplementary Table 1). We therefore reasoned that release of RNA from UAP56 by LENG8–PS is central to PAXT-mediated turnover. To further pursue this notion, we conducted individual-nucleotide resolution UV crosslinking and immunoprecipitation (iCLIP) experiments37,38 with endogenous UAP56 and endogenously 3×Flag-tagged LENG8 in HeLa cells (Extended Data Fig. 8d,e and Supplementary Table 4) and quantified the iCLIP coverage of these proteins, normalized to transcript abundance, across three distinct classes of PAXT targets: pA+ ncRNAs, PROMPTs and PTTs. Although the majority of the UAP56 iCLIP signal mapped to exons (Extended Data Fig. 8f), consistent with its established role in messenger ribonucleoprotein particle (mRNP) maturation and export7,16, both UAP56 and LENG8 exhibited selective enrichment over PAXT-sensitive transcripts compared with their expression-matched and exosome-sensitive control groups (Extended Data Fig. 8g,h, Supplementary Fig. 5a,b and Methods). Thus, UAP56 binds a wide range of RNAPII transcripts, including PAXT substrates. Such broad RNP incorporation presumably occurs via UAP56 recruitment through a diverse set of factors, many of which contain UAP56-binding motifs (UBMs)7,39,40.
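Normalizing iCLIP signal to transcript abundance separates binding preference from expression level. The minimal sketch below assumes per-transcript crosslink counts, mature transcript lengths and RNA-seq abundances (TPM) as inputs. It computes crosslink density per kilobase divided by abundance, and pairs each target with an expression-matched control by greedy nearest-neighbour matching on log abundance, without replacement. The function names, inputs and matching rule are illustrative assumptions; the study's exact procedure is described in Methods.

```python
import math
from typing import Dict


def normalized_iclip(crosslinks: int, length_nt: int, tpm: float) -> float:
    """Crosslinks per kilobase of transcript, divided by transcript abundance (TPM)."""
    if length_nt <= 0 or tpm <= 0:
        raise ValueError("length and abundance must be positive")
    return (crosslinks / (length_nt / 1000)) / tpm


def expression_matched_controls(
    targets: Dict[str, float],     # target transcript -> TPM (must be > 0)
    candidates: Dict[str, float],  # control pool transcript -> TPM (must be > 0)
) -> Dict[str, str]:
    """Greedy nearest-neighbour matching on log TPM, without replacement."""
    available = dict(candidates)
    matches = {}
    for target, tpm in targets.items():
        if not available:
            break  # control pool exhausted
        best = min(available, key=lambda c: abs(math.log(available[c]) - math.log(tpm)))
        matches[target] = best
        del available[best]  # each control is used at most once
    return matches
```

Greedy matching depends on the order in which targets are processed; an optimal assignment method would remove that dependence at some computational cost.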
Together, the above experiments demonstrated the targeting of UAP56-bound RNPs by PAXT–LENG8–PS in vivo, suggesting that UAP56, although an RNA export factor, is also crucial for nuclear RNA decay. To interrogate possible relationships between these two fates of UAP56-bound RNPs, we first analysed the consequences of rapid GANP depletion in cells expressing GANP–2×HA–dTAG (Extended Data Fig. 6c) by pA+ RNA-seq (Extended Data Fig. 6d). In agreement with previous studies26,41, this mainly resulted in the downregulation of short pA+ RNAs with few introns (Fig. 4b and Extended Data Fig. 8i,j), resembling PAXT-sensitive transcripts (Fig. 3d and Extended Data Fig. 6o). Indeed, GANP-sensitive RNAs were largely upregulated upon depletion of ZFC3H1 or LENG8 (Fig. 4b, left and Extended Data Fig. 8k). We therefore propose that the suppressive effect of GANP reduction reflects PAXT-mediated degradation of export-restricted transcripts. Consistently, our fractionated pA+ RNA-seq data revealed concomitant cytoplasmic accumulation of these RNAs following LENG8 or ZFC3H1 depletion (Fig. 4b, right). Equivalent depletion experiments in HCT116 cells (Extended Data Fig. 7e,f) recapitulated these trends (Extended Data Fig. 8l). By contrast, GANP reduction did not significantly affect levels of nuclear-retained multi-exonic (more than 4 exons) PAXT substrates (Extended Data Fig. 8m; note that LENG8 mRNA defies this trend; see also below).
To further examine the apparent competition between TREX-2 and PAXT for target transcripts, we monitored protein production in TREX-2- or PAXT-perturbed cells. As expected, GANP depletion impaired global protein synthesis, as revealed by decreased puromycin incorporation (Extended Data Fig. 9a). Less intuitively, but consistent with prior reports on ZFC3H111,28, depletion of LENG8 or ZFC3H1 also decreased global protein synthesis, possibly owing to overloading of ribosomes by capped ncRNAs escaping from the nucleus42. We characterized these translational alterations further by quantitative analysis of nascent protein synthesis using stable isotope labelling by amino acids in cell culture (SILAC) followed by mass spectrometry43,44 (Supplementary Table 5). Although only a subset of proteins encoded by PAXT-sensitive mRNAs was detectable, depletion of LENG8 or ZFC3H1 generally increased their de novo synthesis, whereas GANP depletion had little or opposing effects (Fig. 4c and Extended Data Fig. 9b). We note that this occurred despite the overall decrease in protein synthesis under these conditions (Extended Data Fig. 9a). Proteins with increased synthesis upon LENG8 or ZFC3H1 depletion were enriched for RNA-binding proteins and RNA-processing regulators (Supplementary Table 5), including reciprocal increases in LENG8 and ZFC3H1 themselves (Fig. 4c and Extended Data Fig. 9b, top). Although only modestly increased in the SILAC experiment (Extended Data Fig. 9b, top right), LENG8 protein upregulation upon ZFC3H1 depletion was confirmed by western blotting analysis (Extended Data Fig. 9c,d).
As the ZFC3H1 mRNA similarly did not pass the DESeq2 analysis threshold (Supplementary Table 2), we used quantitative PCR with reverse transcription (RT–qPCR), which revealed that rapid depletion of LENG8 or ZFC3H1 led to nuclear and cytoplasmic accumulation of the mRNAs encoding these proteins, whereas GANP depletion decreased the levels of LENG8, ZFC3H1 and GANP (encoded by MCM3AP) mRNAs (Fig. 4d). This supports our earlier proposal that retained mRNAs, such as LENG8 mRNA, are subject to regulation by the PAXT pathway.
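Relative RT–qPCR quantification is commonly performed with the 2^−ΔΔCt method. This section does not state which reference transcript or efficiency correction was used, so the sketch below assumes the Livak 2^−ΔΔCt formula with about 100% amplification efficiency for both amplicons and a hypothetical reference transcript; the Ct values are placeholders.

```python
from statistics import mean
from typing import Sequence


def delta_ct(target_ct: Sequence[float], reference_ct: Sequence[float]) -> float:
    """Mean target Ct minus mean reference Ct (technical replicates averaged)."""
    return mean(target_ct) - mean(reference_ct)


def fold_change_ddct(
    target_treated: Sequence[float],
    reference_treated: Sequence[float],
    target_control: Sequence[float],
    reference_control: Sequence[float],
) -> float:
    """2^-ddCt, assuming ~100% amplification efficiency for both amplicons."""
    ddct = (delta_ct(target_treated, reference_treated)
            - delta_ct(target_control, reference_control))
    return 2.0 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical Ct values for one target mRNA, depletion versus control.
    fc = fold_change_ddct(
        target_treated=[24.1, 23.9], reference_treated=[20.0, 20.0],
        target_control=[25.0, 25.0], reference_control=[20.0, 20.0],
    )
    print(f"fold change: {fc:.2f}")
```

For fractionated nuclear and cytoplasmic samples, the result depends strongly on the choice of reference transcript, in particular on whether that transcript is itself compartment-restricted or affected by the depletion.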
In conclusion, two structurally and functionally similar modules, LENG8–PS and GANP–PS, are critical interpreters of UAP56-bound nuclear pA+ RNPs. Nucleoplasmic PAXT and NPC-associated TREX-2 utilize these equivalent biochemical modules to control nuclear pA+ RNA homeostasis by facilitating decay and export, respectively.
Discussion
Here we demonstrate that short, low-exon-content pA+ RNAs in UAP56-bound RNPs are highly susceptible to PAXT-mediated nuclear turnover. When PAXT function is impaired, NPC-associated TREX-2 seemingly grants export to these transcripts, demonstrating that LENG8–PS and GANP–PS can target similar pA+ RNPs. Based on these findings, we propose a general model for nuclear pA+ RNA fate determination (Fig. 4e). Both TREX-2 and PAXT may engage UAP56-bound pA+ RNPs. Owing to their nucleoplasmic localization, newly made pA+ RNPs would first encounter PAXT. However, because PAXT is an adaptor of the 3′–5′ exonucleolytic exosome, PAXT targeting only translates into efficient decay if it occurs in the vicinity of the RNA 3′ end. This condition greatly sensitizes short transcripts. By contrast, for longer RNAs, which are compacted into larger RNPs with multiple UAP56 molecules, a PAXT encounter may merely result in the release of a single UAP56 molecule. This would indirectly counteract export; however, as long as a sufficient number of UAP56 molecules remains on the transcript, nuclear export via TREX-2 is still possible. In support of this model, a recent study found that the widespread interaction of ZFC3H1 with long and multiply spliced mRNAs did not affect transcript levels upon ZFC3H1 depletion15. We note that the targeting of LENG8–PS to decay-insensitive transcripts might help maintain sufficient levels of free nuclear UAP56, without which mRNA biogenesis defects, R-loop formation and genomic instability would prevail45. The importance of such UAP56 recycling might in fact explain why budding yeast, in which the PAXT complex has been lost, still harbours the LENG8–PS homologue23. Similarly, the cytoplasmic presence of a SAC3D1-containing TREX-2-like complex in higher eukaryotes might help remove residual UAP56 from RNA after export.
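The length dependence in this model can be illustrated with a deliberately simple toy calculation, which is not part of the study's analysis. Assume, hypothetically, that a PAXT encounter engages one UAP56 positioned uniformly at random along a transcript of length L, and that engagement leads to decay only if it falls within a distance d of the 3′ end; otherwise it merely releases that UAP56. The per-encounter decay probability is then min(1, d/L), and the chance of escaping k independent encounters is (1 − min(1, d/L))^k. Both the window d and the number of encounters are illustrative parameters.

```python
def decay_probability_per_encounter(length_nt: float, window_nt: float) -> float:
    """Toy model: chance that a randomly positioned engagement lies within
    window_nt of the 3' end, assuming uniform UAP56 positions along the RNA."""
    if length_nt <= 0 or window_nt < 0:
        raise ValueError("length must be positive and window non-negative")
    return min(1.0, window_nt / length_nt)


def escape_probability(length_nt: float, window_nt: float, encounters: int) -> float:
    """Probability of surviving `encounters` independent PAXT engagements."""
    p = decay_probability_per_encounter(length_nt, window_nt)
    return (1.0 - p) ** encounters


if __name__ == "__main__":
    window = 500  # hypothetical 3'-proximal window (nt)
    for length in (400, 1_000, 3_000, 10_000):
        print(length, round(escape_probability(length, window, encounters=3), 3))
```

In this toy setting, escape rises steeply with transcript length, mirroring the proposal that short transcripts are sensitized. It ignores the additional point made in the text that longer RNPs carry more UAP56 molecules, which is what keeps them competent for TREX-2-mediated export.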
Although general, the proposed model may be bypassed by specific transcripts. For example, short functional pA+ mRNAs, including stress-induced transcripts, must escape nuclear decay. How this is achieved remains an open question. Stress-induced mRNP reorganization might protect the RNA 3′ end, or their robust transcriptional upregulation may ensure that, despite extensive nucleoplasmic decay, a sufficient number of RNPs still reaches the NPC. Consistent with the latter notion, we find that PAXT-sensitive short mRNAs are expressed at higher levels than their ncRNA counterparts (Extended Data Fig. 7i). Finally, decay might also be short-circuited by gene gating, which positions a given locus in proximity to the NPC, as recently demonstrated for MYC46,47. We also find a small subset of multi-exonic transcripts that are sensitive to PAXT-mediated decay, violating the general nuclear pA+ RNA decay regime (Fig. 3g). We suggest that this sensitivity is enhanced by prolonged nuclear residence (Extended Data Fig. 7h,j) and exacerbated by the presence of introns (Extended Data Fig. 7k). In line with the reported regulatory role of nuclear pre-mRNA retention33,34,35,36, our data provide evidence that the PAXT system itself is subject to such control (Fig. 4c). Thus, the PAXT axis, which primarily targets short, non-functional pA+ RNAs, has been co-opted to regulate a subset of mRNAs that are sensitized either by low exon content or by prolonged nuclear retention.
In summary, we propose that the major opposing fates of nuclear pA+ RNA, export and decay, exploit a shared molecular logic. At its centre is RNA-bound UAP56 and its highly regulated release from the pA+ RNP. This blurs previously held categorizations of proteins as specific RNA export or decay factors and highlights the competition for common pA+ RNP features that ensures faithful gene expression.
Methods
DNA sequences
All oligonucleotides and plasmid vectors are listed in Supplementary Table 6.
Purification of UAP56 and UAP56(Δ1–43)
His-tagged UAP56 constructs (10×His–3C–UAP56 or 10×His–3C–UAP56(Δ1–43), residues 44–428) were expressed in Escherichia coli BL21 DE3 RIL using autoinduction medium at 37 °C for 16 h. Following collection, cells were resuspended in lysis buffer (25 mM HEPES pH 7.9, 5% glycerol, 300 mM NaCl, 20 mM imidazole, 0.05% Tween-20, and protease inhibitors), disrupted via sonication, and clarified by centrifugation. The supernatant was sequentially filtered through 1-µm and 0.45-µm filters before affinity purification on a HisTrap HP 5 ml column (Cytiva), equilibrated in buffer A (25 mM HEPES pH 7.9, 5% glycerol, 300 mM NaCl, 20 mM imidazole). After washing with buffer A supplemented with 70 mM imidazole, bound proteins were eluted using a linear gradient of imidazole (70–200 mM in buffer A). Peak fractions were diluted in buffer B (25 mM HEPES pH 7.9, 5% glycerol, 1 mM DTT) to reduce the NaCl concentration to 100 mM and subsequently subjected to anion-exchange chromatography on a HiTrapQ 5 ml column (Cytiva), pre-equilibrated with buffer B. Elution was performed with a linear NaCl gradient (100–500 mM). Fractions containing UAP56 were concentrated and further purified via size-exclusion chromatography using a HiLoad 16/600 Superdex 200 pg column (Cytiva), equilibrated in buffer C (25 mM HEPES pH 7.9, 5% glycerol, 100 mM NaCl, 1 mM DTT). Peak fractions containing the purified protein were pooled, concentrated, flash-frozen, and stored at −80 °C.
Purification of LENG8–PSM and SAC3D1–PSM
Expression constructs encoding LENG8–PSM (10×His–MBP–LENG8(491–800), 3×V5–PCID2, SEM1), SAC3D1–PSM (10×His–MBP–SAC3D1(48–404), 3×V5–PCID2, SEM1), SAC3D1–PCID2–UAP56–UCM–N-UBM–SEM1 (10×His–MBP–SAC3D1(48–404), 3×V5–PCID2–UAP56–UCM–N-UBM, SEM1), LENG8–PCID2–UAP56–N-UBM–SEM1 (10×His–MBP–LENG8(491–800), 3×V5–PCID2–UAP56–N-UBM, SEM1) and their respective mutants were introduced into E. coli BL21 DE3 RIL (UCM, UAP56-clamping motif). Cultures were grown in LB medium at 37 °C to an OD600 of ~1.0, at which point expression was induced with 0.5 mM IPTG, followed by overnight incubation at 18 °C. Cells were collected and lysed by sonication, and lysates were clarified by centrifugation. The supernatant was filtered (1 µm and 0.45 µm) and loaded onto a HisTrap HP 5 ml column equilibrated with buffer A, followed by washing and elution using a linear imidazole gradient up to 300 mM. Peak fractions were diluted to 50 mM NaCl in buffer B and subjected to anion-exchange purification on a HiTrapQ HP 5 ml column. After washing, complexes were eluted with a NaCl gradient (100–500 mM). Size-exclusion chromatography using a HiLoad 16/600 Superdex 200 pg column (Cytiva) in buffer C containing 250 mM NaCl yielded the final purified complex, which was concentrated, flash-frozen, and stored at −80 °C.
Recombinant EIF4A3 was purified as described previously16.
His-tagged DDX19 was expressed in E. coli BL21 DE3 RIL in LB medium, induced with 0.5 mM IPTG and expressed at 37 °C for 3 h. Following collection, cells were resuspended in lysis buffer (25 mM HEPES pH 7.9, 5% glycerol, 300 mM NaCl, 20 mM imidazole, and protease inhibitors) and disrupted via sonication, and the lysate was cleared by centrifugation. The supernatant was sequentially filtered through 1-µm and 0.45-µm filters before affinity purification on a HisTrap HP 5 ml column (Cytiva), equilibrated in buffer A. The column was washed with buffer A containing 30 mM imidazole, and bound proteins were eluted using a linear gradient of imidazole (50–300 mM). The peak fractions were incubated with 3C protease to cleave off the tag, then diluted in buffer B to reduce the NaCl concentration to 50 mM, filtered through a 0.22-µm filter and subjected to anion-exchange chromatography on a HiTrapQ 5 ml column (Cytiva) pre-equilibrated with buffer B supplemented with 50 mM NaCl. After sample loading, the column was washed with buffer B supplemented with 50 mM NaCl. Elution was performed with a linear NaCl gradient (50–500 mM). Peak fractions containing DDX19 were concentrated and further purified via size-exclusion chromatography using a HiLoad 16/600 Superdex 200 pg column (Cytiva), equilibrated in buffer C. Peak fractions containing the purified protein were pooled, concentrated, flash-frozen, and stored at −80 °C.
Analytical gel filtration
For each purified protein or complex an aliquot of 62.5 μg was loaded onto a Superdex 200 Increase 5/150 column (Cytiva), equilibrated in the respective gel filtration buffers. Peak fractions were analysed via SDS–PAGE (4–12% gradient) and visualized by Coomassie staining.
UAP56–LENG8–PSM and UAP56–SAC3D1–PSM pulldown
MBP-tagged LENG8–PSM or SAC3D1–PSM was incubated with a fourfold molar excess of UAP56 or UAP56(Δ1–43) in buffer D (25 mM HEPES pH 7.9, 40 mM NaCl, 5% glycerol, 0.01% Igepal CA-630, 1 mM MgCl2, 1 mM TCEP), with or without 50 μM 15U RNA and 1 mM AMP-PNP. Reactions were mixed by rotation at 4 °C for 1 h before adding 30 μl of pre-equilibrated amylose resin (E8021S, NEB). After an additional 1-h incubation at 4 °C, unbound proteins were removed by centrifugation (1,500g, 2 min, 4 °C) and 3 washes with buffer D. Bound proteins were eluted by incubation at 4 °C for 1 h in buffer D supplemented with 100 mM maltose. Input and elution fractions were analysed via SDS–PAGE (4–12% gradient) and visualized by Coomassie staining.
LENG8–ZFC3H1 pulldown
Magnetic streptavidin beads (50 μl in-house produced slurry per reaction) were equilibrated in wash buffer (25 mM HEPES pH 7.9, 100 mM NaCl, 5% glycerol, 1 mM MgCl2, 1 mM TCEP, 0.01% Igepal CA-630). Wild-type or mutant ZFC3H1 peptide (200 µg) carrying an N-terminal biotin and a C-terminal fluorescein was added to the beads in a 100 μl reaction volume and incubated on a rotating wheel at room temperature for 60 min. To remove excess peptide, beads were washed three times with wash buffer. Subsequently, 15 μg recombinant LENG8(283–346) or LENG8(283–346) F301A in a 100 μl reaction volume was added to the beads, and the reaction was incubated on a rotating wheel for 60 min at 4 °C. Beads were then washed three times with wash buffer, and bound proteins were eluted by incubating the beads for 5 min with 200 mM glycine pH 2.5. Eluates were neutralized with Tris pH 10.4 and separated by SDS–PAGE. To detect the fluorescently labelled peptides, gels were imaged in the fluorescein channel on a Bio-Rad ChemiDoc imager before Coomassie staining to visualize the proteins.
RNA unclamping assay
Biotinylated 15U RNA (33 µM) was mixed with recombinant UAP56 (10 µM) and 1 mM ATP in buffer E (20 mM HEPES pH 7.9, 40 mM KCl, 2 mM MgCl2, 5% glycerol, 0.1% Igepal CA-630). This mixture was incubated with 20 µl NeutrAvidin Agarose beads (29202, Thermo Scientific), pre-equilibrated in buffer E, for 30 min at room temperature. After washing to remove excess UAP56 and ATP, beads were resuspended in buffer E and aliquoted. LENG8–PSM or SAC3D1–PSM (2.2 µM or 0.44 µM) was added, followed by a 10-min incubation at room temperature. Unbound proteins were removed by sequential washes in high-salt buffer (buffer E with 500 mM KCl) and buffer E. RNA-bound proteins were eluted using 0.4 μg benzonase in buffer E for 10 min at room temperature, followed by SDS–PAGE analysis and quantification of remaining RNA-clamped UAP56 in Fiji.
Grating-coupled interferometry
Grating-coupled interferometry measurements were conducted using a Creoptix WAVE system (Creoptix) with 4PCP WAVEchips (quasi-planar polycarboxylate surface). Chips were conditioned in borate buffer (100 mM sodium borate pH 9.0, 1 M NaCl) before immobilization of a monoclonal anti-V5 antibody (R960252, Invitrogen; 2 μg ml−1 in 10 mM sodium acetate pH 5.0) via amine coupling. The surface was then passivated with 0.5% BSA (in 10 mM sodium acetate pH 5.0) and quenched with 1 M ethanolamine pH 8.0. V5-tagged LENG8–PSM or SAC3D1–PSM complexes were captured to the desired density. UAP56 was injected as a 1:2 dilution series, starting at 5 µM, with or without 200 µM 15U RNA, in 25 mM HEPES pH 7.9, 50 mM KCl, 1 mM MgCl2, 1 mM TCEP, with and without 1 mM ATP at 25 °C. Blank injections were used for double referencing, and a DMSO calibration curve corrected for bulk refractive index effects. Data were processed using Creoptix WAVEcontrol software, applying x/y offset correction, DMSO calibration, and double referencing. A one-to-one binding model was used for fitting, and results were plotted in R.
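The one-to-one binding model can be illustrated with a minimal steady-state sketch. It assumes that equilibrium responses have already been read off the sensorgrams, whereas WAVEcontrol fits the full association and dissociation kinetics; the function names, the eight-point length of the series, the KD grid and the synthetic KD and Rmax values are hypothetical choices for illustration only.

```python
def dilution_series(top_uM: float, n: int, factor: float = 2.0) -> list[float]:
    """Concentrations of a 1:factor serial dilution starting at top_uM (here 5 µM UAP56)."""
    return [top_uM / factor**i for i in range(n)]


def langmuir(conc_uM: float, kd_uM: float, rmax: float) -> float:
    """Equilibrium response of a 1:1 interaction: Req = Rmax * C / (C + KD)."""
    return rmax * conc_uM / (conc_uM + kd_uM)


def fit_kd(concs: list[float], responses: list[float], kd_grid: list[float]):
    """Grid search over KD. For a fixed KD the model is linear in Rmax,
    so Rmax has a closed-form least-squares solution."""
    best = None
    for kd in kd_grid:
        occupancy = [c / (c + kd) for c in concs]
        rmax = sum(x * y for x, y in zip(occupancy, responses)) / sum(x * x for x in occupancy)
        sse = sum((y - rmax * x) ** 2 for x, y in zip(occupancy, responses))
        if best is None or sse < best[2]:
            best = (kd, rmax, sse)
    return best  # (KD in µM, Rmax, residual sum of squares)


# Noise-free synthetic data (hypothetical KD = 0.4 µM, Rmax = 120 arbitrary units).
concs = dilution_series(5.0, 8)
responses = [langmuir(c, 0.4, 120.0) for c in concs]
kd_grid = [10 ** (-3 + 0.01 * i) for i in range(501)]  # 1 nM to 100 µM, log-spaced
kd_fit, rmax_fit, _ = fit_kd(concs, responses, kd_grid)
```

A grid search keeps the sketch dependency-free; with real, noisy data a nonlinear least-squares routine and the kinetic model would normally be preferred.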
ATPase assay
Steady-state ATPase activity of UAP56 was measured using an NADH-coupled enzymatic assay. Final reaction mixtures contained 5 U ml−1 rabbit muscle pyruvate kinase (Type III, Sigma-Aldrich), 5 U ml−1 rabbit muscle L-lactic dehydrogenase (Type XI, Sigma-Aldrich), 500 µM phosphoenolpyruvate, and 50 µM NADH. Reactions (10 µl) were assembled in 1,536-well plates using buffer F (25 mM HEPES pH 7.9, 40 mM KCl, 0.5 mM MgCl2, 5% glycerol, and 0.5 mM ATP), with either 2 µM UAP56 or 0.1 µM UAP56 in the presence of LENG8–PSM or SAC3D1–PSM, and 100 µM 15U RNA when indicated. The decrease in NADH fluorescence emission was monitored at 37 °C using a PHERAstar FS plate reader (BMG LABTECH). A calibration curve from an NADH dilution series (0.03–100 µM) was used for quantification. ATPase activity was determined by linear regression of the NADH decay curves, corrected for ATP consumption, and expressed as ATP hydrolysis rates (molecules of ATP hydrolysed per second per enzyme). Reaction components were analysed by SDS–PAGE (4–12% gradient) and visualized using Coomassie staining.
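The conversion from NADH decay to ATP turnover can be sketched as follows. In the coupled system, pyruvate kinase regenerates ATP from each ADP, and lactate dehydrogenase oxidizes one NADH per pyruvate formed, so NADH consumption equals ATP hydrolysis. The sketch assumes a linear NADH fluorescence calibration and an optional background (for example, enzyme-free) drift term; the function and parameter names are hypothetical and the numbers are synthetic.

```python
def linear_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def atp_turnover_per_s(times_s, fluorescence, calib_fu_per_uM, enzyme_uM,
                       background_fu_per_s=0.0):
    """ATP molecules hydrolysed per second per enzyme molecule.

    One NADH is oxidized per ATP hydrolysed (PK/LDH coupling), so the rate of
    NADH loss in µM/s equals the ATP hydrolysis rate in µM/s.
    """
    slope = linear_slope(times_s, fluorescence)  # fluorescence units per s (negative)
    slope -= background_fu_per_s                 # hypothetical background drift correction
    nadh_uM_per_s = -slope / calib_fu_per_uM     # µM NADH consumed per s
    return nadh_uM_per_s / enzyme_uM             # normalize to enzyme concentration


# Calibration from an NADH dilution series (synthetic: 1,000 units per µM plus offset).
calib_conc = [0.0, 10.0, 25.0, 50.0, 100.0]
calib_signal = [50.0 + 1000.0 * c for c in calib_conc]
calib = linear_slope(calib_conc, calib_signal)

# Synthetic decay trace for 0.1 µM UAP56 consuming 0.02 µM NADH per second.
times = [60.0 * i for i in range(11)]
trace = [50_050.0 - 20.0 * t for t in times]
rate = atp_turnover_per_s(times, trace, calib, enzyme_uM=0.1)
```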
Cryo-EM sample preparation, imaging, and analysis
Cryo-EM sample preparation
For cryo-EM sample preparation, we adopted a strategy previously used for UAP56–TREX-2M (ref. 7): we fused UAP56 to PCID2 to optimize complex stoichiometry and further fused UAP56 to a UCM and an N-UBM. The latter fusions were designed to further promote the RNA-clamped conformation of UAP56 and to more accurately mimic the native mRNP-bound state of UAP56, in which N-UBM and UCM are present at high local concentrations to engage RNA-bound UAP56 (ref. 2). The N-UBM and UCM peptides are not observed in our cryo-EM structures and hence are not depicted or discussed in the main text. For cryo-EM grid preparation, LENG8–PCID2–UAP56–N-UBM–SEM1 or SAC3D1–PCID2–UAP56–UCM1–N-UBM–SEM1 (each at 0.5 mg ml−1) was incubated in buffer G (25 mM HEPES pH 7.9, 5% glycerol, 1 mM MgCl2, 1 mM TCEP, 100 μM 15U RNA) with 1 mM AMP-PNP or 1 mM ATP on ice for 10 min. Cryo-EM grids were then prepared by applying 4 µl of the sample to glow-discharged Cu R1.2/1.3 200-mesh holey carbon grids (Quantifoil). Grids were blotted at 8 °C and 90% humidity and plunged into liquid ethane using a Leica EM GP2.
Cryo-EM data acquisition and processing of a UAP56–LENG8–PSM–AMP-PNP–RNA complex
Data collection was performed on a Titan Krios G4 electron microscope operating at 300 kV, equipped with a cold field emission gun, a Selectris energy filter (5 eV slit width, ThermoFisher), and a Falcon 4i direct electron detector (ThermoFisher). The objective aperture was retracted, and a 50 µm C2 aperture was used. A total of 5,405 micrographs were recorded using EPU software in .eer format, a pixel size of 0.575 Å per pixel, a total electron dose of 50 e− Å−2, and defocus values ranging from −1 to −2.5 µm. On-the-fly preprocessing, including motion correction and contrast transfer function estimation, was performed using the CryoSPARC48 Live v113 workflow. Approximately 1.3 million particles were picked in WARP, extracted with a 400 Å box, binned to 1.8 Å per pixel, and subjected to 2D classification. Ab initio reconstruction from 45,345 particles selected after 2D classification yielded an initial map of clamped UAP56 bound to LENG8–PS, which was further subjected to non-uniform refinement; 7,886 particles were then selected by per-particle scale. These were 3D refined in Relion 5.0 using BLUSH49, resulting in the 6.2 Å UAP56–RNA–LENG8–PS Map F.
Cryo-EM data acquisition and processing of LENG8–PSM and UAP56-NTD–LENG8–PSM complexes
Data were collected and preprocessed as outlined above. A total of 6,578 micrographs were recorded using EPU software in .eer format, a pixel size of 0.575 Å per pixel, a total electron dose of 50 e− Å−2, and defocus values ranging from −1 to −2.5 µm. Approximately 1.5 million particles were picked in WARP, extracted with a 400 Å box, binned to 1.8 Å per pixel, and subjected to 2D classification, yielding 183,858 LENG8–PS particles. Ab initio reconstruction considering high-resolution frequencies resulted in an interpretable LENG8–PS cryo-EM map from 82,873 particles. These were then re-extracted with a 400 Å box, binned to 0.90 Å per pixel and subjected to non-uniform refinement, yielding the 3.5 Å resolution LENG8–PS Map D. Further 3D classification in Relion 5.0 revealed a subset of 4,824 particles with the UAP56 NTD bound, which refined to 4.86 Å (UAP56-NTD–LENG8–PS, Map E).
Cryo-EM data acquisition and processing of SAC3D1–PSM and UAP56–SAC3D1–PSM complexes
We collected three datasets with the same microscope specifications and settings as for UAP56–LENG8–PS. Dataset 1 consists of 11,743 micrographs, dataset 2 of 6,006 micrographs collected at a tilt angle of 30° and dataset 3 of 4,543 micrographs. We again performed on-the-fly preprocessing (patch motion correction and contrast transfer function estimation) using the CryoSPARC Live routine before picking 4.5, 1.4 and 0.5 million particles (datasets 1, 2 and 3, respectively) in WARP. For processing in CryoSPARC, particles were extracted with a 400 Å box and binned to 1.8 Å per pixel. After 2D classification, we obtained 276,000, 47,000 and 93,000 UAP56–SAC3D1–PSM particles and conducted three rounds of heterogeneous refinement using ab initio models generated with the particles from dataset 1 (ref. 50). The resulting 129,495 particles were then re-extracted with a 400 Å box, binned to 0.90 Å per pixel and subjected to non-uniform refinement, yielding the 3.0 Å UAP56–SAC3D1–PS complex Map A. A further local refinement using a UAP56 mask resulted in the 2.6 Å UAP56–AMP-PNP–RNA Map B. The 2D-selected particles from dataset 3 (~93,000) were further subjected to ab initio reconstruction considering high-resolution information, yielding a readily interpretable cryo-EM map of a SAC3D1–PSM complex. Re-extraction with a 400 Å box and binning to 0.90 Å per pixel, non-uniform refinement and selection of particles by per-particle scale yielded 17,936 particles, which allowed refinement of the SAC3D1–PSM complex cryo-EM map to 3.60 Å (Map C).
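The two-stage sampling used above (1.8 Å per pixel for classification, 0.90 Å per pixel for refinement) follows from the Nyquist limit, which places the finest representable detail at twice the pixel size. The sketch below, with hypothetical helper names, checks the reported map resolutions against each sampling and converts the 400 Å box into pixels. In practice, resolution is usually limited well before Nyquist, so passing this check is necessary but not sufficient.

```python
PHYSICAL_PIXEL_A = 0.575  # detector pixel size reported above (Å per pixel)


def nyquist_limit(pixel_A: float) -> float:
    """Best resolution (Å) representable at a given pixel size."""
    return 2.0 * pixel_A


def resolvable(resolution_A: float, pixel_A: float) -> bool:
    """True if a map at resolution_A can be represented at this sampling."""
    return resolution_A >= nyquist_limit(pixel_A)


def box_in_pixels(box_A: float, pixel_A: float) -> int:
    """Approximate box edge length in pixels for a box given in Å."""
    return round(box_A / pixel_A)


for pixel in (PHYSICAL_PIXEL_A, 1.8, 0.90):
    print(f"{pixel:.3f} Å/px: Nyquist {nyquist_limit(pixel):.2f} Å, "
          f"400 Å box = {box_in_pixels(400, pixel)} px, "
          f"Map A (3.0 Å) resolvable: {resolvable(3.0, pixel)}")
```

At 1.8 Å per pixel the Nyquist limit is 3.6 Å, so the 3.0 Å and 2.6 Å maps (Maps A and B) could only be obtained after re-extraction at finer sampling.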
Model building
Structural modelling of all complexes began with AlphaFold2-Multimer51 predictions of the respective complexes. The predicted models were docked into the respective maps and manually adjusted using Coot, and ISOLDE in ChimeraX. Final refinements were performed in Phenix using the phenix.real_space_refine protocol, applying secondary structure and rotamer restraints to optimize fit and stereochemistry.
HeLa cell culture and cell line generation
HeLa Kyoto or HCT116 cells were grown in Dulbecco’s modified Eagle’s medium (DMEM) supplemented with 10% fetal bovine serum (FBS) and 1% penicillin/streptomycin at 37 °C, 5% CO2. Transient transfections were performed using Lipofectamine 3000 (Invitrogen), according to the manufacturer’s instructions. CRISPR–Cas9-mediated genomic knock-ins of C-terminal 3×Flag or 2×HA–FKBP12(F36V) (dTAG)27 tags in HeLa Kyoto and HCT116 cells, using homology-directed repair donor vectors25, were carried out as described before8. Single guide RNAs (sgRNAs) and homology arms were generated using primers listed in Supplementary Table 6 and cloned into tagging cassettes carrying hygromycin or neomycin resistance genes (plasmids listed in Supplementary Table 6). After transfection and antibiotic selection, single-cell clones were expanded and validated by genomic PCR with primers flanking the insertion region and by western blotting analysis. In the GANP–2×HA–dTAG cell line, we observed an additional band, which we interpreted as a truncated protein isoform localized to the cytoplasm. This isoform is produced from an RNA transcript that uses an early polyadenylation site upstream of the tag insertion position. Because dTAGV-1 treatment led to rapid and substantial depletion of full-length GANP, we nonetheless used this cell line.
To generate cells stably expressing wild-type and mutant LENG8–3×Flag and ZFC3H1–3×Flag constructs, HeLa cells were transfected with piggyBac (pBAC) vectors. Human LENG8 and ZFC3H1 cDNA constructs were cloned into pBAC vectors52 using NEBuilder HiFi DNA assembly (NEB). The LENG8 CDS was inserted into a doxycycline-inducible pBAC vector harbouring a C-terminal 3×Flag tag and a puromycin selection marker. The ZFC3H1 CDS was inserted into a constitutive pBAC vector harbouring a C-terminal 3×Flag tag and a blasticidin selection marker. Generated constructs are listed in Supplementary Table 6. LENG8–2×HA–dTAG and ZFC3H1–2×HA–dTAG cells were co-transfected with the pBAC vectors and a piggyBac transposase-expressing vector (pBASE) using Lipofectamine 3000. Cell pools were selected with puromycin or blasticidin for 7–10 days until negative control cells had died. To induce expression of LENG8 pBAC constructs, cells were incubated for 24 h in culture medium supplemented with 1 µg ml−1 doxycycline (Sigma-Aldrich) before collection. Expression of the constructs was validated by western blotting analysis using antibodies against ZFC3H1 or Flag. Endogenous dTAG-tagged proteins were depleted by adding 500 nM dTAGV-1 to the culture medium for the indicated times.
Western blotting analysis of whole-cell extracts
Whole-cell protein lysates were prepared using lysis buffer (20 mM Tris-HCl, 0.5% NP-40, 150 mM NaCl, 1.5 mM MgCl2, 10 mM KCl, 10% glycerol, 0.5 mM EDTA, pH 7.9) freshly supplemented with protease inhibitors (Roche). Lysates were clarified by centrifugation at 20,000 rcf for 10 min. Protein concentrations were equalized after Bradford measurement, and samples were denatured by the addition of NuPAGE Loading Buffer (Invitrogen) and NuPAGE Sample Reducing Agent (Invitrogen) before boiling at 95 °C for 5 min. SDS–PAGE was carried out on NuPAGE 4–12% Bis-Tris gels (Invitrogen) run in NuPAGE MOPS Running Buffer (Thermo), and proteins were transferred onto PVDF membranes in NuPAGE Transfer Buffer (Thermo) at 4 °C, 15 V overnight. Western blotting analysis was carried out according to standard protocols with the antibodies listed in Supplementary Table 6 and HRP-conjugated secondary antibodies (Dako). Bands were visualized with SuperSignal West Femto chemiluminescent ECL substrate (Thermo) and captured using an ImageQuant 800 imaging system (GE Healthcare). The uncropped gel images with reference to panels in main and Extended Data figures are presented in Supplementary Fig. 1.
Immunoprecipitation followed by western blotting analysis
Approximately 2 × 10⁷ cells per immunoprecipitation were lysed in lysis buffer (20 mM Tris-HCl, 0.5% NP-40, 150 mM NaCl, 1.5 mM MgCl2, 10 mM KCl, 10% glycerol, 0.5 mM EDTA, pH 7.9) freshly supplemented with protease inhibitors, and lysates were cleared by centrifugation at 20,000 rcf for 20 min. Clarified lysates were incubated overnight at 4 °C with Flag antibody and Protein G Dynabeads (Thermo). Beads were washed three times with HT150 extraction buffer and transferred to a fresh tube on the final wash. For benzonase-treated immunoprecipitations, samples were resuspended in HT150 buffer freshly supplemented with protease inhibitors and 2 mM MgCl2 and split in two. One half of each sample was mock-treated and the other incubated with 500 units of benzonase for 20 min at 25 °C, 12,000 rpm. Samples were washed twice for 5 min at room temperature in 20 mM Tris-HCl pH 8 freshly supplemented with 2 mM CaCl2. Proteins were eluted by boiling in 1× NuPAGE loading buffer (Invitrogen) for 5 min. Supernatants were mixed with 10× Reducing Agent (Invitrogen) and denatured for a further 5 min at 95 °C before proceeding with western blotting analysis.
Immunoprecipitations followed by mass spectrometry
All immunoprecipitations were performed label-free and in triplicate. GANP–3×Flag, PCID2–3×Flag, LENG8–mAID–3×Flag and control HeLa Kyoto cells were collected as described above. Protein extractions were performed using material from 15 million cells per immunoprecipitation with 1 ml extraction buffer (20 mM Tris-HCl, 1% IGEPAL, 150 mM NaCl, 1.5 mM MgCl2, 10 mM KCl, 10% glycerol, 0.5 mM EDTA, pH 7.9) supplemented with 1× protease inhibitor cocktail (Roche). After brief sonication (3× 10 s, Amplitude 1, Branson Sonifier 250), the protein extracts were clarified by centrifugation (20,000 rcf for 10 min at 4 °C). Anti-Flag magnetic beads were prepared with anti-Flag M2 antibodies (Sigma F3165) conjugated to Dynabeads M-270 Epoxy (Invitrogen) as previously described53. Beads were washed three times with lysis buffer; for endogenous GANP–3×Flag, LENG8–3×Flag and PCID2–3×Flag immunoprecipitations, lysis buffer with NaCl raised to a final concentration of 450 mM was used (high stringency). For nuclease treatment, beads were resuspended in 40 μl extraction buffer with 2 mM MgCl2 containing 1 μl Pierce Nuclease (for TREX-2 and TREX-2-like immunoprecipitations, Sigma E1014), Benzonase (for ZFC3H1 immunoprecipitations, Sigma) or, as a control, 1 μl of 1 mg ml−1 BSA (as indicated for the different experiments), and incubated with agitation at 25 °C for 20 min. Beads were washed once with extraction buffer, and proteins were then eluted with SDS buffer (2% SDS, 100 mM Tris pH 6.8, 10% glycerol) at 25 °C with shaking for 5 min. Milder lysis and wash conditions using HT150 buffer (20 mM HEPES pH 7.4, 150 mM NaCl, 0.5% Triton X-100) were applied for 3×Flag immunoprecipitations of ZFC3H1 (both endogenously and exogenously expressed) and of exogenously tagged LENG8–3×Flag. Mass spectrometry samples were prepared by the protein aggregation capture (PAC) procedure with proteolytic digestion on MagReSyn HILIC beads using trypsin or chymotrypsin as indicated12.
The peptides were purified and concentrated on C18 stage tips before being subjected to liquid chromatography–mass spectrometry analysis with an Easy nanoLC system coupled directly to a Thermo Scientific Orbitrap Exploris 480 mass spectrometer. Mass spectrometry data were acquired in data-dependent acquisition mode and searched against the UniProt protein sequence database using MaxQuant, with ‘match between runs’ and ‘label-free quantification’ enabled. The MaxQuant protein group output was analysed with the DEP package as previously described44,54,55.
Chemical fractionation of HeLa cells
Chemical fractionation was performed using a protocol adapted from ref. 56. In brief, cells harvested by trypsinization were first lysed using cytosol extraction buffer (0.15% NP-40, 10 mM Tris pH 7.4, 150 mM NaCl). Nuclei were then separated from the cytoplasmic fraction by centrifugation and washed with PBS, after which protein was extracted using extraction buffer (20 mM Tris-HCl, 1% IGEPAL, 150 mM NaCl, 1.5 mM MgCl2, 10 mM KCl, 10% glycerol, 0.5 mM EDTA, pH 7.9) or RNA was extracted using TRIzol reagent according to the manufacturer’s instructions.
Immunofluorescence and colocalization analysis
Cells seeded on microscope coverslips were fixed with 4% paraformaldehyde in PBS for 20 min at room temperature, washed twice with PBS, and permeabilized with 0.1% Triton X-100 in PBS for 10 min at room temperature. Subsequently, cells were washed twice with PBS and blocked with 5% BSA in PBS-T for 1 h at room temperature. Coverslips were incubated for 1 h at room temperature with primary antibodies diluted in 1% BSA, followed by three 5-min washes with PBS. Coverslips were then incubated with secondary antibodies diluted in 1% BSA for 1 h at room temperature. Finally, cells were washed three times for 5 min with PBS, counterstained with DAPI and mounted onto glass slides using ProLong Gold Antifade Mountant. Images were acquired using a Zeiss LSM 980 confocal microscope equipped with Airyscan 2 under 40× or 63× oil-immersion Plan-Apochromat objectives. All images within the same experiment were taken with the same excitation power and exposure time and processed identically using ZEN Blue 3.6 software. All antibodies and applied concentrations are listed in Supplementary Table 6. Pixel-based colocalization analyses were performed using the ZEN 3.6 (blue edition) colocalization module, with thresholds set on the basis of control background images, and weighted colocalization coefficients were extracted for each image. For each cell line, the colocalization coefficient was calculated from six 40× images in two independent experiments, with at least 139 cells in total included in the analyses.
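The weighted colocalization coefficients are assumed here to be intensity-weighted, Manders-type coefficients computed above channel thresholds; the sketch below implements that assumption for two pixel-aligned, flattened channels, and ZEN's exact implementation may differ. Deriving a threshold as mean plus three standard deviations of a control background image is a hypothetical choice, as are all function names.

```python
from statistics import mean, stdev


def threshold_from_background(background: list[float], k: float = 3.0) -> float:
    """Hypothetical threshold: mean + k standard deviations of a control image."""
    return mean(background) + k * stdev(background)


def weighted_coloc(ch1: list[float], ch2: list[float], t1: float, t2: float):
    """Manders-type coefficients for two flattened, pixel-aligned channels.

    m1: fraction of above-threshold channel-1 intensity located in pixels where
        channel 2 is also above threshold; m2 is the converse.
    """
    num1 = den1 = num2 = den2 = 0.0
    for a, b in zip(ch1, ch2):
        if a > t1:
            den1 += a
            if b > t2:
                num1 += a
        if b > t2:
            den2 += b
            if a > t1:
                num2 += b
    m1 = num1 / den1 if den1 else 0.0
    m2 = num2 / den2 if den2 else 0.0
    return m1, m2


# Toy four-pixel images.
m1, m2 = weighted_coloc([0, 10, 20, 30], [0, 0, 5, 40], t1=5, t2=1)
```

Weighting by intensity, rather than counting pixels, lets bright colocalized structures dominate the coefficient, which is generally less sensitive to dim background pixels that narrowly pass the threshold.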
RNA extraction and RT–qPCR
HeLa and LENG8–2×HA–dTAG, ZFC3H1–2×HA–dTAG and GANP–2×HA–dTAG cells were treated with 500 nM dTAGV-1 for 4 h or left untreated. RNA was extracted using TRIzol (Invitrogen) and treated with TURBO DNase (Invitrogen) according to the manufacturer’s protocol. To measure RNA levels, reverse transcription was carried out with SuperScript III reverse transcriptase (Invitrogen) using 1 µg RNA and 20 pmol random hexamers in a 20 µl reaction at 50 °C according to the manufacturer’s protocol. Subsequently, quantitative PCR (qPCR) was performed using Platinum SYBR Green qPCR SuperMix-UDG (Invitrogen) in a ViiA 7 Real-Time PCR machine (Life Technologies) with the primers listed in Supplementary Table 6. Relative quantities were calculated by normalizing samples to GAPDH mRNA levels. For pA+ RNA-seq, RNA integrity was checked on an Agilent 2100 Bioanalyzer (Agilent Technologies) before shipping to the sequencing provider.
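Normalization to GAPDH is sketched below assuming the comparative Ct (2^−ΔΔCt) method with ideal and equal amplification efficiencies for target and reference. The text does not state which relative quantification model was used, so this is an illustrative assumption; the function name and Ct values are hypothetical.

```python
def relative_quantity(ct_target: float, ct_ref: float,
                      ct_target_ctrl: float, ct_ref_ctrl: float,
                      efficiency: float = 2.0) -> float:
    """Fold change of a target RNA relative to control, normalized to a reference (GAPDH).

    efficiency=2.0 assumes perfect doubling per cycle for both amplicons.
    """
    delta_sample = ct_target - ct_ref           # ΔCt in the treated sample
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl   # ΔCt in the untreated control
    return efficiency ** -(delta_sample - delta_ctrl)


# Hypothetical Ct values: target crosses threshold 2 cycles earlier after dTAGV-1, GAPDH unchanged.
fold = relative_quantity(24.0, 18.0, 26.0, 18.0)
```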
pA+ RNA-seq library generations
All library construction and sequencing were performed as paid services by the Beijing Genomics Institute (BGI) for total pA+ RNA-seq and by Lexogen for the fractionated pA+ RNA-seq and the pA+ RNA-seq of cells exogenously expressing ZFC3H1. Total RNA was extracted using TRIzol reagent according to the manufacturer’s instructions and transferred to BGI or Lexogen, which performed pA+ RNA selection using oligo-dT beads followed by strand-specific library preparation and sequencing.
3′ end-seq RNA library preparation
Total 3′ end-seq libraries, with or without EPAP treatment, were generated in triplicate from EXOSC3–2×HA–dTAG cells treated or not with dTAGV-1 for 4 h and processed as before57. In brief, to discriminate pA+ from non-polyadenylated (pA−) RNA 3′ ends, 10 μg of RNA was split in two: one aliquot was subjected to in vitro polyadenylation by E. coli poly(A) polymerase (Invitrogen) in a 40 μl reaction at 30 °C (EPAP treated) according to the manufacturer’s protocol, while the other was mock treated. Samples were then purified with the PureLink RNA Mini Kit (Invitrogen) and submitted for RNA 3′ end sequencing.
Analysis of RNA-seq data
Annotation of pA+ PTTs
Polyadenylated PTTs, displaying sensitivity to ZFC3H1 and/or LENG8 depletion, were annotated using a custom pipeline. In brief, starting from our transcriptome annotation of HeLa cells29, transcription units were filtered to a length of at least 10 kb. At these transcription units, the pA+ RNA-seq coverage for the ZFC3H1–2×HA–dTAG and LENG8–2×HA–dTAG cell lines treated with DMSO or dTAG was measured from the TSS to the transcription end site (TES) in 50-bp bins using rtracklayer58. Gene bodies were then scaled to 2 kb, replicates were averaged, a pseudocount of 1 was added and the log2 fold change (LFC) dTAG/mock was computed for each cell line. Using data from each cell line separately, transcription units were then filtered to display increased signal (LFC > 0.2) within the first 200 bp of the scaled gene body and no such increase in the last 200 bp (LFC < 0.2). Of note, the lenient LFC cutoff reflects the fact that the accumulating PTT overlaps with the full transcription unit, and these criteria filtered out cases where both the full-length transcription unit and the PTT displayed sensitivity to the specific depletions. Next, the PTT-harbouring transcription units identified in ZFC3H1 and LENG8 depletions were pooled. For each of these, the maximum LFC value within the first 200 bp was determined, and the last bin reaching 80% of this value along the scaled gene body was used to define an area in which to screen for the PTT TES. For each transcription unit, within the defined region ±5% of the transcription unit length, we measured, without binning or gene body scaling, the coverage from non-EPAP-treated 3′ end RNA-seq data from mock- or AID-treated ZFC3H1-mAID cells28. LFCs were computed as before, and ZFC3H1-sensitive 3′ end peaks (LFC > 1) were called over the areas of interest. In each area, the strongest peak was taken as the PTT TES. The identified PTTs were then manually curated to filter out artifacts.
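The coverage-based screening steps can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes binned coverage vectors are already available (for example, exported from rtracklayer), uses nearest-neighbour rescaling to the 2-kb scaled gene body and summarizes each 200-bp window by its mean LFC. All function names and the synthetic coverage are hypothetical.

```python
import math
from statistics import mean

MIN_TU_LEN = 10_000  # transcription units shorter than 10 kb are skipped
SCALED_LEN = 2_000   # gene bodies are scaled to 2 kb
WINDOW = 200         # first/last 200 positions of the scaled body
THRESHOLD = 0.2      # lenient LFC cutoff


def scale_profile(cov: list[float], length: int = SCALED_LEN) -> list[float]:
    """Nearest-neighbour rescaling of a binned coverage vector to `length` positions."""
    n = len(cov)
    return [cov[min(n - 1, i * n // length)] for i in range(length)]


def lfc_profile(mock_reps, dtag_reps) -> list[float]:
    """Scale, average replicates, add a pseudocount of 1 and take log2(dTAG/mock)."""
    mock = [mean(v) for v in zip(*map(scale_profile, mock_reps))]
    dtag = [mean(v) for v in zip(*map(scale_profile, dtag_reps))]
    return [math.log2((d + 1) / (m + 1)) for d, m in zip(dtag, mock)]


def has_ptt_signature(lfc: list[float]) -> bool:
    """Increased 5′ signal but no comparable increase at the 3′ end."""
    return mean(lfc[:WINDOW]) > THRESHOLD and mean(lfc[-WINDOW:]) < THRESHOLD


def tes_search_end(lfc: list[float], frac: float = 0.8) -> int:
    """Last scaled position whose LFC still reaches `frac` of the 5′-window maximum."""
    peak = max(lfc[:WINDOW])
    return max(i for i, v in enumerate(lfc) if v >= frac * peak)


# Synthetic 20-kb unit in 50-bp bins (400 bins): dTAG coverage triples over the first 2 kb.
tu_len = 20_000
mock_reps = [[10.0] * 400, [10.0] * 400]
dtag_reps = [[30.0] * 40 + [10.0] * 360, [30.0] * 40 + [10.0] * 360]
lfc = lfc_profile(mock_reps, dtag_reps)
keep = tu_len >= MIN_TU_LEN and has_ptt_signature(lfc)
end = tes_search_end(lfc)
```

The returned scaled position would still need to be mapped back to genomic coordinates before the ±5% search window for 3′ end peaks is defined.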
Annotation of pA− PTTs
At first introns, the LFCs of normalized coverage at individual positions were calculated upon EXOSC3 depletion in EPAP-treated and non-EPAP-treated conditions (M.L.R. et al., unpublished observations; Supplementary Fig. 5a). To ensure annotation of unadenylated ends only, the LFC of the non-EPAP condition was subtracted from that of the EPAP condition. The most downstream position displaying a residual LFC > 0.2 was then taken as the TES of the longest unadenylated PTT of the locus.
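The residual-LFC criterion can be sketched as below. The pseudocount of 1, the per-position count vectors and the strand handling are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def log2fc(treated: list[float], control: list[float], pseudo: float = 1.0) -> list[float]:
    """Per-position log2 fold change with a pseudocount (assumed value: 1)."""
    return [math.log2((t + pseudo) / (c + pseudo)) for t, c in zip(treated, control)]


def unadenylated_tes(positions, epap_ctrl, epap_dep, noepap_ctrl, noepap_dep,
                     strand: str = "+", threshold: float = 0.2):
    """Most downstream position whose EPAP-dependent LFC gain exceeds `threshold`.

    Subtracting the non-EPAP LFC removes ends that are already polyadenylated, so the
    residual reports exosome-sensitive ends that are only captured after in vitro
    polyadenylation.
    """
    residual = [a - b for a, b in zip(log2fc(epap_dep, epap_ctrl),
                                      log2fc(noepap_dep, noepap_ctrl))]
    hits = [p for p, r in zip(positions, residual) if r > threshold]
    if not hits:
        return None
    return max(hits) if strand == "+" else min(hits)


# Toy locus: position 100 is a pA+ end, positions 101-102 are unadenylated ends.
pos = [100, 101, 102, 103]
tes = unadenylated_tes(pos, [10] * 4, [40, 40, 40, 10], [10] * 4, [40, 10, 10, 10])
```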
Processing of total and fractionated pA+ RNA-seq data
The raw sequence reads received from service providers were first quality checked using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Reads were then trimmed for adaptors and filtered using Trim Galore (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/). Trimmed reads were mapped to hg38 using Hisat2 in paired-end mode59. Mapped files were sorted and checked for pairing using SAMtools60. Reads were then deduplicated using MarkDuplicates (Picard; https://broadinstitute.github.io/picard/) and further filtered to keep only unique mappers using SAMtools. Relative sample sizes were then estimated by generating read counts with htseq-count61 (HTSeq) over the Gencode annotation, rather than our in-house annotation, to avoid bias due to the accumulation of short unstable transcripts, and these counts were analysed with DESeq230 to define size factors. For fractionated pA+ RNA-seq, size factors were computed separately for the nuclear and cytoplasmic fractions to avoid masking compartment-specific phenotypes. Finally, reads were converted to bigwig files normalized to size factors using bamCoverage (deepTools)62. The nuclear-fraction RNA sample of HeLa 4-h dTAGV-1 replicate 4 showed a strong technical artifact arising from extensive ribosomal RNA contamination. This replicate was therefore excluded from all analyses but is still listed as part of the Gene Expression Omnibus (GEO) dataset.
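DESeq2 estimates size factors with the median-of-ratios method. A minimal pure-Python sketch of that estimator is shown below to make the normalization step concrete; it is not a substitute for DESeq2's `estimateSizeFactors`, the function name is hypothetical and the toy counts are invented. Running it separately on nuclear and cytoplasmic count tables mirrors the per-fraction normalization described above.

```python
import math
from statistics import median


def size_factors(counts: dict[str, list[int]]) -> list[float]:
    """Median-of-ratios size factors.

    counts maps gene -> per-sample read counts. Genes with a zero in any sample are
    skipped, because their geometric mean is zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_rows = {
        gene: [math.log(c) for c in row]
        for gene, row in counts.items()
        if all(c > 0 for c in row)
    }
    log_geo_mean = {gene: sum(row) / n_samples for gene, row in log_rows.items()}
    factors = []
    for j in range(n_samples):
        log_ratios = [row[j] - log_geo_mean[gene] for gene, row in log_rows.items()]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy table: sample 2 is sequenced twice as deeply as sample 1; gene "z" is skipped.
toy = {"a": [10, 20], "b": [100, 200], "c": [5, 10], "z": [0, 3]}
sf = size_factors(toy)
```

Counts are divided by these factors (so coverage tracks would be scaled by their reciprocals); the median makes the estimate robust to a minority of strongly changing genes.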
Differential expression analysis
RNA sensitivities to LENG8, ZFC3H1 and GANP depletion were defined by DESeq2 differential expression analysis of total pA+ RNA-seq data, using untreated cells as controls. For each depletion, transcription units with adjusted P values < 0.1 in the DESeq230 analysis were considered measurable; among these, an LFC over control > 0.5 was counted as upregulated and an LFC over control < −0.5 as downregulated. Owing to the strong correlation between coverage changes upon LENG8 and ZFC3H1 depletion (Fig. 3c), ‘PAXT-sensitive’ transcription units were defined as those upregulated in either of the two depletions. Plots exploring the relationship between exon or processed RNA lengths and PAXT sensitivity (Fig. 3e and Extended Data Figs. 6k and 8i,j) were based on our published in-house HeLa transcript annotation29 and the LFC coverage of all transcription units with adjusted P values < 0.1 in DESeq2.
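The thresholds above translate directly into a labelling rule. The following is a minimal sketch, assuming per-unit LFC and adjusted P values have already been exported from DESeq2; the function names and label strings are hypothetical.

```python
import numpy as np


def classify_units(lfc, padj, lfc_cutoff=0.5, padj_cutoff=0.1):
    """Label transcription units from DESeq2 output (LFC vs untreated, adjusted P)."""
    lfc = np.asarray(lfc, dtype=float)
    padj = np.asarray(padj, dtype=float)
    measurable = padj < padj_cutoff  # NaN adjusted P values compare as False
    labels = np.full(lfc.shape, "unaffected", dtype=object)
    labels[~measurable] = "not measurable"
    labels[measurable & (lfc > lfc_cutoff)] = "up"
    labels[measurable & (lfc < -lfc_cutoff)] = "down"
    return labels


def paxt_sensitive(lfc_leng8, padj_leng8, lfc_zfc3h1, padj_zfc3h1):
    """True where a unit is upregulated upon LENG8 OR ZFC3H1 depletion."""
    up_leng8 = classify_units(lfc_leng8, padj_leng8) == "up"
    up_zfc3h1 = classify_units(lfc_zfc3h1, padj_zfc3h1) == "up"
    return up_leng8 | up_zfc3h1


print(classify_units([1.0, -1.0, 0.2, 2.0], [0.01, 0.05, 0.01, 0.5]))
```

Using an "or" across the two depletions, rather than requiring both, reflects the definition of PAXT sensitivity given in the text.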
Nuclear-to-cytoplasmic ratio measurements
Nuclear-to-cytoplasmic ratios were calculated for each transcription unit from non-log-transformed nuclear and cytoplasmic pA+ RNA coverage counts. Counts of zero were replaced with a minimal value to avoid division by zero.
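The text does not specify which minimal value replaced zero counts. The sketch below assumes, purely for illustration, that it is the smallest non-zero count within the same fraction; both this choice and the function names are hypothetical.

```python
import numpy as np


def fill_zeros(counts):
    """Replace zero counts with the smallest non-zero count (hypothetical choice of 'minimal value')."""
    counts = np.asarray(counts, dtype=float)
    floor = counts[counts > 0].min()
    return np.where(counts == 0, floor, counts)


def nuclear_cytoplasmic_ratio(nuclear_counts, cytoplasmic_counts):
    """Per-unit ratio of non-log-transformed nuclear to cytoplasmic counts."""
    return fill_zeros(nuclear_counts) / fill_zeros(cytoplasmic_counts)


print(nuclear_cytoplasmic_ratio([10, 0, 4], [5, 2, 0]))  # [2. 2. 2.]
```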
Transcription unit clustering based on fractionated pA+ RNA-seq behaviour
First, the average LFC coverage, as measured with rtracklayer, was calculated for the total, nuclear and cytoplasmic fractions. For the fractionated sequencing, all protein depletions were compared with the parental HeLa cell line treated with dTAGV-1. The LFC upon ZFC3H1 depletion was then used separately in the nuclear and cytoplasmic fractions to classify the behaviour of each transcription unit as ‘up’ (LFC > 0.5), ‘down’ (LFC < −0.5) or ‘unaffected’. Nine clusters were then generated, corresponding to all possible combinations (nuclear ‘up’/cytoplasmic ‘up’, nuclear ‘up’/cytoplasmic ‘unaffected’, etc.). Small clusters (fewer than 200 transcription units) were omitted from the final heat map in Extended Data Fig. 7c.
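The nine-cluster scheme is the Cartesian product of three nuclear and three cytoplasmic behaviours. A minimal sketch, assuming LFCs are supplied as dictionaries keyed by transcription unit ID; units missing from either fraction are skipped, which is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def behaviour(lfc, cutoff=0.5):
    """Classify one ZFC3H1-depletion LFC as 'up', 'down' or 'unaffected'."""
    if lfc > cutoff:
        return "up"
    if lfc < -cutoff:
        return "down"
    return "unaffected"


def cluster_units(nuclear_lfc, cytoplasmic_lfc, min_size=200):
    """Assign each unit to one of nine (nuclear, cytoplasmic) clusters; drop small clusters.

    nuclear_lfc / cytoplasmic_lfc: dicts mapping unit ID -> LFC vs dTAGV-1-treated parental cells.
    """
    shared = nuclear_lfc.keys() & cytoplasmic_lfc.keys()
    labels = {u: (behaviour(nuclear_lfc[u]), behaviour(cytoplasmic_lfc[u])) for u in shared}
    sizes = Counter(labels.values())
    # Keep only clusters with at least `min_size` members (i.e. drop those with fewer than 200).
    return {u: c for u, c in labels.items() if sizes[c] >= min_size}
```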
Analysis of transcripts with retained introns
For every intron, unspliced reads spanning the 5′ and 3′ splice sites were counted using custom code relying on SAMtools60, and every intron with at least one unspliced read at both junctions in unperturbed HeLa cells was considered retained. The genomic coordinates of detained introns were obtained from Boutz et al.36. As this annotation originates from four combined cell lines, we first merged it with our in-house HeLa-specific annotation29. Given the generally unspliced nature of detained introns, we then removed those that fully or partially overlapped introns in our annotation. We next kept only detained introns fully contained within our annotated exons, to avoid overhangs at the TSS or TES due to alternative isoforms. Finally, the few detained introns that started or ended a transcription unit without being preceded or followed by an exon were removed. Reads spanning the 5′ and 3′ splice junctions of both retained and detained introns were then counted in the same way in the total, nuclear and cytoplasmic fractions. Introns for which reads at both splice junctions increased upon ZFC3H1 or LENG8 depletion were counted as PAXT-sensitive.
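The retained-intron criterion and the first two detained-intron filters can be sketched with plain interval logic. This is an illustration under stated assumptions: half-open coordinates, strand already handled, naive pairwise comparisons (an interval library such as pyranges or bedtools would normally be used), and the transcription-unit start/end filter omitted. All names are hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end), half-open coordinates; strand assumed to be handled upstream.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def contains(outer: Interval, inner: Interval) -> bool:
    return outer[0] == inner[0] and outer[1] <= inner[1] and inner[2] <= outer[2]


def is_retained(unspliced_at_5ss: int, unspliced_at_3ss: int) -> bool:
    """Retained: at least one unspliced read across both the 5' and 3' splice sites."""
    return unspliced_at_5ss >= 1 and unspliced_at_3ss >= 1


def filter_detained(detained: List[Interval], introns: List[Interval],
                    exons: List[Interval]) -> List[Interval]:
    """Drop detained introns overlapping annotated introns; keep those inside an exon."""
    kept = []
    for d in detained:
        if any(overlaps(d, i) for i in introns):
            continue  # fully or partially overlaps an annotated intron
        if any(contains(e, d) for e in exons):
            kept.append(d)  # fully contained within an annotated exon
    return kept
```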
Metagene profiles, heat maps and display of sequencing information
Metagene profiles and heat maps were produced using custom R and Python scripts. In brief, the rtracklayer R package was used to collect read coverage values in a window of ±500 bp around the TSS or TES, or over specific exonic or intronic features. Coverage values were then grouped into 50-nt bins and log2-transformed after adding a pseudocount of 1. These coverage values were used to compute LFCs and generate the subsequent plots. Heat maps were made using custom R or Python code based on the R package ComplexHeatmap63 or the Python package Seaborn64, respectively. The mean coverage across transcription units in each bin was computed and plotted as metagene profiles using custom R code, with a 95% confidence interval of the mean shown for each sample, estimated from 50 bootstrap resamplings with replacement. Aggregate plots and heat maps of sequencing data were generated from BigWig files using customized R scripts. Genome browser views based on BigWig files were generated using the R package seqNdisplayR65.
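The binning, log transform and bootstrap steps can be sketched in NumPy. This is not the authors' R code. Averaging within each 50-nt bin and the percentile form of the bootstrap interval are both assumptions, since the text does not specify them, and the function names are hypothetical.

```python
import numpy as np


def bin_log2(coverage, bin_size=50):
    """coverage: units x positions (e.g. 1,000 positions for a ±500 bp window).

    Averages coverage in consecutive bins (mean per bin is an assumption) and
    returns log2(binned + 1).
    """
    cov = np.asarray(coverage, dtype=float)
    n_units, n_pos = cov.shape
    n_bins = n_pos // bin_size
    binned = cov[:, : n_bins * bin_size].reshape(n_units, n_bins, bin_size).mean(axis=2)
    return np.log2(binned + 1)


def bootstrap_mean_ci(values, n_boot=50, level=0.95, seed=0):
    """Mean across units per bin, with a percentile bootstrap CI (units resampled with replacement)."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    n = values.shape[0]
    boot_means = np.empty((n_boot, values.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        boot_means[b] = values[idx].mean(axis=0)
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(boot_means, [alpha, 1.0 - alpha], axis=0)
    return values.mean(axis=0), lower, upper


# LFC per bin = difference of log2-binned coverage between depletion and control.
control = bin_log2(np.ones((3, 1000)))
depleted = bin_log2(np.full((3, 1000), 3.0))
lfc = depleted - control  # log2(4) - log2(2) = 1 in every bin
mean, lower, upper = bootstrap_mean_ci(lfc)
```

Resampling whole transcription units, rather than individual bins, keeps the positional structure of each profile intact within a bootstrap replicate.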
In vivo RNA-binding assays
Cells expressing wild-type or R563A mutant LENG8–3×Flag were induced with doxycycline for 24 h and UV-crosslinked at 254 nm (150 mJ cm−2) using a Stratalinker 2400 (Stratagene). Lysate preparation, anti-Flag immunoprecipitation, RNase I (Thermo Scientific) and TURBO DNase (Thermo Scientific) treatments, radiolabelling with γ-32P ATP (PerkinElmer) and PAGE of RNA–protein complexes were performed as described37. Phosphorimaging of gels containing radiolabelled samples was performed using a Typhoon scanner (Amersham).
iCLIP experiments
iCLIP was performed as previously described37 with minor modifications. In brief, HeLa or HeLa LENG8–mAID–3×Flag cells at ~80% confluency were UV-crosslinked at 254 nm with a dose of 150 mJ cm−2 using a Stratalinker 2000. One (UAP56) or two (LENG8–3×Flag) 15-cm plates were used per immunoprecipitation and experiments were performed in duplicate. Whole-cell extracts were sonicated for 30 s and treated with TURBO DNase and RNase I prior to immunoprecipitation, using anti-UAP56 (E7W7M, Cell Signaling Technology) or anti-Flag M2 antibodies immobilized on Protein G Dynabeads. Protein–RNA complexes were subjected to high-salt washes, including freshly added 2 M urea in the wash buffer, and separated by PAGE. RNA was subsequently extracted and iCLIP libraries were constructed38 before sequencing on a NovaSeq platform (Lexogen).
Crosslink positions were mapped to the hg38 human genome as previously described66. Counts were then generated over our in-house HeLa transcript annotation29 and normalized to transcript abundance by dividing by each transcript's mean log2(coverage), averaged across replicates of the EXOSC3 control sample not treated with dTAGV-1 (M.L.R. et al., unpublished observations). To display iCLIP log2-transformed coverage across PAXT-sensitive and -insensitive transcript classes (ncRNAs, PROMPTs and PTTs), typical exosome targets were classified by their sensitivity to ZFC3H1 depletion in pA+ RNA-seq using LFC (> 0.5 for ZFC3H1-sensitive and between −0.3 and 0.3 for insensitive transcripts) and curated to remove overlapping transcripts. Comparisons were further restricted to monoexonic ncRNAs.
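The abundance normalization and the sensitivity classes can be sketched as follows. This is a minimal sketch under stated assumptions: control coverage is given as a transcripts × replicates array of log2 values, and transcripts whose mean is not positive are set to NaN, a hypothetical guard not described in the text. Function names are hypothetical.

```python
import numpy as np


def normalize_crosslinks(xl_counts, control_log2_coverage):
    """Divide crosslink counts by each transcript's mean log2(coverage) across control replicates.

    control_log2_coverage: transcripts x replicates array (EXOSC3 control without dTAGV-1).
    Transcripts with a non-positive mean are set to NaN (hypothetical guard).
    """
    mean_log2 = np.asarray(control_log2_coverage, dtype=float).mean(axis=1)
    xl = np.asarray(xl_counts, dtype=float)
    out = np.full(xl.shape, np.nan)
    ok = mean_log2 > 0
    out[ok] = xl[ok] / mean_log2[ok]
    return out


def zfc3h1_class(lfc):
    """'sensitive' (LFC > 0.5), 'insensitive' (-0.3 < LFC < 0.3) or 'other'."""
    lfc = np.asarray(lfc, dtype=float)
    classes = np.full(lfc.shape, "other", dtype=object)
    classes[lfc > 0.5] = "sensitive"
    classes[(lfc > -0.3) & (lfc < 0.3)] = "insensitive"
    return classes
```

Transcripts with LFCs between 0.3 and 0.5 (or below −0.3) fall into neither class, which leaves a gap between the sensitive and insensitive groups.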
Puromycin labelling assays
HeLa cells and cells expressing LENG8–2×HA–dTAG, ZFC3H1–2×HA–dTAG or GANP–2×HA–dTAG were grown in the presence of 500 µM dTAGV-1 for an additional 0, 4 or 24 h. Before collection by snap freezing, puromycin (5 µg ml−1) was added to the culture medium for 30 min. Puromycin incorporation was assessed by western blotting.
Whole-cell proteome analysis using pulsed SILAC
HeLa, LENG8–2×HA–dTAG, ZFC3H1–2×HA–dTAG and GANP–2×HA–dTAG cell lines were initially cultured for 24 h in DMEM containing 73 mg l−1 l-lysine HCl and 28 mg l−1 l-arginine HCl (Sigma) (Lys0/Arg0 medium). Cells were then pre-treated with either 500 µM dTAGV-1 or an equivalent volume of DMSO for 4 h. The medium was then replaced with Lys8/Arg10 medium containing 73 mg l−1 l-lysine (13C6,15N2) HCl and 28 mg l−1 l-arginine (13C6,15N4) HCl, supplemented with either dTAGV-1 or DMSO, and cells were cultured for an additional 24 h under the same conditions. In parallel, a matched set of cells was maintained in Lys0/Arg0 medium under the same conditions (dTAGV-1 or DMSO). After treatment, cells were collected by snap freezing, and SILAC sample preparation and mass spectrometry were carried out as described43. SILAC ratios of Lys8/Arg10 to Lys0/Arg0 peptides were calculated for each sample. Differential protein expression was assessed with the DEP package44 by analysing differences in mean LFQ intensities of Lys8/Arg10-labelled peptides.
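The two quantities named above can be illustrated with a short sketch. It does not reproduce DEP, which additionally handles filtering, imputation and limma-based testing. Summarizing a protein's heavy-to-light ratio as the median of its peptide ratios is an assumption, and both function names are hypothetical.

```python
import numpy as np


def heavy_light_ratio(heavy, light):
    """Protein-level Lys8/Arg10 : Lys0/Arg0 ratio as the median of peptide ratios (assumption)."""
    heavy = np.asarray(heavy, dtype=float)
    light = np.asarray(light, dtype=float)
    quantified = (heavy > 0) & (light > 0)  # skip peptides missing either channel
    return float(np.median(heavy[quantified] / light[quantified]))


def heavy_lfq_difference(lfq_dtag, lfq_dmso):
    """Difference in mean log2 heavy LFQ intensity, dTAGV-1 minus DMSO replicates."""
    return float(np.mean(np.log2(lfq_dtag)) - np.mean(np.log2(lfq_dmso)))


print(heavy_light_ratio([2, 4, 0], [1, 2, 5]))  # 2.0
print(heavy_lfq_difference([8, 8, 8], [2, 2, 2]))  # 2.0
```

Working on log2 intensities means a difference of 1 corresponds to a twofold change in newly synthesized (heavy-labelled) protein.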
Statistics and reproducibility
In addition to the built-in statistical tests provided by software packages such as Zen Blue, DESeq2 and DEP, further statistical analyses were performed using two-sided t-tests, or Welch’s t-test when group sizes differed substantially. Pearson correlation coefficients were used to assess the correlation between LFCs following ZFC3H1, LENG8 or GANP depletion.
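The tests named above are available in SciPy. In the sketch below, the twofold size-ratio threshold used to decide when groups "differed substantially" is a hypothetical placeholder, since the text gives no cut-off, and the function names are also hypothetical.

```python
from scipy import stats


def compare_groups(a, b, size_ratio_for_welch=2.0):
    """Two-sided t-test; switches to Welch's t-test when group sizes differ by more than
    `size_ratio_for_welch`-fold (a hypothetical threshold for 'differed substantially')."""
    larger, smaller = max(len(a), len(b)), min(len(a), len(b))
    use_welch = larger / smaller > size_ratio_for_welch
    result = stats.ttest_ind(a, b, equal_var=not use_welch)  # two-sided by default
    return result.pvalue, use_welch


def lfc_correlation(lfc_x, lfc_y):
    """Pearson correlation between LFCs from two depletions (e.g. ZFC3H1 vs LENG8)."""
    r, p = stats.pearsonr(lfc_x, lfc_y)
    return r, p
```

Setting `equal_var=False` in `scipy.stats.ttest_ind` selects Welch's unequal-variance test.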
Box plots show the median (centre line) and interquartile range (box limits); whiskers extend to the most extreme data points within 1.5× the interquartile range. P values of statistical tests are indicated directly on the plots. Outliers were excluded from the visual display for clarity but were included in the statistical analysis.
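The whisker rule above is Tukey's 1.5 × IQR convention. The sketch below computes the plotted box-plot statistics in NumPy. It assumes NumPy's default linear percentile interpolation, which other plotting tools may not use, and the function name is hypothetical. In matplotlib, the same display corresponds to `boxplot(..., whis=1.5, showfliers=False)`.

```python
import numpy as np


def tukey_whiskers(values):
    """Return (lower whisker, Q1, median, Q3, upper whisker) using the 1.5 x IQR rule."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    # Whiskers stop at the most extreme observed points inside the fences.
    lower = v[v >= q1 - 1.5 * iqr].min()
    upper = v[v <= q3 + 1.5 * iqr].max()
    return lower, q1, median, q3, upper


print(tukey_whiskers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

Here, 100 lies beyond the upper fence (7.75 + 1.5 × 4.5 = 14.5), so it would be hidden from the display but retained in any statistical test.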
All real-time qPCR assays, RNA-seq, IP–MS and SILAC whole-cell proteomics were performed using three independent biological replicates, each comprising multiple technical measurements. Immunofluorescence staining (at least 100 cells analysed per condition) and RNA-binding assays were performed twice using independent batches of cells, and all attempts at replication were successful. iCLIP libraries were prepared from two biological replicates. All other experiments, except cryo-EM data collection and processing, were performed at least three times with similar results, and all attempts at replication were successful.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Three-dimensional cryo-EM density maps of UAP56–RNA–SAC3D1–PSM have been deposited to the Electron Microscopy Data Bank under the accession numbers EMD-54282 (composite map), EMD-54283 (Map A) and EMD-54284 (Map B). The coordinate file of UAP56–RNA–SAC3D1–PSM has been deposited to the Protein Data Bank under the accession number 9RV1. Cryo-EM density maps of SAC3D1–PSM, LENG8–PSM, UAP56-NTD–LENG8–PSM and UAP56–RNA–LENG8–PSM have been deposited to the Electron Microscopy Data Bank under the accession numbers EMD-56930 (Map C), EMD-56931 (Map D), EMD-56932 (Map E) and EMD-56933 (Map F). The respective coordinate files of SAC3D1–PSM, LENG8–PSM, UAP56-NTD–LENG8–PSM and UAP56–RNA–LENG8–PSM have been deposited to the Protein Data Bank under the accession numbers 28WY, 28WZ, 28XA and 28XB. All newly generated RNA-seq data are available at GEO accession code GSE301785. All newly generated proteomics data are available at PRIDE accession code PXD076297. Raw immunofluorescence images have been deposited in the EMBL Bioimage Archive under accession code S-BIAD3166.
References
Djebali, S. et al. Landscape of transcription in human cells. Nature 489, 101–108 (2012).
Carninci, P. et al. The transcriptional landscape of the mammalian genome. Science 309, 1559–1563 (2005).
Schmid, M. & Jensen, T. H. Controlling nuclear RNA levels. Nat. Rev. Genet. 19, 518–529 (2018).
Wang, J. & Cheng, H. Out or decay: fate determination of nuclear RNAs. Essays Biochem. 64, 895–905 (2020).
Luo, M. L. et al. Pre-mRNA splicing and mRNA export linked by direct interactions between UAP56 and Aly. Nature 413, 644–647 (2001).
Umlauf, D. et al. The human TREX-2 complex is stably associated with the nuclear pore basket. J. Cell Sci. 126, 2656–2667 (2013).
Hohmann, U. et al. An ATP-gated molecular switch orchestrates human mRNA export. Nature 649, 1042–1050 (2026).
Meola, N. et al. Identification of a nuclear exosome decay pathway for processed transcripts. Mol. Cell 64, 520–533 (2016).
Garland, W. & Jensen, T. H. Nuclear sorting of short RNA polymerase II transcripts. Mol. Cell 84, 3644–3655 (2024).
Kamieniarz-Gdula, K. & Proudfoot, N. J. Transcriptional control by premature termination: a forgotten mechanism. Trends Genet. 35, 553–564 (2019).
Ogami, K. et al. An Mtr4/ZFC3H1 complex facilitates turnover of unstable nuclear RNAs to prevent their cytoplasmic transport and global translational repression. Genes Dev. 31, 1257–1271 (2017).
Silla, T. et al. The RNA exosome adaptor ZFC3H1 functionally competes with nuclear export activity to retain target transcripts. Cell Rep. 23, 2199–2210 (2018).
Silla, T. et al. The human ZC3H3 and RBM26/27 proteins are critical for PAXT-mediated nuclear RNA decay. Nucleic Acids Res. 48, 2518–2530 (2020).
Polak, P. et al. Dual agonistic and antagonistic roles of ZC3H18 provide for co-activation of distinct nuclear RNA decay pathways. Cell Rep. 42, 113325 (2023).
Fan, J. et al. Dual modes of ZFC3H1 confer selectivity in nuclear RNA sorting. Mol. Cell 84, 4297–4313.e7 (2024).
Pacheco-Fiallos, B. et al. mRNA recognition and packaging by the human transcription–export complex. Nature 616, 828–835 (2023).
Xie, Y. et al. Structures and mRNP remodeling mechanism of the TREX-2 complex. Structure 33, 566–582.e6 (2025).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Bryant, P. et al. Improved prediction of protein–protein interactions using AlphaFold2. Nat. Commun. 13, 1265 (2022).
Schmid, E. W. & Walter, J. C. Predictomes, a classifier-curated database of AlphaFold-modeled protein–protein interactions. Mol. Cell 85, 1216–1232.e5 (2025).
Andersen, P. R. et al. The human cap-binding complex is functionally connected to the nuclear RNA exosome. Nat. Struct. Mol. Biol. 20, 1367–1376 (2