Abstract
The recent revolution in genome sequencing and protein structure prediction has opened new frontiers in understanding, predicting and designing enzyme function1,2. Central to these efforts is the discovery and functional annotation of novel enzymes, which is essential for elucidating the connection between genotype and phenotype and for developing biocatalysts for industrial applications. However, accurately predicting enzymatic function remains a major challenge, and the discovery of new enzymes often relies on serendipity. Here we present a metal-coordination-guided strategy that uses atomic-level mechanistic principles to mine protein structure databases for the targeted discovery of metalloenzymes. We apply this framework to the AlphaFold2 Protein Structure Database to identify new members of the FeII/α-ketoglutarate-dependent halogenase family, which selectively functionalize unactivated C(sp³)–H bonds, a crucial transformation in the production of pharmaceuticals and other high-value compounds3,4. These radical halogenases constitute a low-abundance class within the large and diverse cupin superfamily5. Owing to low sequence conservation, they have been especially challenging to find against the complex background of related family members, such as hydroxylases, desaturases and epimerases. Our metal-coordination mining methodology reveals several previously unrecognized radical halogenase families spanning diverse phylogenetic space, at minimal computational cost. Our predictions are validated by the experimental characterization of two new radical halogenases, AspX and BtnX. Notably, BtnX shows a substrate promiscuity that is unprecedented in radical halogenases, opening the way for a broad range of biocatalytic applications.
Main
Enzymes have a central role in living systems, catalysing on the order of 10⁵ chemical transformations that are needed to sustain cellular metabolism. This catalytic plasticity has been driven by the divergent evolution of large superfamilies containing members that are phylogenetically related yet have diverse functions6,7,8. Accurate functional annotation of individual enzymes is crucial for reaching the level of precision needed to enable both the discovery of new enzymes and the prediction of biological function at an organismal level. Sequence-similarity-based methods take advantage of the vast quantity of DNA sequence available9,10,11; however, the evolution of gene sequence is not dictated just by enzymatic function. Instead, it is confounded by many factors, including positive and negative selection, neutral drift, mutation rate, redundancy, dispensability, network organization, metabolic burden and expression level12. As a result, the primary sequence of an enzyme reflects a complex history of convergent and divergent evolution as well as interspecies transfer and phylogenetic relationships. By contrast, enzyme structure–function relationships are expected to be much more highly conserved because many elements, such as catalytic architecture, substrate-binding pockets, thermodynamic stability, protein dynamics and protein fold, need to be maintained for functionality7,13,14.
Advances in protein structure prediction have enabled the development of large databases, such as the AlphaFold2 (AF2) Protein Structure Database1 and the ESM Metagenomic Atlas15, that offer new opportunities for structure-based discovery. Indeed, domain-based discovery approaches can be applied successfully to specialized structural folds that have been co-opted for specific functions16,17,18. However, it is more common to find that such stable folds have been recruited repeatedly throughout evolution and can thus support an extremely broad range of functions, making it difficult to make precise predictions19. In this regard, distinct enzymatic functions within a structural fold should give rise to conserved differences in the chemical reaction mechanism that can be correlated with atomic-level structural divergence in the active site6,7,8,14. This conceptual framework suggests that key mechanistic information could be encoded in atomistic representations of the active-site chemistry and guide the large-scale mining of protein structure databases for targeted discovery of enzymes with desired functions.
Towards this goal, we have developed a pipeline for the targeted discovery of metalloenzymes. It is estimated that 25–50% of proteins require a metal ion for their function20, using the tunable redox, acid–base and structural properties of metal ions to expand the reaction space beyond what is possible with the standard amino acid functional groups. There has been great interest in the computational identification of these sites using approaches such as co-evolution or machine learning21,22,23,24, but these approaches are currently limited in their ability to predict specific enzymatic functions because of the lack of validated functional annotations, which are needed for accurate training. However, biological metal sites are governed by specific geometric constraints related directly to their catalytic mechanisms, such as those defined by distinct metal-coordination bonds with protein-derived ligands, second-sphere interactions and hydrogen-bond networks. Thus, these types of mechanistically relevant interactions establish three-dimensional (3D) motifs that can be used for the targeted structural mining of specific metal-binding sites as well as for their accurate functional classification, even within the context of highly diverse protein superfamilies. This metal-coordination structural mining approach was successfully implemented and experimentally validated towards the discovery of new FeII/α-ketoglutarate (αKG)-dependent radical halogenases25, which are low-abundance members of the cupin superfamily that catalyse a synthetically useful C–H functionalization reaction, replacing inert C(sp³)–H bonds with native (X = Cl) and non-native (X = Br, N3) anions26,27 (Fig. 1a).
Metal-coordination mining
Certain protein architectures have been found to be highly recurrent, arising multiple times in evolution. These superfolds are used by many distinct families with diverse functions. The cupin superfold is found in all domains of life and exhibits an exceptionally wide range of biochemical functions, on par with the versatility of the triose phosphate isomerase (TIM) barrel superfold28. Its evolutionary history is complex, but seems to be centred around a 3His-1Glu motif located within a characteristic thermostable double-stranded β-helix (DSBH or jelly roll) barrel fold29. It has been proposed that cupins were originally part of a group of small-molecule-binding domains, recognizing sugars and nucleotides using the 3His-1Glu motif, before diverging into metal-independent members such as transcription factors and sugar epimerases and isomerases, as well as those with the ability to recruit metal ions using the same consensus motif. However, metal-binding abilities are also thought to have arisen repeatedly through convergent evolution within the metal-independent lineage30,31. Thus, cupins contain a broad array of non-catalytic members, as well as metalloenzymes with different metals (Fe, Mn, Zn, Ni, Co, Mg or Cu) and coordination environments. As with many evolvable protein folds, this non-linear evolutionary path creates a disconnect in which closely related cupin members can be grouped phylogenetically yet remain functionally distinct, with respect not only to their catalytic function but also to their metal-binding properties32.
The FeII/αKG-dependent enzymes use the cupin superfold and participate in a broad array of metabolic transformations, from the modification of proteins to key steps in the biosynthesis of antibiotics and other natural products. These enzymes use a mononuclear non-haem FeII cofactor and the αKG co-substrate to bind to and activate O2 and generate a reactive FeIV-oxo intermediate33 (Fig. 1a). However, this shared intermediate can catalyse a wide range of reactions, such as hydroxylation, desaturation, cyclization and halogenation33 (Extended Data Fig. 1). The prototypical members of this family bind to FeII using a cupin-derived 2His-1Asp/Glu ‘facial triad’ motif and the αKG co-substrate to form the primary coordination sphere of the metal33. The radical halogenase members are distributed at low frequency within various families of FeII/αKG-dependent enzymes, but are mechanistically distinct with regard to their ability to transfer an Fe-bound anion to the substrate5. This step necessitates the presence of an open coordination site for halide coordination to FeII, which is observed as the conversion of the 2His-1Asp/Glu motif to the non-canonical 2His-1Gly/Ala motif in all previously reported members (Fig. 1a and Extended Data Fig. 2). Thus, we reasoned that this structure–function relationship, in which metal coordination dictates reaction outcomes, could serve as a bioinformatic marker for the identification of radical halogenases.
Sequence-based methods typically rely on pairwise comparisons that scale as N² (where N is the number of sequences) and thus lack both the sensitivity and the scalability to search for a single amino acid change within a large superfamily where N > 10⁵. To address these limitations, we turned to mechanism-guided structural bioinformatics, with the rationale of using the atomic-level 3D features of the metal coordination in radical halogenases: (i) the 2His coordination motif, which is shared across all members in the superfamily, as a simple structural representation to identify the Fe-binding site; and (ii) the absence of additional coordinating ligands, such as the canonical Asp or Glu residues, for the functional classification of a metal-binding structure as a radical halogenase (Fig. 1a). This approach addresses both the challenge of sensitivity, by using geometric clustering rules in 3D space to improve on the lower detection signal from untargeted alignment scores in one-dimensional (1D) space, and that of scalability, by using a structural motif search that scales linearly with N.
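To make the scaling argument concrete, the short Python sketch below counts the comparisons an all-against-all search would need for the 530,814 AF2 models retrieved in this study, against one motif check per structure. It counts operations only; the cost of a single alignment and of a single geometric check differ, so the ratio is illustrative rather than a runtime estimate.

```python
def pairwise_comparisons(n):
    """All-against-all search: one comparison per unordered pair, n(n - 1)/2."""
    return n * (n - 1) // 2


def motif_checks(n):
    """Per-structure motif search: each of the n structures is inspected once."""
    return n


n_structures = 530_814  # AF2 models retrieved for the cupin set in this study
pairs = pairwise_comparisons(n_structures)
checks = motif_checks(n_structures)
print(f"all-against-all comparisons: {pairs:,}")
print(f"per-structure motif checks:  {checks:,}")
print(f"ratio: {pairs // checks:,}")
```

The quadratic term dominates as soon as N reaches the size of a superfamily, which is why a search that touches each structure once remains tractable.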
A structural database of the FeII/αKG-dependent enzyme family was compiled by first retrieving protein sequences annotated to contain the cupin domain (n = 1.8 million) from the approximately 220 million sequences in the InterPro database (Fig. 1b and Supplementary Table 2). After an initial filtering step to remove fragments, duplicates and non-enzymes, the corresponding structural models from the AF2 Protein Structure Database were retrieved (n = 530,814) and mined for the presence of a 2His binding site using the unique distance and orientation relationship that would allow two intra-strand His residues to act as a metal-binding site34. We found that three constraints were sufficient to describe this site, with two H-bonds between the backbone amide and carbonyl groups from HisA and HisB defining their location on adjacent β-strands, and the side-chain Nτ–Nτ distance defining their preorganization for metal coordination (Fig. 1a). Most of these predicted structures contained a 2His metal-binding site with an additional Asp/Glu residue in coordination proximity to the metal site (n = 456,585); a small minority lacked a nearby coordinating residue and instead contained an Ala/Gly residue (n = 946), suggesting radical halogenase activity (Fig. 1b).
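The three geometric constraints described above can be sketched in Python as follows. This is a minimal illustration, not the released package: the residue dictionaries, the helper names and all distance thresholds (`HBOND_MAX`, `NTAU_RANGE`, `CARBOXYLATE_MAX`) are hypothetical choices. The metal position is approximated crudely by the midpoint of the two Nτ atoms (named NE2 in PDB files), and a site is flagged simply by the absence of a nearby Asp/Glu carboxylate rather than by checking for Ala/Gly at the equivalent position.

```python
import math
from itertools import combinations

# Hypothetical thresholds in ångströms (not the values used in the study).
HBOND_MAX = 3.5           # backbone N...O distance accepted as an H-bond
NTAU_RANGE = (2.5, 5.0)   # allowed His NE2 (N-tau) to His NE2 distance
CARBOXYLATE_MAX = 5.0     # Asp/Glu carboxylate O to the estimated metal position

CARBOXYLATE_ATOMS = {"ASP": ("OD1", "OD2"), "GLU": ("OE1", "OE2")}


def is_2his_site(his_a, his_b):
    """Three constraints: two backbone H-bonds between the His pair plus an NE2-NE2 window."""
    a, b = his_a["atoms"], his_b["atoms"]
    hbond_1 = math.dist(a["N"], b["O"]) <= HBOND_MAX  # HisA amide N ... HisB carbonyl O
    hbond_2 = math.dist(a["O"], b["N"]) <= HBOND_MAX  # HisA carbonyl O ... HisB amide N
    low, high = NTAU_RANGE
    preorganized = low <= math.dist(a["NE2"], b["NE2"]) <= high
    return hbond_1 and hbond_2 and preorganized


def classify_sites(residues):
    """Return (HisA number, HisB number, label) for every 2His site in one model."""
    histidines = [r for r in residues if r["name"] == "HIS"]
    results = []
    for his_a, his_b in combinations(histidines, 2):
        if not is_2his_site(his_a, his_b):
            continue
        # Crude stand-in for the (absent) metal: midpoint of the two NE2 atoms.
        metal = tuple((p + q) / 2 for p, q in zip(his_a["atoms"]["NE2"], his_b["atoms"]["NE2"]))
        has_carboxylate = any(
            math.dist(res["atoms"][atom], metal) <= CARBOXYLATE_MAX
            for res in residues
            if res["name"] in CARBOXYLATE_ATOMS
            for atom in CARBOXYLATE_ATOMS[res["name"]]
            if atom in res["atoms"]
        )
        label = "2His-1Asp/Glu" if has_carboxylate else "2His only (halogenase candidate)"
        results.append((his_a["num"], his_b["num"], label))
    return results


def his(num, n, o, ne2):
    """Hypothetical helper: a His residue with only the atoms the checks need."""
    return {"name": "HIS", "num": num, "atoms": {"N": n, "O": o, "NE2": ne2}}


# Toy coordinates chosen to satisfy the constraints; not taken from a real model.
toy = [
    his(10, (0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 1.45, 5.0)),
    his(60, (3.0, 2.9, 0.0), (0.0, 2.9, 0.0), (3.0, 1.45, 5.0)),
    {"name": "ASP", "num": 12, "atoms": {"OD1": (1.5, 1.45, 7.0), "OD2": (2.5, 1.45, 7.5)}},
]
print(classify_sites(toy))      # the 10/60 pair is labelled 2His-1Asp/Glu
print(classify_sites(toy[:2]))  # same pair without the Asp: halogenase candidate
```

Because each model is processed independently, the cost grows linearly with the number of structures; only the His pairs within a single model are enumerated.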
The sequence-similarity network (SSN) of these predicted radical halogenases includes all previously known and experimentally verified radical halogenases5,25,35,36 (Fig. 1c), which serves as an initial positive validation of our methodology. This SSN captures radical halogenase clusters (greater than 30% sequence identity) with much higher coverage than what can be achieved using sequence-based methods alone, as illustrated by dechloroacutamine halogenase (DAH), a rare case of a eukaryotic halogenase37. The previously reported sequence homology searches identified only a single DAH homologue37, consistent with our own BLAST search coupled with multiple sequence alignment, which identified only three homologues (Supplementary Fig. 1). By contrast, our metal-coordination mining approach defined a DAH cluster comprising 40 putative radical halogenases from both eukaryotic (fungal and plant) and bacterial origins (Fig. 1c). More broadly, the SSN for the complete bioinformatic atlas of radical halogenases contains 70 clusters that seem to be completely unexplored, with no previous radical halogenase annotation, and which span a vast taxonomic and phylogenetic protein sequence space, including several new families of eukaryotic halogenases (Fig. 1c and Extended Data Fig. 3). This expanded and refined bioinformatic atlas of radical halogenases (Supplementary Data 1) provides a valuable resource for the wider scientific community that can enable novel chemical transformations to be investigated in unique biological contexts and can accelerate the discovery of new biocatalysts with existing experimental high-throughput pipelines38.
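Cluster membership in an SSN at a fixed identity cut-off amounts to finding connected components in a graph. The sketch below assumes that pairwise percent identities have already been computed (for example, from all-against-all BLAST hits); the sequence IDs and identity values are hypothetical. Dedicated SSN tools typically threshold on alignment scores or E-values instead, so this illustrates the clustering logic only.

```python
def ssn_clusters(ids, identities, threshold=30.0):
    """Connected components of the graph with an edge wherever % identity > threshold."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), pid in identities.items():
        if pid > threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two components

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    # Largest clusters first, ties broken alphabetically for a stable order.
    return sorted(clusters.values(), key=lambda c: (-len(c), c))


# Hypothetical IDs and identities, for illustration only.
ids = ["hal1", "hal2", "hal3", "hyd1"]
identities = {("hal1", "hal2"): 45.0, ("hal2", "hal3"): 32.0, ("hal1", "hyd1"): 22.0}
print(ssn_clusters(ids, identities))        # >30% cut-off
print(ssn_clusters(ids, identities, 50.0))  # a more granular >50% cut-off
```

Building the pairwise identity table is the quadratic step; in this workflow it only needs to cover the motif-positive candidates (n = 946) rather than the full superfamily.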
Discovery of new radical halogenases
Motivated to further validate our metalloenzyme discovery platform and apply it towards discovering FeII/αKG-dependent halogenases, we selected a new family from our SSN (cluster X) for experimental validation (Fig. 1c). Cluster X was intriguing because of the diverse genomic context of its putative radical halogenase members; some were co-clustered with acyl-carrier protein (ACP)-related genes, indicating that they could be ACP-dependent radical halogenases, whereas others lacked these neighbouring genes, suggesting that they use free-standing substrates (Supplementary Fig. 2). Sequence enrichment of cluster X with all phylogenetically related sequences through BLAST (E-value = 5), followed by generation of a more granular SSN (greater than 50% sequence identity)39, reveals many new subfamilies of radical halogenases (Fig. 1d). This expanded SSN shows a mixed distribution of halogenases and their non-halogenase counterparts, suggesting complex evolutionary trajectories that mirror the broader empirical trend across the FeII/αKG-dependent family. This complex evolution poses a considerable challenge for sequence-based enzyme discovery, because the three residues of the 2His-1Ala/Gly motif occupy varying positions in the 1D sequence; in the 3D search space, however, these issues are rapidly resolved19.
An analysis of the genome neighbourhoods of radical halogenase candidates in cluster X revealed subfamilies in which the genes of their ACP-independent radical halogenase members are adjacent to those encoding amino acid transporters and amino-acid-modifying enzymes (for example, ATP-grasp ligases). On the basis of this observation, we hypothesized that some of these radical halogenases act on free amino acids. A putative radical halogenase from the marine pathogen Vibrio campbellii, AspX, was selected from one of these subclusters for biochemical validation (Fig. 2a and Supplementary Table 3). The in vitro reaction of AspX with a mixture of the 20 canonical l-amino acids analysed by liquid chromatography–quadrupole time-of-flight mass spectrometry (LC–QTOF) shows that AspX selectively chlorinates l-aspartic acid but does not act on any of the other amino acids (Fig. 2b and Supplementary Fig. 3). Nuclear magnetic resonance (NMR) characterization of the isolated AspX product (Supplementary Fig. 4) and steady-state kinetics show that AspX converts l-aspartic acid to 3S-chloro-l-aspartic acid (catalytic constant (kcat) = 33.3 ± 0.5 min−1, Michaelis constant (Km) = 0.64 ± 0.02 mM, turnover number = 780 ± 107; Extended Data Fig. 4). AspX can also transfer alternative anions (Br− and N3−) to produce bromo- and azido-l-aspartic acid derivatives (Extended Data Fig. 5a). The discovery of AspX extends the previously reported scope of free amino acid halogenases, which act on positively charged (l-lysine and l-ornithine) and non-polar (l-leucine, l-isoleucine and l-norleucine) amino acids5, to include the first halogenase acting on a negatively charged amino acid (l-aspartic acid).
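The genome-neighbourhood reasoning used for cluster X (ACP-related genes nearby versus amino acid transporters and ATP-grasp ligases nearby) can be approximated by keyword matching over the annotations within a window around the candidate gene. The window size, the keyword lists, the labels and the input format (annotation strings in contig order) are all hypothetical; real analyses usually rely on curated domain annotations rather than free-text matching.

```python
# Hypothetical keyword lists; real analyses usually rely on curated domain annotations.
CONTEXT_KEYWORDS = {
    "acp": ("acyl carrier protein",),
    "amino_acid": ("amino acid transporter", "amino acid permease", "atp-grasp"),
}


def genome_context(genes, focus, window=5):
    """Flag context types among genes within `window` positions of the candidate.

    genes: annotation strings in contig order; focus: index of the candidate gene.
    """
    neighbours = [g.lower() for i, g in enumerate(genes)
                  if i != focus and abs(i - focus) <= window]
    return {tag: any(k in g for g in neighbours for k in keywords)
            for tag, keywords in CONTEXT_KEYWORDS.items()}


def substrate_hypothesis(context):
    """Turn the context flags into a tentative substrate-mode hypothesis."""
    if context["acp"]:
        return "ACP-dependent"
    if context["amino_acid"]:
        return "free amino acid"
    return "free-standing, unassigned"


contig_a = ["thioesterase", "acyl carrier protein", "halogenase candidate", "ABC transporter"]
contig_b = ["amino acid transporter", "ATP-grasp ligase", "halogenase candidate"]
for contig in (contig_a, contig_b):
    ctx = genome_context(contig, focus=2)
    print(ctx, "->", substrate_hypothesis(ctx))
```

Such flags only generate hypotheses; as with AspX and BtnX, the substrate still has to be confirmed biochemically.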
Another subfamily of cluster X is located at the junction of ACP-dependent and free-standing halogenases (Fig. 1d). Although its members share some sequence similarity with the known ACP-dependent halogenase CurA (around 25–30% sequence identity)40, many of their associated biosynthetic gene clusters lack genes associated with ACP-dependent halogenases, raising the possibility that they accept alternative, and possibly large, substrates (Supplementary Fig. 2). A sequence homology search against the published literature (PaperBLAST41) for members of this subfamily revealed a possible biological role for one of its putative radical halogenase members, BtnX, which is found on the ‘killer plasmid’ of the marine bacterium Dinoroseobacter shibae42 (Fig. 2c). D. shibae exists in a symbiotic relationship with the dinoflagellate alga Prorocentrum minimum, in which the bacterium provides vitamins and other micronutrients to the algal host in return for energy, carbon and nitrogen. These bacterial–algal symbionts grow to form red toxic algal blooms that can expand to large geobiological scales and exert major negative effects on entire ecosystems42. The presence of the killer plasmid has been directly implicated in turning on a pathogenic phase in D. shibae, in which the bacterium kills its algal host through micronutrient starvation and adopts a metabolic strategy of feeding off algal senescence43. Although the biochemical and molecular origins of this biological phenomenon remain unclear, gene knockout experiments indicate that Dshi_3684, the gene that encodes BtnX, is positively associated with bacterial pathogenicity42. Notably, Dshi_3684 is co-localized with genes that encode a putative energy-coupling factor (ECF) biotin transporter complex (Fig. 2c). The combined evidence led us to hypothesize that BtnX is a radical halogenase acting on biotin, also known as vitamin B7. The in vitro reaction of BtnX with biotin (Fig. 2d and Supplementary Table 3) and product characterization by NMR (Supplementary Fig. 5) show that BtnX converts biotin to 2R-chlorobiotin. Steady-state kinetics reveal that BtnX has a strong binding affinity for biotin (kcat = 0.96 ± 0.02 min−1, Km < 2 μM, turnover number = 22 ± 6; Extended Data Fig. 4b), consistent with the low levels of biotin in marine environments44 (from less than 1.2 pM to 0.7 nM). The stereoselectivity of the BtnX chlorination reaction was confirmed by the X-ray crystal structure of product-bound BtnX obtained after aerobic turnover (Protein Data Bank (PDB) ID: 9Q04; Supplementary Table 4 and Extended Data Fig. 6). Similarly to AspX and other known radical halogenases5,26, BtnX can transfer alternative anions (Br− and N3−) to produce bromo- and azido-biotin derivatives (Extended Data Fig. 5b). Collectively, on the basis of its strong binding affinity as well as the genomic and biological context, we propose that biotin is the native substrate of BtnX. The discovery and characterization of BtnX extends the known substrate range of radical halogenases beyond previously identified classes to include vitamins.
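A quick way to compare the two enzymes is the specificity constant kcat/Km. The sketch below converts the reported steady-state values to M⁻¹ s⁻¹. Because only an upper bound (2 μM) is reported for the Km of BtnX, and a smaller Km raises kcat/Km, the BtnX figure is a lower bound. These numbers are simple arithmetic on the reported values, not quantities stated in the study.

```python
def kcat_over_km(kcat_per_min, km_molar):
    """Specificity constant kcat/Km in M^-1 s^-1 (kcat given per minute, Km in mol/L)."""
    return (kcat_per_min / 60.0) / km_molar


aspx = kcat_over_km(33.3, 0.64e-3)    # AspX: kcat = 33.3 min^-1, Km = 0.64 mM
btnx_min = kcat_over_km(0.96, 2e-6)   # BtnX: kcat = 0.96 min^-1, Km < 2 uM -> lower bound
print(f"AspX kcat/Km ~ {aspx:.3g} M^-1 s^-1")
print(f"BtnX kcat/Km > {btnx_min:.3g} M^-1 s^-1")
print(f"ratio (lower bound): {btnx_min / aspx:.3g}")
```

Despite its much lower kcat, BtnX therefore reaches at least the same order of specificity constant as AspX, driven entirely by its low Km.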
A versatile radical halogenase
Whereas AspX and all other previously known radical halogenases exhibit relatively narrow substrate scopes that are often limited to their respective native substrates5,36,45, BtnX seems to be a highly promiscuous radical halogenase, catalysing the chlorination of a wide range of non-native substrates (Fig. 3). Structure–activity studies reveal that the presence of a propionate head group is the key requirement for substrate recognition and halogenation by BtnX, whereas chemical and structural variation on the substrate ‘tail’ is highly tolerated (Supplementary Fig. 6). Therefore, BtnX can enable access to a wide range of regio- and stereoselectively functionalized molecules, from free and functionalized acids (including bile acids and fluorescent dyes) to amino acids, peptoids and peptides. These transformations are relevant to both the early-stage functionalization of simple molecules to produce α-halo acid precursors and the late-stage derivatization of pharmaceuticals, as demonstrated by the enzymatic halogenation of RLA8, a drug candidate for the treatment of nonalcoholic steatohepatitis and fibrosis46.
To identify the molecular basis for this promiscuity, we solved the anaerobic crystal structure of BtnX bound to biotin at an early stage of its catalytic cycle. The crystal structure of BtnX (PDB ID: 9PV1; Supplementary Table 5) was solved at 1.20 Å resolution in the presence of its native ligands (biotin, αKG and chloride), with FeII added under anaerobic conditions. The asymmetric unit contains a BtnX dimer in which the active site of chain A contains αKG and biotin and the active site of chain B contains FeII, αKG, Cl− and biotin (Fig. 4a, Extended Data Fig. 7 and Supplementary Fig. 7). This structure reveals two features that underlie the ability of BtnX to accept such varied substrates. First, the carboxylate head group of biotin seems to form specific interactions within the substrate cavity, through multiple H-bonds with active-site residues (Ser120, Asn119 and Arg248) that position the pro-2R H atom in close proximity to the metal site for stereoselective C–H functionalization (Fig. 4a). Indeed, perturbation of the H-bonding network around Ser120 by site-directed mutagenesis decreased both enzymatic reactivity and chemoselectivity (Extended Data Fig. 8). Second, the rest of the biotin substrate is located in an extended, solvent-exposed protein channel and forms mostly non-specific hydrophobic interactions (Fig. 4b and Supplementary Fig. 8e). Together, the combination of specific H-bonding interactions surrounding the target C–H site with a solvent-exposed channel forms the molecular basis of the substrate promiscuity of BtnX.
This protein architecture of BtnX is distinct from that of other radical halogenases, which typically contain buried active-site pockets and make multiple specific interactions across the entire substrate5,47 (Supplementary Fig. 8). The BtnX subcluster exists at a distinct phylogenetic branch point between ACP-dependent radical halogenases (CurA)40 and radical halogenases that act on free-standing small molecules (AspX; Fig. 4c). Structural comparison reveals that BtnX, unlike CurA and AspX, has a solvent-exposed channel that enables its promiscuous reactivity (Fig. 4d and Supplementary Fig. 8). By contrast, the substrate scope of the more typical members of the radical halogenase family has been more difficult to alter, given the tight geometric constraints enforced by the halogenation mechanism48. As such, BtnX could represent an unusually evolvable radical halogenase that is amenable to laboratory evolution for engineered applications, as well as serving as a template for the discovery of other promiscuous members across the broader FeII/αKG-dependent non-haem iron enzyme family.
The unique promiscuity of BtnX makes it a promising tool for applications in chemoenzymatic synthesis and biocatalysis4. Beyond the expanded substrate scope for chlorination, the chemical space accessible to BtnX extends further because it can also transfer alternative anions (Br− and N3−); brominated products are valuable precursors for simple downstream modifications, such as amination reactions to produce a wide range of non-canonical amino acids. In addition to the biocatalytic utility of wild-type BtnX, the introduction of G117D or G117E point mutations converts it into a hydroxylase for the production of α-hydroxy acids (Extended Data Fig. 9).
Conclusion
Although enzymes are universally recognized for their high selectivity as catalysts, their evolution is thought to be driven by widespread promiscuity with respect to substrate selectivity and reaction outcomes. This functional divergence is often controlled by a small number of specific residues near the catalytic centre, and cannot be captured by clustering methods based on total sequence similarity. As such, targeted enzyme discovery remains a long-standing challenge, despite its essential role in uncovering new biosynthetic potential in nature as well as in developing tools for biocatalysis and biotechnology. The emergence of accurate protein structure prediction algorithms, such as AF2, has generated large protein structure databases that now allow the use of specific 3D information to search and annotate enzymatic function16,17. Here we show how fundamental mechanistic insights into enzyme function can be distilled into precise atomic-level constraints that describe functionally conserved physical features for mining the AF2 Protein Structure Database.
Despite the absence of metals and other cofactors in the AF2 structure database and in other related databases, the approach presented here uses the intrinsic preorganization of enzyme active sites for metal binding to mine predicted structures of metalloproteins in their fully apo forms. More broadly, metal-coordination-based mining provides a powerful route to uncover novel or rare enzymatic functions that have evolved through complex phylogenetic trajectories within large and diverse enzyme superfamilies. This capability enables the systematic discovery of uncommon catalytic activities that are difficult to access through conventional experimental approaches alone. In the case of FeII/αKG-dependent radical halogenases, six enzyme families have been discovered over the past two decades through natural products research, whereas our active-site mapping approach rapidly identified 70 previously unrecognized families of putative halogenases. Enzymatic halogenation of unactivated C(sp³)–H sites through radical chemistry is relatively rare in biology: in addition to the FeII/αKG-dependent halogenases, only two other classes of enzyme are known to catalyse such transformations, the binuclear iron49 and the binuclear copper50 halogenases (Extended Data Fig. 2).
Beyond targeted enzyme discovery, our approach makes it possible to uncover new metal-binding motifs and active sites using a fundamental chemical understanding of metal–protein interactions. To demonstrate that potential, we have extended our targeted discovery methodology to examine the diversity and distribution of 2His-Xn metal sites within the cupin superfold, which allows us to uncover both known and new types of putative biological metal site, with a wide range of coordination environments (Extended Data Fig. 10 and Supplementary Table 6). Notably, this methodology is scalable, owing to its algorithmic efficiency that does not require pairwise comparisons, and is thus ideal for managing the current explosion of sequence and structural information. Beyond enzymes, this approach can be extended to any protein with a distinct set of functionally important atomic interactions, such as those involved in recognition or signal transduction, bringing us one step closer to understanding sequence–structure–function relationships at large scale.
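Extending the search to 2His-Xn sites ultimately reduces to tallying which additional ligands sit at each detected 2His site. The sketch below assumes those ligands have already been extracted (for instance with a distance cut-off around an estimated metal position); the input format and the naming scheme for environments are hypothetical, and backbone and water ligands are ignored.

```python
from collections import Counter


def environment_label(extra_ligands):
    """Name a site from the two motif His plus any additional protein ligands."""
    counts = Counter(extra_ligands)
    n_his = 2 + counts.pop("His", 0)  # extra His ligands join the His count
    parts = [f"{n_his}His"] + [f"{n}{res}" for res, n in sorted(counts.items())]
    return "-".join(parts)


def census(sites):
    """Tally coordination environments; each site is a list of extra-ligand residue names."""
    return Counter(environment_label(site) for site in sites)


# Hypothetical input: extra ligands already extracted near each 2His site.
sites = [["Asp"], ["Glu"], [], ["Asp"], ["His", "Glu"]]
for label, count in census(sites).most_common():
    print(label, count)
```

Folding an extra His into the His count lets the same tally report both the canonical 2His-1Asp/Glu facial triad and the 3His-1Glu cupin motif without special cases.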
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Accession codes and sequences for AspX (UniProt ID A0AAC9SM19), BtnX (UniProt ID A8LT50) and all other proteins analysed in this study are provided in Supplementary Table 3. The structures of BtnX under anaerobic conditions in the presence of biotin, αKG, chloride and FeII (PDB ID 9PV1) and after aerobic turnover (PDB ID 9Q04) have been deposited in the Protein Data Bank (PDB). Pfam IDs used in the bioinformatic analysis are listed in Supplementary Table 2. All protein sequence and structure data were obtained from the open-access InterPro database and AlphaFold Protein Structure Database. Source data are provided with this paper.
Code availability
The complete Python (v.3.9.6) code used for the bioinformatic analyses described in this study is available as a Jupyter notebook via Zenodo at https://doi.org/10.5281/zenodo.19737459 (ref. 34). The core functionality for identifying and annotating 2His and 2His-1Asp/Glu metal-binding motifs is also provided as a stand-alone Python package, which is freely available at GitHub (https://github.com/yannikipouros/hal-discovery). The repository includes installation instructions, usage documentation and a demo dataset to facilitate reproducibility and reuse of the workflow.
References
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Krishna, R. et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science 384, eadl2528 (2024).
Devine, P. N. et al. Extending the application of biocatalysis to meet the challenges of drug development. Nat. Rev. Chem. 2, 409–421 (2018).
Kissman, E. N. et al. Expanding chemistry through in vitro and in vivo biocatalysis. Nature 631, 37–48 (2024).
Neugebauer, M. E. et al. A family of radical halogenases for the engineering of amino-acid-based products. Nat. Chem. Biol. 15, 1009–1016 (2019).
Khersonsky, O., Roodveldt, C. & Tawfik, D. Enzyme promiscuity: evolutionary and mechanistic aspects. Curr. Opin. Chem. Biol. 10, 498–508 (2006).
Gerlt, J. A. & Babbitt, P. C. Divergent evolution of enzymatic function: mechanistically diverse superfamilies and functionally distinct suprafamilies. Annu. Rev. Biochem. 70, 209–246 (2001).
Aharoni, A. et al. The ‘evolvability’ of promiscuous protein functions. Nat. Genet. 37, 73–76 (2005).
Lee, D., Redfern, O. & Orengo, C. Predicting protein function from sequence and structure. Nat. Rev. Mol. Cell Biol. 8, 995–1005 (2007).
Yu, T. et al. Enzyme function prediction using contrastive learning. Science 379, 1358–1363 (2023).
Altschul, S. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
Pál, C., Papp, B. & Lercher, M. J. An integrated view of protein evolution. Nat. Rev. Genet. 7, 337–348 (2006).
Tokuriki, N. & Tawfik, D. S. Protein dynamism and evolvability. Science 324, 203–207 (2009).
Davidi, D., Longo, L. M., Jabłońska, J., Milo, R. & Tawfik, D. S. A bird’s-eye view of enzyme evolution: chemical, physicochemical, and physiological considerations. Chem. Rev. 118, 8786–8797 (2018).
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
Yoon, P. H. et al. Structure-guided discovery of ancestral CRISPR–Cas13 ribonucleases. Science 385, 538–543 (2024).
Nomburg, J. et al. Birth of protein folds and functions in the virome. Nature 633, 710–717 (2024).
Gaschignard, G. et al. AlphaFold2-guided description of CoBaHMA, a novel family of bacterial domains within the heavy-metal-associated superfamily. Proteins 92, 776–794 (2024).
Babbitt, P. C. et al. A functionally diverse enzyme superfamily that abstracts the α protons of carboxylic acids. Science 267, 1159–1161 (1995).
Waldron, K. J. & Robinson, N. J. How do bacterial cells ensure that metalloproteins get the correct metal?. Nat. Rev. Microbiol. 7, 25–35 (2009).
Hekkelman, M. L., De Vries, I., Joosten, R. P. & Perrakis, A. AlphaFill: enriching AlphaFold models with ligands and cofactors. Nat. Methods 20, 205–213 (2023).
Cheng, Y. et al. Co-evolution-based prediction of metal-binding sites in proteomes by machine learning. Nat. Chem. Biol. 19, 548–555 (2023).
Dürr, S. L., Levy, A. & Rothlisberger, U. Metal3D: a general deep learning framework for accurate metal ion location prediction in proteins. Nat. Commun. 14, 2713 (2023).
Laveglia, V., Bazayeva, M., Andreini, C. & Rosato, A. Hunting down zinc(II)-binding sites in proteins with distance matrices. Bioinformatics 39, btad653 (2023).
Vaillancourt, F. H., Yeh, E., Vosburg, D. A., O’Connor, S. E. & Walsh, C. T. Cryptic chlorination by a non-haem iron enzyme during cyclopropyl amino acid biosynthesis. Nature 436, 1191–1194 (2005).
Matthews, M. L. et al. Direct nitration and azidation of aliphatic carbons by an iron-dependent halogenase. Nat. Chem. Biol. 10, 209–215 (2014).
Gomez, C. A., Mondal, D., Du, Q., Chan, N. & Lewis, J. C. Directed evolution of an iron(II)- and α-ketoglutarate-dependent dioxygenase for site-selective azidation of unactivated aliphatic C−H bonds. Angew. Chem. 135, e202301370 (2023).
Dunwell, J. M., Culham, A., Carter, C. E., Sosa-Aguirre, C. R. & Goodenough, P. W. Evolution of functional diversity in the cupin superfamily. Trends Biochem. Sci. 26, 740–746 (2001).
Dunwell, J. M., Purvis, A. & Khuri, S. Cupins: the most functionally diverse protein superfamily? Phytochemistry 65, 7–17 (2004).
Galperin, M. Y. & Koonin, E. V. Divergence and convergence in enzyme evolution. J. Biol. Chem. 287, 21–28 (2012).
Iyer, L. M., Abhiman, S., De Souza, R. F. & Aravind, L. Origin and evolution of peptide-modifying dioxygenases and identification of the wybutosine hydroxylase/hydroperoxidase. Nucleic Acids Res. 38, 5261–5279 (2010).
Uberto, R. & Moomaw, E. W. Protein similarity networks reveal relationships among sequence, structure, and function within the cupin superfamily. PLoS One 8, e74477 (2013).
Krebs, C., Galonić Fujimori, D., Walsh, C. T. & Bollinger, J. M. Non-heme Fe(IV)–oxo intermediates. Acc. Chem. Res. 40, 484–492 (2007).
Kipouros, I. & Chang, M. yannikipouros/hal-discovery: Metal-coordination mining pipeline for radical halogenases (v1.0). Zenodo https://doi.org/10.5281/zenodo.19737459 (2026).
Hillwig, M. L. & Liu, X. A new family of iron-dependent halogenases acts on freestanding substrates. Nat. Chem. Biol. 10, 921–923 (2014).
Zhao, C. et al. An Fe2+- and α-ketoglutarate-dependent halogenase acts on nucleotide substrates. Angew. Chem. Int. Ed. 59, 9478–9484 (2020).
Kim, C. Y. et al. The chloroalkaloid (−)-acutumine is biosynthesized via a Fe(II)- and 2-oxoglutarate-dependent halogenase in Menispermaceae plants. Nat. Commun. 11, 1867 (2020).
Glasser, N. R., Cui, D., Risser, D. D., Okafor, C. D. & Balskus, E. P. Accelerating the discovery of alkyl halide-derived natural products using halide depletion. Nat. Chem. 16, 173–182 (2024).
Zallot, R., Oberg, N. & Gerlt, J. A. The EFI web resource for genomic enzymology tools: leveraging protein, genome, and metagenome databases to discover novel enzymes and metabolic pathways. Biochemistry 58, 4169–4182 (2019).
Khare, D. et al. Conformational switch triggered by α-ketoglutarate in a halogenase of curacin A biosynthesis. Proc. Natl Acad. Sci. USA 107, 14099–14104 (2010).
Price, M. N. & Arkin, A. P. PaperBLAST: text mining papers for information about homologs. mSystems 2, e00039-17 (2017).
Mansky, J. et al. The influence of genes on the “killer plasmid” of Dinoroseobacter shibae on its symbiosis with the dinoflagellate Prorocentrum minimum. Front. Microbiol. 12, 804767 (2022).
Wang, H. et al. Identification of genetic modules mediating the Jekyll and Hyde interaction of Dinoroseobacter shibae with the dinoflagellate Prorocentrum minimum. Front. Microbiol. 6, 1262 (2015).
Wienhausen, G. et al. The overlooked role of a biotin precursor for marine bacteria—desthiobiotin as an escape route for biotin auxotrophy. ISME J. 16, 2599–2609 (2022).
Matthews, M. L. et al. Substrate positioning controls the partition between halogenation and hydroxylation in the aliphatic halogenase, SyrB2. Proc. Natl Acad. Sci. USA 106, 17723–17728 (2009).
Li, M. H. et al. RLA8—a new and highly effective quadruple PPAR-α/γ/δ and GPR40 agonist to reverse nonalcoholic steatohepatitis and fibrosis. J. Pharmacol. Exp. Ther. 369, 67–77 (2019).
Mitchell, A. J. et al. Structural basis for halogenation by iron- and 2-oxo-glutarate-dependent enzyme WelO5. Nat. Chem. Biol. 12, 636–640 (2016).
Büchler, J. et al. Algorithm-aided engineering of aliphatic halogenase WelO5* for the asymmetric late-stage functionalization of soraphens. Nat. Commun. 13, 371 (2022).
Nakamura, H., Schultz, E. E. & Balskus, E. P. A new strategy for aromatic ring alkylation in cylindrocyclophane biosynthesis. Nat. Chem. Biol. 13, 916–921 (2017).
Chiang, C.-Y. et al. Copper-dependent halogenase catalyses unactivated C−H bond functionalization. Nature 638, 126–132 (2025).
Acknowledgements
M.C.Y.C. acknowledges support from the National Institutes of Health (NIH; R01 GM13427 and R35 GM161243) and Princeton University. I.K. acknowledges support from the Miller Institute for Basic Research in Science at the University of California, Berkeley. We thank P. Jeffrey (NSLS-II beamline, Brookhaven National Laboratory) and E. N. Kissman (ALS beamline 8.3.1, Lawrence Berkeley National Laboratory) for assistance with X-ray diffraction data collection, and I. Pelczer and K. Conover for assistance with NMR characterization.
Funding
This work was funded by the NIH (R01 GM13427 and R35 GM161243), Princeton University and the Miller Institute for Basic Research in Science at the University of California, Berkeley.
Author information
Authors and Affiliations
Department of Chemistry, University of California, Berkeley, Berkeley, CA, USA
Ioannis Kipouros
Miller Institute for Basic Research in Science, University of California, Berkeley, Berkeley, CA, USA
Ioannis Kipouros
Department of Chemistry, Princeton University, Princeton, NJ, USA
Ioannis Kipouros & Michelle C. Y. Chang
Authors
Ioannis Kipouros
Michelle C. Y. Chang
Contributions
I.K. conceptualized the study, designed experiments, collected and analysed data and wrote the manuscript. M.C.Y.C. conceptualized the study, designed experiments, analysed data, supervised the project and wrote the manuscript.
Corresponding authors
Correspondence to Ioannis Kipouros or Michelle C. Y. Chang.
Ethics declarations
Competing interests
A patent application related to biocatalytic applications of BtnX has been filed by Princeton University, on which I.K. and M.C.Y.C. are inventors.
Peer review
Peer review information
Nature thanks Sandra Castillo, Yi Tang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Consensus catalytic mechanism and reactivity scope of FeII/αKG-dependent enzymes.
The consensus mechanism of FeII/αKG-dependent enzymes involves O2 binding and activation to form a reactive high-spin (S = 2) FeIV-oxo intermediate that enables a diverse range of chemical transformations. Most enzymes in this family use the 2His-1Asp/Glu facial triad motif (blue box). However, in radical halogenases, the Asp/Glu ligand is replaced by a non-coordinating residue (Ala/Gly) that enables anion coordination (X = Cl), a key requirement for halogenation (red box).
Extended Data Fig. 2 Representative members of FeII/αKG-dependent and other types of radical halogenases.
FeII/αKG-dependent radical halogenases from the previously characterized families act either on substrates tethered to acyl carrier proteins within polyketide synthase (PKS), nonribosomal peptide synthetase (NRPS) or hybrid PKS/NRPS assemblies, or on free-standing substrates. The binuclear Fe halogenase CylC and the binuclear Cu halogenase ApnU are also proposed to use radical reaction mechanisms.
Extended Data Fig. 3 SSN of the bioinformatic atlas of radical halogenases.
The SSN for radical halogenases (Fig. 1c) is shown with the complete cluster numbering (1–68) and with taxonomic annotations (superkingdom and phylum) for each node.
Extended Data Fig. 4 Steady-state kinetic characterization of AspX and BtnX.
a, The NADH-coupled assay used for measuring the steady-state kinetics of FeII/αKG-dependent enzymes by monitoring succinate formation (CoA, coenzyme A; SCS, succinyl-CoA synthetase; PEP, phosphoenolpyruvate; PK, pyruvate kinase; LDH, lactate dehydrogenase). b, Michaelis–Menten analysis for the reaction of AspX with l-aspartic acid ([AspX]o = 5.0 μM, pH = 7.5, 25 °C). Data points (black open circles) represent mean ± s.d. (n = 3 technical replicates, red open circles). kcat, Km and kcat/Km were calculated by non-linear curve fitting to the Michaelis–Menten equation. Error in kcat/Km is obtained by propagation from the individual kinetic terms. c, The Vmax dependence on [BtnX]o for the reaction of BtnX with biotin ([biotin]o = 0.2 mM ≫ Km, pH = 7.5, 25 °C) allows for the estimation of its kcat parameter (<2 μM). Note that, because of its low Km value, the reaction of BtnX with biotin does not exhibit the typical Michaelis–Menten saturation curve within the [BtnX]o regime required for the kinetic assay. Data points (black open circles) represent mean ± s.d. (n = 3 technical replicates, red open circles). d, Michaelis–Menten analysis for the reaction of BtnX with octanoic acid ([BtnX]o = 2.5 μM, pH = 7.5, 25 °C). The greater-than-tenfold increase in the binding affinity of BtnX for biotin over octanoic acid implicates the ureido ring of biotin in substrate–protein interactions, which is consistent with the substrate-bound structure (Supplementary Fig. 8e) and with our proposal that biotin is the native substrate of BtnX. kcat, Km and kcat/Km were calculated by non-linear curve fitting to the Michaelis–Menten equation (n = 2 technical replicates, red circles). Error in kcat/Km is obtained by propagation from the individual kinetic terms.
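The kinetic workflow in this figure (NADH-coupled rates, a non-linear Michaelis–Menten fit and propagated error in kcat/Km) can be sketched in plain Python. This is a minimal illustration, not the authors' analysis code; it assumes one NADH oxidized per succinate through the SCS/PK/LDH coupling, the standard NADH molar absorptivity at 340 nm (6,220 M⁻¹ cm⁻¹) and a 1 cm path length. The function names, the Gauss–Newton fitting routine and the first-order propagation formula (which ignores the kcat–Km covariance) are illustrative choices.

```python
import math

EPS_NADH_340 = 6220.0  # M^-1 cm^-1, molar absorptivity of NADH at 340 nm
PATH_LENGTH_CM = 1.0   # assumption: 1 cm optical path


def turnover_from_a340(slope_per_min, enzyme_uM):
    """Convert an A340 slope (per min) into a per-enzyme rate in s^-1.

    Assumes one NADH is oxidized per succinate formed (SCS -> PK -> LDH).
    The sign of the slope is ignored because NADH absorbance decreases.
    """
    rate_M_per_s = abs(slope_per_min) / (EPS_NADH_340 * PATH_LENGTH_CM) / 60.0
    return rate_M_per_s / (enzyme_uM * 1e-6)


def _normal_terms(s, v, kcat, km):
    """Accumulate J^T J, J^T r and the residual sum of squares."""
    a11 = a12 = a22 = g1 = g2 = ssr = 0.0
    for si, vi in zip(s, v):
        d = km + si
        j1 = si / d               # dv/dkcat
        j2 = -kcat * si / d ** 2  # dv/dKm
        r = vi - kcat * si / d    # residual
        a11 += j1 * j1
        a12 += j1 * j2
        a22 += j2 * j2
        g1 += j1 * r
        g2 += j2 * r
        ssr += r * r
    return a11, a12, a22, g1, g2, ssr


def fit_michaelis_menten(s, v, kcat0, km0, max_iter=100, tol=1e-12):
    """Gauss-Newton least-squares fit of v = kcat*S/(Km + S).

    Returns (kcat, sd_kcat, Km, sd_Km); standard errors come from
    s^2 (J^T J)^-1 at the solution. Needs at least three points.
    """
    kcat, km = kcat0, km0
    for _ in range(max_iter):
        a11, a12, a22, g1, g2, _ = _normal_terms(s, v, kcat, km)
        det = a11 * a22 - a12 * a12
        step_k = (a22 * g1 - a12 * g2) / det
        step_m = (a11 * g2 - a12 * g1) / det
        kcat, km = kcat + step_k, km + step_m
        if abs(step_k) < tol and abs(step_m) < tol:
            break
    a11, a12, a22, _, _, ssr = _normal_terms(s, v, kcat, km)
    det = a11 * a22 - a12 * a12
    resid_var = ssr / (len(s) - 2)
    return (kcat, math.sqrt(resid_var * a22 / det),
            km, math.sqrt(resid_var * a11 / det))


def kcat_over_km(kcat, sd_kcat, km, sd_km):
    """kcat/Km with first-order error propagation (covariance ignored)."""
    ratio = kcat / km
    rel = math.sqrt((sd_kcat / kcat) ** 2 + (sd_km / km) ** 2)
    return ratio, ratio * rel
```

A reasonable starting guess (for example, kcat0 near the largest observed rate and km0 near the substrate concentration at half that rate) keeps Gauss–Newton well behaved; a damped method such as Levenberg–Marquardt is the usual choice when the starting point may be poor.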
Extended Data Fig. 5 C–H functionalization with alternative anions by AspX and BtnX.
a, In addition to chlorination, AspX also catalyses the bromination and azidation of l-aspartic acid when chloride is replaced by bromide (100 mM) or azide (10 mM), respectively (2 h, room temperature). The reactions were derivatized with Fmoc-Cl and analysed by LC-QTOF in negative mode using a C18 column (Fmoc-aspartic acid, Fmoc-Asp, m/zcalc = 354.0983; Fmoc-bromoaspartic acid, Fmoc-Br-Asp, m/zcalc = 432.0088; Fmoc-azidoaspartic acid, Fmoc-N3-Asp, m/zcalc = 395.0997). b, In addition to chlorination, BtnX also catalyses the bromination and azidation of biotin when chloride is replaced by bromide (100 mM) or azide (10 mM), respectively (2 h, room temperature). The reactions were analysed by LC-QTOF in negative mode using a C18 column (biotin, m/zcalc = 243.0809; bromobiotin, Br-Biotin, m/zcalc = 320.9914; azidobiotin, N3-Biotin, m/zcalc = 284.0823). Chromatograms are individual traces extracted at the m/zcalc values and are representative of n = 3 technical replicates.
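The m/zcalc values quoted in these captions correspond to the monoisotopic masses of the singly deprotonated ions [M−H]⁻, and can be reproduced from elemental formulas. The sketch below is illustrative: the neutral formulas are written out by hand (each modification replaces one substrate hydrogen with Cl, Br, N3 or OH), the element table covers only the atoms needed here, and the helper names are hypothetical.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "S": 31.97207100,
    "Cl": 34.96885268,
    "Br": 78.9183371,
}
PROTON = 1.007276466812  # mass lost on deprotonation


def parse_formula(formula):
    """Parse a simple formula such as 'C10H15ClN2O3S' into element counts."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def mz_negative_mode(formula):
    """m/z of the singly deprotonated ion [M-H]-."""
    mass = sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())
    return mass - PROTON


# Neutral formulas of substrates and products (one H replaced per modification)
ANALYTES = {
    "Biotin": "C10H16N2O3S",
    "OH-Biotin": "C10H16N2O4S",
    "Cl-Biotin": "C10H15ClN2O3S",
    "Br-Biotin": "C10H15BrN2O3S",
    "N3-Biotin": "C10H15N5O3S",
    "Fmoc-Asp": "C19H17NO6",
    "Fmoc-Br-Asp": "C19H16BrNO6",
    "Fmoc-N3-Asp": "C19H16N4O6",
}

for name, formula in ANALYTES.items():
    print(f"{name:12s} {mz_negative_mode(formula):.4f}")
```

Rounded to four decimal places, the printed values match the m/zcalc values given in Extended Data Figs. 5, 8 and 9.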
Extended Data Fig. 6 The aerobic BtnX structure contains the bound product formed in the single-turnover in-crystallo reaction.
a, The aerobic BtnX structure (1.96 Å, PDB ID: 9PV1) forms a dimer, equivalent to that of the anaerobic BtnX structure (Fig. 4a) and with the same dimeric interface (Supplementary Fig. 7). b, Two neighbouring loops (residues 50–57 and residues 275–276, red) near the αKG-binding site are disordered in chain B (missing in the region outlined with dashed red lines), but not in chain A (residues shown in red). c,d, The active sites in each chain of this structure reveal crystallographic snapshots of product release intermediates: chain A contains the bound product (2R-chlorobiotin) and αKG (omit map, 2.0 σ), but lacks the Fe cofactor and the succinate co-product (c), and chain B contains Fe, succinate and a bound H2O ligand (omit map, 2.0 σ), but lacks the bound 2R-chlorobiotin product (d). e, Proposed mechanism for chlorination of biotin and product release from this crystal structure of BtnX. The starting structure (left) represents the anaerobic crystallographic intermediate (Fig. 4b). After formation of the putative product-bound intermediate (dashed box), exchange of ligands as part of product release can result in the aerobic crystallographic intermediates from chains A (blue box) and B (grey box).
Extended Data Fig. 7 Extended analysis of the anaerobic crystal structure of BtnX bound to FeII, αKG, chloride and biotin.
a, Representative BtnX crystal in a hanging drop (2.5 μl) before addition of FeII. b, The anomalous diffraction map (4.5 σ) in chain B is consistent with the presence and position of FeII, the chloride ligand and the sulfur atom in the biotin substrate. c, The single-wavelength anomalous scattering confirms the assignment of the bound metal as Fe. d, In chain B, the presence of a low-occupancy water ligand (w1, 16% occupancy) that is weakly bound to FeII (Fe–O = 2.3 Å, left) is consistent with the consensus model for radical halogenases, where binding of the primary substrate and the αKG co-substrate triggers the release of the water ligand, opening a metal-coordination site for O2 binding and activation (right). Owing to its low occupancy, the w1 ligand is omitted for clarity from the related main text and supplementary figures. e, Structural comparison of the BtnX active sites in chain A (biotin, αKG, FeII, and Cl; omit map 2.0 σ) versus chain B (biotin and αKG; omit maps in Fig. 4) reveals that binding of FeII and Cl to the BtnX/biotin/αKG complex leads to displacement of a structured water (w2) by the chloride ligand and conformational change of the bound αKG co-substrate.
Extended Data Fig. 8 Site-directed mutagenesis of BtnX to examine the contributions of active-site H-bonds in reactivity and chemoselectivity.
a, The active site of BtnX (chain B, PDB ID: 9PV1) shows an extended H-bonding network (dashed yellow lines) that involves the carboxylate of the bound biotin substrate, the αKG co-substrate, and nearby residues (Y64, Q119, S120, R248). The H-bond network between biotin/S120/Y64/R248 (shaded in blue) seems to be crucial for biotin binding and orientation of its α-C(sp3)-H-bonds. b, Site-directed mutagenesis of BtnX shows that the biotin/S120/Y64 H-bonding network is crucial for both reactivity and chemoselectivity (halogenation versus hydroxylation). In vitro reactions of wild-type (wt) and mutants of BtnX with biotin (5 mol% enzyme, 1 h, room temperature) were analysed by LC-QTOF (−, no enzyme control; Biotin, m/zcalc = 243.0809; OH-Biotin, m/zcalc = 259.0758; Cl-Biotin, m/zcalc = 277.0419). Data represent mean ± s.d. of technical replicates (n = 3). Statistical significance was assessed using two-sided unpaired Welch’s t-tests comparing each variant to wild type within each product channel (m/z), with Benjamini–Hochberg correction for multiple comparisons. Adjusted P values are reported.
Extended Data Fig. 9 Engineering BtnX into an α-carboxylic acid radical hydroxylase.
a, The active site of BtnX (chain B, PDB ID: 9PV1) shows that G117 creates space for chloride binding to the FeII centre. b, Mutation of G117 to Asp or Glu would introduce a carboxylate side chain that is expected to coordinate to the FeII centre and prevent chloride coordination. This defines a protein engineering framework for converting BtnX from a halogenase to a hydroxylase. c, Site-directed mutagenesis of BtnX shows that introduction of the G117D or G117E mutation fully suppresses halogenation while increasing hydroxylation by 40 ± 9-fold and 18 ± 6-fold, respectively. In vitro reactions of wild-type (wt) and mutants of BtnX with biotin (5 mol% enzyme, 1 h, room temperature) were analysed by LC-QTOF (−, no enzyme control; Biotin, m/zcalc = 243.0809; OH-Biotin, m/zcalc = 259.0758; Cl-Biotin, m/zcalc = 277.0419). Data represent mean ± s.d. of technical replicates (n = 3). Statistical significance was assessed using two-sided unpaired Welch’s t-tests comparing each variant to wild type within each product channel (m/z), with Benjamini–Hochberg correction for multiple comparisons. Adjusted P values are reported.
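Extended Data Figs. 8 and 9 compare each variant with wild type using two-sided Welch's t-tests and then apply Benjamini–Hochberg correction. The pure-Python sketch below shows the two computational pieces: the Welch statistic with its Welch–Satterthwaite degrees of freedom, and Benjamini–Hochberg adjusted P values. Turning t and its degrees of freedom into a two-sided P value requires the Student t distribution, which is omitted here; `scipy.stats.ttest_ind(a, b, equal_var=False)` is one standard way to obtain it. The function names are illustrative and are not taken from the authors' analysis.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With n = 3 replicates per group, the Welch degrees of freedom can fall well below 4, so the unequal-variance correction matters for the resulting P values.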
Extended Data Fig. 10 Diversity and distribution of 2His-Xn sites.
The SSN built from a representative sample (nSSN = 3,000; groups with >10 members from Supplementary Information Table 5) of all 2His metal-binding sites in the cupin superfold (ntotal = 538,160) shows a wide diversity and distribution of 2His-Xn metal sites with different numbers (n = 0, 1, 2, 3, 4) and identities (X = Asp, Glu, His, Asn, Gln, Cys, Met, Tyr or none) of additional ligands.
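To make the 2His-Xn nomenclature concrete, the toy sketch below labels a metal site from its coordinating residues: a canonical facial triad becomes "2His-1Asp", whereas a halogenase-like site with Ala/Gly in place of Asp/Glu becomes plain "2His". It is not the authors' mining pipeline (released separately on Zenodo); it assumes a metal position is already available, and the input format, the 2.8 Å cutoff and the function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical donor-atom-to-metal cutoff in angstroms (illustrative only)
COORD_CUTOFF_A = 2.8


def coordinating_residues(metal_xyz, donor_atoms, cutoff=COORD_CUTOFF_A):
    """Residue names with a donor atom within `cutoff` of the metal.

    donor_atoms: iterable of (residue_name, residue_number, (x, y, z)).
    Each residue is counted once, in sequence order.
    """
    hits = {}
    for resname, resnum, xyz in donor_atoms:
        if math.dist(metal_xyz, xyz) <= cutoff:
            hits[resnum] = resname
    return [hits[num] for num in sorted(hits)]


def site_label(resnames):
    """Label a site '2His' or '2His-<n><X>...'; None if fewer than two His."""
    counts = Counter(resnames)
    if counts["His"] < 2:
        return None
    counts["His"] -= 2  # the two His that define the 2His core
    extras = [f"{n}{res}" for res, n in sorted(counts.items()) if n > 0]
    return "-".join(["2His"] + extras)


# Toy facial-triad site; the distant Ala is outside the cutoff
atoms = [
    ("His", 101, (2.1, 0.0, 0.0)),
    ("Asp", 103, (0.0, 0.0, 2.0)),
    ("Ala", 120, (5.0, 0.0, 0.0)),
    ("His", 160, (0.0, 2.2, 0.0)),
]
print(site_label(coordinating_residues((0.0, 0.0, 0.0), atoms)))  # 2His-1Asp
```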
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Kipouros, I., Chang, M.C.Y. Targeted enzyme discovery using metal-coordination mining.
Nature (2026). https://doi.org/10.1038/s41586-026-10716-z
Received: 04 September 2025
Accepted: 27 May 2026
Published: 01 July 2026
Version of record: 01 July 2026
DOI: https://doi.org/10.1038/s41586-026-10716-z


