References
Shanahan, M., McDonell, K. & Reynolds, L. Role play with large language models. Nature 623, 493–498 (2023).
Article
ADS
CAS
PubMed
Google Scholar
Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).
Article
ADS
CAS
PubMed
PubMed Central
Google Scholar
Shastri, B. J. et al. Photonics for artificial intelligence and neuromorphic computing. Nat. Photonics 15, 102–114 (2021).
Article
ADS
CAS
Google Scholar
Bernstein, L. et al. Single-shot optical neural network. Sci. Adv. 9, eadg7904 (2023).
Article
CAS
PubMed
PubMed Central
Google Scholar
Zheng, H. et al. Multichannel meta-imagers for accelerating machine vision. Nat. Nanotechnol. 19, 471–478 (2024).
Article
ADS
CAS
PubMed
PubMed Central
Google Scholar
Zheng, H. et al. Meta-optic accelerators for object classifiers. Sci. Adv. 8, eabo6410 (2022).
Article
PubMed
PubMed Central
Google Scholar
Luo, M. et al. Meta-optics based parallel convolutional processing for neural network accelerator. Laser Photonics Rev. 18, 2300984 (2024).
Article
ADS
Google Scholar
Liu, C. et al. A programmable diffractive deep neural network based on a digital-coding metasurface array. Nat. Electron. 5, 113–122 (2022).
Article
Google Scholar
Shen, Y. et al. Deep learning with coherent nanophotonic circuits. Nat. Photon. 11, 441–446 (2017).
Article
ADS
CAS
Google Scholar
Ashtiani, F., Geers, A. J. & Aflatouni, F. An on-chip photonic deep neural network for image classification. Nature 606, 501–506 (2022).
Article
ADS
CAS
PubMed
Google Scholar
Feldmann, J. et al. Parallel convolutional processing using an integrated photonic tensor core. Nature 589, 52–58 (2021).
Article
ADS
CAS
PubMed
Google Scholar
Lin, X. et al. All-optical machine learning using diffractive deep neural networks. Science 361, 1004–1008 (2018).
Article
ADS
MathSciNet
CAS
PubMed
Google Scholar
Zhou, T. et al. Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit. Nat. Photonics 15, 367–373 (2021).
Article
ADS
CAS
Google Scholar
Antonik, P., Marsal, N., Brunner, D. & Rontani, D. Human action recognition with a large-scale brain-inspired photonic computer. Nat. Mach. Intell. 1, 530–537 (2019).
Article
Google Scholar
Wang, T. et al. Image sensing with multilayer nonlinear optical neural networks. Nat. Photon. 17, 408–415 (2023).
Article
ADS
CAS
Google Scholar
Xia, F. et al. Nonlinear optical encoding enabled by recurrent linear scattering. Nat. Photon. 18, 1067–1075 (2024).
Article
ADS
CAS
Google Scholar
Luo, X. et al. Metasurface-enabled on-chip multiplexed diffractive neural networks in the visible. Light Sci. Appl. 11, 158 (2022).
Article
ADS
CAS
PubMed
PubMed Central
Google Scholar
Huang, C. et al. A silicon photonic–electronic neural network for fibre nonlinearity compensation. Nat. Electron. 4, 837–844 (2021).
Article
CAS
Google Scholar
Fu, T. et al. Photonic machine learning with on-chip diffractive optics. Nat. Commun. 14, 70 (2023).
Article
ADS
CAS
PubMed
PubMed Central
Google Scholar
Dong, B. et al. Partial coherence enhances parallelized photonic computing. Nature 632, 55–62 (2024).
Article
ADS
CAS
PubMed
PubMed Central
Google Scholar
Xu, Z. et al. Large-scale photonic chiplet Taichi empowers 160-TOPS/W artificial general intelligence. Science 384, 202–209 (2024).
Article
ADS
CAS
PubMed
Google Scholar
McMahon, P. L. The physics of optical computing. Nat. Rev. Phys. 5, 717–734 (2023).
Article
Google Scholar
Yildirim, M., Dinc, N. U., Oguz, I., Psaltis, D. & Moser, C. Nonlinear processing with linear optics. Nat. Photon. 18, 1076–1082 (2024).
Article
ADS
CAS
Google Scholar
Goi, E. et al. Nanoprinted high-neuron-density optical linear perceptrons performing near-infrared inference on a CMOS chip. Light Sci. Appl. 10, 40 (2021).
Article
ADS
CAS
PubMed
PubMed Central
Google Scholar
Chen, Y. et al. All-analog photoelectronic chip for high-speed vision tasks. Nature 623, 48–57 (2023).
Article
ADS
CAS
PubMed
PubMed Central
Google Scholar
Wetzstein, G. et al. Inference in artificial intelligence with deep optics and photonics. Nature 588, 39–47 (2020).
Article
ADS
CAS
PubMed
Google Scholar
Feng, H. et al. Integrated lithium niobate microwave photonic processing engine. Nature 627, 80–87 (2024).
Article
ADS
CAS
PubMed
Google Scholar
Xu, X. et al. 11 TOPS photonic convolutional accelerator for optical neural networks. Nature 589, 44–51 (2021).
Article
ADS
CAS
PubMed
Google Scholar
Liu, Z. et al. Swin Transformer: hierarchical vision transformer using shifted windows. In Proc. 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 10012–10022 (IEEE, 2021).
Cui, K. et al. Spectral convolutional neural network chip for in-sensor edge computing of incoherent natural light. Nat. Commun. 16, 81 (2025).
Article
ADS
CAS
PubMed
PubMed Central
Google Scholar
Wei, K. et al. Spatially varying nanophotonic neural networks. Sci. Adv. 10, eadp0391 (2024).
Article
PubMed
PubMed Central
Google Scholar
Qu, G. et al. All-dielectric metasurface empowered optical-electronic hybrid neural networks. Laser Photonics Rev. 16, 2100732 (2022).
Article
ADS
CAS
Google Scholar
Rahimi, A. & Recht, B. Random features for large-scale kernel machines. In Proc. 21st International Conference on Neural Information Processing Systems (NIPS’07) 1177–1184 (Curran Associates, 2007).
Choromanski, K. M. et al. Rethinking attention with performers. In Proc. International Conference on Learning Representations (ICLR 2021) (ICLR, 2021).
Zhang, Y. et al. Image super-resolution using very deep residual channel attention networks. In Proc. European Conference on Computer Vision (ECCV) 286–301 (CVF, 2018).
Wang, Q. et al. ECA-net: efficient channel attention for deep convolutional neural networks. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 11534–11542 (CVF, 2020).
Vaswani, A. et al. Attention is all you need. In Proc. 31st International Conference on Neural Information Processing Systems (NIPS’17) 6000–6010 (Curran Associates, 2017).
Dosovitskiy, A. et al. An image is worth 16×16 words: transformers for image recognition at scale. In Proc. International Conference on Learning Representations (ICLR 2021) (ICLR, 2021).
Cordts, M. et al. The Cityscapes dataset for semantic urban scene understanding. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 3213–3223 (CVF, 2016).
Perazzi, F. et al. A benchmark dataset and evaluation methodology for video object segmentation. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 724–732 (CVF, 2016).
Jocher, G. Ultralytics YOLOv5. https://github.com/ultralytics/yolov5 (2020).
Zhu, X. et al. Deformable DETR: deformable transformers for end-to-end object detection. In Proc. International Conference on Learning Representations (ICLR 2021) (ICLR, 2021).
Cheng, B., Misra, I., Schwing, A. G., Kirillov, A. & Girdhar, R. Masked-attention Mask Transformer for universal image segmentation. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 1290–1299 (CVF, 2022).
Pan, H., Hong, Y., Sun, W. & Jia, Y. Deep dual-resolution networks for real-time and accurate semantic segmentation of traffic scenes. IEEE Trans. Intell. Transp. Syst. 24, 3448–3460 (2022).
Article
Google Scholar
Xie, E. et al. SegFormer: simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 34, 12077–12090 (2021).
Google Scholar
Ranftl, R., Bochkovskiy, A. & Koltun, V. Vision transformers for dense prediction. In Proc. IEEE/CVF International Conference on Computer Vision (ICCV) 12179–12188 (CVF, 2021).
Bhat, S. F., Alhashim, I. & Wonka, P. AdaBins: depth estimation using adaptive bins. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 4009–4018 (CVF, 2021).
Yang, L. et al. Depth anything: unleashing the power of large-scale unlabeled data. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 10371–10381 (CVF, 2024).
Ranftl, R., Lasinger, K., Hafner, D., Schindler, K. & Koltun, V. Towards robust monocular depth estimation: mixing datasets for zero-shot cross-dataset transfer. IEEE Trans. Pattern Anal. Mach. Intell. 44, 1623–1637 (2020).
Article
ADS
Google Scholar
Zitova, B. & Flusser, J. Image registration methods: a survey. Image Vis. Comput. 21, 977–1000 (2003).
Article
Google Scholar
Bergevin, R., Soucy, M., Gagnon, H. & Laurendeau, D. Towards a general multi-view registration technique. IEEE Trans. Pattern Anal. Mach. Intell. 18, 540–547 (1996).
Article
ADS
Google Scholar
Ravi, N. et al. Sam 2: Segment anything in images and videos. In Proc. International Conference on Learning Representations (ICLR 2025) (ICLR, 2025).
LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).
Article
ADS
Google Scholar
Xiao, H., Rasul, K. & Vollgraf, R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. Preprint at https://arxiv.org/abs/1708.07747 (2017).
Schüldt, C., Laptev, I. & Caputo, B. Recognizing human actions: a local SVM approach. In Proc. 17th International Conference on Pattern Recognition (ICPR 2004) Vol. 3, 32–36 (IEEE, 2004).
Zheng, Z., Wei, Y. & Yang, Y. University-1652: a multi-view multi-source benchmark for drone-based geo-localization. In Proc. 28th ACM International Conference on Multimedia 1395–1403 (ACM, 2020).
Berman, M., Triki, A. R. & Blaschko, M. B. The Lovász-softmax loss: a tractable surrogate for the optimization of the intersection-over-union measure in neural networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 4413–4421 (CVF, 2018).
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. & Chen, L.-C. MobileNetV2: inverted residuals and linear bottlenecks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 4510–4520 (CVF, 2018).
Han, K. et al. GhostNet: more features from cheap operations. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 1580–1589 (CVF, 2020).
Han, K. et al. Model Rubik’s cube: twisting resolution, depth and width for tinynets. Adv. Neural Inf. Process. Syst. 33, 19353–19364 (2020).
Google Scholar
Tan, M. & Le, Q. EfficientNet: rethinking model scaling for convolutional neural networks. In Proc. 36th International Conference on Machine Learning 6105–6114 (PMLR, 2019).
Ren, S., He, K., Girshick, R. & Sun, J. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1137–1149 (2016).
Article
ADS
PubMed
Google Scholar
He, K., Gkioxari, G., Dollár, P. & Girshick, R. B. Mask R-CNN. In Proc. IEEE International Conference on Computer Vision (ICCV) 2961–2969 (CVF, 2017).
Lin, T.-Y., Goyal, P., Girshick, R. B., He, K. & Dollár, P. Focal loss for dense object detection. In Proc. IEEE International Conference on Computer Vision (ICCV) 2980–2988 (CVF, 2017).
Tan, M., Pang, R. & Le, Q. V. EfficientDet: scalable and efficient object detection. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 10781–10790 (2020).
Liu, S. et al. Grounding DINO: marrying DINO with grounded pre-training for open-set object detection. In Proc. European Conference on Computer Vision (ECCV 2024) 38–55 (Springer, 2025).
Ronneberger, O., Fischer, P. & Brox, T. U-net: convolutional networks for biomedical image segmentation. In Proc. Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015) 234–241 (Springer, 2015).
Zhao, H., Shi, J., Qi, X., Wang, X. & Jia, J. Pyramid scene parsing network. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2881–2890 (CVF, 2017).
Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F. & Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proc. European Conference on Computer Vision (ECCV) 801–818 (CVF, 2018).
Eigen, D., Puhrsch, C. & Fergus, R. Depth map prediction from a single image using a multi-scale deep network. In Proc. 28th International Conference on Neural Information Processing Systems (NIPS’14) 2366–2374 (MIT Press, 2014).
Wofk, D., Ma, F., Yang, T.-J., Karaman, S. & Sze, V. FastDepth: fast monocular depth estimation on embedded systems. In Proc. 2019 International Conference on Robotics and Automation (ICRA) 6101–6108 (IEEE, 2019).
Hazirbas, C., Ma, L., Domokos, C. & Cremers, D. FuseNet: incorporating depth into semantic segmentation via fusion-based CNN architecture. In Proc. Asian Conference on Computer Vision (ACCV 2016) 213–228 (Springer, 2017).
Peng, J. Code for optical metasurfaces for general vision processing on the edge. Zenodo https://doi.org/10.5281/zenodo.19382032 (2026).
Download references
View original source — Nature ↗


