AI Research Radar

A summary of recent AI research papers, research-tool releases, and lab updates.

I built this small radar to keep up with research-related AI news. It automatically collects signals such as new papers, lab announcements, model and developer-tool updates, research-writing tools, AI-for-science work, mathematical reasoning, literature-search systems, robotics, hardware, and selected company news. The results are filtered by a set of research-oriented keywords.
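A minimal sketch of how keyword-based filtering and scoring of this kind could work; it is not the radar's actual code. The keyword weights, the `Item` fields, and the rule that an item at or above the cutoff counts as priority are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical weights for illustration; the radar's real keyword list is not shown on this page.
KEYWORD_WEIGHTS = {"lean": 10.0, "formal proof": 12.0, "benchmark": 5.0, "agentic": 4.0}
PRIORITY_CUTOFF = 15.0  # assumed meaning: items scoring at or above this are flagged as priority


@dataclass
class Item:
    title: str
    summary: str


def matched_keywords(item: Item) -> list[str]:
    """Return the keywords that appear as whole words or phrases in the item text."""
    text = f"{item.title} {item.summary}".lower()
    return sorted(
        kw for kw in KEYWORD_WEIGHTS
        if re.search(rf"\b{re.escape(kw)}\b", text)
    )


def score(item: Item) -> float:
    """Sum the weights of every matched keyword."""
    return sum(KEYWORD_WEIGHTS[kw] for kw in matched_keywords(item))


def rank(items: list[Item]) -> list[tuple[float, bool, Item]]:
    """Drop items with no keyword match, then sort the rest by score, highest first."""
    scored = [(score(it), it) for it in items if matched_keywords(it)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(s, s >= PRIORITY_CUTOFF, it) for s, it in scored]
```

Matching on word boundaries rather than plain substrings keeps a keyword like "Lean" from firing on unrelated words such as "clean".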

Updated: June 1, 2026, 12:12 AM ET

Items: 20

Priority cutoff: 15

20 visible items

arXiv — formal proof search May 21, 2026

Advancing Mathematics Research with AI-Driven Formal Proof Search

This paper evaluates LLM-driven formal proof search in Lean for open mathematics. Its strongest agent solved 9 of 353 open Erdős problems and proved 44 of 492 OEIS conjectures, showing how AI-aided proof search can start contributing to real mathematical research.

Keywords: evaluation, formal proof, formal proof search, Lean, LLM · score 102

Source Mix

Interleaved papers, lab updates, and research-tool signals.

NVIDIA Developer Blog May 26, 2026

Run Key Genomics and Protein Folding Workloads Faster with NVIDIA RTX PRO 4500 Blackwell

NVIDIA describes how its BioNeMo stack, Parabricks, and RTX PRO 4500 Blackwell Server Edition GPU accelerate precision-medicine workloads. Parabricks moves genomic analysis tasks such as alignment and variant calling from hours to minutes; the RTX PRO 4500 Blackwell gives roughly 2x gains for tools including Minimap2, fq2bam, and DeepVariant. For protein work, OpenFold3 sees up to 2.4x speedups, while Smith-Waterman…

Keywords: Blackwell, life sciences, molecular, NVIDIA, protein · score 47

Google Research Blog May 19, 2026

Empirical Research Assistance (ERA): From Nature publication to catalyzing Computational Discovery

Google Research's ERA is a Gemini-based research coding system that searches literature, writes code, explores solutions, and evaluates results for scientific problems. The Nature-published work reports expert-level performance across genomics, public health, satellite imagery, neuroscience, time-series forecasting, and mathematics, and feeds into the Computational Discovery trusted-tester tool.

Keywords: computational discovery, empirical research assistance · score 26.6

NVIDIA Blog May 28, 2026

NVIDIA Research Advances Robotics From Simulation to the Real World

NVIDIA summarizes eight ICRA 2026 research papers focused on moving robotics policies from simulation into real-world deployment. The work spans GPU-accelerated multi-arm planning, Isaac Lab-trained navigation policies that transfer across robot bodies, cluttered-object grasping, deformable-object manipulation, precise assembly, and vision-language-action reliability. Reported results include 3x faster multi-arm pla…

Keywords: NVIDIA, robotics · score 21.2

Hugging Face Blog May 27, 2026

ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM

Artificial Analysis and IBM introduce ITBench-AA, an agentic enterprise IT benchmark focused first on Kubernetes SRE incident response. Frontier models inspect logs, traces, metrics, and topology to identify root-cause entities; the launch report says all frontier models scored below 50%, with Claude Opus 4.7 and GPT-5.5 near the top.

Keywords: agentic, benchmark · score 20.2

Microsoft Research Blog May 21, 2026

MagenticLite, MagenticBrain, Fara1.5: An agentic experience optimized for small models

MagenticLite is an agentic system for small models that works across the browser and local file system in a single workflow. It combines specialized models and orchestration to support efficient agentic performance on everyday tasks.

Keywords: agentic, workflow · score 19.1

OpenAI News May 28, 2026

How Endava builds an agentic organization with Codex

OpenAI describes how Endava uses Codex to build an agentic organization, reporting faster software delivery and requirements analysis cut from weeks to hours.

Keywords: agentic · score 15.1

Anthropic News May 28, 2026

Introducing Claude Opus 4.8

Anthropic introduces Claude Opus 4.8 as an upgrade to its Opus model family, emphasizing stronger coding, agentic-task, and professional-work performance. The release is relevant for research workflows because it targets long-running, consistency-sensitive work such as coding, analysis, document-heavy tasks, and tool-using agent workflows.

Keywords: agentic · score 12.1

Apple Machine Learning Research Apr 23, 2026

ParaRNN: Large-Scale Nonlinear RNNs, Trainable in Parallel

Apple researchers introduce ParaRNN, a framework that makes nonlinear RNNs trainable in parallel by recasting recurrent computation as a system solved with Newton-style iterations and parallel scans. The ICLR 2026 Oral work reports a 665x speedup over sequential training, enables 7B-parameter GRU/LSTM-style language models competitive with transformers and Mamba, and includes a public codebase for experimenting with…

Keywords: source/tag match · score 6

arXiv — formal proof and mathematical reasoning May 22, 2026

Agentic Proving for Program Verification

This paper tests Claude Code as an agentic prover for Lean 4 program verification on the CLEVER benchmark. It reports high rates of valid specifications, implementation certification, and end-to-end verified generation, suggesting current program-verification benchmarks may be too easy for modern agentic proving systems.

Keywords: agentic, benchmark, evaluation, Lean, Lean 4, theorem proving · score 57.1

NVIDIA Developer Blog May 27, 2026

NVIDIA Blackwell Sets STAC-AI Record for LLM Inference in Finance

Large language models (LLMs) are revolutionizing the financial trading landscape by enabling sophisticated analysis of vast amounts of unstructured data to...

Keywords: agentic, Blackwell, LLM, NVIDIA, RAG, retrieval augmented generation · score 40.9

Google Research Blog May 28, 2026

A New Era of Innovation: Google Research at I/O 2026

Google Research summarizes I/O 2026 research launches across science, agents, open models, hardware, weather, and quantum. Highlights include Gemini for Science, ERA and Co-Scientist work, Antigravity 2.0 for multi-agent development, Gemma V4, and research moving into product and scientific workflows.

Keywords: source/tag match · score 5

Hugging Face Blog May 23, 2026

Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models

NVIDIA's Nemotron-Labs Diffusion models bring diffusion-style text generation to LLM workflows. The family supports autoregressive, diffusion, and self-speculation modes, with open 3B, 8B, and 14B text models plus training code; NVIDIA reports substantially higher token-per-forward-pass throughput while keeping familiar deployment paths.

Keywords: diffusion · score 6.7

Microsoft Research Blog May 28, 2026

Data Formulator 0.7: AI-powered data analytics for enterprise data

Data Formulator introduces AI-powered analytics for enterprise data workflows. Data teams can easily bring enterprise data into an AI-ready workspace where users can explore, analyze, and visualize data with AI agents to turn raw data into actionable insights.

Keywords: agents · score 11

arXiv — formal proof and mathematical reasoning May 25, 2026

Keep the Proof State Live: Snapshotting for Efficient Tactic Search in Lean 4

This Lean 4 paper targets a bottleneck in parallel tactic search: each branch often re-runs expensive elaboration instead of reusing the proof state. The authors introduce proof-state snapshotting in the Lean language server so many search branches can reuse one elaborated state, making portfolio-style proof search more practical.

Keywords: Lean, Lean 4, proof search, theorem proving · score 54

NVIDIA Developer Blog May 19, 2026

NVIDIA-Verified Agent Skills Provide Capability Governance for AI Agents

Autonomous AI agents are becoming more capable. Open models, Model Context Protocol (MCP)-connected tools, and portable skills are also making agents easier to...

Keywords: agentic, agents, MCP, Model Context Protocol, NVIDIA · score 38.2

arXiv — formal proof and mathematical reasoning May 28, 2026

Formalizing Mathematics at Scale

AutoformBot is a multi-agent Lean 4 system for autoformalizing textbook mathematics at scale. It produced Atlas, a verified library with more than 45,000 declarations and 500,000 lines of Lean code from 26 open-access graduate-level textbooks, suggesting large-scale autoformalization is becoming technically and economically feasible.

Keywords: agents, formal verification, Lean, Lean 4, LLM, open-source · score 51

NVIDIA Developer Blog May 26, 2026

NVIDIA CUDA 13.3 Enhances GPU Development with Tile Programming in C++, Compiler Autotuning, and Python Updates

NVIDIA CUDA 13.3 brings new capabilities and performance optimizations to developers across the CUDA ecosystem. The launch of NVIDIA CUDA Tile programming in...

Keywords: CUDA, GPU, NVIDIA · score 28.3

arXiv — formal proof and mathematical reasoning May 27, 2026

Risk-Controlled Lean-as-Judge for Natural-Language Mathematical Reasoning

This paper studies when Lean can be trusted as a judge for natural-language math answers, finding that proof success is highly coverage-dependent and sometimes unfaithful. It introduces COVCAL, a selective-risk method that accepts answers only when Lean-trace diagnostics support a calibrated accuracy guarantee, otherwise abstaining.

Keywords: autoformalization, Lean, mathematical reasoning · score 44

arXiv — AI/ML/CL/CV/stat.ML May 28, 2026

SoundnessBench: Can Your AI Scientist Really Tell Good Research Ideas from Bad Ones?

SoundnessBench tests whether LLMs can judge the methodological soundness of machine-learning research ideas before costly experiments. Across 12 frontier models, the benchmark finds a strong optimism bias toward weak proposals, suggesting current AI scientist systems still need better proposal-evaluation tools.

Keywords: agents, AI scientist, benchmark, hypothesis generation, scientific discovery · score 41.8