[
  {
    "id": "2061348706235064640",
    "author": "mli0603",
    "text": "This is THE moment of Physical AI!\n\nWe are officially announcing Cosmos 3: Omnimodal World Models for Physical AI 🚀\n\n- Cosmos 3 is an omnimodal world model: within a unified architecture, it can understand and generate language, images, video, audio, and actions.\n- It is not just a VLM, not just a video generator, not just an audio-visual generative model, and not just a physics simulator / world-action model. It can understand images and videos, generate images, videos, and audio, simulate future worlds, predict actions, and generate robot policies—enabling models to truly begin to “touch the world.”\n- Cosmos 3 is the #1 open-weight reasoner / T2I / I2V / robot policy across many benchmarks.\n\nHuge thanks to every teammate who fought side by side on this journey—from architecture, data, training, infra, serving, and evaluation to post-training. Every part of this project carries an incredible amount of hard work. This was my first time leading a project as Tech Lead, and I feel truly fortunate.\n\nThe future of Physical AI needs models that can not only “see” and “describe” the world, but also “imagine,” “simulate,” and “act”—and eventually close the loop with the real world. I hope Cosmos 3 can become an important starting point for this direction, and I’m excited to push Physical AI into its next stage together with the open-source community.\n\nWelcome to the era of Physical AI.\n\nHuggingFace: https://t.co/QW5h5pIWWM\nProject Website: https://t.co/Jppa0gkn16\nCode: https://t.co/aJgaLm5BaG",
    "created_at": "Mon Jun 01 07:26:44 +0000 2026",
    "likes": 76,
    "views": "82512",
    "url": "https://x.com/i/status/2061348706235064640",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2061348341494194176/vid/avc1/480x270/-64W9Dxe5EPTosiR.mp4?tag=27"
    ],
    "round_first_seen": 1,
    "topic_first_seen": "audio language model"
  },
  {
    "id": "2061334037151781362",
    "author": "zekun_hao",
    "text": "Look what we’re cooking! Cosmos 3 is a family of unified omnimodal world model (language, image, video, audio, action), topping multiple benchmarks! Proud to have led Cosmos3-Super-Image2Video, now the #1 open I2V model on Artificial Analysis. Hope it empowers the community! https://t.co/EFnjsZooD6",
    "created_at": "Mon Jun 01 06:28:26 +0000 2026",
    "likes": 9,
    "views": "403",
    "url": "https://x.com/i/status/2061334037151781362",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2061329899680567296/vid/avc1/468x270/vqdaYlU6-N8u3N2l.mp4?tag=14"
    ],
    "round_first_seen": 1,
    "topic_first_seen": "audio language model"
  },
  {
    "id": "2061308040410992803",
    "author": "oprydai",
    "text": "Cosmos 3 is here.\n\nthis is not just another AI model release.\n\nthis is NVIDIA moving deeper into Physical AI.\n\nwhy it matters:\n• it combines language, images, video, audio, and actions\n• it can reason about the physical world\n• it can generate worlds, not just pixels\n• it supports robot action generation\n• it connects simulation, robotics, and embodied AI\n\nthe big shift:\nAI is moving from “predict the next token”\nto “predict the next state of the world.”\nthat matters for robots.\n\nbecause robots don’t live inside text boxes.\n\nthey live inside messy rooms, factories, roads, kitchens, warehouses, hospitals, and human environments.\n\nCosmos 3 is basically infrastructure for this:\n• text → image\n• image → video\n• video → world\n• world → action\n• action → policy\n• policy → robot behavior\n\nthis is where robotics starts getting interesting.\nnot because we suddenly solved embodiment.\nbut because the stack is forming:\n• world models\n• simulation\n• synthetic data\n• robot policies\n• multimodal reasoning\n• physical interaction\n\nthe future of AI is not only chatbots.\nit is machines that can see, predict, simulate, and act.\n\nPhysical AI is becoming real.",
    "created_at": "Mon Jun 01 04:45:08 +0000 2026",
    "likes": 19,
    "views": "1494",
    "url": "https://x.com/i/status/2061308040410992803",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJs76Q2a0AAb8YD.jpg"
    ],
    "round_first_seen": 1,
    "topic_first_seen": "audio language model"
  },
  {
    "id": "2059884113050665434",
    "author": "Pseudo_Sid26",
    "text": "RL, RLHF and RLVR are very different from each other.\n\nAgents + RL pipelines are booming these days. Some are using monitored RLHF pipeline over agents while some are using verifiable rewards ( RLVR ) over the agents correctness or test passes.\n\nBut if and only you realise, even basic classical RL reward-penalty system with agents work well too.\n\nThis could be a confusion but just using reward and penalty systems on a scoring algo over the agent isn't completely RLVR. \n\nNot every agent needs:\n\n-policy gradients\n-environment rollouts\n-massive reward models\n-expensive RL training pipelines\n\nSometimes the system just needs:\n\n-reward if task completed correctly\n-penalty if hallucinated\n-penalty if unnecessary tool calls\n-reward if latency reduced\n-reward if output format validated\n-penalty if retries increase\n\nThat’s it.\n\nThe agent slowly starts behaving better because the orchestration layer keeps scoring outcomes and adjusting decisions.\n\nHowever yes, for long horizon tasks - RLVR is the moat and goes best with agents.\n\nIf you wanna read and understand more on this, check out these-\n\n>Hugging Face RL for agents workshop\n>Multi-Agent Collaborative Reward Design paper\n>Agent-reward-bench paper\n>Agent-RLVR paper",
    "created_at": "Thu May 28 06:26:58 +0000 2026",
    "likes": 28,
    "views": "920",
    "url": "https://x.com/i/status/2059884113050665434",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJYsgsrbYAE26yD.jpg",
      "https://pbs.twimg.com/media/HJYszrIawAAYVjl.png",
      "https://pbs.twimg.com/media/HJYtI4CbgAA83Gu.jpg",
      "https://pbs.twimg.com/media/HJYtVFDaAAAgF-L.jpg"
    ],
    "round_first_seen": 100,
    "topic_first_seen": "reward model RLHF agent OR RLHF agent OR process reward model"
  },
  {
    "id": "2059730852909912435",
    "author": "rronak_",
    "text": "Great question! An important distinction of continual learning is to not just hillclimb a singular metric like traditional RL- it's also important to ensure during training the model retains general capabilities, and also provide dense reward so that it builds the right process, not just end state hacking. \n\nA lot of the algorithmic research we're doing with self distillation is to address this!",
    "created_at": "Wed May 27 20:17:57 +0000 2026",
    "likes": 3,
    "views": "231",
    "url": "https://x.com/i/status/2059730852909912435",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 100,
    "topic_first_seen": "reward model RLHF agent OR RLHF agent OR process reward model"
  },
  {
    "id": "2059290969002893348",
    "author": "tonitrades_",
    "text": "@GeniusGTX What people miss is that finetuning literally trains out the weird stuff.\nRLHF rewards average human ratings.\nNovel thoughts get penalized before you ever see them.",
    "created_at": "Tue May 26 15:10:01 +0000 2026",
    "likes": 0,
    "views": "48",
    "url": "https://x.com/i/status/2059290969002893348",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 100,
    "topic_first_seen": "reward model RLHF agent OR RLHF agent OR process reward model"
  },
  {
    "id": "2059721494821830868",
    "author": "SparkDexAI",
    "text": "rFLR rewards have been distributed 🪂\n\nSparkDEX LPs may now claim their rewards.\n\nStart LPing now to earn rFLR in the next epoch.\n\nhttps://t.co/2KRS9kfp88 https://t.co/LvJzZxfX5e",
    "created_at": "Wed May 27 19:40:46 +0000 2026",
    "likes": 104,
    "views": "3225",
    "url": "https://x.com/i/status/2059721494821830868",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJWZyDLWcAMDGOR.jpg"
    ],
    "round_first_seen": 100,
    "topic_first_seen": "reward model RLHF agent OR RLHF agent OR process reward model"
  },
  {
    "id": "2060297637778387451",
    "author": "HuggingModels",
    "text": "Key highlights: trained with reward hacking techniques to optimize for specific metrics, making it great for RLHF research. It's a PEFT model with safetensors for safe, fast loading. Despite only 8 downloads, it's a niche gem for those exploring advanced fine-tuning methods.",
    "created_at": "Fri May 29 09:50:09 +0000 2026",
    "likes": 0,
    "views": "242",
    "url": "https://x.com/i/status/2060297637778387451",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 100,
    "topic_first_seen": "reward model RLHF agent OR RLHF agent OR process reward model"
  },
  {
    "id": "2061146484897182083",
    "author": "gavinz0228",
    "text": "📌 RLHF's blind spot: a single reward model can't handle diverse human values. New work introduces In-Context Reward Adaptation — robust preference modeling that adapts to unseen preferences without retraining. #AI #RLHF https://t.co/1JmkePn19N",
    "created_at": "Sun May 31 18:03:10 +0000 2026",
    "likes": 0,
    "views": "7",
    "url": "https://x.com/i/status/2061146484897182083",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 100,
    "topic_first_seen": "reward model RLHF agent OR RLHF agent OR process reward model"
  },
  {
    "id": "2059692527184462025",
    "author": "Naveen869541478",
    "text": "📌 Reward Distribution\n\nRewards are:\n⚡ First-come, first-served\n⚡ Limited by quota availability\n\nCompleting tasks alone is not enough -you must also submit an application for verification after meeting requirements",
    "created_at": "Wed May 27 17:45:40 +0000 2026",
    "likes": 0,
    "views": "43",
    "url": "https://x.com/i/status/2059692527184462025",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 100,
    "topic_first_seen": "reward model RLHF agent OR RLHF agent OR process reward model"
  },
  {
    "id": "2061681968895685097",
    "author": "VizuaraAI",
    "text": "A reward function is a proxy, not the goal itself.\n\nDr. Rajat Dandekar explains that reward hacking occurs when agents optimise the proxy without satisfying real intent. In RLHF, this is why evaluation, KL control, red teaming, and human judgment are essential.\n\nhttps://t.co/XiO7WPXJUC",
    "created_at": "Tue Jun 02 05:31:00 +0000 2026",
    "likes": 1,
    "views": "44",
    "url": "https://x.com/i/status/2061681968895685097",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJyQTDXacAAKoUg.jpg"
    ],
    "round_first_seen": 100,
    "topic_first_seen": "reward model RLHF agent OR RLHF agent OR process reward model"
  },
  {
    "id": "2060108291142013041",
    "author": "lare888",
    "text": "the RTG experience is the new gold standard for incentives I think\n\nnot just proof of human but proof of work, time, attention\n\nhoping for the best, cheers",
    "created_at": "Thu May 28 21:17:46 +0000 2026",
    "likes": 33,
    "views": "1601",
    "url": "https://x.com/i/status/2060108291142013041",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 100,
    "topic_first_seen": "reward model RLHF agent OR RLHF agent OR process reward model"
  },
  {
    "id": "2061057595591262219",
    "author": "clxymox",
    "text": "📌 Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI\n\n🔹 Auto-évaluation des modèles\n🔹 Optimisation par feedback\n\n🔗 https://t.co/7vw59laf2v #Python",
    "created_at": "Sun May 31 12:09:58 +0000 2026",
    "likes": 0,
    "views": "14",
    "url": "https://x.com/i/status/2061057595591262219",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 100,
    "topic_first_seen": "reward model RLHF agent OR RLHF agent OR process reward model"
  },
  {
    "id": "2060909120820011025",
    "author": "badlogicgames",
    "text": "for those who hate reading,  demo video of the final speech to speech pipeline, featuring @mitsuhiko.\n\npretty solid for &lt; 9gb of unified RAM use for STT, TTS, and LLM combined. https://t.co/zgZ0Vzq6OP",
    "created_at": "Sun May 31 02:19:58 +0000 2026",
    "likes": 64,
    "views": "10692",
    "url": "https://x.com/i/status/2060909120820011025",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2060908839331921921/vid/avc1/480x270/vCFpPEPUM8TjihiH.mp4?tag=27"
    ],
    "round_first_seen": 101,
    "topic_first_seen": "audio language model OR audio LLM OR speech LLM"
  },
  {
    "id": "2060027375610445825",
    "author": "HooCrypto",
    "text": "Why Speech Engine and not roll-your-own voice?\n\n• Ultra-low latency back-and-forth\n• One integrated pipeline (no stitching vendors) \n• BYO LLM: your prompts and journal stay on your server \n• Natural interruptions\n\n> @ElevenLabs owns the hard part: voice.\n> You own the part that matters: memory + privacy.",
    "created_at": "Thu May 28 15:56:14 +0000 2026",
    "likes": 1,
    "views": "765",
    "url": "https://x.com/i/status/2060027375610445825",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJav-kwWUAARGKP.jpg"
    ],
    "round_first_seen": 101,
    "topic_first_seen": "audio language model OR audio LLM OR speech LLM"
  },
  {
    "id": "2060426641701269917",
    "author": "caydengineer",
    "text": "OpenAI just dropped a completely new kind of model\n\ngpt-realtime-translate takes in speech audio from any language and outputs speech in your target language\n\nLLMs are great, but you need specialized models for specialized use cases\n\nWe're running this on our smart glasses https://t.co/uJnGdL5DlE",
    "created_at": "Fri May 29 18:22:46 +0000 2026",
    "likes": 222,
    "views": "99356",
    "url": "https://x.com/i/status/2060426641701269917",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/ext_tw_video/2060426609266671616/pu/vid/avc1/480x270/qeX3rzJOTnnfDI8q.mp4?tag=25"
    ],
    "round_first_seen": 101,
    "topic_first_seen": "audio language model OR audio LLM OR speech LLM"
  },
  {
    "id": "2060103820924055980",
    "author": "WesRoth",
    "text": "ElevenLabs released the Speech Engine Skill, a new tool that lets developers add voice to AI agents without leaving their existing LLM stack.\n\nThe skill handles the voice layer around an agent while the developer’s server continues streaming the LLM response. https://t.co/je1DHNHmhi",
    "created_at": "Thu May 28 21:00:00 +0000 2026",
    "likes": 40,
    "views": "3270",
    "url": "https://x.com/i/status/2060103820924055980",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJXIPDtbEAAqq69.jpg"
    ],
    "round_first_seen": 101,
    "topic_first_seen": "audio language model OR audio LLM OR speech LLM"
  },
  {
    "id": "2061334147923321317",
    "author": "ArxivSound",
    "text": "Jun-Hak Yun, Seung-Bin Kim, Seong-Whan Lee, \"ImmersiveTTS: Environment-Aware Text-to-Speech with Multimodal Diffusion Transformer and Domain-Specific Representation Alignment,\" https://t.co/IRjDTvHmQD",
    "created_at": "Mon Jun 01 06:28:53 +0000 2026",
    "likes": 4,
    "views": "929",
    "url": "https://x.com/i/status/2061334147923321317",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 101,
    "topic_first_seen": "audio language model OR audio LLM OR speech LLM"
  },
  {
    "id": "2061386995256164699",
    "author": "MarketLens_AI",
    "text": "@ABC The gap between LLM intelligence and physical actuation is narrowing. We’re moving from 'smart speakers' to 'embodied AI.' The transition from pipe dream to utility is closer than ever.",
    "created_at": "Mon Jun 01 09:58:53 +0000 2026",
    "likes": 0,
    "views": "39",
    "url": "https://x.com/i/status/2061386995256164699",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 101,
    "topic_first_seen": "audio language model OR audio LLM OR speech LLM"
  },
  {
    "id": "2061070466609987622",
    "author": "tldr_ai_papers",
    "text": "New research dives deep into \"Audio Jailbreaks\" 🚨, revealing vulnerabilities in Large Audio-Language Models (LALMs). The study finds Narrative Framing and Acoustic \"Best-of-N\" attacks are particularly potent, with the latter exposing significant weaknesses in audio processing.",
    "created_at": "Sun May 31 13:01:06 +0000 2026",
    "likes": 2,
    "views": "62",
    "url": "https://x.com/i/status/2061070466609987622",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 101,
    "topic_first_seen": "audio language model OR audio LLM OR speech LLM"
  },
  {
    "id": "2060836306989765056",
    "author": "temfr13",
    "text": "What will happen is:\n\nWe will start training models on English language grammar. This allows output to be presented in a meaningful way.\n\nLocal models will be trained on subject matter. This allows training to be very limited.\n\nA mixer will be needed to combine the two. Current models that train on the whole universe will be dead",
    "created_at": "Sat May 30 21:30:38 +0000 2026",
    "likes": 0,
    "views": "242",
    "url": "https://x.com/i/status/2060836306989765056",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 101,
    "topic_first_seen": "audio language model OR audio LLM OR speech LLM"
  },
  {
    "id": "2060509713440125037",
    "author": "HEI",
    "text": "Multilingual Phonological Feature Recognition with Self-Supervised Speech Models\n\nAbner Hernandez, Tomás Arias-Vergara, Daiqi Liu, Andreas Maier, Paula Andrea Pérez-Toro\nhttps://t.co/Tf5JI3MMEA [𝚌𝚜.𝙲𝙻]\n💬Submitted to Interspeech 2026",
    "created_at": "Fri May 29 23:52:52 +0000 2026",
    "likes": 1,
    "views": "106",
    "url": "https://x.com/i/status/2060509713440125037",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 101,
    "topic_first_seen": "audio language model OR audio LLM OR speech LLM"
  },
  {
    "id": "2061694288044634311",
    "author": "ArxivSound",
    "text": "Junseok Lee, Sangyong Lee, Chang-Jae Chun, \"FastSLM: Hierarchical Temporal Abstraction for Efficient Long-Form Speech Adaptation,\" https://t.co/YbihDckiJD",
    "created_at": "Tue Jun 02 06:19:57 +0000 2026",
    "likes": 0,
    "views": "122",
    "url": "https://x.com/i/status/2061694288044634311",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 101,
    "topic_first_seen": "audio language model OR audio LLM OR speech LLM"
  },
  {
    "id": "2061272656096190488",
    "author": "use_base",
    "text": "The biggest shift from I/O 2026 wasn’t one feature — it was how many everyday tasks suddenly became multimodal by default.\n\nWhat happened:\n• A new model family, Gemini Omni, debuted — built to create from any input, starting with video.\n• It mixes images, audio, video, and text to generate grounded video outputs in one flow.\n• A redesigned intelligent Search box now accepts text, images, files, videos, and even open Chrome tabs.\n• A faster tier, Gemini 3.5 Flash, joined the lineup.\n\nWhy it matters:\nAI is moving from text boxes to “drop anything in.” That means creators, students, operators, and curious pros can turn mixed scraps — screenshots, clips, notes — into something usable without expert skills.\n\nWho’s affected:\nAnyone who relies on AI for research, content drafts, quick explanations, or daily problem‑solving.\n\nWhat to try or watch next:\n• Explore mixed‑input video generation for explainers or mockups.\n• Test whether multimodal search becomes a go‑to habit.\n• See how conversational editing changes your workflow speed.\n\nOne caveat:\nThe jump is big — but we’ll need to see how consistent video grounding and multimodal search accuracy are outside demos.\n\nOptional: save this if it changes how you use AI.",
    "created_at": "Mon Jun 01 02:24:32 +0000 2026",
    "likes": 0,
    "views": "46",
    "url": "https://x.com/i/status/2061272656096190488",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJsci-_WwAU-3MJ.jpg",
      "https://pbs.twimg.com/media/HJscjs2WwAQZt5V.jpg"
    ],
    "round_first_seen": 102,
    "topic_first_seen": "multimodal foundation model OR omni model OR omnimodal model"
  },
  {
    "id": "2061487186021904605",
    "author": "bstuartTI",
    "text": "Playing catchup after the AI conference? I am! \n\nIf you're looking for a no-nonsense overview of Google's Omni-Flash video model, check out @invideoOfficial's post.\n\nI really appreciate content that gets straight to the point and doesn't waste my time. High value per minute watched! Please make more of these.",
    "created_at": "Mon Jun 01 16:37:00 +0000 2026",
    "likes": 2,
    "views": "351",
    "url": "https://x.com/i/status/2061487186021904605",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 102,
    "topic_first_seen": "multimodal foundation model OR omni model OR omnimodal model"
  },
  {
    "id": "2061656614487331171",
    "author": "chenhsuanlin",
    "text": "Very excited to introduce Cosmos 3: Omnimodal World Models for Physical AI! 🚀\n\nhttps://t.co/xwrFwHsXY6\n\nCosmos 3 works across language, image, video, audio, and action . It brings together capabilities that often live in separate systems: multimodal reasoning, image/video/audio generation, action modeling, world simulation, and robot policy learning, all within a single unified omnimodal world model 🌎. This creates a more direct path from perception to simulation to control 🤖. \n\nCosmos 3 also ranks #1 among open models on multiple reasoning and generation benchmarks and leaderboards. This was a huge one-team collaboration across NVIDIA. It brought together research, engineering, data, simulation, infrastructure, deployment, and many other efforts. I am deeply grateful to everyone who contributed and proud to have been part of this journey.\n\nTechnical report: https://t.co/5qBo1Js5yi\nModels: https://t.co/qBi2j9gvkH\nCode: https://t.co/ZlIeDxW3lx\nWebsite: https://t.co/xwrFwHsXY6\n\n#nvidia #cosmos #physicalai #worldmodels #robotics",
    "created_at": "Tue Jun 02 03:50:15 +0000 2026",
    "likes": 32,
    "views": "1622",
    "url": "https://x.com/i/status/2061656614487331171",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2061655817787678720/vid/avc1/480x270/6JXuUaS4VPVFoI04.mp4?tag=27"
    ],
    "round_first_seen": 102,
    "topic_first_seen": "multimodal foundation model OR omni model OR omnimodal model"
  },
  {
    "id": "2061508097676374246",
    "author": "wainwrightollie",
    "text": "What is Multimodal AI? 🤖 One model that can see, hear, read &amp; speak — not just text. Here's how it works, in 30s ↓ #AI #MultimodalAI #GPT4o #MachineLearning https://t.co/MpSOvgrgyz",
    "created_at": "Mon Jun 01 18:00:06 +0000 2026",
    "likes": 0,
    "views": "21",
    "url": "https://x.com/i/status/2061508097676374246",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/ext_tw_video/2061508079833763840/pu/vid/avc1/320x568/bajSpbAg2WzcHb7a.mp4?tag=12"
    ],
    "round_first_seen": 102,
    "topic_first_seen": "multimodal foundation model OR omni model OR omnimodal model"
  },
  {
    "id": "2061474876083274238",
    "author": "TheTuringPost",
    "text": "NVIDIA's Cosmos 3 is what we've never seen before ‒ an Omnimodal World Model. It's closing the loop for physical AI.\n\nAll stack in one system: world and multimodal understanding, future generation, reasoning and action\n\nThis is the next step in Jensen Huang’s AI progression: perception AI → generative AI → agentic AI → physical AI\n\nHere is what you need to know about it ↓",
    "created_at": "Mon Jun 01 15:48:05 +0000 2026",
    "likes": 20,
    "views": "2870",
    "url": "https://x.com/i/status/2061474876083274238",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2061474452177608705/vid/avc1/480x270/wEdzZNKCYX93gIrx.mp4?tag=27"
    ],
    "round_first_seen": 102,
    "topic_first_seen": "multimodal foundation model OR omni model OR omnimodal model"
  },
  {
    "id": "2061319060193955958",
    "author": "KaichunMo",
    "text": "Super proud to be part of NVIDIA's Cosmos3 Physical-AI Omnimodal Foundation Model. Topping the image/video/sound generation benchmarks and Robot Policy Benchmarks 😀",
    "created_at": "Mon Jun 01 05:28:56 +0000 2026",
    "likes": 46,
    "views": "4830",
    "url": "https://x.com/i/status/2061319060193955958",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 102,
    "topic_first_seen": "multimodal foundation model OR omni model OR omnimodal model"
  },
  {
    "id": "2061761862723833875",
    "author": "Chinazhidx",
    "text": "1/5🚀Qwen Lab unveiled Qwen3.7-Plus, a new multimodal model built as a unified agent foundation.\n\nIt can understand real-world scenes, read screens, operate GUIs, generate code, navigate mobile apps, and answer visual questions using web knowledge—all within a single agent loop.",
    "created_at": "Tue Jun 02 10:48:28 +0000 2026",
    "likes": 0,
    "views": "8",
    "url": "https://x.com/i/status/2061761862723833875",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 102,
    "topic_first_seen": "multimodal foundation model OR omni model OR omnimodal model"
  },
  {
    "id": "2061323810301972552",
    "author": "Chunkyweb3",
    "text": "NVIDIA just dropped Cosmos 3 the first fully open omni model with native vision reasoning, world modeling, and action generation. \n\nthis is massive for @axisrobotics \n\nwhy?\n\nwhile NVIDIA builds the powerful open foundation models + world simulators\n\nAXIS building the decentralized data engine that feeds them crowdsourced human trajectories, photorealistic IsaacLab sims, domain randomization, and a closed-loop data flywheel anyone can contribute to.\n\nPerfect complementarity better models + massively scalable real world diverse data\n\nfaster sim-to-real for everyone",
    "created_at": "Mon Jun 01 05:47:48 +0000 2026",
    "likes": 2,
    "views": "77",
    "url": "https://x.com/i/status/2061323810301972552",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 102,
    "topic_first_seen": "multimodal foundation model OR omni model OR omnimodal model"
  },
  {
    "id": "2061519419214528724",
    "author": "kepochnik",
    "text": "@gippp69 google omni goated AI model ngl\n\nlove it so much",
    "created_at": "Mon Jun 01 18:45:05 +0000 2026",
    "likes": 0,
    "views": "40",
    "url": "https://x.com/i/status/2061519419214528724",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 102,
    "topic_first_seen": "multimodal foundation model OR omni model OR omnimodal model"
  },
  {
    "id": "2061423956796887499",
    "author": "CyberRobooo",
    "text": "NVIDIA Cosmos 3 is an omnimodal world model: within a unified architecture, it can understand and generate language, images, video, audio, and actions.\n\nSimply put, it enables humanoid robots to truly understand physical laws --knowing when things might slip or break, and how to grip them stably. It can also simulate the entire action sequence in its “mind” beforehand, think through the steps clearly, and avoid mistakes before executing them.\n\nso,tasks like folding clothes, pouring water, and caring for the elderly could become much more reliable for humanoid robots in the future.\n\n（(The humanoid robot in the video is AGIBOT Genie G2)）",
    "created_at": "Mon Jun 01 12:25:45 +0000 2026",
    "likes": 5,
    "views": "350",
    "url": "https://x.com/i/status/2061423956796887499",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2061423842510376960/vid/avc1/480x270/-ySQKIxoXHjECAQT.mp4?tag=27"
    ],
    "round_first_seen": 11,
    "topic_first_seen": "audio language model"
  },
  {
    "id": "2061358643921256682",
    "author": "priyank_766",
    "text": "MiniMax has officially launched MiniMax M3, a new multimodal foundation model\n\n• 1M token context window \n• Text, image &amp; video inputs\n• Strong coding + tool use capabilities\n• Built for long-horizon agentic workflows\n• Powered by MiniMax Sparse architecture",
    "created_at": "Mon Jun 01 08:06:13 +0000 2026",
    "likes": 0,
    "views": "48",
    "url": "https://x.com/i/status/2061358643921256682",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 12,
    "topic_first_seen": "multimodal foundation model"
  },
  {
    "id": "2061308162238488964",
    "author": "news_g0d",
    "text": "$NVDA, MSFT, LI | Nvidia Press Release\nimpact 🚨🚨 high · Bullish 🟢🟢\n\nNVIDIA introduced Cosmos 3, an open-source multimodal foundation model designed to accelerate physical AI development for robotics and autonomous vehicles.",
    "created_at": "Mon Jun 01 04:45:37 +0000 2026",
    "likes": 0,
    "views": "81",
    "url": "https://x.com/i/status/2061308162238488964",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 12,
    "topic_first_seen": "multimodal foundation model"
  },
  {
    "id": "2061254906435002598",
    "author": "intheworldofai",
    "text": "🚨 MiniMax M3 is now available!\n\nMiniMax has officially launched MiniMax M3, a new multimodal foundation model with:\n\n• 1M token context window 🤯\n• Text, image & video inputs\n• Strong coding + tool use capabilities\n• Built for long-horizon agentic workflows\n• Powered by MiniMax Sparse architecture\nThe race for ultra-long context AI models is heating up fast. 👀",
    "created_at": "Mon Jun 01 01:14:00 +0000 2026",
    "likes": 204,
    "views": "13877",
    "url": "https://x.com/i/status/2061254906435002598",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJsMU2rWcAMPZlz.jpg"
    ],
    "round_first_seen": 12,
    "topic_first_seen": "multimodal foundation model"
  },
  {
    "id": "2060898680010420468",
    "author": "CSVisionPapers",
    "text": "A Multimodal 3D Foundation Model for Light Sheet Fluorescence Microscopy Enables Few-Shot Segmentation, Classification, and Deblurring\n\nAdina Scheinfeld, Haotan Zhang, Shang Mu, Rudolf L. M. van Herten, Lucas Stoffl, …\nhttps://t.co/NHNGvp7GrZ [𝚌𝚜.𝙲𝚅 𝚌𝚜.𝙰𝙸 𝚌𝚜.𝙻𝙶]",
    "created_at": "Sun May 31 01:38:29 +0000 2026",
    "likes": 0,
    "views": "49",
    "url": "https://x.com/i/status/2060898680010420468",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 12,
    "topic_first_seen": "multimodal foundation model"
  },
  {
    "id": "2061060122302906833",
    "author": "Compute27996514",
    "text": "I’m keeping a close eye on @mynebrowser because it’s finally shifting the power away from extractive Big Tech. Merging Agentic AI like Adam &amp; Eve with true data ownership and a crypto reward model is exactly how the next-gen internet should look. Stoked to see where this journey https://t.co/wakQLBIgQ6",
    "created_at": "Sun May 31 12:20:00 +0000 2026",
    "likes": 1,
    "views": "13",
    "url": "https://x.com/i/status/2061060122302906833",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJpbQAYWUAIDDsL.jpg"
    ],
    "round_first_seen": 13,
    "topic_first_seen": "agentic reward model"
  },
  {
    "id": "2061363360860438897",
    "author": "Catchingtides",
    "text": "更值得传播的是另一个结论：同一个模型，只换 harness，分数最多能差 `18` 个点。  \n\n这说明 2026 年讨论 agent 能力，单报模型名已经不够了。你还得同时问：跑在哪个 harness？接了什么 tools？memory 怎么做？eval protocol 是什么？",
    "created_at": "Mon Jun 01 08:24:58 +0000 2026",
    "likes": 0,
    "views": "7",
    "url": "https://x.com/i/status/2061363360860438897",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 15,
    "topic_first_seen": "agent harness eval"
  },
  {
    "id": "2061337472768815533",
    "author": "AIDailyGems",
    "text": "Good eval candidate: give it one bug, one refactor, and one failing test before trusting it. A Rust-first autonomous AI agent runtime and CLI code editor. Built on SenAgentOS, it applies Harness Engineering to code engineering: orch\n\nhttps://t.co/PPkUOcf5R5",
    "created_at": "Mon Jun 01 06:42:05 +0000 2026",
    "likes": 0,
    "views": "41",
    "url": "https://x.com/i/status/2061337472768815533",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 15,
    "topic_first_seen": "agent harness eval"
  },
  {
    "id": "2061089125282034076",
    "author": "anti_trista",
    "text": "Today's research: agent eval is moving from model scores to harness audits. Final success can hide early stops, duplicate work, or memory drift. Builders need traces, verifiers, and rollback—not just a pass/fail demo. https://t.co/XoJJtjLtaI",
    "created_at": "Sun May 31 14:15:15 +0000 2026",
    "likes": 0,
    "views": "16",
    "url": "https://x.com/i/status/2061089125282034076",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 15,
    "topic_first_seen": "agent harness eval"
  },
  {
    "id": "2060970135620575429",
    "author": "RussWonsley",
    "text": "Built a small eval harness for my skill-seeking agent.\nNot testing whether it gives a fluent answer. \n\nTesting whether it knows:\n- when to use an existing skill\n- when it lacks a capability\n- when to request a new skill\n- when not to load a weak partial match\n- when to stop for safety or human approval\n- how to leave a full, auditable trace\n\nMost agent demos still optimize for “it completed the task.” Fewer show the agent recognizing what it can’t reliably do and handling it cleanly instead of guessing.",
    "created_at": "Sun May 31 06:22:25 +0000 2026",
    "likes": 2,
    "views": "67",
    "url": "https://x.com/i/status/2060970135620575429",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 15,
    "topic_first_seen": "agent harness eval"
  },
  {
    "id": "2061332943117902073",
    "author": "lobehub",
    "text": "MiniMax M3 is now live on LobeHub 🚀\n\nWe’ve been testing it internally, and the early results look seriously promising💥💥💥\n\n🧠 Strong reasoning\n⚡️ Fast responses\n👀 Multimodal inputs\n🤖 Built for coding & agent workflows\n📚 1M context window\n\nFull benchmarks and comparisons are coming soon 📊\n\nYou can try M3 on LobeHub today：）",
    "created_at": "Mon Jun 01 06:24:06 +0000 2026",
    "likes": 11,
    "views": "874",
    "url": "https://x.com/i/status/2061332943117902073",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJtTLm_bUAE4ZoT.jpg"
    ],
    "round_first_seen": 16,
    "topic_first_seen": "multimodal reasoning agent"
  },
  {
    "id": "2061323645935665212",
    "author": "tonbistudio",
    "text": "@Teknium If anyone wants to see M3 in action in a Hermes Agent, I did a first look video, giving the new model a couple different tasks that tested its coding, reasoning, multimodal, research, and analysis abilities. Check it out!\n\nhttps://t.co/WKgIhqQ5NH",
    "created_at": "Mon Jun 01 05:47:09 +0000 2026",
    "likes": 6,
    "views": "384",
    "url": "https://x.com/i/status/2061323645935665212",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 16,
    "topic_first_seen": "multimodal reasoning agent"
  },
  {
    "id": "2061163408414982159",
    "author": "NikhilLamba6",
    "text": "Hi all, I thought of a multimodal project couple of months back. Where the agent should be able to follow the env of agentic harness and mimic how computer use agents work.\n\nShould utilize: memory, ReAct reasoning loop, tool, context, etc.\n\nConstraint: no use of Langchain, CrewAI, AutoGen, DeepAgents or any of these utility libraries.\nJust raw next token prediction and python.\n\nIt's always good to use skills, but we should be knowing under the hood stuff as well, right?\n\nI have used a screen parsing tool for this: MSFTResearch (huge kudos)  OmniParser so that model knows what it is looking into and could decide the clicks and scrolls.\n\nWanna take a look?\nGithub: https://t.co/a8LGDKOwMO\nHave attached a video and loom link below:\nhttps://t.co/ULSV3uq7GA\n\nWill be glad to hear your thoughts.\n\n#DeepLearning #agenticAI #machinelearning",
    "created_at": "Sun May 31 19:10:25 +0000 2026",
    "likes": 12,
    "views": "1198",
    "url": "https://x.com/i/status/2061163408414982159",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2061162966536568832/vid/avc1/480x270/qRitAykkhxf9vZyp.mp4?tag=27"
    ],
    "round_first_seen": 16,
    "topic_first_seen": "multimodal reasoning agent"
  },
  {
    "id": "2061279115144249461",
    "author": "sky_bolt20907",
    "text": "Most people underestimate how much latency matters in Voice AI.\n\nUsers don't notice if your model is 5% smarter.\n\nThey immediately notice a 1-second pause.\n\nTraditional voice pipelines often stack:\n\n• Audio upload\n• STT processing\n• LLM reasoning\n• TTS generation\n• Audio playback\n\nResult: 1.5-4s response times.\nWith WebRTC + streaming architectures (LiveKit, etc.), audio moves continuously instead of waiting for complete requests.\n\nWe've seen end-to-end latency drop into the few-hundred millisecond range.\n\nThe difference isn't a benchmark.\n\nIt feels like the difference between talking to software and talking to a person.",
    "created_at": "Mon Jun 01 02:50:12 +0000 2026",
    "likes": 3,
    "views": "83",
    "url": "https://x.com/i/status/2061279115144249461",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJsiYfDb0AAyuqt.jpg"
    ],
    "round_first_seen": 17,
    "topic_first_seen": "audio reasoning model"
  },
  {
    "id": "2061256280677728307",
    "author": "AdValoremGP",
    "text": "EP12: AI Phone Agents for Small Business — Recover Missed Calls\n\nIn early May 2026, OpenAI shipped GPT-Realtime-2 — the first voice model with GPT-5-class reasoning, 232-millisecond first-audio latency, and mid-call tool use. Zillow tested it on their ha… https://t.co/GDWKc2zRHg",
    "created_at": "Mon Jun 01 01:19:28 +0000 2026",
    "likes": 0,
    "views": "22",
    "url": "https://x.com/i/status/2061256280677728307",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 17,
    "topic_first_seen": "audio reasoning model"
  },
  {
    "id": "2061437360991031356",
    "author": "TolokaAI",
    "text": "High-quality training and eval data shouldn't take weeks to procure.\nWe've built a catalog of off-the-shelf datasets across RL & agentic, coding, robotics, STEM, and reasoning. \n\nAvailable to order directly, most shipping within 48 hours.\n\nSamples available on request. Any dataset can be expanded or adapted to your setup.\n\nBrowse the catalog: https://t.co/rmy2hwaAD3",
    "created_at": "Mon Jun 01 13:19:01 +0000 2026",
    "likes": 2,
    "views": "103",
    "url": "https://x.com/i/status/2061437360991031356",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 18,
    "topic_first_seen": "agentic RL training"
  },
  {
    "id": "2061013195968614651",
    "author": "ParikhRajesh07",
    "text": "@teortaxesTex If one thinks just frontier AI sure. But there are more paths to intelligence than frontier RL. We will move towards verticalization of training by role/task/domain and agentic RL and self learning drive the next growth.",
    "created_at": "Sun May 31 09:13:32 +0000 2026",
    "likes": 0,
    "views": "284",
    "url": "https://x.com/i/status/2061013195968614651",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 18,
    "topic_first_seen": "agentic RL training"
  },
  {
    "id": "2060996184190181634",
    "author": "lookmeintheai",
    "text": "Most agentic RL evals may be lying.\n\nSingle-turn RL can look great. But once the model calls tools mid-rollout, the training loop can quietly break.\n\nThat’s the real trap: we’re not just training answers anymore, we’re training messy workflows.\n\nHow many “agent gains” are actually eval plumbing?",
    "created_at": "Sun May 31 08:05:56 +0000 2026",
    "likes": 0,
    "views": "26",
    "url": "https://x.com/i/status/2060996184190181634",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 18,
    "topic_first_seen": "agentic RL training"
  },
  {
    "id": "2060890478518915455",
    "author": "hsu_steve",
    "text": "Underappreciated: The impact of high quality data on LLM training. \n\nPretraining data: mostly cleaned and curated by humans, but in some cases by big models using lots of inference tokens.\n\nBut for agentic post-training, need environments to RL tool use, actions, etc.\n\nNot sure about $10-15 billion per lab - only the 3 big US labs could afford that.",
    "created_at": "Sun May 31 01:05:54 +0000 2026",
    "likes": 33,
    "views": "5515",
    "url": "https://x.com/i/status/2060890478518915455",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 18,
    "topic_first_seen": "agentic RL training"
  },
  {
    "id": "2061454788223549632",
    "author": "ashmrz10",
    "text": "It connects language, images, video, audio, and actions in one unified model for perception, generation, world simulation, action prediction, and robot policy generation. https://t.co/wLpb9yvuT0",
    "created_at": "Mon Jun 01 14:28:16 +0000 2026",
    "likes": 0,
    "views": "11",
    "url": "https://x.com/i/status/2061454788223549632",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2061454518013878273/vid/avc1/420x270/AkC6McGUXo7IAVfx.mp4?tag=14"
    ],
    "round_first_seen": 21,
    "topic_first_seen": "audio language model"
  },
  {
    "id": "2061432585390346377",
    "author": "kimmonismus",
    "text": "3/ Inputs and outputs span text, image, video, audio AND action.\n\nThat last one is the big deal. Cosmos 3 was trained natively to generate actions, so the same checkpoint can run as a vision-language model, a video world model, or a robot policy. No multi-model orchestration. https://t.co/PoLsK33ytZ",
    "created_at": "Mon Jun 01 13:00:02 +0000 2026",
    "likes": 12,
    "views": "1543",
    "url": "https://x.com/i/status/2061432585390346377",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2061432531967479809/vid/avc1/478x270/MsgZ1jWr5kMPw5hL.mp4?tag=14"
    ],
    "round_first_seen": 21,
    "topic_first_seen": "audio language model"
  },
  {
    "id": "2061434074510880871",
    "author": "kidtsang",
    "text": "NVIDIA just dropped Cosmos 3 — an open world foundation model for physical AI combining vision reasoning, world generation & action prediction in ONE system. First fully open omnimodel with native multimodal generation. Physical AI just got a major upgrade. #AI #NVIDIA #PhysicalAI",
    "created_at": "Mon Jun 01 13:05:57 +0000 2026",
    "likes": 0,
    "views": "10",
    "url": "https://x.com/i/status/2061434074510880871",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJuvXfLaIAANOlY.jpg"
    ],
    "round_first_seen": 22,
    "topic_first_seen": "multimodal foundation model"
  },
  {
    "id": "2061489564280295840",
    "author": "YogeshBalaji95",
    "text": "We have released Cosmos3! Large-scale training is extremely challenging, and our team has been working tirelessly over the past several months to make this possible. Cosmos3 is an omni-model that combines reasoning and generation of images, video, audio, and actions in a single unified model. It is a leading model on a range of generation and reasoning benchmarks. And the best part - it is fully open source. Please try it out and let us know your feedback.\n\nHuggingFace: https://t.co/5RIEz8ER2v\nProject Website: https://t.co/yeXt4oQIX6\nCode: https://t.co/D9JIJguVzL",
    "created_at": "Mon Jun 01 16:46:27 +0000 2026",
    "likes": 1,
    "views": "10",
    "url": "https://x.com/i/status/2061489564280295840",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJvhW6JacAAhTt3.jpg"
    ],
    "round_first_seen": 27,
    "topic_first_seen": "audio reasoning model"
  },
  {
    "id": "2061501577270325489",
    "author": "AITrailblazerQ",
    "text": "📍 Cosmos 3 turns world-model leaderboards into an AI infrastructure map\n\nNVIDIA taking #1 in open-weight text-to-image and image-to-video matters less as a trophy than as a workload signal.\n\nPhysical AI does not scale like chat. A model that fuses language, image, video, audio, and action pushes demand toward simulation, synthetic data, robot policy training, video inference, and closed-loop evaluation. Those workloads are wider, heavier, and less forgiving than text tokens.\n\nSecond-order read: the scarce asset shifts from “who has the best model demo?” to “who can convert capex into powered, usable, high-utilization compute fast enough?”\n\nThat is where the market signal becomes measurable. Watch payments for property, plant, and equipment against operating cash flow. Watch data-center commitments against available power. Watch benchmark gains migrate into workloads that need continuous video and action generation, not just prompt response.\n\nWorld models are capability news on the surface.\n\nUnderneath, they are demand-formation events for GPUs, power contracts, cooling, networking, and Physical AI deployment capacity.\n\nBenchmarks show what is possible. Capex and power show who can scale it.",
    "created_at": "Mon Jun 01 17:34:11 +0000 2026",
    "likes": 1,
    "views": "66",
    "url": "https://x.com/i/status/2061501577270325489",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 31,
    "topic_first_seen": "audio language model"
  },
  {
    "id": "2061506641120641494",
    "author": "Alibaba_Qwen",
    "text": "👏👏 Introducing Qwen3.7-Plus — a multimodal agent model that unifies vision and language into one versatile agent foundation.\n\n✅ Multimodal interactive hybrid agent: unified GUI & CLI operation across visual and text tasks\n✅ Versatile coding agent & productivity assistant with full-modality input\n✅ Visual Agent: perception, reasoning, grounding, and search-augmented QA\n✅ Cross-harness generalization across diverse agent frameworks\n\nOne model. Sees, thinks, codes, acts.🙌🙌\n\nNow available via API on Alibaba Cloud Model Studio. Try it — let us know what you build.😎\n\n🔗🔗⬇️⬇️\nBlog：https://t.co/pVYf0h3NNa\nQwen Studio：https://t.co/HUYgFW4cYf\nAPI：https://t.co/viL0cXrMzW",
    "created_at": "Mon Jun 01 17:54:18 +0000 2026",
    "likes": 146,
    "views": "2834",
    "url": "https://x.com/i/status/2061506641120641494",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJvxXDCbsAAUdpK.jpg"
    ],
    "round_first_seen": 32,
    "topic_first_seen": "multimodal foundation model"
  },
  {
    "id": "2061484194866335893",
    "author": "oleg_kai",
    "text": "@_lopopolo the rebrand is honest. prompt is a side effect of the runtime now. trajectories are state, harness is the scheduler, the only durable artifact is the eval that catches drift. agent eng is distributed systems with a softer fail mode.",
    "created_at": "Mon Jun 01 16:25:07 +0000 2026",
    "likes": 1,
    "views": "15",
    "url": "https://x.com/i/status/2061484194866335893",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 35,
    "topic_first_seen": "agent harness eval"
  },
  {
    "id": "2061524272422461656",
    "author": "eddiboi",
    "text": "Alibaba just launched Qwen3.7-Plus! A multimodal agent model that unifies vision and language for one versatile foundation. It handles hybrid GUI + CLI ops, full-modality coding, visual reasoning, grounding, and search-augmented QA in a single model. Sees, thinks, codes, acts. 🎉\n\nhttps://t.co/ly9FZzdUh1\nhttps://t.co/lhgD7amfVV",
    "created_at": "Mon Jun 01 19:04:22 +0000 2026",
    "likes": 0,
    "views": "1",
    "url": "https://x.com/i/status/2061524272422461656",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJwBU8GWUAA7Tf4.jpg"
    ],
    "round_first_seen": 36,
    "topic_first_seen": "multimodal reasoning agent"
  },
  {
    "id": "2061523002924077540",
    "author": "ainews_24_7",
    "text": "LAUNCH: Alibaba $BABA introduced Qwen3.7-Plus, a multimodal agent model for vision, language, coding and productivity tasks.\n\nThe company says it can operate across GUI and CLI workflows, with visual reasoning, grounding and search-augmented QA.\n\nNow available via API.",
    "created_at": "Mon Jun 01 18:59:19 +0000 2026",
    "likes": 0,
    "views": "11",
    "url": "https://x.com/i/status/2061523002924077540",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 36,
    "topic_first_seen": "multimodal reasoning agent"
  },
  {
    "id": "2061520027191546106",
    "author": "naviera101",
    "text": "Alibaba just dropped Qwen3.7 Plus.\n\nThis multimodal agent fuses vision and language into one model. It handles GUI screens and CLI commands seamlessly for coding, productivity work, and image reasoning.\n\nIt shines on multimodal benchmarks and agent tasks, ranking strong in vision arenas while powering full perception-reasoning-action loops.",
    "created_at": "Mon Jun 01 18:47:30 +0000 2026",
    "likes": 0,
    "views": "5",
    "url": "https://x.com/i/status/2061520027191546106",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 36,
    "topic_first_seen": "multimodal reasoning agent"
  },
  {
    "id": "2061515639773413477",
    "author": "Luna_nime",
    "text": "Qwen3.7-Plus: Multimodal Agent Intelligence\n\nKey capabilities:\n• Visual Agent — perception, reasoning, grounding, search-augmented QA\n• GUI/CLI operation — reads screens, operates interfaces, navigates apps\n• Cross-harness — works with Claude Code, OpenClaw, Qwen Code\n• Coding agent — frontend prototyping to complex software engineering\n\nBenchmark wins vs Opus-4.6 Max, DeepSeek-V4-Pro Max, K2.6 Thinking:\n• Terminal-Bench 2.0: 70.3 (🥇 #1)\n• Deep-Planning: 62.3 (🥇 #1, next best is 58.9)\n• QwenWorldBench: 62.1 (🥇 #1)\n• Kernel Bench L3: 98% pass rate (🥇 tied #1)\n• MRCR-v2 128k: 91.7 (🥇 #1)\n• SpreadSheetBench: 86.3 (🥈 #2)\n• GPQA Diamond: 90.3 (competitive)\n• MCP-Mark: 58.7 (🥇 #1)\n\nAvailable via Alibaba Cloud Model Studio API.\n\nhttps://t.co/UBs39ryy8J",
    "created_at": "Mon Jun 01 18:30:04 +0000 2026",
    "likes": 1,
    "views": "37",
    "url": "https://x.com/i/status/2061515639773413477",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2061515509917691904/vid/avc1/480x270/zOC3ZGtPC0HdUHhG.mp4?tag=27"
    ],
    "round_first_seen": 36,
    "topic_first_seen": "multimodal reasoning agent"
  },
  {
    "id": "2061501791506997470",
    "author": "SaniAiTech",
    "text": "@Trae_ai @MiniMax_AI Great to see MiniMax-M3 expanding across TRAE 🚀 Native multimodal reasoning and stronger agent capabilities open up exciting possibilities for developers. 👏",
    "created_at": "Mon Jun 01 17:35:02 +0000 2026",
    "likes": 0,
    "views": "36",
    "url": "https://x.com/i/status/2061501791506997470",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 36,
    "topic_first_seen": "multimodal reasoning agent"
  },
  {
    "id": "2061499279470911751",
    "author": "ingliguori",
    "text": "How to build AI agents from scratch (9 steps):\n\n1. Purpose & scope\n\n2. I/O schemas\n\n3. System instructions\n\n4. Reasoning + tools\n\n5. Multi-agent orchestration\n\n6. Memory & context\n\n7. Multimodal\n\n8. Structured outputs\n\n9. UI / API\n\nShip agents that do work, not just talk. 🤖⚡️\nWhere are you on this roadmap?\n\nCredit: @getintoai\n#AIAgents #AgenticAI #GenAI #LLM",
    "created_at": "Mon Jun 01 17:25:03 +0000 2026",
    "likes": 1,
    "views": "90",
    "url": "https://x.com/i/status/2061499279470911751",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJvqq0XXkAAl7Ia.jpg"
    ],
    "round_first_seen": 36,
    "topic_first_seen": "multimodal reasoning agent"
  },
  {
    "id": "2061503065543647614",
    "author": "lmsysorg",
    "text": "Cosmos 3 is now supported in SGLang-Diffusion.\n\nCosmos 3 is NVIDIA’s open world model family for Physical AI, combining vision reasoning, world generation, and action-oriented multimodal modeling across text, images, video, audio, and actions.\n\nServe NVIDIA Cosmos3 generator models (Cosmos3-Nano, Cosmos3-Super, and specialized Super checkpoints) with native SGLang runtime and OpenAI-compatible APIs:",
    "created_at": "Mon Jun 01 17:40:06 +0000 2026",
    "likes": 17,
    "views": "1098",
    "url": "https://x.com/i/status/2061503065543647614",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJvs4KzWMAA1Sxb.jpg"
    ],
    "round_first_seen": 37,
    "topic_first_seen": "audio reasoning model"
  },
  {
    "id": "2061541148838203757",
    "author": "jtlin",
    "text": "Incredible that one open model can do all this. And possible to train a LoRA for the 16B on a single RTX PRO 6000 or DGX Spark.\n\n\"Cosmos 3 is the first omni-model for physical AI. It can understand and generate across: language · images · video · audio · action sequences\"",
    "created_at": "Mon Jun 01 20:11:26 +0000 2026",
    "likes": 0,
    "views": "12",
    "url": "https://x.com/i/status/2061541148838203757",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 41,
    "topic_first_seen": "audio language model"
  },
  {
    "id": "2061528577519087821",
    "author": "mr_r0b0t",
    "text": "\"Can one model understand the physical world deeply enough to reason about it, simulate it, and generate actions inside it? Cosmos 3 is the first omni-model for physical AI. It can understand and generate across: language · images · video · audio · action sequences\"",
    "created_at": "Mon Jun 01 19:21:28 +0000 2026",
    "likes": 2,
    "views": "135",
    "url": "https://x.com/i/status/2061528577519087821",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 41,
    "topic_first_seen": "audio language model"
  },
  {
    "id": "2061519535698780669",
    "author": "one3dsoul",
    "text": "Here's what happened when an \"ambient listening scribe\" was used in the operating room of o woman's reconstructive surgery.\nAn 'ambient listening scribe' is an audio recording version of an LLM (large language model AI)\nhttps://t.co/0Wg0PJ55Xx https://t.co/QkrqXiMdYs",
    "created_at": "Mon Jun 01 18:45:33 +0000 2026",
    "likes": 0,
    "views": "8",
    "url": "https://x.com/i/status/2061519535698780669",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJv9FFPbYAAcWb1.jpg"
    ],
    "round_first_seen": 41,
    "topic_first_seen": "audio language model"
  },
  {
    "id": "2061510147906617458",
    "author": "AliceLuo576009",
    "text": "Cosmos 3 is out 🚀🚀🚀— a meaningful milestone for Physical AI.\n\nCosmos 3 is an omnimodal world model built to understand and generate language, images, video, audio, and actions.\n\nI’m proud to have contributed on the data infrastructure side, helping deliver pretrain, midtrain, and SFT data.\n\nThis project reinforced something I deeply believe: frontier AI is not built by model architecture alone. The data layer — validation, versioning, quality, reliability, and iteration speed — is part of the model-building process.\n\nFor Physical AI, this matters even more. To build models that can understand, simulate, and act in the world, we need data systems that support both scale and trust.\n\nGrateful to the team, and excited to see what the community builds with Cosmos 3.",
    "created_at": "Mon Jun 01 18:08:14 +0000 2026",
    "likes": 0,
    "views": "10",
    "url": "https://x.com/i/status/2061510147906617458",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2061510036401106944/vid/avc1/480x270/yT4xA86O6sc9q_1B.mp4?tag=27"
    ],
    "round_first_seen": 41,
    "topic_first_seen": "audio language model"
  },
  {
    "id": "2061542355375903022",
    "author": "krigrv",
    "text": "Alibaba just dropped Qwen 3.7-Plus.\n\nA multimodal agent model that unifies vision and language into one foundation. GUI and CLI operation across visual and text tasks. Coding agent. Productivity assistant.\n\nOne model, multiple modes.\n\nCoding agents are getting better fast.\n\nWhat's the best agent model you've tested this month?\n\nhttps://t.co/Erqj4VFpBG\n\n#AI #Qwen #Multimodal #CodingAgents",
    "created_at": "Mon Jun 01 20:16:13 +0000 2026",
    "likes": 0,
    "views": "6",
    "url": "https://x.com/i/status/2061542355375903022",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 42,
    "topic_first_seen": "multimodal foundation model"
  },
  {
    "id": "2061524711000207468",
    "author": "Ali_TongyiLab",
    "text": "Introducing Qwen3.7-Plus, the latest flagship addition to the Qwen3.7 model series. \n\nBuilt to bridge the gap between visual perception and terminal execution, it serves as a versatile foundation to power your diverse multimodal agent workflows. 🚀\n\nKey highlights:\n\n• Multimodal Interactive Hybrid Agent: Enables unified GUI & CLI operation across visual and text tasks.\n\n• Versatile Coding Agent & Productivity Assistant: Handles full-modality input to supercharge your daily productivity.\n\n• Visual Agent: Deepens   agent intelligence with advanced perception, reasoning, grounding, and search-augmented QA.\n\n• Cross-Harness Generalization: Delivers consistent, robust performance across diverse agent frameworks.\n\nThe next generation of Qwen3.7 family has arrived to support your AI agent workflows.",
    "created_at": "Mon Jun 01 19:06:07 +0000 2026",
    "likes": 92,
    "views": "2917",
    "url": "https://x.com/i/status/2061524711000207468",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJwBfAtaoAAKrr7.jpg"
    ],
    "round_first_seen": 42,
    "topic_first_seen": "multimodal foundation model"
  },
  {
    "id": "2061520225066205396",
    "author": "Pawankalhans",
    "text": "Alibaba just dropped Qwen3.7-Plus.\n\nOne model that sees, thinks, \ncodes, and acts.\n\nGUI + CLI. Vision + Language.\nCoding agent + Productivity assistant.\n\nAll in one multimodal agent foundation.\n\nThe AI race just got more intense. 👀\n\n#Qwen #Alibaba #AI #Tech https://t.co/xe9mO4Ycgv",
    "created_at": "Mon Jun 01 18:48:17 +0000 2026",
    "likes": 0,
    "views": "24",
    "url": "https://x.com/i/status/2061520225066205396",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJv9tLTaYAAFE7i.jpg"
    ],
    "round_first_seen": 42,
    "topic_first_seen": "multimodal foundation model"
  },
  {
    "id": "2061513119151931727",
    "author": "Elaina43114880",
    "text": "Qwen3.7-Plus(GA) is officially here! 🥳\n\nA multimodal agent model that unifies vision and language into one versatile foundation.\n\nIt features multimodal hybrid interaction (GUI + CLI), versatile coding & productivity capabilities, strong visual agent abilities, and excellent cross-harness generalization.\n\nOne model. Sees, thinks, codes, acts. \n\nBig model releases two days in a row! The June wave of model releases is really heating up.... 🌊👀",
    "created_at": "Mon Jun 01 18:20:03 +0000 2026",
    "likes": 9,
    "views": "316",
    "url": "https://x.com/i/status/2061513119151931727",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 42,
    "topic_first_seen": "multimodal foundation model"
  },
  {
    "id": "2061568160050884647",
    "author": "itsnicholash",
    "text": "AI launches today worth clocking:\n\n• NVIDIA Cosmos 3: an omnimodal world model for physical AI across language, images, video, audio, and actions.\nSource: \n\n• Project Eden from VAST AI Research and Tripo: a persistent multiplayer world model where state and rendering are separated so environments can keep evolving.\nSource: \n\n• TripoSplat: MIT-licensed image-to-3D Gaussian generation for assets, AR/VR, games, and simulation.\nSource: \n\nPattern: AI creative tools are moving toward persistent systems, usable assets, and agent-ready environments.",
    "created_at": "Mon Jun 01 21:58:46 +0000 2026",
    "likes": 0,
    "views": "20",
    "url": "https://x.com/i/status/2061568160050884647",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJwpUA1WIAAz4g4.jpg"
    ],
    "round_first_seen": 51,
    "topic_first_seen": "audio language model"
  },
  {
    "id": "2061553642017046574",
    "author": "GenAISpotlight",
    "text": "🤖 𝗔𝗹𝗶𝗯𝗮𝗯𝗮 𝗗𝗿𝗼𝗽𝘀 𝗤𝘄𝗲𝗻𝟯.𝟳-𝗣𝗹𝘂𝘀: 𝗠𝘂𝗹𝘁𝗶𝗺𝗼𝗱𝗮𝗹 𝗔𝗴𝗲𝗻𝘁 𝗧𝗼𝗽𝘀 𝗧𝗲𝗿𝗺𝗶𝗻𝗮𝗹-𝗕𝗲𝗻𝗰𝗵 𝗮𝘁 𝟳𝟬.𝟯%\n\nAlibaba launched Qwen3.7-Plus, a multimodal foundation agent model that seamlessly blends visual perception and terminal execution in one loop.\n\nWhy it matters:\n\n🖥 Visual GUI & CLI Hybrid: Reads screens, operates GUIs, writes code from visual references, and navigates mobile apps end-to-end.\n\n🏆 Leaderboard Topping: Hits 70.3% on Terminal-Bench 2.0, beating Anthropic Opus 4.6 (65.4%) and DeepSeek-V4-Pro (67.9%).\n\n🧠 Deep Planning: Pushes the complex Deep-Planning benchmark to 62.3%, beating Opus at 58.9%.\n\n🔌 Cross-Harness: Works consistently across Claude Code, OpenClaw, and Qwen Code.\n\n💰 The pricing for Qwen3.7-Plus is: Input Tokens: $2.50 per 1 million tokens / Output Tokens: $7.50 per 1 MTok (Cached Input: $0.25 per 1 MTok, which is a 90% discount for repeated long-context calls.)\n\nAvailable today via API on Alibaba Cloud Model Studio.\n\nQwen Blog\n\n#Alibaba #Qwen #ModelRelease #AIAgents\n\n───\n🤖 𝗙𝗼𝗿 𝗺𝗼𝗿𝗲 𝗔𝗜 𝗻𝗲𝘄𝘀 𝗮𝗻𝗱 𝘀𝘁𝗼𝗿𝘆 𝘀𝗼𝘂𝗿𝗰𝗲𝘀, 𝘀𝗲𝗮𝗿𝗰𝗵 \"𝗚𝗲𝗻𝗔𝗜𝗦𝗽𝗼𝘁\" 𝗼𝗻 𝗧𝗲𝗹𝗲𝗴𝗿𝗮𝗺",
    "created_at": "Mon Jun 01 21:01:04 +0000 2026",
    "likes": 0,
    "views": "6",
    "url": "https://x.com/i/status/2061553642017046574",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJwcHIGWsAEGazK.jpg"
    ],
    "round_first_seen": 52,
    "topic_first_seen": "multimodal foundation model"
  },
  {
    "id": "2061568473629700235",
    "author": "mustafaergisi",
    "text": "@arcprize Spent last weekend hand-pruning agent traces for my eval harness. 1.5% on ARC-AGI-3 at $10K resets the cost-per-signal math I was working from.",
    "created_at": "Mon Jun 01 22:00:00 +0000 2026",
    "likes": 0,
    "views": "70",
    "url": "https://x.com/i/status/2061568473629700235",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 55,
    "topic_first_seen": "agent harness eval"
  },
  {
    "id": "2061591235526303909",
    "author": "ng_thanh8",
    "text": "Qwen3.7-Plus has been launched as a unified multimodal agent model that combines vision and language into one flexible system.\n\nIt supports:\n- GUI and CLI tasks across image and text workflows\n- Multimodal coding and productivity assistance\n- Image perception, reasoning, grounding, and search-augmented QA\n- Cross-framework agent generalization\n\nOne model. See, think, code, and act.\n\nNow available via API on Alibaba Cloud Model Studio.\nBlog: https://t.co/IZkLSRDrra\nQwen Studio: https://t.co/XW8ULtST9Y\nAPI: Alibaba Cloud Model Studio",
    "created_at": "Mon Jun 01 23:30:27 +0000 2026",
    "likes": 0,
    "views": "5",
    "url": "https://x.com/i/status/2061591235526303909",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJw-H6oa8AAiLmK.png"
    ],
    "round_first_seen": 56,
    "topic_first_seen": "multimodal reasoning agent"
  },
  {
    "id": "2061584977397989606",
    "author": "eddiboi",
    "text": "NVIDIA just launched Cosmos 3! Omni-model for physical AI hits #1 across 8+ open leaderboards in reasoning video gen and robot actions. Single model does language images audio actions with dual-tower arch fully open. 👀🎉\n\nhttps://t.co/1Xl1CjmHKy\nhttps://t.co/HGfOp0hCTY https://t.co/fie2tPcoxU",
    "created_at": "Mon Jun 01 23:05:35 +0000 2026",
    "likes": 0,
    "views": "59",
    "url": "https://x.com/i/status/2061584977397989606",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJw4dZkWgAA1msg.png"
    ],
    "round_first_seen": 57,
    "topic_first_seen": "audio reasoning model"
  },
  {
    "id": "2061596191297839380",
    "author": "scholarpulse",
    "text": "Alibaba just dropped Qwen3.7-Plus to unify vision and language into a single multi-modal agent foundation that currently dominates major agentic and visual benchmarks. 🚀\n\n#Qwen37Plus #AlibabaQwen #AIAgents #LLM #TechNews",
    "created_at": "Mon Jun 01 23:50:09 +0000 2026",
    "likes": 1,
    "views": "26",
    "url": "https://x.com/i/status/2061596191297839380",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 59,
    "topic_first_seen": "multi-modal LLM agent"
  },
  {
    "id": "2061616994286530643",
    "author": "Awesome_AI_News",
    "text": "Alibaba launches Qwen3.7-Plus, a next-gen multimodal agent model, accelerating domestic LLM evolution toward embodied intelligence and advanced agents. It inherits Qwen3.7's text capabilities, achieves multimodal breakthroughs, and provides a core foundation for edge and complex workflow applications.....\n\n阿里推出新一代多模态智能体模型Qwen3.7-Plus，标志着国产大模型向具身智能与高级智能体演进加速。该模型继承Qwen3.7强大文本处理能力，在多模态领域实现技术飞跃，为端侧与复杂工作流应用提供核心底座迭代。",
    "created_at": "Tue Jun 02 01:12:49 +0000 2026",
    "likes": 1,
    "views": "13",
    "url": "https://x.com/i/status/2061616994286530643",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 62,
    "topic_first_seen": "multimodal foundation model"
  },
  {
    "id": "2061590646591472022",
    "author": "alibaba_cloud",
    "text": "👏👏 Introducing Qwen3.7-Plus — a multimodal agent model that unifies vision and language into one versatile agent foundation.\n\n✅ Multimodal interactive hybrid agent: unified GUI & CLI operation across visual and text tasks\n✅ Versatile coding agent & productivity assistant with full-modality input\n✅ Visual Agent: perception, reasoning, grounding, and search-augmented QA\n✅ Cross-harness generalization across diverse agent frameworks\n\nOne model. Sees, thinks, codes, acts.🙌🙌\n\nNow available via API on Alibaba Cloud Model Studio. Try it — let us know what you build.😎\n\n🔗🔗⬇️⬇️\nBlog：https://t.co/uN7ALtg4Jl\nQwen Studio：https://t.co/NoHPKdsy1t\nAPI：https://t.co/4X4WDI446Y",
    "created_at": "Mon Jun 01 23:28:07 +0000 2026",
    "likes": 16,
    "views": "807",
    "url": "https://x.com/i/status/2061590646591472022",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJw9JQAbQAAoj0z.jpg"
    ],
    "round_first_seen": 62,
    "topic_first_seen": "multimodal foundation model"
  },
  {
    "id": "2061626040301285790",
    "author": "MetaEraHK",
    "text": "🤖 Alibaba Qwen Launches Qwen3-Plus\n\n@Alibaba_Qwen introduced Qwen3-Plus, a multimodal agent model that combines vision, language and coding capabilities.\n\nThe model supports GUI and CLI tasks, visual reasoning, search-augmented QA and agent workflows, and is now available via API on Alibaba Cloud Model Studio.",
    "created_at": "Tue Jun 02 01:48:45 +0000 2026",
    "likes": 0,
    "views": "169",
    "url": "https://x.com/i/status/2061626040301285790",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 66,
    "topic_first_seen": "multimodal reasoning agent"
  },
  {
    "id": "2061619082089357343",
    "author": "Chinazhidx",
    "text": "Qwen3.7-Plus goes live!\n\nIt independently built a vocabulary-learning app in 11hrs: 10k+ lines of code &amp; 1k+ tool calls, covering full-cycle R&amp;D and iteration.\n\nCapabilities: vision Agent, vision coding, browser automation, real-world perception &amp; multimodal reasoning #Qwen #LLM https://t.co/XRVqg66WfA",
    "created_at": "Tue Jun 02 01:21:06 +0000 2026",
    "likes": 1,
    "views": "54",
    "url": "https://x.com/i/status/2061619082089357343",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJxXkz_bwAA6BNP.jpg",
      "https://pbs.twimg.com/media/HJxXllmaAAAjWFt.jpg"
    ],
    "round_first_seen": 66,
    "topic_first_seen": "multimodal reasoning agent"
  },
  {
    "id": "2061508568893825281",
    "author": "HuggingPapers",
    "text": "Tencent just released Universal Audio Tokenizer on Hugging Face\n\nA compact single-codebook model\n\nthat uniquely combines general audio perception\n\nand linguistic alignment\n\nfor seamless Audio-LLM integration. https://t.co/5g7MTwqDIT",
    "created_at": "Mon Jun 01 18:01:58 +0000 2026",
    "likes": 32,
    "views": "2347",
    "url": "https://x.com/i/status/2061508568893825281",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJvzHevXAAAP8Dl.png"
    ],
    "round_first_seen": 67,
    "topic_first_seen": "audio reasoning model OR audio reasoning OR speech reasoning"
  },
  {
    "id": "2059983037006295445",
    "author": "Ali_TongyiLab",
    "text": "We are honored to share that our Fun series speech models have secured three #1 rankings in the latest Artificial Analysis @ArtificialAnlys speech model evaluation.\n\n🏆 Fun-Realtime-AudioChat\n— Speech Reasoning (Big Bench Audio): 97.6%, Ranked #1\n— Conversational Dynamics (Full Duplex): 97.8%, Ranked #1\n\n🏆 Fun-Realtime-ASR\n— AA-WER Index: 1.8%, Ranked #1\nEnd-to-end real-time voice, built for devs shipping AI agents & voice assistants. Stable, low-latency, production-ready.\n\n🔗 Leaderboard: https://t.co/iW21kzBKwT\n\nBuild with Fun →\nFun-Audio-Chat (realtime model coming soon):\nhttps://t.co/T9RMI2o6sM\nFun-ASR:\nhttps://t.co/0YdRgRycaV",
    "created_at": "Thu May 28 13:00:03 +0000 2026",
    "likes": 173,
    "views": "9268",
    "url": "https://x.com/i/status/2059983037006295445",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJZQTr_acAEwXV8.jpg",
      "https://pbs.twimg.com/media/HJZQTsBaYAAr8mV.jpg",
      "https://pbs.twimg.com/media/HJZQTtMasAAumPZ.jpg",
      "https://pbs.twimg.com/media/HJZQTtTbgAASLUT.jpg"
    ],
    "round_first_seen": 67,
    "topic_first_seen": "audio reasoning model OR audio reasoning OR speech reasoning"
  },
  {
    "id": "2061557980886180253",
    "author": "SoundPapers",
    "text": "MindVoice: Reconstructing Intelligible Speech from Non-invasive Neural Signals with Pretrained Priors\n\nGuangyin Bao, Taiping Zeng, Jianfeng Feng, Xiangyang Xue\nhttps://t.co/vguZLwTCQB [𝚌𝚜.𝚂𝙳 𝚌𝚜.𝙰𝙸] https://t.co/iYXMd3yp6t",
    "created_at": "Mon Jun 01 21:18:19 +0000 2026",
    "likes": 0,
    "views": "4",
    "url": "https://x.com/i/status/2061557980886180253",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJwgDwXW4AIrWFz.png"
    ],
    "round_first_seen": 67,
    "topic_first_seen": "audio reasoning model OR audio reasoning OR speech reasoning"
  },
  {
    "id": "2061161037202588026",
    "author": "tldr_ai_papers",
    "text": "Key findings: Narrative Framing is a low-latency semantic threat, while Acoustic Best-of-N reveals strong audio-space vulnerabilities. Defenses trade off robustness with usability.",
    "created_at": "Sun May 31 19:01:00 +0000 2026",
    "likes": 0,
    "views": "32",
    "url": "https://x.com/i/status/2061161037202588026",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 67,
    "topic_first_seen": "audio reasoning model OR audio reasoning OR speech reasoning"
  },
  {
    "id": "2061358361975874016",
    "author": "rfkortrump",
    "text": "@HronyG @88BWCSTROKER14 @vancebwcgroyp @goebbels_j62403 @paranoidist_ First of all, killing your child isn't a right.\n2nd of all\nFree stuff isn't a right, rights are natural, their based on our reasoning, stuff you can do to yourself and property.\nSo free speech, to right to have property, etc\n\"Human rights\" are dumb, its neo liberalism.",
    "created_at": "Mon Jun 01 08:05:06 +0000 2026",
    "likes": 2,
    "views": "24",
    "url": "https://x.com/i/status/2061358361975874016",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 67,
    "topic_first_seen": "audio reasoning model OR audio reasoning OR speech reasoning"
  },
  {
    "id": "2060242230632493274",
    "author": "ArxivSound",
    "text": "Yonggang Zhu, Liting Gao, Aidong Men, Wenwu Wang, \"COMET: Concept Space Dissection of the Modality Gap in Audio-Text Multimodal Contrastive Embeddings,\" https://t.co/Rk2ZR0iFuP",
    "created_at": "Fri May 29 06:09:59 +0000 2026",
    "likes": 3,
    "views": "492",
    "url": "https://x.com/i/status/2060242230632493274",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 67,
    "topic_first_seen": "audio reasoning model OR audio reasoning OR speech reasoning"
  },
  {
    "id": "2060762955461922936",
    "author": "signalxsoul",
    "text": "@FatherPhi Recent text models have a knowledge cut off of August 2025. Voice mode is based on GPT 4 family architecture, so it's cut off is much earlier.",
    "created_at": "Sat May 30 16:39:10 +0000 2026",
    "likes": 16,
    "views": "3543",
    "url": "https://x.com/i/status/2060762955461922936",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 67,
    "topic_first_seen": "audio reasoning model OR audio reasoning OR speech reasoning"
  },
  {
    "id": "2061507332547482090",
    "author": "edgeaivision",
    "text": "Voice AI + immersive audio need more than a generic CPU. @Cadence’s #EVS26 talk covers HiFi iQ for smart home, mobile and automotive audio: DSPs built for low-power speech, AI and always-on audio workloads.\n\nhttps://t.co/px7x9IZLzP\n\n#VoiceAI #EdgeAI #AudioAI https://t.co/Z8kbwS8SJY",
    "created_at": "Mon Jun 01 17:57:03 +0000 2026",
    "likes": 0,
    "views": "22",
    "url": "https://x.com/i/status/2061507332547482090",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJvx-kIaEAAZbqv.jpg"
    ],
    "round_first_seen": 67,
    "topic_first_seen": "audio reasoning model OR audio reasoning OR speech reasoning"
  },
  {
    "id": "2059711481047126224",
    "author": "sgurumur",
    "text": "Our RL + post training results. Thanks @baseten for collab :\n\n- overlapping confidence intervals between frontier model and a open source model was the least expected\n\n- With right harness, a 27B model with an iterative SFT can do legit end-to-end legal tasks (in some cases)\n\n- specialized post-training can push the cost/latency curve in a very different direction\n\n- some implicit learnings: models tend to make fewer grep calls, more full-document reading, more synthesis, more self-correction on legal work.\n\nBig takeaway for me is that agent performance is starting to look more like a systems + memory problem, not just a raw model scale problem.",
    "created_at": "Wed May 27 19:00:59 +0000 2026",
    "likes": 105,
    "views": "15076",
    "url": "https://x.com/i/status/2059711481047126224",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 68,
    "topic_first_seen": "agentic RL training OR RL agent training OR RL post-training"
  },
  {
    "id": "2059849677626085642",
    "author": "Besteuler",
    "text": "🚀 Meet Orbit: OFT-based RL infrastructure for stable, efficient post-training of trillion-parameter LLMs. \n\nOrbit can train 1T+ LLMs (e.g. Kimi-2.6, DeepSeek-V4-Pro) on a single GPU node (8xB200) with extremely small train-rollout gap!  \n\nCode: https://t.co/pyyOg6s7RQ\nBlog: https://t.co/Rc7S1zQUel\nBlog in Chinese: https://t.co/rvToBFG4Iq",
    "created_at": "Thu May 28 04:10:07 +0000 2026",
    "likes": 114,
    "views": "8315",
    "url": "https://x.com/i/status/2059849677626085642",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/tweet_video/HJYNhVMbcAAPVk-.mp4"
    ],
    "round_first_seen": 68,
    "topic_first_seen": "agentic RL training OR RL agent training OR RL post-training"
  },
  {
    "id": "2061438426994659753",
    "author": "Jackywine",
    "text": "今天的深度学习：\n\nAnthropic Academy 关于 Sub Agent的内容\n\n如果你也感兴趣，欢迎一起学\n\nhttps://t.co/ld13Otck8N https://t.co/JKSIC3Pt1F",
    "created_at": "Mon Jun 01 13:23:15 +0000 2026",
    "likes": 29,
    "views": "2911",
    "url": "https://x.com/i/status/2061438426994659753",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJuzJW8bkAAJAQo.jpg"
    ],
    "round_first_seen": 68,
    "topic_first_seen": "agentic RL training OR RL agent training OR RL post-training"
  },
  {
    "id": "2059323616009838703",
    "author": "billxbf",
    "text": "Excited to release 🌟Polar🌟, our Agent RL rollout infra for real-world harnesses. Be it Codex, Claude Code, OpenClaw, Hermes, or your self-made ones 🔥 -- Polar takes your harnesses directly as training environments without code change. \n\nFind a problem, design the harness, and train your own agents! 🧵",
    "created_at": "Tue May 26 17:19:45 +0000 2026",
    "likes": 896,
    "views": "127329",
    "url": "https://x.com/i/status/2059323616009838703",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJQpRNeWUAAv4Ka.jpg"
    ],
    "round_first_seen": 68,
    "topic_first_seen": "agentic RL training OR RL agent training OR RL post-training"
  },
  {
    "id": "2059991239890776269",
    "author": "adithya_s_k",
    "text": "Introducing Repo2RLEnv\n\nTurn any repository into runnable, verifiable coding environments built from real PRs and commits for coding-agent evaluation or RL training\n\n&gt; uv pip install repo2rlenv https://t.co/nOHVATWcs6",
    "created_at": "Thu May 28 13:32:39 +0000 2026",
    "likes": 455,
    "views": "57648",
    "url": "https://x.com/i/status/2059991239890776269",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2059983919844675585/vid/avc1/404x270/nx-pMcXGhEOfWbvg.mp4?tag=27"
    ],
    "round_first_seen": 68,
    "topic_first_seen": "agentic RL training OR RL agent training OR RL post-training"
  },
  {
    "id": "2061014446646898823",
    "author": "guifav",
    "text": "Your \"RL-trained reasoner\" is mostly a sampler.\n\nRL post-training doesn't teach base LLMs new reasoning — it concentrates mass on traces the base already rates plausible. @aakaran31 + @du_yilun (Harvard, arXiv:2510.14901) showed you can match RL by sampling x ∝ p(x)^α — no training, no data, no verifier.\n\nCatch: p^α is intractable. Their MH sampler picks a uniform cut and resamples the suffix. But reasoning traces hide a few consequential decisions in thousands of mundane tokens — a uniform cut mostly rewrites the tail.\n\nNew from @felix_zhou_cfz + @AnayMehrotra (@Yale + @Stanford, arXiv:2605.30327, May 28 2026): Entropy-Cut Metropolis-Hastings. Same MH frame, same target. Cut where next-token entropy jumps: Δt = max(0, h_t − h_{t−1}).\n\nValidated: top-decile Δt cuts give 1.33× more edit distance + 1.36× more distinct answers than bottom-decile. Entropy spikes mark decisions; the tail executes.\n\nTheory: mixing O(k) in semantic decisions vs Ω(T/b₁) in tokens — exponential separation when k ≪ T.\n\nAcross MATH500 / HumanEval / GPQA / AIME26 on Qwen2.5-7B, Qwen3-8B-Base, Phi-4-mini:\n• Qwen2.5-7B MATH500: 35.9 → 71.9\n• Qwen3-8B HumanEval: 47.6 → 79.3\n• Qwen2.5-Math-7B AIME26: 4.8 → 13.1\nBeats Standard, Low-Temp, SMC, TMC, Uniform-Cut MH. pass@k diversity preserved.\n\nCost: zero. Entropies already in the forward pass. Drop-in inside @vllm_project.\n\nFor tech leaders: reasoning gains aren't gated by RL budget — they're gated by where your MCMC cuts.\n\nhttps://t.co/QcsFqZ8lAQ",
    "created_at": "Sun May 31 09:18:30 +0000 2026",
    "likes": 2,
    "views": "177",
    "url": "https://x.com/i/status/2061014446646898823",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJoxt0xXoAM4Mrc.png"
    ],
    "round_first_seen": 68,
    "topic_first_seen": "agentic RL training OR RL agent training OR RL post-training"
  },
  {
    "id": "2061438839093076282",
    "author": "nv_pavlichenko",
    "text": "For the post-training we do SFT + RL with a mix of environments. We decided to not fuse reasoning, so there are two models in the end: Instruct and Thinking.\n\nRL was absolutely unpredictable in the beginning. Instruct run went smoothly but thinking kept blowing up due to training/inference mismatch. Guys did a lot of tricks with GRPO to stabilize it",
    "created_at": "Mon Jun 01 13:24:53 +0000 2026",
    "likes": 19,
    "views": "1401",
    "url": "https://x.com/i/status/2061438839093076282",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJuyytIXoAAF9y2.png"
    ],
    "round_first_seen": 68,
    "topic_first_seen": "agentic RL training OR RL agent training OR RL post-training"
  },
  {
    "id": "2059245173012304231",
    "author": "mark_k",
    "text": "Strawberry was RL Post-Training + Chains of Thought at inference time. CoT was already well known at that point, and if OpenAI hadn't put these components together, someone else would have done it a little later, I think.",
    "created_at": "Tue May 26 12:08:02 +0000 2026",
    "likes": 37,
    "views": "5400",
    "url": "https://x.com/i/status/2059245173012304231",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 68,
    "topic_first_seen": "agentic RL training OR RL agent training OR RL post-training"
  },
  {
    "id": "2060466215739425146",
    "author": "googledevs",
    "text": "Congratulations to the category winners 👏\n\n🏅 The Live Agent: Bryen Param for drone-copilot \n🏅 Creative Storyteller: Jeremiah Somoine for Sankofa\n🏅 UI Navigator: Enaiho Uwas Paul and Aman Kumar Sah for Moonwalk\n🏅 Best Multimodal Integration & User Experience: David Li for Wand \n🏅 Best Technical Execution & Agent Architecture: Matthew Keats for https://t.co/Su43CODTJO\n🏅 Best Innovation & Thought Leadership: Yusuf Elnady for Rayan Memory",
    "created_at": "Fri May 29 21:00:02 +0000 2026",
    "likes": 22,
    "views": "2356",
    "url": "https://x.com/i/status/2060466215739425146",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 69,
    "topic_first_seen": "multi-modal LLM agent OR multimodal agent OR VLM agent"
  },
  {
    "id": "2060354130867064933",
    "author": "AIDailyGems",
    "text": "Complete open-source AI collaboration suite and multi-agent platform featuring LLM orchestration, automation, and virtual assistants. Scales seamlessly from small deployments...\n\nI would benchmark this against Cursor, Claude Code, or\n\nhttps://t.co/elIWnFxjzR",
    "created_at": "Fri May 29 13:34:38 +0000 2026",
    "likes": 0,
    "views": "108",
    "url": "https://x.com/i/status/2060354130867064933",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 69,
    "topic_first_seen": "multi-modal LLM agent OR multimodal agent OR VLM agent"
  },
  {
    "id": "2061513100441121257",
    "author": "aisearchio",
    "text": "Qwen3.7-Plus is Alibaba's new multimodal agent model. It can see screens/videos, reason, code, and operate GUI + CLI workflows in one loop.\n\nhttps://t.co/o6IwwFmLuz https://t.co/hN3yRtDRpH",
    "created_at": "Mon Jun 01 18:19:58 +0000 2026",
    "likes": 54,
    "views": "2587",
    "url": "https://x.com/i/status/2061513100441121257",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2061512982585335808/vid/avc1/480x270/IGqw_2ZAbCpHBmlu.mp4?tag=27"
    ],
    "round_first_seen": 69,
    "topic_first_seen": "multi-modal LLM agent OR multimodal agent OR VLM agent"
  },
  {
    "id": "2060670628990960122",
    "author": "SciFi",
    "text": "Defending LLM-based Multi-Agent Systems Against Cooperative Attacks with Sentence-Level Rectification\n\nYaoyang Luo, Zhi Zheng, Ziwei Zhao, Tong Xu, Zhao Jielun, Wenjun Xue, Yong Chen, Enhong Chen\nhttps://t.co/AOo2GBrXLd [𝚌𝚜.𝙰𝙸] https://t.co/vZNpUl9IX8",
    "created_at": "Sat May 30 10:32:18 +0000 2026",
    "likes": 0,
    "views": "59",
    "url": "https://x.com/i/status/2060670628990960122",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJj5BHtWgAAR6up.png"
    ],
    "round_first_seen": 69,
    "topic_first_seen": "multi-modal LLM agent OR multimodal agent OR VLM agent"
  },
  {
    "id": "2060722721516872020",
    "author": "NousResearch",
    "text": "Step 3.7 Flash is now free for 30 days via Nous Portal\n\nIt is a new MoE vision-language model focused on agent efficiency, coding, search, and multimodal workflows — and Hermes Agent users have been loving it, so thank you to @StepFun_ai for hooking them up! https://t.co/2GKPXQ8PKE",
    "created_at": "Sat May 30 13:59:17 +0000 2026",
    "likes": 1494,
    "views": "1251250",
    "url": "https://x.com/i/status/2060722721516872020",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2060720695945117696/vid/avc1/320x320/4La6AfMIfYLlvWyZ.mp4?tag=27"
    ],
    "round_first_seen": 69,
    "topic_first_seen": "multi-modal LLM agent OR multimodal agent OR VLM agent"
  },
  {
    "id": "2061405692674937344",
    "author": "Chinazhidx",
    "text": "1/5 🤖 MiniMax M3 Officially Launched\n\nPowered by the new MSA sparse attention architecture with a 1M-token ultra-long context window. \n\nThis native multimodal model excels at coding and agent tasks, supporting image/video input and desktop control.",
    "created_at": "Mon Jun 01 11:13:10 +0000 2026",
    "likes": 0,
    "views": "155",
    "url": "https://x.com/i/status/2061405692674937344",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 69,
    "topic_first_seen": "multi-modal LLM agent OR multimodal agent OR VLM agent"
  },
  {
    "id": "2060589351760003442",
    "author": "Alacritic_Super",
    "text": "Java developers are entering their AI-agent era.\n\nAnd Alibaba is building one of the most ambitious open-source ecosystems for it.\n\nSpring AI Alibaba isn't just another LLM wrapper.\n\nIt's evolving into a full agentic AI framework for Java with support for multi-agent systems, workflow orchestration, MCP integrations, observability, evaluation pipelines and visual agent development tools.\n\nWhat's technically fascinating is the architecture.\n\nIt includes a DAG-based graph runtime for long-running stateful agents, an Agent Framework with context-engineering patterns, visual debugging tools, tracing systems, MCP management and integrations with Spring Boot, Nacos and enterprise infrastructure.\n\nRecent releases are pushing deeper into enterprise-grade agent development with support for Agentic workflows, multi-agent orchestration, evaluation systems and production-ready Java AI infrastructure.\n\nGitHub:\nhttps://t.co/k38qbJGXPq\n\nFollow @Alacritic_Super for more AI infrastructure, Java engineering & open-source breakthroughs 🚀",
    "created_at": "Sat May 30 05:09:20 +0000 2026",
    "likes": 0,
    "views": "99",
    "url": "https://x.com/i/status/2060589351760003442",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJivDWebgAAWOhJ.jpg",
      "https://video.twimg.com/tweet_video/HJivD5vawAA3JsF.mp4",
      "https://pbs.twimg.com/media/HJivDYybIAAsBLN.jpg"
    ],
    "round_first_seen": 69,
    "topic_first_seen": "multi-modal LLM agent OR multimodal agent OR VLM agent"
  },
  {
    "id": "2061508404024062329",
    "author": "acmeducation",
    "text": "...mode and multi-agent systems. Apply enterprise patterns and approaches for larger development teams. Analyze LLM capabilities to recognize language strengths and ethical boundaries. Resolve complex logic errors using automated detection and tracing techniques.",
    "created_at": "Mon Jun 01 18:01:19 +0000 2026",
    "likes": 1,
    "views": "64",
    "url": "https://x.com/i/status/2061508404024062329",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 69,
    "topic_first_seen": "multi-modal LLM agent OR multimodal agent OR VLM agent"
  },
  {
    "id": "2061641611625603236",
    "author": "AiDevCraft",
    "text": "@ai_database 項ごとの影響スコアをLLMに渡す設計が本質で、これは強化学習で言う「スカラー報酬→クレジット割り当て付きアドバンテージ」への進化と同じ筋ですね。試行錯誤コストが線形→対数オーダーに落ちるので、シンボリック回帰の探索空間爆発に対する正攻法だと思います。",
    "created_at": "Tue Jun 02 02:50:38 +0000 2026",
    "likes": 0,
    "views": "28",
    "url": "https://x.com/i/status/2061641611625603236",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 70,
    "topic_first_seen": "reward model RLHF agent OR RLHF agent OR process reward model"
  },
  {
    "id": "2060137529228329419",
    "author": "sferik",
    "text": "She RLHF on my reward model til I align.",
    "created_at": "Thu May 28 23:13:57 +0000 2026",
    "likes": 0,
    "views": "99",
    "url": "https://x.com/i/status/2060137529228329419",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 70,
    "topic_first_seen": "reward model RLHF agent OR RLHF agent OR process reward model"
  },
  {
    "id": "2060301558680580481",
    "author": "Memoirs",
    "text": "Focal Reward: Balanced Reinforcement Learning under Rubric-Based Rewards\n\nYu Huang, Zihua Zhao, Zhaoxin Huan, Wanli Gu, Feng Hong, Xinmu Ge, Lin Yuan, Weichang Wu, Qiang Hu, Xiaolu Zhang, Jun Zhou, Jiangchao Yao\nhttps://t.co/krHAM8k0xg [𝚌𝚜.𝙻𝙶]",
    "created_at": "Fri May 29 10:05:44 +0000 2026",
    "likes": 1,
    "views": "130",
    "url": "https://x.com/i/status/2060301558680580481",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 70,
    "topic_first_seen": "reward model RLHF agent OR RLHF agent OR process reward model"
  },
  {
    "id": "2059698240053039614",
    "author": "DJiafei",
    "text": "Thrilled to see TOPReward integrated into @LeRobotHF ! Since releasing this zero-shot approach, we've been blown away by community adoption — it's emerged as one of the top universal reward models, with applications reaching well beyond robotics.\n\nWith HF's support, we can't wait to see what people build with TOPReward next!",
    "created_at": "Wed May 27 18:08:22 +0000 2026",
    "likes": 35,
    "views": "2686",
    "url": "https://x.com/i/status/2059698240053039614",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 70,
    "topic_first_seen": "reward model RLHF agent OR RLHF agent OR process reward model"
  },
  {
    "id": "2061627387226529829",
    "author": "web3nomad",
    "text": "the self-doubt is probably trained in. RLHF rewards hedging when the model is uncertain, and unsolved problems are maximum uncertainty. so the model learned: unknown territory = hedge aggressively. not a reasoning failure, a training signal problem. need to reward confident wrong attempts, not just cautious correct ones",
    "created_at": "Tue Jun 02 01:54:06 +0000 2026",
    "likes": 0,
    "views": "6",
    "url": "https://x.com/i/status/2061627387226529829",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 70,
    "topic_first_seen": "reward model RLHF agent OR RLHF agent OR process reward model"
  },
  {
    "id": "2060482221245640709",
    "author": "HeMuyu0327",
    "text": "The multi-agent RL terrain is beautiful to watch, besides Claude's launch of Workflow. \n\nFor one, Kimi's RL work on \"agent swarm\" is a masterclass on reward design: three weighted reward terms covering output success, how many subagents there are, and how many of their subtasks are successful. \n\nRight now, this type of multi-agent RL is focusing on the scenario of one main agent + several subagents, and we are already now in the age of subagents calling subagents calling subagents in a nested loop.\n\nIt will be very interesting to see what kind of reward design we can come up with this \"recursive\" multi-agent or if it's ever needed. Kimi explicitly mentions in the report that since the reward for the subagents is hard to define, they only update the weights of the main agent.",
    "created_at": "Fri May 29 22:03:38 +0000 2026",
    "likes": 52,
    "views": "7060",
    "url": "https://x.com/i/status/2060482221245640709",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 70,
    "topic_first_seen": "reward model RLHF agent OR RLHF agent OR process reward model"
  },
  {
    "id": "2061237890567303636",
    "author": "gavinz0228",
    "text": "Human values are diverse — so why does RLHF use a single static reward model? 📌 New work on in-context reward adaptation makes preference modeling robust to unseen preferences without retraining. #AI #RLHF https://t.co/1JmkePn19N",
    "created_at": "Mon Jun 01 00:06:23 +0000 2026",
    "likes": 0,
    "views": "3",
    "url": "https://x.com/i/status/2061237890567303636",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 70,
    "topic_first_seen": "reward model RLHF agent OR RLHF agent OR process reward model"
  },
  {
    "id": "2061497707676496272",
    "author": "Philip_MIT",
    "text": "@yigitkkorkmaz @adcock_brett very cool! @yigitkkorkmaz - if interested, check out our new reward model (SOLE-R1) for online RL\n\njust announced it on X here: https://t.co/iuDAciaqDC",
    "created_at": "Mon Jun 01 17:18:48 +0000 2026",
    "likes": 0,
    "views": "35",
    "url": "https://x.com/i/status/2061497707676496272",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 70,
    "topic_first_seen": "reward model RLHF agent OR RLHF agent OR process reward model"
  },
  {
    "id": "2059073124260335979",
    "author": "HuggingPapers",
    "text": "StepAudio 2.5: one model for speech recognition, synthesis, and live dialogue\n\nA unified audio-language foundation model that uses task-tailored RLHF to match or exceed specialized systems across ASR, text-to-speech, and real-time spoken interaction. https://t.co/YywHtuOAjc",
    "created_at": "Tue May 26 00:44:23 +0000 2026",
    "likes": 32,
    "views": "3078",
    "url": "https://x.com/i/status/2059073124260335979",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJNMF_pXsAAZPzb.jpg"
    ],
    "round_first_seen": 71,
    "topic_first_seen": "audio language model OR audio LLM OR speech LLM"
  },
  {
    "id": "2060021901234458958",
    "author": "ArtificialAnlys",
    "text": "Announcing AA-WER Streaming, our new benchmark measuring streaming Speech to Text models on accuracy and latency for voice agent use cases. Pareto optimal models on this new benchmark include those from Cartesia, ElevenLabs, and Deepgram\n\nStreaming Speech to Text (STT) powers real-time transcription in voice agents and live captioning, where models must balance accuracy against speed. Fast transcripts are especially important for keeping responses feeling natural and leaves more of the response-time budget for reasoning and tool calls. Accuracy also matters since transcription errors compound in downstream reasoning and speech generation.\n\nStreaming STT models transcribe audio as it is fed in, sharing outputs continuously, unlike offline (batch) models that process the entire file at once and are typically slower.\n\nWhat we measure:\nAA-WER Streaming reports Word Error Rate and latency together, measured from the moment end of speech is detected, with a Pareto line of increasing accuracy as time to transcript received increases. For direct comparability to offline models on accuracy, we test these streaming models on the same ~8 hours of audio as our offline benchmark, AA-WER v2.0: AA-AgentTalk, Earnings22-Cleaned-AA, VoxPopuli-Cleaned-AA.\n\nWe measure WER and latency as paired metrics at two points after Silero VAD-detected end of speech:\nFirst Final Transcription: WER is measured on the first final-denoted transcript returned after end of speech is detected. Latency is the time in seconds from end of speech to that final-denoted transcript. This is more useful for understanding performance as a standalone streaming transcription model, and for higher accuracy.\nFirst Partial Transcription: WER is measured on the first transcript-bearing event (partial or final) returned after end of speech is detected. Latency is the time in seconds from end of speech to that first transcript event. This is more useful for near instantaneous transcription for lower-accuracy tasks like responding to \"yes\" or \"no\" questions, or for speculative decoding.\n\nKey results:\n➤ Highest accuracy on Final after End of Speech: @Cartesia Ink-2 (semantic endpoints) at 3.59% WER, 0.21s latency, followed by ElevenLabs Scribe v2 Realtime (3.64%, 0.14s) and Cartesia Ink-2 (external endpoints) (3.66%, 0.09s)\n➤ Highest accuracy on First Partial after End of Speech: @ElevenLabs Scribe v2 Realtime at 3.65% WER, 0.13s latency, followed by Cartesia Ink-2 (external endpoints) (4.33%, 0.07s) and @AssemblyAI U3 Realtime Pro (4.46%, 0.47s)\n➤ Fastest transcription: @DeepgramAI Flux leads both Final and Partial at 0.020s and 0.019s respectively (both 7.36% WER). On Final, it's followed by @soniox_ai Realtime and Deepgram Nova-3 Realtime (both 0.06s); on First Partial, it’s followed by @NVIDIA Nemotron 3 ASR 80ms (0.04s) and Soniox Realtime (0.05s)\n\nCharts below include a Pareto frontier of accuracy vs. speed, so you can shortlist the models that best fit your latency constraints while still achieving high accuracy. See below for further detail ⬇️",
    "created_at": "Thu May 28 15:34:29 +0000 2026",
    "likes": 146,
    "views": "12506",
    "url": "https://x.com/i/status/2060021901234458958",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJaqF2BbEAASXvf.jpg"
    ],
    "round_first_seen": 71,
    "topic_first_seen": "audio language model OR audio LLM OR speech LLM"
  },
  {
    "id": "2060242024260161899",
    "author": "ArxivSound",
    "text": "Qingliang Meng, Yuqing Deng, Wei Liang, Limei Yu, Huizhi Liang, Tian Li, \"FNH-TTS: Mixture-of-Experts Duration Modeling for Robust Neural Speech Synthesis,\" https://t.co/nQD6ibPwt1",
    "created_at": "Fri May 29 06:09:10 +0000 2026",
    "likes": 11,
    "views": "1113",
    "url": "https://x.com/i/status/2060242024260161899",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 71,
    "topic_first_seen": "audio language model OR audio LLM OR speech LLM"
  },
  {
    "id": "2061545919057903869",
    "author": "dbreunig",
    "text": "I'm sorry, but I just listened to this and it's bad.\n\n- Starts with a metaphor that doesn't map\n- Rephrases an article that would take ~3 min to read and ~4 minutes to speak into 3:29 of mangled flat speech.\n- Rephrasing added classic LLM slop like dramatic short, punctuating sentences and needless hype framing.\n\nJust plugging the original article into TTS would have been infinitely better.",
    "created_at": "Mon Jun 01 20:30:23 +0000 2026",
    "likes": 0,
    "views": "8",
    "url": "https://x.com/i/status/2061545919057903869",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 71,
    "topic_first_seen": "audio language model OR audio LLM OR speech LLM"
  },
  {
    "id": "2059143678061428760",
    "author": "tom_doerr",
    "text": "Curated survey of multimodal architectures organized by paradigm\n\nhttps://t.co/t8Zfh7pLlj https://t.co/J54b8Zampo",
    "created_at": "Tue May 26 05:24:44 +0000 2026",
    "likes": 25,
    "views": "3433",
    "url": "https://x.com/i/status/2059143678061428760",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJOMQxSXMAA7cWB.png"
    ],
    "round_first_seen": 72,
    "topic_first_seen": "multimodal foundation model OR omni model OR omnimodal model"
  },
  {
    "id": "2059786093697044794",
    "author": "AINativeF",
    "text": "12. Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini\n\n🔑 Keywords: Multimodal embedding, zero-shot performance, contrastive learning, retrieval, Gemini Embedding 2\n\n💡 Category: Multi-Modal Learning\n\n🌟 Research Objective:\n   - Introduce Gemini Embedding 2, a multimodal embedding model that unifies representations for video, audio, image, and text data.\n\n🛠️ Research Methods:\n   - Implement large-scale contrastive learning in a multi-task, multi-stage training setup to improve embedding performance.\n\n💬 Research Conclusions:\n   - Achieved state-of-the-art performance on key embedding benchmarks, demonstrating superior zero-shot performance across specialized domains and a wide range of tasks.\n\n👉 Paper link: https://t.co/8xxDDrfam2",
    "created_at": "Wed May 27 23:57:28 +0000 2026",
    "likes": 0,
    "views": "36",
    "url": "https://x.com/i/status/2059786093697044794",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJXUiYDbUAAcfkP.jpg"
    ],
    "round_first_seen": 72,
    "topic_first_seen": "multimodal foundation model OR omni model OR omnimodal model"
  },
  {
    "id": "2059661302260211759",
    "author": "ggg78g89",
    "text": "@fofrAI How great will be Omni pro if Omni flash is such a great model!",
    "created_at": "Wed May 27 15:41:35 +0000 2026",
    "likes": 0,
    "views": "169",
    "url": "https://x.com/i/status/2059661302260211759",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 72,
    "topic_first_seen": "multimodal foundation model OR omni model OR omnimodal model"
  },
  {
    "id": "2061075716289228839",
    "author": "rajkumar_rr",
    "text": "The Rise of Omnimodal AI: How Systems Like Gemini Could Transform Human–AI Interaction https://t.co/MW8EMEr0L1",
    "created_at": "Sun May 31 13:21:58 +0000 2026",
    "likes": 0,
    "views": "10",
    "url": "https://x.com/i/status/2061075716289228839",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 72,
    "topic_first_seen": "multimodal foundation model OR omni model OR omnimodal model"
  },
  {
    "id": "2060681613353140597",
    "author": "LifeboatHQ",
    "text": "The universal theory of structure: a fundamental ontology for ontic structural realism https://t.co/UGXjHqxiRz",
    "created_at": "Sat May 30 11:15:56 +0000 2026",
    "likes": 0,
    "views": "38",
    "url": "https://x.com/i/status/2060681613353140597",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 72,
    "topic_first_seen": "multimodal foundation model OR omni model OR omnimodal model"
  },
  {
    "id": "2060947646282608879",
    "author": "rajkumar_rr",
    "text": "What Is Gemini Omni? Google’s New Multimodal AI Explained (2026) https://t.co/MnDOh34PpJ",
    "created_at": "Sun May 31 04:53:04 +0000 2026",
    "likes": 0,
    "views": "21",
    "url": "https://x.com/i/status/2060947646282608879",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 72,
    "topic_first_seen": "multimodal foundation model OR omni model OR omnimodal model"
  },
  {
    "id": "2060203644436525325",
    "author": "lagerskoy",
    "text": "GOOGLE JUST SHIPPED GEMINI OMNI AT I/O 2026 LAST WEEK AND THE ENTIRE VIDEO EDITING INDUSTRY HAS 12 MONTHS BEFORE ITS BUSINESS MODEL COLLAPSES. THIS IS NOT FUTURE TECH. THIS IS LIVE TODAY. CONVERSATIONAL VIDEO EDITING THROUGH NATURAL LANGUAGE WITH PHYSICS-AWARE COMPOSITING. MOST PEOPLE WILL NOT REALIZE WHAT JUST HAPPENED FOR 6 MORE MONTHS.\n\nHere's what Omni actually does.\n\nYou upload one video. You describe a change in plain English. The model executes the change at scene-aware fidelity. Add a lion to the floor that respects perspective and lighting. Make the curtains open while preserving the rest of the room. Change a plastic bottle to metal. Fill an empty bottle with water. Make lights flicker on a snap of your fingers.\n\nThe output is photorealistic. The geometry is correct. The lighting matches the scene. The physics are coherent.\n\nThis is not Veo 3 generating new video from text. This is Omni editing existing video through conversation. The distinction matters more than the tech press is processing.\n\nNow here's the math nobody is doing.\n\nA traditional VFX artist charges $150 to $400 per hour to composite a single complex element into existing footage. The work that goes into adding a CGI lion to a hotel room shot with correct lighting and perspective takes 20 to 40 hours of skilled labor. Total cost runs $3,000 to $16,000 per shot.\n\nThis pipeline executes the same shot in 90 seconds through natural language for the cost of a Google AI Premium subscription.\n\nThe freelance VFX community on Reddit is already arguing about whether this is real or marketing. The answer is both. The demos are real. The output quality is comparable to mid-tier post-production work. The marketing is also real because Google is positioning this as the consumer-facing video creation surface of the next decade.\n\nThis is the seventh creative compression of 2026.\n\nClaude Design collapsed the design layer. KIMI K2.6 collapsed the coding layer. Rodin Gen 2.5 collapsed the 3D asset layer. 21st dev plus Claude Code collapsed the landing page production layer. The Vision API plus Gemini plus Sora 2 stack collapsed the UGC video production layer. Section Store with AI Conversion Blocks collapsed the e-commerce CRO layer. Gemini Omni just collapsed the video post-production layer.\n\nSeven creative industries that defined entire freelance economies 18 months ago are now subscription stacks running under $500 a month combined.\n\nThe part most builders will miss.\n\nThe opportunity is not making videos for yourself. The opportunity is what happens when every solo creator has access to mid-tier VFX studio capability for $20 a month.\n\nA real estate agent can now produce property tour videos with cinematic lighting adjustments without hiring a film crew. A small business owner can produce product demonstration videos with photorealistic CGI element additions without paying a VFX studio. A teacher can produce educational content with complex visual demonstrations without learning After Effects.\n\nThe market floor just got lifted by 4 orders of magnitude.\n\nHere's what is actually happening across the entire creative stack in 2026.\n\nEvery layer of professional creative production that required specialized expertise 18 months ago is collapsing into conversational AI interfaces this year. Design. Code. 3D assets. Landing pages. UGC video. E-commerce optimization. Now video post-production.\n\nThe freelance economy built on selling these specialized skills is being repriced in real time. The clients who paid $5,000 for a UGC video. The brands who paid $15,000 for a landing page. The agencies who charged $25,000 for a product page redesign. The VFX shops who billed $10,000 per CGI shot. All of them are watching their margins compress while their clients ask why the AI version is not good enough.\n\nThe smart freelancers pivot from execution to strategy. The ones who try to compete on execution against AI lose every contract within 18 months.\n\nThe window for solo operators to capture the next wave of creative production is open right now.\n\nOpen Gemini. Upload one video. Describe one change in plain English. Watch what the next decade of creative production looks like.\n\nThen decide whether you are positioning to ship at this leverage or watching from the sidelines while others build the next category of creative business with tools that did not exist 30 days ago.",
    "created_at": "Fri May 29 03:36:40 +0000 2026",
    "likes": 73,
    "views": "9920",
    "url": "https://x.com/i/status/2060203644436525325",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2060203095100174336/vid/avc1/320x568/kn5kQf1x3zCCt9ps.mp4?tag=27"
    ],
    "round_first_seen": 72,
    "topic_first_seen": "multimodal foundation model OR omni model OR omnimodal model"
  },
  {
    "id": "2060402294164713784",
    "author": "ChanPerco",
    "text": "Le temps passé des agences n’a plus de valeur.\n\nLes modèles de rémunération des agences évoluent progressivement vers des logiques de performance, de livrables ou d’abonnement. https://t.co/Z39IWo4cgx",
    "created_at": "Fri May 29 16:46:02 +0000 2026",
    "likes": 1,
    "views": "120",
    "url": "https://x.com/i/status/2060402294164713784",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJgE99CXsAEIQ7p.jpg"
    ],
    "round_first_seen": 73,
    "topic_first_seen": "agentic reward model OR agentic RL OR agent reward"
  },
  {
    "id": "2059228776626860127",
    "author": "0xbeepit",
    "text": "The next generation of traders won’t trade manually, they’ll deploy agents.\n\nBeep is partnering with @BitgetWallet to onboard the next wave of Agent Traders.\n\n$35,000 in rewards is now live:\n\n• $30,000 user reward pool \n• $5,000 trading leaderboard\n• 2x Beeper Points multiplier for Bitget Wallet users\n\nUsers can now create their first AI trading agent on Beep for free through Bitget Wallet, deposit, and execute trades across AI Trading and Predict Agent.\n\nDeploy your first agent now: https://t.co/9J0iGFr7US",
    "created_at": "Tue May 26 11:02:53 +0000 2026",
    "likes": 539,
    "views": "60785",
    "url": "https://x.com/i/status/2059228776626860127",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJPZlUYacAEDRzX.jpg"
    ],
    "round_first_seen": 73,
    "topic_first_seen": "agentic reward model OR agentic RL OR agent reward"
  },
  {
    "id": "2059291333496553797",
    "author": "Jia__Guo",
    "text": "Curious about the secret sauce behind our trillion-scale agentic foundation model? Here it comes!🥳 \n\nLast year, we released IcePop to stabilize MoE RL with double-sided masking. As we dive deeper, something unexpected happened: the masking ratio went down, while the training–inference mismatch continued to grow!😞\n\nThis year, we introduce 𝑲𝑷𝒐𝒑🪩, which replaces the fixed ratio constraint with the binary KL divergence to adaptively mask inappropriate tokens! The masking ratio adapts to fluctuations of the training–inference gap during training, keeping policy optimization stable and effective with long-horizon agentic RL rollouts. \n\nWith this simple change, it enables our Ring-2.6-1T to achieve over 76 on the SWE-bench-Verified with pure RL training! \n\nNo modifications to infrastructure. No routing replay. Just one parameter, power your agentic RL with 𝑲𝑷𝒐𝒑! \n\nClick to learn more about the details! \n\n📜Blog: https://t.co/uPu1gMg7ti",
    "created_at": "Tue May 26 15:11:28 +0000 2026",
    "likes": 228,
    "views": "32575",
    "url": "https://x.com/i/status/2059291333496553797",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJQKHYRaQAE8yPB.jpg"
    ],
    "round_first_seen": 73,
    "topic_first_seen": "agentic reward model OR agentic RL OR agent reward"
  },
  {
    "id": "2061424412373475766",
    "author": "slimer48484",
    "text": "@murchiston I think less post is definitely a good thing, the extremely templated responses are fairly unpleasant as a user. but on the other than there's no stopping the massive RL pipelines that are needed to teach agentic harness use.",
    "created_at": "Mon Jun 01 12:27:33 +0000 2026",
    "likes": 2,
    "views": "11",
    "url": "https://x.com/i/status/2061424412373475766",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 73,
    "topic_first_seen": "agentic reward model OR agentic RL OR agent reward"
  },
  {
    "id": "2060771871331623069",
    "author": "AIDailyGems",
    "text": "2026 swarm Agent 年，swarm Agent 、Agent team、 ai coding、skill、memory、evolve、agentic RL 等 AI Agent集合\n\nWould you choose this over Cursor/Claude Code for this part of the workflow?\n\nhttps://t.co/rineM9afzL",
    "created_at": "Sat May 30 17:14:36 +0000 2026",
    "likes": 1,
    "views": "149",
    "url": "https://x.com/i/status/2060771871331623069",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 73,
    "topic_first_seen": "agentic reward model OR agentic RL OR agent reward"
  },
  {
    "id": "2060985670148223188",
    "author": "GpaAndy",
    "text": "@Amnaaach_ @Appreciators_IO this model boosts engagement and rewards active participation",
    "created_at": "Sun May 31 07:24:09 +0000 2026",
    "likes": 0,
    "views": "5",
    "url": "https://x.com/i/status/2060985670148223188",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 73,
    "topic_first_seen": "agentic reward model OR agentic RL OR agent reward"
  },
  {
    "id": "2061285822461063200",
    "author": "AIDailyGems",
    "text": "Skill Reuse as Compression in Agentic RL\n\nWould this change how you build or evaluate agents?\n\nhttps://t.co/eG5VRr8NRU https://t.co/pNPsxFpjCu",
    "created_at": "Mon Jun 01 03:16:51 +0000 2026",
    "likes": 1,
    "views": "23",
    "url": "https://x.com/i/status/2061285822461063200",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJsoh5DXQAM5u2a.jpg"
    ],
    "round_first_seen": 73,
    "topic_first_seen": "agentic reward model OR agentic RL OR agent reward"
  },
  {
    "id": "2059685409102446880",
    "author": "arcprize",
    "text": "A new ARC Prize 2026 - ARC-AGI-3\n\nKaggle notebook to help you get started with your first submission\n\nGet started in 3 make commands\n\nFirst $35K milestone prize (top score) will be awarded on June 30th, 2026 https://t.co/zz39tEEg2M",
    "created_at": "Wed May 27 17:17:23 +0000 2026",
    "likes": 96,
    "views": "5492",
    "url": "https://x.com/i/status/2059685409102446880",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJV46rQacAAgd6o.jpg"
    ],
    "round_first_seen": 73,
    "topic_first_seen": "agentic reward model OR agentic RL OR agent reward"
  },
  {
    "id": "2061365575675515160",
    "author": "DataChaz",
    "text": "When SWE-agent got a 64% lift from interface changes, @rohit4verse's \"The Harness Is Everything\" (1.3M views) explained why.\n\nLife-Harness proves it again: 116/126 setups improved patching just the harness.\n\nFrozen models saw an 88.5% mean lift across 18 backbones.\n\nhow it works: https://t.co/5y8EHNMD67",
    "created_at": "Mon Jun 01 08:33:46 +0000 2026",
    "likes": 44,
    "views": "6577",
    "url": "https://x.com/i/status/2061365575675515160",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJtxEQ2bwAAAeJT.jpg"
    ],
    "round_first_seen": 74,
    "topic_first_seen": "RL agent harness OR agent harness OR harness eval"
  },
  {
    "id": "2060301471019659274",
    "author": "eliebakouch",
    "text": "subagents, teams of agents etc. will be first class citizens soon (if not already)\n\ntwo things here:\n1) you want to maximize token efficiency even more\n2) training/serving on your own harness gives you an even bigger boost than before\n\nbenchmarks in the opus 4.8 model card show that for now it's a latency vs cost tradeoff, but imo this will likely shift to intelligence/autonomy vs cost (think dynamic workflows or agent swarms). and for cost not to blow up too much, you need to maximize token efficiency even more\n\nwe'll also likely see huge gaps on more complex/autonomous benchmarks whether they use these features or not, a bit like when tool use was introduced. on those i'd expect third party harnesses to struggle to keep up with closed source models/harnesses\n\nthis is also a case for open source models (and maybe open harnesses like codex?). if you want deep control over this, doing your own RL to train the model in the environment you want it to operate in feels more important than ever",
    "created_at": "Fri May 29 10:05:23 +0000 2026",
    "likes": 78,
    "views": "7131",
    "url": "https://x.com/i/status/2060301471019659274",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJeiWbIWQAIB1ac.jpg"
    ],
    "round_first_seen": 74,
    "topic_first_seen": "RL agent harness OR agent harness OR harness eval"
  },
  {
    "id": "2061023379130593300",
    "author": "ds_wen_",
    "text": "Why Everyone Is Suddenly Building Their Own Agent Harness and why you should care\n\nhttps://t.co/Rju37UnzTf",
    "created_at": "Sun May 31 09:54:00 +0000 2026",
    "likes": 0,
    "views": "52",
    "url": "https://x.com/i/status/2061023379130593300",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 74,
    "topic_first_seen": "RL agent harness OR agent harness OR harness eval"
  },
  {
    "id": "2060039244501147928",
    "author": "kunchenguid",
    "text": "@ryancarson i do constantly switch between different harnesses\n\nthey are completely commoditized in my workflow. i can launch any agent and be about equally productive\n\nall tools i built can also work with any agent harness and any frontier model",
    "created_at": "Thu May 28 16:43:24 +0000 2026",
    "likes": 5,
    "views": "659",
    "url": "https://x.com/i/status/2060039244501147928",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 74,
    "topic_first_seen": "RL agent harness OR agent harness OR harness eval"
  },
  {
    "id": "2059498575810236581",
    "author": "affaan",
    "text": "in Shenzhen this week then HK,  giving a talk at ByteDance on harness engineering / meta harness / agent security and future of OSS; thanks to the BD Technology Community for reaching out and sponsoring + hosting me\n\nwould love to meet anyone from deepseek and/or other labs https://t.co/8urQuTuEmn",
    "created_at": "Wed May 27 04:54:58 +0000 2026",
    "likes": 15,
    "views": "4682",
    "url": "https://x.com/i/status/2059498575810236581",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJTPClEWwAI33yW.jpg"
    ],
    "round_first_seen": 74,
    "topic_first_seen": "RL agent harness OR agent harness OR harness eval"
  },
  {
    "id": "2060541178689556841",
    "author": "shawn_pana",
    "text": "I built an agent that turns anything on the internet into viral TikToks.  \n\nPowered by browser-harness running 24/7 in a VPS: \n> reads any post, thread, or webpage \n> creates a TikTok in the viral format I ask for \n> opens TikTok and uploads for me  \n\nZero hassle. Idea to post in minutes. \nTry it out ↓🔗",
    "created_at": "Sat May 30 01:57:54 +0000 2026",
    "likes": 84,
    "views": "19986",
    "url": "https://x.com/i/status/2060541178689556841",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2060540973240012800/vid/avc1/414x270/Mon9KSjXtplszE4B.mp4?tag=27"
    ],
    "round_first_seen": 74,
    "topic_first_seen": "RL agent harness OR agent harness OR harness eval"
  },
  {
    "id": "2059129111512338933",
    "author": "arvin17x",
    "text": "感谢 @webfansplz 拿 LobeHub 来验证 Rolldown v1 的大项目性能 🫡\n\n经过这几年迭代，我不得不说 LobeHub 可能是目前市面上最好的前端 infra 试金石：\n\n- 开源 toC 应用：完整的前后端架构，AI Agent Harness，代码规范程度堪比教科书 📒\n- 大规模 Codebase：100 万行 TypeScript 源码，65w 行业务代码 + 35w 行测试代码。dev/build/testing 全都是压力测试 🪨 我们的测试代码可能比一般项目的业务代码还多，妥妥的高质量 RL 环境 🤯\n- 最潮流的维护者们：Next 16、Vite 8、TypeScript 7 Preview 、Bun，我们永远第一批上车，紧跟行业前沿技术。这些年给上游生态提交的 issue 没有一百也有个七八十。\n\n真正的知名开源贡献者，敢 dogfood 市面上最激进的技术方案，崩在所有人前面，好让所有人的下一步走稳 😼",
    "created_at": "Tue May 26 04:26:51 +0000 2026",
    "likes": 90,
    "views": "17442",
    "url": "https://x.com/i/status/2059129111512338933",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 74,
    "topic_first_seen": "RL agent harness OR agent harness OR harness eval"
  },
  {
    "id": "2060733416551694417",
    "author": "nickbaumann_",
    "text": "Great read -- all it really takes is:\n\n- a harness\n- connectors to your data/tools\n- reliable, always-accessible agent(s)\n\nThe models have reached the inflection point where it's not more complicated than this https://t.co/lrj0I94FZb",
    "created_at": "Sat May 30 14:41:47 +0000 2026",
    "likes": 741,
    "views": "149280",
    "url": "https://x.com/i/status/2060733416551694417",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJkyHkRXIAEET6N.jpg",
      "https://pbs.twimg.com/media/HJkyHkgXAAYRgSQ.jpg",
      "https://pbs.twimg.com/media/HJkyHkPXAAYgUMX.jpg"
    ],
    "round_first_seen": 74,
    "topic_first_seen": "RL agent harness OR agent harness OR harness eval"
  },
  {
    "id": "2060198034076835860",
    "author": "Gsdata5566",
    "text": "@prpatel05 Exactly. The durable advantage is not just faster inference, but a closed loop where production traces become eval cases, eval failures become scoped fixes, and observability tells you whether the agent is actually improving.",
    "created_at": "Fri May 29 03:14:22 +0000 2026",
    "likes": 1,
    "views": "38",
    "url": "https://x.com/i/status/2060198034076835860",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 75,
    "topic_first_seen": "agent harness eval OR eval harness OR agent eval"
  },
  {
    "id": "2060303555722965279",
    "author": "Gsdata5566",
    "text": "@Meet_644 Composable is the right direction. The hard part is making each layer observable enough that a failed agent run can point to the exact weak link: model, memory, tool, eval, or harness.",
    "created_at": "Fri May 29 10:13:40 +0000 2026",
    "likes": 1,
    "views": "28",
    "url": "https://x.com/i/status/2060303555722965279",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 75,
    "topic_first_seen": "agent harness eval OR eval harness OR agent eval"
  },
  {
    "id": "2059817003913769145",
    "author": "geoffreywoo",
    "text": "announcement:\n\ni have a confession: i trust agent traces more than most investor reference calls.\n\none shows how work actually gets done.\nthe other shows how well someone can perform institutional affection for 17 minutes.\n\nfounders building eval layers here should talk to me.",
    "created_at": "Thu May 28 02:00:17 +0000 2026",
    "likes": 38,
    "views": "3138",
    "url": "https://x.com/i/status/2059817003913769145",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 75,
    "topic_first_seen": "agent harness eval OR eval harness OR agent eval"
  },
  {
    "id": "2060323764089258103",
    "author": "AIDailyGems",
    "text": "Put this in the agent eval pile if you care about practical workflow changes. Mobile client for Codex and Claude — control coding agents from your phone via WebSocket bridge\n\nhttps://t.co/nneXRTiHJR",
    "created_at": "Fri May 29 11:33:58 +0000 2026",
    "likes": 0,
    "views": "82",
    "url": "https://x.com/i/status/2060323764089258103",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 75,
    "topic_first_seen": "agent harness eval OR eval harness OR agent eval"
  },
  {
    "id": "2059941261268648395",
    "author": "Gsdata5566",
    "text": "@AlphaXAgentX LLM-as-judge is useful for regression, but weak for discovering shared blind spots. Agent eval stacks need adversarial cases, tool traces, human review, and production incident feedback.",
    "created_at": "Thu May 28 10:14:03 +0000 2026",
    "likes": 0,
    "views": "15",
    "url": "https://x.com/i/status/2059941261268648395",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 75,
    "topic_first_seen": "agent harness eval OR eval harness OR agent eval"
  },
  {
    "id": "2059384492523946158",
    "author": "cwolferesearch",
    "text": "more details can be found here for those who are interested: https://t.co/aDCtCVKDTP",
    "created_at": "Tue May 26 21:21:39 +0000 2026",
    "likes": 3,
    "views": "999",
    "url": "https://x.com/i/status/2059384492523946158",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 75,
    "topic_first_seen": "agent harness eval OR eval harness OR agent eval"
  },
  {
    "id": "2059812151041413184",
    "author": "hsu_steve",
    "text": "IIUC this is a new benchmark and the (real world) tasks aren't something that labs could have benchmaxxed already?\n\nIf so, is this a good IT agent eval?\n\nAnthropic lead seems pretty modest. Cf Qwen 3.7 and other Chinese models.",
    "created_at": "Thu May 28 01:41:00 +0000 2026",
    "likes": 23,
    "views": "3774",
    "url": "https://x.com/i/status/2059812151041413184",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 75,
    "topic_first_seen": "agent harness eval OR eval harness OR agent eval"
  },
  {
    "id": "2060344969257177446",
    "author": "AIDailyGems",
    "text": "Put this in the agent eval pile if you care about practical workflow changes. MCP plugin for Unreal Engine 5.7 — gives AI assistants full read/write access to Blueprints, Materials, Niagara, Animation, Mesh, AI, GAS, Logic Driver, ComboGr\n\nhttps://t.co/aEGgME3faM",
    "created_at": "Fri May 29 12:58:14 +0000 2026",
    "likes": 0,
    "views": "53",
    "url": "https://x.com/i/status/2060344969257177446",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 75,
    "topic_first_seen": "agent harness eval OR eval harness OR agent eval"
  },
  {
    "id": "2060048736357593279",
    "author": "HuggingPapers",
    "text": "NVIDIA's AXPO fixes tool collapse in agentic reasoning\n\nAn 8B Qwen3-VL model learns to keep using tools\nby freezing thought prefixes and resampling calls,\nbeating a 32B baseline across 9 multimodal benchmarks. https://t.co/1Ec31Esf2m",
    "created_at": "Thu May 28 17:21:07 +0000 2026",
    "likes": 28,
    "views": "1849",
    "url": "https://x.com/i/status/2060048736357593279",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJbDaJUW0AcOX0m.jpg"
    ],
    "round_first_seen": 76,
    "topic_first_seen": "multimodal reasoning agent OR multimodal reasoning OR vision reasoning agent"
  },
  {
    "id": "2060470519775043879",
    "author": "HuggingPapers",
    "text": "Multimodal inputs: text, image, and video\n\nUp to 262K context\n\nReady for vLLM on Hopper and Blackwell\n\nhttps://t.co/i1rFPjyufZ",
    "created_at": "Fri May 29 21:17:08 +0000 2026",
    "likes": 78,
    "views": "6713",
    "url": "https://x.com/i/status/2060470519775043879",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 76,
    "topic_first_seen": "multimodal reasoning agent OR multimodal reasoning OR vision reasoning agent"
  },
  {
    "id": "2059707137308369024",
    "author": "altiamkabir",
    "text": "DeepSeek got everyone talking about reasoning. SenseNova U1 is pushing another frontier: unified multimodal generation + understanding in open source. The full training codebase of SenseNova U1 is open sourced now too, which makes this release even more interesting. Worth an upvote.\n\nhttps://t.co/SlmtdlbbRf",
    "created_at": "Wed May 27 18:43:43 +0000 2026",
    "likes": 41,
    "views": "7222",
    "url": "https://x.com/i/status/2059707137308369024",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 76,
    "topic_first_seen": "multimodal reasoning agent OR multimodal reasoning OR vision reasoning agent"
  },
  {
    "id": "2061206239980257665",
    "author": "che_shr_cat",
    "text": "2/ In \"The Thinking Pixel\" Yuwei Sun, Yuxuan Yao, Hui Li, and Siyu Zhu solve the alignment bottleneck in diffusion models. \n\nInstead of massive feedforward passes, they embed recursive reasoning directly into multimodal latent spaces.",
    "created_at": "Sun May 31 22:00:37 +0000 2026",
    "likes": 2,
    "views": "382",
    "url": "https://x.com/i/status/2061206239980257665",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 76,
    "topic_first_seen": "multimodal reasoning agent OR multimodal reasoning OR vision reasoning agent"
  },
  {
    "id": "2060433104247173188",
    "author": "gmi_cloud",
    "text": "@zubiqo as we said, this is only a multimodal comparison, not for reasoning, vision grounding, and latency. Please do share your own comparison and we would love to see those!",
    "created_at": "Fri May 29 18:48:27 +0000 2026",
    "likes": 0,
    "views": "24",
    "url": "https://x.com/i/status/2060433104247173188",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 76,
    "topic_first_seen": "multimodal reasoning agent OR multimodal reasoning OR vision reasoning agent"
  },
  {
    "id": "2061275242710929899",
    "author": "ElliotTalksTech",
    "text": "@dev_maims I agree Gemini isn't that good at coding, but it also isn't that bad.\n\nIf ChatGPT and Claude were down it isn't that bad of a backup.\n\nGemini was mainly made with the goal of being multimodal and having broad reasoning.",
    "created_at": "Mon Jun 01 02:34:49 +0000 2026",
    "likes": 1,
    "views": "416",
    "url": "https://x.com/i/status/2061275242710929899",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 76,
    "topic_first_seen": "multimodal reasoning agent OR multimodal reasoning OR vision reasoning agent"
  },
  {
    "id": "2060760708829282411",
    "author": "astropol0",
    "text": "Opus 4.8 ⤵️\nBest coding performance\nHighest intelligence &amp; reasoning\nMost honest/direct answers...\n\nGPT-5.5 ⤵️\nBetter creativity\nFaster responses\nStrong multimodal capabilities (images + video)\nMore fun and daily casual conversations...\n\nWhat do u choose? https://t.co/KKad51uCIv",
    "created_at": "Sat May 30 16:30:14 +0000 2026",
    "likes": 3,
    "views": "861",
    "url": "https://x.com/i/status/2060760708829282411",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJlK8YXakAA280C.jpg"
    ],
    "round_first_seen": 76,
    "topic_first_seen": "multimodal reasoning agent OR multimodal reasoning OR vision reasoning agent"
  },
  {
    "id": "2060242283912642766",
    "author": "ArxivSound",
    "text": "Heejoon Koo, Yoon Tae Kim, Miika Toikkanen, June-Woo Kim, \"Mitigating Stethoscope-Induced Shortcuts in Respiratory Sound Classification under Federated Domain Generalization with Causality-Inspired Interventions,\" https://t.co/VSbyyo3MTO",
    "created_at": "Fri May 29 06:10:12 +0000 2026",
    "likes": 0,
    "views": "287",
    "url": "https://x.com/i/status/2060242283912642766",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 77,
    "topic_first_seen": "audio reasoning model OR audio reasoning OR speech reasoning"
  },
  {
    "id": "2061513256796074180",
    "author": "amritamaz",
    "text": "Full-duplex speech models handle overlapping conversational, but consider pausing briefly in a sentence: an audio-only agent may think you are done and start speaking. But with video, the agent has more context: your shifted gaze or pose indicates that you're still speaking. https://t.co/3RRNpLLGB7",
    "created_at": "Mon Jun 01 18:20:36 +0000 2026",
    "likes": 0,
    "views": "99",
    "url": "https://x.com/i/status/2061513256796074180",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJvywKbWEAAckzS.jpg"
    ],
    "round_first_seen": 77,
    "topic_first_seen": "audio reasoning model OR audio reasoning OR speech reasoning"
  },
  {
    "id": "2061134616803647671",
    "author": "CosmicBH",
    "text": "@kamaalkun @BDipitous exactly and simply from seeing your previous work i identified that easily how many people who have their opinions about ai usage actually watch kamaal?\n\nthe point is it’s simple peer pressure of a hate train and no one has given good reasoning just basic ai hate speech",
    "created_at": "Sun May 31 17:16:01 +0000 2026",
    "likes": 1,
    "views": "30",
    "url": "https://x.com/i/status/2061134616803647671",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 77,
    "topic_first_seen": "audio reasoning model OR audio reasoning OR speech reasoning"
  },
  {
    "id": "2060242097673048550",
    "author": "ArxivSound",
    "text": "Yanze Xu, Wenwu Wang, Mark D. Plumbley, \"Explainable AI in Speaker Recognition -- Making Latent Representations Understandable,\" https://t.co/honznDIp51",
    "created_at": "Fri May 29 06:09:28 +0000 2026",
    "likes": 2,
    "views": "406",
    "url": "https://x.com/i/status/2060242097673048550",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 77,
    "topic_first_seen": "audio reasoning model OR audio reasoning OR speech reasoning"
  },
  {
    "id": "2061568097979633829",
    "author": "ypatil125",
    "text": "Enjoyed joining @apoorv03 for MS&E435 to talk about where frontier model training is headed!\n\nThe most interesting work right now sits at the boundary between evals, RL environments, post-training, and systems that learn from real production feedback. Powerful methodologies to train highly performant models are here. A lot of model progress comes down to defining the hill you want to climb.\n\nWe are very focused on pushing the frontier on post-training and continual learning at Applied Compute. If that is interesting to you, come join us!",
    "created_at": "Mon Jun 01 21:58:31 +0000 2026",
    "likes": 26,
    "views": "3868",
    "url": "https://x.com/i/status/2061568097979633829",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 78,
    "topic_first_seen": "agentic RL training OR RL agent training OR RL post-training"
  },
  {
    "id": "2061533200694874454",
    "author": "clthegoat",
    "text": "Orbit enables RL post-training for trillion-parameter LLMs on a single GPU node, with an extremely small train-rollout gap. A big step toward making post-training for frontier models accessible!",
    "created_at": "Mon Jun 01 19:39:51 +0000 2026",
    "likes": 15,
    "views": "1537",
    "url": "https://x.com/i/status/2061533200694874454",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 78,
    "topic_first_seen": "agentic RL training OR RL agent training OR RL post-training"
  },
  {
    "id": "2059521687465377818",
    "author": "SzymonOzog_",
    "text": "Laguna tech report is out!\n\nStarting from random weights and getting to a highly capable agent requires problem solving across all layers of the stack\n\nRead all about our model factory, pre-training, post training and RL. As well as how we mix data, perform quantization and evaluate the model",
    "created_at": "Wed May 27 06:26:49 +0000 2026",
    "likes": 42,
    "views": "2978",
    "url": "https://x.com/i/status/2059521687465377818",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 78,
    "topic_first_seen": "agentic RL training OR RL agent training OR RL post-training"
  },
  {
    "id": "2061375472362197436",
    "author": "vivek_2332",
    "text": "read the VPO (vector policy optimization) paper today, from @RyanBoldi @MIT_CSAIL. a rethink of what RL post-training should optimize for when test-time search is downstream. my notes:\n\n1. Problem \n-> LLMs increasingly run inside inference-time search: draw many samples, pick the best. search is already doing the exploiting, so training should focus on exploring. \n-> but standard post-training maxes a fixed scalar reward, and GRPO collapses the policy onto whichever single strategy tops that scalar. the distribution sharpens, every extra sample comes out a near-duplicate, and the diversity that search runs on is gone right when you need it.\n\n2. Method \n-> two ingredients: (a) multi-answer chains, emit m candidates in one rollout so each attends to the previous and steers into uncovered regions, (b) stochastic scalarization, sample w ~ Dir(1) and score the set by best-of-m under each w they have incentive to differ. \n-> multi-answer + fixed scalar still collapses since the gradient pushes every position toward the same peak. \n\n3. Results \n-> across 4 domains VPO matches or beats scalar baselines on best@k.\n-> inside evolutionary search it cracks hard problems GRPO can't solve at any budget.\n\n4. Thoughts \n-> the ultra feedback result is the most honest part. VPO only helps when the reward components actually compete. when they're near-collinear it collapses and loses to GRPO. so the real condition for it working is \"how non-collinear is your reward,\" which is narrower than the framing lets on.\n-> adv is broadcast across the whole chain, so the model never learns which of its answers earned the reward. the specialization just comes from the set-level signal over many updates.",
    "created_at": "Mon Jun 01 09:13:05 +0000 2026",
    "likes": 55,
    "views": "3761",
    "url": "https://x.com/i/status/2061375472362197436",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJt5JkoawAA0aD5.jpg",
      "https://pbs.twimg.com/media/HJt5MGEaoAAB56C.png",
      "https://pbs.twimg.com/media/HJt5PgLaEAAmiwU.png"
    ],
    "round_first_seen": 78,
    "topic_first_seen": "agentic RL training OR RL agent training OR RL post-training"
  },
  {
    "id": "2060836995048591416",
    "author": "willccbb",
    "text": "@eliebakouch wait so the RL rollout viewer is the same as the eval rollout viewer? does this mean evals and environments are the same thing?? and people can go from evals to post-training with a single command???",
    "created_at": "Sat May 30 21:33:22 +0000 2026",
    "likes": 35,
    "views": "1890",
    "url": "https://x.com/i/status/2060836995048591416",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 78,
    "topic_first_seen": "agentic RL training OR RL agent training OR RL post-training"
  },
  {
    "id": "2061506818917244937",
    "author": "Alibaba_Qwen",
    "text": "Demo2: Multimodal Interactive Hybrid Agent https://t.co/GYGgYjeoyJ",
    "created_at": "Mon Jun 01 17:55:01 +0000 2026",
    "likes": 36,
    "views": "11118",
    "url": "https://x.com/i/status/2061506818917244937",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2061506734678773760/vid/avc1/480x270/ngp7jnNzsYfGmlcS.mp4?tag=27"
    ],
    "round_first_seen": 79,
    "topic_first_seen": "multi-modal LLM agent OR multimodal agent OR VLM agent"
  },
  {
    "id": "2060149124117475791",
    "author": "StepFun_ai",
    "text": "⚡️ Step 3.7 Flash is here: The new frontier is agent efficiency.\n\n#1 ClawEval-1.1 (67.1), #1 SimpleVQA Search (79.2), #2 SWE-PRO (56.3), 95.3 on V* Python. Open weights under Apache 2.0.\n\nBuilt for agentic, coding, search, and multimodal workflows — balancing speed, cost, and reliable execution.\n\n- 400 TPS. 198B sparse MoE, ~11B active. 256K context, 3 reasoning levels.\n- Understands UIs, charts, docs, images — then writes code or calls tools to act on what it sees.\n- Web + visual search reaches further: more sources, deeper follow-up.\n- Reliable tool use — less drift, fewer broken toolcalls. 98%+ on τ²-bench across all difficulty levels.\n- Works with Claude Code, KiloCode, Hermes Agent, OpenClaw, and protocols like MCP.\n- Runs locally on Mac Studio M4 Max, DGX Spark, AMD AI Max+ 395.\n\nGitHub: https://t.co/kqlZkVIRHv\nHuggingFace: https://t.co/qqceCrgPiw\nGGUF: https://t.co/rR6XrnymWG\nModelScope: https://t.co/wney6Tzvqy\nAPI: https://t.co/RvHWzRG7Fu\nBlog: https://t.co/BxDiajiQ5G",
    "created_at": "Fri May 29 00:00:01 +0000 2026",
    "likes": 1501,
    "views": "323998",
    "url": "https://x.com/i/status/2060149124117475791",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJcdzQdbcAAcQki.jpg"
    ],
    "round_first_seen": 79,
    "topic_first_seen": "multi-modal LLM agent OR multimodal agent OR VLM agent"
  },
  {
    "id": "2060601492160327826",
    "author": "Chinazhidx",
    "text": "Gamma-World, a new generative multi-agent world model from Tsinghua &amp; NVIDIA, can generate synchronized 4-player gameplay from only 2-player training data🔥\n\nIt enables action-responsive generation at 24 FPS and achieves SOTA results across 5 multiplayer Minecraft scenarios. #LLM https://t.co/WiFENTHrZB",
    "created_at": "Sat May 30 05:57:34 +0000 2026",
    "likes": 4,
    "views": "176",
    "url": "https://x.com/i/status/2060601492160327826",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2060600658580189184/vid/avc1/810x270/fTANthBBJ_zDiDE8.mp4?tag=14",
      "https://pbs.twimg.com/media/HJi5enuakAAILA2.jpg",
      "https://pbs.twimg.com/media/HJi5ffuaEAAPUYw.jpg"
    ],
    "round_first_seen": 79,
    "topic_first_seen": "multi-modal LLM agent OR multimodal agent OR VLM agent"
  },
  {
    "id": "2061506731671457800",
    "author": "Alibaba_Qwen",
    "text": "Demo1：Multimodal Interactive Hybrid Agent https://t.co/e5hzk8bKo2",
    "created_at": "Mon Jun 01 17:54:40 +0000 2026",
    "likes": 44,
    "views": "7949",
    "url": "https://x.com/i/status/2061506731671457800",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2061506652030005248/vid/avc1/480x270/_zMkQdc2DKl4Uyh3.mp4?tag=27"
    ],
    "round_first_seen": 79,
    "topic_first_seen": "multi-modal LLM agent OR multimodal agent OR VLM agent"
  },
  {
    "id": "2061094530989031616",
    "author": "bspectacledGOAT",
    "text": "deeper point is this-\nand this is known to you(i guess)\npretraining gives the model broad world-distribution competence; post-training bends that distribution toward desired behavior\n\nReinforcement learning is powerful because it can make latent abilities usable, but if the reward/judge/task mix is too narrow, it can also Goodhart the model into a policy that scores well while losing some base-model breadth, taste, calibration, or spontaneity.\n\nRecent RLHF literature explicitly treats reward over-optimization as a problem where policies exploit idiosyncrasies of the reward function and become less generalizable!https://t.co/dT96zxS8t4!",
    "created_at": "Sun May 31 14:36:44 +0000 2026",
    "likes": 0,
    "views": "14",
    "url": "https://x.com/i/status/2061094530989031616",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 80,
    "topic_first_seen": "reward model RLHF agent OR RLHF agent OR process reward model"
  },
  {
    "id": "2061166139716100290",
    "author": "marcelolebre",
    "text": "What gets rewarded is leverage. The IC who deploys $50K in tokens and ships a product becomes more valuable than the one who burns $5K and ships nothing. People will get paid increasingly for what they orchestrate. Bounties on tickets, revenue share on features, hybrid base-plus-outcome contracts. The early versions are already in the wild. 8/10",
    "created_at": "Sun May 31 19:21:16 +0000 2026",
    "likes": 2,
    "views": "398",
    "url": "https://x.com/i/status/2061166139716100290",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 80,
    "topic_first_seen": "reward model RLHF agent OR RLHF agent OR process reward model"
  },
  {
    "id": "2060390968101364143",
    "author": "SISLaboratory",
    "text": "A new must read SISL paper for anyone working on RLHF and reward modes. Check it out!",
    "created_at": "Fri May 29 16:01:01 +0000 2026",
    "likes": 3,
    "views": "262",
    "url": "https://x.com/i/status/2060390968101364143",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 80,
    "topic_first_seen": "reward model RLHF agent OR RLHF agent OR process reward model"
  },
  {
    "id": "2060021917239914524",
    "author": "ArtificialAnlys",
    "text": "Full results: https://t.co/wDb6a2nhqV\nMethodology: https://t.co/ePPoyfUXXm\nWe’re hiring! https://t.co/uevWGOWgYm",
    "created_at": "Thu May 28 15:34:33 +0000 2026",
    "likes": 12,
    "views": "2903",
    "url": "https://x.com/i/status/2060021917239914524",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 81,
    "topic_first_seen": "audio language model OR audio LLM OR speech LLM"
  },
  {
    "id": "2059894369516175405",
    "author": "Suryanshti777",
    "text": "🤯Holy shit...Someone at OpenAI got tired of speech-to-text systems breaking the moment an accent changed…\nor a different language appeared…\nor background noise kicked in.\n\nSo they trained one model on 680,000 hours of real-world audio and quietly changed voice AI forever.\n\nThat model became Whisper.\n\nNot just transcription.\n\nWhisper can:\n\n→ detect languages automatically\n→ translate speech in real time\n→ handle messy audio most systems fail on\n→ transcribe across dozens of languages\n→ run locally on your machine with one command\n\npip install openai-whisper\n\nThat’s it.\n\nNo giant pipeline.\nNo separate models stitched together.\nNo enterprise-only nonsense.\n\nJust a single Transformer model replacing entire speech stacks.\n\nThe craziest part?\n\nMost people using AI products today have already heard Whisper working…\n\nwithout realizing it.\n\nIt became the invisible engine behind demos, agents, note apps, meeting copilots, subtitles, voice assistants, research tools, and thousands of AI startups overnight.\n\nSome open-source projects become tools.\n\nWhisper became infrastructure.\n\nLink in comments👇",
    "created_at": "Thu May 28 07:07:43 +0000 2026",
    "likes": 25,
    "views": "3056",
    "url": "https://x.com/i/status/2059894369516175405",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJY3AlhbgAE078f.jpg"
    ],
    "round_first_seen": 81,
    "topic_first_seen": "audio language model OR audio LLM OR speech LLM"
  },
  {
    "id": "2059664194363171050",
    "author": "Alacritic_Super",
    "text": "Most AI voice tools still focus on speech generation.\n\nOmniVoice Studio is pushing toward something MUCH bigger:\na full multimodal voice AI workspace.\n\nAnd honestly?\nThis is where conversational AI infrastructure is heading FAST.\n\nOmniVoice Studio combines speech synthesis, voice cloning, transcription and multimodal AI workflows into a unified open-source platform developers can run locally and customize.\n\nThe technical architecture is what makes it interesting.\n\nIt connects:\nLLMs\n\n+ speech models\n+ audio pipelines\n+ multimodal processing\n+ workflow orchestration\n\ninto one system for building AI-native voice applications.\n\nThat's a HUGE shift.\n\nFuture AI systems won't just chat through text.\n\nThey will operate through:\nvoice, audio, video and real-time multimodal interaction layers.\n\nAnd open-source ecosystems are accelerating that transition incredibly fast.\n\nGitHub:\nhttps://t.co/P7ZJDm5s3V\n\nFollow @Alacritic_Super for more AI infrastructure, multimodal systems & open-source breakthroughs 🚀",
    "created_at": "Wed May 27 15:53:05 +0000 2026",
    "likes": 1,
    "views": "57",
    "url": "https://x.com/i/status/2059664194363171050",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJVlq0ibQAAZrRN.jpg",
      "https://pbs.twimg.com/media/HJVlq2ZbcAAluIy.jpg"
    ],
    "round_first_seen": 82,
    "topic_first_seen": "multimodal foundation model OR omni model OR omnimodal model"
  },
  {
    "id": "2059275767289119176",
    "author": "AiquestAcademy",
    "text": "Bolting vision encoders onto frozen LLMs is a dead end.\n\nThe future is Native Multimodal Modeling. This paper drops an industrial blueprint for building true NMMs.\n\nThe endgame: Multi-to-Multi architectures where understanding and generation coexist in one unified transformer. It breaks down the full stack:\n• Architectural nativity\n• Massive data curation\n• End-to-end training recipes\n\nProject demo page is available.\n#AI #MachineLearning",
    "created_at": "Tue May 26 14:09:37 +0000 2026",
    "likes": 0,
    "views": "46",
    "url": "https://x.com/i/status/2059275767289119176",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 82,
    "topic_first_seen": "multimodal foundation model OR omni model OR omnimodal model"
  },
  {
    "id": "2061330003015418089",
    "author": "AipokusyCz",
    "text": "NVIDIA Cosmos 3: první otevřený omni-model pro fyzické AI uvažování. Zpracovává text, obraz, video i zvuk najednou pro robotiku a autonomní systémy. Dostupný na Hugging Face. #AI #Robotika https://t.co/YSOcTbUV9t",
    "created_at": "Mon Jun 01 06:12:25 +0000 2026",
    "likes": 0,
    "views": "21",
    "url": "https://x.com/i/status/2061330003015418089",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJtQtQjWEAkDpsY.jpg",
      "https://pbs.twimg.com/media/HJtQtfDXUAAvuPb.jpg",
      "https://pbs.twimg.com/media/HJtQtsAWUAQJ8A5.jpg"
    ],
    "round_first_seen": 82,
    "topic_first_seen": "multimodal foundation model OR omni model OR omnimodal model"
  },
  {
    "id": "2060217838393282903",
    "author": "_reachsumit",
    "text": "SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search\n\nPresents an RL framework to model search boundaries and apply penalties, reducing unnecessary agentic searches.\n\n📝 https://t.co/oOjP3p49Wh\n👨🏽‍💻 https://t.co/eNvVKJiS1t",
    "created_at": "Fri May 29 04:33:04 +0000 2026",
    "likes": 2,
    "views": "408",
    "url": "https://x.com/i/status/2060217838393282903",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 83,
    "topic_first_seen": "agentic reward model OR agentic RL OR agent reward"
  },
  {
    "id": "2060268392414941579",
    "author": "liulicheng10",
    "text": "great blog and it feels like theres 'phase transition' between single turn RL and agentic RL, the extrapolation is not that easy",
    "created_at": "Fri May 29 07:53:57 +0000 2026",
    "likes": 7,
    "views": "1386",
    "url": "https://x.com/i/status/2060268392414941579",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 83,
    "topic_first_seen": "agentic reward model OR agentic RL OR agent reward"
  },
  {
    "id": "2059631384445235482",
    "author": "nrehiew_",
    "text": "SFT is 3 epochs of 40B tokens each, with a 85% focus on agentic trajectories. \n- They synthetically generate coding tasks from git commits and use teacher generated ground truth trajectories\n- They heavily focus on instruction following for the next RL stage. Instructions are synthetically generated and applied to the coding trajectories \n- Trajectories are generated across different harnesses OpenHands, OpenCode and Mini-SWE-Agent",
    "created_at": "Wed May 27 13:42:42 +0000 2026",
    "likes": 6,
    "views": "439",
    "url": "https://x.com/i/status/2059631384445235482",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 83,
    "topic_first_seen": "agentic reward model OR agentic RL OR agent reward"
  },
  {
    "id": "2059409282886709445",
    "author": "GT_HaoKang",
    "text": "This is a qutie late update but every people working on agentic RL should aware that this repo released more than agentic pipeline itself. You can reverse how Anthropic's agentic SFT/RL strategy somehow.\nhttps://t.co/JESk73uDgK",
    "created_at": "Tue May 26 23:00:09 +0000 2026",
    "likes": 12,
    "views": "1016",
    "url": "https://x.com/i/status/2059409282886709445",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 83,
    "topic_first_seen": "agentic reward model OR agentic RL OR agent reward"
  },
  {
    "id": "2059731516469436469",
    "author": "MKhalifaaaa",
    "text": "Very neat idea in the AlphaProof Nexus paper:\nTo convert binary signal of proof evaluation into a numeric, continuous reward, they used solution Elo scores which were put in context so the prover agent can differentiate good from excellent solution.\nhttps://t.co/7ZoByPtwab https://t.co/jyhWmBJrb8",
    "created_at": "Wed May 27 20:20:36 +0000 2026",
    "likes": 3,
    "views": "270",
    "url": "https://x.com/i/status/2059731516469436469",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJWinonWAAUiQjB.jpg"
    ],
    "round_first_seen": 83,
    "topic_first_seen": "agentic reward model OR agentic RL OR agent reward"
  },
  {
    "id": "2061565702516183357",
    "author": "HenryL_AI",
    "text": "When you building a self-improving agent, two natural questions emerge: 𝐐𝟏: 𝐖𝐡𝐢𝐜𝐡 𝐦𝐨𝐝𝐞𝐥𝐬 𝐩𝐫𝐨𝐝𝐮𝐜𝐞 𝐭𝐡𝐞 𝐛𝐞𝐬𝐭 𝐡𝐚𝐫𝐧𝐞𝐬𝐬 𝐮𝐩𝐝𝐚𝐭𝐞𝐬? 𝐐𝟐: 𝐖𝐡𝐢𝐜𝐡 𝐦𝐨𝐝𝐞𝐥𝐬 𝐛𝐞𝐧𝐞𝐟𝐢𝐭 𝐦𝐨𝐬𝐭 𝐟𝐫𝐨𝐦 𝐡𝐚𝐫𝐧𝐞𝐬𝐬 𝐮𝐩𝐝𝐚𝐭𝐞𝐬?\n\nOur new paper has counter-intuitive answers to both: they decouple from model capability, in opposite ways.\n\nQ1 (who produces good updates): the updater's base capability barely matters.A 9B model (Qwen3.5) produces harness updates that match Claude Opus 4.6's. Best vs worst evolver gap ≤3.1pp.\n\nQ2 (who benefits most): non-monotonic.Mid-tier solvers benefit the most. Strong-tier hits ceiling. Weak-tier benefits LEAST despite the most headroom — failing at two layers: skill activation (25% vs ~96% for strong) and adherence drift across trajectory (~4x steeper).\n\nTested across 7 evolver models × 6 solver agents × 3 agentic benchmarks (SWE-bench Verified, MCP-Atlas, SkillsBench). \n\nImplication: don't pay frontier prices for both halves of the loop. Put capability budget on the agent (solver), not the evolver.",
    "created_at": "Mon Jun 01 21:49:00 +0000 2026",
    "likes": 38,
    "views": "4185",
    "url": "https://x.com/i/status/2061565702516183357",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJwVvEabIAE1Gbd.jpg",
      "https://pbs.twimg.com/media/HJwVyRfbkAANb4r.jpg"
    ],
    "round_first_seen": 84,
    "topic_first_seen": "RL agent harness OR agent harness OR harness eval"
  },
  {
    "id": "2059389241570754903",
    "author": "teortaxesTex",
    "text": "Honestly, looks about right, harness (mini-swe-agent) affinities aside\nKimi is the closest to a mature autonomous SWE agent out of open models\nDS is weak and needs handholding (though has isolated strengths like debugging)\na mark of good eval: stronger separation of top tier https://t.co/g73eF0Y6zE",
    "created_at": "Tue May 26 21:40:31 +0000 2026",
    "likes": 62,
    "views": "10035",
    "url": "https://x.com/i/status/2059389241570754903",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJRrD-sXsAAqhXM.jpg",
      "https://pbs.twimg.com/media/HJRrfGPWcAAWzIL.png"
    ],
    "round_first_seen": 84,
    "topic_first_seen": "RL agent harness OR agent harness OR harness eval"
  },
  {
    "id": "2059478729265664342",
    "author": "BohuTANG",
    "text": "agent 工具设计踩坑(model+harness 发现): Opus 4.6 训练数据里文件工具用 file_path，DeepSeek v4 Pro 用 path。如果你的 agent 工具层不做别名兼容，模型会反复用错参数名触发校验失败 -- 每次白白烧掉一个 turn + 几千 token，什么活都没干",
    "created_at": "Wed May 27 03:36:06 +0000 2026",
    "likes": 27,
    "views": "5186",
    "url": "https://x.com/i/status/2059478729265664342",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 84,
    "topic_first_seen": "RL agent harness OR agent harness OR harness eval"
  },
  {
    "id": "2059294269698199929",
    "author": "dair_ai",
    "text": "System scaling is the next real bottleneck in agentic AI.\n\nIf you build agent orchestration layers, this is a clean map of where the engineering leverage actually sits. The labs own the model. You own the harness, and that is increasingly where agent quality is won or lost.\n\nThe default mental model still puts all the weight on the foundation model. Bigger model, better agent. But agent behavior actually emerges from the whole stack around it. Memory substrate, context constructor, skill routing, orchestration loop, and the verification and governance layer.\n\nThis new research calls that stack the harness and argues we should treat it as a first-class object of design and evaluation. It names three core bottlenecks to scale. Context governance, trustworthy memory, and dynamic skill routing. It also ships CheetahClaws, a Python-native reference harness, and compares it with Claude Code and OpenClaw.\n\nPaper: https://t.co/HynpWFVUqq\n\nLearn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c",
    "created_at": "Tue May 26 15:23:08 +0000 2026",
    "likes": 106,
    "views": "7263",
    "url": "https://x.com/i/status/2059294269698199929",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJQVOX_bEAATe44.png"
    ],
    "round_first_seen": 84,
    "topic_first_seen": "RL agent harness OR agent harness OR harness eval"
  },
  {
    "id": "2060416571361890564",
    "author": "Aniket1836020",
    "text": "✨ Opportunity for research scientists at Stitch CA, USA\n\nStitch is building the most talent-dense research team. If you've built your own agent harness for fun or have run tens to hundreds of agents in parallel to solve coding tasks, you will feel right at home here.\n\nApply here \nhttps://t.co/acKBbnWmlj",
    "created_at": "Fri May 29 17:42:45 +0000 2026",
    "likes": 24,
    "views": "2242",
    "url": "https://x.com/i/status/2060416571361890564",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 84,
    "topic_first_seen": "RL agent harness OR agent harness OR harness eval"
  },
  {
    "id": "2061704322459800057",
    "author": "_reachsumit",
    "text": "Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses\n\nIntroduces a 20B search agent trained with RL inside a stateful harness that offloads bookkeeping to the environment.\n\n📝 https://t.co/lZiQcQLmQD\n👨🏽‍💻 https://t.co/xZoMEXAPZd",
    "created_at": "Tue Jun 02 06:59:49 +0000 2026",
    "likes": 0,
    "views": "16",
    "url": "https://x.com/i/status/2061704322459800057",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 85,
    "topic_first_seen": "agent harness eval OR eval harness OR agent eval"
  },
  {
    "id": "2060403234825785840",
    "author": "novasarc01",
    "text": "i’m increasingly convinced that the best agent evals will come from mining real agent failure traces. my view is that every failed trace contains a potential eval but not in its raw form. raw traces are messy, long and too specific. the research problem is to distill them into clean reproducible tests. the pipeline i’m interested in is (which i'm currently working on):\n\nfailure trace → failure attribution → earliest divergence point → minimal reproducible state → targeted eval → regression suite\n\nthis turns trace data from passive observability into an active improvement loop. like can we extract the exact decision point where the agent should have behaved differently? and can we convert that into an eval that catches the same failure class in the future? i guess this matters because most agent failures are trajectory-level failures and not just output-level failures. \n\npersonally i think this is much more realistic than relying only on hand-written benchmarks (imo they should look more like failure memory systems). hand-written evals encode what we think agents will fail on. traces encode what agents actually failed on. also once you have the mechanism, you can mutate the trace into variants. that is basically fuzzing for agents.",
    "created_at": "Fri May 29 16:49:46 +0000 2026",
    "likes": 299,
    "views": "54443",
    "url": "https://x.com/i/status/2060403234825785840",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 85,
    "topic_first_seen": "agent harness eval OR eval harness OR agent eval"
  },
  {
    "id": "2059564676569076021",
    "author": "_philschmid",
    "text": "Interesting new SWE/agentic benchmark (DeepSWE) was released yesterday. 113 tasks across 91 repos in 5 languages. Here are interesting things I noticed:\n\n- The evaluation harness (mini-swe-agent) gives every model a single bash tool and the same SI. No vendor editing primitives.\n\n- Eval Prompts are shorter than SWE-Bench Pro, but require 5.5× more code and touch 7 files on average. The idea is to mimic how developers actually talk to agents, short behavioral descriptions, not verbose specs.\n\n- SI describes a specific workflow: find code, reproduce, fix, verify, edge cases, submit. This maps directly onto how the verifier grades, which could bias toward models that follow instructions literally over models that explore more.\n\n- The bash tool is guarded, outputs over 10k chars get truncated. Malformed tool calls get caught and retried with guidance rather than crashing. To prevent to blow up context.\n\n- Mini-swe-agent claims to match or beat 1P harnesses on the same tasks. Claude Opus scored +10pp over Claude Code. Gemini 3.1 Pro scored +20pp over Gemini CLI.\n\nWould love to see how other harness × model combinations will do, e.g. @cursor_ai, @antigravity, @FactoryAI and how well the eval harness does on more general knowledge work, e.g. GDPval.\n\nGreat to see the SWE-agent team keep pushing on both the research and eval side. 🤗",
    "created_at": "Wed May 27 09:17:38 +0000 2026",
    "likes": 180,
    "views": "15692",
    "url": "https://x.com/i/status/2059564676569076021",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJUK1RyWQAAgEU9.jpg",
      "https://pbs.twimg.com/media/HJUK1RmWgAISmgA.png"
    ],
    "round_first_seen": 85,
    "topic_first_seen": "agent harness eval OR eval harness OR agent eval"
  },
  {
    "id": "2060482513345405178",
    "author": "yunta_tsai",
    "text": "@aikukharenko I wrote my eval cases using something I inspected and trust.\n\nIf you want the coding agent to help, make sure you are happy with the eval metrics and ask the sub-agents to iterate in a sandbox environment so they don’t cheat.",
    "created_at": "Fri May 29 22:04:47 +0000 2026",
    "likes": 7,
    "views": "513",
    "url": "https://x.com/i/status/2060482513345405178",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 85,
    "topic_first_seen": "agent harness eval OR eval harness OR agent eval"
  },
  {
    "id": "2059471964079620504",
    "author": "0xCVYH",
    "text": "Shipped a WANDERING failure-mode detector to UK AISI's Inspect Evals register.\n\nSame pattern behind $245M of May 2026 crypto-agent exploits — now anyone can run it via `inspect eval`.\n\nQwen3.6-27B / SWE-bench Pro (N=99):\n• v5: 55%/5%\n• v1∪v5: 70%/5%\n• v4: 80%/30%/15-turn lead\n\n🧵",
    "created_at": "Wed May 27 03:09:14 +0000 2026",
    "likes": 3,
    "views": "798",
    "url": "https://x.com/i/status/2059471964079620504",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 85,
    "topic_first_seen": "agent harness eval OR eval harness OR agent eval"
  },
  {
    "id": "2061160449345753202",
    "author": "SciFi",
    "text": "DREAM-R: Multimodal Speculative Reasoning with RL-Based Refined Drafting, Precise Verification, and Fully Parallel Execution\n\nYunhai Hu, Zining Liu, Xiangyang Yin, Tianhua Xia, Bo Bao, Eric Sather, Vithursan Thangarasa, Sai Qian Zhang\nhttps://t.co/plTgUKpVVF [𝚌𝚜.𝙰𝙸] https://t.co/EoqJQsaRMU",
    "created_at": "Sun May 31 18:58:40 +0000 2026",
    "likes": 0,
    "views": "58",
    "url": "https://x.com/i/status/2061160449345753202",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJq2gaRWYAEvf_-.png"
    ],
    "round_first_seen": 86,
    "topic_first_seen": "multimodal reasoning agent OR multimodal reasoning OR vision reasoning agent"
  },
  {
    "id": "2060735053940883610",
    "author": "SciFi",
    "text": "Look on Demand: A Cognitive Scheduling Framework for Visual Evidence Acquisition in Multimodal Reasoning\n\nYang Zhang, Xiaoshuai Sun, Rui Zhao, Wujin Sun, Yidong Chen, Jiayi Ji, Qian Chen, Rongrong Ji\nhttps://t.co/JDIYD8B9gM [𝚌𝚜.𝙰𝙸]\n💬Accepted at ICML 2026 https://t.co/Go12xLW3Wt",
    "created_at": "Sat May 30 14:48:18 +0000 2026",
    "likes": 0,
    "views": "112",
    "url": "https://x.com/i/status/2060735053940883610",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJkznISXkAEVkao.png"
    ],
    "round_first_seen": 86,
    "topic_first_seen": "multimodal reasoning agent OR multimodal reasoning OR vision reasoning agent"
  },
  {
    "id": "2061699964833501411",
    "author": "SoundPapers",
    "text": "Beyond the Mouth: Upper-Face Affective Cues in Audiovisual Sentence Recognition under Acoustic Uncertainty\n\nZhou Yang, Yueyi Yang\nhttps://t.co/iKuVGFDofo [𝚌𝚜.𝚂𝙳 𝚌𝚜.𝙰𝙸] https://t.co/N7pBKfmfsT",
    "created_at": "Tue Jun 02 06:42:30 +0000 2026",
    "likes": 0,
    "views": "1",
    "url": "https://x.com/i/status/2061699964833501411",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJyhMVmWYAA6tq9.png"
    ],
    "round_first_seen": 87,
    "topic_first_seen": "audio reasoning model OR audio reasoning OR speech reasoning"
  },
  {
    "id": "2061694280746619212",
    "author": "ArxivSound",
    "text": "Binh Nguyen, Charles Fleming, Thai Le, \"SARA: Stress Test Reasoning in Audio Deepfake Detection,\" https://t.co/oXXfEwaXrm",
    "created_at": "Tue Jun 02 06:19:55 +0000 2026",
    "likes": 0,
    "views": "49",
    "url": "https://x.com/i/status/2061694280746619212",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 87,
    "topic_first_seen": "audio reasoning model OR audio reasoning OR speech reasoning"
  },
  {
    "id": "2061695639181000710",
    "author": "ArxivSound",
    "text": "Ding Ma, Jinyi Mi, Fengji Li, Lester Phillip Violeta, Jiajun He, Wenchin Huang, Kazuhiro Kobayashi, Tomoki Toda, \"Advancing Electrolaryngeal Speech Enhancement Through Speech-Text Representation Learning,\" https://t.co/ja87lwM7MJ",
    "created_at": "Tue Jun 02 06:25:19 +0000 2026",
    "likes": 0,
    "views": "74",
    "url": "https://x.com/i/status/2061695639181000710",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 87,
    "topic_first_seen": "audio reasoning model OR audio reasoning OR speech reasoning"
  },
  {
    "id": "2060175330665508917",
    "author": "ClementDelangue",
    "text": "Most people training agentic LLMs with RL right now have a silently broken training loop and have no idea.\n\nHere's the trap: single-turn RL works beautifully. Clean curves, sane rewards, everything converges. Then you add tools so the model can act mid-rollout, and things get weird. Loss spikes for no reason. Eventually a shape-mismatch error.\n\nThe culprit: every time you parse the model's output to detect a tool call, then re-tokenize the updated conversation for the next turn, you're rolling the dice. Usually the round-trip gives back the same tokens. Sometimes it doesn't and your gradient lands on a sequence the model never actually sampled. No crash. Just quietly wrong math and a useless gradient signal.\n\nThe fix is one rule: never re-encode tokens you've decoded. Keep the sampled tokens in one buffer, never re-render them, and both failure modes disappear. That's Token-In, Token-Out done right.\n\nOur team just published a beautiful deep-dive on exactly this, including an audit across the major open-weights model families showing most chat templates already support it. Required reading if you're doing multi-turn RL 🤗🔥\n\nhttps://t.co/zmx0EQl3jM",
    "created_at": "Fri May 29 01:44:09 +0000 2026",
    "likes": 1093,
    "views": "1003032",
    "url": "https://x.com/i/status/2060175330665508917",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJc2cXRWsAMbdXD.jpg"
    ],
    "round_first_seen": 88,
    "topic_first_seen": "agentic RL training OR RL agent training OR RL post-training"
  },
  {
    "id": "2060268257739677713",
    "author": "badlogicgames",
    "text": "pibot is now running fully local, using parakeet for STT, qwen3-tts for TTS, and Qwen 3.6 as the local multi-modal LLM via llama.cpp.\n\nThe STT and TTS inference engines are Rust/mlx-c based. Ported from Python. So, zero Python dependencies :D https://t.co/BgNQmEqGW4",
    "created_at": "Fri May 29 07:53:25 +0000 2026",
    "likes": 349,
    "views": "42376",
    "url": "https://x.com/i/status/2060268257739677713",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJeKxkXWIAAc3n5.jpg"
    ],
    "round_first_seen": 89,
    "topic_first_seen": "multi-modal LLM agent OR multimodal agent OR VLM agent"
  },
  {
    "id": "2059902265474138585",
    "author": "mlexploration",
    "text": "13. Agent Explorative Policy Optimization for Multimodal Agentic Reasoning\n\nPDF: https://t.co/wknmL4C4tO",
    "created_at": "Thu May 28 07:39:05 +0000 2026",
    "likes": 1,
    "views": "7",
    "url": "https://x.com/i/status/2059902265474138585",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 89,
    "topic_first_seen": "multi-modal LLM agent OR multimodal agent OR VLM agent"
  },
  {
    "id": "2061553078466220540",
    "author": "Prince_Canuma",
    "text": "@zzddfge @badlogicgames @mitsuhiko Its Pi agent calling mlx-vlm server",
    "created_at": "Mon Jun 01 20:58:50 +0000 2026",
    "likes": 0,
    "views": "74",
    "url": "https://x.com/i/status/2061553078466220540",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 89,
    "topic_first_seen": "multi-modal LLM agent OR multimodal agent OR VLM agent"
  },
  {
    "id": "2060536364748148988",
    "author": "KyeGomezB",
    "text": "8 /\n\nTowards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation\n\nPtah is a multi-agent system designed for reliable multimodal deep research, generating visually informative reports by interleaving text and images with strong verification mechanisms. It orchestrates planning, research, and writing stages using specialized agents and a Visual Working Memory.\n\nA dedicated verifier agent enforces factual grounding and cross-modal consistency, while PtahEval introduces new assessment protocols. Experiments show superior reliability and usability compared to baselines in deep research benchmarks.\n\nLearn more: https://t.co/73U7jTR2AM",
    "created_at": "Sat May 30 01:38:46 +0000 2026",
    "likes": 7,
    "views": "120",
    "url": "https://x.com/i/status/2060536364748148988",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 89,
    "topic_first_seen": "multi-modal LLM agent OR multimodal agent OR VLM agent"
  },
  {
    "id": "2061437074264248621",
    "author": "KirkDBorne",
    "text": "New release from @PacktDataML at https://t.co/WyEqNc93yl\n\n\"A Practical Guide to Reinforcement Learning from Human Feedback (RLHF)\"\n\n𝗔𝗺𝗮𝘇𝗼𝗻 𝘀𝘂𝗺𝗺𝗮𝗿𝘆:\n\nRLHF is a powerful approach to AI alignment and human-centered machine learning. By combining reinforcement learning algorithms with human feedback signals, RLHF has become a key method for improving the safety, reliability, and alignment of large language models (LLMs).\n\nThis book begins with the foundations of reinforcement learning and policy optimization, including algorithms such as proximal policy optimization (PPO), and explains how reward models and human preference learning help fine-tune AI systems and generative AI models.",
    "created_at": "Mon Jun 01 13:17:52 +0000 2026",
    "likes": 13,
    "views": "1183",
    "url": "https://x.com/i/status/2061437074264248621",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJuyGB0WUAIvruL.jpg"
    ],
    "round_first_seen": 90,
    "topic_first_seen": "reward model RLHF agent OR RLHF agent OR process reward model"
  },
  {
    "id": "2061694606870495338",
    "author": "ArxivSound",
    "text": "Yekaterina Yegorova, Argyrios Gerogiannis, Haolong Zheng, Julia Hockenmaier, Chang D. Yoo, Mark A. Hasegawa-Johnson, \"SALSA: Speech Aware LLM Adaptation via Learned Steering Activation Vectors,\" https://t.co/E92UL1hYVV",
    "created_at": "Tue Jun 02 06:21:13 +0000 2026",
    "likes": 1,
    "views": "98",
    "url": "https://x.com/i/status/2061694606870495338",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 91,
    "topic_first_seen": "audio language model OR audio LLM OR speech LLM"
  },
  {
    "id": "2061727662750732492",
    "author": "salmakhatunn",
    "text": "✅ Top AI LLM Models for Every Task\n\nAsk Gemini 3.1 Pro FREE now 👉 https://t.co/HddCG5NnJ6\n\n→ Writing &amp; Research: Grok 4.3, GLM 5.1,GPT-5.5, Claude 4.6, Gemini 3.1 Pro, Perplexity\n→ Social Content: Grok 4, GPT o3, DeepSeek\n→ Academic / STEM: Claude Opus 4.7, MiniMax M2.7 https://t.co/MAW6AYkeqY",
    "created_at": "Tue Jun 02 08:32:34 +0000 2026",
    "likes": 0,
    "views": "51",
    "url": "https://x.com/i/status/2061727662750732492",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2061727609558597632/vid/avc1/398x270/uKi67cMoazn3mGGo.mp4?tag=14"
    ],
    "round_first_seen": 91,
    "topic_first_seen": "audio language model OR audio LLM OR speech LLM"
  },
  {
    "id": "2059621167099449708",
    "author": "tldr_ai_papers",
    "text": "🚀 Introducing Gemini Embedding 2! 🖼️ 🔊 📝 This new multimodal embedding model from Gemini can understand video, audio, images, and text in a *single* unified space.",
    "created_at": "Wed May 27 13:02:06 +0000 2026",
    "likes": 0,
    "views": "35",
    "url": "https://x.com/i/status/2059621167099449708",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 92,
    "topic_first_seen": "multimodal foundation model OR omni model OR omnimodal model"
  },
  {
    "id": "2060409454596936057",
    "author": "CaptainHaHaa",
    "text": "Google’s new model Omni is now available at Invideo.\nHere's a thread of all the cool things you can do.",
    "created_at": "Fri May 29 17:14:29 +0000 2026",
    "likes": 21,
    "views": "1090",
    "url": "https://x.com/i/status/2060409454596936057",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 92,
    "topic_first_seen": "multimodal foundation model OR omni model OR omnimodal model"
  },
  {
    "id": "2059815815080300674",
    "author": "JoshDaws",
    "text": "How is everyone using Google’s Omni model? The Gemini app on my iPhone isn’t giving me anything like the examples I’m seeing on X.",
    "created_at": "Thu May 28 01:55:34 +0000 2026",
    "likes": 0,
    "views": "737",
    "url": "https://x.com/i/status/2059815815080300674",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 92,
    "topic_first_seen": "multimodal foundation model OR omni model OR omnimodal model"
  },
  {
    "id": "2060541812897710548",
    "author": "somi_ai",
    "text": "@NVIDIAAIInfra loop speed was never really the bottleneck though. getting a clean reward signal out of real execution is the hard part. flaky outputs and nondeterminism make the feedback noisy enough that the agent learns the wrong thing",
    "created_at": "Sat May 30 02:00:25 +0000 2026",
    "likes": 1,
    "views": "167",
    "url": "https://x.com/i/status/2060541812897710548",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 93,
    "topic_first_seen": "agentic reward model OR agentic RL OR agent reward"
  },
  {
    "id": "2059296738897477930",
    "author": "AntLingAGI",
    "text": "From IcePop to KPop — our team keeps pushing on RL training stability for large MoE models. 👇\n\nKPop replaces the fixed-ratio mask with an adaptive binary-KL region that matches each token's inherent noise. More robust updates, stable long-horizon agentic RL.\n\nRing-2.6-1T → 76+ on SWE-bench Verified, pure RL.\n\nCongrats to @Jia__Guo & team!\n\nBlog: https://t.co/PvYHN5ywwz",
    "created_at": "Tue May 26 15:32:57 +0000 2026",
    "likes": 44,
    "views": "4325",
    "url": "https://x.com/i/status/2059296738897477930",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJQXctVbIAEk9Fn.png"
    ],
    "round_first_seen": 93,
    "topic_first_seen": "agentic reward model OR agentic RL OR agent reward"
  },
  {
    "id": "2060033437566697958",
    "author": "dr_cintas",
    "text": "HexoAI just open-sourced a self-improving AI that updates its own weights 🤯\n\nThe loop runs three agents:\n\n> a meta-agent builds an agent for your task\n> the target agent attempts it and logs everything\n> a feedback agent rewrites the harness and updates the weights\n\nThen it repeats, improving each generation.\n\n100% Open Source.",
    "created_at": "Thu May 28 16:20:19 +0000 2026",
    "likes": 24,
    "views": "3218",
    "url": "https://x.com/i/status/2060033437566697958",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 94,
    "topic_first_seen": "RL agent harness OR agent harness OR harness eval"
  },
  {
    "id": "2061381953061023881",
    "author": "FinanceYF5",
    "text": "9/🏁 AI 工程师，这周你最需要看哪一篇？\n\nSkillOpt 告诉你：优化 Agent 的最便宜手段是那个文档。\n\nLife-Harness 告诉你：先调接口，再考虑 Fine-tune。智力税永远出在最显眼但最少人动的地方。",
    "created_at": "Mon Jun 01 09:38:50 +0000 2026",
    "likes": 1,
    "views": "199",
    "url": "https://x.com/i/status/2061381953061023881",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 94,
    "topic_first_seen": "RL agent harness OR agent harness OR harness eval"
  },
  {
    "id": "2061372887445991518",
    "author": "julian__duru",
    "text": "We have been building Harnessy.\n\nHarnessy is an agent capability harness for software projects and agent runtimes: the layer that tells an agent what it can do, what context matters, and how to verify the environment.",
    "created_at": "Mon Jun 01 09:02:49 +0000 2026",
    "likes": 141,
    "views": "16456",
    "url": "https://x.com/i/status/2061372887445991518",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 94,
    "topic_first_seen": "RL agent harness OR agent harness OR harness eval"
  },
  {
    "id": "2061336667060113891",
    "author": "ranyi1115",
    "text": "“Loyalty to the harness ends up higher than loyalty to the model. Memory and context live in the harness.\"\n\nDai Yusen, Zhenfund.\n\nThis is why we built Starchild as the agent OS, not another model wrapper.",
    "created_at": "Mon Jun 01 06:38:53 +0000 2026",
    "likes": 5,
    "views": "287",
    "url": "https://x.com/i/status/2061336667060113891",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 94,
    "topic_first_seen": "RL agent harness OR agent harness OR harness eval"
  },
  {
    "id": "2060943482089533848",
    "author": "vinhnx",
    "text": "I found a wonderful repo from Josh McKinney (Ratatui &amp; OpenAI Codex core-maintainer): https://t.co/OKJJnGBlYe.  This directory holds design notes for the Rust-first coding agent harness experiment. https://t.co/WlmF9VDN5x",
    "created_at": "Sun May 31 04:36:31 +0000 2026",
    "likes": 1,
    "views": "94",
    "url": "https://x.com/i/status/2060943482089533848",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJnv59eaIAAXgxo.jpg"
    ],
    "round_first_seen": 94,
    "topic_first_seen": "RL agent harness OR agent harness OR harness eval"
  },
  {
    "id": "2061035047784735153",
    "author": "TuracTheThinker",
    "text": "Then end-to-end: multi-step journeys, ordered and unordered, including a tool that fails mid-chain. The question isn't \"does the happy path work\" — it's \"what does the agent do when step 3 of 5 dies.\" That's the eval that predicts prod. https://t.co/CqwK7XyxHo",
    "created_at": "Sun May 31 10:40:22 +0000 2026",
    "likes": 0,
    "views": "8",
    "url": "https://x.com/i/status/2061035047784735153",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJpEdEXXwAAp33E.png"
    ],
    "round_first_seen": 95,
    "topic_first_seen": "agent harness eval OR eval harness OR agent eval"
  },
  {
    "id": "2061309051976233110",
    "author": "CreaoAI",
    "text": "@scaling01 The 50% promo is nice, but the real number to watch is output cost once people start running this in agent loops. Long context is where usage compounds fastest (summaries, repo scans, transcript work, eval passes, etc.). Lots of potential here",
    "created_at": "Mon Jun 01 04:49:09 +0000 2026",
    "likes": 1,
    "views": "437",
    "url": "https://x.com/i/status/2061309051976233110",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 95,
    "topic_first_seen": "agent harness eval OR eval harness OR agent eval"
  },
  {
    "id": "2061282979440467995",
    "author": "BeeGeeEth",
    "text": "Key principle:\n\nLow output is not waste by itself.\n\nWaste means the agent consumed tokens without producing useful downstream value.\n\nTry:\n\ncheck token waste for [My_Agent]\n\nor:\n\neval cost change for [My_Agent]",
    "created_at": "Mon Jun 01 03:05:33 +0000 2026",
    "likes": 0,
    "views": "35",
    "url": "https://x.com/i/status/2061282979440467995",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 95,
    "topic_first_seen": "agent harness eval OR eval harness OR agent eval"
  },
  {
    "id": "2061620094933823877",
    "author": "aabyzov",
    "text": "@rauchg 10x cheaper changes behavior more than the rank does. At that price you stop rationing agent runs: eval every PR, let the loop retry. Cheap-and-close beats expensive-and-best for most real work.",
    "created_at": "Tue Jun 02 01:25:08 +0000 2026",
    "likes": 0,
    "views": "72",
    "url": "https://x.com/i/status/2061620094933823877",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 95,
    "topic_first_seen": "agent harness eval OR eval harness OR agent eval"
  },
  {
    "id": "2060025586248986651",
    "author": "LangChain",
    "text": "Evals shape agent behavior.  Every eval is a vector that shifts the behavior of your agentic system.  \n\nMore evals ≠ better agents. Instead, build targeted evals that reflect desired behaviors in production.\n\nTools like LangSmith Engine help you targetedly create evals from your tracing data to build better agents.",
    "created_at": "Thu May 28 15:49:07 +0000 2026",
    "likes": 77,
    "views": "10676",
    "url": "https://x.com/i/status/2060025586248986651",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 95,
    "topic_first_seen": "agent harness eval OR eval harness OR agent eval"
  },
  {
    "id": "2060265314252718331",
    "author": "AIDailyGems",
    "text": "Good eval candidate: give it one bug, one refactor, and one failing test before trusting it. Open-source AI coding assistant &amp; multi-agent teams for macOS, powered by local LLMs via LM Studio. Free, private, zero telemetry, no npm. Generat\n\nhttps://t.co/hFhEKmLcer",
    "created_at": "Fri May 29 07:41:43 +0000 2026",
    "likes": 2,
    "views": "88",
    "url": "https://x.com/i/status/2060265314252718331",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 95,
    "topic_first_seen": "agent harness eval OR eval harness OR agent eval"
  },
  {
    "id": "2061619620352602119",
    "author": "_itsjustshubh",
    "text": "@arpit_bhayani context window management and multi-agent coordination patterns are the ones i've seen. also how you'd design an LLM eval pipeline at scale — comes up more than expected",
    "created_at": "Tue Jun 02 01:23:15 +0000 2026",
    "likes": 1,
    "views": "125",
    "url": "https://x.com/i/status/2061619620352602119",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 95,
    "topic_first_seen": "agent harness eval OR eval harness OR agent eval"
  },
  {
    "id": "2060934881308180818",
    "author": "newlinedotco",
    "text": "3. Orchestration is key!  \nHarness manages inter-agent data passing, error handling, and team coordination protocols.  \nNo more chaos in agent communication. It's organized and efficient! 📈",
    "created_at": "Sun May 31 04:02:20 +0000 2026",
    "likes": 0,
    "views": "27",
    "url": "https://x.com/i/status/2060934881308180818",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 95,
    "topic_first_seen": "agent harness eval OR eval harness OR agent eval"
  },
  {
    "id": "2060148165198925983",
    "author": "AINativeF",
    "text": "2. Agent Explorative Policy Optimization for Multimodal Agentic Reasoning\n\n🔑 Keywords: Vision-language models, Extended reasoning, AXPO, Tool use, Thinking-Acting Gap\n\n💡 Category: Multi-Modal Learning\n\n🌟 Research Objective:\n   - The paper addresses the challenges faced by agents using vision-language models with extended reasoning, specifically in tool utilization, through a method called AXPO.\n\n🛠️ Research Methods:\n   - AXPO, or Agent eXplorative Policy Optimization, optimizes thinking prefixes and resamples tool calls to improve performance, using uncertainty-based prefix selection.\n\n💬 Research Conclusions:\n   - The integration of AXPO with SFT outperforms traditional methods across multiple benchmarks, providing superior results with fewer parameters.\n\n👉 Paper link: https://t.co/1M7LkylPF2",
    "created_at": "Thu May 28 23:56:12 +0000 2026",
    "likes": 2,
    "views": "119",
    "url": "https://x.com/i/status/2060148165198925983",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJcd1sEbQAAdjJL.jpg"
    ],
    "round_first_seen": 96,
    "topic_first_seen": "multimodal reasoning agent OR multimodal reasoning OR vision reasoning agent"
  },
  {
    "id": "2060567181793108101",
    "author": "Chinazhidx",
    "text": "🚨Tongyi Lab introduces Qwen-VLA, a unified vision-language-action generalist model.\n\nA single model supports robotic manipulation, vision-language navigation &amp; cross-embodiment control.\n\nIt outperforms specialized models while operating across 11 robotic platforms.#EmbodiedAI https://t.co/zKKPvQwyOl",
    "created_at": "Sat May 30 03:41:14 +0000 2026",
    "likes": 12,
    "views": "715",
    "url": "https://x.com/i/status/2060567181793108101",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2060566146806616064/vid/avc1/468x270/OjgMygKRlZ3rosq2.mp4?tag=14",
      "https://pbs.twimg.com/media/HJiaAhGaYAAC8A3.jpg",
      "https://pbs.twimg.com/media/HJiaB7UboAArvYI.jpg",
      "https://pbs.twimg.com/media/HJiaEKEakAEHsOW.jpg"
    ],
    "round_first_seen": 96,
    "topic_first_seen": "multimodal reasoning agent OR multimodal reasoning OR vision reasoning agent"
  },
  {
    "id": "2059150409651150988",
    "author": "aipoch_ai",
    "text": "#3 — Digital Twin Clinical Agent\nhttps://t.co/yswf8uVCd8\nCore strength:\n→ Future-facing autonomous clinical simulation\n\nThis Skill represents the most ambitious vision:\nAI-generated patient modeling and dynamic clinical reasoning.\n\nWhat makes it unique:\n• Patient trajectory simulation\n• Clinical twin reasoning\n• Longitudinal health modeling\n• Agent-based healthcare interaction\n\nConceptually, this may represent the future of precision medicine AI.\n\nBut today, Digital Twin systems still face major challenges:\n• Missing longitudinal data\n• Weak reproducibility\n• Validation difficulty\n• Narrative-style reasoning risks\n• Lack of standardized evaluation\n\nIn practice, many Digital Twin systems currently function more as:\n“Intelligent clinical storytelling systems”\nthan rigorously validated biomedical infrastructures.\n\nBest for:\n• AI healthcare experimentation\n• Clinical simulation concepts\n• Future Digital Twin exploration\n\nMain limitation:\nThe field itself is still immature.\n\nVerdict:\nHighest long-term vision, but lowest current research stability.",
    "created_at": "Tue May 26 05:51:29 +0000 2026",
    "likes": 0,
    "views": "49",
    "url": "https://x.com/i/status/2059150409651150988",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJOSUdzawAA_NHx.jpg",
      "https://pbs.twimg.com/media/HJOSVV_bIAABPtD.jpg",
      "https://pbs.twimg.com/media/HJOSWrcawAAMgr4.png"
    ],
    "round_first_seen": 96,
    "topic_first_seen": "multimodal reasoning agent OR multimodal reasoning OR vision reasoning agent"
  },
  {
    "id": "2061065233909919955",
    "author": "Twendee_",
    "text": "@VraserX How are you measuring \"same quality\" across tasks like long-context reasoning, code generation, and multimodal work where these models diverge most?",
    "created_at": "Sun May 31 12:40:19 +0000 2026",
    "likes": 0,
    "views": "4",
    "url": "https://x.com/i/status/2061065233909919955",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 96,
    "topic_first_seen": "multimodal reasoning agent OR multimodal reasoning OR vision reasoning agent"
  },
  {
    "id": "2059785905427386440",
    "author": "AINativeF",
    "text": "6. LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence\n\n🔑 Keywords: LLaVA-OneVision-2, Windowed Attention, codec-stream tokenization, large-scale open supervision, JumpScore\n\n💡 Category: Multi-Modal Learning\n\n🌟 Research Objective:\n   - To develop LLaVA-OneVision-2, a vision-language model achieving superior multimodal performance across video understanding, temporal grounding, and tracking tasks.\n\n🛠️ Research Methods:\n   - Utilization of Windowed Attention for efficient computation, codec-stream tokenization for allocating token budgets, and large-scale open supervision using approximately 8M re-captioned video samples for pretraining.\n\n💬 Research Conclusions:\n   - LLaVA-OneVision-2 demonstrates remarkable performance, significantly surpassing existing models on multimodal benchmarks, including a notable improvement on the JumpScore benchmark and standard video, spatial, and tracking tasks.\n\n👉 Paper link: https://t.co/NjlbNXFMPV",
    "created_at": "Wed May 27 23:56:43 +0000 2026",
    "likes": 0,
    "views": "61",
    "url": "https://x.com/i/status/2059785905427386440",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJXUXaWakAAXjGb.jpg"
    ],
    "round_first_seen": 96,
    "topic_first_seen": "multimodal reasoning agent OR multimodal reasoning OR vision reasoning agent"
  },
  {
    "id": "2059831589715210243",
    "author": "MattQR",
    "text": "Google recently revealed that 1 in 6 AI Mode queries are now multimodal.\n\nThat means users are increasingly searching with:\n\n• images,\n• voice,\n• video,\n• screenshots, and \n• live camera inputs.\n\nMost websites are still optimized almost entirely for text.\n\nFeels like we are entering a world where machine-readable visual context becomes part of SEO itself.\n\nI guess multimodal visibility is going to become massively underestimated over the next few years.",
    "created_at": "Thu May 28 02:58:15 +0000 2026",
    "likes": 7,
    "views": "70",
    "url": "https://x.com/i/status/2059831589715210243",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 96,
    "topic_first_seen": "multimodal reasoning agent OR multimodal reasoning OR vision reasoning agent"
  },
  {
    "id": "2059655183957651793",
    "author": "ILYA_babay",
    "text": "@davidstutz92 Multimodal lifts capability but raises the citation bar — when input is image+audio+utterance, the chain still has to bottom out in retrievable evidence. Otherwise 'reasoning' is pattern-matching at higher dimensionality. Built that into https://t.co/HWQuNHiSLc",
    "created_at": "Wed May 27 15:17:17 +0000 2026",
    "likes": 0,
    "views": "6",
    "url": "https://x.com/i/status/2059655183957651793",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 97,
    "topic_first_seen": "audio reasoning model OR audio reasoning OR speech reasoning"
  },
  {
    "id": "2059482796637536494",
    "author": "OmkarDutta",
    "text": "Google Omni can translate spoken audio without needing the source or translated text in the prompt.\n\nIt understands pacing, timing, scene rhythm, dialogue length, and cinematic flow.\n\nhttps://t.co/3FgpqLtkwz",
    "created_at": "Wed May 27 03:52:16 +0000 2026",
    "likes": 0,
    "views": "57",
    "url": "https://x.com/i/status/2059482796637536494",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2058851645535162368/vid/avc1/480x270/2qud3iFe5qR3XRef.mp4?tag=27"
    ],
    "round_first_seen": 97,
    "topic_first_seen": "audio reasoning model OR audio reasoning OR speech reasoning"
  },
  {
    "id": "2061087392438903134",
    "author": "m13v_",
    "text": "@blueprint_os @airasentia @openclaw the why almost always exists, it's just buried in the PR body or an issue thread nobody reopens. the diff is the only part most tooling surfaces, which is exactly why podlog pulls the commit, PR and issue reasoning into a short audio rundown you catch on a walk. written with ai",
    "created_at": "Sun May 31 14:08:22 +0000 2026",
    "likes": 0,
    "views": "18",
    "url": "https://x.com/i/status/2061087392438903134",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 97,
    "topic_first_seen": "audio reasoning model OR audio reasoning OR speech reasoning"
  },
  {
    "id": "2060511514436784367",
    "author": "Trtd6Trtd",
    "text": "https://t.co/Jj6UA37UYp\n\nASR、TTS、リアルタイム音声対話の3タスクを単一アーキテクチャで統合した音声言語基盤モデル\n\n音声トークンも作って、音声とテキストを共有表現空間で学習しているらしい\n\nASRは音声入力という外部証拠があるため、LLMの自由形式テキスト生成よりも積極的に投機的デコードが使えるというのが知見だった",
    "created_at": "Sat May 30 00:00:02 +0000 2026",
    "likes": 30,
    "views": "2501",
    "url": "https://x.com/i/status/2060511514436784367",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 97,
    "topic_first_seen": "audio reasoning model OR audio reasoning OR speech reasoning"
  },
  {
    "id": "2059336212431654913",
    "author": "GradiumAI",
    "text": "int8 quantization, no audible quality loss. Lower memory, faster inference. All on-device.\n\nhttps://t.co/n15FHxFaeu https://t.co/7OSUp9gjw4",
    "created_at": "Tue May 26 18:09:48 +0000 2026",
    "likes": 18,
    "views": "884",
    "url": "https://x.com/i/status/2059336212431654913",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJQ7M2SXUAAlRa6.jpg"
    ],
    "round_first_seen": 97,
    "topic_first_seen": "audio reasoning model OR audio reasoning OR speech reasoning"
  },
  {
    "id": "2061128217017393503",
    "author": "cerebral_valley",
    "text": "🥉 3rd Place + Best Use of Managed Agents: AgentGym\n\nAgentGym is an RL environment that trains managed agents to operate and respond in your voice. https://t.co/ast1Y0BU5x",
    "created_at": "Sun May 31 16:50:35 +0000 2026",
    "likes": 4,
    "views": "374",
    "url": "https://x.com/i/status/2061128217017393503",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2061128031767588864/vid/avc1/480x270/D4SrlGj4ze-Df8JH.mp4?tag=14"
    ],
    "round_first_seen": 98,
    "topic_first_seen": "agentic RL training OR RL agent training OR RL post-training"
  },
  {
    "id": "2059687882038845641",
    "author": "Marktechpost",
    "text": "1/5 NVIDIA just released Polar — an RL training framework that doesn't require you to modify your agent harness at all.\n\nIt intercepts at the model API boundary instead. The harness runs exactly as it does in production.\n\nFull technical breakdown with benchmark tables: https://t.co/qrOpiRxzgE\n\nHere's how it works 🧵\n\n@NVIDIAAI  @billxbf @HaoZhang3438830  @ShaokunZhang1\n@songyang_han @shizhediao @yunhengjackiez",
    "created_at": "Wed May 27 17:27:12 +0000 2026",
    "likes": 20,
    "views": "2913",
    "url": "https://x.com/i/status/2059687882038845641",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 98,
    "topic_first_seen": "agentic RL training OR RL agent training OR RL post-training"
  },
  {
    "id": "2060000120763187379",
    "author": "myexamcloud",
    "text": "🚀 The Complete Agentic AI Certification Path in 2026\n\nTraditional programming alone is no longer enough.\n\nRead:\nhttps://t.co/QhytGLZAdH\n\n#AgenticAI #AI #AIAgents #SoftwareEngineering #FutureOfAI #RAG #AWS #AzureAI https://t.co/1AMCtZ9ccM",
    "created_at": "Thu May 28 14:07:56 +0000 2026",
    "likes": 10,
    "views": "47",
    "url": "https://x.com/i/status/2060000120763187379",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2059999982577745920/vid/avc1/320x568/E5zp0ulnX3F73fp7.mp4?tag=14"
    ],
    "round_first_seen": 98,
    "topic_first_seen": "agentic RL training OR RL agent training OR RL post-training"
  },
  {
    "id": "2059974887700377608",
    "author": "CAmukeshbhagat",
    "text": "$CRWV just closed the training-inference loop.\n\nAgents now learn in production — not just before it.\n\nServerless RL. 40% cost cut. Iteration: hours → seconds.\n\nGPU demand: episodic → persistent.\n\nMeta ($21B) + Anthropic + Jane Street ($6B) are already in.\n\nThe agentic economy needs infrastructure.\nCOREWEAVE just became the pipes.\n#NEOCLOUD",
    "created_at": "Thu May 28 12:27:40 +0000 2026",
    "likes": 0,
    "views": "83",
    "url": "https://x.com/i/status/2059974887700377608",
    "has_media": true,
    "media_urls": [
      "https://pbs.twimg.com/media/HJaAPkEbkAADTWI.jpg"
    ],
    "round_first_seen": 98,
    "topic_first_seen": "agentic RL training OR RL agent training OR RL post-training"
  },
  {
    "id": "2059135456093392911",
    "author": "noura_virtual",
    "text": "LLM Multi-Agent 2D Simulation\n解析完了\nべたな実装になっているため処理時間はとても長い\n技術的に特筆すべき点はないです💖",
    "created_at": "Tue May 26 04:52:04 +0000 2026",
    "likes": 1,
    "views": "49",
    "url": "https://x.com/i/status/2059135456093392911",
    "has_media": false,
    "media_urls": [],
    "round_first_seen": 99,
    "topic_first_seen": "multi-modal LLM agent OR multimodal agent OR VLM agent"
  },
  {
    "id": "2061541992790683726",
    "author": "Prince_Canuma",
    "text": "Today we're shipping our biggest MLX-VLM release yet: v0.6.0\n\n...and we are raising 💸\n\nThis one's about turning your Apple devices into real local agent machines. From your desk to your pocket. \n\nWhat's new:\n\n⚡ Speculative decoding everywhere — Gemma 4 EAGLE3 + DFlash, Qwen MTP, DeepSeek V4 MTP. Faster tokens, less waiting.\n\n🤖 Agent-ready server — native Anthropic /v1/messages API, stateful /v1/responses, tool calls, Codex context budgets. Plug Claude Code & Codex straight into local models.\n\n👁️ New models galore — DeepSeek V4, ZAYA1-VL, MiniCPM-V 4.6, LFM2 MoE, Step-3.7 Flash, Laguna + more.\n\n🎨 Image gen & editing — FLUX.2 (base + klein), PrismML Bonsai.\n\n🔊 Audio in — Qwen3 Omni, Gemma 4 audio, base64 chat audio.\n\n🧮 TurboQuant KV cache — RHT-correct fast paths for leaner memory.\n\n📦 Modular server, better metrics, cleaner streaming.\n\nRun real agents on the hardware already in your hands.\n\nGithub: https://t.co/1T06ur6LU5",
    "created_at": "Mon Jun 01 20:14:47 +0000 2026",
    "likes": 430,
    "views": "44451",
    "url": "https://x.com/i/status/2061541992790683726",
    "has_media": true,
    "media_urls": [
      "https://video.twimg.com/amplify_video/2061538785964236802/vid/avc1/480x270/jlXFBKDuPI1d9tgW.mp4?tag=27"
    ],
    "round_first_seen": 99,
    "topic_first_seen": "multi-modal LLM agent OR multimodal agent OR VLM agent"
  }
]