Author: Tony Stark

  • Google Gemini 3 is Coming: The New AI Monarch Ascends the Throne!

    Guys, there's been a huge stir in the AI community recently 😱! Google Gemini 3 dropped late at night, crowning a new king of the AI realm ✨.

    Previously, people were spoilt for choice among AI models, and the gaps between the leading options were fairly marginal. But Gemini 3 Pro arrived with simply outstanding performance 💯. On benchmarks billed as the "pinnacle" of human intelligence, it outscored GPT-5.1 and Claude Sonnet 4.5 by a large margin. In mathematics it shows absolute dominance: with code execution enabled on AIME 2025 its accuracy reaches 100%, and on MathArena Apex it leaves other large models far behind. Its "visual intelligence" is just as remarkable, with screenshot-understanding scores roughly double the previous state of the art 👏.

    Google also dropped a "mini-bombshell": Google Antigravity, an agent-first development platform where developers collaborate with multiple intelligent agents, sending productivity skyrocketing 🚀. Additionally, Gemini 3 Pro was trained on Google TPUs with broad data coverage, and it has already been integrated into Google Search, where it can instantly generate interactive charts or simulation tools when you search for complex concepts.

    Early hands-on tests online have also been positive, with its direct-generation ability proving quite powerful. Guys, the AI era is unstoppable. Let's start paying close attention right away 🤩!

  • Large Language Model Leaderboard

    Official ranking

    Leading large models are evaluated under OpenCompass's evaluation rules, and the rankings are released.

  • Baidu's New AI Model ERNIE-4.5-VL is Amazing!

    Dear friends, there has been a major move in the AI field recently 🔥! Baidu has officially released its new-generation multimodal AI model, ERNIE-4.5-VL. In this era of rapidly developing AI technology, finding a model that is both efficient and powerful is a real pain point for many developers and researchers 😭.

    This time, though, Baidu's new model tackles these problems head-on 👏. It not only has powerful language-processing capabilities but also introduces an innovative "image thinking" function. With only 3B activated parameters, it is highly efficient and flexible, handling tasks quickly. The "image thinking" function itself is extremely powerful: it can zoom into images and call tools such as image search, greatly enriching the interaction between images and text.

    I think it will bring new possibilities to fields such as intelligent search, online education, and e-commerce 💯. It's like fitting these fields with smart little wings so they can fly higher and farther. The model is now open-sourced, so developers and researchers can explore the potential of multimodal AI more conveniently. Dear friends, don't miss this great opportunity. Let's start researching together 👏!

    #Baidu AI Model #ERNIE-4.5-VL #Multimodal AI #Image Thinking #AI Technological Innovation

  • Google Gemini 3 Pro Preview: The Million-Token Context Window Is Incredible!

    Dear friends, there's been a major development in the AI world recently 🔥! Google's Gemini series has made significant progress: the latest preview, "gemini-3-pro-preview-11-2025", has appeared on the Vertex AI platform.

    Previously, many AI models struggled with long documents and complex tasks, which was really frustrating 😣. Gemini 3 Pro, however, supports an extremely large context window of up to 1 million tokens, which is a lifesaver 👍! It handles 200,000 tokens at the standard tier and extends all the way to 1 million tokens at the advanced tier, and it has also been optimized in its input-output ratio and its handling of image, video, and audio.
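
    For readers who want to poke at the preview themselves, here is a minimal sketch of calling a Gemini model on Vertex AI over its REST endpoint. The endpoint pattern follows how existing Gemini models are served on Vertex AI; the project, region, and whether this particular model ID is enabled for your account are assumptions, so check the Vertex AI docs before relying on it.

      # Hypothetical sketch: querying the preview model on Vertex AI.
      # PROJECT_ID and LOCATION are placeholders; the model ID is taken
      # from the report above and may not be available to every account.
      PROJECT_ID="your-project"
      LOCATION="us-central1"
      MODEL="gemini-3-pro-preview-11-2025"

      curl -s -X POST \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json" \
        "https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL}:generateContent" \
        -d '{"contents": [{"role": "user", "parts": [{"text": "Summarize the attached 500-page filing."}]}]}'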

    It is regarded as a major upgrade over Gemini 2.5, focusing on multimodal reasoning and agent-based intelligence. Its training data runs through August 2024 and spans a wide variety of input sources. Industry analysts say it could be revolutionary for enterprise-level applications such as financial modeling and biotech simulation.

    According to multiple tech media reports, Google may reveal more details in mid-to-late November, and the full release may slip to December. Compared with its predecessors, it is expected to outperform GPT-4o in benchmark tests and to excel at multimodal creative generation and code-writing tasks 👏.

    Although Google has not yet officially responded, Vertex AI is accelerating iteration of the Gemini series. Let's all look forward to its official debut ✨!

  • ChatGPT's "New Rules" Are Here! Providing Medical, Legal, and Financial Advice Is Now Prohibited!

    Dear friends, OpenAI updated ChatGPT's usage policy on October 29. The model is now explicitly prohibited from providing professional medical, legal, or financial advice!

    The aim is mainly to avoid regulatory risk, reduce the danger of misleading people, and redraw the boundaries of AI in high-risk fields. ChatGPT can no longer interpret medical images, assist in diagnosis, draft or interpret legal contracts, or provide personalized investment strategies or tax planning. If users make such requests, the system uniformly replies by directing them to consult human experts. The policy covers all ChatGPT models and API interfaces to ensure consistent enforcement.

    Professionals can still use it for general concept discussion or data organization, but it cannot give "fiduciary" advice directly to end users. The adjustment is driven by global regulation: the EU's Artificial Intelligence Act is about to take effect and will subject high-risk AI to strict review, while the US FDA requires clinical validation for diagnostic AI tools. By acting now, OpenAI avoids being classified as "software as a medical device" and heads off potential lawsuits.

    User reactions to the new rule fall into two camps. Some individual users are regretful at losing a "low-cost consultation" channel; after all, AI had saved them plenty in professional consultation fees. Most of the medical and legal communities, however, support the change, since AI's "pseudo-professional" output really can lead to misdiagnosis or disputes. Data suggest that over 40% of ChatGPT queries are advice-seeking, with medical and financial advice accounting for nearly 30%, so the policy may cause a short-term dip in traffic.

    The industry impact is also significant. Google, Anthropic, and others may follow with similar restrictions, and vertical AI tools, such as certified legal or medical models, may take off. Chinese companies like Baidu have already complied in advance; with domestic regulation tightening, innovation has to be explored within the "sandbox" mechanism.

    OpenAI emphasizes that the goal is to "balance innovation and safety". The update continues the Model Spec framework, with further iterations reportedly planned for February 2025. The shift of AI from "omnipotent assistant" to "limited assistant" seems to be becoming an industry consensus, with technological breakthroughs and ethical constraints developing in tandem. I wonder what new balance the GPT-5 era will strike?

    What do you think of this new rule of ChatGPT? Come and share your thoughts in the comment section!

    Topic tags and keywords: #OpenAI #ChatGPT #UsagePolicyUpdate #MedicalAdvice #LegalAdvice #FinancialAdvice #AISupervision #IndustryImpact

  • Google Gemini is about to make a big splash! Nano Banana 2 image generation arrives with an upgrade.

    Dear friends, extremely important news! Google is preparing to release the AI image generation model Nano Banana 2, internally code-named "GEMPIX2". Judging from new announcements on the official Gemini website, it may arrive in the next few weeks!

    The Nano Banana series is the ace of Google's DeepMind team. Since the first generation launched on August 26, 2025, it has been wildly popular, topping the LMArena image editing leaderboard during its early preview. Its "multi-round dialogue" interaction and character-retention features are excellent: it can easily blend photos, swap backgrounds, and generate artistically styled images. In just a few weeks it drew 10 million new users into the Gemini ecosystem, with more than 200 million image editing operations!

    Judging from preview cards and technical indicators in the Gemini UI, this Nano Banana 2 sighting suggests it will keep its focus on creativity, optimizing generation speed and the diversity of artistic styles for professional creators and developers. It may also be deeply integrated with the Gemini 3.0 series to strengthen multimodal processing, such as generating customized visual styles for video overviews.

    Google has not announced specifics yet, but the release feels just around the corner, and it may ship alongside updates to products such as NotebookLM and Google Photos. The first-generation model pushed Gemini's monthly active users past 650 million; Nano Banana 2 is expected to further narrow the gap with competitors and inject new vitality into the creative industry. Google also emphasizes that all generated images will carry watermarks to ensure compliance.

    What are your expectations for Nano Banana 2? Come and chat in the comment section!

    #GoogleGemini #NanoBanana2 #ImageGenerationTechnology #AIInnovation #GenerativeAI

  • Still struggling to make PPTs? Google Gemini's one-click PPT generation comes to the rescue!

    Guys, the era of tedious PPT-making may really be coming to an end! Google has added a super useful new function to its AI assistant Gemini: in Gemini's interactive workspace Canvas, you can automatically generate highly professional PPTs just by entering a one-sentence prompt. It works for both individual users and Google Workspace accounts!

    The function is intelligent, fast, and accurate. With no source material, a prompt like "Create a presentation on climate change" is enough for it to organize a content framework, match a theme style, and insert relevant pictures. With existing material, just upload Word documents, PDF reports, or Excel spreadsheets, and it extracts the key information and turns it into clear, logically structured slides.

    Moreover, the generated PPTs are not static finished products: they can be exported directly to Google Slides, where you can freely adjust layouts, add or delete content, and collaborate in real time with team members. It's a properly efficient "AI drafting + manual optimization" workflow.

    This is an important iteration since Google launched the Canvas workspace in March this year. From initially supporting collaborative editing of text and code to now spanning multimodal content generation, Gemini is striding toward being a deep productivity tool!

    Have any of you guys used this function? Come and share your experiences in the comment section!

    #GoogleGemini #PPTGeneration #CanvasWorkspace #OfficeSkills #AIAssistedOffice

  • 🛠️ Comparison and Analysis of AI Programming CLI Tools

    🤖 Claude Code CLI

    Claude Code CLI is a command-line intelligent programming assistant launched by Anthropic and built on its large Claude models (such as Opus 4 and Sonnet 4); it emphasizes strong reasoning ability and in-depth code understanding.

    Advantages:

    • In-depth Code Understanding and Complex Task Handling: Claude Code can deeply understand codebase structure and complex logical relationships. It supports a context window of hundreds of thousands of tokens, enabling efficient multi-file operations and cross-file context understanding, and it is particularly good at medium-to-large projects.
    • Sub-agent Architecture and Powerful Toolset: It supports a sub-agent architecture that intelligently splits complex tasks into subtasks for parallel processing, achieving multi-agent-style collaboration. The built-in toolset is rich and professional, including refined file operations (such as MultiEdit for batch modification), efficient file retrieval (the Grep tool), task management and planning (TodoWrite/Read, the Task sub-agent), and deep Git/GitHub integration, such as understanding PRs, reviewing code, and handling comments.
    • Integration with Enterprise-level Toolchains: Claude Code integrates seamlessly with IDEs, showing code changes directly in the IDE's diff view, and can be wired into the CI/CD process via GitHub Actions, where an @claude mention in a PR or Issue comment triggers automatic code analysis or error fixing.
    • Fine-grained Permission Control and Security: It provides a very complete, fine-grained permission mechanism, letting users precisely control each tool's permissions through configuration files or command-line parameters: allowing or prohibiting a particular Bash command, limiting the read-write range of files, and setting different permission modes (such as the read-only plan mode), as sketched below. In an enterprise environment, system administrators can also enforce security policies that users cannot override.
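
    To make the permission model concrete, here is a minimal sketch of how such controls are typically expressed. The flag and file names below (--permission-mode, --allowedTools, .claude/settings.json) reflect Claude Code's documented options as best we know them, and the specific patterns are illustrative, so verify them against the current official docs.

      # Read-only planning session: the agent may analyze but not write.
      claude --permission-mode plan

      # Allow only specific tools for this session; anything else needs
      # interactive approval. The patterns here are illustrative examples.
      claude --allowedTools "Read" "Grep" "Bash(git log:*)"

      # Project-level policy can also live in a settings file:
      mkdir -p .claude
      printf '%s\n' \
        '{' \
        '  "permissions": {' \
        '    "allow": ["Read", "Bash(npm run test:*)"],' \
        '    "deny": ["Bash(rm:*)"]' \
        '  }' \
        '}' > .claude/settings.json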

    Disadvantages:

    • It is a commercial paid product with relatively high subscription fees.
    • Its image recognition ability is relatively weak: when understanding and analyzing interface screenshots or converting design drafts into code, its accuracy and fidelity may trail some competitors.

    Scope of Capabilities:

    Claude Code CLI is very well suited to medium-to-large project development, codebases that need long-term maintenance, and scenarios that demand high code quality and AI assistance for in-depth debugging, refactoring, or optimization. It is relatively mature in terms of enterprise-level security, functional completeness, and ecosystem.

    Usage:

    It is usually installed globally via npm: npm install -g @anthropic-ai/claude-code. After installation, run claude login to go through the OAuth authentication process. On first run it guides you through account authorization and theme selection, after which you enter interactive mode and can direct the AI through natural-language instructions to generate, debug, and refactor code.
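
    For reference, the install-and-first-run flow above condenses to a few commands. Treat this as a sketch of the typical flow described in this section rather than authoritative documentation; if claude login differs in your version, simply running claude also starts the authorization flow on first launch.

      # Install globally via npm (a recent Node.js is assumed).
      npm install -g @anthropic-ai/claude-code

      # Authenticate through the browser-based OAuth flow.
      claude login

      # Start an interactive session from your project's root directory,
      # then drive it with natural language, e.g. "explain this repo" or
      # "refactor src/utils.js and add tests".
      cd your-project
      claude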

    🔮 Gemini CLI

    Gemini CLI is an open-source command-line AI tool from Google. Built on the powerful Gemini 2.5 Pro model, it aims to turn the terminal into an active development partner.

    Advantages:

    • Free and Open-source with a Generous Quota: It is open-source under the Apache 2.0 license, with high transparency. Personal Google account users get a free quota of 60 requests per minute and 1,000 requests per day, which is highly competitive among similar tools.
    • Ultra-long Context Support: It supports a context window of up to 1 million tokens, easily handling large codebases; it can even read an entire project at once, which suits large-scale projects well.
    • Terminal-native with Strong Agent Capability: Designed specifically for the command line, it minimizes developers' context switching. It uses the Reason-Act (ReAct) loop, combined with built-in tools (such as file operations and shell commands) and Model Context Protocol (MCP) servers, to complete complex tasks such as fixing bugs and building new features.
    • High Extensibility: Through MCP servers, bundled extensions, and a GEMINI.md file for custom prompts and instructions, it is highly customizable (see the sketch below).
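
    As an illustration of that last point, GEMINI.md is just a plain file in the project root; the file name comes from the section above, while the contents here are an invented example of the kind of standing instructions it can carry.

      # Sketch: give Gemini CLI standing project instructions.
      # The contents below are made-up examples.
      printf '%s\n' \
        '# Project conventions for the agent' \
        '- This is a TypeScript monorepo; prefer strict typing.' \
        '- Run npm test before declaring a task complete.' \
        '- Never modify files under vendor/.' > GEMINI.md

      # On launch, the CLI folds GEMINI.md into its context.
      gemini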

    Disadvantages:

    • Its instruction-following accuracy and intent understanding are sometimes not as good as Claude Code's, with slightly inferior performance.
    • The free tier carries potential data security risks: user data may be used for model training, making it unsuitable for sensitive or proprietary code.
    • Output quality can fluctuate. Users report that Gemini 2.5 Pro sometimes automatically downgrades to the less powerful Gemini 2.5 Flash model, degrading output quality.
    • Its integration with enterprise-level development environments is relatively weak; it is positioned more as a standalone terminal tool.

    Scope of Capabilities:

    With its large context window and free tier, Gemini CLI is very well suited to individual developers, rapid prototyping, and exploratory programming tasks. It handles large codebases well but is relatively weak in complex logic understanding and deep integration with enterprise toolchains.

    Usage:

    Install via npm: npm install -g @google/gemini-cli. After installation, run the gemini command. On first run, it guides users through authorizing a Google account or configuring a Gemini API key via the GEMINI_API_KEY environment variable.
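
    Condensed into commands, and assuming the defaults described above, the setup looks like this; note that shell assignments must not have spaces around the equals sign.

      # Install globally via npm.
      npm install -g @google/gemini-cli

      # Option 1: just launch and sign in with a personal Google account
      # (free tier: 60 requests/min, 1,000 requests/day, per the text above).
      gemini

      # Option 2: authenticate with an API key instead of OAuth.
      export GEMINI_API_KEY="your-api-key"
      gemini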

    🌐 Qwen Code CLI

    Qwen Code CLI is a command-line tool that Alibaba developed and optimized on top of Gemini CLI, designed specifically to unleash the potential of its Qwen3-Coder models in agent-based programming tasks.

    Advantages:

    • Deep Optimization for Qwen3-Coder: It customizes prompts and function-call protocols for the Qwen3-Coder series (such as qwen3-coder-plus), maximizing performance on agentic coding tasks.
    • Ultra-long Context Support: Backed by the Qwen3-Coder model, it natively supports 256K tokens, extensible to 1 million, suitable for medium-to-large projects.
    • Open-source with OpenAI SDK Compatibility: Developers can conveniently call the model through OpenAI-compatible APIs.
    • Wide Programming Language Support: The model natively supports up to 358 programming and markup languages.

    Disadvantages:

    • Token consumption can be fast, especially with large-parameter models (such as the 480B variant), driving up costs; users need to watch their usage closely.
    • On complex tasks, its understanding and execution may occasionally loop or underperform top-tier models.
    • Its interpretation of tool calls may occasionally go astray.

    Scope of Capabilities:

    Qwen Code CLI particularly suits developers who are interested in or prefer the Qwen models, as well as scenarios requiring code understanding, editing, and a degree of workflow automation. It performs well in agent-based coding and long-context processing.

    Usage:

    Install via npm: npm install -g @qwen-code/qwen-code. After installation, configure environment variables to point at the Alibaba Cloud DashScope endpoint that is compatible with the OpenAI API, and set the corresponding API key, as shown below.
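
    Spelled out as commands (the endpoint and variable names come from the section above; the qwen launch command reflects the package's published binary, so double-check it against the project README):

      # Install globally via npm.
      npm install -g @qwen-code/qwen-code

      # Point the tool at the OpenAI-compatible DashScope endpoint.
      export OPENAI_API_KEY="your-api-key"
      export OPENAI_BASE_URL="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
      export OPENAI_MODEL="qwen3-coder-plus"

      # Start an interactive session.
      qwen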

    🚀 CodeBuddy

    CodeBuddy is an AI programming assistant launched by Tencent Cloud. Strictly speaking it is not just a CLI tool: it also ships as IDE plugins and other form factors. Its core capabilities, however, overlap with and are comparable to the CLI tools above, and it deeply integrates Tencent's self-developed Hunyuan large model with the DeepSeek V3 model.

    Advantages:

    • Integration of Product, Design, and R&D: It spans requirement-document generation, design-to-code conversion (such as turning Figma designs into production-level code with a claimed fidelity of up to 99.9%), and cloud deployment, achieving end-to-end AI-integrated development from product design to R&D deployment.
    • Localization and Tencent Ecosystem Integration: Optimized specifically for Chinese developers, it offers better Chinese-language support and deep integration with Tencent Cloud services (such as CloudBase), supporting one-click deployment.
    • Dual-model Drive: It combines Tencent's Hunyuan large model and the DeepSeek V3 model to provide high-precision code suggestions.
    • Visual Experience: A Webview function allows previewing code and debugging results directly inside the IDE, with a smooth interactive experience.

    Disadvantages:

    • Some interactions (such as the @ symbol workflow) may need further simplification for convenience.
    • Code scanning can be slow in large projects.
    • Plugin compatibility with editors such as VS Code still needs to be enhanced.
    • An invitation code may currently be required for use.

    Scope of Capabilities:

    CodeBuddy is very well suited to developers and enterprises that want full-stack development support, end-to-end AI assistance from design to deployment, and deep integration with the Tencent Cloud ecosystem. It is especially good for quickly validating MVPs and accelerating product iteration.

    Usage:

    CodeBuddy is used mainly as an IDE plugin (such as the VS Code plugin) and can also run in a standalone IDE. Users typically install the plugin and log in with a Tencent Cloud account to start using features like code completion and the Craft mode.

    In general, Claude Code CLI, Gemini CLI, Qwen Code CLI, and CodeBuddy each have their own focus, and all are actively exploring how natural language can better assist and transform the programming workflow. The right choice depends on your specific needs, technology stack, budget, and ecosystem preferences. Understanding their technical principles and limitations also helps us view and apply these powerful tools more rationally, making AI a truly capable assistant in the development process.

  • Amazing! ByteDance has created Sa2VA by integrating LLaVA and SAM-2, and a new multimodal favorite is born.

    Dear folks, ByteDance has made another remarkable move in the AI realm! Collaborating with research teams from multiple universities, it has integrated the vision-language model LLaVA with the segmentation model SAM-2 to unveil an impressive new model, Sa2VA! 🎉

    LLaVA is an open-source vision-language model that excels at big-picture video narration and content comprehension but struggles with fine-grained instructions. SAM-2, by contrast, is an outstanding image-segmentation expert that can identify and segment objects in images but lacks language understanding. To combine their strengths, Sa2VA couples the two models through a simple, efficient "code-word" scheme. 🧐

    Sa2VA's architecture resembles a dual-core processor: one core handles language understanding and dialogue, the other video segmentation and tracking. When a user issues an instruction, Sa2VA generates a special instruction token and passes it to SAM-2 to carry out the concrete segmentation. In this way the two modules each work in their area of expertise while learning from each other's feedback, steadily improving overall performance. 😎

    The research team also designed a multi-task joint-training curriculum for Sa2VA to boost its image and video understanding. Across numerous public benchmarks Sa2VA performs excellently, shining in particular on referring-expression video segmentation: it segments accurately in complex real-world scenes and can even track target objects in real time within videos, showing very strong dynamic-processing ability. 👏

    Moreover, ByteDance has open-sourced multiple versions of Sa2VA along with its training tools, encouraging developers to research and build on it. This gives AI researchers and developers abundant resources and propels multimodal AI technology forward.

    Here are the project addresses:

    https://lxtgh.github.io/project/sa2va/

    https://github.com/bytedance/Sa2VA

    Dear friends, are you looking forward to Sa2VA? Come and share your thoughts in the comment section! 🧐

    #ByteDance #Sa2VA #Multimodal Intelligent Segmentation #LLaVA #SAM-2 #AI Model #Open-source

  • Amazing! Google's New Framework Helps AI Agents Learn from Mistakes: Will a Super Intelligent Agent Be Born? ✨

    Guys, Google has made another big splash in the AI field! It recently proposed a framework called "Reasoning Memory" (learnable reasoning memory), aiming to let AI agents achieve true "self-evolution", which is simply stunning 👏.

    First, the pain points of current AI agents. Agents based on large language models perform well at reasoning and task execution but generally lack a sustainable learning mechanism. AIbase's analysis notes that existing agents do not "grow" after completing tasks: each execution starts from scratch, which causes a pile of problems, such as repeated mistakes, no accumulation of abstract experience, wasted historical data, and limited decision optimization. Even when a memory module is added, most are just simple information caches that cannot generalize, abstract, or reuse experience; it is very hard to form "learnable reasoning memory", so agents cannot truly improve themselves 😔.

    Now for Google's new framework. Reasoning Memory is a memory system designed specifically for AI agents that can accumulate, generalize, and reuse reasoning experience. Its core idea is to let agents extract abstract knowledge from their own interactions, mistakes, and successes to form "reasoning memories". Specifically:

    • Experience Accumulation: agents no longer discard task history but systematically record the reasoning process and its outcomes.
    • Generalization and Abstraction: algorithms turn specific experiences into general rules, rather than mere episodic storage.
    • Reuse and Optimization: future tasks draw on these memories, adjusting decisions according to past experience and reducing repeated mistakes.

    This mechanism lets AI agents "learn from mistakes" the way humans do and achieve closed-loop self-evolution. Experiments show that agents equipped with the framework improve significantly on complex tasks, a huge leap from static execution to dynamic growth 😎.

    Finally, the potential impact. AIbase believes this research could reshape the AI application ecosystem: in fields such as automated customer service, medical diagnosis, and game AI, agents could continuously optimize their own strategies and reduce human intervention. In the long run it fills the "evolution gap" of LLM agents and lays the groundwork for more reliable autonomous systems. Challenges remain, though: the memory's generalization ability and its computational cost still need further validation. Either way, the move strengthens Google's leading position at the AI frontier and deserves the industry's attention 🤩.

    Guys, what do you think of Google's new framework? Come and chat in the comments section 🧐.

    Paper address: https://arxiv.org/pdf/2509.25140

    Hashtags and keywords

    #Google #AI Agent #Self-evolution #Reasoning Memory #AI Framework #AI Application Ecosystem