Author: Stark, Tony

  • Sound starts in 300ms! Microsoft's real - time speech model is amazing ✨

    Guys, today I must share with you Microsoft's open - source real - time speech model VibeVoice - Realtime - 0.5B 👏!

    Previously, when using traditional TTS models, the startup time was often 1 - 3 seconds. That kind of lag really affected the experience 😫. This was the pain point in our use of speech models. However, VibeVoice - Realtime - 0.5B perfectly solves this problem. On average, it only takes 300 milliseconds from text input to sound output, almost with zero delay. It's just like having a conversation with a real person. As soon as you type, the other side starts to respond. It's extremely smooth 💯.

    Its capabilities don't stop there! It can generate an ultra - long audio of up to 90 minutes at one time, and the whole process is smooth and natural, just like a professional broadcaster reading. Moreover, it natively supports up to 4 characters to have a conversation simultaneously, with smooth emotion transitions. The built - in emotion perception module can automatically recognize emotions without manual annotation, and it's ready to use as soon as you get it 👍.

    I tried it myself. I used it on HuggingFace to read the first chapter of "The Three - Body Problem". There was no voice break, and the effect was excellent. Its English performance is close to the commercial level, and it's also very good in Chinese. Although there is still room for improvement in the handling of some polyphonic characters and neutral tones, the official will release a fine - tuned version. With its lightweight design, it can run at real - time speed on an ordinary laptop and can already be integrated into many tools.
    Currently, this model has been completely open - sourced and supports commercial use. There are also many interesting demos in the community. Guys, don't miss it. Hurry up and give it a try 👇!

  • In the Global Unicorn 500 in 2025, China and the United States shine brightly!

    Dear friends, the Global Unicorn 500 in 2025 has been revealed! As soon as I heard the news, I rushed to share it with you all!

    This conference was held in Laoshan District, Qingdao, Shandong Province. The released "Report on the Global Unicorn 500 in 2025" is of great value. In 2025, the total valuation of global unicorn companies reached 39.14 trillion yuan, an increase of 30.71% compared with last year. This figure is astonishing, even exceeding Germany's GDP 💯. However, the market environment is a bit tough. Only 12 unicorns successfully went public, but merger and acquisition activities increased significantly.

    In terms of enterprise distribution, China and the United States are absolute "kings", accounting for 74.8% of the number of enterprises and 86.8% of the total valuation. The United States has strong innovation capabilities in fields such as artificial intelligence, while China has an excellent performance in advanced manufacturing and automotive technology. The number of enterprises in the advanced manufacturing field in China is more than six times that of the United States, and the total valuation is nearly 2 trillion yuan higher 👏.

    In terms of urban distribution, the head - clustering effect is obvious. The top ten cities have gathered more than half of the unicorn enterprises. This also shows the important position of these cities in the global innovation ecosystem.

    I have to say that these unicorn enterprises are really amazing! Let's all feel the power of this innovation together 🤩.

    #The Global Unicorn 500 in 2025 #Chinese and American Enterprises #Unicorn Enterprises #Advanced Manufacturing #Innovation Ecosystem

  • The Ceiling of Open - source AI! The Official Version of DeepSeek V3.2 is Here. Does Its Reasoning Ability Approach GPT - 5?

    Dear friends! DeepSeek has made another big move! 🎉Today, two official - version models, DeepSeek - V3.2 and V3.2 - Speciale, are released simultaneously. This time, it has pushed the reasoning ability to a new height and also solved the major pain point of Agent tool - calling!

    🌟 Explosive Reasoning Ability

    The ordinary version of V3.2 balances reasoning and efficiency. In public tests, it reaches the level of GPT - 5, only slightly lower than Gemini - 3.0 - Pro. The Speciale version is even more powerful! 🐂It is an enhanced version for long - term thinking plus the ability to prove mathematical theorems. It has directly won four gold medals in IMO/CMO/ICPC/IOI, and its ICPC result is even better than the second - ranked human!

    🤖 Upgraded Agent Ability

    This is DeepSeek's first model with "thinking + tool - calling"! It proposes a large - scale Agent training data synthesis method, supports tool - calling in the thinking mode, and has full generalization ability. It has won the first place among open - source models in the agent evaluation, greatly narrowing the gap with closed - source models.

    🔧 Two Versions for Your Choice

    • The official version of V3.2: Suitable for daily Q&A / general Agent tasks. The official website / APP/API have been updated simultaneously.
    • V3.2 - Speciale: The extreme reasoning version, only for research use. The API is free of charge until December 15, 2025 (you can access it by setting a specific address for base_url).

    💻 Open - source and Eco - friendly

    The model has been open - sourced on HuggingFace and ModelScope. The thinking mode supports Claude Code, and the API has also optimized the tool - calling process (you can continue thinking by sending back the thought chain).

    Are there any friends who have used DeepSeek? Come and share your experience in the comment section below 👇

    Keyword Tags

    #DeepSeekV3.2 #Open - source AI #The Ceiling of Reasoning Ability #Agent Tool - calling #AI Model Update

    #Cutting - edge Technology #A Substitute for GPT5 #ICPC Gold - medal AI #Upgrade of Programming Assistant

  • Claude Opus4.5 Released, Competing with GPT - 5.1!

    Dear friends, there's big news in the AI community today! 🔥Claude Opus4.5 might be officially launched today!

    Previously, a new model entry with the code name "Claude Kayak" briefly appeared on the AI benchmark platform Epoch AI, marked with today's release date. Although it was quickly removed, it still attracted high attention from the global AI community. 🤩The industry generally believes that "Claude Kayak" is the flagship model Claude Opus4.5 that Anthropic is about to launch.

    As a super - strong version of the Claude4 series, Opus4.5 is expected to significantly improve in complex reasoning, multi - step agent tasks, and code - generation capabilities. It is hoped to break the 80% score in authoritative evaluations, directly competing with OpenAI GPT - 5.1 and Google Gemini3.0Pro. 👏

    Since the release of Opus4.1 in August this year, Anthropic has successively launched Sonnet4.5 and Haiku4.5. If Opus4.5 makes its debut as scheduled this time, the entire Claude4 series will be updated, and its position in the fields of multimodality and enterprise - level AI will be more solid. 👍

    Now, developers are not only looking forward to the new model bringing stronger agent - coordination capabilities and longer context - handling capabilities but also worried that the high computing power requirements will make it "in limited supply" like the Opus series. Let's all wait for the official news. If it's really released, this will definitely be a major event in the AI competition at the end of 2025!

    #ClaudeOpus4.5 #AI Release #GPT - 5.1 #GeminiPro #AI Competition

  • ✨Google makes a big move! The Antigravity AI IDE empowered by Gemini 3 is incredibly appealing!

    Dear friends, there's a big surprise in the AI development community! 🎉 Not only has Google released the new - generation flagship large - model Gemini3, but it has also launched a brand - new AI - native integrated development environment, Google Antigravity. This is truly set to "revolutionize" the development circle!

    This so - called "anti - gravity" agent - based development platform directly upgrades AI from an ordinary code assistant to an extremely powerful "active partner" 🤝. It perfectly addresses the pain points of competitors like Cursor and Claude, liberating developers from tedious low - level coding! Now, Antigravity has opened for public preview, supporting Windows, macOS, and Linux systems. What's more, it's completely free 🆓, and the quota for Gemini3Pro is quite generous. Who wouldn't love this!

    🌟 Its awesomeness doesn't stop there!

    • Autonomous and parallel developmentAutonomous and Parallel Development: The "agent - first" design concept is amazing! As long as developers provide high - level task descriptions, such as "Build a flight - query web app", the intelligent agents driven by Gemini3 will automatically formulate plans, list conditions, and give architectural suggestions. Multiple intelligent agents can run asynchronously in the background simultaneously, acting like a super - intelligent "task control center" to allocate resources. They can directly operate code editors, terminal command lines, and browsers to achieve autonomous development. It truly realizes "humans command, AI does the work", allowing us to focus more on creativity.
    • Verifiable code qualityThe trust issue of AI coding tools has always been a headache. However, Antigravity has solved it with its unique "Artifacts" mechanism! Every time an agent completes a step, it will generate a task list, an implementation plan, and also provide screenshots before and after bug fixing, functional demonstration videos, etc. All the outputs are clear at a glance, allowing us to verify whether the task has been completed with just one look. This is extremely user - friendly for enterprise - level development 👍.
    • Revolutionary collaborative feedbackRevolutionary Collaborative Feedback: Antigravity has taken the feedback experience to a new level! Developers can directly click, mark, and leave messages on the web screenshots generated by AI. They can also make precise comments on code diffs and browser operation screen recordings. Moreover, feedback doesn't affect the intelligent agent's process, and it supports collaborative commenting similar to Google Docs. "Human - machine collaboration" has become as smooth as modifying design drafts.

    Antigravity is not only deeply integrated with Gemini3Pro but also supports Claude Sonnet4.5 and OpenAI open - source models. Its future ecological compatibility is surely extremely strong! Now, it can be downloaded and experienced on the official website (antigravity.google). The quota refresh cycle is very user - friendly, and ordinary developers basically won't run out of it.

    The AI IDE has officially entered the era of "multi - agent, verifiable, visual feedback". Cursor, Claude Dev, Windsurf, etc. are probably under great pressure 😜. I sincerely suggest that all front - end, full - stack, and AI engineers give it a try as soon as possible. Maybe this will be the most worthy development tool to switch to in 2025!

    📱 Download address:https://antigravity.google/download

    Have any of you used it, dear friends? Come and share your feelings in the comment section below 👇

  • Google Gemini 3 is Coming: The New AI Monarch Ascends the Throne!

    Guys, there's been a huge stir in the AI community recently 😱! Google Gemini 3 made a splash late at night, proclaiming the crowning of a new king in the AI realm ✨.

    Previously, people were spoilt for choice among AI models, with the differences in advantages between various models being rather marginal. But as soon as Gemini 3 Pro arrived, its performance was simply outstanding 💯. In tests representing the "pinnacle" of human intelligence, it outscored GPT - 5.1 and Claude Sonnet 4.5 by a large margin. In mathematics, it shows absolute dominance. When combined with code execution in AIME 2025, its accuracy rate reaches 100%, and in MathArena Apex, it leaves other large - scale models far behind. Moreover, its "visual intelligence" is truly remarkable. Its understanding ability of screenshots is twice that of the current advanced level 👏.

    Google also launched a "mini - bombshell", Google Antigravity. It's an agent - first development platform where developers can collaborate with multiple intelligent agents, skyrocketing their work efficiency 🚀. Additionally, Gemini 3 Pro is trained using Google TPU, with comprehensive data coverage. It has been integrated into Google Search, enabling it to instantly generate interactive charts or simulation tools when searching for complex concepts.

    Online practical tests have also yielded good results, with its direct - generation ability proving to be quite powerful. Guys, the AI era is unstoppable. Let's start paying close attention right away 🤩!

  • November Large - Language - Model Ranking List

    Official ranking

    Evaluate leading large - scale models according to the evaluation rules of OpenCompass and release the rankings.

  • Baidu's New AI Model ERNIE - 4.5 - VL is Amazing!

    Dear friends, there has been a major move in the AI field recently 🔥! Baidu has grandly released the new - generation multimodal AI model ERNIE - 4.5 - VL. In this era of rapid development of AI technology, it is really difficult to find an efficient and powerful AI model, which is the pain point of many developers and researchers 😭.

    However, this time Baidu's new model perfectly solves these problems 👏. It not only has powerful language - processing capabilities but also introduces the innovative function of "image thinking". With only 3B activation parameters, it has extremely high computational efficiency and flexibility, and can handle tasks quickly and efficiently. Moreover, this "image thinking" function is extremely powerful. It can perform operations such as image zooming and tool - calling for image search, greatly enriching the interactive experience between images and text.

    I think it will bring new possibilities to many fields such as intelligent search, online education, and e - commerce 💯. It's like equipping these fields with smart little wings, enabling them to fly higher and farther. Now this model is open - sourced, and developers and researchers can more conveniently explore the potential of multimodal AI. Dear friends, don't miss this great opportunity. Let's start researching together 👏!

    #Baidu AI Model #ERNIE - 4.5 - VL #Multimodal AI #Image Thinking #AI Technological Innovation

  • Google Gemini 3 Pro Preview: Superb with a Million-Level Window!

    Dear friends, there's been a major development in the AI world recently 🔥! The Gemini series of artificial intelligence models under Google has made significant progress, and the latest preview version "gemini - 3 - pro - preview - 11 - 2025" has appeared on the Vertex AI platform.

    Previously, many AI models struggled when dealing with long documents and complex tasks, which was really frustrating 😣. However, Gemini 3 Pro supports an extremely large context window of up to 1 million tokens, which is simply a savior 👍! It can handle 200,000 tokens at the standard level and directly extends to 1 million tokens at the advanced level. It has also been optimized in terms of input - output ratio and the proportion of image/video/audio processing.

    It is regarded as a major upgrade of Gemini 2.5, focusing on multimodal reasoning and agent - based intelligence. The training data covers up to August 2024 and encompasses a variety of input sources. Industry analysts say that it is of revolutionary significance in the field of enterprise - level applications, such as financial modeling and biotech simulation.

    According to reports from multiple tech media, Google may reveal more details in the middle to late November, and the full release may be postponed until December. Compared with its predecessors, it is expected to outperform GPT - 4o in benchmark tests and perform excellently in multimodal creative generation and code - writing tasks 👏.
    Although Google has not yet officially responded, Vertex AI is accelerating the iteration of the Gemini series. Let's all look forward to its official debut ✨!

  • ChatGPT's "New Rules" Are Here! Prohibition on Providing Medical, Legal and Financial Advice!

    Dear friends, OpenAI updated the usage policy of ChatGPT on October 29. This time, the model is clearly prohibited from providing professional medical, legal or financial advice!

    This is mainly done to avoid regulatory risks, reduce the hidden danger of misleading people, and redefine the application boundaries of AI in high-risk fields. ChatGPT can no longer do things like interpreting medical images, assisting in diagnosis, drafting or interpreting legal contracts, providing personalized investment strategies or tax planning. If users raise such demands, the system will uniformly reply to guide them to consult human experts. Moreover, this policy covers all ChatGPT models and API interfaces to ensure consistent implementation.

    Although professionals can still use it for general concept discussion or data organization, they cannot directly provide "fiduciary" advice to end users. This adjustment is driven by global regulation. The EU's Artificial Intelligence Act is about to take effect, which will conduct strict reviews on high-risk AI, and the US FDA requires clinical verification for diagnostic AI tools. By doing so, OpenAI can avoid being recognized as "software as a medical device" and also prevent potential lawsuits.

    Regarding this new rule, users' reactions are divided into two camps. Some individual users feel quite regretful because they have lost the "low-cost consultation" channel. After all, they had saved a lot of professional consultation fees by relying on AI before. However, most of the medical and legal circles support it. After all, the "pseudo-professional" output of AI is indeed likely to lead to misdiagnosis or disputes. Data shows that over 40% of ChatGPT queries are of the advice type, and medical and financial advice account for nearly 30%. This policy may lead to a short-term decline in traffic.

    It also has a significant impact on the industry. Google, Anthropic, etc. may also follow suit and impose restrictions. Vertical AI tools, such as certified legal/medical models, may become popular. Chinese companies like Baidu have already complied in advance. Under the situation of stricter domestic regulation, innovation has to be explored within the "sandbox" mechanism.

    OpenAI emphasizes that the goal is to "balance innovation and safety". This update continues the Model Spec framework, and it is said that there will be further iterations in February 2025. The transformation of AI from an "omnipotent assistant" to a "limited assistant" seems to have become an industry consensus. In the future, technological breakthroughs and ethical constraints will develop together. I wonder what new balance the GPT-5 era will bring?

    What do you think of this new rule of ChatGPT? Come and share your thoughts in the comment section!

    Topic tags and keywords: #OpenAI #ChatGPT #UsagePolicyUpdate #MedicalAdvice #LegalAdvice #FinancialAdvice #AISupervision #IndustryImpact