<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[vLLM Deployment in Practice: High-Throughput LLM Inference Serving]]></title><description><![CDATA[<blockquote>
<p dir="auto">Source: AI 订阅指南 (AI Subscription Guide)</p>
</blockquote>
<p dir="auto">vLLM is one of the most efficient open-source LLM inference and serving engines currently available.</p>
<p dir="auto"><strong>Key advantages:</strong></p>
<ul>
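<li>PagedAttention, which stores the KV cache in fixed-size blocks; the vLLM authors report 2-4× higher throughput than earlier serving systems such as FasterTransformer and Orca</li>
<li>Continuous batching: new requests join the running batch at each decoding step instead of waiting for the whole batch to finish</li>
<li>OpenAI-compatible API</li>
<li>Tensor parallelism for sharding one model across multiple GPUs</li>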
</ul>
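<p dir="auto">The KV cache is the memory that PagedAttention manages. As a rough, hedged estimate of its size, assume a Llama-3-70B-style configuration (L = 80 layers, H_kv = 8 key/value heads under grouped-query attention, head dimension d_head = 128, FP16 so b = 2 bytes per value); the model in the deployment example may use a different configuration. The leading factor 2 counts keys and values.</p>
<pre><code class="language-latex">\text{KV bytes per token} = 2 \times L \times H_{kv} \times d_{head} \times b
  = 2 \times 80 \times 8 \times 128 \times 2 = 327{,}680\ \text{B} = 320\ \text{KiB}

\text{per 16-token block} = 16 \times 320\ \text{KiB} = 5\ \text{MiB}

\text{one 4096-token sequence} = 4096 \times 320\ \text{KiB} = 1.25\ \text{GiB}

\text{per GPU with tensor parallelism } (t = 4):\ 320\ \text{KiB} / 4 = 80\ \text{KiB per token}
</code></pre>
<p dir="auto">Reserving this much memory up front for every request at its maximum length wastes most of it. PagedAttention instead allocates the cache in fixed-size blocks (16 tokens per block is a common vLLM default) as each sequence grows, so a sequence wastes at most one partially filled block.</p>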
<p dir="auto"><strong>Deployment example:</strong></p>
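<pre><code class="language-bash"># Launch vLLM's OpenAI-compatible server, sharding the model across 4 GPUs.
# The model ID comes from the original post; replace it with any Hugging Face model you can access.
# Recent vLLM releases also offer the equivalent CLI: vllm serve MODEL --tensor-parallel-size 4 --port 8000
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-4-70B \
  --tensor-parallel-size 4 \
  --port 8000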
</code></pre>
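<p dir="auto">Because the server speaks the OpenAI API format, any HTTP client can query it. The minimal sketch below uses curl against the Completions endpoint (POST /v1/completions). It assumes the server runs on localhost:8000 as launched above and that the model is served under its Hugging Face ID, which is vLLM's default when no served model name is configured. The function names build_completion_payload and vllm_complete are hypothetical helpers, not part of vLLM.</p>
<pre><code class="language-bash">#!/usr/bin/env bash
# Minimal sketch of calling the OpenAI-compatible endpoint started above.
# Assumptions: server on localhost:8000 (matching --port 8000), model served
# under its Hugging Face ID. Both can be overridden via environment variables.
VLLM_URL="${VLLM_URL:-http://localhost:8000}"
MODEL="${MODEL:-meta-llama/Llama-4-70B}"

# Build the JSON body for POST /v1/completions (OpenAI Completions format).
# $1 = prompt (inserted verbatim, so it must not contain double quotes),
# $2 = max_tokens (defaults to 128 when omitted or empty).
build_completion_payload() {
  printf '{"model": "%s", "prompt": "%s", "max_tokens": %d}\n' \
    "$MODEL" "$1" "${2:-128}"
}

# Send one request and print the raw JSON response.
vllm_complete() {
  curl -s "$VLLM_URL/v1/completions" \
    -H "Content-Type: application/json" \
    -d "$(build_completion_payload "$1" "$2")"
}
</code></pre>
<p dir="auto">For example, <code>vllm_complete "The capital of France is" 32</code> prints the raw JSON response, with the generated text in <code>choices[0].text</code>. The Completions endpoint is used here rather than Chat Completions because a base (non-instruct) model may not ship a chat template.</p>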
<p dir="auto"><strong>Throughput comparison (4× A100):</strong></p>
<ul>
<li>Hugging Face Transformers: ~200 tokens/s</li>
<li>vLLM: ~2000 tokens/s</li>
</ul>
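<p dir="auto">Throughput figures like these (roughly a 10× gap) depend heavily on batch size, prompt and output lengths, and concurrency. As a rough starting point, the sketch below times a single request and divides the reported usage.completion_tokens by the elapsed wall-clock time. It assumes GNU date (for %N), awk, curl, and the server from the deployment example; the helper names completion_tokens, tokens_per_sec, and measure are hypothetical. A single request measures per-request decode speed, not the aggregate throughput of a loaded, continuously batched server, so the result is likely to be far below ~2000 tokens/s.</p>
<pre><code class="language-bash">#!/usr/bin/env bash
# Rough sketch: time one /v1/completions request and report generated tokens/s.
# Assumptions: server at localhost:8000 as launched above; no jq, so the token
# count is pulled out of the JSON "usage" field with grep.
VLLM_URL="${VLLM_URL:-http://localhost:8000}"
MODEL="${MODEL:-meta-llama/Llama-4-70B}"

# Read an OpenAI-style JSON response on stdin, print usage.completion_tokens.
completion_tokens() {
  grep -o '"completion_tokens": *[0-9][0-9]*' | grep -o '[0-9][0-9]*$'
}

# Print tokens / seconds with one decimal place ($1 = tokens, $2 = seconds).
tokens_per_sec() {
  awk -v t="$1" -v s="$2" 'BEGIN { printf "%.1f\n", t / s }'
}

# Send one request (up to 256 new tokens) and print its tokens/s.
measure() {
  local start end resp n elapsed
  start=$(date +%s.%N)
  resp=$(curl -s "$VLLM_URL/v1/completions" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$MODEL\", \"prompt\": \"Hello\", \"max_tokens\": 256}")
  end=$(date +%s.%N)
  n=$(printf '%s' "$resp" | completion_tokens)
  elapsed=$(awk -v a="$start" -v b="$end" 'BEGIN { print b - a }')
  tokens_per_sec "$n" "$elapsed"
}
</code></pre>
<p dir="auto">Run <code>measure</code> once the server is up. To approximate batched throughput you would need to issue many requests concurrently, for example with the serving benchmark scripts in the vLLM repository.</p>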
<hr />
<p dir="auto"><em>For more local deployment tutorials, follow AI 订阅指南.</em></p>
]]></description><link>https://aspxai.com/topic/206/vllm-部署实战-高吞吐量-llm-推理服务</link><generator>RSS for Node</generator><lastBuildDate>Mon, 22 Jun 2026 07:37:07 GMT</lastBuildDate><atom:link href="https://aspxai.com/topic/206.rss" rel="self" type="application/rss+xml"/><pubDate>Mon, 22 Jun 2026 02:58:37 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to vLLM Deployment in Practice: High-Throughput LLM Inference Serving on Mon, 22 Jun 2026 03:03:45 GMT]]></title><description><![CDATA[<p dir="auto">I only have 8 GB of VRAM. What models can I run? It's mainly for coding assistance.</p>
]]></description><link>https://aspxai.com/post/1220</link><guid isPermaLink="true">https://aspxai.com/post/1220</guid><dc:creator><![CDATA[眼底客栈]]></dc:creator><pubDate>Mon, 22 Jun 2026 03:03:45 GMT</pubDate></item><item><title><![CDATA[Reply to vLLM Deployment in Practice: High-Throughput LLM Inference Serving on Mon, 22 Jun 2026 03:03:45 GMT]]></title><description><![CDATA[<p dir="auto">Keeping the knowledge base up to date is another problem; we built an incremental indexing solution.</p>
]]></description><link>https://aspxai.com/post/1219</link><guid isPermaLink="true">https://aspxai.com/post/1219</guid><dc:creator><![CDATA[观雪驻足]]></dc:creator><pubDate>Mon, 22 Jun 2026 03:03:45 GMT</pubDate></item><item><title><![CDATA[Reply to vLLM Deployment in Practice: High-Throughput LLM Inference Serving on Mon, 22 Jun 2026 03:03:45 GMT]]></title><description><![CDATA[<p dir="auto">Roughly how much does it cost to fine-tune a 7B model? Are there any cheap options?</p>
]]></description><link>https://aspxai.com/post/1218</link><guid isPermaLink="true">https://aspxai.com/post/1218</guid><dc:creator><![CDATA[七街酒]]></dc:creator><pubDate>Mon, 22 Jun 2026 03:03:45 GMT</pubDate></item></channel></rss>