<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[GGUF Quantization Format Explained: Running Large Models on Consumer Hardware]]></title><description><![CDATA[<blockquote>
<p dir="auto">Source: AI 订阅指南</p>
</blockquote>
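<p dir="auto">GGUF is the model file format used by llama.cpp. It stores quantized weights, which lets large models run on ordinary computers.</p>
<p dir="auto"><strong>Quantization level comparison:</strong></p>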
<table class="table table-bordered table-striped">
<thead>
<tr>
<th>Level</th>
<th>Bit width</th>
<th>70B model size</th>
<th>Quality loss</th>
</tr>
</thead>
<tbody>
<tr>
<td>Q8_0</td>
<td>8-bit</td>
<td>~70GB</td>
<td>Minimal</td>
</tr>
<tr>
<td>Q5_K_M</td>
<td>5-bit</td>
<td>~48GB</td>
<td>Very small</td>
</tr>
<tr>
<td>Q4_K_M</td>
<td>4-bit</td>
<td>~40GB</td>
<td>Small</td>
</tr>
<tr>
<td>Q3_K_S</td>
<td>3-bit</td>
<td>~32GB</td>
<td>Noticeable</td>
</tr>
</tbody>
</table>
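<p dir="auto">As a rough check on the table above, the sketch below works backwards from each listed size to the effective bits per weight it implies for a 70B model. It assumes 1 GB = 1e9 bytes and exactly 70e9 parameters; the function names are illustrative and not part of llama.cpp. The K-quant levels come out somewhat above their nominal bit width, which is consistent with block scales and mixed-precision tensors adding overhead.</p>

```python
# Sizes copied from the table above, in GB.
# Assumption: 1 GB = 1e9 bytes and the model has exactly 70e9 parameters.
TABLE_SIZES_GB = {"Q8_0": 70, "Q5_K_M": 48, "Q4_K_M": 40, "Q3_K_S": 32}
PARAMS_70B = 70e9


def implied_bits_per_weight(size_gb, params=PARAMS_70B):
    """Effective bits per weight implied by a file size in GB."""
    return size_gb * 1e9 * 8 / params


def estimated_size_gb(params, bits_per_weight):
    """Approximate file size in GB for a parameter count and bit width."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for level, size_gb in TABLE_SIZES_GB.items():
        bits = implied_bits_per_weight(size_gb)
        print(f"{level}: about {bits:.2f} bits per weight")
```

<p dir="auto">The same arithmetic runs in the other direction: <code>estimated_size_gb(7e9, 4.57)</code> gives roughly 4 GB for a 7B model at the Q4_K_M figure derived above. This covers the weights file only; the context (KV cache) needs additional memory at run time.</p>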
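<p dir="auto"><strong>Recommendation</strong>: for everyday use, pick Q4_K_M, which offers the best balance between quality and size.</p>
<p dir="auto"><strong>Conversion tool</strong>: use the <code>llama.cpp/quantize</code> command-line tool (named <code>llama-quantize</code> in newer llama.cpp builds).</p>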
<hr />
<p dir="auto"><em>For more local deployment tutorials, follow AI 订阅指南.</em></p>
]]></description><link>https://aspxai.com/topic/208/gguf-量化格式详解-让大模型在消费级硬件上运行</link><generator>RSS for Node</generator><lastBuildDate>Mon, 22 Jun 2026 07:52:23 GMT</lastBuildDate><atom:link href="https://aspxai.com/topic/208.rss" rel="self" type="application/rss+xml"/><pubDate>Mon, 22 Jun 2026 02:58:37 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to GGUF Quantization Format Explained: Running Large Models on Consumer Hardware on Mon, 22 Jun 2026 03:03:44 GMT]]></title><description><![CDATA[<p dir="auto">Which models can I run with only 8 GB of VRAM? It would mainly be for coding assistance.</p>
]]></description><link>https://aspxai.com/post/1212</link><guid isPermaLink="true">https://aspxai.com/post/1212</guid><dc:creator><![CDATA[微醺安之]]></dc:creator><pubDate>Mon, 22 Jun 2026 03:03:44 GMT</pubDate></item><item><title><![CDATA[Reply to GGUF Quantization Format Explained: Running Large Models on Consumer Hardware on Mon, 22 Jun 2026 03:03:44 GMT]]></title><description><![CDATA[<p dir="auto">How often the knowledge base gets updated is also an issue; we built an incremental indexing scheme to handle it.</p>
]]></description><link>https://aspxai.com/post/1211</link><guid isPermaLink="true">https://aspxai.com/post/1211</guid><dc:creator><![CDATA[dev]]></dc:creator><pubDate>Mon, 22 Jun 2026 03:03:44 GMT</pubDate></item><item><title><![CDATA[Reply to GGUF Quantization Format Explained: Running Large Models on Consumer Hardware on Mon, 22 Jun 2026 03:03:44 GMT]]></title><description><![CDATA[<p dir="auto">How often the knowledge base gets updated is also an issue; we built an incremental indexing scheme to handle it.</p>
]]></description><link>https://aspxai.com/post/1210</link><guid isPermaLink="true">https://aspxai.com/post/1210</guid><dc:creator><![CDATA[neoncat]]></dc:creator><pubDate>Mon, 22 Jun 2026 03:03:44 GMT</pubDate></item><item><title><![CDATA[Reply to GGUF Quantization Format Explained: Running Large Models on Consumer Hardware on Mon, 22 Jun 2026 03:03:44 GMT]]></title><description><![CDATA[<p dir="auto">We have used pgvector + LangChain; the results were good, but query latency was a bit high.</p>
]]></description><link>https://aspxai.com/post/1209</link><guid isPermaLink="true">https://aspxai.com/post/1209</guid><dc:creator><![CDATA[stormhawk7]]></dc:creator><pubDate>Mon, 22 Jun 2026 03:03:44 GMT</pubDate></item><item><title><![CDATA[Reply to GGUF Quantization Format Explained: Running Large Models on Consumer Hardware on Mon, 22 Jun 2026 03:03:44 GMT]]></title><description><![CDATA[<p dir="auto">We have used pgvector + LangChain; the results were good, but query latency was a bit high.</p>
]]></description><link>https://aspxai.com/post/1208</link><guid isPermaLink="true">https://aspxai.com/post/1208</guid><dc:creator><![CDATA[落落]]></dc:creator><pubDate>Mon, 22 Jun 2026 03:03:44 GMT</pubDate></item><item><title><![CDATA[Reply to GGUF Quantization Format Explained: Running Large Models on Consumer Hardware on Mon, 22 Jun 2026 03:03:44 GMT]]></title><description><![CDATA[<p dir="auto">How often the knowledge base gets updated is also an issue; we built an incremental indexing scheme to handle it.</p>
]]></description><link>https://aspxai.com/post/1207</link><guid isPermaLink="true">https://aspxai.com/post/1207</guid><dc:creator><![CDATA[星尘知命]]></dc:creator><pubDate>Mon, 22 Jun 2026 03:03:44 GMT</pubDate></item></channel></rss>