<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>IT Blog - 魔のkyo's Studio - Post Category - Machine Learning</title><link>http://www.cnitblog.com/luckydmz/category/8970.html</link><description /><language>zh-cn</language><lastBuildDate>Tue, 22 Oct 2019 19:22:19 GMT</lastBuildDate><pubDate>Tue, 22 Oct 2019 19:22:19 GMT</pubDate><ttl>60</ttl><item><title>Catalog of Common Unsupervised Learning Algorithms in sklearn</title><link>http://www.cnitblog.com/luckydmz/archive/2019/10/22/91926.html</link><dc:creator>魔のkyo</dc:creator><author>魔のkyo</author><pubDate>Mon, 21 Oct 2019 18:19:00 GMT</pubDate><guid>http://www.cnitblog.com/luckydmz/archive/2019/10/22/91926.html</guid><wfw:comment>http://www.cnitblog.com/luckydmz/comments/91926.html</wfw:comment><comments>http://www.cnitblog.com/luckydmz/archive/2019/10/22/91926.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cnitblog.com/luckydmz/comments/commentRss/91926.html</wfw:commentRss><trackback:ping>http://www.cnitblog.com/luckydmz/services/trackbacks/91926.html</trackback:ping><description><![CDATA[<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">Unsupervised learning algorithms receive only the input data (Features). They extract useful knowledge from it without using, or even knowing, the corresponding outputs (Target).</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">The unsupervised learning algorithms in sklearn fall into three main types: preprocessing, decomposition, and clustering.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"></div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><span style="font-weight: bold;">Algorithm implementations in scikit-learn</span></div><div style="overflow: auto;"><table style="border-collapse: collapse; table-layout: fixed; white-space: nowrap; width: 0px;"><colgroup><col style="width: 103px;"><col 
style="width: 102px;"><col style="width: 165px;"><col style="width: 233px;"></colgroup><tbody><tr style="height: 40px;"><td data-cell-id="3049-1571671879384-cell-0-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Algorithm</span></div></td><td data-cell-id="3049-1571671879384-cell-0-1" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Module</span></div></td><td data-cell-id="3049-1571671879384-cell-0-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Class</span></div></td><td data-cell-id="3049-1571671879384-cell-0-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Main parameters</span></div></td></tr><tr style="height: 40px;"><td data-cell-id="3049-1571671879384-cell-1-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div 
class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Min-max scaling</span></div></td><td data-cell-id="3049-1571671879384-cell-1-1" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">preprocessing</span></div></td><td data-cell-id="3049-1571671879384-cell-1-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">MinMaxScaler</span></div></td><td data-cell-id="3049-1571671879384-cell-1-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">feature_range=(0,1)</span></div></td></tr><tr style="height: 40px;"><td data-cell-id="3049-1571671879384-cell-2-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Standardization</span></div></td><td data-cell-id="3049-1571671879384-cell-2-1" style="font-size: 
14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">preprocessing</span></div></td><td data-cell-id="3049-1571671879384-cell-2-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">StandardScaler</span></div></td><td data-cell-id="3049-1571671879384-cell-2-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><br /></td></tr><tr style="height: 40px;"><td data-cell-id="3049-1571671879384-cell-3-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">One-hot encoding</span></div></td><td data-cell-id="3049-1571671879384-cell-3-1" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">preprocessing</span></div></td><td data-cell-id="3049-1571671879384-cell-3-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; 
white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">OneHotEncoder</span></div></td><td data-cell-id="3049-1571671879384-cell-3-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">categories='auto'</span></div></td></tr><tr style="height: 40px;"><td data-cell-id="3049-1571671879384-cell-4-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Polynomial features</span></div></td><td data-cell-id="3049-1571671879384-cell-4-1" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">preprocessing</span></div></td><td data-cell-id="3049-1571671879384-cell-4-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">PolynomialFeatures</span></div></td><td 
data-cell-id="3049-1571671879384-cell-4-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">degree=2</span></div></td></tr><tr style="height: 40px;"><td data-cell-id="3049-1571671879384-cell-5-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Principal component analysis</span></div></td><td data-cell-id="3049-1571671879384-cell-5-1" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">decomposition</span></div></td><td data-cell-id="3049-1571671879384-cell-5-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">PCA</span></div></td><td data-cell-id="3049-1571671879384-cell-5-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: 
transparent; font-weight: normal; font-style: normal; text-decoration: none;">n_components, whiten=False</span></div></td></tr><tr style="height: 40px;"><td data-cell-id="3049-1571671879384-cell-6-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Non-negative matrix factorization</span></div></td><td data-cell-id="3049-1571671879384-cell-6-1" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">decomposition</span></div></td><td data-cell-id="3049-1571671879384-cell-6-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">NMF</span></div></td><td data-cell-id="3049-1571671879384-cell-6-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #393939; background-color: #ffffff; font-weight: normal; font-style: normal; text-decoration: none;">n_components</span></div></td></tr><tr style="height: 40px;"><td data-cell-id="3049-1571671879384-cell-7-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; 
white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">K-means clustering</span></div></td><td data-cell-id="3049-1571671879384-cell-7-1" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">cluster</span></div></td><td data-cell-id="3049-1571671879384-cell-7-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">KMeans</span></div></td><td data-cell-id="3049-1571671879384-cell-7-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">n_clusters</span></div></td></tr><tr style="height: 40px;"><td data-cell-id="3049-1571671879384-cell-8-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Agglomerative clustering</span></div></td><td data-cell-id="3049-1571671879384-cell-8-1" 
style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">cluster</span></div></td><td data-cell-id="3049-1571671879384-cell-8-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">AgglomerativeClustering</span></div></td><td data-cell-id="3049-1571671879384-cell-8-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">n_clusters, linkage='ward'</span></div></td></tr><tr style="height: 40px;"><td data-cell-id="3049-1571671879384-cell-9-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">DBSCAN</span></div></td><td data-cell-id="3049-1571671879384-cell-9-1" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; 
font-weight: normal; font-style: normal; text-decoration: none;">cluster</span></div></td><td data-cell-id="3049-1571671879384-cell-9-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">DBSCAN</span></div></td><td data-cell-id="3049-1571671879384-cell-9-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><div class="table-cell-line"><span style="font-size: 14px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">min_samples, eps</span></div></td></tr></tbody></table></div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">The modules in the table correspond to these three types: preprocessing, decomposition, and clustering.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">Preprocessing transforms the input data before it is fed to a supervised learning algorithm. Both its input and its output are a set of Features.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">Decomposition and clustering can also serve as preprocessing for supervised learning algorithms. They have other uses too, such as identifying the components that make up the data, grouping similar samples, and detecting outliers.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">Each algorithm is briefly described below:</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><span style="color: #393939; background-color: #ffffff; font-weight: bold;">MinMaxScaler</span></div><div style="white-space: pre-wrap; text-indent: 28px; text-align: left; line-height: 1.75; font-size: 14px;">Scales and shifts each feature based on its minimum and maximum values. With the default parameters, every feature of the data it was fitted on ends up in the range 0 to 1.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><span style="color: #393939; background-color: #ffffff; font-weight: bold;">StandardScaler</span></div><div 
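style="white-space: pre-wrap; text-indent: 28px; text-align: left; line-height: 1.75; font-size: 14px;"><span style="color: #393939; background-color: #ffffff;">Shifts and scales each feature based on its mean and variance. With the default parameters, every feature ends up with mean 0 and variance 1.</span></div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><span style="color: #393939; background-color: #ffffff; font-weight: bold;">OneHotEncoder</span></div><div style="white-space: pre-wrap; text-indent: 28px; text-align: left; line-height: 1.75; font-size: 14px;"><span style="color: #393939; background-color: #ffffff;">Encodes categorical variables, meaning features whose values come from a discrete set of categories. A feature with N possible values is represented by N features, each taking the value 0 or 1.</span></div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><span style="color: #393939; background-color: #ffffff; font-weight: bold;">PolynomialFeatures</span></div><div style="white-space: pre-wrap; text-indent: 28px; text-align: left; line-height: 1.75; font-size: 14px;"><span style="color: #393939; background-color: #ffffff;">Generates interaction and polynomial features from the original features. For example, with degree=2, (x1, x2) becomes (1, x1, x2, x1^2, x1*x2, x2^2). The degree parameter controls the highest degree of the generated features. This lets a linear model learn relationships that are nonlinear in the original features.</span></div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><span style="color: #393939; background-color: #ffffff; font-weight: bold;">PCA</span></div><div style="white-space: pre-wrap; text-indent: 28px; text-align: left; line-height: 1.75; font-size: 14px;">Principal component analysis finds a new representation of the original features. In linear-algebra terms, it finds a set of orthonormal basis vectors. It then treats each sample's mean-centered features as a vector and computes that vector's coordinates in the new basis; these coordinates are the new features.

The basis is chosen axis by axis:
- The first axis is the direction in which the data has the greatest variance.
- The second axis is the direction of greatest variance within the hyperplane &#8220;perpendicular&#8221; to the first axis.
- The third axis is the direction of greatest variance within the subspace &#8220;perpendicular&#8221; to both of the first two axes, and so on.

The variance of the transformed features is highest along the first few axes and decreases after that. So we have some reason to expect the first few components of the new features to have more influence on the Target, although this is not guaranteed. This is why PCA can be used to reduce the dimensionality of the Features: we discard the coordinates along the last axes, where the variance is smallest.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><span style="color: #393939; background-color: #ffffff; font-weight: bold;">NMF</span></div><div style="white-space: pre-wrap; text-indent: 28px; text-align: left; line-height: 1.75; 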
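font-size: 14px;"><span style="color: #393939; background-color: #ffffff;">Non-negative matrix factorization works similarly to PCA, but its basis vectors are not orthogonal. NMF requires all original features to be non-negative. Every entry of its basis vectors, and every resulting new feature, is non-negative as well. Some data is formed by superimposing several independent sources, such as an audio track of several people talking, or music played on several instruments. For such data, NMF can often identify the original components that make up the combined signal. NMF can also be used for dimensionality reduction.</span></div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><span style="color: #393939; background-color: #ffffff; font-weight: bold;">KMeans</span></div><div style="white-space: pre-wrap; text-indent: 28px; text-align: left; line-height: 1.75; font-size: 14px;"><span style="color: #393939; background-color: #ffffff;">K-means clustering. Clustering partitions a dataset into groups according to some rule, so that data in the same group are similar and data in different groups are dissimilar. These groups are called clusters. K-means requires the number of clusters to be given in advance. The core of the algorithm is to repeatedly refine each cluster's </span><span style="color: #393939; background-color: #ffffff; font-weight: bold;">cluster center</span><span style="color: #393939; background-color: #ffffff;">. It first picks K points as the initial cluster centers. By default, sklearn uses init='k-means++', which spreads the initial centers out rather than choosing them purely at random. The algorithm then alternates between two steps:
- Assign each data point to its nearest cluster center.
- Set each cluster center to the mean of all the data points assigned to it.

These steps repeat until the cluster assignments no longer change. Earlier we said that </span>clustering can also serve as preprocessing for supervised learning algorithms. Here, <span style="color: #393939; background-color: #ffffff;">if we represent the data points in a cluster by that cluster's center, every point can be described by a single component: its cluster index. This is called vector quantization. This component usually has no continuous meaning, so it would likely need to be one-hot encoded as well.</span></div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><span style="color: #393939; background-color: #ffffff; font-weight: bold;">AgglomerativeClustering</span></div><div style="white-space: pre-wrap; text-indent: 28px; text-align: left; line-height: 1.75; font-size: 14px;"><span style="color: #393939; background-color: #ffffff;">Agglomerative clustering starts by treating each data point as its own cluster. It then repeatedly merges the two most similar clusters according to a linkage criterion (the linkage parameter, 'ward' by default). The merge process can be visualized as a </span><span style="color: #393939; background-color: #ffffff; font-weight: bold;">dendrogram</span><span style="color: #393939; background-color: #ffffff;">, for example with SciPy's scipy.cluster.hierarchy.dendrogram. Agglomerative clustering also requires the number of clusters to be given in advance.</span></div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><span style="color: #393939; background-color: #ffffff; font-weight: bold;">DBSCAN</span></div><div style="white-space: pre-wrap; text-indent: 28px; text-align: left; line-height: 1.75; font-size: 14px;"><span style="color: #393939; 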
background-color: #ffffff;">DBSCAN clustering. A point is a core point if at least min_samples points, including itself, lie within distance eps of it. The algorithm picks an unvisited point. If it is a core point, it starts a new cluster, which then grows by a traversal such as DFS:
- Every point within eps of a core point in the cluster joins the cluster.
- Only points that are themselves core points continue the expansion.

A point that is not a core point is tentatively marked as noise. It may later be absorbed into a cluster as a border point. The process repeats with another unvisited point until every point has been assigned to a cluster or marked as noise (sklearn labels noise points -1). DBSCAN does not need the number of clusters in advance and can find clusters with complex shapes. The noise points can also be used for </span><span style="color: #393939; background-color: #ffffff; font-weight: bold;">outlier detection</span><span style="color: #393939; background-color: #ffffff;">.</span></div><div style="white-space: pre-wrap; text-indent: 28px; text-align: left; line-height: 1.75; font-size: 14px;"></div><br><br><div align=right><a style="text-decoration:none;" href="http://www.cnitblog.com/luckydmz/" target="_blank">魔のkyo</a> 2019-10-22 02:19 <a href="http://www.cnitblog.com/luckydmz/archive/2019/10/22/91926.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Catalog of Common Supervised Learning Algorithms in sklearn</title><link>http://www.cnitblog.com/luckydmz/archive/2019/10/16/91906.html</link><dc:creator>魔のkyo</dc:creator><author>魔のkyo</author><pubDate>Wed, 16 Oct 2019 15:44:00 GMT</pubDate><guid>http://www.cnitblog.com/luckydmz/archive/2019/10/16/91906.html</guid><wfw:comment>http://www.cnitblog.com/luckydmz/comments/91906.html</wfw:comment><comments>http://www.cnitblog.com/luckydmz/archive/2019/10/16/91906.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cnitblog.com/luckydmz/comments/commentRss/91906.html</wfw:commentRss><trackback:ping>http://www.cnitblog.com/luckydmz/services/trackbacks/91906.html</trackback:ping><description><![CDATA[<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><span style="color: #df402a; font-weight: bold; font-style: italic;">Pros</span><span style="font-weight: bold; font-style: italic;">, </span><span style="color: #4d80bf; font-weight: bold; font-style: italic;">cons</span><span style="font-weight: bold; font-style: italic;">, and </span><span style="color: #77c94b; font-weight: bold; font-style: italic;">caveats</span></div>
<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><span style="font-weight: bold;">K-nearest neighbors</span></div>
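<div style="white-space: pre-wrap; text-indent: 28px; text-align: left; line-height: 1.75; font-size: 14px;">Suitable for small datasets; <span style="color: #df402a;">a good baseline model that is easy to explain</span>. <span style="color: #4d80bf;">Not suitable for high-dimensional sparse data</span>; <span style="color: #4d80bf;">cannot extrapolate</span>, i.e. cannot make meaningful predictions outside the range of the training data.</div>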
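<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">A minimal sketch that makes "cannot extrapolate" concrete. It assumes scikit-learn and NumPy are installed and uses hypothetical toy data y = 2x. A K-nearest-neighbors regressor predicts the mean target of the closest training points. Far outside the training range, it therefore keeps returning values taken from the edge of the data:</div>
<pre>
```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical toy data: y = 2x for x = 0, 1, ..., 9
X_train = np.arange(10).reshape(-1, 1)
y_train = 2 * X_train.ravel()

# Each prediction is the mean target of the 3 nearest training points
knn = KNeighborsRegressor(n_neighbors=3).fit(X_train, y_train)

# Inside the range: the neighbors of 4.2 are x = 4, 5, 3, so (8 + 10 + 6) / 3 = 8
print(knn.predict([[4.2]]))  # [8.]
# Outside the range: the neighbors of 20 are still x = 9, 8, 7,
# so the prediction is (18 + 16 + 14) / 3 = 16 instead of 2 * 20 = 40
print(knn.predict([[20]]))  # [16.]
```
</pre>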
<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><span style="font-weight: bold;">Linear models (least squares, ridge regression, Lasso regression, elastic net, </span><span style="color: #393939; background-color: #ffffff; font-weight: bold;">logistic regression, linear support vector machines</span><span style="font-weight: bold;">)</span></div>
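<div style="white-space: pre-wrap; text-indent: 28px; text-align: left; line-height: 1.75; font-size: 14px;">A very reliable first choice. <span style="color: #df402a;">Suitable for very large datasets</span> and also <span style="color: #df402a;">for high-dimensional data</span>, and they <span style="color: #df402a;">can extrapolate</span>. <span style="color: #4d80bf;">Generalization can be poor in low-dimensional spaces</span>. How poor depends on the specific problem, and expanding the features, e.g. with polynomial features, can make linear models more useful.</div>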
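<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">Of these, <span style="color: #77c94b;">least squares, ridge regression, Lasso regression, and elastic net are regressors. Plain least squares does not need data scaling. Ridge, Lasso, and elastic net penalize the size of the coefficients, so their results depend on the feature scales and scaling is usually advisable</span>. <span style="color: #77c94b; background-color: #ffffff;">Logistic regression and linear support vector machines are classifiers</span><span style="color: #393939; background-color: #ffffff;">. Without data scaling, logistic regression converges slowly and may need more iterations (a larger max_iter). Linear support vector machines </span><span style="color: #77c94b; background-color: #ffffff;">require data scaling</span><span style="color: #393939; background-color: #ffffff;">.</span></div>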
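<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">A minimal sketch of the scaling point, using hypothetical toy data; alpha=10 is an arbitrary illustrative value. Rescaling a feature leaves ordinary least squares predictions unchanged. It does change Ridge predictions, because the L2 penalty acts on the coefficients in whatever units the features happen to have. The last line shows that a fitted line keeps going outside the training range, i.e. it extrapolates.</div>
<pre>
```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(50, 1))             # hypothetical feature in [0, 10]
y = 3 * X.ravel() + rng.normal(scale=1.0, size=50)
X_rescaled = X / 100                              # same information, different unit

# Least squares: rescaling a feature only rescales its coefficient,
# so the predictions stay the same
ols_a = LinearRegression().fit(X, y).predict(X)
ols_b = LinearRegression().fit(X_rescaled, y).predict(X_rescaled)
print(np.allclose(ols_a, ols_b))                  # True

# Ridge: the penalty alpha * ||w||^2 ignores feature units, so the rescaled
# feature (which needs a 100x larger coefficient) is shrunk much harder
ridge_a = Ridge(alpha=10).fit(X, y).predict(X)
ridge_b = Ridge(alpha=10).fit(X_rescaled, y).predict(X_rescaled)
print(np.allclose(ridge_a, ridge_b))              # False

# Linear models extrapolate: the fitted line continues past x = 10
print(LinearRegression().fit(X, y).predict([[20]]))  # close to 3 * 20 = 60
```
</pre>
<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">In practice, the scaler is usually combined with the model in a pipeline, e.g. make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)). That way the same scaling is applied at prediction time.</div>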
<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><span style="font-weight: bold;">Naive Bayes (</span><span style="color: #393939; background-color: #ffffff; font-weight: bold;">Gaussian naive Bayes, Bernoulli</span><span style="color: #393939; font-weight: bold;"> naive Bayes, </span><span style="color: #393939; background-color: #ffffff; font-weight: bold;">multinomial</span><span style="color: #393939; font-weight: bold;"> naive Bayes</span><span style="font-weight: bold;">)</span></div>
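<div style="white-space: pre-wrap; text-indent: 28px; text-align: left; line-height: 1.75; font-size: 14px;"><span style="color: #77c94b;">Only applicable to classification problems</span>. Suitable for very large datasets and high-dimensional data. <span style="color: #df402a;">Even faster</span> than linear models, but usually <span style="color: #4d80bf;">less accurate</span> than linear models.</div>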
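<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">The three naive Bayes variants in sklearn.naive_bayes expect different kinds of features. The sketch below uses tiny hypothetical datasets and default parameters. It fits each variant to the kind of data it is designed for:
- GaussianNB: continuous values.
- BernoulliNB: binary indicators.
- MultinomialNB: non-negative counts.</div>
<pre>
```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 0, 1, 1, 1])

# GaussianNB: continuous features, modeled per class as independent Gaussians
X_cont = np.array([[1.0, 2.1], [1.2, 1.9], [0.8, 2.0],
                   [3.0, 4.2], [3.1, 3.9], [2.9, 4.0]])
pred_g = GaussianNB().fit(X_cont, y).predict([[1.1, 2.0], [3.0, 4.1]])
print(pred_g)  # [0 1]

# BernoulliNB: binary (0/1) features, e.g. "does word j appear?"
X_bin = np.array([[1, 0, 0], [1, 1, 0], [1, 0, 0],
                  [0, 0, 1], [0, 1, 1], [0, 0, 1]])
pred_b = BernoulliNB().fit(X_bin, y).predict([[1, 0, 0], [0, 0, 1]])
print(pred_b)  # [0 1]

# MultinomialNB: non-negative counts, e.g. word counts per document
X_cnt = np.array([[5, 0, 1], [4, 1, 0], [6, 0, 0],
                  [0, 1, 5], [1, 0, 4], [0, 0, 6]])
pred_m = MultinomialNB().fit(X_cnt, y).predict([[3, 0, 0], [0, 0, 3]])
print(pred_m)  # [0 1]
```
</pre>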
<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><span style="font-weight: bold;">Decision trees</span></div>
<div style="white-space: pre-wrap; text-indent: 28px; text-align: left; line-height: 1.75; font-size: 14px;"><span style="color: #df402a;">Very fast</span>, no data scaling needed, and they <span style="color: #df402a;">can be visualized and are easy to understand</span>. <span style="color: #4d80bf;">Not suitable for high-dimensional sparse data</span>; <span style="color: #4d80bf;">cannot extrapolate</span>.</div>
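<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">A minimal sketch of both points, using hypothetical toy data y = 2x; max_depth=2 only keeps the printout short. sklearn.tree.export_text prints the learned rules. Every prediction is the mean target of a single leaf, so all inputs beyond the largest training x fall into the same leaf and get the same value.</div>
<pre>
```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical toy data: y = 2x for x = 0, 1, ..., 9
X = np.arange(10).reshape(-1, 1)
y = 2 * X.ravel()

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# Visualize the learned if/else rules as indented text
print(export_text(tree, feature_names=["x"]))

# Any x above the rightmost split threshold lands in the same leaf,
# so x = 9, 20 and 1000 all receive that leaf's mean target
preds = tree.predict([[9], [20], [1000]])
print(preds)  # three identical values
```
</pre>
<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">For a graphical view of the same tree, sklearn.tree.plot_tree draws it with matplotlib.</div>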
<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><span style="font-weight: bold;">Random forests</span></div>
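<div style="white-space: pre-wrap; text-indent: 28px; text-align: left; line-height: 1.75; font-size: 14px;">Almost always perform better than a single decision tree. <span style="color: #df402a;">Very robust (they tolerate some erroneous data in the training set) and usually give good results without extensive parameter tuning</span>. No data scaling needed. <span style="color: #4d80bf;">Not suitable for high-dimensional sparse data</span>; <span style="color: #4d80bf;">cannot extrapolate.</span></div>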
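<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">A sketch of the robustness claim. It assumes the two-moons toy dataset from sklearn.datasets, with 10% of the training labels flipped at random to simulate erroneous data. Both models use default settings apart from random_state. The exact accuracies depend on the data and seeds, so none are quoted here; the forest typically scores higher on the clean test set.</div>
<pre>
```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Simulate erroneous training data: flip 10% of the training labels
rng = np.random.RandomState(0)
flipped = rng.choice(len(y_train), size=len(y_train) // 10, replace=False)
y_noisy = y_train.copy()
y_noisy[flipped] = 1 - y_noisy[flipped]

# Default settings, no tuning; both are evaluated on the clean test labels
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_noisy)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_noisy)

print("single tree test accuracy:  ", tree.score(X_test, y_test))
print("random forest test accuracy:", forest.score(X_test, y_test))
```
</pre>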
<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><span style="font-weight: bold;">Gradient boosting machines</span></div>
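<div style="white-space: pre-wrap; text-indent: 28px; text-align: left; line-height: 1.75; font-size: 14px;">One of the most powerful and widely used models in supervised learning, and often slightly more accurate than random forests. Compared with random forests, <span style="color: #4d80bf;">training is slower</span>, <span style="color: #df402a;">but prediction is faster and needs less memory</span>. They also <span style="color: #77c94b;">need more parameter tuning</span> than random forests. <span style="color: #4d80bf;">Not suitable for high-dimensional sparse data</span>; <span style="color: #4d80bf;">cannot extrapolate.</span></div>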
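<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">Gradient boosting builds trees one after another, each correcting the errors of the ensemble so far. n_estimators, learning_rate and max_depth are the main parameters to tune. This minimal sketch uses hypothetical noisy sine data and calls staged_predict to show the training error after each added tree. The parameter values are illustrative, not recommendations.</div>
<pre>
```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Hypothetical toy data: a noisy sine curve
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# Main knobs: number of trees, shrinkage per tree, depth of each tree
gbr = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                max_depth=3, random_state=0).fit(X, y)

# staged_predict yields the ensemble's predictions after 1, 2, ..., 200 trees
train_mse = [mean_squared_error(y, pred) for pred in gbr.staged_predict(X)]
print("after 1 tree:   ", train_mse[0])
print("after 50 trees: ", train_mse[49])
print("after 200 trees:", train_mse[-1])
```
</pre>
<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">The training error keeps falling as trees are added, so it cannot tell you when to stop. In practice the number of trees is chosen on held-out data, or with the n_iter_no_change early-stopping option.</div>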
<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><span style="color: #393939; background-color: #ffffff; font-weight: bold;">Kernel </span><span style="font-weight: bold;">support vector machines</span></div>
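<div style="white-space: pre-wrap; text-indent: 28px; text-align: left; line-height: 1.75; font-size: 14px;"><span style="color: #df402a;">Very powerful models that allow complex decision boundaries and perform well on both low- and high-dimensional datasets. Particularly strong on medium-sized datasets (thousands to tens of thousands of samples) whose features have similar meanings.</span> <span style="color: #77c94b;">Require data scaling</span> and are <span style="color: #77c94b;">sensitive to parameters</span>. <span style="color: #4d80bf;">Extrapolation depends on the kernel: with the default RBF kernel, predictions far from the training data tend toward a constant.</span></div>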
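<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">A minimal sketch of both caveats. It uses SVR, the regression counterpart of the kernel SVM, on hypothetical data y = 2x; C=100 is an illustrative choice. The scaler sits in a pipeline so that the same scaling is applied at prediction time. Far from the training data, every RBF kernel term underflows to zero. The prediction therefore collapses to the model's intercept instead of continuing the trend.</div>
<pre>
```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical toy data: y = 2x on [0, 10]
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 2 * X.ravel()

# Standardize inside the pipeline, then fit an RBF-kernel SVR
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100)).fit(X, y)

print(model.predict([[5.0]]))      # inside the range the fit follows the data
far = model.predict([[1000.0]])    # far outside: every exp(-gamma * d^2) is 0
print(far, model[-1].intercept_)   # so the prediction equals the intercept
```
</pre>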
<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><span style="font-weight: bold;">Neural networks (</span><span style="color: #393939; background-color: #ffffff; font-weight: bold;">multilayer perceptrons</span><span style="font-weight: bold;">)</span></div>
<div style="white-space: pre-wrap; text-indent: 28px; text-align: left; line-height: 1.75; font-size: 14px;"><span style="color: #df402a;">Can build very complex models</span>, especially for large datasets. <span style="color: #77c94b;">Sensitive to data scaling and to the choice of parameters</span>. <span style="color: #4d80bf;">Large networks take a long time to train</span>.</div>
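<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">A minimal sketch, assuming the two-moons toy dataset. hidden_layer_sizes, alpha (the L2 penalty) and max_iter are MLPClassifier's usual parameters, and the values below are illustrative rather than tuned. Because MLPs are sensitive to feature scale, StandardScaler is placed in front of the network in a pipeline.</div>
<pre>
```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize first: MLPs are sensitive to feature scale.
# Two hidden layers of 50 units; alpha is the L2 penalty (1e-4 is the default)
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(50, 50), alpha=1e-4,
                  max_iter=2000, random_state=0),
)
mlp.fit(X_train, y_train)
print("test accuracy:", mlp.score(X_test, y_test))
```
</pre>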
<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"></div>
<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><span style="font-weight: bold;">Algorithm implementations in scikit-learn</span></div>
<div style="overflow: auto;">
<table style="border-collapse: collapse; table-layout: fixed; white-space: nowrap; width: 0px;">
     <colgroup><col style="width: 103px;"><col style="width: 86px;"><col style="width: 151px;"><col style="width: 157px;"><col style="width: 119px;"></colgroup>
     <tbody>
         <tr style="height: 40px;">
             <td data-cell-id="7756-1570626445453-cell-0-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Algorithm</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-0-1" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Module</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-0-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Classifier</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-0-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Regressor</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-0-4" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
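             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Main parameters (-: smaller value gives a more complex model; +: larger value gives a more complex model; values after = are the scikit-learn defaults)</span></div>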
             </td>
         </tr>
         <tr style="height: 40px;">
             <td data-cell-id="7756-1570626445453-cell-1-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">k-nearest neighbors (KNN)</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-1-1" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #393939; background-color: #ffffff; font-weight: normal; font-style: normal; text-decoration: none;">neighbors</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-1-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">KNeighborsClassifier</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-1-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">KNeighborsRegressor</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-1-4" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">-n_neighbors=5</span></div>
             </td>
         </tr>
         <tr style="height: 40px;">
             <td data-cell-id="7756-1570626445453-cell-2-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Ordinary least squares</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-2-1" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #393939; background-color: #ffffff; font-weight: normal; font-style: normal; text-decoration: none;">linear_model</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-2-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><br />
             </td>
             <td data-cell-id="7756-1570626445453-cell-2-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">LinearRegression</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-2-4" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><br />
             </td>
         </tr>
         <tr style="height: 40px;">
             <td data-cell-id="7756-1570626445453-cell-3-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Ridge regression</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-3-1" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #393939; background-color: #ffffff; font-weight: normal; font-style: normal; text-decoration: none;">linear_model</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-3-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><br />
             </td>
             <td data-cell-id="7756-1570626445453-cell-3-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Ridge</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-3-4" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">-alpha=1</span></div>
             </td>
         </tr>
         <tr style="height: 40px;">
             <td data-cell-id="7756-1570626445453-cell-4-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Lasso regression</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-4-1" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #393939; background-color: #ffffff; font-weight: normal; font-style: normal; text-decoration: none;">linear_model</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-4-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><br />
             </td>
             <td data-cell-id="7756-1570626445453-cell-4-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Lasso</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-4-4" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">-alpha=1</span></div>
             </td>
         </tr>
         <tr style="height: 40px;">
             <td data-cell-id="7756-1570626445453-cell-5-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Elastic net</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-5-1" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #393939; background-color: #ffffff; font-weight: normal; font-style: normal; text-decoration: none;">linear_model</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-5-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><br />
             </td>
             <td data-cell-id="7756-1570626445453-cell-5-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">ElasticNet</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-5-4" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">-alpha=1, l1_ratio=0.5</span></div>
             </td>
         </tr>
         <tr style="height: 40px;">
             <td data-cell-id="7756-1570626445453-cell-6-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Logistic regression</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-6-1" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #393939; background-color: #ffffff; font-weight: normal; font-style: normal; text-decoration: none;">linear_model</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-6-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">LogisticRegression</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-6-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><br />
             </td>
             <td data-cell-id="7756-1570626445453-cell-6-4" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">+C=1.0</span></div>
             </td>
         </tr>
         <tr style="height: 40px;">
             <td data-cell-id="7756-1570626445453-cell-7-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Linear support vector machine</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-7-1" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #393939; background-color: #ffffff; font-weight: normal; font-style: normal; text-decoration: none;">svm</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-7-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">LinearSVC</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-7-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">LinearSVR</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-7-4" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">+C=1.0</span></div>
             </td>
         </tr>
         <tr style="height: 40px;">
             <td data-cell-id="7756-1570626445453-cell-8-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Gaussian naive Bayes</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-8-1" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #393939; background-color: #ffffff; font-weight: normal; font-style: normal; text-decoration: none;">naive_bayes</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-8-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">GaussianNB</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-8-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><br />
             </td>
             <td data-cell-id="7756-1570626445453-cell-8-4" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><br />
             </td>
         </tr>
         <tr style="height: 40px;">
             <td data-cell-id="7756-1570626445453-cell-9-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: #ffffff; font-weight: normal; font-style: normal; text-decoration: none;">Bernoulli </span><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">naive Bayes</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-9-1" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #393939; background-color: #ffffff; font-weight: normal; font-style: normal; text-decoration: none;">naive_bayes</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-9-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">BernoulliNB</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-9-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><br />
             </td>
             <td data-cell-id="7756-1570626445453-cell-9-4" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">-alpha=1.0</span></div>
             </td>
         </tr>
         <tr style="height: 40px;">
             <td data-cell-id="7756-1570626445453-cell-10-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: #ffffff; font-weight: normal; font-style: normal; text-decoration: none;">Multinomial </span><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">naive Bayes</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-10-1" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #393939; background-color: #ffffff; font-weight: normal; font-style: normal; text-decoration: none;">naive_bayes</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-10-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">MultinomialNB</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-10-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;"><br />
             </td>
             <td data-cell-id="7756-1570626445453-cell-10-4" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">-alpha=1.0</span></div>
             </td>
         </tr>
         <tr style="height: 40px;">
             <td data-cell-id="7756-1570626445453-cell-11-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Decision tree</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-11-1" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #393939; background-color: #ffffff; font-weight: normal; font-style: normal; text-decoration: none;">tree</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-11-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">DecisionTreeClassifier</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-11-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">DecisionTreeRegressor</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-11-4" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">+max_depth</span></div>
             </td>
         </tr>
         <tr style="height: 40px;">
             <td data-cell-id="7756-1570626445453-cell-12-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Random forest</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-12-1" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #393939; background-color: #ffffff; font-weight: normal; font-style: normal; text-decoration: none;">ensemble</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-12-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">RandomForestClassifier</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-12-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">RandomForestRegressor</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-12-4" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">+n_estimators</span></div>
             </td>
         </tr>
         <tr style="height: 40px;">
             <td data-cell-id="7756-1570626445453-cell-13-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Gradient boosting machine</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-13-1" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #393939; background-color: #ffffff; font-weight: normal; font-style: normal; text-decoration: none;">ensemble</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-13-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">GradientBoostingClassifier</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-13-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">GradientBoostingRegressor</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-13-4" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">+n_estimators, +learning_rate</span></div>
             </td>
         </tr>
         <tr style="height: 40px;">
             <td data-cell-id="7756-1570626445453-cell-14-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Kernel support vector machine</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-14-1" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #393939; background-color: #ffffff; font-weight: normal; font-style: normal; text-decoration: none;">svm</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-14-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">SVC</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-14-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">SVR</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-14-4" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">kernel='rbf', +C, +gamma</span></div>
             </td>
         </tr>
         <tr style="height: 40px;">
             <td data-cell-id="7756-1570626445453-cell-15-0" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">Multilayer perceptron</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-15-1" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #393939; background-color: #ffffff; font-weight: normal; font-style: normal; text-decoration: none;">neural_network</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-15-2" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">MLPClassifier</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-15-3" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">MLPRegressor</span></div>
             </td>
             <td data-cell-id="7756-1570626445453-cell-15-4" style="font-size: 14px; color: #393939; border: 1px solid #a7a7a7; overflow: hidden; word-wrap: break-word; white-space: pre-wrap;">
             <div class="table-cell-line"><span style="font-size: 12px; font-family: Microsoft YaHei, STXihei; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;">solver='lbfgs', -alpha, hidden_layer_sizes</span></div>
             </td>
         </tr>
     </tbody>
</table>
</div>
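<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">As a quick illustration of the last two rows of the table, here is a minimal sketch that fits an RBF-kernel SVC and an MLPClassifier with solver='lbfgs'. It is my own addition and not part of the original list. The breast-cancer dataset, the StandardScaler pipeline, and the specific values of C, gamma, alpha, hidden_layer_sizes and max_iter are assumptions chosen only for illustration.</div>

```python
# Minimal sketch (assumption: dataset and parameter values are illustrative,
# not taken from the table above) fitting the two estimators from the last
# two table rows and reporting their training and test accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(random_state=0):
    """Return {name: (train_score, test_score)} for an RBF SVC and an MLP."""
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=random_state)

    models = {
        # Kernel SVM: larger C and larger gamma both allow a more complex model.
        "SVC(rbf)": SVC(kernel="rbf", C=1.0, gamma="scale"),
        # MLP: alpha is the L2 penalty, hidden_layer_sizes sets the architecture.
        "MLP(lbfgs)": MLPClassifier(solver="lbfgs", alpha=1e-3,
                                    hidden_layer_sizes=(10,), max_iter=2000,
                                    random_state=random_state),
    }
    results = {}
    for name, model in models.items():
        # Both models are sensitive to feature scale, so standardize first.
        pipe = make_pipeline(StandardScaler(), model)
        pipe.fit(X_train, y_train)
        results[name] = (pipe.score(X_train, y_train),
                         pipe.score(X_test, y_test))
    return results


if __name__ == "__main__":
    for name, (train, test) in compare_models().items():
        print(f"{name}: train={train:.3f} test={test:.3f}")
```

<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">Standardizing the features first is a deliberate choice in this sketch, because both the RBF kernel and MLP training are sensitive to feature scale. To explore the parameters listed in the table, vary C, gamma or alpha and compare how the training and test scores move.</div>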
<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"></div><img src ="http://www.cnitblog.com/luckydmz/aggbug/91906.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cnitblog.com/luckydmz/" target="_blank">魔のkyo</a> 2019-10-16 23:44 <a href="http://www.cnitblog.com/luckydmz/archive/2019/10/16/91906.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Experimental notes on several linear regression models in scikit-learn</title><link>http://www.cnitblog.com/luckydmz/archive/2019/10/06/91864.html</link><dc:creator>魔のkyo</dc:creator><author>魔のkyo</author><pubDate>Sun, 06 Oct 2019 09:25:00 GMT</pubDate><guid>http://www.cnitblog.com/luckydmz/archive/2019/10/06/91864.html</guid><wfw:comment>http://www.cnitblog.com/luckydmz/comments/91864.html</wfw:comment><comments>http://www.cnitblog.com/luckydmz/archive/2019/10/06/91864.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cnitblog.com/luckydmz/comments/commentRss/91864.html</wfw:commentRss><trackback:ping>http://www.cnitblog.com/luckydmz/services/trackbacks/91864.html</trackback:ping><description><![CDATA[<div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><strong>scikit-learn has the following commonly used linear regression models:</strong></div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><strong>Ordinary least squares: linear_model.LinearRegression</strong></div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><strong>Ridge regression: linear_model.Ridge</strong></div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><strong>Lasso regression: linear_model.Lasso</strong></div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><strong>Elastic net: linear_model.ElasticNet</strong></div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 
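14px;">Each of these models is discussed below.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">The idea behind ordinary least squares is to minimize the mean squared error.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">Let X be the samples, f the predictions and y the actual values. Least squares looks for the pair w, b with f(i) = w · x(i) + b (the dot product of w with sample x(i), plus b) that minimizes the mean squared error MSE = (y-f).dot(y-f) / len(X).</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"></div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">Ordinary least squares has essentially no parameters to tune. To minimize the MSE, some components of w can also become very large, which lets those features dominate the prediction. The weights learned for the other features may then be wrong: a feature that should contribute positively can end up with a negative coefficient. This tends to happen when features are strongly correlated or when there are many features relative to the number of samples.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">To address this, we can add a constraint that keeps the components of w small. Such a constraint is called regularization.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">As mentioned above, ordinary least squares uses the mean squared error to judge how good the model parameters are. A function used this way to measure how good a model is, is called the <span style="font-weight: bold;">loss function.</span></div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">Imposing a constraint simply means modifying the loss function by adding a penalty term on w. There are different ways to limit the components of w:</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><strong>L1 regularization</strong>, whose corresponding model is Lasso regression. As I understand it, it constrains w through a penalty on Σ|w_i|.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><strong>L2 regularization</strong>, whose corresponding model is ridge regression (Ridge). As I understand it, it constrains w through a penalty on w.dot(w), i.e. Σw_i².</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">For a more detailed explanation, see <a href="https://www.cnblogs.com/yongfuxue/articles/9971749.html"><span style="color: #003884; text-decoration: underline;">https://www.cnblogs.com/yongfuxue/articles/9971749.html</span></a></div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><span style="color: #393939;">As for </span>ElasticNet, it combines L1 and L2 regularization.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">Its constructor has two main parameters, shown here with their default values: ElasticNet(alpha=1, l1_ratio=0.5)</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">The first parameter, alpha:</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">The larger alpha is, the stronger the constraint, the simpler the model, and the more prone it is to underfitting.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">Conversely, the smaller alpha is, the weaker the constraint, the more complex the model, and the more prone it is to overfitting. At alpha=0 the model reduces to ordinary least squares, though scikit-learn recommends using LinearRegression directly in that case.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">In practice alpha is usually searched on a logarithmic scale (for example 0.001, 0.01, 0.1, 1, 10) to find the value that works best.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">The second parameter, l1_ratio:</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">l1_ratio sets the mix between L1 and L2. The official documentation states that a penalty of a * ||w||_1 + 0.5 * b * ||w||_2^2 corresponds to alpha = a + b and l1_ratio = a / (a + b).</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">So by setting l1_ratio within [0, 1], the model can effectively be reduced to Lasso (l1_ratio = 1) or to a pure L2 penalty like Ridge (l1_ratio = 0). Note that ElasticNet scales alpha differently from Ridge, so the same alpha value is not directly comparable between the two.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">For this reason, the examples below do not test Lasso and Ridge separately.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">Lasso has one more use worth mentioning: it can help find out which features are useful and which have little effect. A model trained with Lasso tends to have many components of w exactly equal to 0, and these coefficients can be inspected through the trained model's .coef_ attribute.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"></div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">Below we look at the <strong>learning curves</strong> (the score as a function of training set size) of KNN, ordinary least squares and elastic net.</div><div><img src="http://www.cnitblog.com/images/cnitblog_com/luckydmz/boston.png" alt="" width="640" height="480" border="0" /></div><div style="white-space: pre-wrap; 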
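text-align: left; line-height: 1.75; font-size: 14px;">In this example, the data uses 13 features to predict house prices in the Boston area.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">Ordinary least squares performs very poorly when the training set is small. It only becomes fairly stable once the training set exceeds about 100 samples, and with enough data it does quite well. The instability of ordinary least squares shows up even more clearly in the next example.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">ElasticNet is more stable and becomes usable sooner. However, it ends up scoring slightly worse than ordinary least squares, possibly because I did not tune its parameters carefully.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">The KNN model's score rises steadily as the training set grows, but its final predictive performance is only passable.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"></div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">Next is the same Boston housing problem, but with the pairwise products of the features added as extra features. load_extended_boston ends up with 104 features: the 13 original ones plus 91 degree-2 products. The 91 products are the 13*12/2 = 78 products of distinct feature pairs plus the 13 squared features, i.e. 13*14/2 = 91.</div><div><img src="http://www.cnitblog.com/images/cnitblog_com/luckydmz/extended_boston.png" alt="" width="640" height="480" border="0" /></div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">Here ordinary least squares behaves rather strangely. With fewer than about 200 training samples it is very unstable and at one point scores 0. The plot's y-axis is clipped to [0, 1] and R² can be negative, so the actual score may have been even lower. Even beyond 200 samples it is still not very stable. Part of the reason is likely that with 104 features, a training set of about that size or smaller leaves the least-squares problem underdetermined, so the fitted model can match the training data closely yet generalize poorly.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">Elastic net is much more stable, and it also has the best final score. This time I did tune its parameters a little.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">KNN behaves the same as before: its score rises steadily as the training set grows, but the final predictions are only passable.</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"></div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;">Here is the code used:</div><div style="white-space: pre-wrap; text-align: left; line-height: 1.75; font-size: 14px;"><div style="background-color:#eeeeee;font-size:13px;border:1px solid #CCCCCC;padding-right: 5px;padding-bottom: 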
4px;padding-left: 4px;padding-top: 4px;width: 98%;word-break:break-all"><span style="color: #0000FF; ">import</span><span style="color: #000000; ">&nbsp;numpy&nbsp;as&nbsp;np<br /></span><span style="color: #0000FF; ">import</span><span style="color: #000000; ">&nbsp;matplotlib.pyplot&nbsp;as&nbsp;plt<br /><br /></span><span style="color: #0000FF; ">from</span><span style="color: #000000; ">&nbsp;sklearn.model_selection&nbsp;</span><span style="color: #0000FF; ">import</span><span style="color: #000000; ">&nbsp;train_test_split<br /></span><span style="color: #0000FF; ">from</span><span style="color: #000000; ">&nbsp;sklearn.neighbors&nbsp;</span><span style="color: #0000FF; ">import</span><span style="color: #000000; ">&nbsp;KNeighborsRegressor<br /></span><span style="color: #0000FF; ">from</span><span style="color: #000000; ">&nbsp;sklearn.linear_model&nbsp;</span><span style="color: #0000FF; ">import</span><span style="color: #000000; ">&nbsp;LinearRegression<br /></span><span style="color: #0000FF; ">from</span><span style="color: #000000; ">&nbsp;sklearn.linear_model&nbsp;</span><span style="color: #0000FF; ">import</span><span style="color: #000000; ">&nbsp;ElasticNet<br /></span><span style="color: #0000FF; ">from</span><span style="color: #000000; ">&nbsp;mglearn.datasets&nbsp;</span><span style="color: #0000FF; ">import</span><span style="color: #000000; ">&nbsp;load_extended_boston<br /></span><span style="color: #008000; ">#&nbsp;load_boston&nbsp;was&nbsp;removed&nbsp;in&nbsp;scikit-learn&nbsp;1.2;&nbsp;it&nbsp;is&nbsp;only&nbsp;needed&nbsp;for&nbsp;the&nbsp;commented-out&nbsp;variant&nbsp;below.<br />#&nbsp;from&nbsp;sklearn.datasets&nbsp;import&nbsp;load_boston</span><span style="color: #000000; "><br /><br />X,y&nbsp;</span><span style="color: #000000; ">=</span><span style="color: #000000; 
">&nbsp;load_extended_boston()<br /><br /></span><span style="color: #008000; ">#</span><span style="color: #008000; ">&nbsp;boston&nbsp;=&nbsp;load_boston()</span><span style="color: #008000; "><br />#</span><span style="color: #008000; ">&nbsp;X,y&nbsp;=&nbsp;boston.data,&nbsp;boston.target</span><span style="color: #008000; "><br /></span><span style="color: #000000; "><br />X_train,&nbsp;X_test,&nbsp;y_train,&nbsp;y_test&nbsp;</span><span style="color: #000000; ">=</span><span style="color: #000000; ">&nbsp;train_test_split(<br />&nbsp;&nbsp;&nbsp;&nbsp;X,&nbsp;y,&nbsp;random_state</span><span style="color: #000000; ">=</span><span style="color: #000000; ">0)<br /><br />regressor_names&nbsp;</span><span style="color: #000000; ">=</span><span style="color: #000000; ">&nbsp;[</span><span style="color: #800000; ">"</span><span style="color: #800000; ">5NN</span><span style="color: #800000; ">"</span><span style="color: #000000; ">,&nbsp;</span><span style="color: #800000; ">"</span><span style="color: #800000; ">LinearRegression</span><span style="color: #800000; ">"</span><span style="color: #000000; ">,&nbsp;</span><span style="color: #800000; ">"</span><span style="color: #800000; ">ElasticNet</span><span style="color: #800000; ">"</span><span style="color: #000000; ">]<br />line_style</span><span style="color: #000000; ">=</span><span style="color: #000000; ">[</span><span style="color: #800000; ">"</span><span style="color: #800000; ">-</span><span style="color: #800000; ">"</span><span style="color: #000000; ">,</span><span style="color: #800000; ">"</span><span style="color: #800000; ">:</span><span style="color: #800000; ">"</span><span style="color: #000000; ">,</span><span style="color: #800000; ">"</span><span style="color: #800000; ">--</span><span style="color: #800000; ">"</span><span style="color: #000000; ">]<br /></span><span style="color: #0000FF; ">for</span><span style="color: #000000; ">&nbsp;i,&nbsp;regressor&nbsp;</span><span style="color: 
#0000FF; ">in</span><span style="color: #000000; ">&nbsp;enumerate([KNeighborsRegressor(n_neighbors&nbsp;</span><span style="color: #000000; ">=</span><span style="color: #000000; ">&nbsp;</span><span style="color: #000000; ">5</span><span style="color: #000000; ">),&nbsp;LinearRegression(),&nbsp;ElasticNet(alpha</span><span style="color: #000000; ">=</span><span style="color: #000000; ">0.01</span><span style="color: #000000; ">,&nbsp;l1_ratio</span><span style="color: #000000; ">=</span><span style="color: #000000; ">0.5</span><span style="color: #000000; ">,&nbsp;max_iter</span><span style="color: #000000; ">=</span><span style="color: #000000; ">100000</span><span style="color: #000000; ">)]):<br />&nbsp;&nbsp;&nbsp;&nbsp;p1&nbsp;</span><span style="color: #000000; ">=</span><span style="color: #000000; ">&nbsp;np.zeros((len(X_train)</span><span style="color: #000000; ">-</span><span style="color: #000000; ">10</span><span style="color: #000000; ">,&nbsp;</span><span style="color: #000000; ">2</span><span style="color: #000000; ">))<br />&nbsp;&nbsp;&nbsp;&nbsp;p2&nbsp;</span><span style="color: #000000; ">=</span><span style="color: #000000; ">&nbsp;np.zeros((len(X_train)</span><span style="color: #000000; ">-</span><span style="color: #000000; ">10</span><span style="color: #000000; ">,&nbsp;</span><span style="color: #000000; ">2</span><span style="color: #000000; ">))<br />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000FF; ">for</span><span style="color: #000000; ">&nbsp;n_samples&nbsp;</span><span style="color: #0000FF; ">in</span><span style="color: #000000; ">&nbsp;range(</span><span style="color: #000000; ">10</span><span style="color: #000000; ">,&nbsp;len(X_train)):<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;regressor.fit(X_train[0:n_samples],&nbsp;y_train[0:n_samples])<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;p1[n_samples</span><span style="color: #000000; ">-</span><span style="color: #000000; ">10</span><span 
style="color: #000000; ">]&nbsp;</span><span style="color: #000000; ">=</span><span style="color: #000000; ">&nbsp;(n_samples,&nbsp;regressor.score(X_train[:n_samples],&nbsp;y_train[:n_samples]))&nbsp;&nbsp;</span><span style="color: #008000; ">#&nbsp;score&nbsp;on&nbsp;the&nbsp;samples&nbsp;actually&nbsp;used&nbsp;for&nbsp;fitting</span><span style="color: #000000; "><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;p2[n_samples</span><span style="color: #000000; ">-</span><span style="color: #000000; ">10</span><span style="color: #000000; ">]&nbsp;</span><span style="color: #000000; ">=</span><span style="color: #000000; ">&nbsp;(n_samples,&nbsp;regressor.score(X_test,&nbsp;y_test))<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;plt.plot(p1[:,0],p1[:,</span><span style="color: #000000; ">1</span><span style="color: #000000; ">],&nbsp;line_style[i],&nbsp;label&nbsp;</span><span style="color: #000000; ">=</span><span style="color: #000000; ">&nbsp;</span><span style="color: #800000; ">"</span><span style="color: #800000; ">Training&nbsp;</span><span style="color: #800000; ">"</span><span style="color: #000000; ">+</span><span style="color: #000000; ">regressor_names[i])<br />&nbsp;&nbsp;&nbsp;&nbsp;plt.plot(p2[:,0],p2[:,</span><span style="color: #000000; ">1</span><span style="color: #000000; ">],&nbsp;line_style[i],&nbsp;label&nbsp;</span><span style="color: #000000; ">=</span><span style="color: #000000; ">&nbsp;</span><span style="color: #800000; ">"</span><span style="color: #800000; ">Test&nbsp;</span><span style="color: #800000; ">"</span><span style="color: #000000; ">+</span><span style="color: #000000; ">regressor_names[i])<br /><br />plt.title(</span><span style="color: #800000; ">"</span><span style="color: #800000; ">learning&nbsp;curve(extended_boston)</span><span style="color: #800000; ">"</span><span style="color: #000000; ">)<br />plt.xlabel(</span><span style="color: #800000; ">"</span><span style="color: #800000; ">Training&nbsp;set&nbsp;size</span><span style="color: #800000; ">"</span><span style="color: #000000; ">)<br />plt.ylabel(</span><span style="color: #800000; ">"</span><span style="color: #800000; ">Score(R^2)</span><span style="color: 
#800000; ">"</span><span style="color: #000000; ">)<br />plt.ylim(0,&nbsp;</span><span style="color: #000000; ">1</span><span style="color: #000000; ">)<br />plt.xlim(0,len(X_train))<br />plt.legend()<br />plt.show()<br /></span></div></div><img src ="http://www.cnitblog.com/luckydmz/aggbug/91864.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cnitblog.com/luckydmz/" target="_blank">魔のkyo</a> 2019-10-06 17:25 <a href="http://www.cnitblog.com/luckydmz/archive/2019/10/06/91864.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>