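I am a beginner in machine learning with only surface-level exposure so far. I would like to know where I can study or learn about feature selection algorithms. My Python programming is at an amateur level (everything I know comes from Codecademy, apart from coursework).
Starting with Univariate feature selection, I tried to use that particular site as a learning point, but it seems rather complicated.
Ideally, you could point me to a place that offers a crash course in the area mentioned above; I would really like to get started on more complex machine learning as soon as possible.
Any kind of help is appreciated!
(I searched Stack Overflow for 3-4 days but found nothing simple, so I decided to ask.)
Edit: OK, I realize my question seems to have been put on hold because it appears to be off-topic for Stack Overflow, so let me be more specific.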
Referring to:
selector = SelectPercentile(f_classif, percentile=10)
selector.fit(X, y)
==> This is from the same site mentioned above. How do I get it to work properly?
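As a rough sketch of what that call does conceptually: `f_classif` computes a one-way ANOVA F-value for each column of X against the class labels, and `SelectPercentile` keeps the top-scoring percentage of columns. The pure-Python version below is a simplified illustration on a made-up four-row dataset, not scikit-learn's implementation; the helper names `anova_f` and `select_top_percentile` and the toy values are hypothetical.

```python
import math


def anova_f(column, labels):
    """One-way ANOVA F-value of a single numeric feature against class labels."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(label, []).append(float(value))
    n, k = len(column), len(groups)
    if k < 2:
        # With a single class (e.g. labels that are all 0) the score is undefined.
        raise ValueError("need at least two classes in the labels")
    grand_mean = sum(float(v) for v in column) / n
    # Between-class and within-class sums of squares.
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    ssw = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    if ssw == 0:
        # Simplification for this sketch; scikit-learn treats this case differently.
        return float("inf") if ssb > 0 else 0.0
    return (ssb / (k - 1)) / (ssw / (n - k))


def select_top_percentile(X, y, percentile=10):
    """Score every column, then keep the best `percentile` percent (at least one)."""
    columns = list(zip(*X))
    scores = [anova_f(col, y) for col in columns]
    n_keep = max(1, int(math.ceil(len(columns) * percentile / 100.0)))
    ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return scores, sorted(ranked[:n_keep])


# Made-up data: column 0 separates the two classes well, column 1 barely at all.
X_toy = [[1.0, 5.0], [2.0, 5.1], [8.0, 4.9], [9.0, 5.0]]
y_toy = [0, 0, 1, 1]
scores, kept = select_top_percentile(X_toy, y_toy, percentile=50)
print(scores, kept)
```

The point of the sketch is that every column has to be numeric and the labels need at least two distinct classes, otherwise there is nothing to compare between groups.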
For X:
[[ '0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,9,9,1.00,0.00,0.11,0.00,0.00,0.00,0.00,0.00']
[ '0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,0.00,0.00,0.00,0.00,0.00']
[ '0,tcp,http,SF,235,1337,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,29,29,1.00,0.00,0.03,0.00,0.00,0.00,0.00,0.00']
[ '0,tcp,http,SF,219,1337,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,6,6,0.00,0.00,0.00,0.00,1.00,0.00,0.00,39,39,1.00,0.00,0.03,0.00,0.00,0.00,0.00,0.00']
[ '0,tcp,http,SF,217,2032,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,6,6,0.00,0.00,0.00,0.00,1.00,0.00,0.00,49,49,1.00,0.00,0.02,0.00,0.00,0.00,0.00,0.00']]
created using:
x = []
j = ""
for i in range(5):
    j = ','.join(temp[i][:41])
    x.append([j])
x = np.array(x)
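A note on the construction above: `','.join(...)` glues the fields of each record into a single string, so `x` ends up with one text column per row rather than 41 feature columns. The sketch below contrasts that with keeping the fields separate and mapping the text fields to integer codes. It assumes each `temp[i]` is a list of field strings like the rows shown; `temp_sample` (shortened to six fields, with a made-up third row) and the helper `encode_rows` are hypothetical, and integer codes are only one of several possible encodings.

```python
# Hypothetical stand-in for `temp`: the first six fields of each record.
# The third row is invented to show a second protocol and service.
temp_sample = [
    ["0", "tcp", "http", "SF", "181", "5450"],
    ["0", "tcp", "http", "SF", "239", "486"],
    ["0", "udp", "domain_u", "SF", "105", "146"],
]


def encode_rows(rows):
    """Turn rows of field strings into rows of floats, one value per field."""
    codes = {}  # per-column mapping: text value -> integer code
    encoded = []
    for row in rows:
        out = []
        for col, field in enumerate(row):
            try:
                out.append(float(field))
            except ValueError:
                # Non-numeric field: assign the next unused code for this column.
                mapping = codes.setdefault(col, {})
                out.append(float(mapping.setdefault(field, len(mapping))))
        encoded.append(out)
    return encoded


joined = [[",".join(row)] for row in temp_sample]  # what the loop above builds
separate = encode_rows(temp_sample)                # one numeric value per field

print(len(joined[0]), len(separate[0]))
print(separate[2])
```

With the fields kept separate, wrapping the result in `np.array(...)` would give a numeric array with one column per field, which is the shape a per-column score can work with.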
For y:
[0, 0, 0, 0, 0] #this is only a small sample part of the data (which also is true with the data for X), with total values consisting of either '0's or '1's
which, after using
import numpy as np
y = np.append(y, 0)
#codes for y similar to above for X
becomes:
[0 0 0 0 0]
and results in the error:
Traceback (most recent call last):
  File "<pyshell#39>", line 1, in <module>
    selector.fit(x, y)
  File "C:\Python27\lib\site-packages\sklearn\feature_selection\univariate_selection.py", line 315, in fit
    self.scores_, self.pvalues_ = self.score_func(X, y)
  File "C:\Python27\lib\site-packages\sklearn\feature_selection\univariate_selection.py", line 141, in f_classif
    return f_oneway(*args)
  File "C:\Python27\lib\site-packages\sklearn\feature_selection\univariate_selection.py", line 99, in f_oneway
    [safe_sqr(a).sum(axis=0) for a in args])
  File "C:\Python27\lib\site-packages\sklearn\utils\__init__.py", line 321, in safe_sqr
    X = X ** 2
TypeError: unsupported operand type(s) for ** or pow(): 'numpy.ndarray' and 'int'
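The last frame suggests the failure happens when the data is squared. A likely reading, assuming `x` holds text rather than numbers, is that `** 2` is simply undefined for strings. The snippet below shows the same kind of `TypeError` in plain Python, with a made-up value standing in for one field; it is an illustration of the failure mode, not a reproduction of the scikit-learn internals.

```python
field = "181"  # made-up stand-in for one value that was read in as text

try:
    field ** 2  # squaring text is not defined
    squared_text_ok = True
except TypeError as exc:
    squared_text_ok = False
    message = str(exc)

# Converting the text to a number first makes the same operation valid.
squared_number = float(field) ** 2

print(squared_text_ok, squared_number)
```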
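This is why I really want to learn machine learning properly: even after nearly 2 months of learning Python, scikit-learn still looks foreign to me. That is because I have been trying to learn from the site by simply following the provided code and customizing it to fit my own data.
Stack Overflow is really intimidating to join, because with my understanding of programming I can't really contribute, but I guess everything will be fine once you break through that barrier.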