I'm trying to run k-fold cross-validation with Sklearn in Python. I have followed two tutorials, but my code will not run the validation.
Every time I try

```python
cross_val_score(dt, x, y, cv=5)
```

I get this error:
```
Traceback (most recent call last):
  File "C:/Users/djsg38/Documents/CS6001-SpatialTemporal/HW2/main.py", line 573, in <module>
    scores = cross_val_score(dt, x, y, cv=5)
  File "C:\Python27\lib\site-packages\sklearn\model_selection\_validation.py", line 128, in cross_val_score
    X, y, groups = indexable(X, y, groups)
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 206, in indexable
    check_consistent_length(*result)
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 177, in check_consistent_length
    lengths = [_num_samples(X) for X in arrays if X is not None]
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 116, in _num_samples
    'estimator %s' % x)
TypeError: Expected sequence or array-like, got estimator Its official US President Barack Obama wants lawmakers weigh \
0 1 4 12 3 2 12 4 4 2
1 0 0 1 0 0 0 0 0 0
2 1 0 4 0 0 0 0 0 0
3 0 0 0 0 0 0 4 0 0
4 0 3 10 0 0 1 0 0 0
5 0 0 0 0 0 0 0 0 0
6 0 0 0 4 1 7 0 0 0
7 3 0 0 0 0 0 0 0 0
8 1 0 4 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0
10 0 1 6 3 0 3 0 0 0
11 0 0 0 1 0 0 0 0 0
12 0 2 1 0 0 0 0 0 0
13 0 0 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 0
15 0 0 0 0 0 0 0 0 0
16 0 0 0 0 0 0 0 0 0
17 0 0 5 4 1 9 1 0 0
18 0 0 0 0 0 0 0 0 0
19 0 0 0 0 0 0 0 0 0
20 0 0 0 0 0 0 0 0 0
21 0 0 3 2 1 1 0 0 1
22 0 0 0 0 0 0 0 0 0
23 0 0 1 0 0 0 0 0 0
24 1 0 0 0 0 0 0 0 0
25 0 0 0 1 0 0 0 0 0
26 0 0 0 0 0 0 0 0 0
27 0 0 1 0 0 0 0 0 0
28 0 0 0 0 0 0 0 0 0
29 0 1 0 0 0 0 0 0 0
.. ... ... .. ... ... ... ... ... ...
70 0 0 0 0 0 0 0 0 0
71 0 0 0 2 0 5 0 0 0
72 5 0 0 0 0 0 0 0 0
73 0 0 0 0 0 0 0 0 0
74 0 0 1 0 0 0 0 0 0
75 1 0 1 0 0 0 1 0 0
76 2 0 0 0 0 0 0 0 0
77 1 0 0 0 0 0 0 0 0
78 0 0 0 0 0 0 0 0 0
79 1 0 0 0 0 0 0 0 0
80 0 0 0 0 0 0 0 0 0
81 0 0 1 0 0 0 0 0 0
82 0 0 1 0 0 0 0 0 0
83 0 0 0 0 0 0 0 1 0
84 0 0 2 4 1 3 1 0 0
85 0 0 0 1 0 0 0 0 0
86 0 0 1 0 0 0 0 0 0
87 0 0 0 0 0 0 0 0 0
88 0 0 0 0 0 0 0 0 0
89 0 0 0 0 0 0 0 0 0
90 0 0 0 0 0 0 0 0 0
91 0 0 2 1 0 0 0 0 0
92 0 0 0 0 0 0 0 0 0
93 0 0 0 0 0 0 0 0 0
94 1 0 0 0 0 0 0 0 0
95 0 2 1 0 0 0 0 0 0
96 0 0 0 0 0 0 0 0 0
97 0 0 4 1 0 0 0 0 0
98 0 0 11 1 0 0 0 0 0
99 0 0 0 0 0 0 0 0 0
whether ... Heh heh funny disassociate personWere \
0 4 ... 0 0 0 0 0
1 0 ... 0 0 0 0 0
2 0 ... 0 0 0 0 0
3 0 ... 0 0 0 0 0
4 0 ... 0 0 0 0 0
5 0 ... 0 0 0 0 0
6 2 ... 0 0 0 0 0
7 0 ... 0 0 0 0 0
8 0 ... 0 0 0 0 0
9 0 ... 0 0 0 0 0
10 0 ... 0 0 0 0 0
11 1 ... 0 0 0 0 0
12 0 ... 0 0 0 0 0
13 1 ... 0 0 0 0 0
14 0 ... 0 0 0 0 0
15 1 ... 0 0 0 0 0
16 0 ... 0 0 0 0 0
17 1 ... 0 0 0 0 0
18 0 ... 0 0 0 0 0
19 0 ... 0 0 0 0 0
20 0 ... 0 0 0 0 0
21 8 ... 0 0 0 0 0
22 0 ... 0 0 0 0 0
23 0 ... 0 0 0 0 0
24 0 ... 0 0 0 0 0
25 0 ... 0 0 0 0 0
26 1 ... 0 0 0 0 0
27 0 ... 0 0 0 0 0
28 0 ... 0 0 0 0 0
29 0 ... 0 0 0 0 0
.. ... ... ... ... ... ... ...
70 0 ... 0 0 0 0 0
71 1 ... 0 0 0 0 0
72 0 ... 0 0 0 0 0
73 0 ... 0 0 0 0 0
74 0 ... 0 0 0 0 0
75 0 ... 0 0 0 0 0
77 0 ... 0 0 0 0 0
78 0 ... 0 0 0 0 0
79 1 ... 0 0 0 0 0
80 0 ... 0 0 0 0 0
81 3 ... 0 0 0 0 0
82 0 ... 0 0 0 0 0
83 0 ... 0 0 0 0 0
84 0 ... 0 0 0 0 0
85 0 ... 0 0 0 0 0
86 0 ... 0 0 0 0 0
87 0 ... 0 0 0 0 0
88 0 ... 0 0 0 0 0
89 1 ... 0 0 0 0 0
90 0 ... 0 0 0 0 0
91 0 ... 0 0 0 0 0
92 0 ... 0 0 0 0 0
93 0 ... 0 0 0 0 0
94 1 ... 0 0 0 0 0
95 0 ... 0 0 0 0 0
96 0 ... 0 0 0 0 0
97 0 ... 0 0 0 0 0
98 1 ... 0 0 0 0 0
99 0 ... 1 1 1 1 1
therehighlightAs indepth umpireshighlightThe headhighlightTwo \
0 0 0 0 0
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 0 0 0 0
5 0 0 0 0
6 0 0 0 0
7 0 0 0 0
8 0 0 0 0
9 0 0 0 0
10 0 0 0 0
11 0 0 0 0
12 0 0 0 0
13 0 0 0 0
14 0 0 0 0
15 0 0 0 0
16 0 0 0 0
17 0 0 0 0
18 0 0 0 0
19 0 0 0 0
20 0 0 0 0
21 0 0 0 0
22 0 0 0 0
23 0 0 0 0
24 0 0 0 0
25 0 0 0 0
26 0 0 0 0
27 0 0 0 0
28 0 0 0 0
29 0 0 0 0
.. ... ... ... ...
70 0 0 0 0
71 0 0 0 0
72 0 0 0 0
73 0 0 0 0
74 0 0 0 0
75 0 0 0 0
76 0 0 0 0
77 0 0 0 0
78 0 0 0 0
79 0 0 0 0
80 0 0 0 0
81 0 0 0 0
82 0 0 0 0
83 0 0 0 0
84 0 0 0 0
85 0 0 0 0
86 0 0 0 0
87 0 0 0 0
88 0 0 0 0
89 0 0 0 0
90 0 0 0 0
91 0 0 0 0
92 0 0 0 0
93 0 0 0 0
94 0 0 0 0
95 0 0 0 0
96 0 0 0 0
97 0 0 0 0
98 0 0 0 0
99 1 1 1 1
disrespect
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 0
18 0
19 0
20 0
21 0
22 0
23 0
24 0
25 0
26 0
27 0
28 0
29 0
.. ...
70 0
71 0
72 0
73 0
74 0
75 0
76 0
77 0
78 0
79 0
80 0
81 0
82 0
83 0
84 0
85 0
86 0
87 0
88 0
89 0
90 0
91 0
92 0
93 0
94 0
95 0
96 0
97 0
98 0
99 1
[100 rows x 12993 columns]
```
Here is my code:
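```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def encode_target(df, target_column):
    """Return a copy of df with an integer-coded "Target" column, plus the original labels."""
    df_mod = df.copy()
    targets = df_mod[target_column].unique()
    # Map each distinct label to an integer 0..n-1.
    map_to_int = {name: n for n, name in enumerate(targets)}
    df_mod["Target"] = df_mod[target_column].replace(map_to_int)
    return (df_mod, targets)


df = pd.read_csv("C:/Users/djsg38/Documents/CS6001-SpatialTemporal/HW2/finalCounts.csv")
df2, targets = encode_target(df, "MYLABEL")

# The first 12,338 columns hold the per-word counts.
features = list(df2.columns[:12338])
# Column names are case-sensitive: encode_target creates "Target", not "TARGET".
y = df2["Target"]
x = df2[features]

dt = DecisionTreeClassifier()
dt.fit(x, y)
scores = cross_val_score(dt, x, y, cv=5)  # this call raises the TypeError
```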
My DecisionTreeClassifier seems to work correctly (when I export the tree as an image it looks fine); the problem is the last line.
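P.S. Is there perhaps a limit on the number of columns? The classic examples I followed use the Iris dataset, which has only four columns of data. I have 12,338 columns (the count of each unique word across 100 articles).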
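On the column-limit question: as far as I know, scikit-learn estimators such as `DecisionTreeClassifier` have no small cap on feature columns; they take a 2-D array of shape (n_samples, n_features). A minimal sketch, assuming randomly generated counts in place of the real `finalCounts.csv` (the shape mirrors the 100 articles × 12,338 words described above, but the data and the labels are synthetic, hypothetical stand-ins):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the word-count matrix: 100 articles x 12,338 words.
# These counts are random, NOT the data from the post.
rng = np.random.RandomState(0)
x = rng.poisson(lam=0.1, size=(100, 12338))

# Hypothetical binary labels, 50 per class, so 5-fold stratified CV is possible.
y = np.arange(100) % 2

dt = DecisionTreeClassifier(random_state=0)
# Plain NumPy arrays: no column names and no extra attributes.
scores = cross_val_score(dt, x, y, cv=5)
print(scores.shape)  # one accuracy score per fold
```

With random labels the scores themselves are meaningless; the point is only that 12,000+ columns is not a problem in itself.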
Answer (score: 0)
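Unlike in the tutorials I was following, I could not pass my X values in, because doing so raised the error. The reason may be that X contains string headers.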
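One plausible mechanism, offered as an assumption rather than something confirmed in this thread: the message "Expected sequence or array-like, got estimator" comes from scikit-learn's input check (`_num_samples` in the traceback), which, at least in some versions, treats any object with a `fit` attribute as an estimator. pandas lets you read a column as an attribute (`df.fit` for a column named `fit`), so a word-count DataFrame whose vocabulary happens to include the word "fit" would trip that check, while the underlying NumPy array would not. The tiny DataFrame below is hypothetical and only illustrates the attribute behaviour:

```python
import pandas as pd

# Hypothetical miniature bag-of-words table; "fit" is just another word column.
df = pd.DataFrame({"obama": [1, 0, 2], "fit": [0, 1, 0], "weigh": [3, 0, 0]})

# pandas resolves df.fit to the "fit" column, so the DataFrame appears to have
# a fit attribute, which a hasattr(x, "fit") estimator check would misread.
print(hasattr(df, "fit"))  # True

# The raw NumPy array behind the DataFrame has no such attribute.
x_array = df.values
print(hasattr(x_array, "fit"))  # False
```

If this is the cause, passing `x.values` and `y.values` to `cross_val_score` instead of the DataFrames (or renaming the offending column) should avoid the error without splitting the folds by hand.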
My workaround was simply to split the data into 5 folds by hand and train 5 separate decision trees, each time holding out 1 fold as the test set.
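The manual workaround amounts to doing k-fold cross-validation by hand. A minimal sketch of that idea, not the answerer's actual code: it assumes `x` and `y` are NumPy arrays (call `.values` first if they are DataFrames), uses `np.array_split` to cut shuffled row indices into k folds, and runs on synthetic data; the helper name `manual_kfold_scores` is hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def manual_kfold_scores(x, y, k=5, seed=0):
    """Train one decision tree per fold, holding that fold out as the test set."""
    rng = np.random.RandomState(seed)
    indices = rng.permutation(len(y))   # shuffle the row indices once
    folds = np.array_split(indices, k)  # k nearly equal chunks of row indices
    scores = []
    for i in range(k):
        test_idx = folds[i]
        # All remaining folds together form the training set.
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        tree = DecisionTreeClassifier(random_state=seed)
        tree.fit(x[train_idx], y[train_idx])
        # score() returns mean accuracy on the held-out fold.
        scores.append(tree.score(x[test_idx], y[test_idx]))
    return scores


# Synthetic example data (not from the post): 100 rows, 50 count features.
rng = np.random.RandomState(1)
x = rng.poisson(lam=0.5, size=(100, 50))
y = np.arange(100) % 2
scores = manual_kfold_scores(x, y, k=5)
print(len(scores), np.mean(scores))
```

`sklearn.model_selection.KFold(n_splits=5, shuffle=True)` yields the same kind of train/test index pairs through its `split` method, if you prefer not to build the folds yourself.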