Hello, I have this code:
import pandas as pd
import numpy as np
import warnings
from sklearn import svm
warnings.filterwarnings(action="ignore", module="scipy", message="^internal gelsd")
from sklearn.model_selection import train_test_split
df = pd.read_csv("datatrain.csv" , sep="," ,encoding = 'windows-1250' )
df = df[['FEATURE1' , 'FEATURE2' , 'FEATURE3' ,'LABEL']]
df.dropna(inplace=True)
print(df.head())
X = np.array(df.drop(['LABEL'], axis=1))
y = np.array(df['LABEL'])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
clf = svm.SVC(kernel="linear", C= 1.0)
clf.fit(X_train[:-500], y_train[:-500])
accuracy = clf.score(X_test, y_test)
print("accuracy: ", accuracy)
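My dataset is large, more than 150K rows, but as you can see, I only use the first 500 rows. When I start my code, the first print(df.head()) runs, but after that I just get a bouncing Python rocket icon in my dock and nothing happens.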
Answer 0 (score: 0)
You are using all rows except the last 500, so the SVM is being trained on almost the entire training set. It should be clf.fit(X_train[:500], y_train[:500]).
For a detailed explanation of how to get the first n elements with a slice, see this answer.
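To make the difference concrete, here is a minimal sketch using a plain Python list as a stand-in for X_train. The figure of 105,000 rows is an assumption (70% of a 150K-row dataset after the 0.3 test split), and the names rows, all_but_last_500 and first_500 are only illustrative. NumPy arrays slice the same way along their first axis.

```python
# Hypothetical stand-in for X_train: 105,000 row indices
# (assumed: 70% of roughly 150K rows after test_size=0.3).
rows = list(range(105_000))

# A negative stop counts from the end: everything EXCEPT the last 500 rows.
all_but_last_500 = rows[:-500]

# A positive stop counts from the start: only the first 500 rows.
first_500 = rows[:500]

print(len(all_but_last_500))  # 104500
print(len(first_500))         # 500
```

So with [:-500] the classifier is fitted on about 104,500 rows instead of 500, which would explain why the script appears to hang after the first print.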