import pandas as pd
from sklearn.model_selection import train_test_split

Data1 = pd.read_csv(r"C:\Users\Zihao\Desktop\New\OBSTET.csv", index_col=0)
Data1.fillna(0, inplace=True)
Dependent = Data1.iloc[:, 0]  # .ix was removed in pandas 1.0; .iloc selects by position
X_train, y_train, x_test, y_test = train_test_split()  # not sure which arguments to pass here
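This is my data. I know the first column is the dependent variable and the remaining columns are the independent variables.
How do I split it? I'm not sure which arguments I should pass.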
Answer 0 (score: 2)
If you are trying to predict your Dependent variable, that is your "y", while the independent variables are your "X".
If that's the case:
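Dependent = Data1.iloc[:, 0]     # your "y" (first column)
Independent = Data1.iloc[:, 1:]  # the rest of the columns (commonly referred to as "X")
# Returns one train/test pair per input, in order: X_train, X_test, y_train, y_test
X_train, x_test, y_train, y_test = train_test_split(Independent, Dependent)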
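By default (when neither test_size nor train_size is given), this shuffles the rows and puts 75% of them into X_train, y_train and the other 25% into x_test, y_test.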
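A minimal sketch making the defaults explicit, assuming a synthetic DataFrame as a stand-in for OBSTET.csv (the column names "target", "f1", "f2", "f3" are hypothetical). Passing test_size and random_state is optional; random_state just makes the shuffle reproducible between runs.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for OBSTET.csv: 100 rows, first column is the
# dependent variable, the remaining columns are independent variables.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(100, 4)), columns=["target", "f1", "f2", "f3"])

y = data.iloc[:, 0]   # dependent variable ("y")
X = data.iloc[:, 1:]  # independent variables ("X")

# Outputs come back as X_train, X_test, y_train, y_test.
# test_size=0.25 matches the default split; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print(X_train.shape, X_test.shape)  # (75, 3) (25, 3)
```

Because X and y are split with the same shuffled indices, each row of X_train still lines up with the matching value in y_train.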