Answer 0 (score: 0)
As @moys suggested, you can first split the DataFrame's columns into non-overlapping subsets and then apply train_test_split from scikit-learn to each subset.
Example:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
Generate some data:
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
Split the columns of df in some way, for example in half:
cols = len(df.columns) // 2
df_A = df.iloc[:, 0:cols]
df_B = df.iloc[:, cols:]
Use train_test_split:
train_A, test_A = train_test_split(df_A, test_size=0.33)
train_B, test_B = train_test_split(df_B, test_size=0.33)
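One thing to be aware of: the two separate calls above shuffle the rows independently, so train_A and train_B will generally not contain the same rows. If you need the two column subsets to stay row-aligned (this is an assumption about your use case, not something the question states), a minimal sketch is to pass both frames to a single train_test_split call, which applies one shuffle to all inputs. The seed and random_state values below are arbitrary and only there to make the run reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Same kind of toy data as above; the seed is an added assumption for reproducibility.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list("ABCD"))

# Split the columns in half, as in the answer.
cols = len(df.columns) // 2
df_A = df.iloc[:, :cols]
df_B = df.iloc[:, cols:]

# A single call shuffles once and splits every input with the same row selection.
# The outputs come back as (train, test) pairs in the order the inputs were passed.
train_A, test_A, train_B, test_B = train_test_split(
    df_A, df_B, test_size=0.33, random_state=42
)

# The row labels of the two halves now match.
print(train_A.index.equals(train_B.index))
print(test_A.index.equals(test_B.index))
```

If you actually want different rows in each subset, keep the two separate calls as shown in the answer.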