patient_dummies = pd.get_dummies(df['PatientSerial'], prefix='Serial_', drop_first = True)
df = pd.concat([df, patient_dummies], axis = 1)
df.drop(['PatientSerial'], inplace = True, axis = 1)
machine_dummies = pd.get_dummies(df['MachineID'], drop_first = True)
df = pd.concat([df, machine_dummies], axis = 1)
df.drop(['MachineID'], inplace = True, axis = 1)
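I have two columns in DataFrame df that I want to convert into unordered categorical variables. Rather than handling each one separately, is there a more efficient way to do this? I was thinking of something like the following: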
patient_dummies = pd.get_dummies(df['PatientSerial'], prefix='Serial_', drop_first = True)
machine_dummies = pd.get_dummies(df['MachineID'], drop_first = True)
df = pd.concat([df, patient_dummies + machine_dummies], axis = 1)
df.drop(['PatientSerial','MachineID'], inplace = True, axis = 1)
But this didn't work; it produced NaN for every entry instead of 0s and 1s.
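The NaN values come from the `+` itself. Arithmetic between two DataFrames aligns on row and column labels, and because the patient and machine dummy frames share no column names, every cell of the union is missing. A minimal sketch with hypothetical data (the small `df` below is an assumption, not the asker's data; `dtype=int` is added only to get 0/1 on any pandas version) shows this, and shows the direct fix of passing both dummy frames to `pd.concat` as separate items:

```python
import pandas as pd

# Hypothetical stand-in for the asker's df
df = pd.DataFrame({'PatientSerial': ['a', 'b', 'c', 'a'],
                   'MachineID': ['m1', 'm2', 'm1', 'm2']})

patient_dummies = pd.get_dummies(df['PatientSerial'], prefix='Serial',
                                 drop_first=True, dtype=int)
machine_dummies = pd.get_dummies(df['MachineID'], prefix='MachineID',
                                 drop_first=True, dtype=int)

# '+' aligns on column labels; the two frames share no columns,
# so every cell of the resulting union of columns is NaN
summed = patient_dummies + machine_dummies
print(summed.isna().all().all())  # True

# Concatenating side by side keeps both sets of dummy columns instead
fixed = pd.concat([df, patient_dummies, machine_dummies], axis=1)
fixed = fixed.drop(columns=['PatientSerial', 'MachineID'])
print(fixed.columns.tolist())  # ['Serial_b', 'Serial_c', 'MachineID_m2']
```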
Answer
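Yes: pandas.get_dummies() accepts a columns parameter. If you pass it column names from the DataFrame, it dummy-encodes those columns and returns them as part of the entire DataFrame you passed in, with the original columns replaced by their dummies.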
df = pd.get_dummies(df, columns=['PatientSerial', 'MachineID'], drop_first=True)
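A small sketch of how the columns parameter combines with the others, using hypothetical data: `prefix` may also be a dict mapping each encoded column to its own prefix, and `dtype=int` requests 0/1 integers rather than the bool columns that recent pandas versions return by default. Columns not listed in `columns` pass through unchanged.

```python
import pandas as pd

# Hypothetical data: two ID columns to encode plus one column to leave alone
df = pd.DataFrame({'other_col': [10, 20, 30],
                   'PatientSerial': [101, 102, 103],
                   'MachineID': ['A', 'B', 'A']})

# A dict maps each encoded column to its own prefix; dtype=int gives 0/1 integers
encoded = pd.get_dummies(df, columns=['PatientSerial', 'MachineID'],
                         prefix={'PatientSerial': 'Serial', 'MachineID': 'MachineID'},
                         drop_first=True, dtype=int)
print(encoded.columns.tolist())
# ['other_col', 'Serial_102', 'Serial_103', 'MachineID_B']
```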
For example:
import numpy as np
import pandas as pd

np.random.seed(444)
v = np.random.choice([0, 1, 2], size=(2, 10))
# other_col is filler: np.empty_like leaves its values uninitialized (arbitrary)
df = pd.DataFrame({'other_col': np.empty_like(v[0]),
                   'PatientSerial': v[0],
                   'MachineID': v[1]})
pd.get_dummies(df, columns=['PatientSerial', 'MachineID'],
               drop_first=True, prefix=['Serial', 'MachineID'])
   other_col  Serial_1  Serial_2  MachineID_1  MachineID_2
0          2         0         0            0            1
1          1         0         0            0            1
2          2         0         0            0            0
3          2         1         0            1            0
4          2         0         1            0            0
5          2         1         0            0            1
6          2         0         1            0            0
7          2         1         0            0            1
8          2         1         0            0            0
9          2         1         0            0            1
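Since the goal is unordered categorical variables, another route is to give the columns the `category` dtype first. When `columns` is omitted, `get_dummies` encodes object, string, and category columns automatically, but not integer ones, which is why the example above has to name them. A sketch with hypothetical data (the `df` below is an assumption), which also shows that with `drop_first=True` the first level (0 here) becomes the all-zeros baseline:

```python
import pandas as pd

# Hypothetical data with integer-coded IDs, as in the example above
df = pd.DataFrame({'other_col': [10, 20, 30, 40],
                   'PatientSerial': [0, 1, 2, 0],
                   'MachineID': [2, 0, 1, 1]})

# Mark the ID columns as (unordered) categoricals so get_dummies picks them up
df = df.astype({'PatientSerial': 'category', 'MachineID': 'category'})
encoded = pd.get_dummies(df, drop_first=True, dtype=int)
print(encoded.columns.tolist())
# ['other_col', 'PatientSerial_1', 'PatientSerial_2', 'MachineID_1', 'MachineID_2']
```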