My dataframe looks like this:
import pandas as pd

df = pd.DataFrame({'testName': [4402, 3747, 5555, 8754],
                   'moduleName': ['singing', 'dance', 'booze', 'vocals'],
                   'endResult': ['WARNING', 'FAILED', 'WARNING', 'FAILED']})
I want to dummy-encode the testName and moduleName columns, which I can do like this:
dummy_cols = ['testName', 'moduleName']
df = pd.get_dummies(df, columns=dummy_cols)
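However, I want the dummy values to depend on endResult: if endResult is WARNING the dummy value should be 1, and if it is FAILED it should be 2, so the output contains 1s and 2s that correspond to endResult. How can I do this?
Desired output: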
df1 = pd.DataFrame({'endResult': ['WARNING', 'FAILED', 'WARNING', 'FAILED'], 'testName_4402':[1,0,0,0], 'testName_3747':[0,2,0,0], 'testName_5555':[0,0,1,0], 'testName_8754':[0,0,0,2], 'moduleName_booze':[0,0,1,0], 'moduleName_dance':[0,2,0,0], 'moduleName_singing':[1,0,0,0], 'moduleName_vocals':[0,0,0,2]})
Answer (score: 0)
You can use pd.get_dummies and then multiply every row whose endResult is FAILED by 2.
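import pandas as pd

df = pd.DataFrame({'testName': [4402, 3747, 5555, 8754],
                   'moduleName': ['singing', 'dance', 'booze', 'vocals'],
                   'endResult': ['WARNING', 'FAILED', 'WARNING', 'FAILED']})
# Without columns=, get_dummies only encodes object/category columns, so cast the integer IDs
df['testName'] = df['testName'].astype('category')
dummy_cols = ['testName', 'moduleName']
# dtype=int yields 0/1 integers instead of booleans
result = pd.get_dummies(df[dummy_cols], dtype=int).join(df['endResult'])
# Row-wise: double every value in FAILED rows, leave WARNING rows unchanged
result = result.apply(lambda x: x * 2 if x.endResult == 'FAILED' else x * 1, axis=1)
# The multiplication also doubled the endResult string ('FAILEDFAILED'), so restore it
result['endResult'] = df['endResult']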
Output:
testName_3747 testName_4402 testName_5555 testName_8754 \
0 0 1 0 0
1 2 0 0 0
2 0 0 1 0
3 0 0 0 2
moduleName_booze moduleName_dance moduleName_singing moduleName_vocals \
0 0 0 1 0
1 0 2 0 0
2 1 0 0 0
3 0 0 0 2
endResult
0 WARNING
1 FAILED
2 WARNING
3 FAILED
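A vectorized alternative, sketched here as an assumption rather than as part of the answer: pass columns= to pd.get_dummies so the integer testName column is encoded without a category cast and endResult passes through unchanged, then multiply only the dummy columns row-wise by a weight Series built with Series.map. The {'WARNING': 1, 'FAILED': 2} dictionary restates the question's rule; weights, encoded and dummy_names are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'testName': [4402, 3747, 5555, 8754],
                   'moduleName': ['singing', 'dance', 'booze', 'vocals'],
                   'endResult': ['WARNING', 'FAILED', 'WARNING', 'FAILED']})

dummy_cols = ['testName', 'moduleName']
# Illustrative mapping taken from the question's rule: WARNING -> 1, FAILED -> 2
weights = df['endResult'].map({'WARNING': 1, 'FAILED': 2})

# With columns= given, get_dummies encodes the integer testName column as well
# and keeps endResult as a regular column; dtype=int yields 0/1 integers.
encoded = pd.get_dummies(df, columns=dummy_cols, dtype=int)

# Every column except endResult is a dummy column
dummy_names = [c for c in encoded.columns if c != 'endResult']
# Multiply each row of the dummy block by that row's weight (aligned on the index)
encoded[dummy_names] = encoded[dummy_names].mul(weights, axis=0)
print(encoded)
```

Because mul(..., axis=0) aligns the weights on the row index, this avoids calling a Python lambda once per row, and endResult never gets multiplied, so it needs no restoring afterwards. An endResult value missing from the dictionary would map to NaN and turn the dummy columns into floats, so extend the mapping if other statuses can occur.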