I have a pandas DataFrame (N = 1485) that looks like this:
ID  Intervention
1   Blood Draw, Flushed, Locked
1   Blood Draw, Port De-Accessed, Heparin-Locked, Tubing Changed
1   Blood Draw, Flushed
2   Blood return Verified, Flushed
2   Cap Changed
3   Port De-Accessed
I would like to dummy-code the comma-separated codes in each string, so that the result looks something like this:
ID  Blood Draw  Flushed  Locked  ....
1   Yes         Yes      Yes
1   Yes         No       No
...
Thanks!
Answer 0 (score: 0)
You can try the following:
for event in ['Blood Draw', 'Flushed', 'Locked']:
    df[event] = df['Intervention'].str.contains(event)
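This gives you True/False rather than 'Yes'/'No', which is likely to be more useful for later processing. Note that str.contains matches substrings, so 'Locked' is also True for rows that contain only 'Heparin-Locked'.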
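A minimal sketch of a variant that compares whole labels and derives the label list from the data rather than hard-coding it. It assumes every row is a non-missing string separated by ', ', and uses the six sample rows from the Setup in Answer 2. The names tokens and events are introduced here for illustration only.

```python
import pandas as pd

# Sample rows taken from the Setup in Answer 2
df = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 3],
                   'Intervention': ['Blood Draw, Flushed, Locked',
                                    'Blood Draw, Port De-Accessed, Heparin-Locked, Tubing Changed',
                                    'Blood Draw, Flushed', 'Blood return Verified, Flushed',
                                    'Cap Changed', 'Port De-Accessed']})

# Split each string into a list of whole labels (assumes no missing values)
tokens = df['Intervention'].str.split(', ')

# Collect the distinct labels from the data instead of hard-coding them
events = sorted({label for labels in tokens for label in labels})

for event in events:
    # Exact membership test: 'Locked' is not found in a row
    # that only contains 'Heparin-Locked'
    df[event] = tokens.apply(lambda labels, e=event: e in labels)

print(df[['ID', 'Locked', 'Heparin-Locked']])
```

The loop keeps the shape of the answer above; only the matching rule changes from substring search to list membership.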
Answer 1 (score: 0)
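import pandas as pd
df1 = df['Intervention'].str.split(', ', expand=True)  # one column per position in the list
df2 = df1.fillna('')  # replace None with empty strings
dummies = pd.concat([pd.get_dummies(df2[col]) for col in df2], axis=1, keys=df2.columns)  # dummies for every column
To carry out the steps above, select the Intervention column, apply this process to it, and merge the result with the original DataFrame. The get_dummies call creates dummies for every split column.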
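The merge step described above is not shown in the answer. Here is a minimal sketch of one way to do it, assuming the six-row sample frame from the Setup in Answer 2. Stacking the split columns first yields one indicator column per label rather than one per list position, and the result is joined back on the row index. The names parts, stacked, dummies, and result are illustrative only.

```python
import pandas as pd

# Sample rows taken from the Setup in Answer 2
df = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 3],
                   'Intervention': ['Blood Draw, Flushed, Locked',
                                    'Blood Draw, Port De-Accessed, Heparin-Locked, Tubing Changed',
                                    'Blood Draw, Flushed', 'Blood return Verified, Flushed',
                                    'Cap Changed', 'Port De-Accessed']})

# One column per list position; shorter rows are padded with missing values
parts = df['Intervention'].str.split(', ', expand=True)

# Long format: one entry per (original row, position) pair.
# Missing cells are either dropped by stack() or ignored by get_dummies.
stacked = parts.stack()

# Indicator per label, then collapse back to one row per original row
dummies = pd.get_dummies(stacked).groupby(level=0).max()

# Merge with the original frame on the shared row index
result = df[['ID']].join(dummies)
print(result)
```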
Answer 2 (score: 0)
You can use pd.Series.str.get_dummies with a dictionary mapping:
d = {1: 'yes', 0: 'no'}
res = df.join(df.pop('Intervention').str.get_dummies(', ').applymap(d.get))
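In my opinion, it is best to convert to strings only for display purposes. Booleans can be stored and manipulated more efficiently as Boolean series.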
Result
print(res)
   ID  Blood Draw  Blood return Verified  Cap Changed  Flushed  Heparin-Locked  \
0   1         yes                     no           no      yes              no
1   1         yes                     no           no       no             yes
2   1         yes                     no           no      yes              no
3   2          no                    yes           no      yes              no
4   2          no                     no          yes       no              no
5   3          no                     no           no       no              no

   Locked  Port De-Accessed  Tubing Changed
0     yes                no              no
1      no               yes             yes
2      no                no              no
3      no                no              no
4      no                no              no
5      no               yes              no
Setup
import pandas as pd
df = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 3],
                   'Intervention': ['Blood Draw, Flushed, Locked',
                                    'Blood Draw, Port De-Accessed, Heparin-Locked, Tubing Changed',
                                    'Blood Draw, Flushed', 'Blood return Verified, Flushed',
                                    'Cap Changed', 'Port De-Accessed']})
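To illustrate this answer's point about keeping booleans until display time, here is a minimal sketch using the same sample frame. The names flags, flushed_rows, counts_per_id, and as_text are made up for this example and are not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 3],
                   'Intervention': ['Blood Draw, Flushed, Locked',
                                    'Blood Draw, Port De-Accessed, Heparin-Locked, Tubing Changed',
                                    'Blood Draw, Flushed', 'Blood return Verified, Flushed',
                                    'Cap Changed', 'Port De-Accessed']})

# 0/1 indicator columns, converted to booleans
flags = df['Intervention'].str.get_dummies(', ').astype(bool)

# A boolean column can be used directly as a row mask
flushed_rows = df[flags['Flushed']]

# Booleans also aggregate naturally: rows per ID that include each label
counts_per_id = flags.groupby(df['ID']).sum()

# Convert to 'yes'/'no' strings only when the table is shown to a reader
as_text = flags.apply(lambda col: col.map({True: 'yes', False: 'no'}))

print(flushed_rows)
print(counts_per_id)
print(as_text)
```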