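I have a pandas DataFrame with a column named "Notes". It has entries like the sample below. I want to create dummy variable columns based on a list:
Lst = ['loan', 'Borrower', 'debts']
For each entry in the list, I want a binary flag indicating whether the string in the "Notes" column contains that entry. Can anyone suggest how to do this?
Data: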
print(data_df[['Id','Notes']][:10])
      Id  Notes
59    60  568549 added on 11/04/09 > I use my current l...
76    77  I would like to use this loan to consolidate c...
88    89  Borrower added on 06/28/10 > I would really ...
229  230  I just got married and ran up some debt during...
Desired output:
      Id  Notes                                              loan  Borrower  debts
59    60  568549 added on 11/04/09 > I use my current l...      0         0      0
76    77  I would like to use this loan to consolidate c...     1         0      0
88    89  Borrower added on 06/28/10 > I would really ...       0         1      0
229  230  I just got married and ran up some debt during...     0         0      1
Answer 0 (score: 1)
Use str.findall first, then str.get_dummies:
df.Note.str.findall('|'.join(Lst)).str[0].str.get_dummies()
Out[639]:
   Borrower  debts  loan
0         0      0     1
1         1      0     0
2         0      1     0
yourdf=pd.concat([df,df.Note.str.findall('|'.join(Lst)).str[0].str.get_dummies()],axis=1)
yourdf
Out[640]:
            Note  Borrower  debts  loan
0       loan lll         0      0     1
1  llll Borrower         1      0     0
2    ......debts         0      1     0
Sample data used above:
df = pd.DataFrame({'Note': ['loan lll', 'llll Borrower', '......debts']})
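One thing to watch with the snippet above: .str[0] keeps only the first match in each row, so a note that mentions both loan and Borrower would get a single flag. A minimal sketch of a variant that keeps every match by joining them with '|' before calling get_dummies. The sample rows, the re.escape call, and the reindex step are my additions for illustration, not part of the original answer:

```python
import re

import pandas as pd

Lst = ['loan', 'Borrower', 'debts']

# Hypothetical sample data: row 1 contains two keywords, row 3 contains none
df = pd.DataFrame({'Note': ['loan lll', 'llll Borrower loan', '......debts', 'nothing here']})

# Escape each keyword so regex metacharacters are matched literally
pattern = '|'.join(re.escape(word) for word in Lst)

# findall returns a list of matches per row; joining with '|' lets
# get_dummies (default separator '|') emit one column per distinct match
flags = df['Note'].str.findall(pattern).str.join('|').str.get_dummies()

# Guarantee one column per keyword, in list order, even if a keyword never occurs
flags = flags.reindex(columns=Lst, fill_value=0)

yourdf = pd.concat([df, flags], axis=1)
print(yourdf)
```

Rows with no match end up with zeros in every flag column, since get_dummies ignores the empty string.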
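Answer 1 (score: 0)
To transform data with a function, create a new column and assign it the result of calling apply with a lambda expression on an existing column. Like this: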
<dataframe>['new column name'] = <dataframe>['some existing column name'].apply(<some function>)
More specifically:
# apply passes each note to the lambda as a plain str, so use `in` for the substring test
data_df['loan'] = data_df.Notes.apply(lambda x: 1 if 'loan' in x else 0)
data_df['Borrower'] = data_df.Notes.apply(lambda x: 1 if 'Borrower' in x else 0)
data_df['debt'] = data_df.Notes.apply(lambda x: 1 if 'debt' in x else 0)
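If the logic needs more than one line, you would probably define a named function instead of a lambda, but this gets the idea across.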
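Since the question starts from a list, the per-keyword lines can also be written as a loop. A rough sketch, assuming the vectorized Series.str.contains is acceptable in place of apply. The data_df below is a made-up stand-in for the asker's frame:

```python
import pandas as pd

Lst = ['loan', 'Borrower', 'debts']

# Hypothetical stand-in for the asker's data_df
data_df = pd.DataFrame({
    'Id': [60, 77, 89],
    'Notes': [
        'I use my current card',
        'I would like to use this loan',
        'Borrower added on 06/28/10',
    ],
})

for word in Lst:
    # regex=False treats the keyword as a literal substring;
    # na=False makes missing notes count as "not found"
    data_df[word] = data_df['Notes'].str.contains(word, regex=False, na=False).astype(int)

print(data_df)
```

The match is case-sensitive, so 'Borrower' does not match 'borrower'. Passing case=False to str.contains would relax that if needed.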