Create dummy variables from a list

Time: 2019-05-13 03:26:23

Tags: python-3.x pandas

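I have a pandas DataFrame with a column named "Notes". Its entries look like the sample below. I want to create dummy variable columns based on a list:

Lst=['loan','Borrower','debts']

For each entry in the list, I want a binary flag that is 1 when the string in the "Notes" column contains that entry. Can anyone suggest how to do this?

Data: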

print(data_df[['Id','Notes']][:10])

     Id                                              Notes
59    60   568549 added on 11/04/09 > I use my current l...     
76    77  I would like to use this loan to consolidate c...
88    89    Borrower added on 06/28/10 > I would really ...
229  230  I just got married and ran up some debt during...

Output:

      Id                                              Notes  loan  Borrower  debts
59    60   568549 added on 11/04/09 > I use my current l...     0         0      0
76    77  I would like to use this loan to consolidate c...     1         0      0
88    89    Borrower added on 06/28/10 > I would really ...     0         1      0
229  230  I just got married and ran up some debt during...     0         0      1

2 answers:

Answer 0 (score: 1)

Use str.findall first, then str.get_dummies

df.Note.str.findall('|'.join(Lst)).str[0].str.get_dummies()
Out[639]: 
   Borrower  debts  loan
0         0      0     1
1         1      0     0
2         0      1     0
yourdf=pd.concat([df,df.Note.str.findall('|'.join(Lst)).str[0].str.get_dummies()],axis=1)
yourdf
Out[640]: 
            Note  Borrower  debts  loan
0       loan lll         0      0     1
1  llll Borrower         1      0     0
2    ......debts         0      1     0

Sample data used above:

df=pd.DataFrame({'Note':['loan lll','llll Borrower','......debts']})
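
Note that `.str[0]` keeps only the first match in each row, so a note containing two of the keywords would get a single flag, and a keyword that never occurs gets no column at all. A minimal sketch of one way around both, assuming the matching should stay case-sensitive; `keyword_flags` and the extra sample rows are hypothetical, not part of the original answer:

```python
import re
import pandas as pd

Lst = ['loan', 'Borrower', 'debts']

def keyword_flags(notes, keywords):
    """Return one 0/1 column per keyword (hypothetical helper, case-sensitive)."""
    # Escape each keyword so it is matched literally inside the regex.
    pattern = '|'.join(re.escape(k) for k in keywords)
    found = notes.str.findall(pattern)  # list of every match in each row
    # Join the matches back into one 'a|b' string and let get_dummies split it.
    dummies = found.str.join('|').str.get_dummies(sep='|')
    # Reindex so every keyword has a column, in list order, even if it never occurs.
    return dummies.reindex(columns=keywords, fill_value=0)

# Made-up rows for illustration only.
df = pd.DataFrame({'Note': ['loan lll',
                            'llll Borrower and debts',
                            'nothing here',
                            'loan debts loan']})
flags = keyword_flags(df['Note'], Lst)
yourdf = pd.concat([df, flags], axis=1)
print(yourdf)
```

One caveat with this approach: if one keyword is a prefix of another (say 'debt' and 'debts'), the regex alternation reports only whichever alternative comes first in the pattern.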

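Answer 1 (score: 0)

To transform data with a function, create a new column and assign it the result of calling the apply method with a lambda expression. Like this: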

<dataframe>['new column name'] = <dataframe>['some existing column name'].apply(<some function>)

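More specifically:

# Inside apply, x is a plain Python string, so use `in` (there is no x.str).
# 'debt' also matches notes that contain 'debts'.
data_df['loan'] = data_df.Notes.apply(lambda x: 1 if 'loan' in x else 0)
data_df['Borrower'] = data_df.Notes.apply(lambda x: 1 if 'Borrower' in x else 0)
data_df['debt'] = data_df.Notes.apply(lambda x: 1 if 'debt' in x else 0)

If the logic took several lines you would probably define a separate function, but this gets the idea across.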
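
The separate-function variant could look like the sketch below, which loops over the list instead of repeating one line per keyword. It swaps apply for the vectorized `Series.str.contains`, which also copes with missing notes; `add_keyword_flags` and the sample rows are hypothetical and only serve the illustration:

```python
import pandas as pd

def add_keyword_flags(frame, column, keywords):
    """Hypothetical helper: return a copy of frame with one 0/1 column per keyword."""
    out = frame.copy()
    for kw in keywords:
        # regex=False: plain substring test; na=False: missing notes are flagged 0.
        out[kw] = out[column].str.contains(kw, regex=False, na=False).astype(int)
    return out

# Made-up rows, not the asker's data.
data_df = pd.DataFrame({
    'Id': [1, 2, 3, 4],
    'Notes': ['I need this loan', 'Borrower added on ...', None, 'loan to pay debts'],
})
result = add_keyword_flags(data_df, 'Notes', ['loan', 'Borrower', 'debts'])
print(result)
```

Because the helper works on a copy, the original `data_df` is left unchanged; assign the result back if the flags should live on the same frame.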