If-statement on a pandas DataFrame based on strings

Date: 2016-02-10 20:41:09

Tags: python string if-statement pandas

This should be simple. The task is to check whether the string in one column contains all of the words stored in another string, and then do something based on the result. Here is a simple example:

import pandas as pd

df = pd.DataFrame({'Strings': ["The brown", "fox smoked 6", "cigarettes per day", "in his cave"],
                   'Set': ["Alpha", "Beta", "Gamma", "Delta"]})

>>> df
     Set             Strings
0  Alpha           The brown
1   Beta        fox smoked 6
2  Gamma  cigarettes per day
3  Delta         in his cave

Now I want to check every row of df["Strings"] to see whether it contains both the word "smoked" and the number "6" (here only the row at index 1, "fox smoked 6", matches). If it does, the new column df["Result"] should equal df["Set"] with the words "health damaging" appended. If it does not, df["Result"] should simply copy the value in df["Set"]. In this example, Result would be "Beta health damaging" for the row at index 1 and the unchanged Set value for every other row.

1 answer:

Answer 0: (score: 2)

You can build a mask from the two conditions and pass it to np.where:

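In [20]:

import numpy as np

# True only where the string contains both '6' and 'smoked'
mask = (df['Strings'].str.contains('6')) & (df['Strings'].str.contains('smoked'))

In [23]:

# Append the text where the mask is True, otherwise copy 'Set' unchanged
df['Result'] = np.where(mask, df['Set'] + ' health damaging', df['Set'])
df

Out[23]: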
     Set             Strings                Result
0  Alpha           The brown                 Alpha
1   Beta        fox smoked 6  Beta health damaging
2  Gamma  cigarettes per day                 Gamma
3  Delta         in his cave                 Delta

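The mask uses .str.contains to test whether each substring is present in your strings, and the two conditions are combined with & to form the final mask.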
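
The same idea can be extended to the more general task described in the question, where the required words come from another string. The following is only a minimal sketch, assuming the words are separated by whitespace and should be matched as literal substrings; the names `required` and `contains_all` are illustrative and do not appear in the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Strings': ["The brown", "fox smoked 6", "cigarettes per day", "in his cave"],
                   'Set': ["Alpha", "Beta", "Gamma", "Delta"]})

# Hypothetical string holding the words that must all be present.
required = "smoked 6"


def contains_all(series, words):
    """Return a boolean Series that is True where every word occurs as a literal substring."""
    mask = pd.Series(True, index=series.index)
    for word in words:
        # regex=False matches the word literally instead of treating it as a pattern.
        mask = mask & series.str.contains(word, regex=False)
    return mask


mask = contains_all(df['Strings'], required.split())
df['Result'] = np.where(mask, df['Set'] + ' health damaging', df['Set'])
print(df)
```

If the column may contain missing values, passing na=False to .str.contains keeps the mask strictly boolean.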