Assign a value in a new column if a condition is met

Time: 2020-07-22 22:44:35

Tags: python pandas

I have this dataset:

Date                   ID        Tweet                         Note         
01/20/2020           4141    The cat is on the table          I bought a table      
01/20/2020           4142    The sky is blue                  Once upon a time 
01/20/2020           53      What a wonderful day             I have no words    

I want to select the rows where Tweet or Note contains one of the following words:

w=["sky", "table"]

To do this, I am using the following:

    def part_is_in(x, values):
        output = False
        for val in values:
            if val in str(x):
                return True
                break                
        return output


def fun_1(filename):
    w=["sky", "table"]
    filename['Logic'] = filename[['Tweet','Note']].apply(part_is_in, values=w)
    filename['Low_Tweet']=filename['Tweet']
    filename['Low_ Note']=filename['Note']
    lower_cols = [col for col in filename if col not in ['Tweet','Note']]
    filename[lower_cols]= filename[lower_cols].apply(lambda x: x.astype(str).str.lower(),axis=1)

# NEW COLUMN
    
    filename['Logic'] = pd.Series(index = filename.index, dtype='object')
    filename['TF'] = pd.Series(index = filename.index, dtype='object')
    
    for index, row in filename.iterrows():
            value = row['ID']

            if any(x in str(value) for x in w):
                filename.at[index,'Logic'] = True
            else:
                filename.at[index,'Logic'] = False
                filename.at[index,'TF'] = False

    for index, row in filename.iterrows():
            value = row['Tweet']

            if any(x in str(value) for x in w):
                filename.at[index,'Logic'] = True
            else:
                filename.at[index,'Logic'] = False
                filename.at[index,'TF'] = False
    
    for index, row in filename.iterrows():
            value = row['Note']

            if any(x in str(value) for x in w):
                filename.at[index,'Logic'] = True
            else:
                filename.at[index,'Logic'] = False
                filename.at[index,'TF'] = False
               
    return(filename)

What it should do is find the rows that contain at least one word from the list (w) and assign a value:

  • If the row's Tweet or Note contains one of the words, assign True; otherwise assign False.

My expected output is:

Date                   ID        Tweet                         Note               Logic     TF
01/20/2020           4141    The cat is on the table          I bought a table     True     False 
01/20/2020           4142    The sky is blue                  Once upon a time     True     False
01/20/2020           53      What a wonderful day             I have no words      False    False

Checking manually, I found that some rows are assigned the wrong value. What is wrong with my code?

2 answers:

Answer 0: (score: 0)

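I am also new to pandas, so take this answer with a grain of salt. The impression I get from the tutorials is that if you are iterating over a DataFrame, you are not using pandas as intended.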

With that in mind, I would point you to this:

df['Logic'] = df['Tweet'].str.contains('table')
df['Logic'] |= df['Tweet'].str.contains('sky')

This yields:

      Date    ID                    Tweet              Note  Logic 
0  1/20/20  4141  The cat is on the table  I bought a table   True 
1  1/20/20  4142          The sky is blue  Once upon a time   True 
2  1/20/20    53     What a wonderful day   I have no words  False 
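The snippet above only looks at Tweet. In the question's code, each of the three loops writes to Logic, so the last loop (over Note) overwrites whatever the earlier loops found. A minimal sketch of the same idea extended to both columns, assuming the sample data from the question rebuilt by hand, could accumulate matches with a logical OR instead:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["01/20/2020", "01/20/2020", "01/20/2020"],
    "ID": [4141, 4142, 53],
    "Tweet": ["The cat is on the table", "The sky is blue", "What a wonderful day"],
    "Note": ["I bought a table", "Once upon a time", "I have no words"],
})
w = ["sky", "table"]

# Start from all-False and OR in every (column, word) check,
# so a later check can never erase an earlier True.
df["Logic"] = False
for col in ["Tweet", "Note"]:
    for word in w:
        # regex=False treats the word as a plain substring.
        df["Logic"] = df["Logic"] | df[col].str.contains(word, regex=False)

print(df[["ID", "Logic"]])
```

With this data, the row with ID 4142 stays True even though its Note contains neither word, which is the case the original loops get wrong.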

Answer 1: (score: 0)

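As I understand it, if a keyword appears in those specific columns, Logic is True; otherwise both TF and Logic are False. I do not see when TF should be False while Logic is True, so I am not sure whether this helps, but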

pattern = '|'.join(w) 
df['Logic'] = df.Tweet.str.contains(pattern) | df.Note.str.contains(pattern)

This code lets you avoid apply.
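A slightly more defensive variant of the same pattern approach is sketched below. It assumes that matching should ignore case (the question builds lowercase copies of the columns, which hints at this) and that missing values should count as "no match"; neither is confirmed by the question. The fourth row is hypothetical and only there to exercise those two cases.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Date": ["01/20/2020", "01/20/2020", "01/20/2020", "01/21/2020"],
    "ID": [4141, 4142, 53, 99],
    "Tweet": ["The cat is on the table", "The sky is blue",
              "What a wonderful day", "Blue SKY"],   # last row is hypothetical
    "Note": ["I bought a table", "Once upon a time",
             "I have no words", None],               # missing Note, hypothetical
})
w = ["sky", "table"]

# Escape each word so characters such as "." or "+" are matched literally.
pattern = "|".join(re.escape(word) for word in w)

# Check every column of interest; na=False turns missing values into False.
cols = ["Tweet", "Note"]
matches = df[cols].apply(lambda s: s.str.contains(pattern, case=False, na=False))

# A row is flagged when any of the checked columns matched.
df["Logic"] = matches.any(axis=1)

print(df[["ID", "Logic"]])
```

Keeping the column names in a list means a further text column can be checked by adding its name to cols, without writing another loop.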