Question

I have this dataset:

Date                   ID        Tweet                         Note         
01/20/2020           4141    The cat is on the table          I bought a table      
01/20/2020           4142    The sky is blue                  Once upon a time 
01/20/2020           53      What a wonderful day             I have no words

I want to select the rows whose Tweet or Note contains one of the following words:

w=["sky", "table"]

To do this, I am using the following:

def part_is_in(x, values):
    output = False
    for val in values:
        if val in str(x):
            return True
            break                
    return output


def fun_1(filename):
    w=["sky", "table"]
    filename['Logic'] = filename[['Tweet','Note']].apply(part_is_in, values=w)
    filename['Low_Tweet']=filename['Tweet']
    filename['Low_ Note']=filename['Note']
    lower_cols = [col for col in filename if col not in ['Tweet','Note']]
    filename[lower_cols]= filename[lower_cols].apply(lambda x: x.astype(str).str.lower(),axis=1)

# NEW COLUMN
    
    filename['Logic'] = pd.Series(index = filename.index, dtype='object')
    filename['TF'] = pd.Series(index = filename.index, dtype='object')
    
    for index, row in filename.iterrows():
            value = row['ID']

            if any(x in str(value) for x in w):
                filename.at[index,'Logic'] = True
            else:
                filename.at[index,'Logic'] = False
                filename.at[index,'TF'] = False

    for index, row in filename.iterrows():
            value = row['Tweet']

            if any(x in str(value) for x in w):
                filename.at[index,'Logic'] = True
            else:
                filename.at[index,'Logic'] = False
                filename.at[index,'TF'] = False
    
    for index, row in filename.iterrows():
            value = row['Note']

            if any(x in str(value) for x in w):
                filename.at[index,'Logic'] = True
            else:
                filename.at[index,'Logic'] = False
                filename.at[index,'TF'] = False
               
    return(filename)

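What it should do is find the rows that contain at least one word from the list (w) and assign a value:

If the row's Tweet or Note contains one of the words, assign True; otherwise assign False.

My expected output is: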

Date                   ID        Tweet                         Note               Logic     TF
01/20/2020           4141    The cat is on the table          I bought a table     True     False 
01/20/2020           4142    The sky is blue                  Once upon a time     True     False
01/20/2020           53      What a wonderful day             I have no words      False    False

Checking manually, I found that some rows are assigned incorrectly. What is wrong with my code?

Answer 1

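I'm also new to pandas, so take this answer with a grain of salt. The impression I get from the tutorials is that if you are iterating over a DataFrame, you are not using pandas as intended.

To that end, I'd point to this: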

df['Logic'] = df['Tweet'].str.contains('table')
df['Logic'] |= df['Tweet'].str.contains('sky')

Which yields:

      Date    ID                    Tweet              Note  Logic 
0  1/20/20  4141  The cat is on the table  I bought a table   True 
1  1/20/20  4142          The sky is blue  Once upon a time   True 
2  1/20/20    53     What a wonderful day   I have no words  False
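
The two lines above check only the Tweet column and hard-code the words. A minimal sketch of the same idea extended to both columns and the list w, assuming the sample data from the question (the `regex=False` and `na=False` arguments are my additions, so that each word is matched as a literal substring and missing text counts as no match):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["01/20/2020"] * 3,
    "ID": [4141, 4142, 53],
    "Tweet": ["The cat is on the table", "The sky is blue", "What a wonderful day"],
    "Note": ["I bought a table", "Once upon a time", "I have no words"],
})
w = ["sky", "table"]

# Start with all False, then OR in one vectorized check per (column, word) pair.
df["Logic"] = False
for col in ["Tweet", "Note"]:
    for word in w:
        # Literal substring match; NaN cells are treated as "no match".
        df["Logic"] |= df[col].str.contains(word, regex=False, na=False)

print(df[["ID", "Tweet", "Note", "Logic"]])
```

The only Python-level loop here runs over the column names and the words, not over the rows, so the per-row work stays inside pandas.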

Answer 2

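As I understand it, Logic should be True if a keyword appears in those specific columns; otherwise both TF and Logic are False. I can't tell when TF would be False while Logic is True, so I'm not sure this helps, but: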

pattern = '|'.join(w) 
df['Logic'] = df.Tweet.str.contains(pattern) | df.Note.str.contains(pattern)

This code lets you avoid apply.
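
For reference, here is a self-contained sketch of this approach on the question's sample data. The `re.escape` call and `na=False` are assumptions on my part, not part of the snippet above: the first guards against words that contain regex metacharacters, the second treats missing values as non-matches. The sketch also shows one likely reason the question's code misassigns rows: its three loops each overwrite Logic, so the final value reflects only the last column checked (Note).

```python
import re
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["01/20/2020"] * 3,
    "ID": [4141, 4142, 53],
    "Tweet": ["The cat is on the table", "The sky is blue", "What a wonderful day"],
    "Note": ["I bought a table", "Once upon a time", "I have no words"],
})
w = ["sky", "table"]

# Build one alternation pattern, e.g. "sky|table", escaping each word.
pattern = "|".join(re.escape(word) for word in w)

# One vectorized check per column; NaN cells count as "no match".
tweet_hit = df["Tweet"].str.contains(pattern, na=False)
note_hit = df["Note"].str.contains(pattern, na=False)

# True when either column contains one of the words.
df["Logic"] = tweet_hit | note_hit

# What the question's loops effectively keep: only the Note check,
# because the later loops overwrite the results of the earlier ones.
df["Note_only"] = note_hit

print(df[["ID", "Logic", "Note_only"]])
```

Row 4142 is where the two columns differ: "sky" appears in its Tweet but not in its Note, so a check that only keeps the Note result marks it False.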
