I have this dataset:
Date        ID    Tweet                    Note
01/20/2020  4141  The cat is on the table  I bought a table
01/20/2020  4142  The sky is blue          Once upon a time
01/20/2020  53    What a wonderful day     I have no words
I want to select the rows where Tweet or Note contains one of the following words:
w=["sky", "table"]
To do this, I am using the following:
def part_is_in(x, values):
    output = False
    for val in values:
        if val in str(x):
            return True
            break
    return output
def fun_1(filename):
    w=["sky", "table"]
    filename['Logic'] = filename[['Tweet','Note']].apply(part_is_in, values=w)
    filename['Low_Tweet']=filename['Tweet']
    filename['Low_ Note']=filename['Note']
    lower_cols = [col for col in filename if col not in ['Tweet','Note']]
    filename[lower_cols]= filename[lower_cols].apply(lambda x: x.astype(str).str.lower(),axis=1)
    # NEW COLUMN
    filename['Logic'] = pd.Series(index = filename.index, dtype='object')
    filename['TF'] = pd.Series(index = filename.index, dtype='object')
    for index, row in filename.iterrows():
        value = row['ID']
        if any(x in str(value) for x in w):
            filename.at[index,'Logic'] = True
        else:
            filename.at[index,'Logic'] = False
        filename.at[index,'TF'] = False
    for index, row in filename.iterrows():
        value = row['Tweet']
        if any(x in str(value) for x in w):
            filename.at[index,'Logic'] = True
        else:
            filename.at[index,'Logic'] = False
        filename.at[index,'TF'] = False
    for index, row in filename.iterrows():
        value = row['Note']
        if any(x in str(value) for x in w):
            filename.at[index,'Logic'] = True
        else:
            filename.at[index,'Logic'] = False
        filename.at[index,'TF'] = False
    return(filename)
What it should do is find the rows that contain at least one word from the list (w) and assign a value accordingly.
My expected output is:
Date        ID    Tweet                    Note              Logic  TF
01/20/2020  4141  The cat is on the table  I bought a table  True   False
01/20/2020  4142  The sky is blue          Once upon a time  True   False
01/20/2020  53    What a wonderful day     I have no words   False  False
Checking manually, I found that some rows are not flagged correctly. What is wrong with my code?
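Answer 0 (score: 0)
I'm also new to pandas, so take this answer with a grain of salt. The impression I got from the tutorials is that if you are iterating over a DataFrame row by row, you are not using pandas as intended.
With that in mind, I would point to this: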
df['Logic'] = df['Tweet'].str.contains('table')
df['Logic'] |= df['Tweet'].str.contains('sky')
Which yields:
   Date     ID    Tweet                    Note              Logic
0  1/20/20  4141  The cat is on the table  I bought a table  True
1  1/20/20  4142  The sky is blue          Once upon a time  True
2  1/20/20  53    What a wonderful day     I have no words   False
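The two lines above only look at Tweet and hard-code the words. The same idea can be extended to both columns and to every word in w. This is a minimal sketch, not a definitive implementation: the DataFrame is rebuilt by hand from the question's sample, and the helper name flag_rows is hypothetical.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (an assumption about its dtypes).
df = pd.DataFrame({
    "Date": ["01/20/2020"] * 3,
    "ID": [4141, 4142, 53],
    "Tweet": ["The cat is on the table", "The sky is blue", "What a wonderful day"],
    "Note": ["I bought a table", "Once upon a time", "I have no words"],
})

def flag_rows(frame, words, columns=("Tweet", "Note")):
    """Return a boolean Series: True where any listed column contains any word."""
    mask = pd.Series(False, index=frame.index)
    for col in columns:
        for word in words:
            # regex=False treats the word as a literal substring.
            mask |= frame[col].astype(str).str.contains(word, regex=False)
    return mask

df["Logic"] = flag_rows(df, ["sky", "table"])
```

The mask is accumulated with |=, so a match in one column is never overwritten by a miss in another.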
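Answer 1 (score: 0)
As I understand it, if a keyword appears in those specific columns, then Logic is True; otherwise TF is False and Logic is False. I don't know when TF would be False while Logic is True, so I'm not sure whether this helps, but: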
pattern = '|'.join(w)
df['Logic'] = df.Tweet.str.contains(pattern) | df.Note.str.contains(pattern)
This code lets you avoid apply.
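As for what goes wrong in the question's code: if the three loops run one after another, as the listing suggests, each one overwrites Logic, so only the last check (on Note) survives, and the first loop tests ID rather than a text column. The sketch below is one possible replacement for fun_1 built on the pattern approach. It assumes case-insensitive matching is wanted and that missing values should count as no match; both are assumptions, and the name add_logic is hypothetical.

```python
import re
import pandas as pd

def add_logic(df, words, columns=("Tweet", "Note")):
    """Return a copy of df with Logic (any keyword found) and TF columns added."""
    # Escape each word so characters such as "." or "+" are matched literally.
    pattern = "|".join(re.escape(word) for word in words)
    out = df.copy()
    mask = pd.Series(False, index=out.index)
    for col in columns:
        # case=False ignores letter case; na=False treats missing text as no match.
        mask |= out[col].str.contains(pattern, case=False, na=False)
    out["Logic"] = mask
    # The expected output in the question shows TF as False on every row.
    out["TF"] = False
    return out

df = pd.DataFrame({
    "ID": [4141, 4142, 53],
    "Tweet": ["The cat is on the table", "The SKY is blue", "What a wonderful day"],
    "Note": ["I bought a table", None, "I have no words"],
})
result = add_logic(df, ["sky", "table"])
```

Because the per-column results are combined with |= instead of being assigned in turn, a hit in Tweet is kept even when Note has no match.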