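Suppose I have the DataFrame below. I want to remove toy from the Topic column, and if toy is the only topic in a row, drop that row entirely. How can this be done in pandas?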
+---+-----------------------------------+-------------------------+
| | Comment | Topic |
+---+-----------------------------------+-------------------------+
| 1 | ----- | toy, bottle, vegetable |
| 2 | ----- | fruit, toy, electronics |
| 3 | ----- | toy |
| 4 | ----- | electronics, fruit |
| 5 | ----- | toy, electronic |
+---+-----------------------------------+-------------------------+
Answer 0 (score: 1)
Try using str.replace together with str.rstrip and ne inside {{1}}:
[...]
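The code for this answer is elided in the source, so the following is only a minimal sketch of how str.replace, str.rstrip and ne could be combined, not necessarily the answerer's exact code. It assumes the column is named Topic as in the question; the regular expression and the names cleaned and result are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    "Comment": ["-----"] * 5,
    "Topic": [
        "toy, bottle, vegetable",
        "fruit, toy, electronics",
        "toy",
        "electronics, fruit",
        "toy, electronic",
    ],
})

cleaned = (
    df["Topic"]
    # Remove the whole word "toy" plus an optional trailing comma and spaces.
    .str.replace(r"\btoy\b,?\s*", "", regex=True)
    # Tidy up a dangling ", " left behind when "toy" was the last topic.
    .str.rstrip(", ")
)

# Keep only the rows whose cleaned Topic is not the empty string.
result = df.assign(Topic=cleaned).loc[cleaned.ne("")]
print(result)
```

The word boundaries keep values such as toys intact, but a hyphenated topic like toy-box would still lose its toy prefix, so the pattern may need tightening for messier data.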
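Answer 1 (score: 0)
A lambda function can come in handy here:
# Blank out rows whose only topic is 'toy'; every other row is left unchanged
df['Topic'] = df['Topic'].apply(lambda x: "" if len(x.split(',')) == 1 and x.split(',')[0].strip() == 'toy' else x)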
Answer 2 (score: 0)
import pandas as pd

# create dummy test data
data = {'comment': [1, 2, 3, 4, 5],
        'Topic': ['toy, bottle, vegetable', 'fruit, toy, electronics', 'toy', 'electronics, fruit', 'toy, electronic']}
# create the dataframe
df = pd.DataFrame(data)

# function to tidy one Topic value and remove 'toy'
def remove_toy(row):
    # split on commas and trim whitespace around each topic
    row = [i.strip() for i in row.split(',')]
    # keep every topic except 'toy'
    row = [i for i in row if i != 'toy']
    return ', '.join(row)

# apply the function to the Topic series
df['Topic'] = df['Topic'].apply(remove_toy)
# remove rows whose Topic is now empty
df = df[df['Topic'] != '']
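As an alternative to the apply-based function above, the same split, filter and join steps can be written with pandas' own string and reshaping methods. This is a sketch under the assumption that pandas 0.25 or later is available (for Series.explode); the names topics, kept and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "comment": [1, 2, 3, 4, 5],
    "Topic": [
        "toy, bottle, vegetable",
        "fruit, toy, electronics",
        "toy",
        "electronics, fruit",
        "toy, electronic",
    ],
})

# One row per individual topic, keeping the original row label as the index.
topics = df["Topic"].str.split(",").explode().str.strip()

# Drop every "toy" entry, then join what is left back together per original row.
# Rows that contained only "toy" have nothing left and vanish from the result.
kept = topics[topics.ne("toy")].groupby(level=0).agg(", ".join)

# Select the surviving rows and overwrite Topic with the cleaned strings.
result = df.loc[kept.index].assign(Topic=kept)
print(result)
```

Because the toy-only row disappears during the groupby step, no separate filter on empty strings is needed.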