I have a DataFrame from which I want to drop some rows that contain certain text.
Date Campaign
3/24/20 GA Shoes Search Campaign
3/24/20 GA Shoes Display Campaign
3/24/20 GA Bag Search Campaign
3/24/20 GA Bag Display Campaign
3/24/20 IG Shoes Campaign
3/24/20 IG Bag Campaign
3/24/20 FB Shoes Campaign
3/24/20 FB Bag Campaign
3/24/20 Email Campaign
I want to drop all the other rows and keep only the ones that contain GA. This is the result I want:
Date Campaign
3/24/20 GA Shoes Search Campaign
3/24/20 GA Shoes Display Campaign
3/24/20 GA Bag Search Campaign
3/24/20 GA Bag Display Campaign
I tried this:
mask = df['Campaign'].str.contains('FB')
idx = df.index[mask]
new = df.drop(idx,axis=0)
However, it only works when I pass one string at a time. To save time I tried the following, but it did not work:
mask = df['Campaign'].str.contains('FB', 'Email', 'IG')
idx = df.index[mask]
new = df.drop(idx,axis=0)
Answer 0 (score: 4)
Instead of dropping the rows that contain the other strings, you can apply a function that keeps only the rows containing "GA":
new = df[df['Campaign'].apply(lambda x: 'GA' in x)]
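The attempt in the question does not work because the second and third parameters of str.contains are case and flags, not additional patterns. The following is a minimal sketch, assuming the sample data from the question is rebuilt by hand as shown. It keeps the GA rows with a vectorized str.contains call, and also shows how several strings can be excluded at once by joining them with | into a single regular expression.

```python
import pandas as pd

# Sample data rebuilt from the question (how df was originally created is an assumption).
df = pd.DataFrame({
    "Date": ["3/24/20"] * 9,
    "Campaign": [
        "GA Shoes Search Campaign",
        "GA Shoes Display Campaign",
        "GA Bag Search Campaign",
        "GA Bag Display Campaign",
        "IG Shoes Campaign",
        "IG Bag Campaign",
        "FB Shoes Campaign",
        "FB Bag Campaign",
        "Email Campaign",
    ],
})

# Option 1: keep the rows that contain "GA" (vectorized counterpart of the apply call).
# regex=False treats "GA" as a literal substring.
keep_ga = df[df["Campaign"].str.contains("GA", regex=False)]

# Option 2: drop the rows that match any of several strings.
# "|" means "or" in a regular expression, and "~" inverts the boolean mask.
unwanted = "FB|Email|IG"
drop_others = df[~df["Campaign"].str.contains(unwanted)]

print(keep_ga)
```

On this sample both options select the same four rows. Matching is case-sensitive by default, so the lowercase "ig" inside "Campaign" does not count as a match for "IG".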
Answer 1 (score: 3)
The assumption here is that GA is at the start of the string in every relevant row. The pandas str.startswith method can help here:
df.loc[df.Campaign.str.startswith("GA")]
Date Campaign
0 3/24/20 GA Shoes Search Campaign
1 3/24/20 GA Shoes Display Campaign
2 3/24/20 GA Bag Search Campaign
3 3/24/20 GA Bag Display Campaign
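However, if GA may be embedded in the string rather than at the start, it would help if you provided data like that. That way it can be determined whether GA sits inside a word or stands on its own, and hopefully a suitable solution can be found.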
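To illustrate the distinction between GA at the start, GA inside another word, and GA as a standalone word, here is a small sketch. The campaign names are hypothetical and are not from the question; the whole-word check uses the regular-expression word boundary \b, which is one possible approach rather than the only one.

```python
import pandas as pd

# Hypothetical campaign names (not from the question), chosen to show the difference.
campaigns = pd.Series([
    "GA Shoes Search Campaign",  # GA at the start
    "Spring GA Bag Campaign",    # GA as its own word, mid-string
    "MEGA Sale Campaign",        # GA inside another word
    "FB Bag Campaign",           # no GA at all
])

# True only when the string begins with "GA".
starts_with = campaigns.str.startswith("GA")

# True when "GA" appears anywhere, including inside "MEGA".
anywhere = campaigns.str.contains("GA", regex=False)

# True only when "GA" appears as a whole word; \b is a regex word boundary.
whole_word = campaigns.str.contains(r"\bGA\b")

print(pd.DataFrame({
    "campaign": campaigns,
    "starts_with": starts_with,
    "anywhere": anywhere,
    "whole_word": whole_word,
}))
```

Which of the three masks is appropriate depends on what the real campaign names look like, which is why sample data with embedded GA would help.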
Answer 2 (score: 1)
If you have a DataFrame:
import pandas as pd

df = pd.DataFrame({'x': ['A0', 'A1', 'B2', 'A3'],
                   'y': ['B0', 'B1', 'B2', 'B3'],
                   'z': ['A0', 'C1', 'C2', 'C3'],
                   'w': ['D0', 'D1', 'D2', 'D3']},
                  index=[0, 1, 2, 3])
It looks like this:
    x   y   z   w
0  A0  B0  A0  D0
1  A1  B1  C1  D1
2  B2  B2  C2  D2
3  A3  B3  C3  D3
Suppose you want to keep the rows whose x column contains A.
Using str.contains, you can do:
df[df['x'].str.contains('A')]
Alternatively, a list comprehension would suffice:
df[['A' in each for each in df['x']]]
Using apply(): if you prefer apply(), you can do:
df[df['x'].apply(lambda x: 'A' in x)]
All of these methods give you:
    x   y   z   w
0  A0  B0  A0  D0
1  A1  B1  C1  D1
3  A3  B3  C3  D3
Final notes
In general:
str.contains method:
df[df[name_of_column_which_should_contain_something].str.contains(what_should_it_contain)]
List comprehension:
df[[what_to_search_for in each for each in df[whichcolumn]]]
apply() method:
df[df[which_column_to_search_in].apply(lambda x: what_to_search_for in x)]
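The general str.contains pattern can be wrapped in a small function. This is only a sketch: rows_containing is a hypothetical helper name, and the sample frame with a missing value is invented for illustration. It passes regex=False so the search text is matched literally, and na=False so missing values count as non-matches.

```python
import pandas as pd


def rows_containing(frame, column, text):
    """Hypothetical helper: return the rows of `frame` whose `column` contains `text`."""
    # regex=False matches `text` literally; na=False counts missing values as non-matches.
    mask = frame[column].str.contains(text, regex=False, na=False)
    return frame[mask]


# Invented sample data with one missing value in column x.
df = pd.DataFrame({"x": ["A0", "A1", "B2", None, "A3"],
                   "y": ["B0", "B1", "B2", "B3", "B4"]})

result = rows_containing(df, "x", "A")
print(result)
```

The list comprehension and apply() variants evaluate 'A' in value on each element, which raises a TypeError on a missing value such as NaN, so the str.contains form with na=False may be the more forgiving choice when a column has gaps.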