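This probably has a simple answer, so apologies in advance (I have minimal coding experience).
I am trying to drop any row that contains a specific string ('Economy 7') in ANY column, and have been trying to work from this thread: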
How to drop rows from pandas data frame that contains a particular string in a particular column?
I couldn't get that to work, but I tried the code below on a previous DataFrame (now df = energy), where it seemed to work, although it now raises an error:
no_eco = energy[~energy.apply(lambda series: series.str.contains('Economy 7')).any(axis=1)]
AttributeError: ('Can only use .str accessor with string values, which use np.object_ dtype in pandas', 'occurred at index existingProductCodeGas')
Any suggestions? P.S. The DataFrame is quite large.
Thanks
Answer 0: (score: 1)
You can select only the object columns, which are evidently strings, with select_dtypes:
df = energy.select_dtypes(object)
# regex=False added to improve performance, as suggested by @jpp, thank you
mask = ~df.apply(lambda series: series.str.contains('Economy 7', regex=False)).any(axis=1)
no_eco = energy[mask]
Sample:
import pandas as pd
energy = pd.DataFrame({
'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('adabbb')
})
print (energy)
   A  B  C  D  E  F
0  a  4  7  1  5  a
1  b  5  8  3  3  d
2  c  4  9  5  6  a
3  d  5  4  7  9  b
4  e  5  2  1  2  b
5  f  4  3  0  4  b
df = energy.select_dtypes(object)
mask = ~df.apply(lambda series: series.str.contains('d')).any(axis=1)
no_eco = energy[mask]
print (no_eco)
   A  B  C  D  E  F
0  a  4  7  1  5  a
2  c  4  9  5  6  a
4  e  5  2  1  2  b
5  f  4  3  0  4  b
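The same idea can be wrapped in a small helper. This is only a sketch under some assumptions: the function name drop_rows_containing and the sample column names tariff and notes are made up for illustration, and "string" is included next to "object" in select_dtypes on the assumption that some pandas versions store text in a dedicated string dtype. Passing na=False makes missing values count as "no match" rather than propagating NaN into the mask.

```python
import pandas as pd

def drop_rows_containing(frame, needle):
    """Return the rows of `frame` in which no text column contains `needle`."""
    # Only text columns support the .str accessor, so select those first.
    text_cols = frame.select_dtypes(include=["object", "string"])
    # regex=False does a literal substring test; na=False treats missing values as no match.
    hits = text_cols.apply(lambda col: col.str.contains(needle, regex=False, na=False))
    # Keep rows where none of the text columns matched.
    return frame[~hits.any(axis=1)]

# Hypothetical sample data: one numeric column and two text columns, one with a missing value.
energy = pd.DataFrame({
    "tariff": ["Standard", "Economy 7", None, "Standard"],
    "existingProductCodeGas": [101, 102, 103, 104],
    "notes": ["ok", "ok", "ok", "moved to Economy 7 plan"],
})
no_eco = drop_rows_containing(energy, "Economy 7")
print(no_eco)
```

Here the numeric column is never passed to .str, so the AttributeError from the question should not occur, and the rows with index 1 and 3 are dropped.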
Answer 1: (score: 0)
If any column contains a particular string, we can drop those rows using the to_string method:
df.drop(df[df.apply(lambda row: 'Tony' in row.to_string(header=False), axis=1)].index, inplace=True)
A complete example:
import pandas as pd
df = pd.DataFrame(columns = ['Name', 'Location'])
df.loc[len(df)] = ['Mathew', 'Houston']
df.loc[len(df)] = ['Tony', 'New York']
df.loc[len(df)] = ['Jerom', 'Los Angeles']
df.loc[len(df)] = ['Aby', 'Dallas']
df.loc[len(df)] = ['Elma', 'Memphis']
df.loc[len(df)] = ['Zack', 'Chicago']
df.loc[len(df)] = ['Lisa', 'New Orleans']
df.loc[len(df)] = ['Nita', 'Las Vegas']
df.drop(df[df.apply(lambda row: 'Tony' in row.to_string(header=False), axis=1)].index, inplace=True)
print(df)
Output:
     Name     Location
0  Mathew      Houston
2   Jerom  Los Angeles
3     Aby       Dallas
4    Elma      Memphis
5    Zack      Chicago
6    Lisa  New Orleans
7    Nita    Las Vegas
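One caveat with the to_string approach: by default, row.to_string also renders the row's index labels (here the column names), so a search term that happens to occur in a column name, such as 'Name', would match every row. Calling apply with axis=1 also runs a Python function once per row, which may be slow on a large DataFrame. A possible alternative, sketched here with a made-up Visits column, is to cast the cells to text and test each column separately:

```python
import pandas as pd

# Hypothetical sample data; the Visits column is added only to show a numeric column.
df = pd.DataFrame({
    "Name": ["Mathew", "Tony", "Jerom"],
    "Location": ["Houston", "New York", "Los Angeles"],
    "Visits": [3, 5, 1],
})

# Cast every cell to text so that numeric columns can be searched as well,
# then test each column for the literal substring (column names are not searched).
as_text = df.astype(str)
contains_tony = as_text.apply(lambda col: col.str.contains("Tony", regex=False))

# Filter the original frame, so the original dtypes are kept in the result.
filtered = df[~contains_tony.any(axis=1)]
print(filtered)
```

Note that astype(str) turns missing values into text as well, so this variant is best suited to search terms that cannot be confused with such placeholders.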