This is my DataFrame:
cols = ['Country', 'Year', 'Orange', 'Apple', 'Plump']
data = [['US', 2008, 17, 29, 19],
['US', 2009, 11, 12, 16],
['US', 2010, 14, 16, 38],
['Spain', 2008, 11, None, 33],
['Spain', 2009, 12, 19, 17],
['France', 2008, 17, 19, 21],
['France', 2009, 19, 22, 13],
['France', 2010, 12, 11, 0],
['France', 2010, 0, 0, 0],
['Italy', 2009, None, None, None],
['Italy', 2010, 15, 16, 17],
['Italy', 2010, 0, None, None],
['Italy', 2011, 42, None, None]]
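I want to select the rows where Orange, Apple, and Plump are not made up only of None, only of 0, or a mix of the two. So the resulting output should be: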
Country Year Orange Apple Plump
0 US 2008 17.0 29.0 19.0
1 US 2009 11.0 12.0 16.0
2 US 2010 14.0 16.0 38.0
3 Spain 2008 11.0 NaN 33.0
4 Spain 2009 12.0 19.0 17.0
5 France 2008 17.0 19.0 21.0
6 France 2009 19.0 22.0 13.0
7 France 2010 12.0 11.0 0.0
10 Italy 2010 15.0 16.0 17.0
12 Italy 2011 42.0 NaN NaN
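Secondly, I want to drop the countries for which I do not have three years of observations. The resulting output should then only include US and France. How can I get them? I have tried something like: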
df = df[(df['Orange'].notnull())| \
(df['Apple'].notnull()) | (df['Plump'].notnull()) | (df['Orange'] != 0 )| (df['Apple']!= 0) | (df['Plump']!= 0)]
I have also tried:
df = df[((df['Orange'].notnull())| \
(df['Apple'].notnull()) | (df['Plump'].notnull())) & ((df['Orange'] != 0 )| (df['Apple']!= 0) | (df['Plump']!= 0))]
Answer 0: (score: 6)
In [307]: df[~df[['Orange','Apple','Plump']].fillna(0).eq(0).all(axis=1)]
Out[307]:
Country Year Orange Apple Plump
0 US 2008 17.0 29.0 19.0
1 US 2009 11.0 12.0 16.0
2 US 2010 14.0 16.0 38.0
3 Spain 2008 11.0 NaN 33.0
4 Spain 2009 12.0 19.0 17.0
5 France 2008 17.0 19.0 21.0
6 France 2009 19.0 22.0 13.0
7 France 2010 12.0 11.0 0.0
10 Italy 2010 15.0 16.0 17.0
12 Italy 2011 42.0 NaN NaN
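To see why the attempts in the question do not work while the `fillna(0)` approach does, here is a minimal, self-contained sketch built from the question's data. The variable names (`fruit`, `first_attempt`, `second_attempt`, `all_empty`) are my own and only serve the illustration. The key point is that `NaN != 0` evaluates to `True`, so a missing value counts as "not zero".

```python
import pandas as pd

cols = ['Country', 'Year', 'Orange', 'Apple', 'Plump']
data = [['US', 2008, 17, 29, 19],
        ['US', 2009, 11, 12, 16],
        ['US', 2010, 14, 16, 38],
        ['Spain', 2008, 11, None, 33],
        ['Spain', 2009, 12, 19, 17],
        ['France', 2008, 17, 19, 21],
        ['France', 2009, 19, 22, 13],
        ['France', 2010, 12, 11, 0],
        ['France', 2010, 0, 0, 0],
        ['Italy', 2009, None, None, None],
        ['Italy', 2010, 15, 16, 17],
        ['Italy', 2010, 0, None, None],
        ['Italy', 2011, 42, None, None]]
df = pd.DataFrame(data, columns=cols)

fruit = ['Orange', 'Apple', 'Plump']  # hypothetical helper name

# First attempt: every cell is either non-null or NaN, and NaN != 0 is True,
# so the OR chain is True for every row and nothing is dropped.
first_attempt = df[fruit].notnull().any(axis=1) | df[fruit].ne(0).any(axis=1)

# Second attempt: drops the all-zero and all-None rows, but keeps the mixed
# row (0, None, None) because its NaN cells satisfy "!= 0".
second_attempt = df[fruit].notnull().any(axis=1) & df[fruit].ne(0).any(axis=1)

# Treat missing values as 0 first, then drop rows where every fruit is 0.
all_empty = df[fruit].fillna(0).eq(0).all(axis=1)
result = df[~all_empty]
print(result)
```

Filling the missing values before comparing removes the ambiguity, which is why a single `eq(0).all(axis=1)` check covers the all-None, all-zero, and mixed cases at once.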
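Answer 1: (score: 1)
None values are read in as NaN, so you can replace the 0 values with NaN as well. After that you can do what MaxU suggested. It would look something like this: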
In: df = df.replace(0,np.nan)
df = df[df[['Orange','Apple','Plump']].notnull().any(axis=1)]
Out:
Country Year Orange Apple Plump
0 US 2008 17 29 19
1 US 2009 11 12 16
2 US 2010 14 16 38
3 Spain 2008 11 NaN 33
4 Spain 2009 12 19 17
5 France 2008 17 19 21
6 France 2009 19 22 13
7 France 2010 12 11 NaN
10 Italy 2010 15 16 17
12 Italy 2011 42 NaN NaN
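Note that replacing 0 in the whole frame also turns legitimate zeros into NaN in the rows that are kept (France 2010, Plump above). If that is not wanted, one possible variation is to do the replacement only while building the mask. This is a sketch on a few rows of the question's data; the names `fruit`, `has_value`, and `filtered` are my own.

```python
import numpy as np
import pandas as pd

# A few rows from the question's data, kept small for illustration.
df = pd.DataFrame(
    [['France', 2010, 12, 11, 0],
     ['France', 2010, 0, 0, 0],
     ['Italy', 2009, None, None, None],
     ['Italy', 2010, 0, None, None],
     ['Italy', 2011, 42, None, None]],
    columns=['Country', 'Year', 'Orange', 'Apple', 'Plump'],
)
fruit = ['Orange', 'Apple', 'Plump']  # hypothetical helper name

# Replace zeros only in a temporary copy that is used to build the mask;
# the original values in df are left untouched.
has_value = df[fruit].replace(0, np.nan).notna().any(axis=1)
filtered = df[has_value]
print(filtered)
```

Here the first France row keeps its `0.0` in Plump, because the filter only decides which rows survive and does not rewrite the data.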
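As for your second question, as I understand it, you want to drop the countries that do not have observations for 2008, 2009, and 2010. For that you could do something like:
required_years = {2008, 2009, 2010}
countries = []
for country, group in df.groupby('Country'):
    # keep the country only if all three required years are observed
    if required_years.issubset(set(group['Year'])):
        countries.append(country)
df = df[df['Country'].isin(countries)]
This yields something like: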
Country Year Orange Apple Plump
0 US 2008 17 29 19
1 US 2009 11 12 16
2 US 2010 14 16 38
5 France 2008 17 19 21
6 France 2009 19 22 13
7 France 2010 12 11 NaN
8 France 2010 NaN NaN NaN
Finally, you can apply both solutions together:
df[df[['Orange','Apple','Plump']].notnull().any(axis=1) & df['Country'].isin(countries)]
which gives:
Country Year Orange Apple Plump
0 US 2008 17 29 19
1 US 2009 11 12 16
2 US 2010 14 16 38
5 France 2008 17 19 21
6 France 2009 19 22 13
7 France 2010 12 11 NaN
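The same two steps can also be written without an explicit loop. The sketch below assumes that "three years" means at least three distinct years remaining per country after the empty rows have been dropped. That is my reading of the question, not something it states outright. The names `fruit`, `kept`, and `years_per_country` are my own.

```python
import pandas as pd

cols = ['Country', 'Year', 'Orange', 'Apple', 'Plump']
data = [['US', 2008, 17, 29, 19],
        ['US', 2009, 11, 12, 16],
        ['US', 2010, 14, 16, 38],
        ['Spain', 2008, 11, None, 33],
        ['Spain', 2009, 12, 19, 17],
        ['France', 2008, 17, 19, 21],
        ['France', 2009, 19, 22, 13],
        ['France', 2010, 12, 11, 0],
        ['France', 2010, 0, 0, 0],
        ['Italy', 2009, None, None, None],
        ['Italy', 2010, 15, 16, 17],
        ['Italy', 2010, 0, None, None],
        ['Italy', 2011, 42, None, None]]
df = pd.DataFrame(data, columns=cols)
fruit = ['Orange', 'Apple', 'Plump']  # hypothetical helper name

# Step 1: drop rows whose fruit columns are all missing and/or zero.
kept = df[~df[fruit].fillna(0).eq(0).all(axis=1)]

# Step 2: count the distinct years per country among the remaining rows
# and broadcast that count back onto each row.
years_per_country = kept.groupby('Country')['Year'].transform('nunique')

# Assumption: a country needs at least three distinct years to stay.
result = kept[years_per_country >= 3]
print(result)
```

Because the year count is taken after the first filter, Italy drops out (only 2010 and 2011 remain) even though it has three distinct years in the raw data. If specific years are required instead, the set-based check on 2008, 2009, and 2010 is the safer choice.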