Question

I have a dataframe (df) that looks like this:

category | amount | freq
green         10     1
blue          5      2
orange        7      3
purple        5      4

I'd like to select only the 'freq' and 'amount' columns, and all of the rows except the purple one.

I know I can use df.ix to select columns like this:

df.ix[['green','blue','orange'],['freq','amount']]

But how do I get the unique values in the category column and select the rows that are not purple?

df.set_index(['category'])
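For reference, a minimal sketch of the label-based selection described above. It assumes a recent pandas release, where `df.ix` is no longer available and `.loc` takes its place. The frame is rebuilt from the table in the question, and the names `indexed`, `categories`, `wanted` and `subset` are only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    "category": ["green", "blue", "orange", "purple"],
    "amount": [10, 5, 7, 5],
    "freq": [1, 2, 3, 4],
})

# set_index returns a new frame, so the result has to be assigned
indexed = df.set_index("category")

# Unique values of the category column
categories = df["category"].unique()

# Keep every label except 'purple', then select by row label and column name
wanted = [c for c in categories if c != "purple"]
subset = indexed.loc[wanted, ["freq", "amount"]]
print(subset)
```

Note that `df.set_index(['category'])` on its own does not modify `df`; without the assignment the new index is discarded.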

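Update

See Roman Pekar's solution for filtering out the rows you don't want.

To match multiple values, create a Series or a list (e.g. account_group) and reference it like this: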

names = sorted_data[sorted_data.account.isin(account_group)]

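Done this way, names is a DataFrame.

However, the following syntax is similar but not what you want here: it returns a boolean Series rather than the filtered rows.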

names = sorted_data['account'].isin(account_group)
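The difference between the two forms can be checked with a small sketch. The data here is hypothetical; only the names `sorted_data`, `account` and `account_group` come from the snippets above, and `mask` and the `balance` column are introduced for illustration.

```python
import pandas as pd

# Hypothetical data standing in for the real sorted_data
sorted_data = pd.DataFrame({
    "account": ["a", "b", "c", "d"],
    "balance": [1, 2, 3, 4],
})
account_group = ["a", "c"]

# isin alone yields a boolean Series, one True/False per row
mask = sorted_data["account"].isin(account_group)

# Indexing the frame with that Series yields the matching rows as a DataFrame
names = sorted_data[mask]

print(type(mask).__name__)
print(type(names).__name__)
```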

Answer 1

>>> df
  category  amount  freq
0    green      10     1
1     blue       5     2
2   orange       7     3
3   purple       5     4

>>> df[df['category'] != 'purple'][['amount','freq']]
   amount  freq
0      10     1
1       5     2
2       7     3
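The same selection can also be written as a single `.loc` call, which filters the rows and picks the columns in one step instead of chaining two indexing operations. This is a sketch of an equivalent form, not part of the original answer; `result` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["green", "blue", "orange", "purple"],
    "amount": [10, 5, 7, 5],
    "freq": [1, 2, 3, 4],
})

# Row condition first, column list second
result = df.loc[df["category"] != "purple", ["amount", "freq"]]
print(result)
```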

Update: I'm not sure I've understood the OP correctly, but he may also want to do this by subtracting lists: the first list is all the rows in the dataframe, the second is purple, and the third would be list one minus list two, which would be green, blue, orange. So here is another solution:

>>> l1
['green', 'blue', 'orange', 'purple']
>>> l2
['purple']
>>> l3 = [x for x in l1 if x not in l2]
>>> l3
['green', 'blue', 'orange']
>>> df[df['category'].isin(l3)][['amount','freq']]
   amount  freq
0      10     1
1       5     2
2       7     3
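If the goal is only to exclude a known set of values, the intermediate list subtraction can be skipped by negating `isin` with `~`. This is a sketch of that variant, assuming the same example frame; `excluded` and `result` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["green", "blue", "orange", "purple"],
    "amount": [10, 5, 7, 5],
    "freq": [1, 2, 3, 4],
})

excluded = ["purple"]

# ~ inverts the boolean mask, keeping rows whose category is NOT in excluded
result = df.loc[~df["category"].isin(excluded), ["amount", "freq"]]
print(result)
```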
