I have a pandas DataFrame with two columns, 'col1' and 'col2'.
I can filter on specific values in both columns like this:
df[ (df["col1"]=='foo') & (df["col2"]=='bar')]
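Is there a way to filter on both columns at once?
I naively tried restricting the DataFrame to the two columns, but my best guess for the right-hand side of the equality does not work: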
df[df[["col1","col2"]]==['foo','bar']]
which gives me this error:
ValueError: Invalid broadcasting comparison [['foo', 'bar']] with block values
I need to do this because the column names, as well as the number of columns the condition is set on, will vary.
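Answer 0 (score: 1)
To the best of my knowledge, pandas has no built-in way to do what you want. However, although the following solution may not be the prettiest, you can zip a pair of parallel lists as follows: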
cols = ['col1', 'col2']
conditions = ['foo', 'bar']
df[eval(" & ".join(["(df['{0}'] == '{1}')".format(col, cond)
for col, cond in zip(cols, conditions)]))]
The string join produces the following:
>>> " & ".join(["(df['{0}'] == '{1}')".format(col, cond)
for col, cond in zip(cols, conditions)])
"(df['col1'] == 'foo') & (df['col2'] == 'bar')"
which is then evaluated with eval:
df[eval("(df['col1'] == 'foo') & (df['col2'] == 'bar')")]
For example:
>>> df = pd.DataFrame({'col1': ['foo', 'bar', 'baz'], 'col2': ['bar', 'spam', 'ham']})
>>> df
  col1  col2
0  foo   bar
1  bar  spam
2  baz   ham
>>> df[eval(" & ".join(["(df['{0}'] == {1})".format(col, repr(cond))
for col, cond in zip(cols, conditions)]))]
  col1 col2
0  foo  bar
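If building a filter string is acceptable, a similar idea can be sketched with DataFrame.query, which parses the expression itself instead of passing generated Python code to eval. This is only an illustrative sketch, not the answer's original method: the name expr is introduced here, and formatting the values with repr is an assumption that holds only for simple literals such as strings and numbers.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['foo', 'bar', 'baz'],
                   'col2': ['bar', 'spam', 'ham']})
cols = ['col1', 'col2']
conditions = ['foo', 'bar']

# Backticks quote the column names; !r inserts each value as a literal.
# Assumption: the values are plain strings or numbers.
expr = " and ".join(f"`{col}` == {cond!r}"
                    for col, cond in zip(cols, conditions))

# expr is now: `col1` == 'foo' and `col2` == 'bar'
result = df.query(expr)
print(result)
```

Because the expression is still assembled from strings, the column names and values should come from a trusted source.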
Answer 1 (score: 1)
I would like to point out an alternative to the accepted answer, since eval is not necessary for solving this problem.
from functools import reduce

import pandas as pd

df = pd.DataFrame({'col1': ['foo', 'bar', 'baz'], 'col2': ['bar', 'spam', 'ham']})
cols = ['col1', 'col2']
values = ['foo', 'bar']
# Materialize the pairs: in Python 3, zip() returns an iterator with no len().
conditions = list(zip(cols, values))

# Version 1: combine the per-column comparisons with an explicit loop.
def apply_conditions(df, conditions):
    assert len(conditions) > 0
    comps = [df[c] == v for c, v in conditions]
    result = comps[0]
    for comp in comps[1:]:
        result &= comp
    return result

# Version 2: the same logic written with functools.reduce.
def apply_conditions(df, conditions):
    assert len(conditions) > 0
    comps = [df[c] == v for c, v in conditions]
    return reduce(lambda c1, c2: c1 & c2, comps[1:], comps[0])

df[apply_conditions(df, conditions)]
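As a variation on the same idea, the conditions could be passed as a dict mapping column name to required value, which keeps each column next to its value. The sketch below is an assumption-laden illustration rather than the answer's own code: build_mask is a hypothetical helper name, and operator.and_ stands in for the lambda.

```python
from functools import reduce
import operator

import pandas as pd


def build_mask(df, conditions):
    """AND together df[col] == value for every (col, value) pair."""
    comps = [df[col] == value for col, value in conditions.items()]
    if not comps:
        raise ValueError("at least one condition is required")
    return reduce(operator.and_, comps)


df = pd.DataFrame({'col1': ['foo', 'bar', 'foo'],
                   'col2': ['bar', 'spam', 'ham']})

# Only the first row has col1 == 'foo' and col2 == 'bar'.
mask = build_mask(df, {'col1': 'foo', 'col2': 'bar'})
filtered = df[mask]
print(filtered)
```

Swapping operator.and_ for operator.or_ would keep rows that satisfy any of the conditions instead of all of them.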
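Answer 2 (score: 0)
I know I'm late to the party on this one, but if you know that all of your conditions use the same comparison sign, you can use functools.reduce. I had a CSV with something like 64 columns, and I had no desire to copy and paste them. This is how I solved it: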
from functools import reduce
import pandas as pd
players = pd.read_csv('players.csv')
# I only want players who have any of the outfield stats over 0.
# That means they have to be an outfielder.
column_named_outfield = lambda x: x.startswith('outfield')
# If a column name starts with outfield, then it is an outfield stat.
# So only include those columns
outfield_columns = filter(column_named_outfield, players.columns)
# Column must have a positive value
has_positive_value = lambda c:players[c] > 0
# We're looking to create a series of filters, so use "map"
list_of_positive_outfield_columns = map(has_positive_value, outfield_columns)
# Given two DF filters, this returns a third representing the "or" condition.
concat_or = lambda x, y: x | y
# Apply the filters through reduce to create a primary filter
is_outfielder_filter = reduce(concat_or, list_of_positive_outfield_columns)
outfielders = players[is_outfielder_filter]
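For comparison, the same "any outfield stat above zero" filter can be sketched without reduce by selecting the columns by name pattern and reducing across columns with any(axis=1). The data below is invented purely for illustration (players.csv is not available here), and the column names are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for players.csv.
players = pd.DataFrame({
    'name': ['Ann', 'Bob', 'Cy'],
    'outfield_catches': [3, 0, 0],
    'outfield_assists': [0, 0, 2],
    'pitching_innings': [0, 9, 0],
})

# Keep only the columns whose name starts with "outfield".
outfield_stats = players.filter(regex='^outfield')

# True for a row if at least one outfield stat is positive.
is_outfielder = (outfield_stats > 0).any(axis=1)

outfielders = players[is_outfielder]
print(outfielders)
```

Using all(axis=1) instead of any(axis=1) would require every outfield stat to be positive.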
Answer 3 (score: 0)
Posting because I ran into a similar issue and found a solution that gets it done in one line, albeit a bit inefficiently:
cols, vals = ["col1","col2"],['foo','bar']
pd.concat([df.loc[df[cols[i]] == vals[i]] for i in range(len(cols))], join='inner')
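Note that with the default axis=0, join='inner' only controls which columns are kept, so this stacks the rows matched by each condition: it is effectively an | between the columns, with rows that match more than one condition repeated. Adding a drop_duplicates() at the end removes the repeats. For an & between the columns, combine the boolean masks with & before indexing rather than concatenating the filtered frames.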
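A compact way to get either behavior from boolean masks is sketched below, assuming every condition is an equality test. The names target and matches are introduced here for illustration; DataFrame.eq with axis='columns' aligns the Series index with the column labels.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['foo', 'bar', 'foo'],
                   'col2': ['bar', 'bar', 'ham']})
cols, vals = ['col1', 'col2'], ['foo', 'bar']

# One required value per column, labelled by column name.
target = pd.Series(vals, index=cols)

# Boolean frame: does each cell equal the value required for its column?
matches = df[cols].eq(target, axis='columns')

both = df[matches.all(axis=1)]    # "&": every condition holds
either = df[matches.any(axis=1)]  # "|": at least one condition holds
print(both)
print(either)
```

Because the masks are combined before indexing, no row is ever duplicated, so drop_duplicates() is not needed.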