Question

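I have a pandas DataFrame that I want to group by substrings of one of its columns. The substrings are given in another pandas Series (or list). I have tried many things, but I simply cannot get it to work. This is what I have:

tst = pd.DataFrame({'id': [0, 11, 222, 3333, 44444],
                    'bla': ['ab', 'ba', 'ca', 'bc', 'db']})
test = pd.Series(['a', 'b', 'c', 'd'])

I want to group tst according to whether each of 'a', 'b', 'c', 'd' (from test) is contained in tst['bla'].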

Answer 1

df.apply() is the best fit here.

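import pandas as pd


def funcx(x, test_str):
    # True if test_str occurs as a substring of this row's 'bla' value
    return test_str in x['bla']


tst = pd.DataFrame({'id': [0, 11, 222, 3333, 44444],
                    'bla': ['ab', 'ba', 'ca', 'bc', 'db']})
test = pd.Series(['a', 'b', 'c', 'd'])

# One boolean mask (a Series aligned with tst's index) per substring
result = {}
for xstring in test:
    # args must be a tuple, so (xstring,) needs the trailing comma
    result[xstring] = tst.apply(funcx, args=(xstring,), axis=1)

print(result)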
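A vectorized alternative (not part of the original answer): Series.str.contains tests one substring against the whole column at once instead of calling a Python function per row. This sketch assumes the same tst and test as above and builds the same dict-of-masks shape; regex=False is passed so that substrings containing regex metacharacters would be matched literally.

```python
import pandas as pd

tst = pd.DataFrame({'id': [0, 11, 222, 3333, 44444],
                    'bla': ['ab', 'ba', 'ca', 'bc', 'db']})
test = pd.Series(['a', 'b', 'c', 'd'])

# Same dict-of-masks shape as the apply() loop; each mask comes from the
# vectorized string accessor, with regex=False for a literal substring match
result = {s: tst['bla'].str.contains(s, regex=False) for s in test}

print(tst[result['a']])
```

Either version can be dropped into the selection step that follows, since both produce boolean Series indexed like tst.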

This gives us:

{'a': 0     True
1     True
2     True
3    False
4    False
dtype: bool, 'c': 0    False
1    False
2     True
3     True
4    False
dtype: bool, 'b': 0     True
1     True
2    False
3     True
4     True
dtype: bool, 'd': 0    False
1    False
2    False
3    False
4     True
dtype: bool}
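Because every mask shares tst's index, the dict can be combined into a single table with pd.DataFrame, and summing a boolean column counts its True values, which gives the size of each substring's group. A minimal sketch that rebuilds result the same way as the answer's loop; the name masks is illustrative.

```python
import pandas as pd


def funcx(x, test_str):
    # True if test_str occurs as a substring of this row's 'bla' value
    return test_str in x['bla']


tst = pd.DataFrame({'id': [0, 11, 222, 3333, 44444],
                    'bla': ['ab', 'ba', 'ca', 'bc', 'db']})
test = pd.Series(['a', 'b', 'c', 'd'])
result = {s: tst.apply(funcx, args=(s,), axis=1) for s in test}

# Columns are the substrings, rows keep tst's original index
masks = pd.DataFrame(result)
print(masks)

# True counts as 1, so each column sum is the number of rows containing it
print(masks.sum())
```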

This can then be used to select the relevant rows:

>>> print(tst[result['a']])
  bla   id
0  ab    0
1  ba   11
2  ca  222
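If an actual groupby is wanted, as the question asks, one option (not taken from the answer) is to list the matching substrings for each row, give each match its own row with DataFrame.explode (available in pandas 0.25 and later), and group on that. A row that matches several substrings then appears in several groups, and a row that matches none becomes NaN after explode and is dropped by groupby. The column name substr is a hypothetical choice.

```python
import pandas as pd

tst = pd.DataFrame({'id': [0, 11, 222, 3333, 44444],
                    'bla': ['ab', 'ba', 'ca', 'bc', 'db']})
test = pd.Series(['a', 'b', 'c', 'd'])

# For each row, collect the substrings from `test` that occur in 'bla'
matches = tst['bla'].apply(lambda v: [t for t in test if t in v])

# One row per (original row, matching substring) pair
long_df = tst.assign(substr=matches).explode('substr')

# Group on the substring; here each group is summarized as a list of ids
groups = long_df.groupby('substr')['id'].apply(list)
print(groups)
```

Any other aggregation (size(), count(), a custom function) can replace apply(list) once the data is in this long form.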

pandas groupby of a DataFrame using a Series of substrings