Question

So I have a DataFrame that looks like this:

import pandas as pd, numpy as np
np.random.seed(seed=43525)
descriptors = 'abcdefghi'
df = pd.DataFrame([{'Value':np.random.randint(0,100), 
                       'Group':descriptors[np.random.randint(0, len(descriptors)): 
                                           np.random.randint(0, len(descriptors))]} for i in range(0,10)])
print(df)

  Group  Value
0            4
1   abc     37
2  efgh     99
3     a     67
4           37
5           52
6           46
7     b     41
8     d     17
9           36

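Each character in descriptors should become its own group (plus one group for the empty string). How can I do a groupby that accomplishes this?

So group 'a' would contain indices 1 and 3, group 'b' would contain indices 1 and 7, and so on. This is a fairly non-standard use of groupby (if it can be done at all), so I'm not sure how to approach it.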
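To make the target concrete, here is a minimal plain-Python sketch of the membership described above. It is not from the question: it hard-codes the frame printed above instead of regenerating it from the RNG, and it keys the empty group as "" (both are assumptions for illustration).

```python
import pandas as pd

# Hard-coded copy of the frame printed in the question (no dependence on the RNG)
df = pd.DataFrame({
    "Group": ["", "abc", "efgh", "a", "", "", "", "b", "d", ""],
    "Value": [4, 37, 99, 67, 37, 52, 46, 41, 17, 36],
})
descriptors = "abcdefghi"

# Each descriptor maps to the labels of every row whose Group string contains it,
# so one row can belong to several groups (row 1, "abc", is in "a", "b" and "c").
membership = {d: [i for i, g in df["Group"].items() if d in g] for d in descriptors}
# Rows with an empty Group form the extra "empty" group, keyed here by "".
membership[""] = [i for i, g in df["Group"].items() if g == ""]

print(membership["a"])  # [1, 3]
print(membership["b"])  # [1, 7]
```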

Answer 1

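Building on Edchum's answer, I came up with the following. The resulting structure (a dict mapping each group name to its sub-frame) is also similar to that of a groupby object: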

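indices = {}
# Distinct characters across all Group strings (a set of single characters;
# np.unique on one joined string would return the whole string as a single element)
for val in set(''.join(df.Group.values)):
    # Every row whose Group string contains this character
    indices[val] = df[df.Group.str.contains(val, regex=False)]
print(indices)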

This gives the following result, which is poorly formatted but correct:

{'a':   Group  Value
1   abc     37
3     a     67, 'c':   Group  Value
1   abc     37, 'b':   Group  Value
1   abc     37
7     b     41, 'e':   Group  Value
2  efgh     99, 'd':   Group  Value
8     d     17, 'g':   Group  Value
2  efgh     99, 'f':   Group  Value
2  efgh     99, 'h':   Group  Value
2  efgh     99}
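Neither answer does this, but if you want an actual GroupBy object, one swapped-in technique is to explode each Group string into one row per character and group on that. A minimal sketch, assuming pandas 0.25 or later (where Series.explode was added) and a hard-coded copy of the question's frame; rows with an empty Group are mapped to a "" group, which is an illustrative choice.

```python
import pandas as pd

# Hard-coded copy of the frame printed in the question
df = pd.DataFrame({
    "Group": ["", "abc", "efgh", "a", "", "", "", "b", "d", ""],
    "Value": [4, 37, 99, 67, 37, 52, 46, 41, 17, 36],
})

# One entry per (row, character); an empty Group becomes [""] so it survives as its own group.
chars = df["Group"].map(lambda g: list(g) if g else [""]).explode()

# chars keeps the original row labels (repeated), so .loc repeats each row once per character.
exploded = df.loc[chars.index].assign(Descriptor=chars.to_numpy())
groups = exploded.groupby("Descriptor")

for name, frame in groups:
    print(repr(name), frame.index.tolist())
```

A row such as abc ends up in three groups, which grouping on the original Group column cannot express; the cost is that aggregates over all groups count such rows more than once.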

Answer 2

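It sounds like what you really want is a MultiIndex. groupby gives you unique groups (essentially the distinct values in your Group column), whereas a MultiIndex gets you much closer to what you want.

For example,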

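import numpy as np
import pandas as pd

np.random.seed(seed=43525)  # same seed as in the question, so df matches the frame shown there
descriptors = 'abcdefghi'
df = pd.DataFrame([{'Value': np.random.randint(0, 100),
                    'Group': descriptors[np.random.randint(0, len(descriptors)):
                                         np.random.randint(0, len(descriptors))]} for i in range(0, 10)])

# One index level per descriptor: the descriptor itself if it occurs in Group, else '-'
groups = df.Group.map(lambda x: tuple(desc if desc in x else '-' for desc in descriptors))
df.index = pd.MultiIndex.from_tuples(groups.tolist(), names=list(descriptors))
df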

Out[4]: 
                  Group  Value
a b c d e f g h i             
- - - - - - - - -            4
a b c - - - - - -   abc     37
- - - - e f g h -  efgh     99
a - - - - - - - -     a     67
- - - - - - - - -           37
                -           52
                -           46
  b - - - - - - -     b     41
  - - d - - - - -     d     17
      - - - - - -           36

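Now you can select data with df.xs or df.loc (older pandas also offered df.ix, which was removed in pandas 1.0). For example, if you want all groups that contain both 'a' and 'c', you can use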

df.xs(('a', 'c'), level=('a', 'c'))
Out[5]: 
              Group  Value
b d e f g h i             
b - - - - - -   abc     37

Similarly, you can select all groups that contain 'b' with

df.xs('b', level='b')
Out[7]: 
                Group  Value
a c d e f g h i             
a c - - - - - -   abc     37
- - - - - - - -     b     41
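The two xs selections above can be checked on a deterministic copy of the data. A minimal sketch, assuming the hard-coded frame printed in the question rather than a freshly generated random one; the MultiIndex is built the same way as in this answer.

```python
import pandas as pd

# Hard-coded copy of the frame printed in the question
df = pd.DataFrame({
    "Group": ["", "abc", "efgh", "a", "", "", "", "b", "d", ""],
    "Value": [4, 37, 99, 67, 37, 52, 46, 41, 17, 36],
})
descriptors = "abcdefghi"

# Same construction as in the answer: one index level per descriptor, "-" when absent.
tuples = df["Group"].map(lambda x: tuple(d if d in x else "-" for d in descriptors)).tolist()
df.index = pd.MultiIndex.from_tuples(tuples, names=list(descriptors))

# Rows containing both "a" and "c": fix the value on levels "a" and "c" (those levels are dropped).
both_a_c = df.xs(("a", "c"), level=["a", "c"])
# Rows containing "b": fix the value on level "b".
has_b = df.xs("b", level="b")

print(both_a_c["Value"].tolist())  # [37]
print(has_b["Value"].tolist())     # [37, 41]
```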

To select the rows that belong to no group, you can use

df.sort_index(inplace=True)  # sort the MultiIndex first so the lookup below is efficient
df.loc[('-',) * len(descriptors)]
Out[10]: 
                  Group  Value
a b c d e f g h i             
- - - - - - - - -            4
                -           37
                -           52
                -           46
                -           36
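Because .ix is gone from current pandas (removed in 1.0), here is a minimal sketch of the same lookup with .loc, assuming the hard-coded frame from the question. The boolean mask on Group used as a cross-check is an alternative added for illustration, not part of the answer.

```python
import pandas as pd

# Hard-coded copy of the frame printed in the question
df = pd.DataFrame({
    "Group": ["", "abc", "efgh", "a", "", "", "", "b", "d", ""],
    "Value": [4, 37, 99, 67, 37, 52, 46, 41, 17, 36],
})
descriptors = "abcdefghi"

tuples = df["Group"].map(lambda x: tuple(d if d in x else "-" for d in descriptors)).tolist()
df.index = pd.MultiIndex.from_tuples(tuples, names=list(descriptors))

# Sorting avoids a PerformanceWarning when looking up a key in a non-unique MultiIndex.
df = df.sort_index()
# A full-length key of "-" on every level selects the rows that contain no descriptor.
ungrouped = df.loc[("-",) * len(descriptors)]

# Cross-check: these are exactly the rows whose Group string is empty.
assert sorted(ungrouped["Value"]) == sorted(df.loc[df["Group"] == "", "Value"])
print(sorted(ungrouped["Value"].tolist()))  # [4, 36, 37, 46, 52]
```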

Note: I used '-' as the fill character, but that particular choice is not required.
