Question

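I would like to know how to aggregate data in a grouped pandas DataFrame with a function that takes into account the values stored in some column of the DataFrame. This is useful for operations where the order of the operands matters, such as division.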

For example, I have:

In [8]: df
Out[8]:
  class  cat  xer
0     a    1    2
1     b    1    4
2     c    1    9
3     a    2    6
4     b    2    8
5     c    2    3

I would like to group by class and, for each class, divide the xer value corresponding to cat == 1 by the xer value corresponding to cat == 2. In other words, the entries in the final output should be:

a: 2/6 ≈ 0.33
b: 4/8 = 0.5
c: 9/3 = 3

Is this possible with groupby? I can't see how to do it without manually iterating over each class, and even then it is neither clean nor fun.

Answer 1

Without doing anything too clever:

In [11]: one = df[df["cat"] == 1].set_index("class")["xer"]

In [12]: two = df[df["cat"] == 2].set_index("class")["xer"]

In [13]: one / two
Out[13]:
class
a    0.333333
b    0.500000
c    3.000000
Name: xer, dtype: float64
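For anyone who wants to run this outside the interactive session, here is a minimal sketch that rebuilds the example frame from the question and repeats the same steps. The variable name `ratio` is my own; it relies on pandas aligning the two Series on their shared `class` index before dividing.

```python
import pandas as pd

# Rebuild the example frame shown in the question.
df = pd.DataFrame({
    "class": ["a", "b", "c", "a", "b", "c"],
    "cat": [1, 1, 1, 2, 2, 2],
    "xer": [2, 4, 9, 6, 8, 3],
})

# Numerator: xer where cat == 1; denominator: xer where cat == 2.
one = df[df["cat"] == 1].set_index("class")["xer"]
two = df[df["cat"] == 2].set_index("class")["xer"]

# Division aligns on the "class" index, so row order does not matter.
ratio = one / two
print(ratio)
```

Because the alignment is by index label, this keeps working even if the rows for the two categories arrive in a different order.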

Answer 2

Given your DataFrame, you can use the following:

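from functools import reduce  # reduce is not a builtin in Python 3
df.groupby('class').agg({'xer': lambda L: reduce(lambda a, b: a / b, L)})  # plain division; the pd.np alias is gone in pandas 2.0+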

This gives you:

            xer
class          
a      0.333333
b      0.500000
c      3.000000

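This also works for more than 2 rows per group (if needed), but you may want to make sure your df is sorted by cat first, so that the values are divided in the right order.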
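To illustrate the sorting point, here is a rough sketch that deliberately reverses the rows and then sorts by `cat` before grouping. It assumes, as the pandas docs state, that groupby preserves the order of rows within each group. The names `shuffled` and `result` are mine, and `operator.truediv` simply stands in for the division function.

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({
    "class": ["a", "b", "c", "a", "b", "c"],
    "cat": [1, 1, 1, 2, 2, 2],
    "xer": [2, 4, 9, 6, 8, 3],
})

# Reverse the rows so that cat == 2 would come first within each class.
shuffled = df.iloc[::-1]

# Sort by cat first, then divide left to right within each class.
result = (
    shuffled.sort_values("cat")
    .groupby("class")["xer"]
    .agg(lambda s: reduce(operator.truediv, s))
)
print(result)
```

Without the `sort_values("cat")` call, the reversed frame would compute 6/2, 8/4 and 3/9 instead.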

Answer 3

Here is a step-by-step approach:

# get cat==1 and cat==2 merged by class
grouped = df[df.cat==1].merge(df[df.cat==2], on='class')
# calculate div
grouped['div'] = grouped.xer_x / grouped.xer_y
# return the final dataframe
grouped[['class', 'div']]

which produces:

  class       div
0     a  0.333333
1     b  0.500000
2     c  3.000000
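A small variation on the merge above, in case the default `_x` / `_y` column names are hard to read: `merge` accepts a `suffixes` argument, and selecting only the needed columns first avoids carrying the two `cat` columns along. The suffix names `_num` and `_den` are just my choice.

```python
import pandas as pd

df = pd.DataFrame({
    "class": ["a", "b", "c", "a", "b", "c"],
    "cat": [1, 1, 1, 2, 2, 2],
    "xer": [2, 4, 9, 6, 8, 3],
})

# Keep only the join key and the value column for each category.
num = df.loc[df["cat"] == 1, ["class", "xer"]]
den = df.loc[df["cat"] == 2, ["class", "xer"]]

# Label the two xer columns explicitly instead of xer_x / xer_y.
merged = num.merge(den, on="class", suffixes=("_num", "_den"))
merged["div"] = merged["xer_num"] / merged["xer_den"]

result = merged[["class", "div"]]
print(result)
```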

Answer 4

You may want to rearrange your data to make it easier to look at:

df2 = df.set_index(['class', 'cat']).unstack()

>>> df2
       xer   
cat      1  2
class        
a        2  6
b        4  8
c        9  3

Then you can do the following to get the desired result:

>>> df2.iloc[:,0].div(df2.iloc[:, 1])

class
a        0.333333
b        0.500000
c        3.000000
Name: (xer, 1), dtype: float64
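The same reshaping can probably also be written with `DataFrame.pivot`, which gives flat column labels (`1` and `2`) so the columns can be picked by their `cat` value instead of by position. This is only a sketch of that alternative, with `wide` and `ratio` as my own names; like `unstack`, it assumes each (class, cat) pair occurs exactly once.

```python
import pandas as pd

df = pd.DataFrame({
    "class": ["a", "b", "c", "a", "b", "c"],
    "cat": [1, 1, 1, 2, 2, 2],
    "xer": [2, 4, 9, 6, 8, 3],
})

# One row per class, one column per cat value.
wide = df.pivot(index="class", columns="cat", values="xer")

# Select columns by cat label rather than by position.
ratio = wide[1] / wide[2]
print(ratio)
```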
