Question

After running the following, I get this dataframe:

train_X = icon[['property', 'room', 'date', 'month', 'amount']]
train_frame = train_X.groupby(['property', 'month', 'date', 'room']).median()
print(train_frame)

                          amount
property month date room          
1        6     6    2     3195.000
               12   3     2977.000
               18   2     3195.000
               24   3     3581.000
               36   2     3146.000
                    3     3321.500
               42   2     3096.000
                    3     3580.000
               54   2     3195.000
                    3     3580.000
               60   2     3000.000
               66   3     3810.000
               78   2     3000.000
               84   2     3461.320
                    3     2872.800
               90   2     3461.320
                    3     3580.000
               96   2     3534.000
                    3     2872.800
               102  3     3581.000
               108  3     3580.000
               114  2     3195.000

My goal is to look up the median for a given (property, month, date, room) combination. Here is how I am doing it:

big_list = [[property, month, date, room], ...]
test_list = [property, month, date, room]

if test_list in big_list:
    # I want to get the median amount for the row that matches test_list

How can I do this?

Here is what I tried:

count = 0
test_list = [2, 6, 36, 2]

for j in big_list:
    if test_list == j:
        break

    count += 1

Now that I have the count, how do I use it to get the median from the dataframe? Is there a way to access a dataframe by index?

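Please note:

big_list is a list of lists, where each inner list is a [property, month, date, room] from the dataframe above.
test_list is an incoming list that should be matched against big_list, in case it is present there.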

Answer 1

To answer your last question: is there a way to access a dataframe by index?

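Of course there is: use df.iloc or df.loc. If you want pure integer-position access (which I think is your case), use iloc; if you want label-based access, for example with a string-type index, use loc.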

Documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html
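A minimal sketch of the difference between iloc and loc on a grouped frame shaped like the one in the question. The raw `icon` data below is hypothetical (the question does not show it); only the column names come from the question.

```python
import pandas as pd

# Hypothetical raw data standing in for the question's `icon` frame.
icon = pd.DataFrame({
    "property": [1, 1, 1, 1],
    "month":    [6, 6, 6, 6],
    "date":     [6, 36, 36, 36],
    "room":     [2, 2, 3, 3],
    "amount":   [3195.0, 3146.0, 3300.0, 3343.0],
})

# Same grouping as in the question: the four key columns become a MultiIndex.
train_frame = icon.groupby(["property", "month", "date", "room"]).median()

# iloc: 0-based integer position of the row, ignoring the index labels.
by_position = train_frame.iloc[2]["amount"]

# loc: the row's label, which for this grouped frame is a
# (property, month, date, room) tuple.
by_label = train_frame.loc[(1, 6, 36, 3), "amount"]

print(by_position, by_label)
```

Both lines fetch the same row here; iloc depends on the row order, while loc depends only on the key values.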

Edit: Back to the question. I assume "amount" is the median you are looking for. You can call the reset_index() method on the grouped dataframe, e.g.

train_frame_reset = train_frame.reset_index()

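Then you can access the columns by name again, so the following should also work (assuming j is the integer position of the row you found):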

train_frame_reset.iloc[j]['amount']  # gives you the median
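A sketch of the count-based approach end to end, under one important assumption: the count only points at the right row if big_list lists the keys in the same order as the rows of train_frame_reset. Building big_list from the frame itself guarantees that. The data is made up, and `big_list.index(...)` is used here in place of the manual counting loop; the `in` check avoids the case where the loop finishes without a match and count ends up equal to len(big_list).

```python
import pandas as pd

# Hypothetical stand-in for the question's data.
icon = pd.DataFrame({
    "property": [1, 1, 1, 2],
    "month":    [6, 6, 6, 6],
    "date":     [6, 36, 36, 36],
    "room":     [2, 2, 3, 2],
    "amount":   [3195.0, 3146.0, 3321.5, 2900.0],
})
train_frame = icon.groupby(["property", "month", "date", "room"]).median()

# reset_index() turns the four index levels back into ordinary columns
# and gives the rows a plain 0..n-1 integer index.
train_frame_reset = train_frame.reset_index()

# Assumption: big_list is built from the same frame, so position j in
# big_list is also row position j in train_frame_reset.
key_cols = ["property", "month", "date", "room"]
big_list = train_frame_reset[key_cols].values.tolist()

test_list = [2, 6, 36, 2]
if test_list in big_list:
    j = big_list.index(test_list)  # position of the first match
    median_amount = train_frame_reset.iloc[j]["amount"]
    print(j, median_amount)
```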

Answer 2

If I understand your question correctly, you don't need the count at all; you can access the values directly with loc.

Take a look:

import pandas as pd
A = pd.DataFrame([[5, 6, 9], [5, 7, 10], [6, 3, 11], [6, 5, 12]], columns=['lev0', 'lev1', 'val'])

Then you do:

test = A.groupby(['lev0', 'lev1']).median()

The median of the group with lev0 = 6 and lev1 = 5 can then be accessed with:

test.loc[6,5]
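Applied to the question, the same idea means passing test_list to loc as a tuple, since the labels of a MultiIndex are tuples. A sketch using the toy frame above; lookup_median is a hypothetical helper (not a pandas function) that returns None instead of raising KeyError when the key is not in the index.

```python
import pandas as pd

# The toy frame from this answer.
A = pd.DataFrame([[5, 6, 9], [5, 7, 10], [6, 3, 11], [6, 5, 12]],
                 columns=["lev0", "lev1", "val"])
test = A.groupby(["lev0", "lev1"]).median()

# loc with a full tuple of index labels plus a column name returns a scalar.
print(test.loc[(6, 5), "val"])


def lookup_median(frame, key, column):
    """Hypothetical helper: return frame.loc[key, column], or None if missing.

    key is a list such as test_list; it is converted to a tuple because
    MultiIndex labels are tuples.
    """
    key = tuple(key)
    if key in frame.index:
        return frame.loc[key, column]
    return None


print(lookup_median(test, [6, 5], "val"))
print(lookup_median(test, [6, 1], "val"))  # no such group
```

For the question's frame the call would be lookup_median(train_frame, test_list, 'amount'), with no big_list or count involved.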
