Pandas mean() for a MultiIndex

Time: 2017-05-06 07:15:17

Tags: python pandas

I have a df:

CU           Parameters           1       2       3
379-H   Output Energy, (Wh/h)   0.045   0.055   0.042
349-J   Output Energy, (Wh/h)   0.001   0.003   0
625-H   Output Energy, (Wh/h)   2.695   1.224   1.272
626-F   Output Energy, (Wh/h)   1.381   1.494   1.3

I want to create two separate dfs, each holding the mean of the column values, by grouping the rows on index level 0 (CU):

df1: (379-H and 625-H)

Parameters                1     2      3
Output Energy, (Wh/h)    1.37   0.63   0.657

df2: (the rest)

Parameters                 1     2      3
Output Energy, (Wh/h)     0.69  0.74   0.65

I can get the mean over all rows by grouping on level 1:

df = df.apply(pd.to_numeric, errors='coerce').dropna(how='all').groupby(level=1).mean()

But how do I split these by level 0?

Solution:

lightsonly = ["379-H", "625-H"]
df = df.apply(pd.to_numeric, errors='coerce').dropna(how='all')
mask = df.index.get_level_values(0).isin(lightsonly)
df1 = df[mask].groupby(level=1).mean()
df2 = df[~mask].groupby(level=1).mean()
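
For reference, here is a minimal, self-contained sketch of the solution above. It rebuilds the sample frame by hand, assuming CU and Parameters form a two-level index and that the values are already numeric (so the to_numeric/dropna step is skipped). The index construction is an assumption for illustration; the question does not show how df was created.

```python
import pandas as pd

# Rebuild the question's frame with a two-level index (CU, Parameters).
index = pd.MultiIndex.from_tuples(
    [
        ("379-H", "Output Energy, (Wh/h)"),
        ("349-J", "Output Energy, (Wh/h)"),
        ("625-H", "Output Energy, (Wh/h)"),
        ("626-F", "Output Energy, (Wh/h)"),
    ],
    names=["CU", "Parameters"],
)
df = pd.DataFrame(
    [
        [0.045, 0.055, 0.042],
        [0.001, 0.003, 0.000],
        [2.695, 1.224, 1.272],
        [1.381, 1.494, 1.300],
    ],
    index=index,
    columns=[1, 2, 3],
)

lightsonly = ["379-H", "625-H"]

# Boolean array: True where the level-0 (CU) label is in the list.
mask = df.index.get_level_values(0).isin(lightsonly)

# Split the rows with the mask, then average within each Parameters value.
df1 = df[mask].groupby(level=1).mean()
df2 = df[~mask].groupby(level=1).mean()

print(df1)
print(df2)
```

Grouping on level 1 after masking keeps the Parameters label as the row index, which matches the layout asked for in the question.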

3 answers:

Answer 0: (score: 2)

Consider the dataframe df, where CU and Parameters are assumed to be in the index.

                                 1      2      3
CU    Parameters                                
379-H Output Energy, (Wh/h)  0.045  0.055  0.042
349-J Output Energy, (Wh/h)  0.001  0.003  0.000
625-H Output Energy, (Wh/h)  2.695  1.224  1.272
626-F Output Energy, (Wh/h)  1.381  1.494  1.300

Then we can group by the truth value of whether each first-level value is in the list ['379-H', '625-H'].

m = {True: 'Main', False: 'Rest'}
l = ['379-H', '625-H']
g = df.index.get_level_values('CU').isin(l)
df.groupby(g).mean().rename(index=m)

          1       2      3
Rest  0.691  0.7485  0.650
Main  1.370  0.6395  0.657
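
As a variation on the idea above (not part of the original answer), the group labels can be built directly with numpy.where, so no rename step is needed. This is only a sketch; the frame is reconstructed by hand and the names labels and result are illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame from the question, with (CU, Parameters) as the index.
index = pd.MultiIndex.from_product(
    [["379-H", "349-J", "625-H", "626-F"], ["Output Energy, (Wh/h)"]],
    names=["CU", "Parameters"],
)
df = pd.DataFrame(
    [
        [0.045, 0.055, 0.042],
        [0.001, 0.003, 0.000],
        [2.695, 1.224, 1.272],
        [1.381, 1.494, 1.300],
    ],
    index=index,
    columns=[1, 2, 3],
)

# One label per row: 'Main' for the listed CUs, 'Rest' for everything else.
in_list = df.index.get_level_values("CU").isin(["379-H", "625-H"])
labels = np.where(in_list, "Main", "Rest")

# groupby accepts an array with one key per row.
result = df.groupby(labels).mean()
print(result)
```

With string labels the groups come out sorted alphabetically (Main, then Rest), whereas grouping on the raw booleans puts False before True, which is why Rest appears first in the output above.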

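Answer 1: (score: 2)

Use get_level_values + isin to get a True/False index, then rename with a dict:

d = {True: '379-H and 625-H', False: 'the rest'}
df.index = df.index.get_level_values(0).isin(['379-H', '625-H'])
# originally df.mean(level=0); the level argument was removed in pandas 2.0,
# and groupby(level=0).mean() is the equivalent call
df = df.groupby(level=0).mean().rename(d)
print (df)
                     1       2      3
the rest         0.691  0.7485  0.650
379-H and 625-H  1.370  0.6395  0.657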

For separate dfs, it is also possible to use a boolean mask with mean (starting again from the original df):

mask = df.index.get_level_values(0).isin(['379-H', '625-H'])
df1 = df[mask].mean().rename('379-H and 625-H').to_frame().T
print (df1)
                    1       2      3
379-H and 625-H  1.37  0.6395  0.657

df2 = df[~mask].mean().rename('the rest').to_frame().T
print (df2)
              1       2     3
the rest  0.691  0.7485  0.65

Another numpy solution using the DataFrame constructor:

a1 = df[mask].values.mean(axis=0)
#alternatively
#a1 = df.values[mask].mean(axis=0)
df1 = pd.DataFrame(a1.reshape(-1, len(a1)), index=['379-H and 625-H'], columns=df.columns)
print (df1)
                    1       2      3
379-H and 625-H  1.37  0.6395  0.657
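
The NumPy route can be checked against the pandas one with a short, runnable sketch. It assumes the same hand-built sample frame and uses to_numpy(), the newer spelling of .values; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample frame from the question, with (CU, Parameters) as the index.
index = pd.MultiIndex.from_product(
    [["379-H", "349-J", "625-H", "626-F"], ["Output Energy, (Wh/h)"]],
    names=["CU", "Parameters"],
)
df = pd.DataFrame(
    [
        [0.045, 0.055, 0.042],
        [0.001, 0.003, 0.000],
        [2.695, 1.224, 1.272],
        [1.381, 1.494, 1.300],
    ],
    index=index,
    columns=[1, 2, 3],
)

mask = df.index.get_level_values(0).isin(["379-H", "625-H"])

# Column-wise mean of the selected rows, computed on the raw array.
a1 = df.to_numpy()[mask].mean(axis=0)

# Wrap the 1-D result as a single-row frame with the original column labels.
df1 = pd.DataFrame(a1[None, :], index=["379-H and 625-H"], columns=df.columns)

# The pandas computation on the same rows should agree.
pandas_mean = df[mask].mean()
print(df1)
print(np.allclose(df1.iloc[0].to_numpy(), pandas_mean.to_numpy()))
```

Working on the array skips index alignment, which may matter for large frames; for four rows the two approaches are interchangeable.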

Answer 2: (score: 1)
