I have the following pandas DataFrame:
data = DataFrame({'A' : ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'], 'B' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], 'C' :[2,1,2,1,2,1,2,1]})
which looks like:
A B C
0 foo one 2
1 bar one 1
2 foo two 2
3 bar three 1
4 foo two 2
5 bar two 1
6 foo one 2
7 foo three 1
What I need is the mean of C taken over each unique combination of A and B. For example, for foo:
A B C
foo one 2
foo two 2
foo three 1
mean = 1.66666667
The output should then report this mean for each value of A, i.e.:
foo 1.666667
bar 1
I tried:
data.groupby(['A'], sort=False, as_index=False).mean()
but it gives me:
foo 1.8
bar 1
Is there a way to compute the mean of only the unique combinations? If so, how?
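Answer 0: (score: 1)
Yes, here is a solution that does what you want. First, group by columns A and B so that each unique combination collapses into a single row. Then group that result by column A and compute mean().
You can do it like this: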
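from pandas import DataFrame
data = DataFrame({'A' : ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'], 'B' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], 'C' :[2.0,1,2,1,2,1,2,1]})
# Step 1: one row per unique (A, B) combination, holding the mean of C
data = data.groupby(['A','B'], sort=False, as_index=False).mean()
# Step 2: average those rows per A; selecting 'C' keeps the string column B out of the mean
print(data.groupby('A', sort=False, as_index=False)['C'].mean())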
Output:
A C
0 foo 1.666667
1 bar 1.000000
When you call data.groupby(['A'], sort=False, as_index=False).mean(), you compute the mean of every value in column C within each group of column A, repeated combinations included. That is why it returns:
foo 1.8 (9/5)
bar 1.0 (3/3)
I think this should answer your question :)
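To see the difference between the two means side by side, here is a minimal sketch using the question's data. The variable names naive, per_pair and per_a are only illustrative, and the ['C'] selection is an assumption made so the example also runs on recent pandas versions, where taking the mean of a string column raises an error.

```python
import pandas as pd

data = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [2, 1, 2, 1, 2, 1, 2, 1],
})

# Mean over every row of each A group: foo has five rows (2, 2, 2, 2, 1).
naive = data.groupby('A', sort=False)['C'].mean()

# Step 1: collapse each (A, B) pair to a single row holding its mean.
per_pair = data.groupby(['A', 'B'], sort=False, as_index=False)['C'].mean()

# Step 2: average those per-pair means within each A group.
per_a = per_pair.groupby('A', sort=False)['C'].mean()

print(naive)   # mean over all rows per A
print(per_a)   # mean over unique (A, B) combinations per A
```

For foo, the first result averages five rows (9/5) while the second averages three per-pair means (5/3).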
Answer 1: (score: 1)
This is essentially the same as @S_A's answer, but a bit more concise.
You can compute the mean for each combination of A and B with:
In [41]: df.groupby(['A', 'B']).mean()
Out[41]:
           C
A   B
bar one    1
    three  1
    two    1
foo one    2
    three  1
    two    2
Then compute the mean over A with:
In [42]: df.groupby(['A', 'B']).mean().groupby(level='A').mean()
Out[42]:
C
A
bar 1.000000
foo 1.666667
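If this pattern is needed more than once, the chained call can be wrapped in a small helper. The sketch below is only one possible way to package it: the function name mean_of_group_means and its parameters are hypothetical, not part of pandas.

```python
import pandas as pd

def mean_of_group_means(df, outer, inner, value):
    """Average `value` per (outer, inner) pair, then average those means per `outer`."""
    # One mean per unique (outer, inner) combination, indexed by both keys.
    pair_means = df.groupby([outer, inner])[value].mean()
    # Average the per-pair means across the outer index level.
    return pair_means.groupby(level=outer).mean()

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [2, 1, 2, 1, 2, 1, 2, 1],
})

result = mean_of_group_means(df, 'A', 'B', 'C')
print(result)
```

The helper returns a Series indexed by A rather than a DataFrame, because a single value column is selected before averaging.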
Answer 2: (score: 0)
This worked for me:
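# Drop fully identical rows so each repeated (A, B, C) row is counted once
test = data.drop_duplicates()
# Select C so the string column B is not passed to mean()
test = test.groupby(['A'])[['C']].mean()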
Output:
C
A
bar 1.000000
foo 1.666667
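One caveat worth checking: drop_duplicates only removes rows that are identical in every column, so it matches the two-step groupby only when repeated (A, B) pairs always carry the same C, as they do in the question's data. The sketch below uses a small made-up frame (not from the question) in which the same pair has two different C values, to show where the two approaches diverge.

```python
import pandas as pd

# Hypothetical data: the pair (foo, one) appears with two different C values.
df = pd.DataFrame({
    'A': ['foo', 'foo', 'foo'],
    'B': ['one', 'one', 'two'],
    'C': [1, 3, 5],
})

# No row is an exact duplicate, so all three rows are averaged: (1 + 3 + 5) / 3.
via_drop = df.drop_duplicates().groupby('A')['C'].mean()

# Per-pair means are one -> 2 and two -> 5, so the result is (2 + 5) / 2.
via_pairs = df.groupby(['A', 'B'])['C'].mean().groupby(level='A').mean()

print(via_drop)
print(via_pairs)
```

Which of the two is right depends on whether "unique combination" means unique rows or unique (A, B) keys.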