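I have some monthly data that I am trying to summarize with Pandas, and I need to count how many times each unique entry occurs in each month. Here is some sample code showing what I am trying to do: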
import pandas as pd
mnths = ['JAN','FEB','MAR','APR']
custs = ['A','B','C']
testFrame = pd.DataFrame(index=custs, columns=mnths)
# Use .loc[row, column] instead of chained indexing, which may assign to a copy
testFrame.loc['A','JAN'] = 'purchased Prod'
testFrame.loc['B','JAN'] = 'No Data'
testFrame.loc['C','JAN'] = 'Purchased Competitor'
testFrame.loc['A','FEB'] = 'purchased Prod'
testFrame.loc['B','FEB'] = 'purchased Prod'
testFrame.loc['C','FEB'] = 'purchased Prod'
testFrame.loc['A','MAR'] = 'No Data'
testFrame.loc['B','MAR'] = 'No Data'
testFrame.loc['C','MAR'] = 'Purchased Competitor'
testFrame.loc['A','APR'] = 'Purchased Competitor'
testFrame.loc['B','APR'] = 'purchased Prod'
testFrame.loc['C','APR'] = 'Purchased Competitor'
uniqueValues = pd.Series(testFrame.values.ravel()).unique()
#CODE TO GET COUNT OF ENTRIES IN testFrame BY UNIQUE VALUE
Desired output:
JAN FEB MAR APR
purchased Prod ? ? ? ?
Purchased Competitor ? ? ? ?
No Data ? ? ? ?
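I can get the unique values and create a new DataFrame with the correct axes/columns.
I started from these two questions: "Pandas: Counting unique values in a dataframe" and "Find unique values in a Pandas dataframe, irrespective of row or column location".
But I still cannot get the output into quite the format I need. I am not sure how to apply the df.groupby or df.apply syntax to what I am working with.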
Answer 0 (score: 5)
The fillna(0) call is optional; without it, a value that never occurs in a given month shows up as NaN.
In [40]: testFrame.apply(pd.Series.value_counts).fillna(0)
Out[40]:
JAN FEB MAR APR
No Data 1 0 2 0
Purchased Competitor 1 0 1 2
purchased Prod 1 3 0 1
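For readers who want to run this outside an IPython session, here is a minimal self-contained sketch of the same idea. The names `test_frame`, `counts`, and `row_order` are illustrative and not from the original post; the frame is built from a dict only for brevity, and the final `astype(int)` and `reindex` steps are optional additions rather than part of the answer above.

```python
import pandas as pd

# Same data as the question's testFrame, built from a dict for brevity.
test_frame = pd.DataFrame(
    {
        "JAN": ["purchased Prod", "No Data", "Purchased Competitor"],
        "FEB": ["purchased Prod", "purchased Prod", "purchased Prod"],
        "MAR": ["No Data", "No Data", "Purchased Competitor"],
        "APR": ["Purchased Competitor", "purchased Prod", "Purchased Competitor"],
    },
    index=["A", "B", "C"],
)

# Count each value per column; values absent from a month come back as NaN.
counts = test_frame.apply(pd.Series.value_counts)

# Replace NaN with 0 and cast back to integers (NaN forces float columns).
counts = counts.fillna(0).astype(int)

# Optional: put the rows in the order shown in the question's desired output.
row_order = ["purchased Prod", "Purchased Competitor", "No Data"]
counts = counts.reindex(row_order)

print(counts)
```

The reindex step only reorders rows that already exist, so it does not change any counts.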
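This is a neat apply trick. I will create a function that prints what is passed in (you could even drop into a debugger there), which makes it easy to see what is happening. Note that JAN is printed twice below: older pandas versions call the function twice on the first column.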
In [20]: def f(x):
....: print(x)
....: return x.value_counts()
....:
In [21]: testFrame.apply(f)
A purchased Prod
B No Data
C Purchased Competitor
Name: JAN, dtype: object
A purchased Prod
B No Data
C Purchased Competitor
Name: JAN, dtype: object
A purchased Prod
B purchased Prod
C purchased Prod
Name: FEB, dtype: object
A No Data
B No Data
C Purchased Competitor
Name: MAR, dtype: object
A Purchased Competitor
B purchased Prod
C Purchased Competitor
Name: APR, dtype: object
Out[21]:
JAN FEB MAR APR
No Data 1 NaN 2 NaN
Purchased Competitor 1 NaN 1 2
purchased Prod 1 3 NaN 1
[3 rows x 4 columns]
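So it runs value_counts on each column and then concatenates the results together (with the correct labels). The same call on a single Series looks like this: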
In [22]: testFrame.iloc[0].value_counts()
Out[22]:
purchased Prod 2
Purchased Competitor 1
No Data 1
dtype: int64
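The "count per column, then concatenate" description can be spelled out by hand. The following is a rough sketch, not a description of what pandas does internally: it builds one value_counts Series per column and aligns them with `pd.concat`. The names `test_frame`, `per_column`, `manual`, and `via_apply` are illustrative.

```python
import pandas as pd

# Same data as the question's testFrame, built from a dict for brevity.
test_frame = pd.DataFrame(
    {
        "JAN": ["purchased Prod", "No Data", "Purchased Competitor"],
        "FEB": ["purchased Prod", "purchased Prod", "purchased Prod"],
        "MAR": ["No Data", "No Data", "Purchased Competitor"],
        "APR": ["Purchased Competitor", "purchased Prod", "Purchased Competitor"],
    },
    index=["A", "B", "C"],
)

# One value_counts Series per column, keyed by column name.
per_column = {col: test_frame[col].value_counts() for col in test_frame.columns}

# Align the Series on their index labels and place them side by side.
# Labels missing from a column's counts are filled with NaN.
manual = pd.concat(per_column, axis=1)

# The one-liner from the answer, for comparison.
via_apply = test_frame.apply(pd.Series.value_counts)

print(manual.fillna(0).sort_index())
print(via_apply.fillna(0).sort_index())
```

The two frames should hold the same counts; row order may differ before sorting because each approach builds the row labels differently.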
Answer 1 (score: 0)
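# .ix was removed in pandas 1.0; .iloc selects columns by position
li = [testFrame.iloc[:,i].value_counts() for i in range(len(mnths))]
frame = pd.DataFrame(li, index=mnths)
# .T transposes the frame, equivalent to the deprecated swapaxes(0,1)
frame.fillna(value=0).T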
Out[42]:
JAN FEB MAR APR
No Data 1 0 2 0
Purchased Competitor 1 0 1 2
purchased Prod 1 3 0 1
[3 rows x 4 columns]
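A different route to the same table, not taken from either answer, is to reshape the frame to long form and cross-tabulate it. This is only a sketch of that alternative using `DataFrame.melt` and `pd.crosstab`; the names `months`, `test_frame`, `long_form`, and `counts` and the column labels "month" and "status" are illustrative choices.

```python
import pandas as pd

months = ["JAN", "FEB", "MAR", "APR"]

# Same data as the question's testFrame, built from a dict for brevity.
test_frame = pd.DataFrame(
    {
        "JAN": ["purchased Prod", "No Data", "Purchased Competitor"],
        "FEB": ["purchased Prod", "purchased Prod", "purchased Prod"],
        "MAR": ["No Data", "No Data", "Purchased Competitor"],
        "APR": ["Purchased Competitor", "purchased Prod", "Purchased Competitor"],
    },
    index=["A", "B", "C"],
)

# Reshape to long form: one row per (month, status) pair.
long_form = test_frame.melt(var_name="month", value_name="status")

# Cross-tabulate status against month; combinations that never occur become 0.
counts = pd.crosstab(long_form["status"], long_form["month"])

# crosstab sorts column labels alphabetically, so restore calendar order.
counts = counts.reindex(columns=months)

print(counts)
```

Because crosstab fills missing combinations with 0 itself, no fillna step is needed with this approach.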