I have a data frame like this:
evt pcle bin_0 bin_1 bin_2 ... bin_49
1 pi 1 0 0 0
1 pi 1 0 0 0
1 k 0 0 0 1
1 pi 0 0 1 0
2 pi 0 0 1 0
2 k 0 1 0 0
3 J 0 1 0 0
3 pi 0 0 0 1
3 pi 1 0 0 0
3 k 0 1 0 0
...
5000 J 0 0 1 0
5000 pi 0 1 0 0
5000 k 0 0 0 1
With this information, I would like to create several other data frames df_{evt} (or would a dictionary perhaps be better?):
df_1 :
pcle cant bin_0 bin_1 bin_2 ... bin_49
pi 3 2 0 1 0
k 1 0 0 0 1
df_2 :
pcle cant bin_0 bin_1 bin_2 ... bin_49
pi 1 0 0 1 0
k 1 0 1 0 0
There will be 5000 data frames in total (one per evt), where in each data frame:
* the column "cant" holds the number of occurrences of each "pcle" in that particular "evt";
* bin_0 ... bin_49 hold the sum of the values for that "pcle" in that "evt".
What is the best way to achieve this?
Answer 0: (score: 1)
Here is a possible solution:
import pandas as pd
columns = ["evt", "pcle", "bin_0", "bin_1", "bin_2", "bin_3"]
data = [[1, "pi", 1, 0, 0, 0],
[1, "pi", 0, 0, 0, 0],
[1, "k", 0, 0, 0, 1],
[1, "pi", 0, 0, 1, 0],
[2, "pi", 0, 0, 1, 0],
[2, "k", 0, 1, 0, 0],
[3, "J", 0, 1, 0, 0],
[3, "pi", 0, 0, 0, 1],
[3, "pi", 1, 0, 0, 0],
[3, "k", 0, 1, 0, 0]]
df = pd.DataFrame(data=data, columns=columns)
# group the rows by event and particle type
grouped = df.groupby(["evt", "pcle"])
# sum the bin_X columns within each (evt, pcle) group
# (grouped.sum() replaces aggregate(np.sum), which newer pandas versions warn about)
df_t = grouped.sum()
# move pcle from the index to a regular column; evt stays as the index
df_t.reset_index(level="pcle", inplace=True)
# count the occurrences of each pcle per evt and place the column right after pcle
# (grouped.size() is in the same sorted (evt, pcle) order as df_t)
df_t.insert(1, "cant", grouped.size().values)
# select a single evt with .loc
df_t.loc[1]
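One detail about the final `df_t.loc[1]` step is worth a small illustration: because `evt` is a non-unique index, selecting an event that contains only one particle type gives back a Series rather than a DataFrame. The sketch below uses hypothetical toy data and a hypothetical variable name `summary` (neither is from the question) to show the difference, and how passing a list to `.loc` keeps the result a DataFrame.

```python
import pandas as pd

# Hypothetical toy data: event 1 has two particle types, event 2 has only one.
df = pd.DataFrame(
    {
        "evt": [1, 1, 1, 2],
        "pcle": ["pi", "pi", "k", "pi"],
        "bin_0": [1, 1, 0, 0],
        "bin_1": [0, 0, 1, 1],
    }
)

# Same aggregation pattern as in the answer: sum the bins, then add the counts.
grouped = df.groupby(["evt", "pcle"])
summary = grouped.sum().reset_index(level="pcle")
summary.insert(1, "cant", grouped.size().values)

two_rows = summary.loc[1]        # several rows match, so this is a DataFrame
one_row = summary.loc[2]         # a single row matches, so this is a Series
always_frame = summary.loc[[2]]  # list selector: a DataFrame even for one row
```

If downstream code expects a DataFrame for every event, the list form `summary.loc[[evt]]` is probably the safer choice.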
If you want to turn it into a dictionary, you can run:
d = {i: j.reset_index(drop=True) for i, j in df_t.groupby(df_t.index)}
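As a possible alternative, the dictionary can also be built straight from the original frame, one event at a time. This is only a sketch using the same sample rows as above: the helper name `summarize`, the `bin_cols` list, and the `frames` dictionary are hypothetical, and it assumes every bin column name starts with `bin_`. Passing `sort=False` keeps the particles in their order of first appearance, as in the tables in the question.

```python
import pandas as pd

columns = ["evt", "pcle", "bin_0", "bin_1", "bin_2", "bin_3"]
data = [[1, "pi", 1, 0, 0, 0],
        [1, "pi", 0, 0, 0, 0],
        [1, "k", 0, 0, 0, 1],
        [1, "pi", 0, 0, 1, 0],
        [2, "pi", 0, 0, 1, 0],
        [2, "k", 0, 1, 0, 0],
        [3, "J", 0, 1, 0, 0],
        [3, "pi", 0, 0, 0, 1],
        [3, "pi", 1, 0, 0, 0],
        [3, "k", 0, 1, 0, 0]]
df = pd.DataFrame(data, columns=columns)

# Assumption: every histogram column is named bin_<number>.
bin_cols = [c for c in df.columns if c.startswith("bin_")]


def summarize(event_df):
    """Return one row per pcle with its count ("cant") and summed bins."""
    by_pcle = event_df.groupby("pcle", sort=False)
    out = by_pcle[bin_cols].sum()
    # size() is indexed by pcle, so insert() aligns it with the summed rows
    out.insert(0, "cant", by_pcle.size())
    return out.reset_index()


# One small summary frame per event, keyed by the evt value.
frames = {evt: summarize(group) for evt, group in df.groupby("evt")}
```

Here `frames[1]` plays the role of `df_1` from the question. For 5000 events, keeping the single aggregated frame and selecting with `.loc` is likely cheaper than holding 5000 small frames, so the dictionary is mainly a convenience.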