Create several new dataframes or dictionaries from one dataframe

Date: 2019-02-16 22:36:05

Tags: python pandas dataframe

I have a dataframe like this:

evt    pcle    bin_0    bin_1    bin_2    ...    bin_49
 1      pi      1        0         0               0 
 1      pi      1        0         0               0 
 1      k       0        0         0               1 
 1      pi      0        0         1               0 
 2      pi      0        0         1               0 
 2      k       0        1         0               0 
 3      J       0        1         0               0 
 3      pi      0        0         0               1 
 3      pi      1        0         0               0 
 3      k       0        1         0               0 
 ...
 5000   J       0        0         1               0 
 5000   pi      0        1         0               0 
 5000   k       0        0         0               1

From this information, I want to create several other dataframes df_{evt} (or would a dictionary be a better choice?):

df_1 : 
pcle    cant    bin_0    bin_1    bin_2   ...    bin_49        
 pi      3        2        0        1              0
  k      1        0        0        0              1

df_2 : 
pcle    cant    bin_0    bin_1    bin_2   ...    bin_49        
 pi      1        0        0        1              0
  k      1        0        1        0              0

There will be 5000 dataframes in total (one per evt), and in each dataframe:

* the column "cant" holds the number of occurrences of each "pcle" in that particular "evt".

* bin_0 ... bin_49 hold the sum of the values for that "pcle" in that particular "evt".

What is the best way to achieve this?

1 Answer:

Answer 0 (score: 1):

Here is a possible solution:

import pandas as pd
import numpy as np
columns = ["evt", "pcle", "bin_0", "bin_1", "bin_2", "bin_3"]
data = [[1, "pi", 1, 0, 0, 0],
        [1, "pi", 0, 0, 0, 0],
        [1, "k", 0, 0, 0, 1],
        [1, "pi", 0, 0, 1, 0],
        [2, "pi", 0, 0, 1, 0],
        [2, "k", 0, 1, 0, 0],
        [3, "J", 0, 1, 0, 0],
        [3, "pi", 0, 0, 0, 1],
        [3, "pi", 1, 0, 0, 0],
        [3, "k", 0, 1, 0, 0]]

df = pd.DataFrame(data=data, columns=columns)

# group your data by the columns you want
grouped = df.groupby(["evt", "pcle"])

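# sum the bin_X columns within each (evt, pcle) group
# (grouped.sum() is equivalent to aggregate(np.sum) and avoids the
# warning that recent pandas versions emit for NumPy callables)
df_t = grouped.sum()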

# move pcle from index to column
df_t.reset_index(level=["pcle"], inplace=True)

# count occurrences of pcle
df_t["cant"] = grouped.size().values

# select the rows of a single evt (here evt == 1) with .loc
df_t.loc[1]
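
One thing to watch with the .loc lookup: if an evt happens to contain only one kind of pcle, df_t.loc[evt] gives back a Series instead of a DataFrame. A small sketch of that behaviour, using a made-up df_t with the same layout as above (the values are illustrative only, not taken from the question):

```python
import pandas as pd

# Hypothetical stand-in for df_t: evt is the index, evt 2 has a single row
df_t = pd.DataFrame(
    {"pcle": ["k", "pi", "J"], "bin_0": [0, 1, 0], "cant": [1, 3, 1]},
    index=pd.Index([1, 1, 2], name="evt"),
)

several_rows = df_t.loc[1]    # evt 1 has two rows, so this is a DataFrame
single_row = df_t.loc[2]      # evt 2 has one row, so this is a Series
always_frame = df_t.loc[[2]]  # a list selector always returns a DataFrame
```

Passing a list, as in df_t.loc[[evt]], keeps the result type the same for every evt.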

If you want to turn it into a dictionary, you can run:

d = {evt: sub.reset_index(drop=True) for evt, sub in df_t.groupby(level="evt")}
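
If the per-evt frames should follow the layout shown in the question (with "cant" right after "pcle" and no "evt" column), the dictionary can also be built straight from the original frame. This is only a minimal sketch: it assumes every bin column name starts with "bin_", the names bin_cols, summary and frames are illustrative, and the sample data is shortened to two events.

```python
import pandas as pd

columns = ["evt", "pcle", "bin_0", "bin_1", "bin_2", "bin_3"]
data = [[1, "pi", 1, 0, 0, 0],
        [1, "pi", 0, 0, 0, 0],
        [1, "k", 0, 0, 0, 1],
        [1, "pi", 0, 0, 1, 0],
        [2, "pi", 0, 0, 1, 0],
        [2, "k", 0, 1, 0, 0]]
df = pd.DataFrame(data=data, columns=columns)

# assumption: all bin columns share the "bin_" prefix
bin_cols = [c for c in df.columns if c.startswith("bin_")]

# sort=False keeps the groups in order of first appearance
grouped = df.groupby(["evt", "pcle"], sort=False)

# sum the bins per (evt, pcle), then put the row count in front as "cant"
summary = grouped[bin_cols].sum()
summary.insert(0, "cant", grouped.size())
summary = summary.reset_index()

# one DataFrame per evt, keyed by evt, without the evt column
frames = {
    evt: sub.drop(columns="evt").reset_index(drop=True)
    for evt, sub in summary.groupby("evt")
}

print(frames[1])
```

With 5000 events a dictionary keyed by evt is usually easier to handle than 5000 separately named variables, since frames[evt] can be looked up or looped over directly.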