Merging a list of MultiIndex Series in Python

Time: 2017-12-01 15:08:12

Tags: python-3.x pandas numpy

I am working with thousands of pd.Series objects that have a MultiIndex made up of two static levels, one dynamic level, and then a timestamp level:

import numpy as np
import pandas as pd

start = np.concatenate((np.random.rand(3), [np.nan]*3))
end = np.concatenate(([np.nan]*3, np.random.rand(3)))

# codes= was called labels= in pandas versions before 0.24
index1 = pd.MultiIndex(levels = [["X"], ["Y"], ["A"], ["d1","d2","d3","d4","d5","d6"]],
                       codes = [[0,0,0,0,0,0], [0,0,0,0,0,0], [0,0,0,0,0,0], [0,1,2,3,4,5]],
                       names = ["static1", "static2", "dynamo", "timestamps"])
i1_start = pd.Series(start, index=index1, name="col1")
i1_end = pd.Series(end, index=index1, name="col2")

# same layout, but the dynamic level is "B" instead of "A"
index2 = pd.MultiIndex(levels = [["X"], ["Y"], ["B"], ["d1","d2","d3","d4","d5","d6"]],
                       codes = [[0,0,0,0,0,0], [0,0,0,0,0,0], [0,0,0,0,0,0], [0,1,2,3,4,5]],
                       names = ["static1", "static2", "dynamo", "timestamps"])
i2_start = pd.Series(start, index=index2, name="col1")
i2_end = pd.Series(end, index=index2, name="col2")

data = [i1_start, i1_end, i2_start, i2_end]
df = pd.DataFrame(data).T
df

Here is the result of converting it to a DataFrame:

                                       col1      col2      col1      col2
static1 static2 dynamo timestamps
X       Y       A      d1          0.248504       NaN       NaN       NaN
                       d2          0.424774       NaN       NaN       NaN
                       d3          0.333638       NaN       NaN       NaN
                       d4               NaN  0.987744       NaN       NaN
                       d5               NaN  0.093231       NaN       NaN
                       d6               NaN  0.918666       NaN       NaN
                B      d1               NaN       NaN  0.248504       NaN
                       d2               NaN       NaN  0.424774       NaN
                       d3               NaN       NaN  0.333638       NaN
                       d4               NaN       NaN       NaN  0.987744
                       d5               NaN       NaN       NaN  0.093231
                       d6               NaN       NaN       NaN  0.918666

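I am looking for advice on how to concat/merge/join the Series so that those sharing the same Series.name are grouped into one column. The columns should line up, instead of leaving whole triangles that contain nothing but null values.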

2 Answers:

Answer 0: (score: 2)

I think you need concat followed by sum or max, with the parameters axis=1 and level=0:

data = [i1_start, i1_end, i2_start, i2_end]
# note: the level= argument of sum/max was removed in pandas 2.0
df = pd.concat(data, axis=1).sum(axis=1, level=0)
# same as
# df = pd.concat(data, axis=1).groupby(axis=1, level=0).sum()

# alternative
df = pd.concat(data, axis=1).max(axis=1, level=0)

print (df)
                                       col1      col2
static1 static2 dynamo timestamps                    
X       Y       A      d1          0.771148       NaN
                       d2          0.074757       NaN
                       d3          0.526310       NaN
                       d4               NaN  0.975088
                       d5               NaN  0.992226
                       d6               NaN  0.465135
                B      d1          0.771148       NaN
                       d2          0.074757       NaN
                       d3          0.526310       NaN
                       d4               NaN  0.975088
                       d5               NaN  0.992226
                       d6               NaN  0.465135
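
On recent pandas releases, where reductions no longer accept level=, the same idea (collapsing columns that share a label) can be sketched roughly as below. The helper make_series and the sample values are illustrative assumptions, not part of the question. min_count=1 is used so that a cell stays NaN when every contributing value is NaN, matching the output shown above.

```python
import numpy as np
import pandas as pd


def make_series(dynamo, values, name):
    # Hypothetical helper: builds a Series shaped like the ones in the question.
    index = pd.MultiIndex.from_product(
        [["X"], ["Y"], [dynamo], ["d1", "d2", "d3", "d4"]],
        names=["static1", "static2", "dynamo", "timestamps"],
    )
    return pd.Series(values, index=index, name=name)


# Illustrative values, not the random numbers from the question.
start = [0.1, 0.2, np.nan, np.nan]
end = [np.nan, np.nan, 0.3, 0.4]
data = [
    make_series("A", start, "col1"),
    make_series("A", end, "col2"),
    make_series("B", start, "col1"),
    make_series("B", end, "col2"),
]

# Side-by-side concat gives duplicate column labels: col1, col2, col1, col2.
wide = pd.concat(data, axis=1)

# Transpose so the duplicate labels become row labels, group on them,
# reduce, and transpose back. min_count=1 keeps all-NaN cells as NaN.
merged = wide.T.groupby(level=0).sum(min_count=1).T

print(merged)
```

Swapping sum(min_count=1) for max() or first() should give the same frame here, since each cell has at most one non-null contributor.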

Answer 1: (score: 1)

How about this?

df.fillna(0).sum(1)

That is, replace NaN with zero and sum all the columns in each row.
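
To make the effect of that one-liner concrete, here is a small sketch under the same assumptions as the question (the helper make_series and the sample values are hypothetical). It shows that the row sum yields a single Series, so col1 and col2 are no longer separate and missing cells become 0. For comparison, it also shows one possible way to keep the two columns: group the Series by name, stack each group vertically, then place the groups side by side.

```python
from collections import defaultdict

import numpy as np
import pandas as pd


def make_series(dynamo, values, name):
    # Hypothetical helper: builds a Series shaped like the ones in the question.
    index = pd.MultiIndex.from_product(
        [["X"], ["Y"], [dynamo], ["d1", "d2", "d3", "d4"]],
        names=["static1", "static2", "dynamo", "timestamps"],
    )
    return pd.Series(values, index=index, name=name)


# Illustrative values, not the random numbers from the question.
start = [0.1, 0.2, np.nan, np.nan]
end = [np.nan, np.nan, 0.3, 0.4]
series_list = [
    make_series("A", start, "col1"),
    make_series("A", end, "col2"),
    make_series("B", start, "col1"),
    make_series("B", end, "col2"),
]

# Row sum after fillna(0): the result is ONE Series with no col1/col2 split.
wide = pd.concat(series_list, axis=1)
row_totals = wide.fillna(0).sum(axis=1)

# Alternative that keeps the columns: bucket the Series by their name,
# concatenate each bucket along the index, then join the buckets as columns.
by_name = defaultdict(list)
for s in series_list:
    by_name[s.name].append(s)
stacked = {name: pd.concat(parts) for name, parts in by_name.items()}
merged = pd.concat(stacked, axis=1)

print(row_totals)
print(merged)
```

The second approach never builds the sparse four-column frame, which may matter when there are thousands of Series.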