How to change the format of data stored in a DataFrame?

Posted: 2017-02-28 13:58:56

Tags: python python-2.7 pandas

I have the following data:

DF1

0     (AG, AD, AE)
1     (AG, AM, AF)
dtype: object

DF2

0    [99.0, 45.0, 99.0, 92.0, 140.0, 53.0, 185.0, 8...
1    [78.0, 52.0, 74.0, 29.0, 30.0, 57.0, 48.0, 39....

DF3

0    [19.0, 22.0, 13.0, 24.0, 70.0, 50.0, 185.0, 8...
1    [18.0, 33.0, 74.0, 29.0, 30.0, 77.0, 48.0, 39....

I want to combine these Series into a single DataFrame. If I do df = pd.DataFrame({"TYPE-1":df1,"TYPE-2":df2,"TYPE-3":df3}), I get:

TYPE-1        TYPE-2                          TYPE-3
(AG, AD, AE)  [99.0, 45.0, 99.0, 92.0,...]    [78.0, 52.0, 74.0, 29.0, ...]
(AG, AM, AF)  [78.0, 52.0, 74.0, 29.0,...]    [18.0, 33.0, 74.0, 29.0,...]

How can I change it to this format?

TYPE-1        TYPE-2         TYPE-3
(AG, AD, AE)  99.0           78.0
(AG, AD, AE)  45.0           52.0
...

1 answer:

Answer 0 (score: 1)

You need numpy.repeat to build the new column of repeated tuples, and chain.from_iterable to flatten the list columns:

import numpy as np
import pandas as pd
from itertools import chain

# sample data from another solution
df1 = pd.DataFrame(dict(tups=[('A', 'B'), ('C', 'D')]))
df2 = pd.DataFrame(dict(lsts=[[1, 2, 3, 4], [5, 6, 7, 8]]))
df3 = pd.DataFrame(dict(lsts=[[9, 10, 11, 12], [14, 15, 6, 4]]))

# repeat each tuple once per element of its list, then flatten the list columns
df = pd.DataFrame({
        "a": np.repeat(df1.tups.values, df2.lsts.str.len()),
        "b": list(chain.from_iterable(df2.lsts)),
        "c": list(chain.from_iterable(df3.lsts))})

print(df)

        a  b   c
0  (A, B)  1   9
1  (A, B)  2  10
2  (A, B)  3  11
3  (A, B)  4  12
4  (C, D)  5  14
5  (C, D)  6  15
6  (C, D)  7   6
7  (C, D)  8   4
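On a newer pandas (1.3 or later, an assumption, since the question is tagged python-2.7 and predates that release), the same reshaping can likely be done with DataFrame.explode, which accepts a list of columns. This is a swapped-in alternative, not the answer's method. The sample values below are made up and shortened to mimic the question's Series; a multi-column explode requires the lists in each row to have equal lengths.

```python
import pandas as pd

# Hypothetical, shortened sample data shaped like the question's Series
s1 = pd.Series([("AG", "AD", "AE"), ("AG", "AM", "AF")])
s2 = pd.Series([[99.0, 45.0, 99.0], [78.0, 52.0, 74.0]])
s3 = pd.Series([[19.0, 22.0, 13.0], [18.0, 33.0, 74.0]])

df = pd.DataFrame({"TYPE-1": s1, "TYPE-2": s2, "TYPE-3": s3})

# Explode both list columns together (pandas >= 1.3); TYPE-1 is repeated per element
out = df.explode(["TYPE-2", "TYPE-3"], ignore_index=True)

# explode leaves the exploded columns as object dtype, so cast them back to float
out[["TYPE-2", "TYPE-3"]] = out[["TYPE-2", "TYPE-3"]].astype(float)

print(out)
```

Compared with np.repeat plus chain.from_iterable, explode keeps the original column names and raises an error if the list lengths in a row do not match, instead of silently misaligning the columns.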