How do I iterate over rows, store the values in vectors, and use the vectors to compute a new column?

Asked: 2017-11-10 22:07:00

Tags: python pandas

My pandas DataFrame df looks like this:

          Column    

0     0 [
        { "weight": "40", "height": 4, "age": 13 },
        { "weight": "50", "height": 10, "age": 15 },
        { "weight": "30", "height": 5, "age": 25 },
        { "weight": "25", "height": 5, "age": 35 }
        ]

1     1 [
        { "weight": "60", "height": 6, "age": 45 },
        { "weight": "80", "height": 8, "age": 30 },
        { "weight": "90", "height": 9, "age": 20 },
        { "weight": "70", "height": 7, "age": 50 }
        ]

Desired output:

      weight            height           New_column (weight / height)
0     (40,50,30,25)     (4,10,5,5)       (10,5,6,5)
1     (60,80,90,70)     (6,8,9,7)        (10,10,10,10)

Could someone write pseudocode or an algorithm for this? I want to do it in pandas, but I can't figure out how.

2 answers:

Answer 0: (score: 0)

Simplify by flattening the lists of dicts into one row per dict:

df   # original

                                              Column
0  [{'weight': '40', 'height': 4, 'age': 13}, {'w...
1  [{'weight': '60', 'height': 6, 'age': 45}, {'w...

# requires: import numpy as np; import pandas as pd
df = pd.DataFrame(np.concatenate(df.Column).tolist()).astype(int)
df

   age  height weight
0   13       4     40
1   15      10     50
2   25       5     30
3   35       5     25
4   45       6     60
5   30       8     80
6   20       9     90
7   50       7     70

Create the new column, then group the rows in intervals of 4 (each original row held 4 dicts):

df['New_column'] = df.weight / df.height

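# select several columns with a list; a bare tuple of names is rejected by recent pandas versions
g = df.groupby(df.index // 4 * 4)\
       [['weight', 'height', 'New_column']].agg(lambda x: tuple(x.values))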

g
             weight         height                New_column
0  (40, 50, 30, 25)  (4, 10, 5, 5)     (10.0, 5.0, 6.0, 5.0)
4  (60, 80, 90, 70)   (6, 8, 9, 7)  (10.0, 10.0, 10.0, 10.0)
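
For reference, the steps above can be put together in one self-contained sketch. It rebuilds the sample data from the question and, instead of relying on every row holding exactly 4 dicts, tags each flattened record with the row it came from. The names `data`, `records`, `flat`, `result`, and the helper column `row` are assumptions made for this illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data mirroring the question; weights are strings, as in the original.
data = [
    [{"weight": "40", "height": 4, "age": 13},
     {"weight": "50", "height": 10, "age": 15},
     {"weight": "30", "height": 5, "age": 25},
     {"weight": "25", "height": 5, "age": 35}],
    [{"weight": "60", "height": 6, "age": 45},
     {"weight": "80", "height": 8, "age": 30},
     {"weight": "90", "height": 9, "age": 20},
     {"weight": "70", "height": 7, "age": 50}],
]
df = pd.DataFrame({"Column": data})

# Flatten: one record per dict, remembering the original row label
# in a hypothetical helper column named "row".
records = [
    {"row": i, **d}
    for i, dicts in df["Column"].items()
    for d in dicts
]
flat = pd.DataFrame(records).astype(int)

# Element-wise ratio on the flattened frame.
flat["New_column"] = flat["weight"] / flat["height"]

# Collapse back to one row per original row, with tuples as cell values.
result = flat.groupby("row")[["weight", "height", "New_column"]].agg(
    lambda s: tuple(s.tolist())
)
print(result)
```

Grouping on the stored row label rather than on `df.index // 4 * 4` should also cope with rows that hold different numbers of dicts, at the cost of one extra column.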

Answer 1: (score: 0)

You can keep the data in wide format and still get the desired weight:height ratio:

orig
                                             Columns
0  [{'weight': '40', 'height': 4, 'age': 13}, {'w...
1  [{'weight': '60', 'height': 6, 'age': 45}, {'w...

def extract(row, field):
    return [int(x[field]) for x in row.Columns]

df = orig.assign(weight=orig.apply(extract, args=("weight",), axis=1).values,
                 height=orig.apply(extract, args=("height",), axis=1).values)

df['ratio'] = df.apply(lambda x: pd.Series(x.weight)/pd.Series(x.height), 
                       axis=1).values.tolist()

df
          height            weight                     ratio
0  [4, 10, 5, 5]  [40, 50, 30, 25]     [10.0, 5.0, 6.0, 5.0]
1   [6, 8, 9, 7]  [60, 80, 90, 70]  [10.0, 10.0, 10.0, 10.0]
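
A self-contained variant of the same wide-format idea is sketched below. It is not the answer's exact code: it swaps the row-wise `apply` calls for `Series.map` and a NumPy division per row, which is an assumption about what reads more simply. The names `wide` and `ratios` are hypothetical, and `extract` here takes the list of dicts directly rather than a whole row.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question, stored in a column named "Columns"
# as in the answer above.
orig = pd.DataFrame({"Columns": [
    [{"weight": "40", "height": 4, "age": 13},
     {"weight": "50", "height": 10, "age": 15},
     {"weight": "30", "height": 5, "age": 25},
     {"weight": "25", "height": 5, "age": 35}],
    [{"weight": "60", "height": 6, "age": 45},
     {"weight": "80", "height": 8, "age": 30},
     {"weight": "90", "height": 9, "age": 20},
     {"weight": "70", "height": 7, "age": 50}],
]})


def extract(dicts, field):
    """Return the values of `field` from a list of dicts as a list of ints."""
    return [int(d[field]) for d in dicts]


# One list per row for each field; the data stays in wide format.
wide = pd.DataFrame({
    "weight": orig["Columns"].map(lambda dicts: extract(dicts, "weight")),
    "height": orig["Columns"].map(lambda dicts: extract(dicts, "height")),
})

# Element-wise division of the two lists in each row.
ratios = [
    (np.array(w) / np.array(h)).tolist()
    for w, h in zip(wide["weight"], wide["height"])
]
wide["ratio"] = pd.Series(ratios, index=wide.index)
print(wide)
```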