Changing a column's values after a groupby on another column (pandas dataframe)

Time: 2017-11-08 16:07:42

Tags: python pandas pandas-groupby

I have two dataframes. One holds the coordinates of places:

import numpy as np
import pandas as pd

coord = pd.DataFrame()
coord['Index'] = ['A','B','C']
coord['x'] = np.random.random(coord.shape[0])
coord['y'] = np.random.random(coord.shape[0])


coord 

  Index         x         y
0     A  0.888025  0.376416
1     B  0.052976  0.396243
2     C  0.564862  0.301380

The other holds several values measured at each place:

df = pd.DataFrame()
df['Index'] = ['A','A','B','B','B','C','C','C','C']
df['Value'] = np.random.random(df.shape[0])

df
  Index     Value
0     A  0.930298
1     A  0.144550
2     B  0.393952
3     B  0.680941
4     B  0.657807
5     C  0.704954
6     C  0.733328
7     C  0.099785
8     C  0.871678

I want to find an efficient way to assign the coordinates to the df dataframe. So far I have tried

df['x'] = np.zeros(df.shape[0])
df['y'] = np.zeros(df.shape[0])
for i in df.Index.unique():
    df.loc[df.Index == i, 'x'] = coord.loc[coord.Index == i,'x'].values
    df.loc[df.Index == i, 'y'] = coord.loc[coord.Index == i,'y'].values

This works and yields

  Index     Value         x         y
0     A  0.220323  0.983739  0.121289
1     A  0.115075  0.983739  0.121289
2     B  0.432688  0.809586  0.639811
3     B  0.106178  0.809586  0.639811
4     B  0.259465  0.809586  0.639811
5     C  0.804018  0.827192  0.156095
6     C  0.552053  0.827192  0.156095
7     C  0.412345  0.827192  0.156095
8     C  0.235106  0.827192  0.156095

But this is sloppy and inefficient. I tried using a groupby operation like this

df['x'] = np.zeros(df.shape[0])
df['y'] = np.zeros(df.shape[0])
gb = df.groupby('Index')
for k in gb.groups.keys():
    gb.get_group(k)['x'] = coord.loc[coord.Index == k, 'x']
    gb.get_group(k)['y'] = coord.loc[coord.Index == k, 'y']

But I get this warning:

/anaconda/lib/python2.7/site-packages/ipykernel_launcher.py:5: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I understand the problem, but I don't know how to get around it.

Any suggestions?

1 Answer:

Answer 0 (score: 1)

merge is what you are looking for.

df

  Index     Value
0     A  0.930298
1     A  0.144550
2     B  0.393952
3     B  0.680941
4     B  0.657807
5     C  0.704954
6     C  0.733328
7     C  0.099785
8     C  0.871678

coord

  Index         x         y
0     A  0.888025  0.376416
1     B  0.052976  0.396243
2     C  0.564862  0.301380

df.merge(coord, on='Index')

  Index     Value         x         y
0     A  0.930298  0.888025  0.376416
1     A  0.144550  0.888025  0.376416
2     B  0.393952  0.052976  0.396243
3     B  0.680941  0.052976  0.396243
4     B  0.657807  0.052976  0.396243
5     C  0.704954  0.564862  0.301380
6     C  0.733328  0.564862  0.301380
7     C  0.099785  0.564862  0.301380
8     C  0.871678  0.564862  0.301380
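
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same merge. The seeded generator, the how="left" and validate="many_to_one" arguments, and the names merged, lookup and x_via_map are my own additions for illustration and are not part of the answer above. It also shows Series.map as one possible alternative when only a single column is needed.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (the seed is arbitrary).
rng = np.random.default_rng(0)

coord = pd.DataFrame({
    "Index": ["A", "B", "C"],
    "x": rng.random(3),
    "y": rng.random(3),
})
df = pd.DataFrame({
    "Index": ["A", "A", "B", "B", "B", "C", "C", "C", "C"],
    "Value": rng.random(9),
})

# Left join: keep every row of df, in its original order.
# validate="many_to_one" raises if an Index value is duplicated in coord.
merged = df.merge(coord, on="Index", how="left", validate="many_to_one")

# Alternative for a single column: look values up in a Series keyed by Index.
lookup = coord.set_index("Index")
x_via_map = df["Index"].map(lookup["x"])

print(merged)
```

With how="left", a place that is missing from coord would get NaN coordinates instead of being dropped, which may or may not be what you want.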