Filling DataFrame columns when DataFrame and tuple values match

Asked: 2016-04-07 16:59:25

Tags: python python-2.7 pandas dataframe tuples

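I am trying to update my DataFrame by conditionally filling columns. I want to compare a value in each DataFrame row against the values in a tuple, and then use another value from that tuple to fill a different column of the same row.

For example: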

import pandas as pd

foo = pd.DataFrame({"TIME":([1,1,2,2,3,3,4,4,5,5,6,6]),
                 "PLACE": (["place1","place2","place1","place2","place1","place2","place1","place2","place1","place2","place1","place2"]),
                 "Xcords" :(["","","","","","","","","","","",""]),
                 "Ycords" :(["","","","","","","","","","","",""])})

and a list of tuples holding places and their x and y coordinates:

bar = [('place1','1','11'),('place3','3','33'),('place2','2','22')]

In the end, I would like to get the following:

     PLACE  TIME Xcords Ycords
0   place1     1      1     11
1   place2     1      2     22
2   place1     2      1     11
3   place2     2      2     22
4   place1     3      1     11
5   place2     3      2     22
6   place1     4      1     11
7   place2     4      2     22
8   place1     5      1     11
9   place2     5      2     22
10  place1     6      1     11
11  place2     6      2     22

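So, if the value in the DataFrame's "PLACE" column matches the first value of a tuple, Xcords and Ycords in that same row should be filled with the tuple's second and third values. This should apply to every occurrence, since a place can appear many times.

What is the correct syntax for something like this? I tried the following, but all of my X_cords and Y_cords rows end up with the same value: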

for i in bar:
    if any(foo["PLACE"]==i[0]):
        foo.X_cords = i[1]
        foo.Y_cords = i[2]

Is it also possible to avoid the for loop, since both datasets are very large?

1 Answer:

Answer 0 (score: 4)

Is this what you want?

In [191]: bar_df = pd.DataFrame(bar, columns=['PLACE','Xcords','Ycords'])

In [192]: bar_df
Out[192]:
    PLACE Xcords Ycords
0  place1      1     11
1  place3      3     33
2  place2      2     22

In [193]: pd.merge(foo[['PLACE','TIME']], bar_df, on='PLACE', how='left')
Out[193]:
     PLACE  TIME Xcords Ycords
0   place1     1      1     11
1   place2     1      2     22
2   place1     2      1     11
3   place2     2      2     22
4   place1     3      1     11
5   place2     3      2     22
6   place1     4      1     11
7   place2     4      2     22
8   place1     5      1     11
9   place2     5      2     22
10  place1     6      1     11
11  place2     6      2     22
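
For anyone who wants to run the merge outside an IPython session, here is a minimal self-contained sketch using the `foo` and `bar` data from the question. The empty placeholder columns are left out of `foo` because the merge supplies them, and the `unmatched` check is an extra added for illustration, not part of the original answer.

```python
import pandas as pd

# Data from the question; the empty Xcords/Ycords placeholders are omitted
# because the merge supplies those columns.
foo = pd.DataFrame({
    "TIME": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "PLACE": ["place1", "place2"] * 6,
})
bar = [("place1", "1", "11"), ("place3", "3", "33"), ("place2", "2", "22")]

# Turn the list of tuples into a lookup table keyed by PLACE.
bar_df = pd.DataFrame(bar, columns=["PLACE", "Xcords", "Ycords"])

# Left join: keep every row of foo and attach the coordinates of its PLACE.
result = foo[["PLACE", "TIME"]].merge(bar_df, on="PLACE", how="left")

# With how="left", a PLACE that is missing from bar gets NaN coordinates.
unmatched = result["Xcords"].isna().sum()
print(result)
print("rows without coordinates:", unmatched)
```

Because the join is a left join, `place3` (present in `bar` but not in `foo`) simply does not appear in the result.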

Or, as @Alexander mentioned:

In [235]: foo[['PLACE','TIME']].merge(bar_df, on='PLACE', how='left')
Out[235]:
     PLACE  TIME Xcords Ycords
0   place1     1      1     11
1   place2     1      2     22
2   place1     2      1     11
3   place2     2      2     22
4   place1     3      1     11
5   place2     3      2     22
6   place1     4      1     11
7   place2     4      2     22
8   place1     5      1     11
9   place2     5      2     22
10  place1     6      1     11
11  place2     6      2     22
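
As for why the loop in the question gives every row the same value: `any(foo["PLACE"]==i[0])` only checks whether the place occurs somewhere in the column, and assigning a scalar to a whole column broadcasts it to every row, so the last matching tuple wins. Note also that the question's code writes to `X_cords`/`Y_cords` while the columns are named `Xcords`/`Ycords`; assigning to an attribute that is not an existing column does not create a column. The sketch below shows two possible alternatives to the merge, assuming the same `foo` and `bar` as in the question. The names `looped`, `mapped`, `x_by_place` and `y_by_place` are made up for this example.

```python
import pandas as pd

foo = pd.DataFrame({
    "TIME": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "PLACE": ["place1", "place2"] * 6,
})
bar = [("place1", "1", "11"), ("place3", "3", "33"), ("place2", "2", "22")]

# Option 1: keep the loop over bar, but assign only to the matching rows.
looped = foo.copy()
looped["Xcords"] = ""
looped["Ycords"] = ""
for place, x, y in bar:
    mask = looped["PLACE"] == place   # boolean Series, one entry per row
    looped.loc[mask, "Xcords"] = x    # touches only rows where mask is True
    looped.loc[mask, "Ycords"] = y

# Option 2: no explicit loop over the DataFrame; look each PLACE up in a dict.
x_by_place = {place: x for place, x, _ in bar}
y_by_place = {place: y for place, _, y in bar}
mapped = foo.copy()
mapped["Xcords"] = mapped["PLACE"].map(x_by_place)
mapped["Ycords"] = mapped["PLACE"].map(y_by_place)

print(looped)
print(mapped)
```

Option 1 still iterates once per tuple in `bar`, so for a large `bar` the merge or the `map` lookup is likely the better fit.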