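I am trying to update my DataFrame by conditionally filling columns. I want to compare a value in each DataFrame row against the first element of each tuple in a list, and then fill other columns of that same row with the remaining elements of the matching tuple.
For example: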
import pandas as pd

foo = pd.DataFrame({"TIME": [1,1,2,2,3,3,4,4,5,5,6,6],
                    "PLACE": ["place1","place2","place1","place2","place1","place2","place1","place2","place1","place2","place1","place2"],
                    "Xcords": ["","","","","","","","","","","",""],
                    "Ycords": ["","","","","","","","","","","",""]})
and a list of tuples holding each place with its x and y coordinates:
bar = [('place1','1','11'),('place3','3','33'),('place2','2','22')]
In the end, I would like to get the following:
     PLACE  TIME  Xcords  Ycords
0   place1     1       1      11
1   place2     1       2      22
2   place1     2       1      11
3   place2     2       2      22
4   place1     3       1      11
5   place2     3       2      22
6   place1     4       1      11
7   place2     4       2      22
8   place1     5       1      11
9   place2     5       2      22
10  place1     6       1      11
11  place2     6       2      22
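So, if the value in a row's "PLACE" column matches the first element of a tuple, that row's Xcords and Ycords should be filled with the tuple's second and third elements. This should apply to every matching row, since a place can appear multiple times.
What is the correct syntax for something like this? I tried the following, but all of my X_cords and Y_cords rows end up with the same value: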
for i in bar:
    if any(foo["PLACE"] == i[0]):
        foo.X_cords = i[1]
        foo.Y_cords = i[2]
Is it also possible to avoid the for loop, since both datasets are very large?
Answer 0 (score: 4)
Is this what you want?
In [191]: bar_df = pd.DataFrame(bar, columns=['PLACE','Xcords','Ycords'])
In [192]: bar_df
Out[192]:
    PLACE  Xcords  Ycords
0  place1       1      11
1  place3       3      33
2  place2       2      22
In [193]: pd.merge(foo[['PLACE','TIME']], bar_df, on='PLACE', how='left')
Out[193]:
     PLACE  TIME  Xcords  Ycords
0   place1     1       1      11
1   place2     1       2      22
2   place1     2       1      11
3   place2     2       2      22
4   place1     3       1      11
5   place2     3       2      22
6   place1     4       1      11
7   place2     4       2      22
8   place1     5       1      11
9   place2     5       2      22
10  place1     6       1      11
11  place2     6       2      22
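To try the merge outside an IPython session, here is a minimal self-contained sketch of the same call as a plain script. It uses the data from the question; the `validate="many_to_one"` argument is an optional addition that is not part of the answer, included only on the assumption that each place should appear at most once in `bar`.

```python
import pandas as pd

# Data from the question (the empty Xcords/Ycords columns are not needed for the merge)
foo = pd.DataFrame({
    "TIME": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "PLACE": ["place1", "place2"] * 6,
})
bar = [("place1", "1", "11"), ("place3", "3", "33"), ("place2", "2", "22")]

# Turn the list of tuples into a lookup table with named columns
bar_df = pd.DataFrame(bar, columns=["PLACE", "Xcords", "Ycords"])

# how="left" keeps every row of foo; a place missing from bar_df would get NaN.
# validate="many_to_one" (optional, an assumption about the data) raises an error
# if a place is listed more than once in bar_df.
result = foo[["PLACE", "TIME"]].merge(
    bar_df, on="PLACE", how="left", validate="many_to_one"
)

print(result)
```

Note that the coordinates stay strings here, because they are strings in `bar`. "place3" does not appear in the result, since a left merge only keeps keys that occur in `foo`.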
Or, as @Alexander mentioned:
In [235]: foo[['PLACE','TIME']].merge(bar_df, on='PLACE', how='left')
Out[235]:
     PLACE  TIME  Xcords  Ycords
0   place1     1       1      11
1   place2     1       2      22
2   place1     2       1      11
3   place2     2       2      22
4   place1     3       1      11
5   place2     3       2      22
6   place1     4       1      11
7   place2     4       2      22
8   place1     5       1      11
9   place2     5       2      22
10  place1     6       1      11
11  place2     6       2      22
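If the goal is to fill the existing Xcords and Ycords columns of `foo` in place rather than build a new frame, one possible alternative (not from the answer above) is to turn `bar` into dictionaries and use `Series.map`. This is only a sketch; the names `x_lookup` and `y_lookup` are made up for illustration.

```python
import pandas as pd

# Data from the question
foo = pd.DataFrame({
    "TIME": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "PLACE": ["place1", "place2"] * 6,
    "Xcords": [""] * 12,
    "Ycords": [""] * 12,
})
bar = [("place1", "1", "11"), ("place3", "3", "33"), ("place2", "2", "22")]

# One lookup dict per coordinate, keyed by place name (hypothetical helper names)
x_lookup = {place: x for place, x, _ in bar}
y_lookup = {place: y for place, _, y in bar}

# map() looks up every PLACE value at once, with no explicit Python loop over rows;
# places that are missing from the dict would become NaN
foo["Xcords"] = foo["PLACE"].map(x_lookup)
foo["Ycords"] = foo["PLACE"].map(y_lookup)

print(foo)
```

This keeps the row order and index of `foo` unchanged. With many coordinate columns the merge is probably tidier, since it handles all of them in one call.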