Based on another data frame

Time: 2017-11-29 20:24:57

Tags: python pandas dataframe spreadsheet

I have two data frames, one containing concentration data and one containing coordinates:

Concentration data (conc):

    Sample  analParam                 Conc  Units
0   CW7-1   1,1,1-Trichloroethane     0     UG/L
1   CW7-1   1,1,2,2-Tetrachloroethane 0     UG/L
2   CW7-1   1,1,2-Trichloroethane     0     UG/L
3   CW7-1   1,1-Dichloroethane        0     UG/L
4   CW7-1   1,1-Dichloroethylene      0     UG/L
5   CW7-1   1,1-Dichloropropene       0     UG/L
6   CW7-1   1,2,3-Trichlorobenzene    0     UG/L
... ... ... ... ...
50311   VOA2-2  Tetrachloroethylene  1.8    MG/KG
50312   VOA2-2  Toluene              1.2    MG/KG
50313   VOA2-2  Trichloroethylene    1.8    MG/KG
50314   VOA2-2  Vinyl Chloride       1.8    MG/KG

Coordinate data (coord):

    Sample  x            y
0   CW7-1   320800.000  396500.000
1   CW7-2   320800.000  396500.000
2   CW7-3   320800.000  396500.000
3   FB06-17 0.000       0.000
4   FB06-18 0.000       0.000
5   FB06-19 0.000       0.000
6   FB07-08 0.000       0.000
... ... ... ...
453 TP21-1  318807.281  398547.485
454 TP21-2  318807.281  398547.485
455 TP24-1  318489.248  398544.797
456 VOA1-1  318500.582  398573.558
457 VOA1-2  318500.582  398573.558
458 VOA2-1  318536.337  398589.805
459 VOA2-2  318536.337  398589.805

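I want to add two columns to my concentration data frame that hold the coordinates for each row's sample ID. For example, the first six rows of the concentration data would get x = 320800 and y = 396500, because they all have the sample ID CW7-1: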

    Sample  analParam                 Conc  Units   x       y
0   CW7-1   1,1,1-Trichloroethane     0     UG/L    320800  396500   
1   CW7-1   1,1,2,2-Tetrachloroethane 0     UG/L    320800  396500
2   CW7-1   1,1,2-Trichloroethane     0     UG/L    320800  396500  
3   CW7-1   1,1-Dichloroethane        0     UG/L    320800  396500  
4   CW7-1   1,1-Dichloroethylene      0     UG/L    320800  396500  
5   CW7-1   1,1-Dichloropropene       0     UG/L    320800  396500

I tried a nested for loop, but with this many data points it is far too slow:

for index, row in conc.iterrows():
    for cindex, crow in coord.iterrows():
        if conc.iloc[index,0] == coord.iloc[cindex,0]:
            conc.at[index,4] = coord.iloc[cindex,1]
            conc.at[index,5] = coord.iloc[cindex,2]
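
For context on why this is slow: the nested loop compares every concentration row with every coordinate row, which for the roughly 50,000 and 460 rows shown above is on the order of 23 million comparisons, each going through `iloc`. A minimal sketch of a single-pass lookup follows. The tiny frames are stand-ins built from values in the question, `lookup` is a name made up for illustration, and the sketch assumes each sample ID appears only once in `coord`.

```python
import pandas as pd

# Tiny stand-ins for the real frames (values copied from the question).
conc = pd.DataFrame({
    "Sample": ["CW7-1", "CW7-1", "VOA2-2"],
    "analParam": ["1,1,1-Trichloroethane", "1,1-Dichloroethane", "Toluene"],
    "Conc": [0.0, 0.0, 1.2],
    "Units": ["UG/L", "UG/L", "MG/KG"],
})
coord = pd.DataFrame({
    "Sample": ["CW7-1", "VOA2-2"],
    "x": [320800.000, 318536.337],
    "y": [396500.000, 398589.805],
})

# Index the coordinates by sample ID once, then look each ID up directly.
# Assumption: every Sample value occurs at most once in coord.
lookup = coord.set_index("Sample")
conc["x"] = conc["Sample"].map(lookup["x"])
conc["y"] = conc["Sample"].map(lookup["y"])
print(conc)
```

Sample IDs that are missing from `coord` end up as NaN in the new columns.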

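I also tried using the apply function, but I keep getting errors. With this version, I get TypeError: __call__() takes from 1 to 2 positional arguments but 3 were given.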

def xcoord (i):
    for index, row in coord.iterrows():
        if i == coord.iloc[index,0] :
            return coord.iloc(index,4)
conc['Sample'].apply(xcoord)
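
The TypeError most likely comes from `coord.iloc(index,4)`: `iloc` is an indexer that is subscripted with square brackets, so calling it with parentheses passes `index` and `4` to its `__call__` method instead. In addition, `coord` has only three columns, so position 4 would be out of range even with brackets. Below is a sketch of the same row-by-row idea with those two points changed. `samples` and `x_values` are hypothetical names, the small frame is a stand-in for the real data, and this version is still slow on large inputs.

```python
import pandas as pd

# Small stand-in for the coordinate frame (values copied from the question).
coord = pd.DataFrame({
    "Sample": ["CW7-1", "VOA2-2"],
    "x": [320800.000, 318536.337],
    "y": [396500.000, 398589.805],
})
# Stand-in for conc['Sample'].
samples = pd.Series(["CW7-1", "VOA2-2", "CW7-1"], name="Sample")

def xcoord(sample_id):
    """Return the x coordinate for sample_id, or None if it is not in coord."""
    for pos in range(len(coord)):
        if sample_id == coord.iloc[pos, 0]:
            # Square brackets, and column position 1 (x) rather than 4.
            return coord.iloc[pos, 1]
    return None

x_values = samples.apply(xcoord)
print(x_values.tolist())
```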

1 answer:

Answer 0: (score: 0)

Thanks, Wen!

In[1]:
conc.merge(coord,on='Sample',how='left')

Out[1]:
Sample  analParam   Conc    Units   x   y
0   CW7-1   1,1,1-Trichloroethane   0   UG/L    320800.000  396500.000
1   CW7-1   1,1,2,2-Tetrachloroethane   0   UG/L    320800.000  396500.000
2   CW7-1   1,1,2-Trichloroethane   0   UG/L    320800.000  396500.000
3   CW7-1   1,1-Dichloroethane  0   UG/L    320800.000  396500.000
4   CW7-1   1,1-Dichloroethylene    0   UG/L    320800.000  396500.000
5   CW7-1   1,1-Dichloropropene 0   UG/L    320800.000  396500.000
6   CW7-1   1,2,3-Trichlorobenzene  0   UG/L    320800.000  396500.000
... ... ... ... ... ... ...
50311   VOA2-2  Tetrachloroethylene 1.8 MG/KG   318536.337  398589.805
50312   VOA2-2  Toluene 1.2 MG/KG   318536.337  398589.805
50313   VOA2-2  trans-1,3-Dichloropropene   1.8 MG/KG   318536.337  398589.805
50314   VOA2-2  Trichloroethylene   1.8 MG/KG   318536.337  398589.805
50315   VOA2-2  Vinyl Chloride  1.8 MG/KG   318536.337  398589.805
50316   VOA2-2  Xylenes (Total) 2.6 MG/KG   318536.337  398589.805
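
A small, self-contained sketch of the same left merge, with two optional checks added that are not part of the answer above: `validate="many_to_one"` makes pandas raise an error if a sample ID is duplicated in `coord`, and `indicator=True` adds a `_merge` column that shows which rows found no coordinates. The frames are tiny stand-ins, `merged` and `missing` are names chosen for illustration, and the sample ID "XX-0" is made up to show the unmatched case.

```python
import pandas as pd

conc = pd.DataFrame({
    # "XX-0" is a made-up ID with no entry in coord.
    "Sample": ["CW7-1", "CW7-1", "XX-0"],
    "analParam": ["1,1,1-Trichloroethane", "1,1-Dichloroethane", "Toluene"],
    "Conc": [0.0, 0.0, 1.2],
    "Units": ["UG/L", "UG/L", "MG/KG"],
})
coord = pd.DataFrame({
    "Sample": ["CW7-1", "VOA2-2"],
    "x": [320800.000, 318536.337],
    "y": [396500.000, 398589.805],
})

# A left merge keeps every row of conc; merge returns a new frame,
# so the result has to be assigned to a name.
merged = conc.merge(
    coord,
    on="Sample",
    how="left",
    validate="many_to_one",  # error if a Sample is duplicated in coord
    indicator=True,          # adds a _merge column: "both" or "left_only"
)

# Sample IDs that had no matching coordinates.
missing = merged.loc[merged["_merge"] == "left_only", "Sample"].unique()
print(merged)
print(list(missing))
```

With a left merge, the row count stays equal to that of `conc` as long as the sample IDs in `coord` are unique, which is what the `validate` argument checks.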