Question

I am new to pandas DataFrames and have the following question.

I have 3 DataFrames read from CSV files:

DataFrame 1 is named pdDop and contains the following entries:
```
DOP_WNC  DOP_TOW   DOP_NRSVS  DOP_PDOP  DOP_VDOP  DOP_HDOP  DOP_TDOP
1928     424800.0  4          5.81      5.36      2.24      2.72
1928     424801.0  4          5.81      5.36      2.24      2.72
1928     424802.0  4          5.80      5.35      2.24      2.72
1928     424803.0  4          5.80      5.35      2.24      2.72
1928     424804.0  4          5.80      5.35      2.24      2.72
1928     424805.0  4          5.80      5.35      2.24      2.72
```
DataFrame 2 is named pdGeod and contains the following entries:
```
GEOD_TOW  GEOD_MODE  GEOD_2D/3D  GEOD_Error  GEOD_NrSV  GEOD_Latitude  GEOD_Longitude  GEOD_Height
424800.0  1          0           0           4          0.8874         0.0767          150.4975
424801.0  1          0           0           4          0.8874         0.0767          150.5277
424802.0  1          0           0           4          0.8874         0.0767          150.5579
424803.0  1          0           0           4          0.8874         0.0767          150.5931
424804.0  1          0           0           4          0.8874         0.0767          150.6214
```
DataFrame 3 is named pdSatVis and contains the following entries:
```
VISIBILITY_TOW  VISIBILITY_SVID  VISIBILITY_AZIMUTH  VISIBILITY_ELEVATION
426175.0        92               54.50               35.43
426175.0        100              108.22              26.00
426175.0        88               49.29               10.48
426175.0        89               278.29              17.39
426176.0        92               54.50               35.43
426176.0        100              108.22              26.00
426176.0        88               49.29               10.48
426176.0        89               278.29              17.39
426177.0        92               54.48               35.42
426177.0        100              108.23              25.98
426177.0        88               49.28               10.45
426177.0        89               278.27              17.38
426178.0        92               54.48               35.42
```

I want to create a single DataFrame that combines all three based on the *_TOW (time of week) column in each DataFrame. Note that the last DataFrame, pdSatVis, has several rows for each VISIBILITY_TOW value, and each of those values corresponds to only 1 row in pdDop and pdGeod.

Answer 1

You can add a new column to merge on:

pdDop['TOW'] = pdDop['DOP_TOW']
pdGeod['TOW'] = pdGeod['GEOD_TOW']
pdSatVis['TOW'] = pdSatVis['VISIBILITY_TOW']
pd.merge(pd.merge(pdDop, pdGeod, how='outer'), pdSatVis, how='outer')
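A variation on the same idea, as a rough sketch: renaming each *_TOW column to a shared TOW, rather than copying it, leaves a single key column in the result. The small frames below are cut-down stand-ins for the CSV data, and the pdSatVis TOW values are made up so that they overlap with the other two frames.

```python
import pandas as pd

# Cut-down stand-ins for the three frames read from CSV (illustrative values).
pdDop = pd.DataFrame({"DOP_TOW": [424800.0, 424801.0], "DOP_PDOP": [5.81, 5.81]})
pdGeod = pd.DataFrame({"GEOD_TOW": [424800.0, 424801.0], "GEOD_Height": [150.4975, 150.5277]})
pdSatVis = pd.DataFrame({
    "VISIBILITY_TOW": [424800.0, 424800.0, 424801.0],  # assumed to overlap
    "VISIBILITY_SVID": [92, 100, 92],
})

# Rename each frame's time-of-week column to a common key name.
frames = [
    pdDop.rename(columns={"DOP_TOW": "TOW"}),
    pdGeod.rename(columns={"GEOD_TOW": "TOW"}),
    pdSatVis.rename(columns={"VISIBILITY_TOW": "TOW"}),
]

# Chain the outer merges on the shared key.
combined = (
    frames[0]
    .merge(frames[1], on="TOW", how="outer")
    .merge(frames[2], on="TOW", how="outer")
)
print(combined)
```

Because rename returns a new frame, the original pdDop, pdGeod and pdSatVis keep their column names.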

Or specify the columns to merge on explicitly:

m1 = pd.merge(pdDop, pdGeod, how='outer', left_on='DOP_TOW', right_on='GEOD_TOW')
pd.merge(m1, pdSatVis, how='outer', left_on='DOP_TOW', right_on='VISIBILITY_TOW')
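To see how the one-to-many case from the question behaves, here is a small self-contained sketch of the explicit-column merge. The frames are cut-down stand-ins, the pdSatVis TOW values are assumed to overlap with the other two (unlike the excerpt in the question), and the validate argument is an optional extra that makes pandas raise an error if the expected relationship does not hold.

```python
import pandas as pd

# Cut-down stand-ins for the three frames (illustrative values).
pdDop = pd.DataFrame({"DOP_TOW": [424800.0, 424801.0], "DOP_PDOP": [5.81, 5.81]})
pdGeod = pd.DataFrame({"GEOD_TOW": [424800.0, 424801.0], "GEOD_Height": [150.4975, 150.5277]})
pdSatVis = pd.DataFrame({
    "VISIBILITY_TOW": [424800.0, 424800.0, 424801.0, 424801.0],  # assumed to overlap
    "VISIBILITY_SVID": [92, 100, 92, 100],
})

# pdDop and pdGeod each have one row per TOW.
m1 = pd.merge(pdDop, pdGeod, how="outer",
              left_on="DOP_TOW", right_on="GEOD_TOW",
              validate="one_to_one")

# Each row of m1 may match several satellite rows in pdSatVis.
merged = pd.merge(m1, pdSatVis, how="outer",
                  left_on="DOP_TOW", right_on="VISIBILITY_TOW",
                  validate="one_to_many")
print(merged)
```

The DOP and GEOD values are repeated on every satellite row that shares the same TOW, and all three *_TOW columns are kept in the result.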
