I'm new to pandas DataFrames and have the following question.
I have 3 dataframes that come from reading CSV files:
DataFrame 1 is named pdDop and contains the following entries:
DOP_WNC DOP_TOW DOP_NRSVS DOP_PDOP DOP_VDOP DOP_HDOP DOP_TDOP
1928 424800.0 4 5.81 5.36 2.24 2.72
1928 424801.0 4 5.81 5.36 2.24 2.72
1928 424802.0 4 5.80 5.35 2.24 2.72
1928 424803.0 4 5.80 5.35 2.24 2.72
1928 424804.0 4 5.80 5.35 2.24 2.72
1928 424805.0 4 5.80 5.35 2.24 2.72
DataFrame 2 is named pdGeod and contains the following entries:
GEOD_TOW GEOD_MODE GEOD_2D/3D GEOD_Error GEOD_NrSV GEOD_Latitude GEOD_Longitude GEOD_Height
424800.0 1 0 0 4 0.8874 0.0767 150.4975
424801.0 1 0 0 4 0.8874 0.0767 150.5277
424802.0 1 0 0 4 0.8874 0.0767 150.5579
424803.0 1 0 0 4 0.8874 0.0767 150.5931
424804.0 1 0 0 4 0.8874 0.0767 150.6214
DataFrame 3 is named pdSatVis and contains the following entries:
VISIBILITY_TOW VISIBILITY_SVID VISIBILITY_AZIMUTH VISIBILITY_ELEVATION
426175.0 92 54.50 35.43
426175.0 100 108.22 26.00
426175.0 88 49.29 10.48
426175.0 89 278.29 17.39
426176.0 92 54.50 35.43
426176.0 100 108.22 26.00
426176.0 88 49.29 10.48
426176.0 89 278.29 17.39
426177.0 92 54.48 35.42
426177.0 100 108.23 25.98
426177.0 88 49.28 10.45
426177.0 89 278.27 17.38
426178.0 92 54.48 35.42
I want to create a single dataframe that combines all three, matched on the *_TOW (time of week) column in each of them. Note that the last dataframe, pdSatVis, has several rows for each VISIBILITY_TOW value, and all of those rows correspond to just 1 row in pdDop and pdGeod.
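To experiment with the merge, the three frames can be rebuilt from the excerpts above. This is a minimal sketch that keeps only a few columns of each table (the column subset is an assumption for brevity). It checks the shape the question describes: one row per TOW in pdDop and pdGeod, several rows per TOW in pdSatVis. In the excerpts as shown, the pdSatVis TOW values (426175.0 onward) do not overlap the pdDop/pdGeod values (424800.0 to 424805.0), so `shared` comes out empty for this subset.

```python
import pandas as pd

# Rows copied from the question's excerpts (only a few columns kept for brevity).
pdDop = pd.DataFrame({
    "DOP_WNC": [1928, 1928, 1928],
    "DOP_TOW": [424800.0, 424801.0, 424802.0],
    "DOP_PDOP": [5.81, 5.81, 5.80],
})
pdGeod = pd.DataFrame({
    "GEOD_TOW": [424800.0, 424801.0, 424802.0],
    "GEOD_Height": [150.4975, 150.5277, 150.5579],
})
pdSatVis = pd.DataFrame({
    "VISIBILITY_TOW": [426175.0] * 4 + [426176.0] * 4,
    "VISIBILITY_SVID": [92, 100, 88, 89] * 2,
    "VISIBILITY_ELEVATION": [35.43, 26.00, 10.48, 17.39] * 2,
})

# pdSatVis has one row per visible satellite, so each TOW appears several times,
# while pdDop and pdGeod have one row per TOW.
sat_rows_per_tow = pdSatVis["VISIBILITY_TOW"].value_counts()
print(sat_rows_per_tow)
print(pdDop["DOP_TOW"].is_unique, pdGeod["GEOD_TOW"].is_unique)

# TOW values present in all three frames; only these rows can line up in a merge.
shared = set(pdDop["DOP_TOW"]) & set(pdGeod["GEOD_TOW"]) & set(pdSatVis["VISIBILITY_TOW"])
print(sorted(shared))
```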
Answer 0 (score: 0)
You can add a new column to merge on:
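import pandas as pd
# Give every frame the same key column, 'TOW'
pdDop['TOW'] = pdDop['DOP_TOW']
pdGeod['TOW'] = pdGeod['GEOD_TOW']
pdSatVis['TOW'] = pdSatVis['VISIBILITY_TOW']
# Outer-merge all three frames on the shared 'TOW' column
result = pd.merge(pd.merge(pdDop, pdGeod, on='TOW', how='outer'), pdSatVis, on='TOW', how='outer')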
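A small run of the common-column approach shows how the one-to-many case is handled. The data here is hypothetical: the TOW values are chosen so that the frames overlap and the merge has matches to show. Each pdDop/pdGeod row is repeated once for every pdSatVis row with the same TOW.

```python
import pandas as pd

# Hypothetical data: TOW values chosen to overlap so the merge has matches to show.
pdDop = pd.DataFrame({"DOP_TOW": [424800.0, 424801.0], "DOP_PDOP": [5.81, 5.81]})
pdGeod = pd.DataFrame({"GEOD_TOW": [424800.0, 424801.0],
                       "GEOD_Height": [150.4975, 150.5277]})
pdSatVis = pd.DataFrame({
    "VISIBILITY_TOW": [424800.0, 424800.0, 424801.0],
    "VISIBILITY_SVID": [92, 100, 92],
})

# Give every frame the same key column, then merge on it explicitly.
pdDop["TOW"] = pdDop["DOP_TOW"]
pdGeod["TOW"] = pdGeod["GEOD_TOW"]
pdSatVis["TOW"] = pdSatVis["VISIBILITY_TOW"]

result = pd.merge(
    pd.merge(pdDop, pdGeod, on="TOW", how="outer"),
    pdSatVis,
    on="TOW",
    how="outer",
)
# TOW 424800.0 has two satellite rows, so its DOP/GEOD values appear twice.
print(result[["TOW", "DOP_PDOP", "GEOD_Height", "VISIBILITY_SVID"]])
```

Passing `on="TOW"` is a small design choice: without it, `pd.merge` joins on every column name the two frames happen to share, which can silently change the result if the CSV files ever gain a common column.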
Or specify the columns to merge on explicitly:
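# Merge DOP and GEOD rows that share a time of week
m1 = pd.merge(pdDop, pdGeod, how='outer', left_on='DOP_TOW', right_on='GEOD_TOW')
# Attach every satellite-visibility row to the matching DOP_TOW
result = pd.merge(m1, pdSatVis, how='outer', left_on='DOP_TOW', right_on='VISIBILITY_TOW')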
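One thing to watch with the explicit version: because m1 comes from an outer merge, DOP_TOW is NaN for any TOW that exists only in pdGeod, so the second merge on DOP_TOW cannot attach satellite rows to that TOW. Below is a minimal sketch of one workaround on hypothetical data; the combined `TOW` column is an addition for illustration, not part of the answer above. It also uses the `validate` and `indicator` parameters of `pd.merge` to check the expected one-to-many shape and to see which rows matched.

```python
import pandas as pd

# Hypothetical data: pdGeod has a TOW (424802.0) that pdDop lacks.
pdDop = pd.DataFrame({"DOP_TOW": [424800.0, 424801.0], "DOP_PDOP": [5.81, 5.81]})
pdGeod = pd.DataFrame({"GEOD_TOW": [424800.0, 424801.0, 424802.0],
                       "GEOD_Height": [150.4975, 150.5277, 150.5579]})
pdSatVis = pd.DataFrame({"VISIBILITY_TOW": [424800.0, 424802.0, 424802.0],
                         "VISIBILITY_SVID": [92, 100, 88]})

# validate= raises pandas.errors.MergeError if a TOW is duplicated in pdDop or pdGeod.
m1 = pd.merge(pdDop, pdGeod, how="outer", left_on="DOP_TOW",
              right_on="GEOD_TOW", validate="one_to_one")

# After the outer merge, DOP_TOW is NaN for rows that exist only in pdGeod,
# so build one key column that falls back to GEOD_TOW (hypothetical helper column).
m1["TOW"] = m1["DOP_TOW"].fillna(m1["GEOD_TOW"])

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'.
result = pd.merge(m1, pdSatVis, how="outer", left_on="TOW",
                  right_on="VISIBILITY_TOW", validate="one_to_many",
                  indicator=True)
print(result[["TOW", "DOP_PDOP", "GEOD_Height", "VISIBILITY_SVID", "_merge"]])
```

Here TOW 424802.0 picks up both of its satellite rows, and the `_merge` column flags 424801.0 as `left_only` because no satellite row exists for it.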