Question

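I have several CSV files that all contain data for different points over the same time period. When I try to merge the datasets, I get a DataFrame that lists the rows for the first point and then the rows for the second point. I want to see the value for each point in the same row (assuming they occur at the same time).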

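    import pandas as pd

    env_df = None  # accumulates the merged result across files
    for file in envData:
        tmp_df = pd.read_csv(f'{enviormentDataPath}/{eventFolder}/{file}')
        tmp_df.set_index("time [UTC]", inplace=True)
        # station label read from the first row (second non-index column)
        station = tmp_df.values[0][1]

        # prefix every non-time column with the station label
        for header in list(tmp_df):
            if 'time' not in header:
                tmp_df = tmp_df.rename(columns={header: f"{station}_{header}"})

        if env_df is None:
            env_df = tmp_df
        else:
            env_df = pd.merge(env_df, tmp_df, how='outer', on='time [UTC]')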

Sample CSV 1:

time [utc], u [kt], v [kt]
2015-10-17 10:00:00, 12, -14
2015-10-17 11:00:00, 13, -13

Sample CSV 2:

time [utc], u [kt], v [kt]
2015-10-17 10:00:00, 11, -12
2015-10-17 11:00:00, 10, -13

However, the command env_df=pd.merge(env_df,tmp_df, how='outer', on='time [UTC]') just produces a table that looks like this:

time[utc]            sample1_u sample1_v sample2_u sample2_v
2015-10-17 10:00:00  12        -14       NaN       NaN
2015-10-17 11:00:00  13        -13       NaN       NaN
2015-10-17 10:00:00  NaN       NaN       11        -12
2015-10-17 11:00:00  NaN       NaN       10        -13

Any help or suggestions would be greatly appreciated.

Answer 1

I cannot reproduce your problem when merging. Is your "time [utc]" column perhaps not in datetime format?
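One way the rows can end up stacked instead of aligned is when the time keys are compared as plain strings and differ invisibly between files. The sketch below assumes such a defect: a trailing space in the second file's timestamps. That defect is hypothetical and is not confirmed by the question. The helper `load` and the in-memory CSV strings are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical in-memory CSVs. The trailing space after each timestamp in
# csv_2 is an assumed defect, used only to provoke the mismatch.
csv_1 = "time [utc],u [kt],v [kt]\n2015-10-17 10:00:00,12,-14\n2015-10-17 11:00:00,13,-13\n"
csv_2 = "time [utc],u [kt],v [kt]\n2015-10-17 10:00:00 ,11,-12\n2015-10-17 11:00:00 ,10,-13\n"


def load(text, station):
    """Read a CSV string and prefix every non-time column with the station label."""
    df = pd.read_csv(io.StringIO(text))
    return df.rename(columns={c: f"{station}_{c}" for c in df.columns if "time" not in c})


raw_1 = load(csv_1, "sample1")
raw_2 = load(csv_2, "sample2")

# The keys are strings here, so "10:00:00" and "10:00:00 " do not match
# and the outer merge keeps all four rows, padded with NaN.
misaligned = pd.merge(raw_1, raw_2, how="outer", on="time [utc]")
print(len(misaligned))

# Strip whitespace and convert the keys to real datetimes before merging.
for df in (raw_1, raw_2):
    df["time [utc]"] = pd.to_datetime(df["time [utc]"].str.strip())

aligned = pd.merge(raw_1, raw_2, how="outer", on="time [utc]")
print(aligned)
```

If the real files show a different dtype or formatting for the time column in each file, the same idea applies: make the keys identical in type and value before calling `pd.merge`.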

Using Python 3.8 and pandas 1.0.3:

# import pandas
import pandas as pd
# read sample 1
sample_1_df = pd.read_csv("sample_1.csv", parse_dates=['time [utc]'], infer_datetime_format=True)
# read sample 2
sample_2_df = pd.read_csv("sample_2.csv", parse_dates=['time [utc]'], infer_datetime_format=True)
# Show sample 1
sample_1_df.info()     
"""                                                
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   time [utc]  2 non-null      datetime64[ns]
 1    u [kt]     2 non-null      int64         
 2    v [kt]     2 non-null      int64         
dtypes: datetime64[ns](1), int64(2)
memory usage: 176.0 bytes
"""
# Show sample 2 df
sample_2_df.info()
"""                                                     
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   time [utc]  2 non-null      datetime64[ns]
 1    u [kt]     2 non-null      int64         
 2    v [kt]     2 non-null      int64         
dtypes: datetime64[ns](1), int64(2)
memory usage: 176.0 bytes
"""
# Merge sample_1 and sample_2 on the time [utc] column
merged_df = pd.merge(sample_1_df, sample_2_df, on='time [utc]')
print(merged_df)
"""
           time [utc]   u [kt]_x   v [kt]_x   u [kt]_y   v [kt]_y
0 2015-10-17 10:00:00         12        -14         11        -12
1 2015-10-17 11:00:00         13        -13         10        -13
"""

Note that the columns u [kt] and v [kt] now carry the suffixes _x and _y. This can be changed with the suffixes keyword argument of pd.merge.
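As a minimal sketch of the `suffixes` argument, the frames below are built by hand from the sample values in the question. The labels `_sample1` and `_sample2` are arbitrary choices for illustration.

```python
import pandas as pd

times = pd.to_datetime(["2015-10-17 10:00:00", "2015-10-17 11:00:00"])

# Two frames with identical column names, as in the sample CSVs.
left = pd.DataFrame({"time [utc]": times, "u [kt]": [12, 13], "v [kt]": [-14, -13]})
right = pd.DataFrame({"time [utc]": times, "u [kt]": [11, 10], "v [kt]": [-12, -13]})

# suffixes replaces the default ("_x", "_y") on overlapping column names.
merged = pd.merge(left, right, on="time [utc]", suffixes=("_sample1", "_sample2"))
print(merged.columns.tolist())
```

Suffixes are only applied to column names that collide, so renaming the columns per station before merging (as the question's loop does) makes them unnecessary.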

Merging pandas DataFrames on DateTime
