Question

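After concatenating many files into one large file, the order of the datetime column no longer follows the original files.

I have many .csv files of meteorological data: one file per day, at 5-minute intervals. The original files use the following datetime format: 24.03.2016 18:35.

I concatenate all of the files with the following code: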

    globbed_files = glob.glob(path + "\*Raw2*.csv")
    data = []

    for csv in globbed_files:
       df = pd.read_csv(csv, encoding = "ISO-8859-1", header = 0, 
       low_memory=False)
       data.append(df) 

    combined = pd.concat(data, ignore_index=True, sort=True)
    combined['DateTime'] = pd.to_datetime(combined['DateTime'])
    combined.set_index('DateTime', inplace=True)
    combined.index = combined.index.map(lambda t: t.strftime('%d/%m/%Y %H:%M:%S'))

    combined.to_csv(path + "\year1.txt", sep='\t', header=True, index=True)

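The result is three files, each containing the data for one particular year. I checked that the datetime order is correct in all of the original files.

Because I did not know how to convert the original datetime format into one that Python understands, I did it by hand. I copied the datetime column into Notepad, appended the seconds (:00), removed unnecessary spaces, replaced every '.' with '/', and finally pasted the column back into the csv. To be safe, I then applied Excel's built-in date format to the datetime column in the csv. The new datetime format is: 24/03/2016 18:35:00.

Next, using the new datetime format, I concatenated the "yearly files" into one final large file.

But what happened? Python reads the datetimes inconsistently, sometimes swapping day and month. For example, 08/03/2016 18:35:00 may be misread as month 8, day 3, or correctly read as day 8 of month 3. As a result, my new file is not ordered the way the original files are.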
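
To make the ambiguity concrete, here is a minimal sketch using only the standard library `datetime` module. It is an illustration rather than part of the original script: it shows that the same string is valid under both a day-first and a month-first reading, while a day above 12 has only one valid reading, which would explain why only some rows get swapped.

```python
from datetime import datetime

raw = "08/03/2016 18:35:00"

# Day-first reading: 8 March 2016 (what the original files mean).
day_first = datetime.strptime(raw, "%d/%m/%Y %H:%M:%S")

# Month-first reading: 3 August 2016 (the misread value).
month_first = datetime.strptime(raw, "%m/%d/%Y %H:%M:%S")

print(day_first.isoformat())    # 2016-03-08T18:35:00
print(month_first.isoformat())  # 2016-08-03T18:35:00

# A day above 12 cannot be read month-first, so it is never swapped.
unambiguous = "24/03/2016 18:35:00"
try:
    datetime.strptime(unambiguous, "%m/%d/%Y %H:%M:%S")
    month_first_ok = True
except ValueError:
    month_first_ok = False
print(month_first_ok)  # False
```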

Thanks for your help.

Answer 1

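The solution can be simplified by passing a few extra parameters to read_csv and, as the last step, converting the index to your custom format with DatetimeIndex.strftime: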

    import glob
    import pandas as pd

    # `path` is the data folder, defined as in the question
    globbed_files = glob.glob(path + r"\*Raw2*.csv")
    data = []

    for csv in globbed_files:
        df = pd.read_csv(csv,
                         encoding="ISO-8859-1",
                         header=0,
                         low_memory=False,
                         parse_dates=['DateTime'],  # convert the column to datetimes
                         dayfirst=True,             # first number is the day, so day and month are not swapped
                         index_col=['DateTime']     # create a DatetimeIndex
                         )
        data.append(df)

    combined = pd.concat(data, sort=True)

    # convert to strings only at the end, just before writing
    combined.index = combined.index.strftime('%d/%m/%Y %H:%M:%S')

    combined.to_csv(path + r"\year1.txt", sep='\t', header=True, index=True)
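
As an alternative, here is a small self-contained sketch that passes an explicit `format` to `pd.to_datetime` and sorts on the real datetimes before formatting them as strings. The two in-memory "files" and the `Temp` column are made up for illustration. Sorting is my own addition, on the assumption that the files may not be read in chronological order.

```python
import io
import pandas as pd

# Two tiny stand-ins for daily files, deliberately listed out of order.
# The 'Temp' column and its values are hypothetical.
file_b = io.StringIO("DateTime,Temp\n08.03.2016 18:35,5.1\n08.03.2016 18:40,5.0\n")
file_a = io.StringIO("DateTime,Temp\n24.02.2016 18:35,3.2\n24.02.2016 18:40,3.1\n")

frames = []
for f in (file_b, file_a):
    df = pd.read_csv(f)
    # An explicit format leaves no room for day/month guessing.
    df["DateTime"] = pd.to_datetime(df["DateTime"], format="%d.%m.%Y %H:%M")
    frames.append(df.set_index("DateTime"))

# Sort while the index still holds real datetimes, then format for output.
combined = pd.concat(frames).sort_index()
combined.index = combined.index.strftime("%d/%m/%Y %H:%M:%S")
print(combined)
```

Sorting has to happen before the `strftime` call: once the index holds strings such as 08/03/2016, sorting becomes alphabetical and no longer follows time.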
