Question

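I wrote code that reads multiple files, but in some of my files the datetime column has the day and month swapped whenever the day is less than 13. Any date from the 13th onward, e.g. 13/06/11, stays correct (DD/MM/YY). I tried to fix it as shown below, but it did not work.

My dataframe looks like this. The actual datetimes run from 12 June 2015 to 13 June 2015. When I read my datetime column as a string, the dates stay correct (dd/mm/yyyy):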

tmp                     p1 p2 
11/06/2015 00:56:55.060  0  1
11/06/2015 04:16:38.060  0  1
12/06/2015 16:13:30.060  0  1
12/06/2015 21:24:03.060  0  1
13/06/2015 02:31:44.060  0  1
13/06/2015 02:37:49.060  0  1

But when I change the column's type to datetime, the day and month are swapped for every date whose day is less than 13.

Output:

print(df)
tmp                  p1 p2 
06/11/2015 00:56:55  0  1
06/11/2015 04:16:38  0  1
06/12/2015 16:13:30  0  1
06/12/2015 21:24:03  0  1
13/06/2015 02:31:44  0  1
13/06/2015 02:37:49  0  1

Here is my code.

I loop over the files:

df = pd.read_csv(PATH+file, header = None,error_bad_lines=False , sep = '\t')

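Then, once my code has finished reading all the files, I concatenate them. The problem is that my datetime column needs to be of datetime type, so when I change its type with pd.to_datetime(), it swaps the day and month whenever the day is less than 13.

Before the conversion, the dates in my datetime column are correct (string type):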

print(tmp) # as a result I get 11.06.2015 12:56:05 (11june2015)

But when I change the column type, I get this:

tmp = pd.to_datetime(tmp, unit = "ns")
tmp = temps_absolu.apply(lambda x: x.replace(microsecond=0))
print(tmp) # I get 06-11-2016 12:56:05 (06november2015 its not the right date)

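The question is: what command should I use or change to stop the day and month from being swapped when the day is less than 13?

Update: this command swaps the day and month for every date in the column: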

tmp =  pd.to_datetime(tmp, unit='s').dt.strftime('%#m/%#d/%Y %H:%M:%S')

So, in order to swap only the wrong dates, I wrote a condition:

for t in tmp:
        if (t.day < 13):
            t = datetime(year=t.year, month=t.day, day=t.month, hour=t.hour, minute=t.minute, second = t.second)

But it does not work.

Answer 1

You can use the dayfirst parameter of pd.to_datetime.

pd.to_datetime(df.tmp, dayfirst=True)

Output:

0   2015-06-11 00:56:55
1   2015-06-11 04:16:38
2   2015-06-12 16:13:30
3   2015-06-12 21:24:03
4   2015-06-13 02:31:44
5   2015-06-13 02:37:49
Name: tmp, dtype: datetime64[ns]
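
As a minimal, self-contained sketch of this answer, the snippet below rebuilds the sample column from the question and parses it in two ways: with dayfirst=True, and with an explicit format string. The variable name `raw` is only illustrative. The explicit format is not part of the answer above; it is shown as an alternative that removes the guessing altogether, on the assumption that every row really follows DD/MM/YYYY HH:MM:SS.fff.

```python
import pandas as pd

# Sample values copied from the question, kept as strings (day comes first).
raw = pd.Series(
    [
        "11/06/2015 00:56:55.060",
        "11/06/2015 04:16:38.060",
        "12/06/2015 16:13:30.060",
        "12/06/2015 21:24:03.060",
        "13/06/2015 02:31:44.060",
        "13/06/2015 02:37:49.060",
    ],
    name="tmp",
)

# Option 1: tell pandas that ambiguous dates are day-first.
by_dayfirst = pd.to_datetime(raw, dayfirst=True)

# Option 2 (assumption: the layout is identical in every row):
# spell out the format so nothing has to be inferred.
by_format = pd.to_datetime(raw, format="%d/%m/%Y %H:%M:%S.%f")

print(by_dayfirst.dt.strftime("%Y-%m-%d").tolist())
print(by_format.dt.strftime("%Y-%m-%d").tolist())
```

Both calls should place every row in June 2015. If some files use a different layout, the explicit format will raise an error instead of silently swapping values, which is usually the safer failure.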

Answer 2

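Well, I solved my problem, though in a memory-hungry way. I first split my tmp column into a date column and a time column, then split the date column again into day, month and year, so that I could find the days less than 13 and swap them with the corresponding month.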

df['tmp'] = pd.to_datetime(df['tmp'], unit='ns')
df['tmp'] = df['tmp'].apply(lambda x: x.replace(microsecond=0))
df['date'] = [d.date() for d in df['tmp']]
df['time'] = [d.time() for d in df['tmp']]
df[['year','month','day']] = df['date'].apply(lambda x: pd.Series(x.strftime("%Y-%m-%d").split("-")))

df['day'] = pd.to_numeric(df['day'], errors='coerce')
df['month'] = pd.to_numeric(df['month'], errors='coerce')
df['year'] = pd.to_numeric(df['year'], errors='coerce')


# Loop to look for days less than 13 and then swap the day and month
# (assumes df has the default 0..n-1 integer index)
for index, d in enumerate(df['day']):
    if d < 13:
        df.loc[index, 'day'], df.loc[index, 'month'] = df.loc[index, 'month'], df.loc[index, 'day']

# Convert the series to string type in order to merge them
df['day'] = df['day'].astype(str)
df['month'] = df['month'].astype(str)
df['year'] = df['year'].astype(str)
df['date'] = pd.to_datetime(df[['year', 'month', 'day']])
df['date'] = df['date'].astype(str)
df['time'] = df['time'].astype(str)

# Merge the time and date back into the tmp column
df['tmp'] = pd.to_datetime(df['date'] + ' ' + df['time'])

# Drop the helper columns that were added
df.drop(['date', 'year', 'month', 'day', 'time'], axis=1, inplace=True)
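
The same swap idea can be sketched without the row-by-row loop or the string round trip. This is only an illustration, not the answerer's code: it assumes the column is already datetime64 and that every row whose day is less than 13 was parsed month-first by mistake. The names `needs_swap`, `parts` and `time_of_day` are hypothetical.

```python
import pandas as pd

# Hypothetical column that was mis-parsed month-first:
# the true dates are 11 June, 12 June and 13 June 2015.
df = pd.DataFrame({
    "tmp": pd.to_datetime(
        ["2015-11-06 00:56:55", "2015-12-06 16:13:30", "2015-06-13 02:31:44"]
    )
})

# Rows whose day is below 13 are the ones assumed to be swapped.
needs_swap = df["tmp"].dt.day < 13

# Build year/month/day columns, exchanging month and day on flagged rows.
parts = pd.DataFrame({
    "year": df["tmp"].dt.year,
    "month": df["tmp"].dt.month.where(~needs_swap, df["tmp"].dt.day),
    "day": df["tmp"].dt.day.where(~needs_swap, df["tmp"].dt.month),
})

# Keep the time of day as a timedelta and add it back after reassembly.
time_of_day = df["tmp"] - df["tmp"].dt.normalize()
df["tmp"] = pd.to_datetime(parts) + time_of_day

print(df)
```

Parsing the original strings with the correct day order avoids the need for any repair step, so a swap like this is mainly useful when only the already-converted column is available.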

Answer 3

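I ran into the same problem. In my case the dates were the index column (called "Date"). The solution mentioned above, applying to_datetime() directly to a dataframe whose index column is "Date", did not work for me. I had to call read_csv() first without setting the index to "Date", then apply to_datetime() to that column, and only then set the index to "Date".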

df= pd.read_csv(file, parse_dates=True)
df.Date = pd.to_datetime(df.Date, dayfirst=True)
df = df.set_index('Date')
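
A self-contained sketch of this sequence is given below. The in-memory CSV text and the column names other than "Date" are made up for illustration. The second variant, which passes dayfirst=True to read_csv itself, is not from the answer; it is included as a possible shortcut and may behave differently across pandas versions.

```python
import io
import pandas as pd

# Hypothetical CSV content with day-first dates in a "Date" column.
csv_text = (
    "Date,p1,p2\n"
    "11/06/2015 00:56:55.060,0,1\n"
    "12/06/2015 16:13:30.060,0,1\n"
    "13/06/2015 02:31:44.060,0,1\n"
)

# Sequence described in the answer: read first, convert, then set the index.
df = pd.read_csv(io.StringIO(csv_text))
df["Date"] = pd.to_datetime(df["Date"], dayfirst=True)
df = df.set_index("Date")

# Possible shortcut (assumption): let read_csv parse the column day-first
# and use it as the index in a single call.
df2 = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["Date"],
    dayfirst=True,
    index_col="Date",
)

print(df.index)
print(df2.index)
```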

Python Pandas: pandas.to_datetime() is switching day and month when the day is less than 13
