How to change the date format using two files

Time: 2018-05-20 00:44:58

Tags: python pandas csv dataframe

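I have two tables and want to use pandas to combine them into a single table that looks exactly like this. The order must be the same, and the date format must match exactly.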

My table1.csv

Tweet, Month, Day, Year
Hello World, 6, 2, 2013
I want ice-cream!, 7, 23, 2013
Friends will be friends, 9, 30, 2017
Done with school, 12, 12, 2017

My table2.csv

Month, Day, Year, Hour, Tweet
January, 2, 2015, 12, Happy New Year
March, 21, 2016, 7, Today is my final
May, 30, 2017, 23, Summer is about to begin
July, 15, 2018, 11, Ocean is still cold

This is what I have done so far:

import numpy as np
import pandas as pd
from datetime import datetime

df1=pd.read_csv('data1.csv', index_col=False, header=0)
df2=pd.read_csv('data2.csv', index_col=False, header=0)

#creating Date column from Day,Month and Year columns
df1['Date']= df1.apply(lambda x:datetime.strptime("{0} {1} {2}"
                .format(x['Year'],x['Month'], x['Day']), "%Y %m %d"),axis=1)



df2['Date']= df2.apply(lambda x:datetime.strptime("{0} {1} {2}"
                .format(x['Year'],x['Month'], x['Day']), "%Y %B %d"),axis=1)

#Selecting only desired columns
df1=df1[['Date','Tweet']]
df2=df2[['Date','Tweet']]

#combining both data frames
combine=df1.append(df2)

#Sort the data frame based on Date column.
combine.sort_values(by='Date', ascending=False, inplace=True)

#convert date to required format
combine['Date'] = combine['Date'].dt.strftime('%m-%b-%Y')

#writing to csv
combine.to_csv('combine.csv', encoding='utf-8', index=False)

This is the output I get:

Date,Tweet
07-Jul-2018,Ocean is still cold
12-Dec-2017,Done with school
09-Sep-2017,Friends will be friends
05-May-2017,Summer is about to begin
03-Mar-2016,Today is my final
01-Jan-2015,Happy New Year
07-Jul-2013,I want ice-cream!
06-Jun-2013,Hello World

Clearly, the day is completely wrong. Does anyone know how to fix it?

2 answers:

Answer 0: (score: 3)

Can't you simply use pd.to_datetime?

E.g.:

df1['Date'] = pd.to_datetime(df1[['Year', 'Month', 'Day']])
df2['Month'] = df2.Month.apply(lambda x: datetime.strptime(x, '%B').month)
df2['Date'] = pd.to_datetime(df2[['Year', 'Month', 'Day']])

df = pd.concat([df1, df2])[['Date','Tweet']]
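
The snippet above stops before sorting and formatting, so here is a minimal end-to-end sketch of the same approach. The likely cause of the wrong day in the question is the format string '%m-%b-%Y': %m is the month number, so the month is printed twice and the day never appears, whereas %d is the day of the month. Several things here are assumptions rather than part of the original answer: the CSV text is inlined through StringIO instead of being read from disk, skipinitialspace=True is used because the sample files have a space after every comma, the descending sort is copied from the question's code, and the name combined is illustrative.

```python
from datetime import datetime
from io import StringIO

import pandas as pd

# Inline copies of the two CSV files shown in the question (assumption:
# the real files have the same layout, with a space after each comma).
TABLE1 = """Tweet, Month, Day, Year
Hello World, 6, 2, 2013
I want ice-cream!, 7, 23, 2013
Friends will be friends, 9, 30, 2017
Done with school, 12, 12, 2017
"""

TABLE2 = """Month, Day, Year, Hour, Tweet
January, 2, 2015, 12, Happy New Year
March, 21, 2016, 7, Today is my final
May, 30, 2017, 23, Summer is about to begin
July, 15, 2018, 11, Ocean is still cold
"""

# skipinitialspace=True drops the blank after each comma, so the columns
# are named 'Month' rather than ' Month'.
df1 = pd.read_csv(StringIO(TABLE1), skipinitialspace=True)
df2 = pd.read_csv(StringIO(TABLE2), skipinitialspace=True)

# table1 already has numeric Year/Month/Day columns.
df1['Date'] = pd.to_datetime(df1[['Year', 'Month', 'Day']])

# table2 stores month names, so map them to month numbers first.
df2['Month'] = df2['Month'].apply(lambda name: datetime.strptime(name, '%B').month)
df2['Date'] = pd.to_datetime(df2[['Year', 'Month', 'Day']])

combined = pd.concat([df1, df2], ignore_index=True)[['Date', 'Tweet']]

# Sort while Date is still a datetime, newest first as in the question.
combined = combined.sort_values('Date', ascending=False)

# %d is the day of the month; the question used %m (month number) here.
combined['Date'] = combined['Date'].dt.strftime('%d-%b-%Y')

print(combined.to_csv(index=False))
```

Sorting has to happen before the strftime call, because afterwards the Date column holds plain strings and would sort alphabetically rather than chronologically.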

Answer 1: (score: 0)

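  • Use pd.to_datetime on the relevant columns of df1
  • Join the year, month and day into one string and pass it to pd.to_datetime for df2
  • Use pd.concat to join the two
  • Use assign with a lambda to overwrite Date with the formatted string
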
pd.concat([
    df1[['Tweet']].assign(Date=pd.to_datetime(df1.drop(columns='Tweet'))),
    df2[['Tweet']].assign(Date=pd.to_datetime(
        [f'{y}-{m}-{d}' for _, m, d, y, *_ in df2.itertuples()]))
])[['Date', 'Tweet']].assign(Date=lambda d: d.Date.dt.strftime('%d-%b-%y'))

        Date                     Tweet
0  02-Jun-13               Hello World
1  23-Jul-13         I want ice-cream!
2  30-Sep-17   Friends will be friends
3  12-Dec-17          Done with school
0  02-Jan-15            Happy New Year
1  21-Mar-16         Today is my final
2  30-May-17  Summer is about to begin
3  15-Jul-18       Ocean is still cold
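
The string-joining step for df2 can also be written column-wise, without itertuples. The following is only a sketch of that variation, not the answer's own code: the inline CSV, skipinitialspace=True and the name date_text are assumptions made for illustration. An explicit format is passed so that the month name and day are parsed the same way for every row.

```python
from io import StringIO

import pandas as pd

# Inline copy of table2.csv from the question (assumed layout).
TABLE2 = """Month, Day, Year, Hour, Tweet
January, 2, 2015, 12, Happy New Year
March, 21, 2016, 7, Today is my final
May, 30, 2017, 23, Summer is about to begin
July, 15, 2018, 11, Ocean is still cold
"""

df2 = pd.read_csv(StringIO(TABLE2), skipinitialspace=True)

# Build strings such as 'January 2 2015' for the whole column at once.
date_text = df2['Month'] + ' ' + df2['Day'].astype(str) + ' ' + df2['Year'].astype(str)

# %B = full month name, %d = day of month, %Y = four-digit year.
df2['Date'] = pd.to_datetime(date_text, format='%B %d %Y')

print(df2[['Date', 'Tweet']])
```

Use '%d-%b-%Y' instead of '%d-%b-%y' in the final strftime call if a four-digit year is wanted, as in the question's output.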