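I have two tables that I want to combine, using pandas, into one table that looks exactly like this. The order must be the same, and the date format must be exactly the same.
My table1.csv: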
Tweet, Month, Day, Year
Hello World, 6, 2, 2013
I want ice-cream!, 7, 23, 2013
Friends will be friends, 9, 30, 2017
Done with school, 12, 12, 2017
My table2.csv:
Month, Day, Year, Hour, Tweet
January, 2, 2015, 12, Happy New Year
March, 21, 2016, 7, Today is my final
May, 30, 2017, 23, Summer is about to begin
July, 15, 2018, 11, Ocean is still cold
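To try the code below without the CSV files, here is a minimal sketch that loads the two tables shown above from strings. The data is copied from the question; passing skipinitialspace=True is an assumption about how the real files look: the sample data has a space after each comma, which would otherwise end up in the column names (' Month', ' Day', ...) and break lookups like df1['Year'].

```python
import io

import pandas as pd

# Sample data copied from the question (note the space after each comma).
TABLE1 = """Tweet, Month, Day, Year
Hello World, 6, 2, 2013
I want ice-cream!, 7, 23, 2013
Friends will be friends, 9, 30, 2017
Done with school, 12, 12, 2017
"""

TABLE2 = """Month, Day, Year, Hour, Tweet
January, 2, 2015, 12, Happy New Year
March, 21, 2016, 7, Today is my final
May, 30, 2017, 23, Summer is about to begin
July, 15, 2018, 11, Ocean is still cold
"""

# skipinitialspace=True drops the blank after each delimiter, so the
# columns are 'Month', 'Day', ... rather than ' Month', ' Day', ...
df1 = pd.read_csv(io.StringIO(TABLE1), skipinitialspace=True)
df2 = pd.read_csv(io.StringIO(TABLE2), skipinitialspace=True)

print(df1.dtypes)
print(df2.head())
```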
Here is what I have done so far:
import numpy as np
import pandas as pd
from datetime import *
df1=pd.read_csv('data1.csv', index_col=False, header=0)
df2=pd.read_csv('data2.csv', index_col=False, header=0)
#creating Date column from Day,Month and Year columns
df1['Date']= df1.apply(lambda x:datetime.strptime("{0} {1} {2}"
.format(x['Year'],x['Month'], x['Day']), "%Y %m %d"),axis=1)
df2['Date']= df2.apply(lambda x:datetime.strptime("{0} {1} {2}"
.format(x['Year'],x['Month'], x['Day']), "%Y %B %d"),axis=1)
#Selecting only desired columns
df1=df1[['Date','Tweet']]
df2=df2[['Date','Tweet']]
#combining both data frames
combine=pd.concat([df1, df2])
#Sort the data frame based on Date column.
combine.sort_values(by='Date', ascending=False, inplace=True)
#convert date to required format
combine['Date'] = combine['Date'].dt.strftime('%m-%b-%Y')
#writing to csv
combine.to_csv('combine.csv', encoding='utf-8', index=False)
This is the output I get:
Date,Tweet
07-Jul-2018,Ocean is still cold
12-Dec-2017,Done with school
09-Sep-2017,Friends will be friends
05-May-2017,Summer is about to begin
03-Mar-2016,Today is my final
01-Jan-2015,Happy New Year
07-Jul-2013,I want ice-cream!
06-Jun-2013,Hello World
Obviously the day is completely wrong. Does anyone know how to fix it?
Answer 0 (score: 3)
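The symptom points at the final formatting step rather than at the parsing. In strftime, %m is the zero-padded month number and %b the abbreviated month name, so '%m-%b-%Y' prints the month twice; the day of the month is %d. A minimal check with one of the question's dates (this assumes an English/C locale, since %b is locale-dependent):

```python
import pandas as pd

# One of the question's dates: 2 June 2013 ("Hello World").
dates = pd.Series(pd.to_datetime(["2013-06-02"]))

# %m is the month number, so the "day" slot repeats the month.
wrong = dates.dt.strftime("%m-%b-%Y")
# %d is the zero-padded day of the month.
right = dates.dt.strftime("%d-%b-%Y")

print(wrong.iloc[0])  # 06-Jun-2013
print(right.iloc[0])  # 02-Jun-2013
```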
Can't you simply use pd.to_datetime? E.g.:
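import pandas as pd
from datetime import datetime
# assemble dates directly from the Year/Month/Day columns
df1['Date'] = pd.to_datetime(df1[['Year', 'Month', 'Day']])
# df2 holds month names ("January"), so convert them to numbers first
df2['Month'] = df2.Month.apply(lambda x: datetime.strptime(x, '%B').month)
df2['Date'] = pd.to_datetime(df2[['Year', 'Month', 'Day']])
df = pd.concat([df1, df2])[['Date','Tweet']]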
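The snippet above stops after pd.concat. A sketch of the remaining steps from the question (sort newest first, then format the dates, using %d for the day rather than %m) could look like this. It is not part of the original answer; the two data frames are rebuilt inline from the question's tables, assuming clean column names, so the block runs on its own.

```python
from datetime import datetime

import pandas as pd

# Rebuilt inline from the question's tables (clean column names assumed).
df1 = pd.DataFrame({
    "Tweet": ["Hello World", "I want ice-cream!",
              "Friends will be friends", "Done with school"],
    "Month": [6, 7, 9, 12], "Day": [2, 23, 30, 12],
    "Year": [2013, 2013, 2017, 2017],
})
df2 = pd.DataFrame({
    "Month": ["January", "March", "May", "July"], "Day": [2, 21, 30, 15],
    "Year": [2015, 2016, 2017, 2018], "Hour": [12, 7, 23, 11],
    "Tweet": ["Happy New Year", "Today is my final",
              "Summer is about to begin", "Ocean is still cold"],
})

# Same approach as the answer: assemble dates from Year/Month/Day columns.
df1["Date"] = pd.to_datetime(df1[["Year", "Month", "Day"]])
# df2 stores month names, so convert them to month numbers first.
df2["Month"] = df2.Month.apply(lambda x: datetime.strptime(x, "%B").month)
df2["Date"] = pd.to_datetime(df2[["Year", "Month", "Day"]])
df = pd.concat([df1, df2])[["Date", "Tweet"]]

# Sort newest first (as in the question), then format: %d = day of month.
df = df.sort_values("Date", ascending=False, ignore_index=True)
df["Date"] = df["Date"].dt.strftime("%d-%b-%Y")

print(df.to_csv(index=False))
```

Sorting happens while Date is still a datetime column; sorting after strftime would order the strings alphabetically instead of chronologically.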
Answer 1 (score: 0)
Use pd.to_datetime on the relevant columns of df1, pd.to_datetime on date strings built from df2, pd.concat to join them, and assign with a lambda to overwrite the Date column with the formatted string:
pd.concat([
    df1[['Tweet']].assign(Date=pd.to_datetime(df1.drop(columns='Tweet'))),
    df2[['Tweet']].assign(Date=pd.to_datetime(
        [f'{y}-{m}-{d}' for _, m, d, y, *_ in df2.itertuples()],
        format='%Y-%B-%d'))
])[['Date', 'Tweet']].assign(Date=lambda d: d.Date.dt.strftime('%d-%b-%y'))
Date Tweet
0 02-Jun-13 Hello World
1 23-Jul-13 I want ice-cream!
2 30-Sep-17 Friends will be friends
3 12-Dec-17 Done with school
0 02-Jan-15 Happy New Year
1 21-Mar-16 Today is my final
2 30-May-17 Summer is about to begin
3 15-Jul-18 Ocean is still cold
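The list comprehension in the answer above relies on the field order of DataFrame.itertuples(): the index comes first, then the columns in order (Month, Day, Year, Hour, Tweet), so `_, m, d, y, *_` picks out month, day and year and discards the rest. A minimal sketch of just that step follows. Passing an explicit format='%Y-%B-%d' is an assumption on my part; it pins down how strings like '2015-January-2' are parsed instead of leaving it to pandas' format inference.

```python
import pandas as pd

df2 = pd.DataFrame({
    "Month": ["January", "March", "May", "July"], "Day": [2, 21, 30, 15],
    "Year": [2015, 2016, 2017, 2018], "Hour": [12, 7, 23, 11],
    "Tweet": ["Happy New Year", "Today is my final",
              "Summer is about to begin", "Ocean is still cold"],
})

# itertuples() yields (Index, Month, Day, Year, Hour, Tweet) for each row;
# the starred *_ swallows the trailing Hour and Tweet fields.
strings = [f"{y}-{m}-{d}" for _, m, d, y, *_ in df2.itertuples()]
print(strings[0])  # 2015-January-2

# Explicit format (an assumption): %B matches the full month name,
# and %d accepts "2" as well as "02".
dates = pd.to_datetime(strings, format="%Y-%B-%d")
print(dates[0])  # 2015-01-02 00:00:00
```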