我必须加入两个表并创建一个包含日期的表,但是我的代码很长,我相信我已经完成了很长的路。显然,这个只有22行。有没有其他方法和更短的方法来解决这个问题。这是问题
这是我的代码,我相信这很长,我认为有一个更短的方法来做到这一点。
import numpy as np
import pandas as pd
import datetime
#YOUR CODE GOES HERE#
def get_month(i):
"""this function returns the number of the month based on stringinput"""
if i == "January":
return 1
elif i == "February":
return 2
elif i == "March":
return 3
elif i == "April":
return 4
elif i == "May":
return 5
elif i == "June":
return 6
elif i == "July":
return 7
elif i == "August":
return 8
elif i == "September":
return 9
elif i == "October":
return 10
elif i == "November":
return 11
elif i == "December":
return 12
def get_reformatted_date(s):
"""this function reformats a datetime object to the output we're looking for"""
return s.strftime("%d-%b-%y")
month_names = []
tab1 = pd.read_csv("data1.csv")
tab2 = pd.read_csv("data2.csv")
tab1_tweets = tab1['Tweet'].tolist()[::-1]
tab2_tweets = tab2['Tweet'].tolist()[::-1]
tab1_months = tab1['Month'].tolist()[::-1]
tab2_months = tab2['Month'].tolist()[::-1]
tab1_days = tab1['Day'].tolist()[::-1]
tab2_days = tab2['Day'].tolist()[::-1]
tab1_years = tab1['Year'].tolist()[::-1]
tab2_years = tab2['Year'].tolist()[::-1]
all_dates = []
all_tweets = []
tab1_count = 0
tab2_count = 0
for i in range(len(tab1_tweets) + len(tab2_tweets)):
if(tab1_count < len(tab1_years) and tab2_count < len(tab2_years)):
t1_date = datetime.date(tab1_years[tab1_count], tab1_months[tab1_count], tab1_days[tab1_count])
t2_date = datetime.date(tab2_years[tab2_count], get_month(tab2_months[tab2_count]), tab2_days[tab2_count])
if t1_date > t2_date:
all_dates.append(t1_date)
all_tweets.append(tab1_tweets[tab1_count])
tab1_count += 1
else:
all_dates.append(t2_date)
all_tweets.append(tab2_tweets[tab2_count])
tab2_count += 1
elif(tab2_count < len(tab2_years)):
t2_date = datetime.date(tab2_years[tab2_count], get_month(tab2_months[tab2_count]), tab2_days[tab2_count])
all_dates.append(t2_date)
all_tweets.append(tab2_tweets[tab2_count])
tab2_count += 1
else:
t1_date = datetime.date(tab1_years[tab1_count], tab1_months[tab1_count], tab1_days[tab1_count])
all_dates.append(t1_date)
all_tweets.append(tab1_tweets[tab1_count])
tab1_count += 1
table_data = {'Date': all_dates, 'Tweet': all_tweets}
df = pd.DataFrame(table_data)
df['Date'] = df['Date'].apply(get_reformatted_date)
print(df)
data1.csv
是
Tweet Month Day Year
Hello World 6 2 2013
I want ice-cream! 7 23 2013
Friends will be friends 9 30 2017
Done with school 12 12 2017
data2.csv
是
Month Day Year Hour Tweet
January 2 2015 12 Happy New Year
March 21 2016 7 Today is my final
May 30 2017 23 Summer is about to begin
July 15 2018 11 Ocean is still cold
答案 0 :(得分:1)
我认为你理论上可以在一行中完成这一切:
finaldf = (pd.concat([pd.read_csv('data1.csv',
parse_dates={'Date':['Year', 'Month', 'Day']}),
pd.read_csv('data2.csv',
parse_dates={'Date':['Year', 'Month', 'Day']})
[['Date', 'Tweet']]])
.sort_values('Date', ascending=False))
但为了便于阅读,最好将其分成几行:
df1 = pd.read_csv('data1.csv', parse_dates={'Date':['Year', 'Month','Day']})
df2 = pd.read_csv('data2.csv', parse_dates={'Date':['Year', 'Month','Day']})
finaldf = (pd.concat([df1, df2[['Date', 'Tweet']]])
.sort_values('Date', ascending=False))
我认为,对于您尝试做的事情,要阅读的主要内容是pandas read_csv
的parse_dates
参数和pd.concat
来连接数据框< / p>
修改:为了按照示例输出中的格式获取日期,您可以使用Series.dt.strftime()
在上面的代码后调用此日期:
finaldf['Date'] = finaldf['Date'].dt.strftime('%d-%b-%y')