I'm trying to index-match two dataframes and write the data back to Excel. The Excel file that has to be filled looks like this:
     Name  Location        Date  Open  High  TimeH   Low  TimeL  Close
1  Orange  New York  20200501.0   5.5  5.58  18:00  5.45  16:00    5.7
0   Apple     Minsk  20200505.0   3.5  3.85    NaN  3.45    NaN   3.65
2   Steak    Dallas  20200506.0   8.5  8.85    NaN  8.45    NaN   8.65
'TimeH' and 'TimeL' should be index-matched from a dataframe that looks like this:
    Name      Date      Time  Open  High   Low  Close  Volume  VWAP  Trades
4  Apple  20200505  15:30:00  3.50  3.85  3.45   3.70    1500  3.73      95
5  Apple  20200505  17:00:00  3.65  3.70  3.50   3.60    1600  3.65      54
6  Apple  20200505  20:00:00  3.80  3.85  3.35   3.81    1700  3.73      41
7  Apple  20200505  22:00:00  3.60  3.84  3.45   3.65    1800  3.75      62
4  Steak  20200506  10:00:00  8.50  8.85  8.45   8.70    1500  8.73      95
5  Steak  20200506  12:00:00  8.65  8.70  8.50   8.60    1600  8.65      54
6  Steak  20200506  14:00:00  8.80  8.85  8.45   8.81    1700  8.73      41
7  Steak  20200506  16:00:00  8.60  8.84  8.45   8.65    1800  8.75      62
The values should then be pasted into the Excel file, which should look like this once everything works:
     Name  Location        Date  Open  High     TimeH   Low     TimeL  Close
1  Orange  New York  20200501.0   5.5  5.58  18:00:00  5.45  16:00:00    5.7
0   Apple     Minsk  20200505.0   3.5  3.85  10:00:00  3.45  20:00:00   3.65
2   Steak    Dallas  20200506.0   8.5  8.85  15:30:00  8.45  14:00:00   8.65
I'm using the following code to index-match the values 'Open', 'High', 'Low' and 'Close', which works fine:
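import numpy as np
import pandas as pd

# rdf12 (intraday rows) and rdf11 (tracker rows) are built earlier in the script.
# Collapse the intraday rows into one OHLC row per Name/Date.
rdf13 = rdf12.groupby(['Name', 'Date']).agg(Open=('Open', 'first'), High=('High', 'max'), Low=('Low', 'min'), Close=('Close', 'last'), Volume=('Volume', 'sum'), VWAP=('VWAP', 'mean'), Trades=('Trades', 'sum')).reset_index()
# Merge with rdf11, drop the last 4 columns and every column containing a NaN, then strip the _x suffixes.
result11 = pd.merge(rdf13, rdf11, how='inner', on=['Name', 'Date']).iloc[:, :-4].dropna(axis=1).rename(columns={"Open_x": "Open", "High_x": "High", "Low_x": "Low", "Close_x": "Close", "Volume_x": "Volume", "VWAP_x": "VWAP", "Trades_x": "Trades"})
result12 = result11.reindex(index=result11.index[::-1])
result13 = result12[['Name', 'Location', 'Date', 'Check_2', 'Open', 'High', 'Low', 'Close', 'Volume', 'VWAP', 'Trades']].reset_index()
# Read the tracker (raw string so the backslash in the path is not treated as an escape) and remember the row order.
readfile11 = pd.read_excel(r"Trackers\TEST Tracker.xlsx")
readfile11['Count'] = np.arange(len(readfile11))
df11 = readfile11.set_index(['Name', 'Location', 'Date'])
df12 = result13.set_index(['Name', 'Location', 'Date'])
# Values from df12 take precedence; gaps are filled from the existing sheet. Then restore column and row order.
fdf11 = df12.combine_first(df11).reset_index().reindex(readfile11.columns, axis=1).sort_values('Count')
print("Updated Day1 Data Frame")
print(fdf11)
# to_excel writes the file and returns None, so its result is not assigned.
fdf11.to_excel(r"Trackers\TEST Tracker.xlsx", sheet_name="Entries", index=False)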
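But when I extend this with code to index-match the 'TimeH' values, it doesn't seem to work for some reason and returns nothing, so the NaN values in the 'TimeH' column stay NaN. I've played around with the variables, but I either get execution errors or it still returns NaN.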
Can anyone here help me get Python to index-match the time values?
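One way to fill the missing times, sketched under assumptions: 'TimeH' is taken to be the time of the first intraday row that reaches the day's highest 'High', and 'TimeL' the time of the first row that reaches the day's lowest 'Low'. The frames intraday and tracker are hypothetical stand-ins built from the sample tables (intraday data with a Time column, tracker with NaN times), and the fill uses a left merge plus fillna instead of the combine_first call from the question. The tracker's Date column appears as a float (20200505.0) while the intraday Date is an integer, so the sketch casts it before matching.

```python
import numpy as np
import pandas as pd

# Hypothetical intraday rows shaped like the table with a Time column
# (only the columns needed for the lookup are included).
intraday = pd.DataFrame({
    "Name": ["Apple"] * 4 + ["Steak"] * 4,
    "Date": [20200505] * 4 + [20200506] * 4,
    "Time": ["15:30:00", "17:00:00", "20:00:00", "22:00:00",
             "10:00:00", "12:00:00", "14:00:00", "16:00:00"],
    "High": [3.85, 3.70, 3.85, 3.84, 8.85, 8.70, 8.85, 8.84],
    "Low": [3.45, 3.50, 3.35, 3.45, 8.45, 8.50, 8.45, 8.45],
})

# Hypothetical tracker rows shaped like the sheet with NaN times; Date is a float.
tracker = pd.DataFrame({
    "Name": ["Orange", "Apple", "Steak"],
    "Location": ["New York", "Minsk", "Dallas"],
    "Date": [20200501.0, 20200505.0, 20200506.0],
    "TimeH": ["18:00", np.nan, np.nan],
    "TimeL": ["16:00", np.nan, np.nan],
})

# Assumption: TimeH/TimeL are the times of the first row holding the
# day's highest High / lowest Low (idxmax/idxmin return the first match).
grouped = intraday.groupby(["Name", "Date"])
high_rows = grouped["High"].idxmax()  # row labels, indexed by (Name, Date)
low_rows = grouped["Low"].idxmin()
times = pd.DataFrame({
    "TimeH": intraday.loc[high_rows.to_numpy(), "Time"].to_numpy(),
    "TimeL": intraday.loc[low_rows.to_numpy(), "Time"].to_numpy(),
}, index=high_rows.index).reset_index()

# Cast Date so 20200505.0 and 20200505 compare equal as merge keys.
tracker["Date"] = tracker["Date"].astype("int64")

# A left merge keeps the tracker's row order; fillna only touches NaN cells,
# so times that are already filled in (Orange) are left alone.
merged = tracker.merge(times, on=["Name", "Date"], how="left", suffixes=("", "_new"))
for col in ["TimeH", "TimeL"]:
    merged[col] = merged[col].fillna(merged[col + "_new"])
tracker = merged.drop(columns=["TimeH_new", "TimeL_new"])

print(tracker)
```

With these sample rows the computed times (15:30:00 and 20:00:00 for Apple) differ from the expected output shown in the question, so the rule for picking the time is an assumption to check against the real data.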
Answer 0 (score: 0)
Apparently there was a small typo in my code.
esult41 = pd.merge(colnames40, rdf11, how='inner', on=['Name', 'Date', 'High']).iloc[:, :-4].dropna(axis=1).rename(columns={"TimeH_x": "TimeH"})
should be:
esult41 = pd.merge(colnames40, rdf31, how='inner', on=['Name', 'Date', 'High']).iloc[:, :-4].dropna(axis=1).rename(columns={"TimeH_x": "TimeH"})
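The corrected line works by adding 'High' to the merge keys: each day's high is matched against the intraday row(s) where that value occurs, and the time comes along with the match. A minimal sketch of the idea, where daily and intraday are hypothetical stand-ins for colnames40 and rdf31 (their actual columns are not shown):

```python
import pandas as pd

# Hypothetical stand-in for rdf31: intraday rows for one day.
intraday = pd.DataFrame({
    "Name": ["Apple"] * 4,
    "Date": [20200505] * 4,
    "Time": ["15:30:00", "17:00:00", "20:00:00", "22:00:00"],
    "High": [3.85, 3.70, 3.85, 3.84],
})

# Hypothetical stand-in for colnames40: one row per Name/Date with the day's High.
daily = intraday.groupby(["Name", "Date"], as_index=False)["High"].max()

# Matching on High as well keeps only the intraday rows where the high occurred.
matched = pd.merge(daily, intraday, how="inner", on=["Name", "Date", "High"])
print(matched)
```

Because the high of 3.85 is reached at two different times in this sample, the merge returns two rows for the same day, which is one way duplicate rows can appear after this kind of merge.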
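Now the problem is that the data comes back with duplicate values, which makes sense since it references rdf31, but df.drop_duplicates(keep='first', inplace=False) returns "None" for some reason. That is beyond the scope of this question, though.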
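For reference, DataFrame.drop_duplicates returns a new DataFrame when inplace=False and returns None only when inplace=True (the frame is then modified in place). Also, without subset it compares every column, so rows that differ only in 'Time' are not treated as duplicates. A sketch with a hypothetical matched frame, assuming the earliest time per Name/Date should be kept:

```python
import pandas as pd

# Hypothetical result of a merge on High where the high occurred twice.
matched = pd.DataFrame({
    "Name": ["Apple", "Apple"],
    "Date": [20200505, 20200505],
    "High": [3.85, 3.85],
    "Time": ["15:30:00", "20:00:00"],
})

# Comparing all columns: both rows survive because their Time values differ.
all_cols = matched.drop_duplicates(keep="first")

# Comparing only the key columns; inplace=False returns a new DataFrame.
deduped = matched.drop_duplicates(subset=["Name", "Date"], keep="first")

# With inplace=True the copy is changed in place and the call returns None.
in_place = matched.copy()
ret = in_place.drop_duplicates(subset=["Name", "Date"], keep="first", inplace=True)

print(len(all_cols), len(deduped), ret)
```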