Pandas index match with time values

Time: 2020-05-27 17:23:01

Tags: python python-3.x pandas

I'm trying to index-match two dataframes and write the data back to Excel. The Excel file that has to be filled looks like this:

     Name  Location        Date  Open   High   TimeH   Low    TimeL    Close
1  Orange  New York  20200501.0   5.5   5.58   18:00   5.45   16:00     5.7
0   Apple     Minsk  20200505.0   3.5   3.85    NaN    3.45    NaN      3.65
2   Steak    Dallas  20200506.0   8.5   8.85    NaN    8.45    NaN      8.65

The "TimeH" and "TimeL" values should be looked up from a dataframe that looks like this:

    Name      Date     Time  Open  High   Low  Close  Volume  VWAP  Trades
4   Apple  20200505  15:30:00  3.50  3.85  3.45   3.70    1500  3.73      95
5   Apple  20200505  17:00:00  3.65  3.70  3.50   3.60    1600  3.65      54
6   Apple  20200505  20:00:00  3.80  3.85  3.35   3.81    1700  3.73      41
7   Apple  20200505  22:00:00  3.60  3.84  3.45   3.65    1800  3.75      62
4   Steak  20200506  10:00:00  8.50  8.85  8.45   8.70    1500  8.73      95
5   Steak  20200506  12:00:00  8.65  8.70  8.50   8.60    1600  8.65      54
6   Steak  20200506  14:00:00  8.80  8.85  8.45   8.81    1700  8.73      41
7   Steak  20200506  16:00:00  8.60  8.84  8.45   8.65    1800  8.75      62

and then pasted into the Excel file, which should look like this when everything works:

     Name  Location        Date  Open   High    TimeH     Low    TimeL      Close
1  Orange  New York  20200501.0   5.5   5.58   18:00:00   5.45   16:00:00    5.7
0   Apple     Minsk  20200505.0   3.5   3.85   10:00:00   3.45   20:00:00    3.65
2   Steak    Dallas  20200506.0   8.5   8.85   15:30:00   8.45   14:00:00    8.65

I'm using the following code to index the "Open", "High", "Low" and "Close" values, and it works fine:

import numpy as np
import pandas as pd

# rdf12 (the intraday bars) and rdf11 are built earlier in the script.
# Collapse the intraday bars into one OHLC row per Name and Date.
rdf13 = rdf12.groupby(['Name', 'Date']).agg(
    Open=('Open', 'first'), High=('High', 'max'), Low=('Low', 'min'),
    Close=('Close', 'last'), Volume=('Volume', 'sum'),
    VWAP=('VWAP', 'mean'), Trades=('Trades', 'sum')).reset_index()
# Join onto rdf11, drop the last four merged columns and any column that
# contains NaN, then strip the "_x" suffixes added by the merge.
result11 = pd.merge(rdf13, rdf11, how='inner', on=['Name', 'Date']).iloc[:, :-4].dropna(axis=1).rename(columns={"Open_x": "Open", "High_x": "High", "Low_x": "Low", "Close_x": "Close", "Volume_x": "Volume", "VWAP_x": "VWAP", "Trades_x": "Trades"})
result12 = result11.reindex(index=result11.index[::-1])
result13 = result12[['Name', 'Location', 'Date', 'Check_2', 'Open', 'High', 'Low', 'Close', 'Volume', 'VWAP', 'Trades']].reset_index()

readfile11 = pd.read_excel(r"Trackers\TEST Tracker.xlsx")

# Remember the original row order so it can be restored after combine_first.
readfile11['Count'] = np.arange(len(readfile11))
df11 = readfile11.set_index(['Name', 'Location', 'Date'])
df12 = result13.set_index(['Name', 'Location', 'Date'])

# Values from df12 take priority; remaining gaps are filled from the tracker.
fdf11 = df12.combine_first(df11).reset_index().reindex(readfile11.columns, axis=1).sort_values('Count')

print("Updated Day1 Data Frame")
print(fdf11)

fdf11.to_excel(r"Trackers\TEST Tracker.xlsx", sheet_name="Entries", index=False)

But when I adapt this code to index the TimeH values, it doesn't work for some reason and returns nothing, so the 'NaN' values in the 'TimeH' column stay 'NaN'. I've tried changing the variables, but either I get an execution error or it still returns "NaN" values.

Can anyone here help me get Python to index the time values?

1 Answer:

Answer 0: (score: 0)

Apparently there was a small typo in my code.

result41 = pd.merge(colnames40, rdf11, how='inner', on=['Name', 'Date', 'High']).iloc[:,:-4].dropna(axis=1).rename(columns = {"TimeH_x": "TimeH"})

should be

result41 = pd.merge(colnames40, rdf31, how='inner', on=['Name', 'Date', 'High']).iloc[:,:-4].dropna(axis=1).rename(columns = {"TimeH_x": "TimeH"})
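As a minimal sketch of the kind of lookup this merge performs, the snippet below uses made-up frames shaped like the ones in the question. The names `daily`, `intraday` and `lookup_high_time` are hypothetical and stand in for the asker's own frames, whose exact contents are not shown. It joins the daily row to the intraday bars on Name, Date and High, so each match carries the Time of a bar that reached that high.

```python
import pandas as pd

# Hypothetical intraday bars, modelled on the Apple rows in the question.
intraday = pd.DataFrame({
    "Name": ["Apple", "Apple", "Apple", "Apple"],
    "Date": [20200505, 20200505, 20200505, 20200505],
    "Time": ["15:30:00", "17:00:00", "20:00:00", "22:00:00"],
    "High": [3.85, 3.70, 3.85, 3.84],
})

# Hypothetical daily row whose TimeH still has to be filled in.
daily = pd.DataFrame({"Name": ["Apple"], "Date": [20200505], "High": [3.85]})


def lookup_high_time(daily, intraday):
    """Attach the Time of every intraday bar whose High equals the daily High."""
    bars = intraday[["Name", "Date", "High", "Time"]].rename(columns={"Time": "TimeH"})
    return pd.merge(daily, bars, how="inner", on=["Name", "Date", "High"])


matches = lookup_high_time(daily, intraday)
print(matches)
```

In this sample the 3.85 high appears in both the 15:30:00 and the 20:00:00 bar, so `matches` holds two rows for the single daily row.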

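The problem now is that the data comes back with duplicate values, which makes sense because it references rdf31. However, df.drop_duplicates(keep='first', inplace=False) returns "None" values for some reason, but that is beyond the scope of this question.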
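One common cause of a `None` result from `drop_duplicates` is `inplace=True`: the frame is then modified in place and the call returns nothing. With `inplace=False` (the default) the call returns a new frame, which has to be assigned to a variable. The sketch below shows both cases on a made-up frame. The name `matches` and the `subset` columns are assumptions chosen for illustration, not the asker's data.

```python
import pandas as pd

# Hypothetical merge result: two bars share the same daily high.
matches = pd.DataFrame({
    "Name": ["Apple", "Apple"],
    "Date": [20200505, 20200505],
    "High": [3.85, 3.85],
    "TimeH": ["15:30:00", "20:00:00"],
})

key = ["Name", "Date", "High"]

# inplace=False (the default) returns a new frame; the result must be assigned.
deduped = matches.drop_duplicates(subset=key, keep="first")

# inplace=True modifies the frame itself and returns None.
in_place = matches.copy()
returned = in_place.drop_duplicates(subset=key, keep="first", inplace=True)

print(deduped)
print(returned)
```

Because the two rows differ in TimeH, the `subset` argument is needed here; without it neither row counts as a duplicate of the other.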
