Append values to a DataFrame while preserving the index

Date: 2020-08-12 23:28:23

Tags: python pandas dataframe indexing

I have a DataFrame (test3) that looks like this (the date is in pd.datetime format):

import pandas as pd

data = {'date': ['1890-07-01 00:00:00', '1890-07-08 00:00:00', '1890-07-15 00:00:00', '1890-07-22 00:00:00', '1890-07-29 00:00:00'],
        'date.1': ['1890-07-07', '1890-07-14', '1890-07-21', '1890-07-28', '1890-08-04'],
        'mean_temp(℃)': [23.3, 23.9, 28.3, 26.1, 26.8],
        'max_temp(℃)': [32.3, 33.2, 35.8, 33.3, 34.6],
        'min_temp(℃)': [18.9, 17.0, 22.5, 22.0, 22.3]}
df = pd.DataFrame(data)
df.set_index('date', inplace=True)

                         date.1  mean_temp(℃)  max_temp(℃)  min_temp(℃)
date
1890-07-01 00:00:00  1890-07-07          23.3         32.3         18.9
1890-07-08 00:00:00  1890-07-14          23.9         33.2         17.0
1890-07-15 00:00:00  1890-07-21          28.3         35.8         22.5
1890-07-22 00:00:00  1890-07-28          26.1         33.3         22.0
1890-07-29 00:00:00  1890-08-04          26.8         34.6         22.3

where the first column, date, is the index of the DataFrame.

I am rendering new data: the value for the first column (date) is the rendered_date variable (converted with pd.to_datetime), and the value for the third column (mean_temp(℃)) is the next_value_ variable, which is array([[28.330473]], dtype=float32).

rendered_date = render_date(last_day.index.date)  # render the new datetime object
rendered_date = pd.to_datetime(rendered_date, format='%Y/%m/%d')  # convert it for pandas
d = {'date': [rendered_date], 'mean_temp(℃)': [next_value_]}
new_df = pd.DataFrame(data=d)  # build the new DataFrame
new_df = new_df.set_index("date")  # use the same index column as test3

fr = [test3, new_df]  # concatenate the new DataFrame with the existing one (test3)
result = pd.concat(fr)

This makes the bottom of result look like this:

....some values ....
2020-07-31 00:00:00          2020-08-06     28.7            35.0    23.9
[2020-08-07 00:00:00]        NaT            [[28.330473]]   NaN     NaN

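This is not what I want. I only want to append the data to the end of the result DataFrame (or test3, either is acceptable) while preserving its shape and index. How can I get the same format, like this?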

....some values ....
2020-07-31 00:00:00          2020-08-06     28.7            35.0    23.9
2020-08-07 00:00:00          NaT            28.330473       NaN     NaN

1 answer:

Answer 0 (score: 1)

Edit

Your code works fine for me:

import pandas as pd

data = {'date': ['1890-07-01 00:00:00', '1890-07-08 00:00:00', '1890-07-15 00:00:00', '1890-07-22 00:00:00', '1890-07-29 00:00:00'],
    'date.1': ['1890-07-07', '1890-07-14', '1890-07-21', '1890-07-28', '1890-08-04'],
    'mean_temp': [23.3, 23.9, 28.3, 26.1, 26.8],
    'max_temp': [32.3, 33.2, 35.8, 33.3, 34.6],
    'min_temp': [18.9, 17.0, 22.5, 22.0, 22.3]}
df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'])  # parse the strings so the index becomes a DatetimeIndex
df.set_index('date', inplace=True)

rendered_date = pd.to_datetime('2020-08-07')  # a single Timestamp
next_value_ = 28.330473  # a plain scalar, not a nested array
d = {'date': [rendered_date], 'mean_temp': [next_value_]}
df = pd.concat([df, pd.DataFrame(d).set_index('date')])

Output

                date.1  mean_temp  max_temp  min_temp
date
1890-07-01  1890-07-07  23.300000      32.3      18.9
1890-07-08  1890-07-14  23.900000      33.2      17.0
1890-07-15  1890-07-21  28.300000      35.8      22.5
1890-07-22  1890-07-28  26.100000      33.3      22.0
1890-07-29  1890-08-04  26.800000      34.6      22.3
2020-08-07         NaN  28.330473       NaN       NaN
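The version above works because rendered_date and next_value_ are scalars there. In the question, next_value_ is array([[28.330473]], dtype=float32), and rendered_date is probably array-like as well (it is derived from last_day.index.date). Wrapping such objects in a list puts a whole array into a single cell, which would explain the bracketed values in the question's output. The following is a minimal sketch, assuming rendered_date is a one-element DatetimeIndex and next_value_ is a (1, 1) float32 array (both are hypothetical stand-ins here), that reduces each to a scalar before building the new row:

```python
import numpy as np
import pandas as pd

# Small frame with the same layout as the answer's example.
data = {'date': ['1890-07-01', '1890-07-08'],
        'date.1': ['1890-07-07', '1890-07-14'],
        'mean_temp': [23.3, 23.9],
        'max_temp': [32.3, 33.2],
        'min_temp': [18.9, 17.0]}
df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')

# Hypothetical stand-ins for the question's variables (assumed shapes):
# a one-element DatetimeIndex and a (1, 1) float32 array.
rendered_date = pd.to_datetime(['2020/08/07'], format='%Y/%m/%d')
next_value_ = np.array([[28.330473]], dtype=np.float32)

# Reduce both to scalars before building the new row.
ts = rendered_date[0]               # a single pandas Timestamp
value = float(next_value_.item())   # a plain Python float

# One-row frame whose index matches the existing one (DatetimeIndex named 'date').
new_row = pd.DataFrame({'mean_temp': [value]},
                       index=pd.DatetimeIndex([ts], name='date'))
result = pd.concat([df, new_row])
print(result.tail(2))
```

The missing columns (date.1, max_temp, min_temp) are filled with NaN for the new row, and the index stays a DatetimeIndex.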

A more idiomatic way to add a row is:

df.loc[rendered_date] = {'mean_temp': next_value_}
# or (requires: import numpy as np)
# df.loc[rendered_date] = [np.nan, next_value_, np.nan, np.nan]
# or even
# df.loc[rendered_date, 'mean_temp'] = next_value_

All options produce the same output.
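A small self-contained check of the three .loc variants above, written as a sketch: the make_df helper is hypothetical and builds a one-row frame with the same columns as the answer's example, so each variant starts from an identical copy.

```python
import numpy as np
import pandas as pd

def make_df():
    # Hypothetical helper: a one-row frame with the answer's columns,
    # indexed by a DatetimeIndex named 'date'.
    return pd.DataFrame({'date.1': ['1890-07-07'], 'mean_temp': [23.3],
                         'max_temp': [32.3], 'min_temp': [18.9]},
                        index=pd.DatetimeIndex(['1890-07-01'], name='date'))

rendered_date = pd.Timestamp('2020-08-07')
next_value_ = 28.330473

frames = [make_df() for _ in range(3)]
# dict: keys are aligned to columns; missing columns become NaN
frames[0].loc[rendered_date] = {'mean_temp': next_value_}
# list: one value per column, in column order
frames[1].loc[rendered_date] = [np.nan, next_value_, np.nan, np.nan]
# single cell: the new row is created and its other cells are NaN
frames[2].loc[rendered_date, 'mean_temp'] = next_value_

for f in frames:
    print(f.tail(1))
```

Setting a label that does not exist yet via .loc enlarges the frame, so the Timestamp becomes a new index entry instead of a list-wrapped value.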


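However, if you can gather all the new data first and append it as one batch, that is faster than concatenating one row at a time. Appending scalar values one by one is faster on native Python objects (such as lists or dicts), while pandas, with its NumPy/C implementation, is faster with vectorized calls.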
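A minimal sketch of the batch approach, assuming the new rows arrive one at a time in a loop: the values and weekly dates below are illustrative placeholders, not data from the question. Each item goes into a plain Python list, and the DataFrame is built and concatenated once at the end.

```python
import pandas as pd

# Existing frame (same columns as the answer's example).
df = pd.DataFrame({'date.1': ['1890-07-07'], 'mean_temp': [23.3],
                   'max_temp': [32.3], 'min_temp': [18.9]},
                  index=pd.DatetimeIndex(['1890-07-01'], name='date'))

# Collect new items in plain Python lists (cheap per-item appends)
# instead of growing the DataFrame row by row.
new_dates = []
new_means = []
for i, value in enumerate([28.330473, 27.9, 29.1]):  # illustrative values only
    new_dates.append(pd.Timestamp('2020-08-07') + pd.Timedelta(weeks=i))
    new_means.append(value)

# Build one DataFrame from the whole batch and concatenate once.
batch = pd.DataFrame({'mean_temp': new_means},
                     index=pd.DatetimeIndex(new_dates, name='date'))
df = pd.concat([df, batch])
print(df)
```

A single pd.concat copies the existing data once, whereas concatenating inside the loop would copy it on every iteration.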
