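I have a pandas DataFrame with location, date, and count columns. The dates are stored as strings and cover only November 2018. There are 68 locations. Some location/date pairs have multiple counts, and I want to keep those rows as they are. What I need help with: if a location has no row for some date between 2018-11-01 and 2018-11-30, I want to add a row with that location, the missing date (as a string), and a count of 0. Here is my dataframe: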
Location Date Count
0 location_one 2018-11-06 102
1 location_one 2018-11-06 16
2 location_one 2018-11-06 650
3 location_one 2018-11-07 4
4 location_one 2018-11-06 12
5 location_one 2018-11-06 191
6 location_one 2018-11-06 58
7 location_one 2018-11-07 149
Desired output:
Location Date Count
0 location_one 2018-11-01 0
1 location_one 2018-11-02 0
2 location_one 2018-11-03 0
3 location_one 2018-11-04 0
4 location_one 2018-11-05 0
5 location_one 2018-11-06 102
6 location_one 2018-11-06 16
7 location_one 2018-11-06 650
8 location_one 2018-11-07 4
9 location_one 2018-11-06 12
10 location_one 2018-11-06 191
11 location_one 2018-11-06 58
12 location_one 2018-11-07 149
Answer 0 (score: 2)
Extending the previous answer to handle multiple locations, as the OP requires.
import pandas as pd
input_df = pd.DataFrame([
    ['location_one', '2018-11-06', 102],
    ['location_one', '2018-11-06', 16],
    ['location_one', '2018-11-06', 650],
    ['location_one', '2018-11-07', 4],
    ['location_one', '2018-11-06', 12],
    ['location_one', '2018-11-06', 191],
    ['location_one', '2018-11-06', 58],
    ['location_one', '2018-11-07', 149],
    ['location_two', '2018-11-06', 110]  # Added: a second location
], columns=['location', 'date', 'count'])
# (1) Create a dataframe with every date in Nov 2018, as 'YYYY-MM-DD' strings
# (pd.date_range replaces DatetimeIndex(start=..., end=...), which was removed in pandas 1.0)
date_df = pd.DataFrame(
    {'date': pd.date_range(start='2018-11-01', end='2018-11-30', freq='D').strftime('%Y-%m-%d')}
)
# (2) Create a dataframe with every location/date combination
index = pd.MultiIndex.from_product([
    input_df['location'].unique(),
    date_df['date']
], names=['location', 'date'])
master_df = pd.DataFrame(index=index).reset_index()
# (3) Left-merge the counts; pairs with no data get NaN, which is replaced by 0
results = pd.merge(master_df, input_df, on=['location', 'date'], how='left')
results['count'] = results['count'].fillna(0).astype(int)
print(results)
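The same grid-then-left-merge idea can be wrapped in a reusable function that works directly on the question's column names (`Location`, `Date`, `Count`). This is a minimal sketch: the function name `fill_missing_dates`, its `start`/`end`/`locations` parameters, and the extra `location_three` in the usage are hypothetical, and it assumes dates are `'YYYY-MM-DD'` strings and counts are integers. The optional `locations` list matters because `unique()` only sees locations that already have at least one row; a location with no November data at all would otherwise be skipped.

```python
import pandas as pd


def fill_missing_dates(df, start, end, locations=None):
    """Hypothetical helper: add a zero-count row for each (Location, Date) pair with no data.

    Assumes 'Date' holds 'YYYY-MM-DD' strings and 'Count' holds integers.
    Existing rows, including several rows for the same pair, are kept as-is.
    """
    if locations is None:
        # Only locations that appear at least once in df
        locations = df['Location'].unique()
    # Every calendar day in [start, end], formatted to match df['Date']
    dates = pd.date_range(start=start, end=end, freq='D').strftime('%Y-%m-%d')
    # Cartesian product of locations and dates as a two-column frame
    grid = pd.MultiIndex.from_product(
        [locations, dates], names=['Location', 'Date']
    ).to_frame(index=False)
    # Left merge keeps every grid row; pairs without data get NaN in Count
    out = grid.merge(df, on=['Location', 'Date'], how='left')
    out['Count'] = out['Count'].fillna(0).astype(int)
    return out


df = pd.DataFrame({
    'Location': ['location_one'] * 8 + ['location_two'],
    'Date': ['2018-11-06', '2018-11-06', '2018-11-06', '2018-11-07',
             '2018-11-06', '2018-11-06', '2018-11-06', '2018-11-07',
             '2018-11-06'],
    'Count': [102, 16, 650, 4, 12, 191, 58, 149, 110],
})
# 'location_three' is a made-up location with no rows, to show the explicit list
result = fill_missing_dates(df, '2018-11-01', '2018-11-30',
                            locations=['location_one', 'location_two', 'location_three'])
print(result.head(8))
```

Any location in `df` that is left out of an explicit `locations` list is dropped by the left merge, so the list should contain all 68 locations.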
Answer 1 (score: 1)
This answer is based on W-B's comment:
Assuming you start with df as:
Location Date Count
0 location_one 2018-11-06 102
1 location_one 2018-11-06 16
2 location_one 2018-11-06 650
3 location_one 2018-11-07 4
4 location_one 2018-11-06 12
5 location_one 2018-11-06 191
6 location_one 2018-11-06 58
7 location_one 2018-11-07 149
You can do:
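import pandas as pd
t_df = pd.DataFrame({'Date': pd.date_range('2018-11-01', '2018-11-30', freq='D').strftime('%Y-%m-%d')})
result = t_df.merge(df, on='Date', how='left')  # assumes df holds a single location
result = result.fillna({'Location': df['Location'].iloc[0], 'Count': 0}).astype({'Count': int})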
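The single-location merge against a calendar frame extends to several locations by first pairing every location with every date. A minimal sketch, assuming pandas 1.2 or later (for `merge(how='cross')`) and the question's column names; the `location_two` row is added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Location': ['location_one'] * 8 + ['location_two'],
    'Date': ['2018-11-06', '2018-11-06', '2018-11-06', '2018-11-07',
             '2018-11-06', '2018-11-06', '2018-11-06', '2018-11-07',
             '2018-11-06'],
    'Count': [102, 16, 650, 4, 12, 191, 58, 149, 110],
})

# Calendar of every November 2018 date, as strings to match df['Date']
t_df = pd.DataFrame({'Date': pd.date_range('2018-11-01', '2018-11-30', freq='D').strftime('%Y-%m-%d')})
# Pair every location with every date (cross join, pandas >= 1.2)
grid = df[['Location']].drop_duplicates().merge(t_df, how='cross')
# Left merge brings the counts back; existing duplicate rows are preserved
result = grid.merge(df, on=['Location', 'Date'], how='left')
# Pairs with no data get a count of 0
result['Count'] = result['Count'].fillna(0).astype(int)
print(result[result['Location'] == 'location_two'].head())
```

Filling only the `Count` column (rather than calling `fillna(0)` on the whole frame) keeps the location names intact on the newly added rows.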