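Pandas: replace missing dates (strings) with 0 for each location

Time: 2019-01-02 21:56:37

Tags: python pandas

I have a pandas DataFrame with location, date, and count columns. The dates are stored as strings and cover only November 2018. There are 68 locations. Some date/location combinations have multiple counts, and I want to keep those rows as they are. What I need help with: if a location has no row for a date between 2018-11-01 and 2018-11-30, I want to add a row containing that location, the missing date (as a string), and a count of 0. Here is my DataFrame: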

    Location        Date        Count
0   location_one    2018-11-06  102
1   location_one    2018-11-06  16
2   location_one    2018-11-06  650
3   location_one    2018-11-07  4
4   location_one    2018-11-06  12
5   location_one    2018-11-06  191
6   location_one    2018-11-06  58
7   location_one    2018-11-07  149

Desired output:

    Location        Date        Count
0   location_one    2018-11-01  0
1   location_one    2018-11-02  0
2   location_one    2018-11-03  0
3   location_one    2018-11-04  0
4   location_one    2018-11-05  0
5   location_one    2018-11-06  102
6   location_one    2018-11-06  16
7   location_one    2018-11-06  650
8   location_one    2018-11-07  4
9   location_one    2018-11-06  12
10  location_one    2018-11-06  191
11  location_one    2018-11-06  58
12  location_one    2018-11-07  149

2 answers:

Answer 0 (score: 2)

Extending the previous answer so that it works with multiple locations, per the OP.

import pandas as pd

input_df = pd.DataFrame([
    ['location_one', '2018-11-06', '102'],
    ['location_one', '2018-11-06', '16'],
    ['location_one', '2018-11-06', '650'],
    ['location_one', '2018-11-07', '4'],
    ['location_one', '2018-11-06', '12'],
    ['location_one', '2018-11-06', '191'],
    ['location_one', '2018-11-06', '58'],
    ['location_one', '2018-11-07', '149'],
    ['location_two', '2018-11-06', '110'] # Added
], columns=['location', 'date', 'count'])

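# (1) Create dataframe for all dates in Nov 2018 (as 'YYYY-MM-DD' strings)
# pd.date_range is used because recent pandas versions no longer accept
# start/end arguments in the pd.DatetimeIndex constructor
month = '2018-11'
date_df = pd.DataFrame(
    {'date': pd.date_range(start='2018-11-01', end='2018-11-30', freq='D')}
)
date_df.date = date_df.date.apply(lambda x: x.strftime('%Y-%m-%d'))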

# (2) Create dataframe with every location/date combination
index = pd.MultiIndex.from_product([
    input_df.location.unique(), 
    date_df.date
], names = ['location', 'date'])
master_df = pd.DataFrame(index=index).reset_index()

# (3) Populate count column and fill missing entries with zero
results = pd.merge(master_df, input_df, on=['location', 'date'], how='left').fillna(0)
print(results)
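
One caveat with the snippet above: the counts in `input_df` are strings, so `fillna(0)` leaves a column that mixes strings with the integer 0. Below is a minimal sketch of the same three steps wrapped in a function that also casts the counts to integers. The function name `fill_missing_dates`, its parameters, and the small `sample` frame are illustrative and not part of the original answer.

```python
import pandas as pd

def fill_missing_dates(df, start, end):
    """Add a zero-count row for every location/date pair absent from df.

    Hypothetical helper: assumes columns named 'location', 'date'
    (as 'YYYY-MM-DD' strings) and 'count'.
    """
    # Every date in the range, formatted like the strings in df['date']
    all_dates = pd.date_range(start=start, end=end, freq='D').strftime('%Y-%m-%d')
    # Every location/date combination as a two-column frame
    grid = pd.MultiIndex.from_product(
        [df['location'].unique(), all_dates], names=['location', 'date']
    ).to_frame(index=False)
    # A left merge keeps duplicate rows from df and leaves NaN where df has no row
    out = grid.merge(df, on=['location', 'date'], how='left')
    # Convert counts to numbers, then fill only the count column with 0
    out['count'] = pd.to_numeric(out['count']).fillna(0).astype(int)
    return out

sample = pd.DataFrame(
    [['location_one', '2018-11-06', '102'],
     ['location_one', '2018-11-06', '16'],
     ['location_two', '2018-11-06', '110']],
    columns=['location', 'date', 'count'],
)
filled = fill_missing_dates(sample, '2018-11-01', '2018-11-30')
print(filled.head(8))
```

Filling only the `count` column, rather than calling `fillna(0)` on the whole frame, avoids overwriting missing values in any other columns the input might carry.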

Answer 1 (score: 1)

This answer is based on W-B's comment:

Suppose you start with df as:

  Location        Date        Count
 0   location_one    2018-11-06  102
 1   location_one    2018-11-06  16
 2   location_one    2018-11-06  650
 3   location_one    2018-11-07  4
 4   location_one    2018-11-06  12
 5   location_one    2018-11-06  191
 6   location_one    2018-11-06  58
 7   location_one    2018-11-07  149

You can do this:

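 # pd.date_range builds the November 2018 dates; strftime turns them into strings matching df's 'Date' column
 t_df = pd.DataFrame({'Date': pd.date_range(start='2018-11-01', end='2018-11-30', freq='D').strftime('%Y-%m-%d')})
 result = t_df.merge(df, on='Date', how='left').fillna(0)  # Assumes no NAs in other fields; fillna(0) also puts 0 in Location on the added rows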
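
Because `fillna(0)` applies to every column, the rows added by the merge would get 0 in the location column as well. For the single-location case, a sketch that fills each column separately might look like the following. The small sample frame and the choice to take the location name from the first row are assumptions made for illustration.

```python
import pandas as pd

# Illustrative single-location input with string dates
df = pd.DataFrame({
    'Location': ['location_one'] * 3,
    'Date': ['2018-11-06', '2018-11-06', '2018-11-07'],
    'Count': [102, 16, 4],
})

# All November 2018 dates as strings, in a column named like df's
t_df = pd.DataFrame({
    'Date': pd.date_range('2018-11-01', '2018-11-30', freq='D').strftime('%Y-%m-%d')
})

result = t_df.merge(df, on='Date', how='left')
# Added rows get the (single) location name instead of 0
result['Location'] = result['Location'].fillna(df['Location'].iloc[0])
# Only the count column is filled with 0
result['Count'] = result['Count'].fillna(0).astype(int)
# Restore the original column order
result = result[['Location', 'Date', 'Count']]
print(result.head(8))
```

This only makes sense when df holds exactly one location; with several locations, a location/date grid as in the other answer is needed.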