Add entries to rows to make it uniform

Date: 2017-03-01 04:37:57

Tags: python csv pandas

I have a csv file that contains the date, repair_id, the number of on-site repairs, and the number of off-site repairs, so my data looks like:

data        repair_id    num_onsite     num_offsite
2016-02-01          A             3              0
2016-02-01          B             2              1
2016-02-01          D             0              4
2016-02-02          A             1              3
2016-02-02          C             1              1
2016-02-02          E             0              6
...
2016-02-14          A             1              3
2016-02-14          B             0              4
2016-02-14          D             2              0
2016-02-14          E             3              0

There are 5 different repair_id values: A, B, C, D, E. If a repair person (repair_id) did not work on a given date, they do not appear in the csv file for that date. I want to change that by including them with a value of 0 for both num_onsite and num_offsite, so that my table looks like:

data        repair_id    num_onsite     num_offsite
2016-02-01          A             3              0
2016-02-01          B             2              1
2016-02-01          C             0              0 # added
2016-02-01          D             0              4
2016-02-01          E             0              0 # added
2016-02-02          A             1              3
2016-02-02          B             0              0 # added
2016-02-02          C             1              1
2016-02-02          D             0              0 # added
2016-02-02          E             0              6
...
2016-02-14          A             1              3
2016-02-14          B             0              4
2016-02-14          C             0              0 # added
2016-02-14          D             2              0
2016-02-14          E             3              0

I have looked at:

Pandas DataFrame insert / fill missing rows from previous dates

Missing data, insert rows in Pandas and fill with NAN

Add missing dates to pandas dataframe

but I can't get the output right.
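
For reference, both answers below assume the data has been read into a DataFrame df. A minimal sketch that rebuilds only the rows shown above (the dates elided by "..." are left out):

import pandas as pd

# stand-in for the csv described in the question
df = pd.DataFrame({
    'data': ['2016-02-01', '2016-02-01', '2016-02-01',
             '2016-02-02', '2016-02-02', '2016-02-02',
             '2016-02-14', '2016-02-14', '2016-02-14', '2016-02-14'],
    'repair_id': ['A', 'B', 'D', 'A', 'C', 'E', 'A', 'B', 'D', 'E'],
    'num_onsite': [3, 2, 0, 1, 1, 0, 1, 0, 2, 3],
    'num_offsite': [0, 1, 4, 3, 1, 6, 3, 4, 0, 0],
})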

3 Answers:

Answer 1 (score: 3)

Set the index, reindex with fill_value, then reset_index:

# grid of every (data, repair_id) combination present in the file
mux = pd.MultiIndex.from_product(
    [df.data.unique(), df.repair_id.unique()],
    names=['data', 'repair_id']
)

# align on the full grid, filling the newly created rows with 0
df.set_index(['data', 'repair_id']).reindex(mux, fill_value=0).reset_index()

          data repair_id  num_onsite  num_offsite
0   2016-02-01         A           3            0
1   2016-02-01         B           2            1
2   2016-02-01         D           0            4
3   2016-02-01         C           0            0
4   2016-02-01         E           0            0
5   2016-02-02         A           1            3
6   2016-02-02         B           0            0
7   2016-02-02         D           0            0
8   2016-02-02         C           1            1
9   2016-02-02         E           0            6
10  2016-02-14         A           1            3
11  2016-02-14         B           0            4
12  2016-02-14         D           2            0
13  2016-02-14         C           0            0
14  2016-02-14         E           3            0
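
Note that reindex keeps the repair_id order from df.repair_id.unique() (order of first appearance, hence A, B, D, C, E within each date). If the alphabetical order of the desired table matters, a sort can be appended, e.g.:

df.set_index(['data', 'repair_id']).reindex(mux, fill_value=0).reset_index() \
  .sort_values(['data', 'repair_id']).reset_index(drop=True)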

Answer 2 (score: 2)

For those of us with a SQL mindset, consider a merge (left join) against the set of all possible date and ID combinations:

import itertools
...
# all possible (date, id) pairs via a cross product of the observed values
combns = pd.DataFrame(list(itertools.product(df['data'].unique(), df['repair_id'].unique())),
                      columns=['data', 'repair_id'])

# left-join the data onto the full grid; absent rows come back as NaN, then become 0
new_df = combns.merge(df, on=['data', 'repair_id'], how='left')\
               .fillna(0).sort_values(['data', 'repair_id']).reset_index(drop=True)

#           data repair_id  num_onsite  num_offsite
# 0   2016-02-01         A         3.0          0.0
# 1   2016-02-01         B         2.0          1.0
# 2   2016-02-01         C         0.0          0.0
# 3   2016-02-01         D         0.0          4.0
# 4   2016-02-01         E         0.0          0.0
# 5   2016-02-02         A         1.0          3.0
# 6   2016-02-02         B         0.0          0.0
# 7   2016-02-02         C         1.0          1.0
# 8   2016-02-02         D         0.0          0.0
# 9   2016-02-02         E         0.0          6.0
# 10  2016-02-14         A         1.0          3.0
# 11  2016-02-14         B         0.0          4.0
# 12  2016-02-14         C         0.0          0.0
# 13  2016-02-14         D         2.0          0.0
# 14  2016-02-14         E         3.0          0.0
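
Note that fillna(0) upcasts the count columns to float, as the output above shows. If integer counts are needed, cast them back:

# restore integer dtype after the NaN -> 0 fill
new_df[['num_onsite', 'num_offsite']] = new_df[['num_onsite', 'num_offsite']].astype(int)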