I have a CSV file containing a date, a repair_id, the number of on-site repairs, and the number of off-site repairs, so my data looks like:
data repair_id num_onsite num_offsite
2016-02-01 A 3 0
2016-02-01 B 2 1
2016-02-01 D 0 4
2016-02-02 A 1 3
2016-02-02 C 1 1
2016-02-02 E 0 6
...
2016-02-14 A 1 3
2016-02-14 B 0 4
2016-02-14 D 2 0
2016-02-14 E 3 0
There are 5 distinct repair_id values: A, B, C, D, E. If a repair person (repair_id) did not work on a given date, they do not appear in the CSV file for that date. I want to change that by including them, with a value of 0 for both num_onsite and num_offsite, so that my table looks like:
data repair_id num_onsite num_offsite
2016-02-01 A 3 0
2016-02-01 B 2 1
2016-02-01 C 0 0 # added
2016-02-01 D 0 4
2016-02-01 E 0 0 # added
2016-02-02 A 1 3
2016-02-02 B 0 0 # added
2016-02-02 C 1 1
2016-02-02 D 0 0 # added
2016-02-02 E 0 6
...
2016-02-14 A 1 3
2016-02-14 B 0 4
2016-02-14 C 0 0 # added
2016-02-14 D 2 0
2016-02-14 E 3 0
I have looked at:
Pandas DataFrame insert / fill missing rows from previous dates
Missing data, insert rows in Pandas and fill with NAN
Add missing dates to pandas dataframe
but I could not get the output right.
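For reference, a minimal sketch that rebuilds the sample frame from the rows shown above, so the answers can be tested (the elided "..." rows are left out):

import pandas as pd
from io import StringIO

# Rebuild the sample data from the rows shown in the question
csv = StringIO("""\
data,repair_id,num_onsite,num_offsite
2016-02-01,A,3,0
2016-02-01,B,2,1
2016-02-01,D,0,4
2016-02-02,A,1,3
2016-02-02,C,1,1
2016-02-02,E,0,6
2016-02-14,A,1,3
2016-02-14,B,0,4
2016-02-14,D,2,0
2016-02-14,E,3,0
""")
df = pd.read_csv(csv)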
Answer 1 (score: 3)
Set the index, reindex with fill_value, then reset_index:
# Cartesian product of every date and every repair_id in the data
mux = pd.MultiIndex.from_product(
    [df.data.unique(), df.repair_id.unique()],
    names=['data', 'repair_id']
)

# Reindex onto the full product; rows that were missing are filled with 0
df.set_index(['data', 'repair_id']).reindex(mux, fill_value=0).reset_index()
data repair_id num_onsite num_offsite
0 2016-02-01 A 3 0
1 2016-02-01 B 2 1
2 2016-02-01 D 0 4
3 2016-02-01 C 0 0
4 2016-02-01 E 0 0
5 2016-02-02 A 1 3
6 2016-02-02 B 0 0
7 2016-02-02 D 0 0
8 2016-02-02 C 1 1
9 2016-02-02 E 0 6
10 2016-02-14 A 1 3
11 2016-02-14 B 0 4
12 2016-02-14 D 2 0
13 2016-02-14 C 0 0
14 2016-02-14 E 3 0
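Note that reindex keeps the unique values in their order of first appearance, so within each date the rows come out as A, B, D, C, E rather than alphabetically. If the sorted layout from the question is wanted, a sort_values at the end should do it (a small sketch continuing from the result above):

result = (df.set_index(['data', 'repair_id'])
            .reindex(mux, fill_value=0)
            .reset_index()
            .sort_values(['data', 'repair_id'])   # restore the A..E order within each date
            .reset_index(drop=True))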
Answer 2 (score: 2)
For those of us with a SQL mindset, consider a merge (left join) against the set of all possible date and ID combinations:
import itertools
...

# All date/ID combinations, built as the Cartesian product of the unique values
combns = pd.DataFrame(list(itertools.product(df['data'].unique(), df['repair_id'].unique())),
                      columns=['data', 'repair_id'])

# Left-join the data onto the full grid; rows with no match come back as NaN,
# which fillna(0) then turns into zeros
new_df = combns.merge(df, on=['data', 'repair_id'], how='left')\
               .fillna(0).sort_values(['data', 'repair_id']).reset_index(drop=True)
# data repair_id num_onsite num_offsite
# 0 2016-02-01 A 3.0 0.0
# 1 2016-02-01 B 2.0 1.0
# 2 2016-02-01 C 0.0 0.0
# 3 2016-02-01 D 0.0 4.0
# 4 2016-02-01 E 0.0 0.0
# 5 2016-02-02 A 1.0 3.0
# 6 2016-02-02 B 0.0 0.0
# 7 2016-02-02 C 1.0 1.0
# 8 2016-02-02 D 0.0 0.0
# 9 2016-02-02 E 0.0 6.0
# 10 2016-02-14 A 1.0 3.0
# 11 2016-02-14 B 0.0 4.0
# 12 2016-02-14 C 0.0 0.0
# 13 2016-02-14 D 2.0 0.0
# 14 2016-02-14 E 3.0 0.0
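One caveat: the left join introduces NaN for the missing rows, which promotes num_onsite and num_offsite to float (hence the 3.0 values above). If integer counts are needed, an explicit cast afterwards should work (a sketch, using the new_df from above):

# Cast the repair counts back to integers after the NaN -> 0 fill
new_df[['num_onsite', 'num_offsite']] = new_df[['num_onsite', 'num_offsite']].astype(int)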