Question

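I have a DataFrame with three columns: store_id, hour, and count. My problem is that some stores have no rows for certain hours, and I would like those hours to appear with a count of 0.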

This is what the DataFrame looks like:

#     store_id   hour   count
# 0         13      0      56
# 1         13      1      78
# 2         13      2      53
# 3         23     13      14
# 4         23     14      13
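
For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a sketch that hard-codes the five rows shown; the real data presumably has more stores and hours.

```python
import pandas as pd

# The five sample rows from the table above.
df = pd.DataFrame({
    'store_id': [13, 13, 13, 23, 23],
    'hour': [0, 1, 2, 13, 14],
    'count': [56, 78, 53, 14, 13],
})
print(df)
```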

As you can see, the store with id 13 has no values for hours 3-23, and similarly store 23 has no values for many other hours.

I tried to solve this by creating a temporary DataFrame with two columns, id and count, and performing a right outer join, but that had no effect.

Answer 1

If no store has duplicate hours, one solution is to reindex with MultiIndex.from_product:

import pandas as pd

# index by (store_id, hour) so that missing pairs can be added by reindexing
df = df.set_index(['store_id','hour'])
# every combination of the existing stores and hours 0-23
mux = pd.MultiIndex.from_product([df.index.levels[0], range(24)], names=df.index.names)
# pairs that are absent from df are created with count 0
df = df.reindex(mux, fill_value=0).reset_index()

print (df)
    store_id  hour  count
0         13     0     56
1         13     1     78
2         13     2     53
3         13     3      0
4         13     4      0
5         13     5      0
6         13     6      0
7         13     7      0
8         13     8      0
9         13     9      0
10        13    10      0
11        13    11      0
12        13    12      0
13        13    13      0
14        13    14      0
15        13    15      0
16        13    16      0
17        13    17      0
18        13    18      0
19        13    19      0
20        13    20      0
21        13    21      0
22        13    22      0
23        13    23      0
24        23     0      0
25        23     1      0
26        23     2      0
27        23     3      0
28        23     4      0
29        23     5      0
30        23     6      0
31        23     7      0
32        23     8      0
33        23     9      0
34        23    10      0
35        23    11      0
36        23    12      0
37        23    13     14
38        23    14     13
39        23    15      0
40        23    16      0
41        23    17      0
42        23    18      0
43        23    19      0
44        23    20      0
45        23    21      0
46        23    22      0
47        23    23      0
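
A join can also work here, as long as the helper frame lists every store/hour pair. Below is a minimal sketch of that idea, assuming 24 hours per day and the five sample rows from the question; `full` and `result` are names made up for this illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'store_id': [13, 13, 13, 23, 23],
    'hour': [0, 1, 2, 13, 14],
    'count': [56, 78, 53, 14, 13],
})

# Every (store_id, hour) pair that should exist; 24 hours is an assumption.
full = pd.MultiIndex.from_product(
    [df['store_id'].unique(), range(24)], names=['store_id', 'hour']
).to_frame(index=False)

# A left join keeps every pair from `full`; pairs with no match get NaN.
result = full.merge(df, on=['store_id', 'hour'], how='left')

# Turn the NaN counts into 0 and restore the integer dtype.
result['count'] = result['count'].fillna(0).astype(int)
```

Unlike the reindex approach, this does not require the store/hour pairs in `df` to be unique, although duplicated pairs would then show up as duplicated rows in `result`.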

Answer 2

Try this:

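import pandas as pd

all_hours = set(range(24))
for sid in set(df['store_id']):
    # hours for which this store has no row
    misshours = sorted(all_hours - set(df.loc[df['store_id'] == sid, 'hour']))
    nmiss = len(misshours)
    # append one zero-count row per missing hour
    df = pd.concat([df, pd.DataFrame({'store_id': nmiss * [sid], 'hour': misshours, 'count': nmiss * [0]})], ignore_index=True)
df = df.sort_values(['store_id', 'hour']).reset_index(drop=True)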
