Pandas: How to fill in missing "Year, Week" columns?

Time: 2019-01-30 17:03:12

Tags: python pandas

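I have a data frame in which some [Year] and [Week] values are missing (certain weeks have no row at all). I have another data frame that serves as a calendar reference, from which these missing values can be taken. How can I fill in the missing entries using pandas?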

I tried to do this with reindex, but I get the following error:

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

import pandas as pd

d1 = {'Year': [2019,2019,2019,2019,2019], 'Week':[1,2,4,6,7], 'Value': [20,40,60,75,90]}
d2 = {'Year': [2019,2019,2019,2019,2019,2019,2019,2019,2019,2019], 'Week':[1,2,3,4,5,6,7,8,9,10]}

df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)

df1 = df1.set_index(['Year', 'Week'])
df2 = df2.set_index(['Year', 'Week'])

df1 = df1.reindex(df2, fill_value=0)

print(df1)

2 answers:

Answer 0 (score: 2)

You should pass the index itself, i.e. df2.index, because reindex expects index labels rather than a DataFrame:

df1.reindex(df2.index,fill_value=0)
Out[851]: 
           Value
Year Week       
2019 1        20
     2        40
     3         0
     4        60
     5         0
     6        75
     7        90

df2.index.difference(df1.index)
Out[854]: 
MultiIndex(levels=[[2019], [3, 5]],
           labels=[[0, 0], [0, 1]],
           names=['Year', 'Week'],
           sortorder=0)
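
For reference, the two calls above can be combined into one self-contained sketch. It assumes the calendar frame covers weeks 1 to 7, matching the output shown; the names `calendar`, `filled` and `missing` are illustrative and not part of the original answer.

```python
import pandas as pd

# Data with gaps: weeks 3 and 5 have no row
df1 = pd.DataFrame({'Year': [2019] * 5,
                    'Week': [1, 2, 4, 6, 7],
                    'Value': [20, 40, 60, 75, 90]}).set_index(['Year', 'Week'])

# Calendar reference (assumed here to cover weeks 1 to 7)
calendar = pd.DataFrame({'Year': [2019] * 7,
                         'Week': list(range(1, 8))}).set_index(['Year', 'Week'])

# Pass the MultiIndex, not the DataFrame, to reindex
filled = df1.reindex(calendar.index, fill_value=0)

# (Year, Week) pairs present in the calendar but absent from the data
missing = calendar.index.difference(df1.index)

print(filled)
print(list(missing))
```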

Update

s=df1.reindex(df2.index)
s[s.bfill().notnull().values].fillna(0)
Out[877]: 
           Value
Year Week       
2019 1      20.0
     2      40.0
     3       0.0
     4      60.0
     5       0.0
     6      75.0
     7      90.0
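
The idea behind the update seems to be this: after reindexing against a calendar that runs past the last observed week, a backward fill leaves only the trailing rows as NaN, so they can be dropped while the interior gaps are filled with 0. A minimal sketch of that idea, assuming the 10-week calendar from the question and using a one-dimensional mask on the `Value` column (a small variation on the code above; `calendar`, `keep` and `result` are illustrative names):

```python
import pandas as pd

df1 = pd.DataFrame({'Year': [2019] * 5,
                    'Week': [1, 2, 4, 6, 7],
                    'Value': [20, 40, 60, 75, 90]}).set_index(['Year', 'Week'])

# Calendar running past the last observed week (weeks 1 to 10)
calendar = pd.DataFrame({'Year': [2019] * 10,
                         'Week': list(range(1, 11))}).set_index(['Year', 'Week'])

# Without fill_value, rows missing from df1 become NaN
s = df1.reindex(calendar.index)

# After a backward fill only rows later than the last real value stay NaN
keep = s['Value'].bfill().notna()

# Drop the trailing rows, then fill the remaining interior gaps with 0
result = s[keep].fillna(0)
print(result)
```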

Answer 1 (score: 2)

import pandas as pd

d1 = {'Year': [2019,2019,2019,2019,2019], 'Week':[1,2,4,6,7], 'Value': [20,40,60,75,90]}
d2 = {'Year': [2019,2019,2019,2019,2019,2019,2019], 'Week':[1,2,3,4,5,6,7]}

df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)

df1 = df1.set_index(['Year', 'Week'])
df2 = df2.set_index(['Year', 'Week'])

# Value used to fill the NaN rows; choose another rule if the mean is not wanted
fill_value = df1['Value'].mean()

# A right join keeps every (Year, Week) pair of the calendar frame
df1 = df1.join(df2, how='right')

# fillna returns a new DataFrame, so assign the result back
df1 = df1.fillna(value=fill_value)
print(df1)
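
One point worth keeping in mind with this approach: `fillna` returns a new object rather than modifying the frame in place, so its result has to be captured before printing. A minimal sketch with the same data (the names `joined` and `filled` are illustrative only):

```python
import pandas as pd

df1 = pd.DataFrame({'Year': [2019] * 5,
                    'Week': [1, 2, 4, 6, 7],
                    'Value': [20, 40, 60, 75, 90]}).set_index(['Year', 'Week'])
df2 = pd.DataFrame({'Year': [2019] * 7,
                    'Week': [1, 2, 3, 4, 5, 6, 7]}).set_index(['Year', 'Week'])

fill_value = df1['Value'].mean()

# The right join adds the calendar rows; their Value is NaN
joined = df1.join(df2, how='right')

# Calling fillna without keeping the result leaves `joined` unchanged
joined.fillna(fill_value)
n_missing_before = int(joined['Value'].isna().sum())

# Capturing the returned frame gives the filled data
filled = joined.fillna(fill_value)
n_missing_after = int(filled['Value'].isna().sum())

print(n_missing_before, n_missing_after)
print(filled)
```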