Filling missing data: replace NaN with the previous value plus 10

Time: 2018-02-11 13:45:10

Tags: python missing-data

import numpy as np
import pandas as pd

df = pd.DataFrame({'From_To': ['LoNDon_paris', 'MAdrid_miLAN', 'londON_StockhOlm','Budapest_PaRis', 'Brussels_londOn'],
'FlightNumber': [10045, np.nan, 10065, np.nan, 10085],
'RecentDelays':  [[23, 47], [], [24, 43, 87], [13], [67, 32]],
'Airline': ['KLM(!)', '<Air France> (12)', '(British Airways. )', '12. Air France', '"Swiss Air"']})

df

               Airline  FlightNumber           From_To  RecentDelays
0               KLM(!)       10045.0      LoNDon_paris      [23, 47]
1    <Air France> (12)           NaN      MAdrid_miLAN            []
2  (British Airways. )       10065.0  londON_StockhOlm  [24, 43, 87]
3       12. Air France           NaN    Budapest_PaRis          [13]
4          "Swiss Air"       10085.0   Brussels_londOn      [67, 32]

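Some values in the FlightNumber column are missing. The numbers are meant to increase by 10 with each row, so 10055 and 10075 need to be put in place. Fill in these missing numbers and make the column an integer column (instead of a float column).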

3 answers:

Answer 0: (score: 1)

This looks like a good use case for pd.Series.interpolate:

df['FlightNumber'] = df['FlightNumber'].interpolate().astype(int)
df

               Airline  FlightNumber           From_To  RecentDelays
0               KLM(!)         10045      LoNDon_paris      [23, 47]
1    <Air France> (12)         10055      MAdrid_miLAN            []
2  (British Airways. )         10065  londON_StockhOlm  [24, 43, 87]
3       12. Air France         10075    Budapest_PaRis          [13]
4          "Swiss Air"         10085   Brussels_londOn      [67, 32]

The default method is 'linear', which is exactly what is needed here as long as FlightNumber increases linearly.
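The "as long as it increases linearly" condition can be checked with a small sketch. The two Series below are made-up illustrations, not part of the question's data. The first has evenly spaced known values, so linear interpolation reproduces the +10 step. The second ends with a missing value, and there pandas does not extrapolate: with the default settings it repeats the last valid value.

```python
import numpy as np
import pandas as pd

# Evenly spaced known values: linear interpolation reproduces the +10 step.
s = pd.Series([10045, np.nan, 10065, np.nan, 10085])
filled = s.interpolate().astype(int)
print(filled.tolist())

# Hypothetical variant with a trailing gap: the last row is not extrapolated,
# it is filled with the last valid value instead of 10075.
t = pd.Series([10045, np.nan, 10065, np.nan])
trailing = t.interpolate().astype(int)
print(trailing.tolist())
```

So interpolate() fits the question's data, where every gap sits between two known values, but a gap at the end of the column would need a different fill.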

Answer 1: (score: 0)

Hopefully this works.

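# Assumes the default 0..n-1 index and a non-missing first row.
# Iterate over every row after the first (count() would skip NaN rows).
for i in range(1, len(df)):
    if pd.isnull(df.loc[i, 'FlightNumber']):
        df.loc[i, 'FlightNumber'] = df.loc[i-1, 'FlightNumber'] + 10
df['FlightNumber'] = df['FlightNumber'].astype(int)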
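The same "previous value plus 10" rule can also be written without an explicit loop. The following is only a sketch on a made-up Series (it is not taken from any of the answers) and it assumes the first value is present. It forward-fills the last known number, then adds 10 for each missing row since that number.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two consecutive missing values at the end.
s = pd.Series([10045, np.nan, 10065, np.nan, np.nan])

missing = s.isna()
# Each known value starts a new run; rows in the same run share an id.
run_id = (~missing).cumsum()
# Within each run, count the missing rows seen so far (0 for the known row).
steps = missing.astype(int).groupby(run_id).cumsum()
# Last known value plus 10 per missing row since it.
result = (s.ffill() + 10 * steps).astype(int)
print(result.tolist())
```

Unlike linear interpolation, this keeps adding 10 through a gap at the end of the column. If the first value were missing, ffill() would leave it as NaN and astype(int) would raise.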

Answer 2: (score: 0)

Try the following code:

df['FlightNumber'] = df['FlightNumber'].interpolate().astype(int)