Question

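I'm using pandas.get_dummies to encode categorical features when fitting and classifying, and I just noticed that Imputer() puts the mean value into the "off" categorical switches added in {{1} when classifying new samples.

I saw this post suggesting fill_value=0 on the dataframe.reindex() call, which seems like a decent solution, but I have one nagging question before I put this code into production.

Does anyone know whether the pandas DataFrame.reindex function sets all NaN values to the value in fill_value, or only those in the new columns it adds? I want to be sure how NaN values in the non-categorical data are handled by reindex.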
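To make the scenario concrete, here is a minimal sketch of the setup described above. The frames `train` and `new` and their column names are hypothetical and not taken from the question. It encodes both frames with pandas.get_dummies, aligns the new samples to the training columns with reindex(columns=..., fill_value=0), and then checks what happened to a NaN that was already present in the continuous column.

```python
import numpy as np
import pandas as pd

# Hypothetical training data: one categorical and one continuous feature.
train = pd.DataFrame({
    "color": ["red", "green", "blue"],
    "size": [1.0, 2.0, 3.0],
})
# Hypothetical new samples: only one category, plus a genuine NaN in "size".
new = pd.DataFrame({
    "color": ["red", "red"],
    "size": [np.nan, 5.0],
})

train_encoded = pd.get_dummies(train, columns=["color"])
new_encoded = pd.get_dummies(new, columns=["color"])

# Align the new samples to the training columns. Only the dummy columns that
# reindex has to create (color_blue, color_green) receive the fill value.
aligned = new_encoded.reindex(columns=train_encoded.columns, fill_value=0)

print(aligned)
# The NaN that already existed in "size" is still NaN after the reindex.
print(aligned["size"].isna().tolist())
```

In this sketch the pre-existing NaN in the continuous column is left for a later imputation step, while the newly created dummy columns are set to 0.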

Answer 1

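If I understand your question correctly, I believe it fills the NaN values that reindexing introduces, and it does so across all columns.

From [http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reindex.html][1]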

import pandas as pd
index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
new_index= ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10','Chrome']
df = pd.DataFrame({
      'http_status': [200,200,404,404,301],
      'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
       index=index)

df

returns:

                http_status  response_time
Firefox            200           0.04
Chrome             200           0.02
Safari             404           0.07
IE10               404           0.08
Konqueror          301           1.00

while df.reindex(new_index, fill_value='missing') returns:

                  http_status   response_time
Safari                404          0.07
Iceweasel         missing       missing
Comodo Dragon     missing       missing
IE10                  404          0.08
Chrome                200          0.02

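Neither of these columns is new, yet the missing values in the newly added rows were still filled in. I would definitely test my interpretation before going to production. I'm not sure I have the proper context.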
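One way to test that interpretation is to check exactly which cells received the fill value. The sketch below reuses the documentation example; the helper names `added` and `kept` are introduced here for illustration only.

```python
import pandas as pd

index = ["Firefox", "Chrome", "Safari", "IE10", "Konqueror"]
new_index = ["Safari", "Iceweasel", "Comodo Dragon", "IE10", "Chrome"]
df = pd.DataFrame(
    {
        "http_status": [200, 200, 404, 404, 301],
        "response_time": [0.04, 0.02, 0.07, 0.08, 1.0],
    },
    index=index,
)

filled = df.reindex(new_index, fill_value="missing")

# Labels that did not exist in the original index, and labels that did.
added = [label for label in new_index if label not in df.index]
kept = [label for label in new_index if label in df.index]

# Rows for added labels are filled in every column ...
assert (filled.loc[added] == "missing").all().all()
# ... while rows that already existed contain no fill value at all.
assert not (filled.loc[kept] == "missing").any().any()

print(added)
```

If both assertions pass, the fill value went only into the rows that reindex had to create, in every column of those rows.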

Edit:

I should add that it looks like values that were already 'NaN' beforehand are not filled by .reindex:

import pandas as pd
index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
new_index= ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10','Chrome']
df = pd.DataFrame({
      # Note: 'NaN' below is the literal string, not a real missing value (np.nan)
      'http_status': [200,'NaN',404,404,301],
      'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
       index=index)

df.reindex(new_index)

returns:

               http_status  response_time
Safari                404           0.07
Iceweasel             NaN            NaN
Comodo Dragon         NaN            NaN
IE10                  404           0.08
Chrome                NaN           0.02

while df.reindex(new_index, fill_value='missing') returns:

              http_status response_time
Safari                404          0.07
Iceweasel         missing       missing
Comodo Dragon     missing       missing
IE10                  404          0.08
Chrome                NaN          0.02

Reindexing does not affect the http_status value for Chrome.
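The example above stores the string 'NaN' rather than a real missing value, so it may be worth repeating the check with np.nan. This is a small sketch under that assumption, using fill_value=0 as in the question:

```python
import numpy as np
import pandas as pd

index = ["Firefox", "Chrome", "Safari", "IE10", "Konqueror"]
new_index = ["Safari", "Iceweasel", "Comodo Dragon", "IE10", "Chrome"]
df = pd.DataFrame(
    {
        # A genuine missing value for Chrome this time.
        "http_status": [200, np.nan, 404, 404, 301],
        "response_time": [0.04, 0.02, 0.07, 0.08, 1.0],
    },
    index=index,
)

filled = df.reindex(new_index, fill_value=0)

# Chrome was NaN before the reindex and is still NaN afterwards.
print(pd.isna(filled.loc["Chrome", "http_status"]))
# The rows created by reindex receive the fill value instead.
print(filled.loc["Iceweasel", "http_status"])
```

So fill_value applies only to entries that reindex itself creates. A NaN that was already in the data has to be handled separately, for example by an imputation step.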

Using reindex with fill_value for categorical and continuous features in the same dataframe