I have a pandas DataFrame with some NaN values in one particular column:
1291 NaN
1841 NaN
2049 NaN
Name: some column, dtype: float64
To deal with this, I built the following pipeline:
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression  # missing from the original snippet
from sklearn.pipeline import Pipeline
scaler = StandardScaler(with_mean=True)
imputer = SimpleImputer(strategy='median')
logistic = LogisticRegression()
pipe = Pipeline([('imputer', imputer),
                 ('scaler', scaler),
                 ('logistic', logistic)])
Now, when I pass this pipeline to RandomizedSearchCV, I get the following error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
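The actual traceback is much longer than this; I can post the whole error in an edit if needed. In any case, I'm fairly sure this column is the only one that contains NaN. Also, if I switch from SimpleImputer to the (now deprecated) Imputer in the pipeline, the pipeline works fine in my RandomizedSearchCV. I checked the documentation, and SimpleImputer seems to behave (almost) exactly like Imputer. What is the difference in behavior?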
How can I use an imputer in a pipeline without falling back to the deprecated Imputer?
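For reference, here is a minimal sketch of how such a search might be set up and sanity-checked. The original RandomizedSearchCV call and data were not posted, so the synthetic dataset, the `logistic__C` search space, and the `n_iter`/`cv` values are all assumptions made for illustration. The checks at the end only show what could be inspected on the real data (NaN left after imputation, infinities in the features, NaN in the target); they are not a diagnosis of the error above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption: the real data is numeric).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
rng = np.random.default_rng(0)
X[rng.choice(200, size=10, replace=False), 2] = np.nan  # NaN in a single column

pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler(with_mean=True)),
    ("logistic", LogisticRegression()),
])

# Hypothetical search space; step parameters are addressed as <step>__<param>.
param_distributions = {"logistic__C": [0.01, 0.1, 1.0, 10.0]}
search = RandomizedSearchCV(pipe, param_distributions, n_iter=3, cv=3, random_state=0)
search.fit(X, y)
print("best params:", search.best_params_)

# Checks that may help narrow down where non-finite values come from.
nan_after_impute = int(np.isnan(SimpleImputer(strategy="median").fit_transform(X)).sum())
inf_in_features = int(np.isinf(X).sum())
target_has_nan = bool(np.isnan(y.astype(float)).any())
print(nan_after_impute, inf_in_features, target_has_nan)
```

If a sketch like this runs cleanly but the real data does not, it may be worth checking the target and any infinite values, since the error message covers both NaN and infinity.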
Answer 0 (score: 0)
I ran into the same problem, and this fixed it for me:
imputer = SimpleImputer(strategy = 'median', fill_value = 0)
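One caveat worth checking: according to the scikit-learn documentation, `fill_value` is only used when `strategy='constant'`, so with `strategy='median'` it should have no effect. The sketch below, on a made-up one-column array, compares the three variants.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up data: the median of the observed values 1, 3 and 8 is 3.
X = np.array([[1.0], [np.nan], [3.0], [8.0]])

with_fill = SimpleImputer(strategy="median", fill_value=0).fit_transform(X)
without_fill = SimpleImputer(strategy="median").fit_transform(X)
constant = SimpleImputer(strategy="constant", fill_value=0).fit_transform(X)

# Both median imputers replace the NaN with 3.0; only the constant one uses 0.
print(with_fill.ravel(), without_fill.ravel(), constant.ravel())
```

If the goal is to fill missing values with 0, `strategy='constant'` is the variant that does so.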
Answer 1 (score: 0)
SimpleImputer in make_pipeline:
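from sklearn.impute import SimpleImputer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
# ColumnSelector is a custom transformer that is not shown in this answer.
preprocess_pipeline = make_pipeline(
    FeatureUnion(transformer_list=[
        ('Handle numeric columns', make_pipeline(
            ColumnSelector(columns=['Amount']),
            SimpleImputer(strategy='constant', fill_value=0),
            StandardScaler()
        )),
        ('Handle categorical data', make_pipeline(
            ColumnSelector(columns=['Type', 'Name', 'Changes']),
            # Here a single space, not NaN, marks a missing value.
            SimpleImputer(strategy='constant', missing_values=' ', fill_value='missing_value'),
            # In scikit-learn >= 1.2 this argument is named sparse_output.
            OneHotEncoder(sparse=False)
        ))
    ])
)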
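Since `ColumnSelector` is not defined in the answer, a self-contained variant can be sketched with scikit-learn's built-in `ColumnTransformer` instead of `FeatureUnion` plus a custom selector. This is a swapped-in technique, not the answerer's code; the toy values are made up, and NaN (rather than a single space) is assumed to mark missing categorical entries.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame: column names borrowed from the answer, values invented.
df = pd.DataFrame({
    "Amount": [10.0, np.nan, 30.0],
    "Type": pd.Series(["a", np.nan, "b"], dtype=object),
})

preprocess = ColumnTransformer([
    ("numeric", make_pipeline(
        SimpleImputer(strategy="constant", fill_value=0),
        StandardScaler(),
    ), ["Amount"]),
    ("categorical", make_pipeline(
        SimpleImputer(strategy="constant", fill_value="missing_value"),
        OneHotEncoder(handle_unknown="ignore"),
    ), ["Type"]),
])

features = preprocess.fit_transform(df)
if sparse.issparse(features):  # the output may be sparse depending on density
    features = features.toarray()

# One scaled numeric column plus one-hot columns for 'a', 'b', 'missing_value'.
print(features.shape)
```

The column lists passed to `ColumnTransformer` take over the role of the custom selector, so each sub-pipeline only sees its own columns.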
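SimpleImputer in a Pipeline:
# Fragment of a Pipeline's step list; TypeSelector is a custom transformer not shown here.
# Step names are swapped relative to the original so that they match the dtypes they select.
('features', FeatureUnion([
    ('Numerics', Pipeline([
        ('Numeric Extractor', TypeSelector(np.number)),
        ('Impute Zero', SimpleImputer(strategy="constant", fill_value=0))
    ])),
    ('Cat Columns', Pipeline([
        ('Category Extractor', TypeSelector("category")),
        ('Impute Missing', SimpleImputer(strategy="constant", fill_value='missing'))
    ]))
]))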
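`TypeSelector` is not defined in the answer. A minimal sketch of what such a helper might look like is given below; the class body is hypothetical, the toy data is made up, and the categorical column uses `object` dtype rather than pandas' `category` dtype to keep the example simple.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import FeatureUnion, Pipeline


class TypeSelector(TransformerMixin, BaseEstimator):
    """Hypothetical helper: keep only the DataFrame columns of a given dtype."""

    def __init__(self, dtype):
        self.dtype = dtype

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        return X.select_dtypes(include=[self.dtype])


df = pd.DataFrame({
    "amount": [1.0, np.nan, 3.0],
    "kind": pd.Series(["x", np.nan, "y"], dtype=object),
})

features = FeatureUnion([
    ("Numerics", Pipeline([
        ("Numeric Extractor", TypeSelector(np.number)),
        ("Impute Zero", SimpleImputer(strategy="constant", fill_value=0)),
    ])),
    ("Cat Columns", Pipeline([
        ("Category Extractor", TypeSelector("object")),
        ("Impute Missing", SimpleImputer(strategy="constant", fill_value="missing")),
    ])),
])

# Numeric and string columns are stacked side by side into one object array.
combined = features.fit_transform(df)
print(combined)
```

In practice the categorical branch would usually be followed by an encoder before the result reaches an estimator.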