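I'm new to Python and used to work in R, where I would use as.factor and categorize based on the numbers.
I previously tried the replace and .loc functions to put new category values into a new column based on a condition, but neither did what I wanted.
In the end I wrote a very simple for loop over g['NumFloorsGroup'] (the g['Category'] code listed further down).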
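However, when I run it, every row comes back as "LowFl" and the other categories are never assigned. I feel like I'm missing something.
The data info is as follows (see the DataFrame info output further down):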
Any comments would be helpful!
g['Category'] = ""
for i in g['NumFloorsGroup']:
    if i == '0-9' or i == '10-19':
        g['Category'] = 'LowFl'
    elif i == '50~':
        g['Category'] = 'HighFl'
    else:
        g['Category'] = 'NormalFl'
The part above returns only LowFl.
<class 'pandas.core.frame.DataFrame'>
Int64Index: 596 entries, 128 to 595
Data columns (total 4 columns):
YearBuilt         596 non-null int64
NumFloorsGroup    596 non-null category
Count             596 non-null int64
Category          596 non-null object
dtypes: category(1), int64(2), object(1)
This returns every category as LowFl. For reference, g is built as follows:
import numpy as np
import pandas as pd

bins = [0, 10, 20, 30, 40, 50, np.inf]
labels = ['0-9', '10-19', '20-29', '30-39', '40-49', '50~']
# original_data is the asker's raw DataFrame (not shown here)
copy = original_data.copy()
copy['NumFloorsGroup'] = pd.cut(copy['NumFloors'], bins=bins, labels=labels, include_lowest=True)
# Count rows per (YearBuilt, NumFloorsGroup) pair, largest counts first
g = (copy.groupby(['YearBuilt', 'NumFloorsGroup'])['YearBuilt']
         .count()
         .reset_index(name="Count")
         .sort_values(by='Count', ascending=False))
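Answer 0 (score: 2)
I suggest calling the cut function again with new bins and new labels. It is best to avoid loops in pandas, because they are slow whenever a vectorized function exists: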
import numpy as np
import pandas as pd

df = pd.DataFrame({'Floors': [0, 1, 10, 19, 20, 25, 40, 70]})
bins = [0, 10, 20, 30, 40, 50, np.inf]
labels = ['0-9', '10-19', '20-29', '30-39', '40-49', '50~']
df['NumFloorsGroup'] = pd.cut(df['Floors'],
                              bins=bins,
                              labels=labels,
                              include_lowest=True)
# Second cut on the same column, with coarser bins for the three categories
df['Category'] = pd.cut(df['Floors'],
                        bins=[0, 19, 50, np.inf],
                        labels=['LowFl', 'NormalFl', 'HighFl'],
                        include_lowest=True)
print(df)
   Floors NumFloorsGroup  Category
0       0            0-9     LowFl
1       1            0-9     LowFl
2      10            0-9     LowFl
3      19          10-19     LowFl
4      20          10-19  NormalFl
5      25          20-29  NormalFl
6      40          30-39  NormalFl
7      70            50~    HighFl
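One thing to watch in the output above: pd.cut closes each bin on the right by default, so a value of 10 is labelled '0-9' and 20 is labelled '10-19'. If the labels are meant literally, a minimal sketch is to pass right=False. This assumes left-closed bins are what is wanted; the variable names and the extra sample value 50 are only illustrative.

```python
import numpy as np
import pandas as pd

floors = pd.Series([0, 1, 10, 19, 20, 25, 40, 50, 70])
bins = [0, 10, 20, 30, 40, 50, np.inf]
labels = ['0-9', '10-19', '20-29', '30-39', '40-49', '50~']

# right=False makes each bin [left, right), so 10 lands in '10-19'
groups = pd.cut(floors, bins=bins, labels=labels, right=False)

# Matching three-way split: [0, 20) low, [20, 50) normal, [50, inf) high
category = pd.cut(floors,
                  bins=[0, 20, 50, np.inf],
                  labels=['LowFl', 'NormalFl', 'HighFl'],
                  right=False)

print(pd.DataFrame({'Floors': floors,
                    'NumFloorsGroup': groups,
                    'Category': category}))
```

With these edges, 50 floors falls in '50~' and HighFl, so the two columns stay consistent with each other.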
Or use map with a dictionary, together with fillna, to replace the values that are not in the dictionary (which map turns into NaN) with NormalFl:
d = {"0-9": 'LowFl', "10-19": 'LowFl', "50~": 'HighFl'}
df['Category'] = df['NumFloorsGroup'].map(d).fillna('NormalFl')
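To see the map/fillna approach end to end, here is a small self-contained sketch. The sample frame is made up for illustration, and the astype(str) step is my own precaution so that fillna operates on plain strings rather than on a categorical column.

```python
import pandas as pd

# Hypothetical sample standing in for the grouped frame `g`
g = pd.DataFrame({
    'NumFloorsGroup': pd.Categorical(['0-9', '10-19', '20-29', '40-49', '50~']),
})

d = {'0-9': 'LowFl', '10-19': 'LowFl', '50~': 'HighFl'}

# Groups missing from the dictionary map to NaN; fillna turns those into NormalFl
g['Category'] = g['NumFloorsGroup'].astype(str).map(d).fillna('NormalFl')
print(g)
```

Note that the dictionary keys have to match the bin labels exactly ('50~' here), otherwise that group silently falls through to NormalFl.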
Answer 1 (score: 1)
You can try the following:
d = {
    "0-9": 'LowFl',
    "10-19": 'LowFl',
    "50~": 'HighFl',
}
# Any group that is not a key in d falls back to 'NormalFl'
g['Category'] = g['NumFloorsGroup'].map(lambda key: d.get(key, 'NormalFl'))
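Answer 2 (score: 1)
Your solution fails because g['Category'] = ... assigns one value to the whole column on every iteration instead of to the current row, so the column ends up holding only the value chosen for the last element. To fix your solution, append each value to a list instead of assigning directly to the column, and then assign that list to the dataframe.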
category = []
for i in g['NumFloorsGroup']:
    if i == '0-9' or i == '10-19':
        category.append('LowFl')
    elif i == '50~':
        category.append('HighFl')
    else:
        category.append('NormalFl')
# assign returns a new DataFrame, so keep the result
g = g.assign(Category=category)
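To make the difference concrete, the following sketch runs the original loop pattern next to a row-wise version built with a list comprehension. The three-row frame and the to_category helper are made up for illustration; they are not the asker's data or code.

```python
import pandas as pd

# Hypothetical three-row stand-in for `g`
g = pd.DataFrame({'NumFloorsGroup': ['50~', '20-29', '0-9']})

def to_category(group):
    """Map one floor-group label to its category."""
    if group in ('0-9', '10-19'):
        return 'LowFl'
    if group == '50~':
        return 'HighFl'
    return 'NormalFl'

# Original pattern: each pass overwrites the entire column with one scalar,
# so only the result for the last row survives
broken = g.copy()
for i in broken['NumFloorsGroup']:
    broken['Category'] = to_category(i)

# Row-wise pattern: collect one value per row, then assign once
fixed = g.assign(Category=[to_category(i) for i in g['NumFloorsGroup']])

print(broken['Category'].tolist())  # ['LowFl', 'LowFl', 'LowFl']
print(fixed['Category'].tolist())   # ['HighFl', 'NormalFl', 'LowFl']
```

The broken column is all LowFl only because the last row happens to be '0-9'; with a different last row it would be uniformly something else.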