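I am one-hot encoding some categorical variables using code that was given to me. This line adds columns of 0s and 1s whose names have the form prefix_categoricalValue: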
dataframe = pandas.concat([dataframe,pandas.get_dummies(dataframe[0], prefix='protocol')],axis=1).drop([0],axis=1)
I would like each column to be named after the index of the original column rather than prefix_categoricalValue.
I know I can do something like df.rename(columns={'prefix_categoricalValue': '0'}, inplace=True), but I am not sure how to do that for every column that has the prefix.
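Here is an example of part of the dataframe. Whether or not I decide to keep the local_address prefix, each category has its own name. Is it possible to rename the columns using the index?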
Edit:
I am trying this:
for column in dataframe:
    dataframe.rename(columns={column: 'new_name'}, inplace=True)
    print(column)
But I am not sure why it does not work.
Answer 0 (score: 1)
import pandas as pd
# 'dataframe' is the name of your data frame in the question, so that's what I use
# in the code below, although I suggest calling it 'data' or similar instead,
# since 'DataFrame' is the name of the pandas class and the two are easy to confuse.
features = ['list of column names you want one-hot encoded']
# for example, features = ['Cars', 'Model', 'Year', ...]
for f in features:
    df = dataframe[[f]]
    # note: the level argument of max() was removed in pandas 2.0; with a single
    # input column the dummy names are already unique, so that step can be dropped there
    df2 = (pd.get_dummies(df, prefix='', prefix_sep='')
             .max(level=0, axis=1)
             .add_prefix(f + ' - '))
    # the new feature names will be "<old_feature_name> - <categorical_value>"
    # for example, "Cars" will get transformed to "Cars - Minivan", "Cars - Truck", etc.
    # add the new one-hot encoded columns to the dataframe
    dataframe = pd.concat([dataframe, df2], axis=1)
    # you can remove the original column if you don't need it anymore (optional)
    dataframe = dataframe.drop([f], axis=1)
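On pandas 2.0 and later the level argument of max() is no longer available, so the loop above may fail there. A minimal sketch of the same idea without that step, assuming a small made-up frame with hypothetical columns 'Cars' and 'Year':

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
dataframe = pd.DataFrame({
    'Cars': ['Minivan', 'Truck', 'Minivan'],
    'Year': [2001, 2005, 2001],
})
features = ['Cars']  # columns to one-hot encode

for f in features:
    # prefix and prefix_sep build names like "Cars - Minivan" directly
    dummies = pd.get_dummies(dataframe[f], prefix=f, prefix_sep=' - ')
    # append the new columns and drop the original one
    dataframe = pd.concat([dataframe.drop(columns=[f]), dummies], axis=1)

print(list(dataframe.columns))
```

Passing the prefix to get_dummies directly replaces the add_prefix call, so the resulting names keep the "<old_feature_name> - <categorical_value>" pattern.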
Answer 1 (score: 1)
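Suppose your prefix is local_address_0.0.0.0. The following code renames every column whose name starts with the prefix you specify to that column's positional index, based on the order in which the columns appear in the dataframe: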
prefix = 'local_address_0.0.0.0'
cols = list(dataframe)
for idx, val in enumerate(cols):
    if val.startswith(prefix):
        dataframe.rename(index=str, columns={val: idx}, inplace=True)
This displays a warning in the console:
python3.6/site-packages/pandas/core/frame.py:3027: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
return super(DataFrame, self).rename(**kwargs)
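But this is only a warning; the dataframe's column names are still updated. If you want to learn more about the warning, see How to deal with SettingWithCopyWarning in Pandas?
If anyone knows how to do the same thing without the warning, please leave a comment.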
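One possible way to avoid the warning, offered as a sketch rather than a confirmed fix: build the whole mapping first and call rename once without inplace=True, so that a new dataframe is returned instead of a possible slice being modified in place. The sample frame and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical frame; the column names are invented for this example.
dataframe = pd.DataFrame({
    'port': [80, 443],
    'local_address_0.0.0.0': [1, 0],
    'local_address_0.0.0.0:22': [0, 1],
})
prefix = 'local_address_0.0.0.0'

# Map each matching column name to its position in the column list.
# str(col) guards against non-string column labels such as 0.
mapping = {col: idx
           for idx, col in enumerate(dataframe.columns)
           if str(col).startswith(prefix)}

# rename returns a new DataFrame; the original object is left untouched.
dataframe = dataframe.rename(columns=mapping)
print(list(dataframe.columns))
```

This version also leaves the row index alone, whereas passing index=str to rename converts the row labels to strings.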
Answer 2 (score: 0)
IIUC
dummydf=pd.get_dummies(df.A)
dummydf.columns=['A']*dummydf.shape[1]
dummydf
Out[1171]:
   A  A
0  1  0
1  0  1
2  1  0
df
Out[1172]:
   A  B  C
0  a  b  1
1  b  a  2
2  a  c  3
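A self-contained version of the snippet above, as a sketch: it rebuilds the example df shown in the output and then gives every dummy column the name of the source column. Note that this produces duplicate column labels, which pandas permits, but selecting 'A' afterwards returns all of those columns at once.

```python
import pandas as pd

# Same example frame as in the output above.
df = pd.DataFrame({'A': ['a', 'b', 'a'],
                   'B': ['b', 'a', 'c'],
                   'C': [1, 2, 3]})

dummydf = pd.get_dummies(df['A'])
# Every dummy column gets the same label 'A'.
dummydf.columns = ['A'] * dummydf.shape[1]

# astype(int) shows 0/1 even on pandas versions that return booleans.
print(dummydf.astype(int))
```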