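I want to expand the "features" column of this data frame to create a new data frame in which those features become column names.
For example, from this,
to this,
My solution works, but I don't think it is very good because it relies on several for loops. Perhaps there is a better way that takes advantage of the pandas.DataFrame class?
The code that generates the feature matrix is as follows: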
import pandas as pd

def feature_data_frame_by_exploding_column(input_df, col_name):
    # Create data frame with same columns minus the column you want to explode
    df = input_df.copy()
    del df[col_name]
    # The items that you want to become new features
    all_new_features = []
    new_feature_list = input_df[col_name].values
    for ingred_list in new_feature_list:
        all_new_features.extend(ingred_list)  # Extend vs append!
    # Add new features as columns of zeros
    for feature in all_new_features:
        df[feature] = 0
    # For each row in data frame set values that need to be 1
    # (indexing new_feature_list by label assumes the default 0..n-1 index)
    for index in df.index:
        ingreds_arr = new_feature_list[index]
        df.loc[index, ingreds_arr] = 1
    return df

df = pd.DataFrame(columns = ["id", "features"])
df['id'] = [0,1]
df['features'] = [["A", "B"], ["C", "D"]]
df
feature_data_frame_by_exploding_column(df,"features")
Answer 0 (score: 1)
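scikit-learn's MultiLabelBinarizer creates a binary matrix from labels. You can extract the feature column from the pandas DataFrame and apply it:
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
new_array = mlb.fit_transform(feature)
In addition, by specifying MultiLabelBinarizer(sparse_output=True), you get a truly sparse output (very useful when the number of distinct features is large).
Example output: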
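
To make the approach concrete, here is a minimal sketch that applies MultiLabelBinarizer to the two-row frame from the question and attaches the result back to the remaining columns. The names `indicator`, `indicator_df`, `result`, and `sparse_indicator` are illustrative and not from the original answer, and joining on the index is just one reasonable way to recombine the pieces.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Same toy frame as in the question.
df = pd.DataFrame({"id": [0, 1], "features": [["A", "B"], ["C", "D"]]})

mlb = MultiLabelBinarizer()
# fit_transform returns a dense 0/1 array with one column per distinct label.
indicator = mlb.fit_transform(df["features"])

# mlb.classes_ lists the labels in column order, so it supplies the column names.
indicator_df = pd.DataFrame(indicator, columns=mlb.classes_, index=df.index)

# Drop the list column and attach the indicator columns by index.
result = df.drop(columns="features").join(indicator_df)
print(result)

# With sparse_output=True the same call returns a SciPy sparse matrix instead.
sparse_indicator = MultiLabelBinarizer(sparse_output=True).fit_transform(df["features"])
```

This replaces all three loops in the question's function with a single fit_transform call. The dense version is convenient for small label sets; the sparse variant avoids materialising a mostly-zero matrix when there are many distinct features.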