Answer 0 (score: 5)
Source DataFrame:
In [204]: df
Out[204]:
Country
0 Italy
1 Indonesia
2 Canada
3 Italy
We can use pd.get_dummies():
In [205]: pd.get_dummies(df.Country)
Out[205]:
Canada Indonesia Italy
0 0 0 1
1 0 1 0
2 1 0 0
3 0 0 1
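On recent pandas releases (2.0 and later, as far as I know), pd.get_dummies returns boolean columns by default, so the result prints True/False rather than 0/1. A minimal sketch, assuming the same df as above, that passes dtype=int to get the integer indicators shown in the output:

```python
import pandas as pd

df = pd.DataFrame({"Country": ["Italy", "Indonesia", "Canada", "Italy"]})

# dtype=int forces 0/1 integers regardless of the installed version's default
dummies = pd.get_dummies(df.Country, dtype=int)
print(dummies)
```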
Or sklearn.feature_extraction.text.CountVectorizer:
In [211]: from sklearn.feature_extraction.text import CountVectorizer
In [212]: cv = CountVectorizer()
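# pd.SparseDataFrame was removed in pandas 1.0 and get_feature_names() in
# scikit-learn 1.2; the sparse accessor and get_feature_names_out() replace them.
In [213]: r = pd.DataFrame.sparse.from_spmatrix(cv.fit_transform(df.Country),
                                                columns=cv.get_feature_names_out(),
                                                index=df.index)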
In [214]: r
Out[214]:
canada indonesia italy
0 0 0 1
1 0 1 0
2 1 0 0
3 0 0 1
Answer 1 (score: 3)
A few other options
pd.Series.str.get_dummies
df.Country.str.get_dummies()
Canada Indonesia Italy
0 0 0 1
1 0 1 0
2 1 0 0
3 0 0 1
pd.DataFrame.groupby + value_counts
df.groupby(level=0).Country.value_counts().unstack(fill_value=0)
Country Canada Indonesia Italy
0 0 0 1
1 0 1 0
2 1 0 0
3 0 0 1
pd.factorize + np.bincount
f, u = pd.factorize(df.Country.values)
pd.DataFrame(
np.bincount(
f + np.arange(f.size) * u.size, minlength=u.size * f.size
).reshape(f.size, u.size),
df.index, u
)
Italy Indonesia Canada
0 1 0 0
1 0 1 0
2 0 0 1
3 1 0 0
pd.factorize + np.eye
f, u = pd.factorize(df.Country.values)
pd.DataFrame(np.eye(u.size, dtype=int)[f], df.index, u)
Italy Indonesia Canada
0 1 0 0
1 0 1 0
2 0 0 1
3 1 0 0
pd.factorize + array slice assignment
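A minimal sketch of this last option, assuming the same df as above: allocate an all-zero matrix and set one cell per row with integer-array indexing. This is a reconstruction of what the heading describes, not necessarily the answer's exact code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Country": ["Italy", "Indonesia", "Canada", "Italy"]})

# f: integer code for each row, u: unique values in order of first appearance
f, u = pd.factorize(df.Country.values)

# Start from all zeros, then set column f[i] to 1 in row i
a = np.zeros((f.size, u.size), dtype=int)
a[np.arange(f.size), f] = 1

result = pd.DataFrame(a, df.index, u)
print(result)
```

As with the other pd.factorize variants, the columns come out in order of first appearance (Italy, Indonesia, Canada) rather than sorted.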