I have this DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame([['137', 'earn'], ['158', 'earn'], ['144', 'ship'], ['111', 'trade'], ['132', 'trade']], columns=['value', 'topic'])
print(df)
  value  topic
0   137   earn
1   158   earn
2   144   ship
3   111  trade
4   132  trade
I would like an additional numeric column like this:
  value  topic  topic_id
0   137   earn         0
1   158   earn         0
2   144   ship         1
3   111  trade         2
4   132  trade         2
So basically I want to generate a column that encodes the string column as numeric values. I implemented this solution:
topics_dict = {}
topics = np.unique(df['topic']).tolist()
for i in range(len(topics)):
    topics_dict[topics[i]] = i
df['topic_id'] = [topics_dict[l] for l in df['topic']]
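However, I am sure there is a more elegant and more pandas-idiomatic way to solve this, but I could not find anything on Google or SO. I read about pandas' get_dummies, but that creates multiple columns, one for each distinct value in the original column.
I'd appreciate any help or pointers in the right direction!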
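For reference, here is a small sketch of what I mean about get_dummies. It rebuilds the DataFrame from above and only illustrates why that function does not give the single column I am after:

```python
import pandas as pd

df = pd.DataFrame(
    [['137', 'earn'], ['158', 'earn'], ['144', 'ship'],
     ['111', 'trade'], ['132', 'trade']],
    columns=['value', 'topic'],
)

# get_dummies one-hot encodes the column: one indicator column per
# distinct topic, rather than a single integer id column.
dummies = pd.get_dummies(df['topic'])
print(list(dummies.columns))  # ['earn', 'ship', 'trade']
print(dummies.shape)          # (5, 3)
```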
Answer 0 (score: 2)
Option 1
pd.factorize
df['topic_id'] = pd.factorize(df.topic)[0]
df
  value  topic  topic_id
0   137   earn         0
1   158   earn         0
2   144   ship         1
3   111  trade         2
4   132  trade         2
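One thing worth knowing about pd.factorize: it numbers values in order of first appearance unless sort=True is passed, and it also returns the unique values, which can be used to decode the ids again. A minimal sketch follows; the unsorted `topics` series is made up for illustration and is not the data from the question (where the topics already appear in sorted order, so the difference is invisible):

```python
import pandas as pd

# Hypothetical, deliberately unsorted input
topics = pd.Series(['trade', 'earn', 'trade', 'ship'])

# Default: ids follow order of first appearance
codes, uniques = pd.factorize(topics)
print(codes.tolist())    # [0, 1, 0, 2]
print(list(uniques))     # ['trade', 'earn', 'ship']

# sort=True: ids follow the sorted order of the unique values
codes_sorted, uniques_sorted = pd.factorize(topics, sort=True)
print(codes_sorted.tolist())   # [2, 0, 2, 1]
print(list(uniques_sorted))    # ['earn', 'ship', 'trade']

# Indexing the uniques with the codes recovers the original labels
decoded = list(uniques[codes])
```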
Option 2
np.unique
_, v = np.unique(df.topic, return_inverse=True)
df['topic_id'] = v
df
  value  topic  topic_id
0   137   earn         0
1   158   earn         0
2   144   ship         1
3   111  trade         2
4   132  trade         2
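To make the return_inverse behaviour a bit more concrete: np.unique returns the unique values in sorted order, and the inverse array holds, for each input element, its position in that sorted array. A small sketch on a made-up, unsorted array (not the question's data):

```python
import numpy as np

# Hypothetical, deliberately unsorted input
topics = np.array(['trade', 'earn', 'trade', 'ship'])

uniques, inverse = np.unique(topics, return_inverse=True)
print(uniques.tolist())  # ['earn', 'ship', 'trade']  (sorted)
print(inverse.tolist())  # [2, 0, 2, 1]

# Indexing the uniques with the inverse rebuilds the original array
restored = uniques[inverse]
```

So unlike the default pd.factorize, the ids here always follow the sorted order of the labels.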
Option 3
pd.Categorical
df['topic_id'] = pd.Categorical(df.topic).codes
df
  value  topic  topic_id
0   137   earn         0
1   158   earn         0
2   144   ship         1
3   111  trade         2
4   132  trade         2
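If a particular numbering is needed, pd.Categorical also accepts an explicit categories list, and the codes then follow the order of that list. A sketch under that assumption; the `topics` series and the chosen category order are hypothetical:

```python
import pandas as pd

# Hypothetical input
topics = pd.Series(['trade', 'earn', 'trade', 'ship'])

# Default: categories are inferred and sorted, so codes follow sorted order
default_codes = pd.Categorical(topics).codes
print(default_codes.tolist())  # [2, 0, 2, 1]

# Explicit categories: codes follow the order given here
custom = pd.Categorical(topics, categories=['trade', 'ship', 'earn'])
custom_codes = custom.codes
print(custom_codes.tolist())   # [0, 2, 0, 1]
```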
Option 4
DataFrameGroupBy.ngroup
df['topic_id'] = df.groupby('topic').ngroup()
df
  value  topic  topic_id
0   137   earn         0
1   158   earn         0
2   144   ship         1
3   111  trade         2
4   132  trade         2
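With ngroup, the numbering follows the group order, which is sorted by default. Passing sort=False to groupby should number the groups in order of first appearance instead. A small sketch on a made-up frame (column values are hypothetical):

```python
import pandas as pd

# Hypothetical, deliberately unsorted input
df_demo = pd.DataFrame({'topic': ['trade', 'earn', 'trade', 'ship']})

# Default groupby sorts the group keys, so ids follow sorted order
sorted_ids = df_demo.groupby('topic').ngroup()
print(sorted_ids.tolist())      # [2, 0, 2, 1]

# sort=False keeps groups in order of first appearance
appearance_ids = df_demo.groupby('topic', sort=False).ngroup()
print(appearance_ids.tolist())  # [0, 1, 0, 2]
```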
Answer 1 (score: 1)
You can use
In [63]: df['topic'].astype('category').cat.codes
Out[63]:
0    0
1    0
2    1
3    2
4    2
dtype: int8
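Two details that may matter with this approach: missing values get the code -1, and the categories can be turned into a lookup table for decoding. A minimal sketch, using a made-up series with one missing entry:

```python
import pandas as pd

# Hypothetical input with a missing value
topics = pd.Series(['earn', None, 'trade', 'earn'])

cat = topics.astype('category')
codes = cat.cat.codes
print(codes.tolist())  # [0, -1, 1, 0]  (-1 marks the missing value)

# Map each code back to its label via the categories
lookup = dict(enumerate(cat.cat.categories))
print(lookup)          # {0: 'earn', 1: 'trade'}
```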
Answer 2 (score: 0)
We can use the apply function to create a new column based on an existing column, as shown below.
topic_list = list(df["topic"].unique())
df['topic_id'] = df.apply(lambda row: topic_list.index(row["topic"]),axis=1)
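Note that list.index scans the list for every row, so this can get slow with many distinct topics. A variation on the same idea (not the code above, but a swapped-in dictionary lookup with Series.map) might look like this; `topic_to_id` is a name introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    [['137', 'earn'], ['158', 'earn'], ['144', 'ship'],
     ['111', 'trade'], ['132', 'trade']],
    columns=['value', 'topic'],
)

# Build the label -> id mapping once, in order of first appearance
topic_to_id = {topic: i for i, topic in enumerate(df['topic'].unique())}

# Look up each row's topic in the dictionary
df['topic_id'] = df['topic'].map(topic_to_id)
print(df['topic_id'].tolist())  # [0, 0, 1, 2, 2]
```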
Answer 3 (score: 0)
You can use for loops inside a list comprehension to build the list of codes:
ucols = pd.unique(df.topic)
df['topic_id'] = [j
                  for i in range(len(df.topic))
                  for j in range(len(ucols))
                  if df.topic[i] == ucols[j]]
print(df)
Output:
  value  topic  topic_id
0   137   earn         0
1   158   earn         0
2   144   ship         1
3   111  trade         2
4   132  trade         2
Answer 4 (score: -1)
Try this code
df['topic_id'] = pd.Series([0,0,1,2,2], index=df.index)
It works well
  value  topic
0   137   earn
1   158   earn
2   144   ship
3   111  trade
4   132  trade
  value  topic  topic_id
0   137   earn         0
1   158   earn         0
2   144   ship         1
3   111  trade         2
4   132  trade         2