Question

My actual problem requires encoding strings in a DataFrame, as I do in the following steps:

import pandas as pd 
df = pd.DataFrame({"cool": list("ABC"), "not_cool": list("CBA")})
encoding = {"A": [0, 0, 1], "B": [0, 1, 0], "C": [1, 0, 0]}

Encoding:

df.applymap(encoding.get)

What I have now is a DataFrame whose elements are lists:

cool       not_cool
[0, 0, 1]  [1, 0, 0]
[0, 1, 0]  [0, 1, 0]
[1, 0, 0]  [0, 0, 1]

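I need to expand this into a matrix. How can I do that? My first idea was to iterate over the rows, concatenate each row's lists with numpy.hstack, store the results, and stack the stored rows with numpy.vstack, but that does not work as expected. Another approach would be to create a new DataFrame from this one, in which each column is the nth element of the lists. If I had that DataFrame, pandas.DataFrame.values would give me what I need: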

1, 2, 3, 4, 5, 6  # Column names
0, 0, 1, 1, 0, 0
0, 1, 0, 0, 1, 0
1, 0, 0, 0, 0, 1

Answer 1

Quick answer:

x = df.applymap(encoding.get)
(x.cool+x.not_cool).values  # gives you matrix without the headers
# should be elementary to get labels you need in there

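This adds the two columns together (adding lists actually concatenates them), so each row becomes a single list of six values. The .values attribute then simply returns the array of those lists.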
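A minimal, self-contained sketch of the step above, using the question's own data. It encodes each column with Series.map instead of DataFrame.applymap (my substitution, since applymap is deprecated in recent pandas releases); the names combined and rows are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"cool": list("ABC"), "not_cool": list("CBA")})
encoding = {"A": [0, 0, 1], "B": [0, 1, 0], "C": [1, 0, 0]}

# Encode column by column: every label is replaced by its list of codes.
x = df.apply(lambda col: col.map(encoding.get))

# "+" on two object columns is applied element by element, and
# list + list concatenates, so each row becomes one flat list.
combined = x["cool"] + x["not_cool"]

# Illustrative name: plain Python lists, one per row.
rows = combined.tolist()
print(rows)
```

Each entry of rows holds the codes of "cool" followed by the codes of "not_cool" for that row.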

Update following @mithrado's comment

import numpy as np
pd.DataFrame(np.vstack((x.cool+x.not_cool).values), columns=range(6))
# will give you a dataframe with the required values

It seems you are asking for the column names to be another row in the DataFrame? Why would you want that?
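If there are more than two columns, adding them by name gets tedious. The following is one possible generalization, not part of the original answer: it assumes every code list has the same length, and the names nested, matrix and expanded are hypothetical. It builds a nested list per row and lets NumPy flatten the codes side by side.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"cool": list("ABC"), "not_cool": list("CBA")})
encoding = {"A": [0, 0, 1], "B": [0, 1, 0], "C": [1, 0, 0]}

# One list of codes per cell, row by row. Converted to an array this has
# shape (n_rows, n_columns, code_length). Assumes equal-length codes.
nested = [[encoding[label] for label in row] for row in df.values]

# Merge the last two axes so each row holds all of its codes side by side.
matrix = np.array(nested).reshape(len(df), -1)

# Column names 1..6, as in the layout sketched in the question.
expanded = pd.DataFrame(matrix, columns=range(1, matrix.shape[1] + 1))
print(expanded)
```

Here matrix is the plain NumPy array the question asks for, and expanded.values returns the same data.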
