Question

I have an xlsx file that looks like this table:

name                 place               link 
terry mike           12<longbeach>       http://www.google.com/xyz
jessica mike         12<longbeach>       http://www.google.com/abc
amanda albert        189<washington>     http://www.amazone.com/cali
terry mike           189<longbeach>      http://www.google.com/nnn

I imported it into Python as a pandas DataFrame, and I can count the unique strings in each column with:

len(df.Name.unique())

So for name I get 3, which is correct: terry mike, jessica mike, amanda albert.

Question: how can I use a pandas DataFrame to convert these 3 columns into unique binary strings, so that the output looks like this:

Name    place   link
001     001     001
010     001     010
011     010     011
001     011     100

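That way each distinct string gets its own unique code. I know how many distinct strings each column has from len(df.Name.unique()): in this case 3 for name, 3 for place, and 4 for link.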

Is there a simple way to do this with pandas?

Thanks.

Answer 1

If you don't really care whether 'terry mike' is the one that gets '001', you can use the category dtype: convert the column to category and take its integer codes.

If you want 3-digit binary strings, format those integer codes as zero-padded binary.
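A minimal sketch of the category-dtype idea, assuming the question's table is rebuilt by hand as `df` (the real data would come from the xlsx file). The exact calls chosen here, `astype("category").cat.codes` followed by a format string, are one plausible way to do it and not necessarily what the answer originally showed.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (an assumption for illustration)
df = pd.DataFrame({
    "name": ["terry mike", "jessica mike", "amanda albert", "terry mike"],
    "place": ["12<longbeach>", "12<longbeach>", "189<washington>", "189<longbeach>"],
    "link": [
        "http://www.google.com/xyz",
        "http://www.google.com/abc",
        "http://www.amazone.com/cali",
        "http://www.google.com/nnn",
    ],
})

# Categories are sorted, so the codes follow alphabetical order,
# not the order in which the values first appear
codes = df["name"].astype("category").cat.codes

# Format each integer code as a zero-padded 3-digit binary string
binary = codes.apply("{:03b}".format)
print(binary.tolist())  # ['010', '001', '000', '010']
```

Note that codes start at 0 and 'amanda albert' sorts first, so 'terry mike' ends up with '010' here rather than '001'.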

Answer 2

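import numpy as np
import pandas as pd

# factors: one integer code per row (in order of first appearance); idx: the unique values
factors, idx = pd.factorize(df.name)
# number of bits needed for the largest code (assumes at least two distinct values)
width = int(np.log2(factors.max()) // 1 + 1)
pd.Series(factors, df.index).apply(np.binary_repr, width=width)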

0    00
1    01
2    10
3    00
dtype: object
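The output above starts at 00 and covers only the name column. To get closer to the table requested in the question (codes starting at 001, all three columns), the same `pd.factorize` idea could be wrapped in a helper and applied column by column. This is only a sketch: `to_binary_codes` and its `start` and `width` parameters are hypothetical names, and `df` is rebuilt by hand from the question's sample rows.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the question (an assumption for illustration)
df = pd.DataFrame({
    "name": ["terry mike", "jessica mike", "amanda albert", "terry mike"],
    "place": ["12<longbeach>", "12<longbeach>", "189<washington>", "189<longbeach>"],
    "link": [
        "http://www.google.com/xyz",
        "http://www.google.com/abc",
        "http://www.amazone.com/cali",
        "http://www.google.com/nnn",
    ],
})


def to_binary_codes(column, start=1, width=None):
    """Hypothetical helper: label values in order of first appearance,
    then render each label as a fixed-width binary string."""
    codes, _ = pd.factorize(column)
    codes = codes + start  # shift so the first value seen gets `start`
    if width is None:
        # Smallest width that fits the largest code in this column
        width = int(codes.max()).bit_length()
    return pd.Series(
        [np.binary_repr(c, width=width) for c in codes],
        index=column.index,
    )


# Extra keyword arguments to DataFrame.apply are forwarded to the helper
encoded = df.apply(to_binary_codes, width=3)
print(encoded)
```

With `width=3` every column uses three digits, as in the question. Leaving `width` unset lets each column use the fewest digits it needs, so name and place would get two digits and link three.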
