pandas: binary-encode a set of values in a pandas column

Time: 2017-02-02 06:19:51

Tags: python python-3.x pandas

I have the following DataFrame, my_df:

Name      cards
------------------
John      {A,B}
Mary      {B,C,A}
Dan       {D,A}
Peter     {C,A}
Ed        {A,C,D}

I want to binary-encode the set values, i.e. I would like the output to look like this:

Name     Card_A    Card_B    Card_C   Card_D
--------------------------------------------
John      1          1         0        0
Mary      1          1         1        0
Dan       1          0         0        1
Peter     1          0         1        0
Ed        1          0         1        1

Is there an existing Python package for this? If not, what is the best way to do it? Thanks!

2 answers:

Answer 0: (score: 3)

First convert the set values to str, then remove the {} braces with strip.

Then use str.get_dummies.

Finally, use add_prefix.

An alternative solution:

import pandas as pd

df = pd.DataFrame({'Name':['John','Mary','Dan','Peter','Ed'],
                   'cards':[set(['A','B']), set(['B','C','A']), 
                            set(['D','A']), set(['C','A']), set(['A','C','D'])]})

print (df)
    Name      cards
0   John     {A, B}
1   Mary  {A, C, B}
2    Dan     {A, D}
3  Peter     {A, C}
4     Ed  {A, D, C}

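# turn each set into its string form and drop the surrounding braces
df.cards = df.cards.astype(str).str.strip('{}')
# one 0/1 column per label, splitting on ', '
df = df.set_index('Name').cards.str.get_dummies(', ')
# str(set) leaves quotes around each label, so strip them from the column names
df.columns = df.columns.str.strip("'")
df = df.add_prefix('Card_').reset_index()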

print (df)
    Name  Card_A  Card_B  Card_C  Card_D
0   John       1       1       0       0
1   Mary       1       1       1       0
2    Dan       1       0       0       1
3  Peter       1       0       1       0
4     Ed       1       0       1       1
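
A variation on the same idea, offered only as a sketch: instead of going through the string form of each set and stripping braces and quotes afterwards, the labels can be joined with '|', which is the default separator of str.get_dummies. The names joined, dummies and encoded are illustrative, and the approach assumes that no label contains the '|' character.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Mary', 'Dan', 'Peter', 'Ed'],
    'cards': [{'A', 'B'}, {'B', 'C', 'A'}, {'D', 'A'}, {'C', 'A'}, {'A', 'C', 'D'}],
})

# Join each set into one '|'-separated string. sorted() only makes the
# intermediate string deterministic; it does not change the encoding.
joined = df['cards'].map(lambda s: '|'.join(sorted(s)))

# str.get_dummies splits on '|' by default and builds one 0/1 column per label.
dummies = joined.str.get_dummies().add_prefix('Card_')

# Put the Name column back next to the indicator columns (aligned on the index).
encoded = df[['Name']].join(dummies)
print(encoded)
```

Because no string representation of the set is involved, there are no quotes to strip from the column names.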

Answer 1: (score: 3)

If the cards column contains set objects:

import pandas as pd

df = pd.DataFrame({'Name':['John','Mary','Dan','Peter','Ed'],
                   'cards':[set(['A','B']), set(['B','C','A']), 
                            set(['D','A']), set(['C','A']), set(['A','C','D'])]})


df[['Name']].join(
    df.cards.apply(
        # count each label in the row's set (top-level pd.value_counts is deprecated)
        lambda x: pd.Series(list(x)).value_counts()
    ).fillna(0).astype(int).add_prefix('Card_')
)
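
Another way to handle set values, sketched here under a few assumptions: DataFrame.explode (pandas 0.25 or later) produces one row per (Name, card) pair, and pd.crosstab then counts the pairs. The names exploded, table and encoded are illustrative. The sketch assumes every Name is unique, as in the example; with repeated names the cells would hold counts rather than 0/1 flags.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Mary', 'Dan', 'Peter', 'Ed'],
    'cards': [{'A', 'B'}, {'B', 'C', 'A'}, {'D', 'A'}, {'C', 'A'}, {'A', 'C', 'D'}],
})

# One row per (Name, card) pair; reset the index so it has no duplicate labels.
exploded = df.explode('cards').reset_index(drop=True)

# Count how often each card occurs per name; a set holds each card at most once,
# so with unique names every cell is 0 or 1.
table = pd.crosstab(exploded['Name'], exploded['cards']).add_prefix('Card_')

# crosstab sorts its rows by Name, so merge back onto df to restore the original order.
encoded = df[['Name']].merge(table, left_on='Name', right_index=True)
print(encoded)
```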


If the cards column contains str values, parse it with str.extractall and then count the extracted labels with value_counts:

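# assumes df.cards holds strings such as '{A,B}'
df[['Name']].join(
    # raw string: braces need no escaping inside a character class
    df.cards.str.extractall(r'([^{}, ]+)')[0]
      .groupby(level=0).value_counts()
      .unstack(fill_value=0).add_prefix('Card_')
)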
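
For the string case, a shorter route may also work; this is a sketch that assumes the strings look exactly like '{A,B}', with no spaces after the commas (otherwise the labels would keep a leading space and need stripping). The sample data below is a hypothetical string version of the DataFrame in the question.

```python
import pandas as pd

# Hypothetical string version of the question's data.
df = pd.DataFrame({
    'Name': ['John', 'Mary', 'Dan', 'Peter', 'Ed'],
    'cards': ['{A,B}', '{B,C,A}', '{D,A}', '{C,A}', '{A,C,D}'],
})

# Drop the surrounding braces, then split on ',' into one 0/1 column per label.
dummies = df['cards'].str.strip('{}').str.get_dummies(sep=',').add_prefix('Card_')

# Attach the indicator columns to the names (aligned on the index).
encoded = df[['Name']].join(dummies)
print(encoded)
```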
