Coerce columns of two different DataFrames to the same data type

Time: 2017-08-04 23:36:10

Tags: python pandas format

I have two DataFrames structured as follows:

print(product_combos1.head(n=5))
             product_id  count  Length
0            (P06, P09)  36340       2
1  (P01, P05, P06, P09)  10085       4
2            (P01, P06)  36337       2
3            (P01, P09)  49897       2
4            (P02, P09)  11573       2

print(testing_df.head(n=5))
                     product_id  Length
transaction_id                         
001                       [P01]       1
002                  [P01, P02]       2
003             [P01, P02, P09]       3
004                  [P01, P03]       2
005             [P01, P03, P05]       3

How can I coerce the "product_id" column of testing_df so that its format matches the one in the product_combos1 df? (i.e., parentheses instead of square brackets)

1 Answer:

Answer 0 (score: 1)

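Python tuples are displayed in parentheses, and lists are displayed in square brackets. So the difference is the type of object stored in each cell, not its formatting: convert each list to a tuple.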

Modify the existing DataFrame

testing_df['product_id'] = testing_df['product_id'].apply(tuple)
testing_df 

                     product_id  Length
transaction_id                         
1                        (P01,)       1
2                    (P01, P02)       2
3               (P01, P02, P09)       3
4                    (P01, P03)       2
5               (P01, P03, P05)       3
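A minimal, self-contained sketch of the conversion above, rebuilding testing_df from the rows printed in the question (assuming the cells really hold Python lists). It also shows why the conversion matters if product_combos1 holds tuples: a list never compares equal to a tuple with the same items, so lookups between the two frames would otherwise fail.

```python
import pandas as pd

# Rows copied from the question's printed output; assumes the cells are lists
testing_df = pd.DataFrame(
    {"product_id": [["P01"], ["P01", "P02"], ["P01", "P02", "P09"],
                    ["P01", "P03"], ["P01", "P03", "P05"]],
     "Length": [1, 2, 3, 2, 3]},
    index=pd.Index(["001", "002", "003", "004", "005"], name="transaction_id"),
)

# A list never compares equal to a tuple, even with identical items
print(["P01", "P02"] == ("P01", "P02"))  # False

# Replace every list in the column with a tuple of the same items
testing_df["product_id"] = testing_df["product_id"].apply(tuple)

print(testing_df["product_id"].iloc[0])                     # ('P01',)
print(testing_df["product_id"].iloc[1] == ("P01", "P02"))  # True
```

Note the trailing comma in ('P01',): that is how Python writes a one-element tuple, which is why the first row displays as (P01,).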

Or return a modified copy (testing_df itself is left unchanged)

testing_df.assign(product_id=testing_df.product_id.apply(tuple))

                     product_id  Length
transaction_id                         
1                        (P01,)       1
2                    (P01, P02)       2
3               (P01, P02, P09)       3
4                    (P01, P03)       2
5               (P01, P03, P05)       3
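A short sketch confirming the difference between the two approaches, again using a few rows rebuilt from the question: `DataFrame.assign` returns a new DataFrame with the column replaced, while the original keeps its lists.

```python
import pandas as pd

# First three rows of the question's testing_df; assumes the cells are lists
testing_df = pd.DataFrame(
    {"product_id": [["P01"], ["P01", "P02"], ["P01", "P02", "P09"]],
     "Length": [1, 2, 3]},
    index=pd.Index(["001", "002", "003"], name="transaction_id"),
)

# assign() builds and returns a new DataFrame; testing_df is not modified
converted = testing_df.assign(product_id=testing_df["product_id"].apply(tuple))

print(type(converted["product_id"].iloc[1]).__name__)   # tuple
print(type(testing_df["product_id"].iloc[1]).__name__)  # list
```

Because assign() does not mutate the original, its result has to be bound to a name (or back to testing_df) to be kept.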

Unless, of course, these are actually strings. In that case, replace the square brackets with parentheses.

testing_df.assign(product_id=testing_df.product_id.str.replace(r'\[(.*)\]', r'(\1)', regex=True))

                     product_id  Length
transaction_id                         
1                         (P01)       1
2                    (P01, P02)       2
3               (P01, P02, P09)       3
4                    (P01, P03)       2
5               (P01, P03, P05)       3
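A sketch of the string case, assuming each cell is a plain string written exactly like "[P01, P02]" (unquoted codes separated by commas). The regex replacement only changes the text, so the result is still a string such as "(P01)" rather than a tuple. If the goal is to match product_combos1's tuples, the strings can instead be parsed into real tuples; `to_tuple` is a hypothetical helper written for this example.

```python
import pandas as pd

# Assumption: product_id holds plain strings such as "[P01, P02]"
testing_df = pd.DataFrame(
    {"product_id": ["[P01]", "[P01, P02]", "[P01, P02, P09]"],
     "Length": [1, 2, 3]},
    index=pd.Index(["001", "002", "003"], name="transaction_id"),
)

# Option 1: rewrite the text "[...]" as "(...)"; regex=True is needed on
# pandas >= 2.0, where str.replace treats the pattern literally by default
as_text = testing_df.assign(
    product_id=testing_df["product_id"].str.replace(r"\[(.*)\]", r"(\1)", regex=True)
)

# Option 2 (hypothetical helper): parse each string into a tuple of codes
def to_tuple(s):
    inner = s.strip("[]")  # drop the surrounding brackets
    return tuple(p.strip() for p in inner.split(",")) if inner else ()

as_tuples = testing_df.assign(product_id=testing_df["product_id"].map(to_tuple))

print(as_text["product_id"].tolist())    # ['(P01)', '(P01, P02)', '(P01, P02, P09)']
print(as_tuples["product_id"].tolist())  # [('P01',), ('P01', 'P02'), ('P01', 'P02', 'P09')]
```

Option 1 only changes how the values look; option 2 changes their type, which is what matters for comparisons or merges against a column of tuples.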