Question

I have a table with an ID column and a product ID column, for example:

    _id  _push_product_id_
0     4  43704
1     7  6361 | 6361 | 6361 | 6361
2     9  30252 | 30252 | 8467 | 38988
3    11  18987 | 17706 | 19543 | 33037
4    12  20144 | 7120

I want it to look like this:

     _id  product_id   count
 0    4      43704       1
 1    7      6361        4

I tried this:

import pandas as pd

data = pd.melt(transactions.set_index('_id')['_push_product_id_'].apply(pd.Series).reset_index(),
             id_vars=['_id'],
             value_name='_push_product_id_') \
    .dropna().drop(['variable'], axis=1) \
    .groupby(['_id', '_push_product_id_']) \
    .agg({'_push_product_id_': 'count'}) \
    .rename(columns={'_push_product_id_': 'purchase_count'}) \
    .reset_index() \
    .rename(columns={'_push_product_id_': 'productId'})
data['productId'] = data['productId'].astype(int)

This results in the error: ValueError: invalid literal for int() with base 10: '6361 | 6361 | 6361 | 6361'
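A minimal check of where that message comes from: `astype(int)` has to parse each value as an integer, and a value that still contains the ` | ` separators is not a valid integer literal. The snippet below reproduces the same `ValueError` with plain `int()` on one value from the table, then shows that each piece parses fine once the string is split.

```python
# Reproduce the error: int() cannot parse a string that still contains " | " separators.
raw = '6361 | 6361 | 6361 | 6361'

try:
    int(raw)
except ValueError as exc:
    message = str(exc)

print(message)
# invalid literal for int() with base 10: '6361 | 6361 | 6361 | 6361'

# Splitting on the separator first leaves pieces that int() can parse.
parts = [int(p) for p in raw.split(' | ')]
print(parts)  # [6361, 6361, 6361, 6361]
```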


Answer 1

Your _push_product_id_ column contains str values, but for apply(pd.Series) to work you need each value to be a list of ints.
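To see why the type matters, here is a small sketch (the two sample values are taken from the question) comparing `apply(pd.Series)` on the raw strings and on lists of ints. A string is treated as a single scalar, so each row becomes a one-column frame and nothing is split; a list is expanded into one column per element, with shorter lists padded with NaN.

```python
import pandas as pd

# Same column in two forms: raw strings vs. lists of ints.
as_str = pd.Series(['43704', '6361 | 6361 | 6361 | 6361'])
as_list = as_str.apply(lambda s: [int(i) for i in s.split(' | ')])

# A string becomes a single column: the separators are left untouched.
wide_from_str = as_str.apply(pd.Series)

# A list becomes one column per element; the shorter row is padded with NaN.
wide_from_list = as_list.apply(pd.Series)

print(wide_from_str.shape)   # (2, 1)
print(wide_from_list.shape)  # (2, 4)
```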

We can turn the strings into lists of ints by calling apply() on that column with a function that splits each value on ' | ' and converts each piece to an int:

def split_ints(s):
    # Split '18987 | 17706' on the ' | ' separator and convert each piece to int
    return [int(i) for i in s.split(' | ')]

# testing:
print(split_ints('18987 | 17706 | 19543 | 33037'))

# [18987, 17706, 19543, 33037]

You can then use it to convert the column in the DataFrame:

transactions['_push_product_id_'] = transactions['_push_product_id_'].apply(split_ints)

Your code will now produce a result much closer to the desired one:

   _id  productId  purchase_count
0   11      17706               1
1   11      18987               1
2   11      19543               1
3   11      33037               1
4   12       7120               1
5   12      20144               1
6    4      43704               1
7    7       6361               4
8    9       8467               1
9    9      30252               2
10   9      38988               1
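For an end-to-end check, here is a self-contained sketch that rebuilds the sample table from the question and applies `split_ints`. It assumes `_id` holds integers, and it swaps the answer's `agg`/`rename` chain for `groupby(...).size()` so the output has the question's `product_id` and `count` columns directly. The second variant uses `DataFrame.explode` (available since pandas 0.25) instead of `apply(pd.Series)` plus `melt`. Both are suggested alternatives, not the answer's own code.

```python
import pandas as pd

# Sample data copied from the question (assumption: _id holds integers).
transactions = pd.DataFrame({
    '_id': [4, 7, 9, 11, 12],
    '_push_product_id_': [
        '43704',
        '6361 | 6361 | 6361 | 6361',
        '30252 | 30252 | 8467 | 38988',
        '18987 | 17706 | 19543 | 33037',
        '20144 | 7120',
    ],
})

def split_ints(s):
    # '6361 | 6361' -> [6361, 6361]
    return [int(i) for i in s.split(' | ')]

transactions['_push_product_id_'] = transactions['_push_product_id_'].apply(split_ints)

# Variant 1: one column per list position, then back to long format.
wide = transactions.set_index('_id')['_push_product_id_'].apply(pd.Series).reset_index()
long_df = (pd.melt(wide, id_vars=['_id'], value_name='product_id')
             .dropna()                          # drop the NaN padding
             .astype({'product_id': int}))      # NaN padding made the ids floats

result = (long_df.groupby(['_id', 'product_id'])
                 .size()
                 .reset_index(name='count'))

# Variant 2: explode each list into its own row, then count.
alt = (transactions.explode('_push_product_id_')
                   .rename(columns={'_push_product_id_': 'product_id'})
                   .astype({'product_id': int})
                   .groupby(['_id', 'product_id'])
                   .size()
                   .reset_index(name='count'))

print(result)
```

With an integer `_id`, `groupby` sorts the rows numerically (4, 7, 9, 11, 12) rather than in the order shown above.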

Unable to convert string to int column in data