Question

I have a df that looks like this:

user_index  movie_index  genre_index          cast_index
3590        1514         10|12|17|35          46|534
63          563          4|2|1|8              9|27

It was generated from:

import pandas as pd
df = pd.DataFrame({'user_index': [3590,63], 'movie_index': [1514,563],
'genre_index':['10|12|17|35', '4|2|1|8'], 'cast_index':['46|534', '9|27']})

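I need to split each row on '|' (turning each row into a list) and add some value to each element to get a df like this (here, '5' is added element-wise in the 'genre_index' column and '2' is added element-wise in the 'user_index' column):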

    user_index  movie_index  genre_index          cast_index
    [3592]      [1514]       [15,17,22,40]        [46,534]
    [65]        [563]        [9,7,6,13]           [9,27]

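To do this, I wrote a function that takes a column as its argument, splits it, and adds a value element-wise (I don't pass 'df' as the argument because the value to add differs for each column), like this: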

def df_convertion(input_series, offset):
    column = input_series.str.split('|', expand=False).apply(lambda x: x + offset)
    return (column)

But obviously the whole thing doesn't work as required (I tried it on the 'genre_index' column) and returns this error:

TypeError: can only concatenate list (not "int") to list

Any help fixing this would be greatly appreciated!

Answer 1

This is one of those rare cases where I would suggest using apply. Still, see whether you can use a different representation for your data.

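# Per-column offsets; columns not listed here get an offset of 0.
offset_dct = {'user_index': 2, 'genre_index': 5}
# Cast every cell to str, split it on '|', and add the column's offset to each item.
df = df.fillna('').astype(str).apply(lambda x: [
    [int(z) + offset_dct.get(x.name, 0) for z in y.split('|')] for y in x])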

df
  cast_index       genre_index movie_index user_index
0  [46, 534]  [15, 17, 22, 40]      [1514]     [3592]
1    [9, 27]     [9, 7, 6, 13]       [563]       [65]
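
As for the error itself: str.split leaves a list of strings in every cell, so x + offset tries to concatenate a list with an int, which is what the TypeError reports. If you would rather keep a per-column function like the one in the question, a minimal sketch could look like the following. The name split_and_offset is made up for illustration, and the astype(str) call is an assumption added so that integer columns such as 'user_index' can go through the same function.

```python
import pandas as pd

df = pd.DataFrame({
    'user_index': [3590, 63],
    'movie_index': [1514, 563],
    'genre_index': ['10|12|17|35', '4|2|1|8'],
    'cast_index': ['46|534', '9|27'],
})

def split_and_offset(input_series, offset=0):
    """Split each value on '|' and add `offset` to every element."""
    # Cast to str first so integer columns can be split as well.
    parts = input_series.astype(str).str.split('|', expand=False)
    # Each cell is now a list of strings: convert every item to int,
    # then add the offset item by item instead of to the list itself.
    return parts.apply(lambda items: [int(item) + offset for item in items])

genres = split_and_offset(df['genre_index'], 5)
users = split_and_offset(df['user_index'], 2)
```

Assigning the results back, for example df['genre_index'] = genres, gives the same list-valued cells as the apply version above.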

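One possible "other representation" is a long format with one row per element, where the offset becomes ordinary vectorized arithmetic. This is only a sketch for a single column, and it assumes pandas 0.25 or later (for DataFrame.explode); long_df is a name chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'movie_index': [1514, 563],
    'genre_index': ['10|12|17|35', '4|2|1|8'],
})

# Split the strings into lists, then give every list element its own row.
long_df = (
    df.assign(genre_index=df['genre_index'].str.split('|'))
      .explode('genre_index')
      .reset_index(drop=True)
)
# The column now holds one string per row, so the offset is plain arithmetic.
long_df['genre_index'] = long_df['genre_index'].astype(int) + 5
```

Each movie_index is repeated once per genre, so this layout suits filtering or grouping by genre better than cells that contain lists.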
Split each row in a df and add a value to each element