Question

Suppose I have a column of data such as:

           value
1    [a_1, a_342, a_452]   
2    [a_5, a_99]   
3    [a_482, a_342, a_452, a_888]

I need to trim that column to:

           value
1    [1, 342, 452]   
2    [5, 99]   
3    [482, 342, 452, 888]

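Basically, I want to strip the a_ prefix and turn each entry of the column into a list of integers.

I tried the replace and map functions from the pandas Python package, but neither of them worked.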

For a column that holds a single entry per row, such as:

    value
1    a_1 
2    a_5  
3    a_99

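I can use something like df['value'] = df['value'].str[2:].astype(int), but that does not work for the lists of strings shown above.

I would really appreciate any suggestions. Thanks in advance.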

Answer 1

Use:

#get list of strings
df['value'] = df['value'].astype(str).str.findall(r'\d+')
#convert them to ints
df['value'] = [[int(i) for i in x] for x in df['value']]
#alternative
#df['value'] = [list(map(int, x)) for x in df['value']]
print (df)
                  value
1         [1, 342, 452]
2               [5, 99]
3  [482, 342, 452, 888]
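
For anyone who wants to run the snippet above end to end, here is a minimal sketch. The DataFrame construction is an assumption: the question does not say whether each cell is a real Python list or its string representation, so this sketch uses real lists of strings with the index 1 to 3 from the question. Stringifying with astype(str) means the same code should also cope with cells that are already strings.

```python
import pandas as pd

# Assumed setup: each cell is a Python list of strings (not confirmed by the question).
df = pd.DataFrame(
    {"value": [["a_1", "a_342", "a_452"],
               ["a_5", "a_99"],
               ["a_482", "a_342", "a_452", "a_888"]]},
    index=[1, 2, 3],
)

# Turn each list into its text form, then collect every run of digits in it.
digits = df["value"].astype(str).str.findall(r"\d+")

# Convert the digit strings to ints, one inner list per row.
df["value"] = [[int(d) for d in row] for row in digits]

result = df["value"].tolist()
print(result)
```

One thing to keep in mind: because the pattern is just \d+, every run of digits in a cell becomes its own integer, so an item such as 'a_1b2' would contribute two numbers rather than one.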

Solution with a list comprehension:

import re

df['value'] = [[int(re.findall(r'\d+', i)[0]) for i in x] for x in df['value']]
print (df)
                  value
1         [1, 342, 452]
2               [5, 99]
3  [482, 342, 452, 888]

Alternative:

df['value'] = [[int(re.search(r'\d+', i).group()) for i in x] for x in df['value']]

Solution with re.sub, the regular-expression counterpart of replace:

df['value'] = [[int(re.sub('[_a]', '', i)) for i in x] for x in df['value']]
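
The three list-comprehension variants above behave differently on unexpected input: re.findall(...)[0] raises IndexError when an item has no digits, re.search(...).group() raises AttributeError in the same case, and re.sub('[_a]', '', i) removes every 'a' and '_' anywhere in the item, not only the leading prefix. If stricter validation is wanted, something along these lines may help. It is only a sketch: the helper name strip_prefix and the pattern PREFIX_NUMBER are hypothetical, and it assumes every item has exactly the form a_<digits>.

```python
import re

# Hypothetical pattern: assumes each item is exactly "a_" followed by digits.
PREFIX_NUMBER = re.compile(r"a_(\d+)")


def strip_prefix(items):
    """Return the integer part of each 'a_<digits>' string in items."""
    numbers = []
    for item in items:
        match = PREFIX_NUMBER.fullmatch(item)
        if match is None:
            # Fail loudly instead of silently producing a wrong number.
            raise ValueError(f"unexpected item: {item!r}")
        numbers.append(int(match.group(1)))
    return numbers


converted = strip_prefix(["a_482", "a_342", "a_452", "a_888"])
print(converted)
```

With a DataFrame whose cells are lists of strings, this could be applied as df['value'] = [strip_prefix(x) for x in df['value']].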

Answer 2

Option 1

To keep things simple, just convert to str, use str.replace, and then apply ast.literal_eval to the result.

import ast

df['value'] = df['value'].astype(str).str.replace('a_', '')\
           .apply(lambda x: [int(y) for y in ast.literal_eval(x)])
df 

                  value
1         [1, 342, 452]
2               [5, 99]
3  [482, 342, 452, 888]
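
To see why Option 1 works, it may help to trace the same steps on a single cell in plain Python, without pandas. This is only an illustration; the variable names are made up for the sketch.

```python
import ast

cell = ["a_1", "a_342", "a_452"]

# Step 1: astype(str) turns the list into its text representation.
text = str(cell)

# Step 2: str.replace drops every "a_" from that text.
stripped = text.replace("a_", "")

# Step 3: ast.literal_eval parses the text back into a list of digit strings.
parsed = ast.literal_eval(stripped)

# Step 4: convert each digit string to an int.
numbers = [int(y) for y in parsed]

print(text)      # ['a_1', 'a_342', 'a_452']
print(stripped)  # ['1', '342', '452']
print(numbers)   # [1, 342, 452]
```

Because the replacement runs on the text of the whole list, it removes "a_" wherever it occurs, so this approach assumes that sequence only ever appears as the prefix.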

Option 2

Use str.extractall:

df['value'] = df['value'].astype(str).str.extractall(r'(\d+)').unstack()\
                              .apply(lambda x: list(x.dropna().astype(int)), axis=1)
df 

                  value
1         [1, 342, 452]
2               [5, 99]
3  [482, 342, 452, 888]

df['value'].tolist()
[[1, 342, 452], [5, 99], [482, 342, 452, 888]]
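
The Option 2 one-liner is dense, so here is a step-by-step sketch of the same chain with the intermediate results named. The DataFrame construction is again an assumption (cells that are Python lists of strings), and the names matches and wide are introduced only for this illustration.

```python
import pandas as pd

# Assumed setup: each cell is a Python list of strings.
df = pd.DataFrame(
    {"value": [["a_1", "a_342", "a_452"],
               ["a_5", "a_99"],
               ["a_482", "a_342", "a_452", "a_888"]]},
    index=[1, 2, 3],
)

# One row per regex match, indexed by (original row label, match number).
matches = df["value"].astype(str).str.extractall(r"(\d+)")

# Pivot the match number into columns: one row per original row,
# with NaN where a row has fewer matches than the longest one.
wide = matches.unstack()

# Drop the NaN padding in each row and convert what is left to ints.
df["value"] = wide.apply(lambda row: [int(v) for v in row.dropna()], axis=1)

result = df["value"].tolist()
print(result)
```

The NaN padding in the wide frame is the reason dropna() is needed before converting to integers.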

Extract the integer part from strings in a list in a DataFrame column
