Question

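This question is very similar to: Pandas sum integers separeted by commas in a string column

The solution there was: df['B'].apply(lambda x: sum(map(int, x.split(','))))

The difference is that my Series has colons in the strings, and I would like to do this in a vectorized way.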

    A      B                                                        
0   1      0                                                        
1   2    3,1::4                                                        
2   3      1                                                        
3   4      3                                                        
4   5  2,1,2::5                                                        
5   6    2,1                                                        
6   7      0                                                        
7   8      0                                                        
8   9      0                                                        
9  10  4,3,1::8

I am trying to split on '::', take the 0th element, and then sum the comma-separated values in it.
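To make the problem concrete, here is a minimal sketch on plain strings (no pandas) of why the linked one-liner breaks on these values and what the plan described above would look like. The function names `sum_csv` and `sum_before_double_colon` are hypothetical and only used for illustration.

```python
def sum_csv(x):
    # The one-liner from the linked question, written as a named function
    return sum(map(int, x.split(',')))

def sum_before_double_colon(x):
    # Keep only the text before the first '::', then sum as before
    return sum_csv(x.split('::')[0])

print(sum_csv('2,1'))  # 3

try:
    sum_csv('3,1::4')  # the piece '1::4' cannot be converted by int()
except ValueError as exc:
    print('ValueError:', exc)

print(sum_before_double_colon('3,1::4'))  # 4
```

Strings without '::' pass through `split('::')[0]` unchanged, so the same function should work for every row of the example column.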

Answer 1

Setup

from io import StringIO
import pandas as pd

text = """    A      B                                                        
0   1      0                                                        
1   2    3,1::4                                                        
2   3      1                                                        
3   4      3                                                        
4   5  2,1,2::5                                                        
5   6    2,1                                                        
6   7      0                                                        
7   8      0                                                        
8   9      0                                                        
9  10  4,3,1::8"""

df = pd.read_csv(StringIO(text), sep=r'\s+', index_col=0)

Solution

import re

def split_dbl_cln_sum_thingy(x):
    # remove '::' and anything after it
    # () captures what's inside as \1
    # ? makes the * operator non-greedy
    x = re.sub(r'(.*?)::.*', r'\1', x)
    # split on commas, turn to ints, and sum up
    x = sum([int(i) for i in x.split(',')])
    return x

df.B.apply(split_dbl_cln_sum_thingy)

Demonstration

print(df.B.apply(split_dbl_cln_sum_thingy))

0    0
1    4
2    1
3    3
4    5
5    3
6    0
7    0
8    0
9    8
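To see what the regex step does on its own, the following sketch traces a few of the example values through the substitution and the sum, without pandas. The helper name `strip_after_double_colon` is hypothetical; the pattern is the one used in the solution above.

```python
import re

def strip_after_double_colon(x):
    # Lazily capture everything before the first '::' and drop the rest;
    # strings without '::' do not match and are returned unchanged
    return re.sub(r'(.*?)::.*', r'\1', x)

for raw in ['0', '3,1::4', '2,1,2::5', '2,1']:
    kept = strip_after_double_colon(raw)
    total = sum(int(i) for i in kept.split(','))
    print(repr(raw), '->', repr(kept), '->', total)
```

For '3,1::4' this prints the intermediate string '3,1' and the total 4, which agrees with row 1 of the output above.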

Answer 2

You will probably need some kind of regular expression to pick out the right numbers. This one:

pattern = r'(?<!:)\d+'

matches every number that is not immediately preceded by a colon. You can therefore combine it with apply:

import numpy as np

df['B'].str.findall(pattern).apply(lambda x: np.sum(list(map(int, x))))
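One thing worth checking with this pattern: the lookbehind only inspects a single character, so if the value after '::' could have more than one digit (an assumption, since the example data only shows single digits there), its trailing digits would still be matched. The sketch below compares the pattern with a slightly stricter variant, named `strict` here purely for illustration, that also refuses to start a match right after another digit.

```python
import re

loose = r'(?<!:)\d+'        # pattern from the answer
strict = r'(?<![:\d])\d+'   # hypothetical variant: also reject a start right after a digit

sample = '4,3,1::12'        # made-up value with a two-digit number after '::'

print(re.findall(loose, sample))    # ['4', '3', '1', '2']
print(re.findall(strict, sample))   # ['4', '3', '1']

# On the values from the question, both patterns agree
print(re.findall(loose, '2,1,2::5'), re.findall(strict, '2,1,2::5'))
```

With single-digit values after the colons, as in the question, the original pattern is sufficient.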

Answer 3

I solved it with the following; I wanted the second line to work as a vectorized operation.

dd = dd.str.split("::").str[0]

dd.apply(lambda x: sum(map(int, x.split(','))))
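The snippet above does not show where `dd` comes from. A runnable sketch, assuming `dd` is the B column from the question held as a Series of strings, might look like this; the names `head` and `totals` are introduced here only for illustration.

```python
import pandas as pd

# Assumption: dd holds the B column from the question as strings
dd = pd.Series(['0', '3,1::4', '1', '3', '2,1,2::5', '2,1', '0', '0', '0', '4,3,1::8'])

# Vectorized string step: keep only the part before '::'
head = dd.str.split('::').str[0]

# Per-element step: sum the comma-separated integers
totals = head.apply(lambda x: sum(map(int, x.split(','))))

print(totals.tolist())  # [0, 4, 1, 3, 5, 3, 0, 0, 0, 8]
```

Only the split on '::' goes through the `.str` accessor here; the summation still runs element by element through `apply`.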
