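Total length of elements and nested subsets in a pandas DataFrame

Asked: 2018-03-15 21:09:47

Tags: python pandas dataframe variable-length

How can I count the total number of elements in each row of a DataFrame, including the elements inside nested subsets, and put the result in a new column?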

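import pandas as pd
data = [[1, (2, 5, 6)], [2, (3, 4)], [3, 4], [(5, 6), (7, 8, 9)]]
# Size the index from the data itself; x cannot be referenced before it is assigned.
x = pd.Series(data, index=range(1, len(data) + 1))
df = pd.DataFrame({'A': x})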

I tried the following code, but it gives 2 for every row:

df['Length'] = df['A'].apply(len)

print(df)

                         A  Length
    1       [1, (2, 5, 6)]       2
    2          [2, (3, 4)]       2
    3               [3, 4]       2
    4  [(5, 6), (7, 8, 9)]       2

However, what I want to get is:

                         A  Length
    1       [1, (2, 5, 6)]       4
    2          [2, (3, 4)]       3
    3               [3, 4]       2
    4  [(5, 6), (7, 8, 9)]       5

Thanks

3 answers:

Answer 0 (score: 1)

Given:

import pandas as pd
x = pd.Series([[1, (2,5,6)], [2, (3,4)], [3, 4], [(5,6), (7,8,9)]])
df = pd.DataFrame({'A': x}) 

You can write a recursive generator that yields 1 for every nested element that is not itself iterable. Something along these lines:

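try:
    from collections.abc import Iterable  # Python 3.3+
except ImportError:
    from collections import Iterable  # Python 2

def glen(LoS):
    def iselement(e):
        # Anything non-iterable counts as one element; strings do too.
        return not (isinstance(e, Iterable) and not isinstance(e, str))
    for el in LoS:
        if iselement(el):
            yield 1
        else:
            # Recurse into the nested container and pass its counts up.
            for sub in glen(el):
                yield sub

df['Length'] = df['A'].apply(lambda e: sum(glen(e)))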

Output:

>>> df
                     A  Length
0       [1, (2, 5, 6)]       4
1          [2, (3, 4)]       3
2               [3, 4]       2
3  [(5, 6), (7, 8, 9)]       5

This works on Python 2 or 3. With Python 3.3 or later, you can replace the inner loop with yield from:

from collections.abc import Iterable  # Python 3.3+

def glen(LoS):
    def iselement(e):
        return not (isinstance(e, Iterable) and not isinstance(e, str))
    for el in LoS:
        if iselement(el):
            yield 1
        else:
            yield from glen(el)
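
For very deeply nested rows, a recursive generator can run into Python's recursion limit. A minimal sketch of an iterative alternative that uses an explicit stack might look like the following. The name `count_leaves` is hypothetical and not part of the answer above, and the sketch assumes that strings and bytes should each count as a single element.

```python
from collections.abc import Iterable


def count_leaves(nested):
    """Count non-iterable leaves in a nested iterable (hypothetical helper)."""
    total = 0
    stack = [nested]
    while stack:
        item = stack.pop()
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            # A container: push its items so they are visited later.
            stack.extend(item)
        else:
            # A scalar, string, or bytes object: count it once.
            total += 1
    return total


rows = [[1, (2, 5, 6)], [2, (3, 4)], [3, 4], [(5, 6), (7, 8, 9)]]
lengths = [count_leaves(row) for row in rows]
print(lengths)  # [4, 3, 2, 5]
```

With the DataFrame from the question, this could be applied as df['Length'] = df['A'].apply(count_leaves).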

Answer 1 (score: 0)

Using itertools:

import itertools

# Assumes every item in the row is itself iterable.
df['Length'] = df['A'].apply(lambda x: len(list(itertools.chain(*x))))
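
When a row mixes scalars and tuples, as `[1, (2, 5, 6)]` does, `itertools.chain(*x)` raises `TypeError` because it tries to iterate over the integer. One possible workaround, sketched here under the assumption that only one level of nesting needs flattening, is to wrap each scalar in a one-item tuple before chaining. `flat_len` is a hypothetical helper name, not something from the answer.

```python
import itertools


def flat_len(row):
    """Length of a row after flattening one level (hypothetical helper)."""
    # Wrap scalars so chain.from_iterable can iterate over every item.
    wrapped = (item if isinstance(item, (list, tuple)) else (item,) for item in row)
    return sum(1 for _ in itertools.chain.from_iterable(wrapped))


rows = [[1, (2, 5, 6)], [2, (3, 4)], [3, 4], [(5, 6), (7, 8, 9)]]
flat_lengths = [flat_len(row) for row in rows]
print(flat_lengths)  # [4, 3, 2, 5]
```

Containers nested more than one level deep are counted as a single element here, so a recursive approach is still needed for arbitrary depth.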

Answer 2 (score: 0)

You can try this function. It is recursive, but it works:

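def recursive_len(item):
    # Count a string as one element; iterating over it would recurse forever.
    if isinstance(item, str):
        return 1
    try:
        iterator = iter(item)
    except TypeError:
        # Not iterable: count as one element.
        return 1
    return sum(recursive_len(subitem) for subitem in iterator)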

Then call apply like this:

df['Length'] = df['A'].apply(recursive_len)