Question

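I have a dataframe with a column containing comma-separated strings. What I want to do is split them on the commas, count the items, and append the count to a new dataframe. If the column contains a list with only one element, I want to differentiate whether it is a string or an integer. If it is an integer, I want to append the value 0 for that row to the new df. My code looks as follows: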

def decide(dataframe):
    df=pd.DataFrame()

    for liste in DataFrameX['Column']:
        x=liste.split(',')
        if len(x) > 1:
            df.append(pd.Series([len(x)]), ignore_index=True)
        else:
            #check if element in list is int
            for i in x:
                try:
                    int(i)
                    print i
                    x = []

                    df.append(pd.Series([int(len(x))]), ignore_index=True)
                except:
                    print i
                    x = [1]
                    df.append(pd.Series([len(x)]), ignore_index=True)
    return df

The input data looks like this:

   C1  
0  a,b,c
1  0
2  a
3  ab,x,j

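If I now run the function with my original dataframe as input, it returns an empty dataframe. From the print statements in the try/except block I can see that everything works. The problem is appending the resulting values to the new dataframe. What do I need to change in my code? If possible, please do not provide an entirely different solution, but tell me what I am doing wrong in my code so I can learn.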

****************** UPDATE ******************

I edited the code so that it can be called from a lambda function. It now looks like this:

def decide(x):
    for liste in DataFrameX['Column']:

        x=liste.split(',')
        if len(x) > 1:
            x = len(x)
            print x
        else:
            #check if element in list is int
            for i in x:
                try:
                    int(i)
                    x = []
                    x = len(x)
                    print x

                except: 
                    x = [1]
                    x = len(x)
                    print x

I call it like this:

df['Count']=df['C1'].apply(lambda x: decide(x))

It prints the correct values, but the new column contains only None.

Any idea why?

Answer 1

This is a good start. It could be simplified, but I think it works as expected.

import pandas as pd

# I have a dataframe with a column containing comma separated strings.
df = pd.DataFrame({'data': ['apple, peach', 'banana, peach, peach, cherry','peach','0']})

# What I want to do is separate them by comma, count them and append the counted number to a new data frame.
df['data'] = df['data'].str.split(',')
df['count'] = df['data'].apply(lambda row: len(row))
# If the column contains a list with only one element
df['first'] = df['data'].apply(lambda row: row[0])
# I want to differentiate whether it is a string or an integer
df['first'] = pd.to_numeric(df['first'], errors='coerce')
# if the element in x is an integer, len(x) should be set to zero
df.loc[pd.notnull(df['first']), 'count'] = 0
# Dropping temp column (the positional axis argument of drop() is no longer accepted in pandas 2.0+)
df.drop(columns='first', inplace=True)
df

                                data  count
0                    [apple,  peach]      2
1  [banana,  peach,  peach,  cherry]      4
2                            [peach]      1
3                                [0]      0
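
As for the updated version in the question, the new column most likely holds only None because `decide` prints its results but never returns anything, and it loops over the whole column instead of using the single value that `apply` passes in. Below is a minimal sketch of a per-element function that returns the count. The name `count_items` is hypothetical, and the sample values are copied from the question's input.

```python
def count_items(cell):
    """Return the number of comma-separated items, or 0 for a lone integer."""
    parts = cell.split(',')
    if len(parts) > 1:
        return len(parts)
    try:
        int(parts[0])
    except ValueError:
        return 1  # a single non-numeric string counts as one item
    return 0      # a single integer counts as zero


# Sample values taken from the question's input column.
values = ['a,b,c', '0', 'a', 'ab,x,j']
counts = [count_items(v) for v in values]
print(counts)  # [3, 0, 1, 3]
```

With the frame from the question this could be applied as `df['Count'] = df['C1'].apply(count_items)`; the `lambda` wrapper is not needed.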

Using .append()
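
The original function probably returns an empty frame because `DataFrame.append` never changed the frame in place. It returned a new object, which the loop discarded each time. The method was later deprecated and then removed in pandas 2.0. The original also iterates over `DataFrameX['Column']` rather than its own `dataframe` argument. The sketch below keeps the loop structure from the question but collects the counts in a plain list and builds the frame once at the end. The column name `count` and the variable names are assumptions made for illustration.

```python
import pandas as pd


def decide(column):
    """Build a one-column frame of item counts from a Series of strings."""
    counts = []  # list.append mutates the list, unlike the old DataFrame.append
    for cell in column:
        parts = cell.split(',')
        if len(parts) > 1:
            counts.append(len(parts))
        else:
            try:
                int(parts[0])
                counts.append(0)  # lone integer
            except ValueError:
                counts.append(1)  # lone string
    return pd.DataFrame({'count': counts})


# Input copied from the question.
source = pd.DataFrame({'C1': ['a,b,c', '0', 'a', 'ab,x,j']})
result = decide(source['C1'])
print(result)
```

Building the frame once from a list also avoids copying the whole frame on every iteration, which is what repeated appends did.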
