Question

I have a dataframe in which the txt column contains a list. I want to clean the txt column using the function clean_text().

import re
import pandas

data = {'value': ['abc.txt', 'cda.txt'],
        'txt': [['2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart'],
                ['2019/02/01-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart']]}
df = pandas.DataFrame(data=data)

df
 value    txt
 abc.txt  ['2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart']
 cda.txt  ['2019/02/01-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart']

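def clean_text(text):
    """
    :param text: the plain text to clean (a single string)
    :return: cleaned text
    """
    # Applied in order: the first 53 characters, then tokens that mix
    # letters and digits, then punctuation characters.
    patterns = [r"^.{53}",
                r"[A-Za-z]+[\d]+[\w]*|[\d]+[A-Za-z]+[\w]*",
                r"[-=/':,?${}\[\]-_()>.~" ";+]"]

    for p in patterns:
        text = re.sub(p, '', text)

    return text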

My solution:

df['txt'] = df['txt'].apply(lambda x: clean_text(x))

But I ran into the following errors:

df['txt'] = df['txt'].apply(lambda x: clean_text(x))
AttributeError: 'list' object has no attribute 'apply'



clean_text(df['txt'][1])
TypeError: expected string or bytes-like object

I am not sure how to use numpy.where for this problem.

Answer 1

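Based on the revision of your question and the discussion in the comments, I believe you need to use the following line:

df['txt'] = df['txt'].apply(lambda x: [clean_text(z) for z in x])

In this approach, apply is used with a lambda to loop over each element of the txt series, while a simple for loop (expressed as a Python list comprehension) iterates over each item in the txt sub-list.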
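
As a minimal, self-contained sketch of this apply-plus-list-comprehension approach: clean_text is copied from the question, while the one-row dataframe and the names line and cleaned are assumptions made only for illustration.

```python
import re

import pandas as pd


def clean_text(text):
    """Same patterns as in the question, applied to a single string."""
    patterns = [r"^.{53}",
                r"[A-Za-z]+[\d]+[\w]*|[\d]+[A-Za-z]+[\w]*",
                r"[-=/':,?${}\[\]-_()>.~" ";+]"]
    for p in patterns:
        text = re.sub(p, '', text)
    return text


# Hypothetical sample row: each cell of 'txt' is a list of strings.
line = ('2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     '
        'asfasnfs: remove datepart')
df = pd.DataFrame({'value': ['abc.txt'], 'txt': [[line]]})

# apply visits each cell (a list); the comprehension cleans every string in it.
df['txt'] = df['txt'].apply(lambda x: [clean_text(z) for z in x])

cleaned = df['txt'].tolist()
print(cleaned)
```

Each cell stays a list after the transformation, so the column keeps its shape and only the strings inside it change.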

I tested this snippet with the following value of data, comparing the dataframe before and after the transformation:

data = {
    'value': [
        'abc.txt',
        'cda.txt',
    ],
    'txt':[
        [
            '2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart',
        ],
        [
            '2019/02/01-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart',
        ],
    ]
}
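
To see why calling clean_text directly on a cell raised the TypeError from the question, here is a small sketch that uses only re on one cell of the test data above. The names cell, list_error, and stripped are illustrative and not part of the original code.

```python
import re

# One cell of the 'txt' column: a list holding a single string.
cell = ['2019/02/01-11:56:23.288258 1886     7F0ED4CDC704     '
        'asfasnfs: remove datepart']

try:
    # Passing the whole list, as clean_text(df['txt'][1]) effectively does.
    re.sub(r"^.{53}", '', cell)
    list_error = None
except TypeError as exc:
    list_error = exc

# Passing each string inside the list works.
stripped = [re.sub(r"^.{53}", '', s) for s in cell]

print(type(list_error).__name__)
print(stripped)
```

re.sub only accepts a string or bytes-like object, which is why the per-item loop inside apply is needed.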

Error when using apply on a list in Python
