Question

I have a data frame in Pandas; let's call it df. It contains the following columns:

ID - the ID number column
Files - contains a list of file names

For example:

ID         Files
1       [12, 15, 19] 
2       [15, 18, 103]

And so on. Each element of a list corresponds to a text file with the same name, so "12" corresponds to "12.txt".
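For anyone who wants to reproduce the example, here is a minimal sketch of the frame and the name-to-file mapping. It assumes the Files column holds real Python lists of integers; the helper name `to_filename` is hypothetical and only used for illustration.

```python
import pandas as pd

# Rebuild the two example rows; Files is assumed to hold real lists of ints.
df = pd.DataFrame({
    "ID": [1, 2],
    "Files": [[12, 15, 19], [15, 18, 103]],
})

def to_filename(name):
    """Map a list element such as 12 to its text file name, '12.txt'."""
    return "{}.txt".format(name)

# One list of file names per row.
filenames = df["Files"].apply(lambda names: [to_filename(n) for n in names])
print(filenames.tolist())
```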

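What I want to do is create a third column called "Content" that collects the text from every file in the list, concatenates it all together, and puts the result in that column. I am trying a for loop, but I wonder whether there is a more efficient way.

Thanks.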

Answer 1

Use a custom function with Series.apply and read the files in plain Python (it should be about as fast as pandas):

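import ast

def f(x):
    out = []
    path = 'files/'
    # if necessary, convert the string repr of a list to a real list
    if isinstance(x, str):
        x = ast.literal_eval(x)
    for file in x:
        # the handle is named fh so it does not shadow the function f
        with open('{}{}.txt'.format(path, file)) as fh:
            c = ' '.join(fh.readlines())
            out.append(c)
    # join the contents of all files in this row with spaces
    return ' '.join(out)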


df['content'] = df['Files'].apply(f)
print (df)
   ID          Files              content
0   1   [12, 15, 19]        I like pandas
1   2  [15, 18, 103]  like something else
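
Since the same file can appear in several rows (15 is in both lists above), one possible refinement is to cache file reads so that each file is opened only once. This is a sketch, not part of the answer above: the temporary folder, the sample file contents, and the names `read_file` and `collect` are all assumptions made up for the demo, and Files is assumed to hold real lists.

```python
import tempfile
from functools import lru_cache
from pathlib import Path

import pandas as pd

# Hypothetical demo setup: write a few small files to a temporary folder.
base = Path(tempfile.mkdtemp())
samples = {12: "I", 15: "like", 19: "pandas", 18: "something", 103: "else"}
for name, text in samples.items():
    (base / "{}.txt".format(name)).write_text(text)

@lru_cache(maxsize=None)
def read_file(name):
    """Read <name>.txt once; repeated names are served from the cache."""
    return (base / "{}.txt".format(name)).read_text()

def collect(names):
    """Join the contents of every file in one row's list with spaces."""
    return " ".join(read_file(n) for n in names)

df = pd.DataFrame({"ID": [1, 2], "Files": [[12, 15, 19], [15, 18, 103]]})
df["content"] = df["Files"].apply(collect)
print(df["content"].tolist())
```

Whether the cache pays off depends on how often file names repeat across rows; if every file appears only once, it adds nothing over the plain apply.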
