Using open() in a list comprehension: get a list of the text contents of all files in a directory

Date: 2019-01-16 21:54:29

Tags: python pandas

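Is there a better way than using the `with open(file) as f: f.read()` pattern inside a for loop, i.e. a list comprehension that operates on many files?

I'm trying to put the result into a DataFrame so that each file name is mapped to the file's contents.

Here's what I have, but it seems inefficient and not very Pythonic or readable: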

import glob

import pandas as pd

documents = pd.DataFrame(glob.glob('*.txt'), columns=['files'])
documents['text'] = None  # object column; non-GSE files keep None
for txtfile in documents['files'].tolist():
    if txtfile.startswith('GSE'):
        with open(txtfile) as f:
            # .loc avoids chained assignment, which may not update `documents`
            documents.loc[documents['files'] == txtfile, 'text'] = f.read()

Output:

    files   text
0   GSE2640_GSM50721.txt    | RNA was extracted from lung tissue using a T...
1   GSE7002_GSM159771.txt   Array Type : Rat230_2 ; Amount to Core : 15 ; ...
2   GSE1560_GSM26799.txt    | C3H denotes C3H / HeJ mice whereas C57 denot...
3   GSE2171_GSM39147.txt    | HIV seropositive , samples used to test HIV ...
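Because a `with` statement cannot appear inside a comprehension (comprehensions only hold expressions), one common way to get the comprehension asked about here is to move the `with open(...)` block into a small function and call that function from a list comprehension. A minimal sketch, assuming the same `GSE*.txt` naming; `read_file` and `load_documents` are hypothetical names, and unlike the loop above it keeps only the `GSE` files instead of giving the others a missing `text` value:

```python
import glob

import pandas as pd


def read_file(path):
    """Return the whole text of one file; `with` closes it afterwards."""
    with open(path) as f:
        return f.read()


def load_documents(pattern='GSE*.txt'):
    """Map every file matching `pattern` to its contents in a files/text DataFrame."""
    # sorted() only makes the row order deterministic; glob order is arbitrary.
    files = sorted(glob.glob(pattern))
    # The comprehension calls the helper once per file, so each file is closed promptly.
    texts = [read_file(name) for name in files]
    return pd.DataFrame({'files': files, 'text': texts})
```

Calling `load_documents()` from the directory that holds the files gives the same files-to-text mapping as the loop, minus the non-`GSE` rows.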

2 Answers:

Answer 0 (score: 2)

Your code looks perfectly readable. Maybe you're looking for something like this (Python 3 only):

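import glob
import pathlib

import pandas as pd

documents = pd.DataFrame(glob.glob('*.txt'), columns=['files'])
# read_text() opens, reads and closes the file; non-GSE files get None
documents['text'] = documents['files'].map(
    lambda fname: pathlib.Path(fname).read_text() if fname.startswith('GSE') else None)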
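The same mapping can also be written as a dict comprehension over `pathlib.Path.glob`, letting the glob pattern do the `GSE` filtering instead of `startswith`. A sketch under the assumption that every relevant file matches `GSE*.txt` in one directory; `documents_from_dir` is a hypothetical name:

```python
from pathlib import Path

import pandas as pd


def documents_from_dir(directory='.', pattern='GSE*.txt'):
    """Build a files/text DataFrame with a dict comprehension over Path.glob."""
    # Path.read_text() opens, reads and closes each file, so no explicit `with` is needed.
    contents = {p.name: p.read_text() for p in sorted(Path(directory).glob(pattern))}
    # Dicts keep insertion order (Python 3.7+), so keys and values stay aligned.
    return pd.DataFrame({'files': list(contents), 'text': list(contents.values())})
```

The keys are bare file names (`p.name`), which is only safe because all files come from a single directory.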

Answer 1 (score: 0)

You can do it like this:

# import libraries
import os

import pandas

# list filenames, assuming your path is './'
files = [i for i in os.listdir('./') if i.startswith('GSE') and i.endswith('.txt')]

# get contents of files (leading/trailing whitespace stripped)
contents = []
for i in files:
    with open(i) as f:
        contents.append(f.read().strip())

# into a nice table
table = pandas.DataFrame(contents, index=files, columns=['text'])
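This `table` keeps the file names in the index. If the question's two-column layout (`files`, `text`) is wanted instead, `rename_axis` followed by `reset_index` moves the index back into an ordinary column. A small sketch; the two-row `table` below is a hypothetical stand-in for the one built above:

```python
import pandas as pd

# Hypothetical stand-in for `table`: file names in the index, contents in 'text'.
table = pd.DataFrame(['alpha', 'beta'], index=['GSE1_A.txt', 'GSE2_B.txt'], columns=['text'])

# Name the index 'files', then turn it into the first column.
documents = table.rename_axis('files').reset_index()
print(documents.columns.tolist())  # prints ['files', 'text']
```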