Question

With the following code:

for root, dirs, files in os.walk(corpus_name):
    for file in files:
        if file.endswith(".v4_gold_conll"):
            f= open(file)
            lines = f.readlines()
            tokens = [line.split()[3] for line in lines if line.strip() and not line.startswith("#")]
    print(tokens)

I get the following error:

Traceback (most recent call last):
  File "text_statistics.py", line 28, in <module>
    corpus_reading_pos(corpus_name, option)
  File "text_statistics.py", line 13, in corpus_reading_pos
    f = open(file)
FileNotFoundError: [Errno 2] No such file or directory: 'abc_0001.v4_gold_conll'

As you can see, os.walk does find the file, yet when I try to open it, it cannot be found?

Edit: With this updated code, it stops after reading 7 files, but there are 172 files.

def corpus_reading_token_count(corpus_name, option="token"):
    for root, dirs, files in os.walk(corpus_name):
        tokens = []
        file_count = 0
        for file in files:
            if file.endswith(".v4_gold_conll"):
                with open((os.path.join(root, file))) as f:
                    tokens += [line.split()[3] for line in f if line.strip() and not line.startswith("#")]
                    file_count += 1
    print(tokens)
    print("File count:", file_count)

Answer 1

file is only the file name, without its directory; the directory is root in your code. Try this:

f = open(os.path.join(root, file))
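To see why the bare name fails, here is a minimal sketch. The temporary directory layout and the sample line are made up for illustration; only the file name comes from the traceback in the question. os.walk yields names relative to root, while open() and os.path.exists() resolve a relative name against the current working directory.

```python
import os
import tempfile

# Hypothetical layout: one corpus file inside a nested subdirectory.
with tempfile.TemporaryDirectory() as corpus_name:
    subdir = os.path.join(corpus_name, "data", "abc")
    os.makedirs(subdir)
    with open(os.path.join(subdir, "abc_0001.v4_gold_conll"), "w") as f:
        f.write("doc 0 0 Hello UH\n")

    results = []
    for root, dirs, files in os.walk(corpus_name):
        for filename in files:
            # The bare name is looked up in the current working directory.
            bare_exists = os.path.exists(filename)
            # The joined path points at the file os.walk actually listed.
            full_exists = os.path.exists(os.path.join(root, filename))
            results.append((filename, bare_exists, full_exists))

print(results)
```

Unless the script happens to run from inside the directory that holds the file, the second value should be False and the third True.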

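Also, you should open the file with a with statement, and avoid using file as a variable name, since it shadows the built-in type of that name in Python 2. And judging from your comment, you should extend the token list (use += instead of =):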

tokens = []
for root, dirs, files in os.walk(corpus_name):
    for filename in files:
        if filename.endswith(".v4_gold_conll"):
            with open(os.path.join(root, filename)) as f:
                tokens += [line.split()[3] for line in f if line.strip() and not line.startswith("#")]
print(tokens)
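In the edited code from the question, tokens and file_count are reset inside the os.walk loop, once per directory, so the final print only reports the last directory visited. That is probably why it seems to stop after 7 files. A sketch that initialises both before the walk could look like this; the function name read_tokens, its parameters and the demo files are my own assumptions, not something from the question:

```python
import os
import tempfile

def read_tokens(corpus_name, suffix=".v4_gold_conll", column=3):
    """Collect one column from every matching file under corpus_name."""
    tokens = []      # initialised once, before the walk starts
    file_count = 0
    for root, dirs, files in os.walk(corpus_name):
        for filename in files:
            if filename.endswith(suffix):
                with open(os.path.join(root, filename)) as f:
                    tokens += [
                        line.split()[column]
                        for line in f
                        if line.strip() and not line.startswith("#")
                    ]
                file_count += 1
    return tokens, file_count

# Demo on a made-up corpus: two subdirectories with one file each.
with tempfile.TemporaryDirectory() as corpus:
    for sub, word in (("a", "Hello"), ("b", "world")):
        os.makedirs(os.path.join(corpus, sub))
        path = os.path.join(corpus, sub, sub + "_0001.v4_gold_conll")
        with open(path, "w") as f:
            f.write("#begin document\n")
            f.write("doc 0 0 " + word + " UH\n")
            f.write("\n")
    tokens, file_count = read_tokens(corpus)

print(tokens)
print("File count:", file_count)
```

Both files are counted even though they sit in different directories. The order of the tokens depends on the order in which os.walk visits the subdirectories.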

Answer 2

You have to join root with the file name.

for root, dirs, files in os.walk(corpus_name):
    for file in files:
        if file.endswith(".v4_gold_conll"):
            with open(os.path.join(root, file)) as f:
                tokens = [
                    line.split()[3]
                    for line in f
                    if line.strip() and not line.startswith("#")
                ]
                print(tokens)
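If you would rather not join paths by hand, pathlib is an alternative to os.walk (a different technique from the one above, shown only as a sketch). Path.rglob yields Path objects that already include the directory, so they can be opened directly. The function name read_tokens_pathlib and the demo file contents are assumptions for illustration:

```python
from pathlib import Path
import tempfile

def read_tokens_pathlib(corpus_name, pattern="*.v4_gold_conll", column=3):
    """Collect one column from every file matching pattern, recursively."""
    tokens = []
    # sorted() makes the file order deterministic.
    for path in sorted(Path(corpus_name).rglob(pattern)):
        with path.open() as f:
            tokens += [
                line.split()[column]
                for line in f
                if line.strip() and not line.startswith("#")
            ]
    return tokens

# Demo on a made-up nested corpus directory.
with tempfile.TemporaryDirectory() as corpus:
    nested = Path(corpus) / "data" / "abc"
    nested.mkdir(parents=True)
    (nested / "abc_0001.v4_gold_conll").write_text(
        "#begin document\ndoc 0 0 Hello UH\n\ndoc 0 1 there UH\n"
    )
    demo_tokens = read_tokens_pathlib(corpus)

print(demo_tokens)
```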

File not found error, even though the file was found
