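Python - Recursively iterate over all text files

Time: 2017-09-12 16:35:01

Tags: python list directory

I'm creating a text parser with Python 3.6. I have a file layout like this:

(The real file structure I'll be using is more extensive than this.)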

-Directory(main folder)
    -amerigroup.txt
    -bcbs.txt
    childfolder
         -medicare.txt

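I need to extract text into two different lists (walking through the files and appending to my growing lists). Whenever I run my current code, I can't get my program to open my medicare.txt file to read and extract the information. I get an error message stating that there is no such file or directory: 'medicare.txt'.

My goal is to pull the data from all 3 files in one go. How can I get the amerigroup and bcbs data, then go into the child folder and get medicare.txt, and then repeat that for every branch of my file tree?

In this snippet I'm only trying to open and close my text files. This is what I have so far: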

import re
import os
import pandas as pd

#change active directory
os.chdir(r'\\Centerstone.lan\Files\HomeDrive\angus.gray\My Documents\claimstest')
#rootdir = r'\\Centerstone.lan\Files\HomeDrive\angus.gray\My Documents\claimstest'

#set up Regular Expression objects to parse X12
claimidRegex = re.compile(r'(CLM\*)(\d+)')
dxRegex = re.compile(r'(ABK:)(\w\d+)(\*|~)(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?')

claimids = []
dxinfo = []

for dirpath, dirnames, files in os.walk(topdir):
    for name in files:
        cid = []
        dx = []
        if name.lower().endswith(exten):
            data = open(name, 'r')
            data.close()

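Thank you very much for taking the time to help me!

Edit: So far my attempts with walk have gotten me nowhere. My most recent attempt (I also tried using txtfile_full_path, which didn't work either):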

for dirpath, dirnames, filename in os.walk(base_dir):
    for filename in filename:
        #defining file type
        txtfile=open(filename,"r")
        txtfile_full_path = os.path.join(dirpath, filename)
        print(filename)

Edit 2, for anyone interested. This is my final solution to the problem:

import re
import os
import pandas as pd


#change active directory
os.chdir(r'\\Centerstone.lan\Files\HomeDrive\angus.gray\My Documents\claimstest')
base_dir = (r'\\Centerstone.lan\Files\HomeDrive\angus.gray\My Documents\claimstest')

#set up Regular Expression objects to parse X12
claimidRegex = re.compile(r'(CLM\*)(\d+)')
dxRegex = re.compile(r'(ABK:)(\w\d+)(\*|~)(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?')

claimids = []
dxinfo = []

for dirpath, dirnames, filenames in os.walk(base_dir):
    for filename in filenames:
        # os.walk yields bare file names, so join each one with its dirpath
        txtfile_full_path = os.path.join(dirpath, filename)
        with open(txtfile_full_path, 'r') as x12:
            for line in x12:
                # the second group of each claim match is the claim number
                for word in claimidRegex.findall(line):
                    claimids.append(word[1])
            x12.seek(0)  # rewind the file for the second pass
            for line in x12:
                for word in dxRegex.findall(line):
                    dxinfo.append(word)

datadic = dict(zip(claimids, dxinfo))

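1 answer:

Answer 0: (score: 0)

You need to pass the full path to open. os.walk gives you bare file names, and a bare name only resolves relative to the current working directory, so files in subfolders are not found. Just building the full path in a string variable does nothing unless you actually pass that variable to open! The following should avoid your error: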

txt_list = []
for dirpath, dirnames, filenames in os.walk(base_dir):
    for filename in filenames:
        # create full path
        txtfile_full_path = os.path.join(dirpath, filename)
        with open(txtfile_full_path) as f:
            txt_list.append(f.read())
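
To see why joining dirpath and the file name matters, here is a small self-contained sketch that rebuilds the layout from the question in a temporary directory and lists every file with its full path. The helper names build_sample_tree and list_files_recursively and the placeholder file contents are assumptions made for illustration; they are not from the original post.

```python
import os
import tempfile


def build_sample_tree(root):
    """Create the question's layout under root (contents are placeholders)."""
    os.makedirs(os.path.join(root, 'childfolder'))
    relative_paths = (
        'amerigroup.txt',
        'bcbs.txt',
        os.path.join('childfolder', 'medicare.txt'),
    )
    for relative in relative_paths:
        with open(os.path.join(root, relative), 'w') as f:
            f.write('sample\n')


def list_files_recursively(base_dir):
    """Return the full path of every file at or below base_dir."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(base_dir):
        for filename in filenames:
            # filename is a bare name; dirpath says which folder it lives in
            paths.append(os.path.join(dirpath, filename))
    return paths


with tempfile.TemporaryDirectory() as root:
    build_sample_tree(root)
    for path in sorted(list_files_recursively(root)):
        bare_name = os.path.basename(path)
        # The bare name is looked up in the current working directory,
        # while the joined path points at the real file.
        print(bare_name, os.path.exists(bare_name), os.path.exists(path))
```

The second printed column depends on the directory the script is run from, which is exactly the problem: open('medicare.txt') only works if medicare.txt happens to be in the current working directory.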

From here it should be easy to integrate the extraction based on your regular expressions...
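
One way that integration could look, as a minimal sketch: walk the tree once, skip anything that is not a .txt file, and run a compiled pattern over each line. The function name collect_matches and its extension parameter are hypothetical; the claim ID pattern is the one from the question.

```python
import os
import re

# Same claim ID pattern as in the question
claim_id_regex = re.compile(r'(CLM\*)(\d+)')


def collect_matches(base_dir, pattern, extension='.txt'):
    """Return pattern.findall() results from every matching file below base_dir."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(base_dir):
        for filename in filenames:
            if not filename.lower().endswith(extension):
                continue  # ignore files with other extensions
            full_path = os.path.join(dirpath, filename)
            with open(full_path) as f:
                for line in f:
                    matches.extend(pattern.findall(line))
    return matches
```

With this in place, the claim numbers could be gathered with something like claimids = [m[1] for m in collect_matches(base_dir, claim_id_regex)], and the same function can be called again with the diagnosis pattern. The order in which os.walk visits files is not guaranteed, so sort the results if a stable order matters.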