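I am building a text parser with Python 3.6. I have a file layout like the following:
(The real file structure I will be working with is more extensive than this.)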
-Directory (main folder)
    -amerigroup.txt
    -bcbs.txt
    childfolder
        -medicare.txt
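I need to extract text into two different lists (looping through the files and appending to the lists as they grow). Whenever I run my current code, I cannot get the program to open medicare.txt to read it and extract its information. I get an error saying: No such file or directory: 'medicare.txt'.
My goal is to pull the data from all 3 files in a single pass. How can I read the amerigroup and bcbs data, then descend into the subfolder and read medicare.txt, and then repeat that for every branch of my file tree?
In this snippet I am only trying to open and close my text files. This is what I have so far: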
import re
import os
import pandas as pd
#change active directory
os.chdir(r'\\Centerstone.lan\Files\HomeDrive\angus.gray\My Documents\claimstest')
#rootdir = r'\\Centerstone.lan\Files\HomeDrive\angus.gray\My Documents\claimstest'
#set up Regular Expression objects to parse X12
claimidRegex = re.compile(r'(CLM\*)(\d+)')
dxRegex = re.compile(r'(ABK:)(\w\d+)(\*|~)(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?')
claimids = []
dxinfo = []
for dirpath, dirnames, files in os.walk(topdir):
    for name in files:
        cid = []
        dx = []
        if name.lower().endswith(exten):
            data = open(name, 'r')
            data.close()
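Thank you very much for taking the time to help me!
Edit: So far my attempts with walk have been unsuccessful. Here is my most recent attempt (I also tried using txtfile_full_path, which did not work either):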
for dirpath, dirnames, filename in os.walk(base_dir):
    for filename in filename:
        #defining file type
        txtfile=open(filename,"r")
        txtfile_full_path = os.path.join(dirpath, filename)
        print(filename)
Edit 2, for anyone interested: here is my final solution to this problem:
import re
import os
import pandas as pd
#change active directory
os.chdir(r'\\Centerstone.lan\Files\HomeDrive\angus.gray\My Documents\claimstest')
base_dir = (r'\\Centerstone.lan\Files\HomeDrive\angus.gray\My Documents\claimstest')
#set up Regular Expression objects to parse X12
claimidRegex = re.compile(r'(CLM\*)(\d+)')
dxRegex = re.compile(r'(ABK:)(\w\d+)(\*|~)(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?')
claimids = []
dxinfo = []
for dirpath, dirnames, filenames in os.walk(base_dir):
    for filename in filenames:
        # join the directory and the file name so files in subfolders can be opened
        txtfile_full_path = os.path.join(dirpath, filename)
        x12 = open(txtfile_full_path, 'r')
        # first pass: collect claim ids
        for i in x12:
            match = claimidRegex.findall(i)
            for word in match:
                claimids.append(word[1])
        # rewind the file for the second pass: collect diagnosis info
        x12.seek(0)
        for i in x12:
            match = dxRegex.findall(i)
            for word in match:
                dxinfo.append(word)
        x12.close()
datadic = dict(zip(claimids, dxinfo))
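Answer 0 (score: 0)
You need to pass the full path to open. os.walk yields bare file names, and open resolves a bare name against the current working directory, so files in subfolders are not found. Merely building the path string in a variable does nothing unless you actually use it! The following should avoid your error: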
txt_list = []
for dirpath, dirnames, filenames in os.walk(base_dir):
    for filename in filenames:
        # create full path
        txtfile_full_path = os.path.join(dirpath, filename)
        # the with block closes the file automatically
        with open(txtfile_full_path) as f:
            txt_list.append(f.read())
From here it should be easy to integrate the extraction based on your regexes...
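As a rough illustration of that integration, here is a minimal sketch. The function name collect_claims, the simplified diagnosis pattern DX_RE, and the sample file contents are my own assumptions for the demo, not taken from your real data; swap in your full dxRegex if you need the grouped matches.

```python
import os
import re
import tempfile

# Claim id pattern from the question; DX_RE is a simplified, hypothetical
# stand-in that captures every code following ABK: or ABF:.
CLAIM_ID_RE = re.compile(r'CLM\*(\d+)')
DX_RE = re.compile(r'AB[KF]:(\w\d+)')


def collect_claims(base_dir, extension='.txt'):
    """Walk base_dir recursively and return (claim_ids, dx_codes)."""
    claim_ids, dx_codes = [], []
    for dirpath, dirnames, filenames in os.walk(base_dir):
        for filename in filenames:
            if not filename.lower().endswith(extension):
                continue
            # os.walk yields bare names, so join them with dirpath
            full_path = os.path.join(dirpath, filename)
            with open(full_path) as f:
                text = f.read()
            claim_ids.extend(CLAIM_ID_RE.findall(text))
            dx_codes.extend(DX_RE.findall(text))
    return claim_ids, dx_codes


# Demo on a throwaway directory tree with made-up X12-like content.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'childfolder'))
    samples = {
        'bcbs.txt': 'CLM*1001*50~HI*ABK:F329*ABF:F411~',
        os.path.join('childfolder', 'medicare.txt'): 'CLM*2002*75~HI*ABK:F200~',
    }
    for relative_path, content in samples.items():
        with open(os.path.join(root, relative_path), 'w') as f:
            f.write(content)

    claim_ids, dx_codes = collect_claims(root)
    print(sorted(claim_ids))
    print(sorted(dx_codes))
```

Because every file is opened through os.path.join(dirpath, filename), the os.chdir call is no longer needed, and reading each file once with f.read() avoids the seek(0) second pass.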