Question

I have a set of (large) XML files that I want to search for a set of strings. I'm trying to do this with the following Python code:

import collections

thestrings = []
with open('Strings.txt') as f:
  for line in f:
    text = line.strip()
    thestrings.append(text)

print('Searching for:')
print(thestrings)
print('Results:')

try:
  from os import scandir
except ImportError:
  from scandir import scandir

def scantree(path):
  """Recursively yield DirEntry objects for given directory."""
  for entry in scandir(path):
    if entry.is_dir(follow_symlinks=False) and (not entry.name.startswith('.')):
      yield from scantree(entry.path)
    else:
      yield entry

if __name__ == '__main__':
  for entry in scantree('//path/to/folder'):
    if ('.xml' in entry.name) and ('.zip' not in entry.name):
      with open(entry.path) as f:
        data = f.readline()
        if (thestrings[0] in data):
          print('')
          print('****** Schema found in: ', entry.name)
          print('')
          data = f.read()
          if (thestrings[1] in data) and (thestrings[2] in data) and (thestrings[3] in data):
            print('Hit at:', entry.path)

  print("Done!")

where Strings.txt is a file containing the strings I'm interested in, with the schema URI on the first line.

This seems to run fine at first, but after a while it gives me:

FileNotFoundError: [WinError 3] The system cannot find the path specified: //some/path

This confuses me, because the paths are built at runtime from the directory listing itself, so how can one of them not be found?

Note that if I set the code up as follows, so that only the first line of each file is read:

with open(entry.path) as f:
  data = f.readline()
  if (thestrings[0] in data):

then I see a number of potential files found before the error occurs.
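To narrow down which path triggers the error, the scan can record failures instead of stopping at the first one. Below is a minimal sketch, assuming Python 3.6+ (for `with os.scandir(...)`); the helper names `scan_with_errors` and `find_failures` are hypothetical. It keeps the question's rule of skipping hidden directories and stores each failing path together with its length, so unusually long paths stand out:

```python
import os

def scan_with_errors(path, errors):
    """Like the question's scantree(), but records failures instead of raising.

    Each failure is appended to `errors` as (path, len(path), exception).
    """
    try:
        with os.scandir(path) as it:
            entries = list(it)
    except OSError as exc:  # e.g. FileNotFoundError / [WinError 3]
        errors.append((path, len(path), exc))
        return
    for entry in entries:
        if entry.is_dir(follow_symlinks=False) and not entry.name.startswith('.'):
            yield from scan_with_errors(entry.path, errors)
        else:
            yield entry

def find_failures(root):
    """Return every failure hit while listing directories or opening .xml files."""
    errors = []
    for entry in scan_with_errors(root, errors):
        if entry.name.endswith('.xml'):
            try:
                with open(entry.path) as f:
                    f.readline()
            except OSError as exc:
                errors.append((entry.path, len(entry.path), exc))
    return errors
```

Printing `length, path, exc` for each tuple returned by `find_failures('//path/to/folder')` shows whether the failing paths share a common trait such as their length.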

Answer 1

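I realized that my script was finding some very long UNC path names that are too long for Windows, so I now also check the path length before trying to open each file, as follows: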

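import os

# thestrings is loaded from Strings.txt as in the question
for root, dirs, files in os.walk('//path/to/folder'):
  for name in files:
    if name.endswith('.xml'):
      fullpath = os.path.join(root, name)
      if len(fullpath) > 255:  # Too long for Windows! (MAX_PATH is 260 characters)
        print('File-extension-based candidate: ', fullpath)
      elif os.path.isfile(fullpath):
        with open(fullpath) as f:
          data = f.readline()
          if thestrings[0] in data:
            print('Schema-based candidate: ', fullpath)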

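Note that I also decided to check that each path really is a file, and I switched my code to os.walk. I also used .endswith() to simplify the check for the .xml file extension.

Now everything seems to work...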
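Skipping over-long paths means those files are never searched. An alternative, not used in the answer, is the Windows extended-length path prefix `\\?\` (written `\\?\UNC\server\share\...` for UNC paths), which lets the Win32 file APIs accept paths of up to about 32,767 characters. Because that prefix also turns off Windows' own path normalization, the path has to be absolute and already normalized. A minimal sketch of a converter, assuming absolute input paths; `to_extended_length` is a hypothetical helper name:

```python
import ntpath

def to_extended_length(path):
    r"""Return `path` in Windows extended-length form (\\?\... or \\?\UNC\...).

    The \\?\ prefix makes the Win32 APIs skip path normalization, so this
    function normalizes first (forward slashes -> backslashes, '..' resolved)
    and rejects paths that are not absolute with a drive or UNC share.
    """
    if path.startswith('\\\\?\\'):
        return path  # already in extended-length form
    drive, _ = ntpath.splitdrive(path)
    if not drive or not ntpath.isabs(path):
        raise ValueError('extended-length paths must be absolute: %r' % path)
    path = ntpath.normpath(path)
    if path.startswith('\\\\'):
        # UNC path: \\server\share\rest -> \\?\UNC\server\share\rest
        return '\\\\?\\UNC\\' + path[2:]
    # Drive path: C:\rest -> \\?\C:\rest
    return '\\\\?\\' + path
```

On Windows, `open(to_extended_length(fullpath))` or `os.walk(to_extended_length(root))` could then replace the length cutoff. On Windows 10 and later, Python 3.6+ can also open long paths directly once the `LongPathsEnabled` setting is turned on. Also note that `os.walk()` ignores errors from `os.scandir()` by default, so directories it cannot list are skipped silently; passing `onerror=print` makes them visible.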

os.scandir给出[WinError 3]系统找不到指定的路径
