Question

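I'm using Python to match against a list (array), but I'm fairly sure the problem lies in the regular expression itself.

Suppose I have the following: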

foo.html
bar.html
teep.html

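I use the following regular expression: .*(?=.html)

Here .* matches anything, and (?=.html) requires that string to be present without including it in the result.

So I should be left with whatever comes before .html

When I check, it only matches the first item in the array (foo in this case). Why doesn't it match the others?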

import os
import re

my_regex = re.compile('.html$')
r2 = re.compile('.*(?=.html)')
start = '/path/to/folder'
os.chdir(start)
a = os.listdir(start)
for item in a:
    if my_regex.search(item) != None and os.path.isdir(item):
        print 'FOLDER MATCH: '+ item # this is a folder and not a file
        starterPath = os.path.abspath(item)
        outer_file = starterPath + '/index.html'
        outer_js = starterPath + '/outliner.js'
        if r2.match(item) != None:
            filename = r2.match(item).group() # should give me everything before .html
        makePage(outer_file, outer_js, filename) # self defined function
    else:
        print item + ': no'
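
One way to test the assumption that the pattern is at fault is to run it on its own, outside the directory loop. The following is a minimal sketch that uses only the three sample names from above; the `escaped` pattern and the `axhtml` input are added here for illustration and are not part of the original code.

```python
import re

# The pattern from the question, applied to the sample names only.
r2 = re.compile('.*(?=.html)')
names = ['foo.html', 'bar.html', 'teep.html']
stems = [r2.match(name).group() for name in names]
# stems == ['foo', 'bar', 'teep']

# Illustrative variant: escaping the dot so it only matches a literal '.'.
escaped = re.compile(r'.*(?=\.html)')

loose = r2.match('axhtml')        # unescaped '.' accepts the 'x'
strict = escaped.match('axhtml')  # no literal '.html', so no match
```

If the pattern matches every name in isolation like this, the cause is more likely in the surrounding loop than in the regular expression.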

Answer 1

filename = r2.match(item).group()

should be

filename = r2.match(item).groups()  # plural !

According to the documentation, group returns one or more subgroups, whereas groups returns all of them.

Answer 2

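Figured out the problem. Inside my function I changed the directory but never changed it back. So when the function finished and control returned to the for loop, it was looking for the folder names in the wrong location. It was as simple as: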

def makePage(arg1, arg2, arg3):
    os.chdir('path/to/desktop')
    # write file to new location
    os.chdir(start)  # go back to start and continue original search
    return
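
A way to make the "go back" step harder to forget is to wrap the directory change in a context manager. This is only a sketch: `working_directory` and `make_page_sketch` are hypothetical names, and the file written here is a placeholder rather than the real output of makePage.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch to `path`, then restore the previous directory."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Runs even if the body raises, so the caller's directory is restored.
        os.chdir(previous)


def make_page_sketch(target_dir, name):
    """Hypothetical stand-in for makePage: write one file inside target_dir."""
    with working_directory(target_dir):
        with open(name + '.txt', 'w') as handle:
            handle.write('placeholder')


# Usage: the current directory is the same before and after the call.
scratch = tempfile.mkdtemp()
before = os.getcwd()
make_page_sketch(scratch, 'foo')
after = os.getcwd()
```

On Python 3.11 and later, `contextlib.chdir` provides the same behaviour out of the box. Another option is to avoid `os.chdir` altogether and build full paths with `os.path.join`.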

Also, .group() worked for me and returned everything in the folder name before the string .html, whereas .groups() just returned ()
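
The difference can be reproduced in a few lines. In this sketch, `with_capture` is an added pattern for comparison and is not from the original code; the lookahead pattern has the dot escaped but otherwise mirrors the one in the question.

```python
import re

# A lookahead is not a capturing group, so this pattern has no groups at all.
no_capture = re.compile(r'.*(?=\.html)')
m1 = no_capture.match('foo.html')
whole = m1.group()      # the entire match: 'foo'
captured = m1.groups()  # no capturing groups: ()

# Comparison pattern with one explicit capturing group.
with_capture = re.compile(r'(.*)\.html$')
m2 = with_capture.match('foo.html')
whole2 = m2.group()      # the entire match: 'foo.html'
captured2 = m2.groups()  # all captured groups: ('foo',)
first = m2.group(1)      # the first captured group: 'foo'
```

With no argument, `group()` returns the whole match, while `groups()` returns only the parenthesised capture groups, which is why it comes back empty for the pattern in the question.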

The code in the original post stays as it was. Such a simple thing, and it caused all this headache...

Regex only matches the first result