Question

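I have this code that is supposed to open and read two text files, and report a match when an entry exists in both. A match is indicated by printing "SUCCESS" and writing the matching word to the temp.txt file.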

dir = open('listac.txt','r')
path = open('paths.txt','r')
paths = path.readlines()
paths_size = len(paths)
matches = open('temp.txt','w')
dirs = dir.readlines()

for pline in range(0,len(paths)):
        for dline in range(0,len(dirs)):
                p = paths[pline].rstrip('\n').split(".")[0].replace(" ", "")
                dd = dirs[dline].rstrip('\n').replace(" ", "")
                #print p.lower()
                #print dd.lower()
                if (p.lower() == dd.lower()):
                        print "SUCCESS\n"
                        matches.write(str(p).lower() + '\n')

listac.txt is formatted as

/teetetet
/eteasdsa
/asdasdfsa
/asdsafads
.
. 
...etc

paths.txt is formatted as

/asdadasd.php/asdadas/asdad/asd
/adadad.html/asdadals/asdsa/asd
.
.
...etc

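So I use the split function to get the first /asadasda before the dot (in paths.txt). The problem is that the words never seem to match. I even printed out each comparison before each if statement and they look identical. Is there something else Python does before comparing strings?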

=======

Thanks to everyone for the help. As you suggested, I cleaned up the code, and it ended up like this:

dir = open('listac.txt','r')
path = open('paths.txt','r')
#paths = path.readlines()
#paths_size = len(paths)

for line in path:
        p = line.rstrip().split(".")[0].replace(" ", "")
        for lines in dir:
                d = str(lines.rstrip())
                if p == d:
                        print p + " = " + d

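Apparently, assigning p before entering the second for loop is what made the difference. When I assigned both p and d inside the second for loop, it would not work. I don't know the reason, but if anyone does, I'm listening :)

Thanks again!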
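
One thing worth noting about the cleaned-up code, offered as a sketch rather than a confirmed diagnosis: a file object is an iterator, so the inner `for lines in dir` loop consumes it during the first pass of the outer loop and yields nothing afterwards. The example below uses `io.StringIO` as a stand-in for the two files, with invented sample contents and hypothetical function names, and is written in Python 3 syntax (unlike the Python 2 snippets in this thread).

```python
import io

# Stand-ins for paths.txt and listac.txt; the contents are invented for illustration.
paths_text = "/aaa.php/x/y\n/bbb.html/x/y\n"
dirs_text = "/bbb\n/aaa\n"

def nested_file_loop(path_file, dir_file):
    """Mimic the nested loop that iterates the inner file object directly."""
    found = []
    for line in path_file:
        p = line.rstrip().split(".")[0].replace(" ", "")
        for inner in dir_file:  # consumed during the first outer pass
            if p == inner.rstrip():
                found.append(p)
    return found

def nested_list_loop(path_file, dir_file):
    """Same comparison, but the inner lines are read into a list first."""
    dirs = [inner.rstrip() for inner in dir_file]
    found = []
    for line in path_file:
        p = line.rstrip().split(".")[0].replace(" ", "")
        for d in dirs:  # a list can be iterated any number of times
            if p == d:
                found.append(p)
    return found

exhausted = nested_file_loop(io.StringIO(paths_text), io.StringIO(dirs_text))
complete = nested_list_loop(io.StringIO(paths_text), io.StringIO(dirs_text))
print(exhausted)  # only the first path line was ever compared
print(complete)   # every path line was compared
```

For a real file, calling `seek(0)` on the inner file before each inner pass would also work, at the cost of re-reading it every time.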

Answer 1

Since we are reading whole data files into memory anyway, why not try using sets and take the intersection?

def format_data(x):
    return x.rstrip().replace(' ','').split('.')[0].lower()

with open('listac.txt') as dirFile:
    dirStuff = set( format_data(dline) for dline in dirFile )

with open('paths.txt') as pathFile:
    intersection = dirStuff.intersection( format_data(pline) for pline in pathFile )

# matches is the output file from the question
with open('temp.txt', 'w') as matches:
    for elem in intersection:
        print "SUCCESS\n"
        matches.write(str(elem)+"\n")

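I used the same format_data function for both datasets since they look roughly the same, but you can use more than one function if you like. Also note that this solution only reads one of the two files into memory. The intersection with the other should be computed lazily.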
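
To make the set approach concrete without touching any files, here is a minimal self-contained sketch. It assumes invented sample data held in `io.StringIO` objects in place of listac.txt and paths.txt, and it uses Python 3 syntax rather than the Python 2 `print` statement used elsewhere in this thread.

```python
import io

def format_data(line):
    """Normalise a line: strip trailing whitespace, drop spaces,
    cut at the first dot, and lowercase the result."""
    return line.rstrip().replace(" ", "").split(".")[0].lower()

# Invented sample data standing in for listac.txt and paths.txt.
dir_file = io.StringIO("/Alpha\n/beta\n/gamma\n")
path_file = io.StringIO("/alpha.php/a/b\n/delta.html/c/d\n/GAMMA.php/e\n")

# Only the directory names are held in memory as a set.
dir_names = set(format_data(line) for line in dir_file)

# The generator is consumed line by line while the intersection is built,
# so the second input is never stored as a full list.
common = dir_names.intersection(format_data(line) for line in path_file)

print(sorted(common))
```

Because both inputs pass through the same normalisation, `/Alpha` and `/GAMMA.php/e` end up comparable despite differing in case and suffix.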

As pointed out in the comments, this makes no attempt to preserve order. However, if you do need to preserve order, try the following:

<snip>
...
</snip>

with open('paths.txt') as pathFile:
    for line in pathFile:
        # format_data is the helper defined above
        if format_data(line) in dirStuff:
            print "SUCCESS\n"
            #...
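
The order-preserving variant can be sketched as a small function. The name `ordered_matches` and the sample data are hypothetical, introduced only for illustration, and the code is Python 3. Note that, unlike the set intersection, this keeps duplicates whenever a prefix occurs on more than one path line.

```python
import io

def format_data(line):
    """Strip trailing whitespace, drop spaces, cut at the first dot, lowercase."""
    return line.rstrip().replace(" ", "").split(".")[0].lower()

def ordered_matches(path_lines, dir_lines):
    """Return normalised path prefixes that occur in dir_lines, in path order."""
    dir_names = set(format_data(line) for line in dir_lines)
    # Set membership tests are fast, and the list keeps the input order.
    return [name for name in map(format_data, path_lines) if name in dir_names]

# Invented sample data standing in for listac.txt and paths.txt.
dir_file = io.StringIO("/alpha\n/beta\n/gamma\n")
path_file = io.StringIO("/gamma.php/a\n/zeta.html/b\n/alpha.php/c\n/gamma.html/d\n")

result = ordered_matches(path_file, dir_file)
print(result)
```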

Answer 2

I would have to see more of your dataset to understand why you aren't getting matches. I have refactored some of your code to be more pythonic.

dirFile = open('listac.txt','r')
pathFile = open('paths.txt','r')
paths = pathFile.readlines()
dirs = dirFile.readlines()

matches = open('temp.txt','w')

for pline in paths:
    p = pline.rstrip('\n').split(".")[0].replace(" ", "")
    for dline in dirs:
        dd = dline.rstrip('\n').replace(" ", "")
        #print p.lower()
        #print dd.lower()
        if p.lower() == dd.lower():
            print "SUCCESS\n"
            matches.write(str(p).lower() + '\n')
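
When two strings print identically but compare unequal, one way to investigate is to print their `repr()`, which shows invisible characters. The sketch below assumes, purely as an illustration, that one file has Windows-style `\r\n` line endings. That is a guess about the data, not something this thread confirms. It is written in Python 3.

```python
# Hypothetical lines: one ends in a Windows-style "\r\n", the other in "\n".
path_line = "/asdadasd.php/asdadas/asdad/asd\n"
dir_line = "/asdadasd\r\n"

p = path_line.rstrip("\n").split(".")[0].replace(" ", "")
dd = dir_line.rstrip("\n").replace(" ", "")

# A plain print can make both values look the same on screen;
# repr() exposes the stray "\r" left behind by rstrip("\n").
print(repr(p), repr(dd))
print(p == dd)

# rstrip() with no argument removes all trailing whitespace, including "\r".
dd_clean = dir_line.rstrip().replace(" ", "")
print(p == dd_clean)
```

If the `repr()` output shows no difference on the real data, the cause lies elsewhere and more of the dataset would be needed to find it.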

Can't compare strings in Python
