Question

I am iterating over an XML document and matching usernames from a txt file.

The txt file looks like this:

DPL bot
Nick Number
White whirlwind
Polisci
Flannel

The program looks like this:

import xmltodict, json

with open('testarticles.xml', encoding='latin-1') as xml_file:
    dic_xml = xmltodict.parse(xml_file.read())
    for page in dic_xml['mediawiki']['page']:
        for rev in  page['revision']:
            for user in open("usernames.txt", "r"):
                print(user)

                if 'username' in rev['contributor'] and rev['contributor']['username'] == user:
                    print(user)
                    print(rev['timestamp'])
                    timestamp = rev['timestamp'];

                    try:
                        print(rev['comment'])
                        comment = rev['comment'];
                    except:
                        print("no comment")
                        comment = ''

                    print('\n')
                    with open("User data/" + user + ".json", "a") as outfile:
                        json.dump({"timestamp": timestamp, "comment": comment}, outfile)
                        outfile.write('\n')

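The problem is that only the last line of the text file ever passes the if statement. The program prints every username before the if statement. Every user has matching revisions in the XML file, and if I move a different user to the last line, that user's data is extracted to the JSON file.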

Answer 1

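Maybe every line except the last one has a newline character at the end. Iterating over a file yields each line with its trailing "\n" still attached, so "DPL bot\n" never equals "DPL bot"; only the final line, which has no newline, can match.

Try this: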

for user in open("usernames.txt", "r"):
    user = user.strip()
    if 'username' in rev['contributor'] and rev...

Or use this construct, so we don't have to discuss whether your bare open() call behaves the way a with statement would :P

with open("usernames.txt", "r") as f:
    for line in f:
        user = line.strip()
        if 'username' in rev['contributor'] and rev...

The important part is user = user.strip() or user = line.strip().
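To make the fix concrete, here is a minimal sketch that strips each line once and keeps the names in a set, so the file does not need to be reopened for every revision. The helper name `load_usernames`, the in-memory sample file, and the sample `rev` dictionary are assumptions made for illustration; they are not part of the original program.

```python
import io


def load_usernames(lines):
    """Return a set of usernames with surrounding whitespace removed."""
    return {line.strip() for line in lines if line.strip()}


# Hypothetical stand-in for open("usernames.txt"); note the last line has no newline.
sample_file = io.StringIO("DPL bot\nNick Number\nFlannel")
usernames = load_usernames(sample_file)

# Hypothetical revision shaped like the xmltodict output used in the question.
rev = {
    "contributor": {"username": "Nick Number"},
    "timestamp": "2017-01-01T00:00:00Z",
}

# .get() returns None when the contributor has no 'username' key,
# which covers the "'username' in rev['contributor']" check.
matched = rev["contributor"].get("username") in usernames
print(matched)
```

Note that strip() removes all leading and trailing whitespace, not just the newline. If a username could legitimately end in a space, line.rstrip("\n") would be the narrower choice.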

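When in doubt, look at the raw bytes. This applies to all encoding problems too, because an encoding is just a way of turning ones and zeros into characters according to some translation table or code page.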

"\n".encode().hex() == "0a"  # True in Python 3 (Python 2 spelled this "\n".encode("hex"))
# so if
user.encode().hex()
# ends with "0a", there is definitely a newline after "user"
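The same check can be wrapped up as a small Python 3 sketch that prints each line's repr() next to its hex dump. The helper `ends_with_newline` and the `raw_lines` sample are hypothetical names introduced here, and UTF-8 is assumed as the encoding.

```python
def ends_with_newline(text):
    """Report whether the UTF-8 encoding of text ends with the byte 0x0a."""
    return text.encode("utf-8").hex().endswith("0a")


# Hypothetical lines as they would come out of iterating over a file.
raw_lines = ["DPL bot\n", "Flannel"]

for line in raw_lines:
    # repr() makes the trailing "\n" visible; the hex dump shows the raw bytes.
    print(repr(line), line.encode("utf-8").hex(), ends_with_newline(line))
```

In everyday debugging, print(repr(user)) alone is usually enough to reveal a stray newline; the hex dump is more useful when the suspect is an invisible or mis-encoded character.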
