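I am iterating over an XML document and matching the usernames it contains against the usernames listed in a txt file.
The txt file looks like this: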
DPL bot
Nick Number
White whirlwind
Polisci
Flannel
The program looks like this:
import xmltodict, json

with open('testarticles.xml', encoding='latin-1') as xml_file:
    dic_xml = xmltodict.parse(xml_file.read())

for page in dic_xml['mediawiki']['page']:
    for rev in page['revision']:
        for user in open("usernames.txt", "r"):
            print(user)
            if 'username' in rev['contributor'] and rev['contributor']['username'] == user:
                print(user)
                print(rev['timestamp'])
                timestamp = rev['timestamp']
                try:
                    print(rev['comment'])
                    comment = rev['comment']
                except:
                    print("no comment")
                    comment = ''
                print('\n')
                with open("User data/" + user + ".json", "a") as outfile:
                    json.dump({"timestamp": timestamp, "comment": comment}, outfile)
                    outfile.write('\n')
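The problem is that only the last line of the text file ever passes the if statement. The print before the if statement does show every user's name. Every user has matching revisions in the XML file, and if I move a different user to the last line, that user's data is the one extracted into the json file.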
Answer 0 (score: 1)
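Probably every line except the last one has a newline character at the end, so the comparison with the username from the XML fails for all of them.
Try this: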
for user in open("usernames.txt", "r"):
    user = user.strip()
    if 'username' in rev['contributor'] and rev...
Or use this structure, so that we don't have to discuss whether your code behaves the same as it would with a with statement :P
with open("usernames.txt", "r") as f:
    for line in f:
        user = line.strip()
        if 'username' in rev['contributor'] and rev...
The important part is user = user.strip()
or user = line.strip()
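To see why only the last name matched, here is a minimal sketch that simulates the text file with io.StringIO instead of reading usernames.txt. The helper matching_lines and the sample data are made up for illustration; they are not part of the original program.

```python
import io

def matching_lines(text, target, strip):
    """Return the lines of `text` equal to `target`, optionally stripped first."""
    matches = []
    for line in io.StringIO(text):
        # Iterating a file-like object keeps the trailing "\n" on each line.
        candidate = line.strip() if strip else line
        if candidate == target:
            matches.append(candidate)
    return matches

# Hypothetical file content: the last line has no trailing newline.
sample = "DPL bot\nNick Number\nFlannel"

raw_middle = matching_lines(sample, "Nick Number", strip=False)
raw_last = matching_lines(sample, "Flannel", strip=False)
stripped_middle = matching_lines(sample, "Nick Number", strip=True)

print(raw_middle)       # the raw line is "Nick Number\n", so nothing matches
print(raw_last)         # the last line has no newline, so it matches
print(stripped_middle)  # after strip() the middle line matches too
```

Without strip(), only the final line compares equal, which is the behaviour described in the question.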
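When in doubt, look at the raw bytes. This applies to all encoding problems as well, because an encoding is just a way of translating ones and zeros into characters according to some translation table/code page.
# Python 3 (in Python 2 this was written "\n".encode("hex"))
"\n".encode().hex() == "0a"  # True
# so if
user.encode().hex()
# has "0a" at the end, there is definitely a newline after "user"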
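Putting the strip() fix together with the loop from the question, one possible arrangement is to read the usernames once into a set instead of reopening the file for every revision. This is only a sketch under assumptions: load_usernames and revisions_by_known_users are hypothetical helper names, and the pages list below is hand-written sample data shaped like dic_xml['mediawiki']['page'], not real xmltodict output.

```python
import io

def load_usernames(lines):
    """Return the set of non-empty usernames with surrounding whitespace removed."""
    return {line.strip() for line in lines if line.strip()}

def revisions_by_known_users(pages, usernames):
    """Yield (username, timestamp, comment) for revisions made by a listed user."""
    for page in pages:
        for rev in page['revision']:
            # Anonymous edits have no 'username' key; .get() returns None then.
            user = rev['contributor'].get('username')
            if user in usernames:
                yield user, rev['timestamp'], rev.get('comment', '')

# Stand-in for open("usernames.txt"); any iterable of lines works.
usernames = load_usernames(io.StringIO("DPL bot\nNick Number\nFlannel\n"))

# Hand-written sample data, assumed to mirror the parsed XML structure.
pages = [{'revision': [
    {'contributor': {'username': 'Nick Number'},
     'timestamp': '2017-01-01T00:00:00Z', 'comment': 'fix link'},
    {'contributor': {'ip': '127.0.0.1'},
     'timestamp': '2017-01-02T00:00:00Z'},
]}]

results = list(revisions_by_known_users(pages, usernames))
print(results)
```

In the real program, usernames would come from open("usernames.txt") and pages from the parsed XML; each yielded tuple could then be written to the per-user json file as in the question.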