Question

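I have two text files of whitespace-delimited email addresses, newalias.txt and origalias.txt. These are essentially email alias mappings that I want to merge, but there are duplicates in the first index (the first column). Where the first index matches, I want to keep the line from newalias.txt and drop the duplicate from origalias.txt. Exact duplicates should also be removed.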

OrigAlias:

    sam@example.com sam.smith@example.root.org
    jane@example.com jane.maiden@example.root.org
    bob@example.com robert.johnson@example.root.org

NewAlias:

    sam@example.com samuel.smith@example.root.org
    jane@example.com jane.married@example.root.org
    bob@example.com robert.johnson@example.root.org

Results:

    sam@example.com samuel.smith@example.root.org
    jane@example.com jane.married@example.root.org
    bob@example.com robert.johnson@example.root.org

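I've been learning Python recently and have done a few interesting things with it, but text parsing is still a challenge for me. Any help would be greatly appreciated, even just a pointer in the right direction. I'm still getting familiar with the options available in Python.

Edit:

I didn't expect such a quick response, so I kept working on the problem myself and after a while came up with this: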

# Py 3.4.1
# Instructions:
# Rename current domain mapping export to dmapsOrig.txt
# Rename whitespace delimited customer modifications file to dmapsNew.txt
# Place the two text files and this script in the same directory
# Run the script: 'python dmapsMerge.py'

from datetime import date

OrigDict = {}       # Create empty dictionaries for processing
NewAddDict = {}     #
ResultsDict = {}    #

with open('dmapsOrig.txt', 'r') as file1:       # Populate OrigDict dictionary from dmapsOrig.txt file
    for x in file1:
        dmaps = x.split()
        if len(dmaps) >= 2 and not x.startswith("#"):   # Ignore commented, blank, and incomplete lines
            OrigDict[dmaps[0]] = dmaps[1]

with open('dmapsNew.txt', 'r') as file2:        # Populate NewAddDict dictionary from dmapsNew.txt file
    for y in file2:
        newdmaps = y.split()
        if len(newdmaps) >= 2 and not y.startswith("#"):    # Ignore commented, blank, and incomplete lines
            NewAddDict[newdmaps[0]] = newdmaps[1]

with open('dmapsOrig-formatted-%s.txt' % date.today(), 'wt') as file3:
    file3.write('## Generated on %s' % date.today() + '\n') # Insert date stamp
    for alias in sorted(OrigDict.keys()):
        file3.write(alias + ' ' + OrigDict[alias] + '\n')   # Format original input and write to sorted file

ResultsDict = OrigDict.copy()   # Copy OrigDict dictionary keys and values to ResultsDict Dictionary
ResultsDict.update(NewAddDict)  # Merge new dmaps into original

with open('dmapsResults-%s.txt' % date.today(), 'wt') as file4:
    file4.write('## Generated on %s' % date.today() + '\n')     # Insert date stamp
    for alias in sorted(ResultsDict.keys()):
        file4.write(alias + ' ' + ResultsDict[alias] + '\n')    # Format dictionary output and write to results.txt file

# No explicit close() calls are needed: each 'with' block closes its file on exit

Answer 1

with open('origalias.txt') as forig, open('newalias.txt') as fnew, open('results.txt', 'w') as fresult:
    dd = {}
    for fn in (forig, fnew): # first pass loads the original, second pass overwrites with new
        for ln in fn:
            if not ln.strip():
                continue  # skip blank lines
            alias, address = ln.split()  # split() also drops the trailing newline
            dd[alias] = address

    # write out every entry in the dictionary
    for alias, address in dd.items():  # items() works on Python 3; iteritems() is Python 2 only
        fresult.write('%s %s\n' % (alias, address))

Answer 2

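Assuming your files are not too large, the simplest solution is to load origalias.txt into memory, then load newalias.txt (updating existing entries where necessary), and dump the merged data.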

aliases = {}
with open("origalias.txt") as f:
    for line in f:
        key, val = line.strip().split(" ")
        aliases[key] = val
with open("newalias.txt") as f:
    for line in f:
        key, val = line.strip().split(" ")
        aliases[key] = val
with open("mergedalias.txt", "w") as f:
    for key, val in aliases.items():
        f.write("{} {}\n".format(key, val))

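A few key points about the code above:

Using the dict aliases prevents duplicates, because setting a new value for a key replaces the old one.
Files are iterable (i.e. they can be used with for), and each iteration yields one line, which is convenient in your case.
.strip() removes leading and trailing whitespace; .split(" ") then cuts the string on the space, and the two parts are assigned to key and val respectively.
Note that if a line contains fewer or more than two space-separated parts, the assignment to key, val will raise an exception. Consider using .split(" ", 1) for more forgiving behavior: it splits only on the first space, so extra spaces no longer raise, although a line with no space at all still will.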
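
To illustrate the last two points, here is a minimal sketch of a more forgiving line parser. The helper name parse_alias_line is hypothetical (it is not part of the code above), and silently skipping blank or incomplete lines is an assumption about the desired behavior rather than something the question specifies.

```python
def parse_alias_line(line):
    """Return (alias, address) for one line, or None if it cannot be parsed.

    Hypothetical helper: blank lines and lines with fewer than two
    whitespace-separated fields are skipped instead of raising.
    """
    # split(None, 1) splits on any run of whitespace, at most once,
    # so repeated spaces or tabs between the two fields are tolerated.
    parts = line.strip().split(None, 1)
    if len(parts) != 2:
        return None
    alias, address = parts
    return alias, address.strip()


print(parse_alias_line("sam@example.com sam.smith@example.root.org\n"))
print(parse_alias_line("\n"))                # blank line
print(parse_alias_line("sam@example.com"))   # missing second field
```

A caller would then test the result for None before storing it in the dict, which keeps the merge loop free of try/except blocks.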

Hope this helps.

Answer 3

# construct a dictionary from the orig file (split() also strips the trailing newline)
with open('origalias.txt') as forig:
    original_dict = dict(line.split() for line in forig if line.strip())
# build a dictionary from the new file and update the original with it (new values overwrite old ones for the same key)
with open('newalias.txt') as fnew:
    original_dict.update(dict(line.split() for line in fnew if line.strip()))

# now write to a new file if you want
with open('newfile', 'w') as fp:
    for key, value in original_dict.items():
        fp.write('%s %s\n' % (key, value))
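
As a quick check of the overwrite behavior, the sketch below runs the same dict-based merge in memory on the sample data from the question. The function name merge_aliases is hypothetical, and using lists of strings in place of the two files is an assumption made only to keep the example self-contained.

```python
def merge_aliases(orig_lines, new_lines):
    """Merge alias lines; entries from new_lines win when the alias matches."""
    merged = {}
    for lines in (orig_lines, new_lines):   # original first, then new overwrites
        for line in lines:
            fields = line.split()
            if len(fields) == 2:            # skip blank or malformed lines
                alias, address = fields
                merged[alias] = address
    return merged


# Sample data copied from the question
orig = [
    "sam@example.com sam.smith@example.root.org",
    "jane@example.com jane.maiden@example.root.org",
    "bob@example.com robert.johnson@example.root.org",
]
new = [
    "sam@example.com samuel.smith@example.root.org",
    "jane@example.com jane.married@example.root.org",
    "bob@example.com robert.johnson@example.root.org",
]

for alias, address in sorted(merge_aliases(orig, new).items()):
    print(alias, address)
```

Because a dict holds one value per key, the exact duplicate for bob@example.com collapses into a single entry without any extra handling.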

How can I merge two or more text files and remove duplicate email addresses using Python?

3 answers: