Python: removing duplicates from an m3u playlist

Time: 2017-01-19 06:50:38

Tags: python-2.7

I have an m3u playlist like the following (infilename):

#EXTM3U

#EXTINF:0, name 1
link 1
#EXTINF:0, name 2
link 2
#EXTINF:0, name 3
link 1
#EXTINF:0, name 4
link 4
#EXTINF:0, name 5
link 1
#EXTINF:0, name 6
link 6
.......
.......

This is the output I would like (outfilename):

#EXTM3U

#EXTINF:0, name 1
link 1
#EXTINF:0, name 2
link 2
#EXTINF:0, name 4
link 4
#EXTINF:0, name 6
link 6
.......
.......

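Each item consists of two lines: one line for the name and one line for the link. Items that share the same link count as duplicates, even if their names differ. I used set() to remove these duplicates, but it only removes the repeated links and leaves their name lines in place. How can I remove the whole duplicate item?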

Here is the code I used (found on the internet):

infilename = "path/to/infilename"    # placeholder: path to the input playlist
outfilename = "path/to/outfilename"  # placeholder: path to the output playlist

lines_seen = set() # holds lines already seen
outfile = open(outfilename, "w")
for line in open(infilename, "r"):
    if line not in lines_seen: # not a duplicate
        outfile.write(line)
        lines_seen.add(line)
outfile.close()
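
To reproduce the behaviour, the same loop can be run on an in-memory copy of the first few items of the sample playlist. This is only a sketch: `naive_dedupe` and `sample` are names made up for the illustration, and the lines are kept in a list instead of being read from a file.

```python
def naive_dedupe(lines):
    """Line-by-line de-duplication, as in the loop above (hypothetical helper)."""
    lines_seen = set()  # holds lines already seen
    kept = []
    for line in lines:
        if line not in lines_seen:  # not a duplicate
            kept.append(line)
            lines_seen.add(line)
    return kept


# First three items of the sample playlist; "link 1" appears twice
sample = [
    "#EXTM3U\n",
    "#EXTINF:0, name 1\n",
    "link 1\n",
    "#EXTINF:0, name 2\n",
    "link 2\n",
    "#EXTINF:0, name 3\n",
    "link 1\n",
]

result = naive_dedupe(sample)
# The second "link 1" is dropped, but the name line for "name 3" is kept,
# because every #EXTINF line is different from all the others.
print("".join(result))
```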

Thanks in advance for your help.

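1 answer:

Answer 0: (score: 0)

The problem with your solution is this: the name line is always written, whether or not the link is a duplicate, because each name line comes before its link and is itself unique, so it is never found in the set. One way to fix this (perhaps not the most elegant, but it works):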

links_seen = set()  # holds links already written

with open(infilename, "r") as infile, open(outfilename, "w") as outfile:
    info = None  # name line waiting for its link, or None

    for line in infile:
        stripped = line.strip()

        # Copy the playlist header through, as in the desired output
        if stripped.startswith("#EXTM3U"):
            outfile.write(line)
            continue

        # Remember the name line until we have seen its link
        if stripped.startswith("#EXTINF"):
            info = line
            continue

        # A blank line ends any pending item, so a stray name line
        # is never paired with a later link
        if stripped == "":
            info = None
            continue

        # Any other line that follows a name line is that item's link
        if info is not None:
            # Compare without the line ending, so a last line that lacks
            # a trailing newline is still recognised as a duplicate
            if stripped not in links_seen:  # not a duplicate
                outfile.write(info)
                outfile.write(line)
                links_seen.add(stripped)
            info = None  # the item is finished either way
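
The same idea can also be written as a small function that works on a list of lines, which makes it easy to try out without touching any files. This is a sketch under assumptions, not a drop-in replacement: `dedupe_m3u` is a hypothetical name, it assumes every item is exactly one `#EXTINF` line followed by one link line, and it copies the `#EXTM3U` header through because the desired output in the question contains it.

```python
def dedupe_m3u(lines):
    """Return playlist lines with repeated links (and their names) removed.

    Hypothetical helper: assumes each item is one #EXTINF line followed
    by one link line. Line endings and blank lines are not preserved.
    """
    seen_links = set()
    kept = []
    pending_info = None  # name line waiting for its link

    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        if stripped.startswith("#EXTINF"):
            pending_info = stripped
        elif stripped.startswith("#"):
            kept.append(stripped)  # header or other directive
        elif pending_info is not None:
            if stripped not in seen_links:  # first time this link appears
                kept.append(pending_info)
                kept.append(stripped)
                seen_links.add(stripped)
            pending_info = None  # the item is finished either way
    return kept


playlist = [
    "#EXTM3U", "",
    "#EXTINF:0, name 1", "link 1",
    "#EXTINF:0, name 2", "link 2",
    "#EXTINF:0, name 3", "link 1",
    "#EXTINF:0, name 4", "link 4",
]

cleaned = dedupe_m3u(playlist)
print("\n".join(cleaned))
```

To use it on real files, the input could be read into a list with `open(infilename).readlines()` and the returned lines written to `outfilename` joined by newlines.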

Hope this works for you.