I have an m3u playlist like the following (infilename):
#EXTM3U
#EXTINF:0, name 1
link 1
#EXTINF:0, name 2
link 2
#EXTINF:0, name 3
link 1
#EXTINF:0, name 4
link 4
#EXTINF:0, name 5
link 1
#EXTINF:0, name 6
link 6
.......
.......
This is the output I want (outfilename):
#EXTM3U
#EXTINF:0, name 1
link 1
#EXTINF:0, name 2
link 2
#EXTINF:0, name 4
link 4
#EXTINF:0, name 6
link 6
.......
.......
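Each entry consists of two lines: one for the name and one for the link. Entries that share the same link count as duplicates, even if their names differ. I used set() to remove these duplicates, but it only removes the duplicate link lines and leaves their name lines in place. How can I remove the entire duplicate entry?
This is the code I used (found on the internet):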
infilename = "path to infilename"    # placeholder: replace with the real path
outfilename = "path to outfilename"  # placeholder: replace with the real path
lines_seen = set()  # holds lines already seen
outfile = open(outfilename, "w")
for line in open(infilename, "r"):
    if line not in lines_seen:  # not a duplicate
        outfile.write(line)
        lines_seen.add(line)
outfile.close()
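Thanks in advance for your help.
Answer 0 (score: 0)
The problem with your solution is this: the name line is always written, whether or not the link is a duplicate, because the name line comes before its link in the file and every name is unique, so it is never found in the set. Here is one way to solve it (perhaps not the most elegant, but it works):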
lines_seen = set()  # holds links already seen
# Flag to keep track of whether the next line is supposed to be a link
nextLineLink = False
with open(infilename, "r") as infile, open(outfilename, "w") as outfile:
    for line in infile:
        # Check if the current line is a name line
        if line.startswith("#EXTINF"):
            info = line
            nextLineLink = True
            continue
        # If we encounter an empty line, reset the flag;
        # otherwise we would write the last info line too
        if line.strip() == "":
            nextLineLink = False
            continue
        # Check if this line is supposed to be a link
        if nextLineLink:
            # Compare without the trailing newline so that a final line
            # lacking one is still recognized as a duplicate
            link = line.strip()
            if link not in lines_seen:  # not a duplicate
                # Write both lines and remember the link
                outfile.write(info)
                outfile.write(line)
                lines_seen.add(link)
            # Reset the flag
            nextLineLink = False
        else:
            # Any other line (such as the #EXTM3U header) is copied as-is
            outfile.write(line)
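The same idea can also be written as a small function that works on a list of lines, which makes it easier to try without touching any files. This is only a sketch: the name dedupe_m3u is hypothetical, and it assumes that every link line directly follows its #EXTINF line, that any other line starting with "#" (such as #EXTM3U) should be kept, and that blank lines can be dropped.

```python
def dedupe_m3u(lines):
    """Return playlist lines, keeping only the first entry for each link.

    Hypothetical helper: assumes each link line directly follows its
    #EXTINF line, and that blank lines can be dropped.
    """
    seen_links = set()   # links already written to the result
    result = []
    pending_info = None  # the #EXTINF line waiting for its link

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#EXTINF"):
            pending_info = line
        elif line.startswith("#"):
            result.append(line)  # header or other directive, e.g. #EXTM3U
        else:
            # This is a link line: keep the whole entry only the first time
            if line not in seen_links:
                seen_links.add(line)
                if pending_info is not None:
                    result.append(pending_info)
                result.append(line)
            pending_info = None
    return result


sample = [
    "#EXTM3U",
    "#EXTINF:0, name 1", "link 1",
    "#EXTINF:0, name 2", "link 2",
    "#EXTINF:0, name 3", "link 1",
    "#EXTINF:0, name 4", "link 4",
]
print("\n".join(dedupe_m3u(sample)))
```

To use it on real files, you could read the input with open(infilename) and pass the file object to dedupe_m3u, then write the returned lines to outfilename joined with "\n".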
Hope this works for you.