How can I remove duplicate lines from a file using Python?

Date: 2014-08-07 15:39:56

Tags: python python-2.7

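Is it possible to remove duplicate lines from a file that is opened in read mode, without changing the file name?

Can anyone please help me?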

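2 answers:

Answer 0 (score: 1)

A file opened in read mode cannot be appended to or edited. Read its contents first, then reopen the same file in write mode and write the lines back (opening with "w" truncates the file, so the contents must already be in memory). See this earlier post: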

How might I remove duplicate lines from a file?
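A minimal sketch of that read-then-rewrite approach, assuming the whole file fits in memory and that only exact duplicate lines should be removed (the first occurrence of each line is kept, in order). The function name `remove_duplicate_lines` is hypothetical. Because the same path is reopened, the file name does not change. The code runs under both Python 2.7 and Python 3.

```python
def remove_duplicate_lines(path):
    """Rewrite the file at `path`, keeping only the first occurrence of each line.

    Returns the number of duplicate lines that were removed.
    """
    # First pass: read every line while the file is open in read mode.
    with open(path, "r") as f:
        lines = f.readlines()

    seen = set()
    unique = []
    for line in lines:
        # Compare without the trailing newline, so a last line that lacks
        # "\n" still counts as a duplicate of an earlier copy.
        key = line.rstrip("\n")
        if key not in seen:
            seen.add(key)
            unique.append(line)

    # Second pass: reopen the same path in write mode ("w" truncates it first).
    with open(path, "w") as f:
        f.writelines(unique)

    return len(lines) - len(unique)
```

For example, `remove_duplicate_lines("example.txt")` rewrites example.txt in place. A set is used for the "already seen" check so each lookup is constant time on average, while the `unique` list preserves the original line order.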

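Answer 1 (score: 0)

I did something similar before. The code below replaces every instance of an old string with a new string (in any file matching a given path). It could be adapted to look for duplicate lines instead of a specific string passed on the command line.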

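Here are a couple of example uses of the existing program (sample command-line invocations):

Example 1 - if a file named example.txt is in the current working directory: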

python replace.py "old blerb" "new blerb" example.txt

Example 2 - to match any .sql file in another directory (here, the current user's Documents directory):

python replace.py "old syntax" "new syntax" "%userprofile%\documents\*.sql"

replace.py code:

#!/usr/bin/python
# Runs under both Python 2.7 and Python 3.
from __future__ import print_function

import fnmatch
import os
import sys

try:
    input = raw_input  # Python 2: raw_input returns the reply without eval'ing it
except NameError:
    pass  # Python 3: input() already behaves like raw_input()

# globals
debug_on = True

def replace(old_str, new_str, file_path):
    """Replace every occurrence of old_str with new_str in file_path, in place."""
    with open(file_path, "r+") as f:
        buf = f.readlines()
        cnt = 0  # number of lines that contained old_str
        new = []
        for line in buf:
            if old_str in line:
                line = line.replace(old_str, new_str)
                cnt += 1
            new.append(line)
        if cnt == 0:
            if debug_on:
                print("  no matches found in this file")
        else:
            # Rewind, discard the old contents, and write the new lines.
            f.seek(0)
            f.truncate()
            f.writelines(new)
            if debug_on:
                print("  " + str(cnt) + " line(s) changed")
    # Leaving the with-block closes the file; no explicit f.close() is needed.

def get_files(f_str):
    """Return the files in f_str's directory whose names match its wildcard part."""
    d, ptrn = os.path.split(f_str)
    files = []
    for name in os.listdir(d):
        path = os.path.join(d, name)
        # os.listdir returns bare names; skip subdirectories that happen to match
        if fnmatch.fnmatch(name, ptrn) and os.path.isfile(path):
            files.append(path)

    if not files:
        print("No files found in this directory matching the pattern:", ptrn)
        sys.exit(1)

    return files

def main():
    # check that exactly three command-line arguments were provided...
    if len(sys.argv) - 1 != 3:
        print("\nUsage: python replace.py <old_str> <new_str> <file_path|pattern>\n")
        print("The file path will assume the current working directory if none "
              "is given.")
        print("Search is case-sensitive\n")
        print("Example 1 - if a file named example.txt is in your cwd:\n"
              'python replace.py "old blerb" "new blerb" example.txt\n')
        print("Example 2 - if you wanted to match any .sql files in another directory:\n"
              r'python replace.py "old syntax" "new syntax" "%userprofile%\documents\*.sql"')
        input("\n...press Enter to exit")
        sys.exit(1)

    old_str, new_str, f_str = sys.argv[1:4]

    # identify files to be evaluated...
    if debug_on:
        print("f_str set to:", f_str)

    # prepend the current working directory if no directory was given
    if '\\' not in f_str and '/' not in f_str:
        f_str = os.path.join(os.getcwd(), f_str)
        if debug_on:
            print("f_str adjusted to:", f_str)

    # build list of files: expand wildcards, otherwise use the single path
    if '*' in f_str or '?' in f_str:
        files = get_files(f_str)
    else:
        files = [f_str]

    # do replacement for each file...
    for f in files:
        if debug_on:
            print("\nAbout to call replace, args:\n  ", old_str, new_str, f)
        replace(old_str, new_str, f)

if __name__ == '__main__':
    main()
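As the answer suggests, the same pattern can be adapted to duplicate lines. Below is a minimal sketch of such an adaptation, assuming exact-match duplicates and a file small enough to read into memory; `dedupe` is a hypothetical name and is not part of replace.py. Like `replace()`, it opens the file once in "r+" mode, reads all lines, then rewinds, truncates and writes the result back, so the file keeps its name. It runs under both Python 2.7 and Python 3.

```python
from __future__ import print_function


def dedupe(file_path, verbose=True):
    """Remove repeated lines from file_path in place, keeping first occurrences.

    Returns the number of duplicate lines dropped.
    """
    with open(file_path, "r+") as f:
        buf = f.readlines()
        seen = set()
        new = []
        for line in buf:
            key = line.rstrip("\n")  # tolerate a missing newline on the last line
            if key not in seen:
                seen.add(key)
                new.append(line)
        cnt = len(buf) - len(new)  # number of duplicate lines dropped
        if cnt:
            # Rewind, discard the old contents, and write back only unique lines.
            f.seek(0)
            f.truncate()
            f.writelines(new)
        if verbose:
            print("  %d duplicate line(s) removed" % cnt)
    return cnt
```

To drive it from replace.py, one could call `dedupe(f)` in the loop at the end of `main()` instead of `replace(...)`; the argument check would then need to expect only the file path or pattern, since no old or new string is involved.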