Question

I'm writing a Python script to find duplicate entries in a CSV list of call numbers and titles. Here is the format of the CSV file:

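920.105,George Mueller
920.105,George Mueller
920.105,George Mueller
327.373,The Letters to the Galatians and Ephesians
327.371,Galatians and Ephesians
289,The Modern Tongues Movement
288.01,The Seduction of Christianity
288.003,Understanding Cults and New Religions
288.002,Understanding Cults and New Religions
286.061,"History of the Baptists, A"
286.044,"History of the Baptists, A"
286.003,This Day in Baptist History 3
286.003,This Day in Baptist History 3
286.003,This Day in Baptist History 3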

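What I need to do is find all duplicate call numbers that have different titles. I don't care about most of the entries, because they are copies of the same book. I'm looking for different books that were given the same call number. My script runs to completion without errors, but when I open the file it creates, the file is empty.
Here is my code: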

#!/usr/bin/python3

import csv


def readerObject(csvFileName):
    """
    Opens and returns a reader object.
    """
    libFile = open(csvFileName)
    libReader = csv.reader(libFile)
    libData = list(libReader)
    return libData


def main():

    # Initialize the state variable
    state = 0

    # Prompt the user for the CSV file name
    fileName = input('Enter the CSV file to be read (Please use the full path): \n')
    # Open readerObject and copy its contents into a list
    csvToList = readerObject(fileName)
    loopList1 = list(csvToList)

    # Create writer object to... Write to
    fileToWrite = input('Enter the name of the file to write to: \n')
    libOutputFile = open(fileToWrite, 'w', newline='')
    libOutputWriter = csv.writer(libOutputFile)

    # Loop 1:
    for a in range(len(loopList1)):
        if state == 1:
            libOutputWriter.writerow(loopList2[0])
            del loopList1[0]
        loopList2 = list(csvToList)
        state = 0
        # Loop 2:
        for b in range(len(loopList2)):
            if loopList2[0][0] == loopList2[1][0]:
                if loopList2[0][1] != loopList2[1][1]:
                    libOutputWriter.writerow(loopList2[1])
                    del loopList2[1]
                    state = 1

    libOutputFile.close()

if __name__ == "__main__":
    main()

Thanks in advance!

Answer 1

If your input is sorted by call number, you can use itertools.groupby:

import csv
from io import StringIO
from itertools import groupby

text = '''920.105,George Mueller
920.105,George Mueller
920.105,George Mueller 1
327.373,The Letters to the Galatians and Ephesians
327.371,Galatians and Ephesians
289,The Modern Tongues Movement
288.01,The Seduction of Christianity
288.003,Understanding Cults and New Religions
288.002,Understanding Cults and New Religions
286.061,"History of the Baptists, A"
286.044,"History of the Baptists, A"
286.003,This Day in Baptist History 1
286.003,This Day in Baptist History 2
286.003,This Day in Baptist History 3'''

with StringIO(text) as in_file, StringIO() as out_file:
    reader = csv.reader(in_file)
    writer = csv.writer(out_file)

    for number, group in groupby(reader, key=lambda x: x[0]):

        titles = set(item[1] for item in group)
        if len(titles) != 1:
            writer.writerow((number, *titles))

    print(out_file.getvalue())

This outputs the following (the order of the titles within a row may vary, since a set is unordered):

920.105,George Mueller 1,George Mueller
286.003,This Day in Baptist History 2,This Day in Baptist History 3,This Day in Baptist History 1

Note that I had to change your input, because the original would not have produced any output...

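To use this with your own data, replace StringIO(text) with something like open('infile.txt', 'r') so that the program reads your actual file, and likewise replace StringIO() with open('outfile.txt', 'w') for the output file. In that case, also drop the final print(out_file.getvalue()) line, since getvalue() only exists on StringIO objects.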
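As a rough sketch of that substitution, the groupby approach could be wrapped in a function that works on real files. The function name write_conflicts and the file names in the usage comment are placeholders, not part of the original answer, and the titles are sorted here only to make the output order predictable. It still assumes the input is already sorted by call number.

```python
import csv
from itertools import groupby


def write_conflicts(in_path, out_path):
    """Write one row per call number that has more than one distinct title.

    Hypothetical helper: assumes the input file is sorted by call number.
    """
    # The csv module documentation recommends newline='' for file objects.
    with open(in_path, newline='') as in_file, \
            open(out_path, 'w', newline='') as out_file:
        # Skip blank lines, which csv.reader yields as empty lists.
        rows = (row for row in csv.reader(in_file) if row)
        writer = csv.writer(out_file)
        # groupby only merges adjacent rows with the same key.
        for number, group in groupby(rows, key=lambda row: row[0]):
            titles = sorted({row[1] for row in group})
            if len(titles) > 1:
                writer.writerow([number, *titles])


# Example call (placeholder file names):
# write_conflicts('infile.txt', 'outfile.txt')
```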

Again: this will only work if your input is sorted by call number.
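If the input cannot be relied on to be sorted, one option is to sort the rows by call number before grouping. The following is only a sketch under that assumption; the function name conflicting_titles and the sample data are made up for illustration. A plain string sort is used, which is enough to make equal call numbers adjacent but does not order them numerically.

```python
import csv
from io import StringIO
from itertools import groupby
from operator import itemgetter


def conflicting_titles(lines):
    """Return (call_number, sorted titles) pairs for numbers with differing titles."""
    # Read everything into memory and skip blank lines.
    rows = [row for row in csv.reader(lines) if row]
    # Sorting puts rows with the same call number next to each other,
    # which is what groupby needs.
    rows.sort(key=itemgetter(0))
    result = []
    for number, group in groupby(rows, key=itemgetter(0)):
        titles = sorted({row[1] for row in group})
        if len(titles) > 1:
            result.append((number, titles))
    return result


sample = StringIO('''286.003,This Day in Baptist History 1
920.105,George Mueller
286.003,This Day in Baptist History 2
920.105,George Mueller
289,The Modern Tongues Movement
''')
print(conflicting_titles(sample))
# [('286.003', ['This Day in Baptist History 1', 'This Day in Baptist History 2'])]
```

Sorting loads the whole file into memory, which should be fine for a catalogue of this kind but is worth keeping in mind for very large inputs.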

Answer 2

This is based on @hiro protagonist's answer, but it allows for unsorted duplicates.

import csv
from io import StringIO
from itertools import groupby
from collections import defaultdict

text = '''286.003,This Day in Baptist History 1
920.105,George Mueller
327.373,The Letters to the Galatians and Ephesians
327.371,Galatians and Ephesians
920.105,George Mueller 1
289,The Modern Tongues Movement
288.01,The Seduction of Christianity
920.105,George Mueller
288.003,Understanding Cults and New Religions
288.002,Understanding Cults and New Religions
286.061,"History of the Baptists, A"
286.044,"History of the Baptists, A"
286.003,This Day in Baptist History 2
286.003,This Day in Baptist History 3'''

with StringIO(text) as in_file, StringIO() as out_file:
    reader = csv.reader(in_file)
    writer = csv.writer(out_file)

    grouped = defaultdict(set)  # Maps call_numbers to a set of all book_titles under that number
    for entry in reader:
        grouped[entry[0]].add(entry[1])

    for call_number, titles in grouped.items():
        if len(titles) > 1:
            for title in titles:
                writer.writerow((call_number, title))

    print(out_file.getvalue())  # Remove this line if actually writing to a file

As with the answer above, replace StringIO(text) with open(filename), and replace StringIO() with a file opened for writing, to work with real files.

Python sorting script for library book call numbers (CSV file)
