Question

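I have a CSV with a number of columns, of which I only want three. I import it into my Python script and turn the three columns into three lists.

I then add each list to a dictionary. List 1 is the key, and the other two lists are the two values. (Maybe there's a better way?)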

key is a transaction id
value1 is a filename
value2 is a date
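
On the "better way" point, one possibility is to read the file once and build the dictionary row by row, rather than reading it three times into three lists. The sketch below assumes the column positions used later in the question (date in column 0, file name in column 1, id in column 2); `read_rows` and the in-memory sample are hypothetical names and data used only for illustration.

```python
import csv
import io

def read_rows(handle, id_col=2, name_col=1, date_col=0):
    """Build {id: [file name, date]} from an open CSV file in a single pass."""
    records = {}
    for row in csv.reader(handle):
        records[row[id_col]] = [row[name_col], row[date_col]]
    return records

# Hypothetical in-memory data standing in for the real CSV file.
sample = io.StringIO(
    "7/24/2018 16:04,https://localhost/file1.docx,2599302\n"
    "7/24/2018 16:03,https://localhost/file3.docx,2349302\n"
)
records = read_rows(sample)
print(records["2349302"])
```

As with the three-list approach, ids are dictionary keys, so a row whose id repeats an earlier one replaces that earlier entry.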

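What is the end goal?

Iterate over the dictionary and find all duplicate file names (there will be multiple sets of duplicates)
For each set of duplicate file names, find the id (key) with the latest (most recent) date value (if the date and time are identical, take the highest id (key))
Print the key with the latest date (I only need the id)
For every other duplicate, print "this is a duplicate" + (key) (again, just the id of each)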
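
The steps above could be sketched roughly as follows. This assumes the dictionary maps each id to a (file name, date string) pair, that dates look like the sample rows (month/day/year hour:minute), and that ids are numeric strings; `split_latest_and_duplicates` and the small `records` dictionary are hypothetical, not part of the original script.

```python
from datetime import datetime

def split_latest_and_duplicates(records, date_format="%m/%d/%Y %H:%M"):
    """records maps id -> (file name, date string). Returns (latest_ids, duplicate_ids)."""
    # Step 1: collect the ids that share each file name.
    groups = {}
    for key, (name, date) in records.items():
        groups.setdefault(name, []).append(key)

    latest_ids, duplicate_ids = [], []
    for name, keys in groups.items():
        # Step 2: rank by parsed date first, then by numeric id to break ties.
        newest = max(
            keys,
            key=lambda k: (datetime.strptime(records[k][1], date_format), int(k)),
        )
        latest_ids.append(newest)
        duplicate_ids.extend(k for k in keys if k != newest)
    return latest_ids, duplicate_ids

# Hypothetical data: two ids tie on date for a.docx, so the higher id wins.
records = {
    "101": ("a.docx", "7/24/2018 16:04"),
    "102": ("a.docx", "7/24/2018 16:04"),
    "103": ("a.docx", "7/24/2018 16:01"),
    "104": ("b.xml", "7/24/2018 16:03"),
}
latest, dupes = split_latest_and_duplicates(records)
for key in latest:
    print(key)
for key in dupes:
    print("This is a duplicate: " + key)
```

Note that a file name with only one id still lands in the "latest" list, since it is trivially the newest entry for that name.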

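I want to repeat this for all keys until I'm essentially left with only the id (key) of the latest item for each file name. File name x may have 5 duplicates, file name y may have 100, file name t may have 30, and so on.

I'm using an API to actually move the data, which is why I need to get the latest id and move that id to "x" in the external system, and move all the duplicates to "y".

Here is what I have so far for building the dictionary (assuming it's built in the right order), but I really don't know where to go from here: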

import csv

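def readcsv(filename, column):
    # newline="" is what the csv module expects; the old "rU" mode was removed in Python 3.11
    with open(filename, newline="") as file:
        reader = csv.reader(file, delimiter=",")
        # collect the requested column from every row
        return [row[column] for row in reader]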

def makeDict(id, fileName, detDate):
        iList = {z[0]:list(z[1:]) for z in zip((id),(fileName),(detDate))}
        return (iList)

id = (readcsv("jul.csv", 2))
fileName = (readcsv("jul.csv", 1))
detDate = (readcsv("jul.csv", 0))

mainDict = makeDict((id), (fileName), (detDate))

Sample data (columns extracted into a simpler table for testing):

Date             fileURL                       ID
7/24/2018 16:04  https://localhost/file1.docx  2599302
7/24/2018 16:03  https://localhost/file3.docx  2349302
7/24/2018 16:01  https://localhost/file1.docx  2599302
7/24/2018 16:04  https://localhost/fil232.xml  2599303
7/24/2018 16:03  https://localhost/file1.docx  2349333
7/24/2018 16:01  https://localhost/file3.docx  2529374
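
One thing worth checking with dates in this shape: compared as plain strings they are ordered character by character, which is not always chronological. A small illustration, assuming the month/day/year hour:minute layout seen in the sample (the two dates below are hypothetical, chosen to show the problem):

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y %H:%M"  # assumed from the sample rows

earlier = "7/9/2018 16:04"   # hypothetical: July 9
later = "7/24/2018 16:01"    # hypothetical: July 24

# As strings, "9" sorts after "2", so July 9 looks "greater" than July 24.
string_says_earlier_is_newer = earlier > later

# Parsed into datetime objects, the comparison is chronological.
parsed_says_earlier_is_newer = (
    datetime.strptime(earlier, DATE_FORMAT) > datetime.strptime(later, DATE_FORMAT)
)

print(string_says_earlier_is_newer, parsed_says_earlier_is_newer)
```

Parsing each date once with `datetime.strptime` before comparing avoids this.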

Update: using the answer below, this is what I ended up getting to work:

import csv

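def readcsv(filename, column):
    # newline="" is what the csv module expects; the old "rU" mode was removed in Python 3.11
    with open(filename, newline="") as file:
        reader = csv.reader(file, delimiter=",")
        # collect the requested column from every row
        return [row[column] for row in reader]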

def makeDict(id, fileName, detDate):
        iList = {z[0]:list(z[1:]) for z in zip((id),(fileName),(detDate))}
        return (iList)


## Group keys by like file names ##
def groupKeys(mainDict):
    same_filename = {}
    for key, line in mainDict.items():
        name, date = line
        if name not in same_filename:
            same_filename[name] = [key]
        else:
            same_filename[name].append(key)
    return same_filename



########################################### Get latest ID ##################
from datetime import datetime

def getLatestID(same_filename, mainDict):
    moveProperList = []   # latest id for each file name
    moveDupeList = []     # every other id sharing that file name
    ## for each file
    for name, ids in same_filename.items():
        curDate = None
        curId = None
        ## get each id value (aka matching ids holding same file)
        for v in ids:
            moveDupeList.append(v)   ## add to a list of dupes
            ## parse the date (assumes the "7/24/2018 16:04" layout and no header row)
            ## so that comparisons are chronological rather than alphabetical
            date = datetime.strptime(mainDict[v][1], "%m/%d/%Y %H:%M")
            ## a newer date wins; if the dates are equal, the highest id wins
            if curDate is None or date > curDate or (date == curDate and int(v) > int(curId)):
                curDate = date
                curId = v
        moveDupeList.remove(curId)     # remove latest from dupe list
        moveProperList.append(curId)   # add latest to proper list
    return moveProperList, moveDupeList
########################################### Get latest ID ##################


id = readcsv("jul.csv", 2)
fileName = readcsv("jul.csv", 1)
detDate = readcsv("jul.csv", 0)

mainDict = makeDict(id, fileName, detDate)
same_filename = groupKeys(mainDict)
moveProperList, moveDupeList = getLatestID(same_filename, mainDict)

Answer 1

A starting point might be to build another dictionary that gives, for each file name, the list of all corresponding keys (ids).
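
A minimal sketch of such a grouping, assuming the main dictionary maps each id to a [file name, date] pair as in the question. It is one way to do it, not necessarily the exact code this answer had in mind; `group_keys_by_filename` is a hypothetical name, and the entries are taken from the question's sample data.

```python
from collections import defaultdict

def group_keys_by_filename(main_dict):
    """Return {file name: [ids that reference that file name]}."""
    same_filename = defaultdict(list)
    for key, (name, date) in main_dict.items():
        same_filename[name].append(key)
    return dict(same_filename)

main_dict = {
    "2349302": ["https://localhost/file3.docx", "7/24/2018 16:03"],
    "2599303": ["https://localhost/fil232.xml", "7/24/2018 16:04"],
    "2349333": ["https://localhost/file1.docx", "7/24/2018 16:03"],
    "2529374": ["https://localhost/file3.docx", "7/24/2018 16:01"],
}
groups = group_keys_by_filename(main_dict)
print(groups["https://localhost/file3.docx"])
```

Any file name whose list holds more than one id is a set of duplicates.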

That takes care of your first point.
