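I have a CSV with a number of columns, and I only want three of them. I import it into my Python script and turn the three columns into three lists,
then add each list to a dictionary. List 1 is the key and the other two lists are the values. (Maybe there is a better way?)
key is a transaction id
value1 is a filename
value2 is a date
What is the end goal?
I want to repeat this for all the keys until I essentially end up with only the ID (key) of the latest item for each file name. File name x may have 5 duplicates, file name y may have 100 duplicates, file name t may have 30 duplicates, and so on.
I am using an API to actually move the data, which is why I need to get the latest ID and move that ID to "x" in the external system, and move all the duplicates to "y".
This is what I have so far for building the dictionary (assuming it is built in the right order), but I really don't know where to go from here: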
import csv

def readcsv(filename, column):
    # newline="" is what the csv module expects; the old "rU" mode was removed in Python 3.11
    with open(filename, newline="") as file:
        reader = csv.reader(file, delimiter=",")
        values = []
        for row in reader:
            values.append(row[column])
    return values

def makeDict(id, fileName, detDate):
    # key: transaction id, value: [file name, date]
    iList = {z[0]: list(z[1:]) for z in zip(id, fileName, detDate)}
    return iList

id = readcsv("jul.csv", 2)
fileName = readcsv("jul.csv", 1)
detDate = readcsv("jul.csv", 0)
mainDict = makeDict(id, fileName, detDate)
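On the "maybe there is a better way?" point: the script above opens and parses the file three times, once per column. A minimal sketch of reading all three columns in a single pass is shown here. The names `read_rows` and `make_dict` are hypothetical, and the column positions (date in 0, file name in 1, id in 2) are assumed from the calls above.

```python
import csv

def read_rows(path, id_col=2, name_col=1, date_col=0):
    """Return a list of (id, file_name, date) tuples read in one pass."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=",")
        return [(row[id_col], row[name_col], row[date_col]) for row in reader]

def make_dict(rows):
    """Map each id to [file_name, date]."""
    # If an id occurs in several rows, the last row wins.
    return {row_id: [name, date] for row_id, name, date in rows}
```

Calling `make_dict(read_rows("jul.csv"))` should give the same dictionary shape as `mainDict` above, as long as the CSV has no header row.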
Sample data (columns extracted into a simpler table for testing)
Date fileURL ID
7/24/2018 16:04 https://localhost/file1.docx 2599302
7/24/2018 16:03 https://localhost/file3.docx 2349302
7/24/2018 16:01 https://localhost/file1.docx 2599302
7/24/2018 16:04 https://localhost/fil232.xml 2599303
7/24/2018 16:03 https://localhost/file1.docx 2349333
7/24/2018 16:01 https://localhost/file3.docx 2529374
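Because the dates arrive as text, finding the "latest" row depends on how they are compared. The sketch below shows why parsing them first is probably safer than comparing the raw strings. The format string is an assumption based on the sample rows, and the second date is a made-up value for illustration.

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y %H:%M"  # assumed from the sample rows

def parse_date(text):
    """Turn a string such as '7/24/2018 16:04' into a datetime."""
    return datetime.strptime(text, DATE_FORMAT)

older = "7/24/2018 16:04"
newer = "10/2/2018 09:00"  # made-up value, not from the sample data

# As plain text, "7..." sorts after "1...", so the older date looks newer.
text_says_older_first = older < newer
# As datetimes, the comparison follows the calendar.
parsed_says_older_first = parse_date(older) < parse_date(newer)
```

Here `text_says_older_first` is `False` while `parsed_says_older_first` is `True`, which is the behaviour the "latest ID" logic needs.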
Update: Using the answer below, this is what I ended up with to get it working:
import csv

def readcsv(filename, column):
    # newline="" is what the csv module expects; the old "rU" mode was removed in Python 3.11
    with open(filename, newline="") as file:
        reader = csv.reader(file, delimiter=",")
        values = []
        for row in reader:
            values.append(row[column])
    return values

def makeDict(id, fileName, detDate):
    # key: transaction id, value: [file name, date]
    iList = {z[0]: list(z[1:]) for z in zip(id, fileName, detDate)}
    return iList

## Group Keys by like file names ##
def groupKeys(mainDict):
    same_filename = {}
    for key, line in mainDict.items():
        name, date = line
        if name not in same_filename:
            same_filename[name] = [key]
        else:
            same_filename[name].append(key)
    return same_filename

########################################### Get latest ID ##################
from datetime import datetime

moveDupeList = []    ## ids of the older duplicates
moveProperList = []  ## latest id for each file name

def parseDate(text):
    ## assumes the format in the sample data, e.g. "7/24/2018 16:04";
    ## a header row, if the CSV has one, must be skipped before this point
    return datetime.strptime(text, "%m/%d/%Y %H:%M")

def getLatestID(same_filename, mainDict):
    ## for each file
    for k in same_filename.keys():
        curDate = None
        curId = None
        ## get each id value (aka matching ids holding same file)
        for v in same_filename.get(k):
            moveDupeList.append(v)  ## add to a list of dupes
            vDate = parseDate(mainDict.get(v)[1])
            if curDate is None or vDate > curDate:
                ## newer date found: set new highest date and id
                curDate = vDate
                curId = v
            elif vDate == curDate and v > curId:
                ## same date: keep the higher id (ids are compared as strings here)
                curId = v
        if curId in moveDupeList:
            moveDupeList.remove(curId)    # remove latest from dupe list
            moveProperList.append(curId)  # add latest to proper list
########################################### Get latest ID ##################

id = readcsv("jul.csv", 2)
fileName = readcsv("jul.csv", 1)
detDate = readcsv("jul.csv", 0)
mainDict = makeDict(id, fileName, detDate)
same_filename = groupKeys(mainDict)
getLatestID(same_filename, mainDict)
Answer 0 (score: 0)
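A starting point could be to build another dictionary that gives, for each file name, the list of all corresponding keys (ids):

same_filename = {}
for key, line in mainDict.items():
    name, date = line
    if name not in same_filename:
        same_filename[name] = [key]
    else:
        same_filename[name].append(key)

That would be your first step.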
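Once the ids are grouped by file name, a possible next step is to pick the newest id in each group and treat the rest as duplicates. The following is only a sketch under a few assumptions: the function names `parse_date` and `split_latest` are hypothetical, the date format is taken from the sample data, ties on the date are broken by the larger numeric id, and the two dictionaries are typed in by hand from the sample rows.

```python
from datetime import datetime

def parse_date(text):
    # Assumed format, taken from the sample rows: "7/24/2018 16:04".
    return datetime.strptime(text, "%m/%d/%Y %H:%M")

def split_latest(same_filename, main_dict):
    """Return (latest_ids, duplicate_ids) for ids grouped by file name."""
    latest_ids, duplicate_ids = [], []
    for name, ids in same_filename.items():
        # Newest date wins; equal dates fall back to the larger numeric id.
        newest = max(ids, key=lambda i: (parse_date(main_dict[i][1]), int(i)))
        latest_ids.append(newest)
        duplicate_ids.extend(i for i in ids if i != newest)
    return latest_ids, duplicate_ids

# Hand-built from the sample data. Id 2599302 appears in two sample rows,
# and a dict keyed by id keeps only the last of them (the 16:01 row).
main_dict = {
    "2599302": ["https://localhost/file1.docx", "7/24/2018 16:01"],
    "2349302": ["https://localhost/file3.docx", "7/24/2018 16:03"],
    "2599303": ["https://localhost/fil232.xml", "7/24/2018 16:04"],
    "2349333": ["https://localhost/file1.docx", "7/24/2018 16:03"],
    "2529374": ["https://localhost/file3.docx", "7/24/2018 16:01"],
}
same_filename = {
    "https://localhost/file1.docx": ["2599302", "2349333"],
    "https://localhost/file3.docx": ["2349302", "2529374"],
    "https://localhost/fil232.xml": ["2599303"],
}

latest, dupes = split_latest(same_filename, main_dict)
```

With this input, `latest` holds one id per file name and `dupes` holds the remaining ids, which are the two lists the API calls would then work through.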