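I have several distributors that provide me with CSV files of their inventory levels. Some of these distributors spread their information across as many as 3 files. The files are very large, with more than 170,000 rows of data.
What I'm trying to do is write a program that lets me reorganize this data into a new CSV file, so that each distributor ends up with one file organized the way I want.
Here is a little of what my code does, without getting too technical: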
step 1
Open file1
for row1 in file1
    Grab partnumber from row1[1]
    step 2
    open file2
    for row2 in file2
        if partnumber == row2[2]
            grab data from row2[4]
            break
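I then repeat step 2 for every piece of data I want to pull. The problem I'm seeing is that the program runs very fast at first and slows down the further it gets into the data, because it still reads through every row, even rows I have already collected from. I even thought about deleting rows once I'm done with them, but I figure there may be an alternative I don't know about. Any help would be great.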
import csv

def PartSearch():
    Partexists = "N"
    global SelectedObject
    with open(eval("file"+str(SelectedFile))) as f:
        r2 = csv.reader(f, delimiter = eval("file"+str(SelectedFile)+"Del"))
        for row2 in r2:
            if int(SelectedFile) == 1:
                if str(row2[int(file1PartNumber)]) == str(PartNumberobject):
                    Partexists= "Y"
                    SelectedObject = row2[int(SelectedCol)]
                    break
            if int(SelectedFile) == 2:
                if row2[int(file2PartNumber)] == PartNumberobject:
                    Partexists= "Y"
                    SelectedObject = row2[int(SelectedCol)]
                    break
            if int(SelectedFile) == 3:
                if row2[int(file3PartNumber)] == PartNumberobject:
                    Partexists= "Y"
                    SelectedObject = row2[int(SelectedCol)]
                    break
            if int(SelectedFile) == 4:
                if row2[int(file4PartNumber)] == PartNumberobject:
                    Partexists= "Y"
                    SelectedObject = row2[int(SelectedCol)]
                    break
    if Partexists != "Y":
        SelectedObject = "X"
with open("C:\\Python34\\Python34\\Distributors\\ListOfDistributors.txt") as f:
    r3 = csv.reader(f, delimiter = "\t")
    for row3 in r3:
        distributor = row3[0]
        with open("C:\\Python34\\Python34\\Distributors\\"+distributor+"files.txt") as f:
            r4 = csv.reader(f, delimiter = "\t")
            totalRows = sum(1 for _ in f)
        i = totalRows
        if totalRows == 1:
            with open("C:\\Python34\\Python34\\Distributors\\"+distributor+"files.txt") as f:
                r4 = csv.reader(f, delimiter = "\t")
                for row4 in r4:
                    file1 = row4[1]
                    file1Del = row4[2]
                    file1titles = row4[3]
                    file1PartNumber = row4[4]
        if totalRows == 2:
            with open("C:\\Python34\\Python34\\Distributors\\"+distributor+"files.txt") as f:
                r4 = csv.reader(f, delimiter = "\t")
                for row4 in r4:
                    if i == 2:
                        file1 = row4[1]
                        file1Del = row4[2]
                        file1titles = row4[3]
                        file1PartNumber = row4[4]
                    if i == 1:
                        file2 = row4[1]
                        file2Del = row4[2]
                        file2titles = row4[3]
                        file2PartNumber = row4[4]
                    i = i-1
        if totalRows == 3:
            with open("C:\\Python34\\Python34\\Distributors\\"+distributor+"files.txt") as f:
                r4 = csv.reader(f, delimiter = "\t")
                for row4 in r4:
                    if i == 3:
                        file1 = row4[1]
                        file1Del = row4[2]
                        file1titles = row4[3]
                        file1PartNumber = row4[4]
                    if i == 2:
                        file2 = row4[1]
                        file2Del = row4[2]
                        file2titles = row4[3]
                        file2PartNumber = row4[4]
                    if i == 1:
                        file3 = row4[1]
                        file3Del = row4[2]
                        file3titles = row4[3]
                        file3PartNumber = row4[4]
                    i = i-1
        if totalRows == 4:
            with open("C:\\Python34\\Python34\\Distributors\\"+distributor+"files.txt") as f:
                r4 = csv.reader(f, delimiter = "\t")
                for row4 in r4:
                    if i == 4:
                        file1 = row4[1]
                        file1Del = row4[2]
                        file1titles = row4[3]
                        file1PartNumber = row4[4]
                    if i == 3:
                        file2 = row4[1]
                        file2Del = row4[2]
                        file2titles = row4[3]
                        file2PartNumber = row4[4]
                    if i == 2:
                        file3 = row4[1]
                        file3Del = row4[2]
                        file3titles = row4[3]
                        file3PartNumber = row4[4]
                    if i == 1:
                        file4 = row4[1]
                        file4Del = row4[2]
                        file4titles = row4[3]
                        file4PartNumber = row4[4]
                    i = i-1
        with open("C:\\Python34\\Python34\\Distributors\\"+distributor+"structure.txt") as f:
            r5 = csv.reader(f, delimiter = "\t")
            i=1
            for row5 in r5:
                if i == 1:
                    DistributorName = row5[0]
                    PartNumberFile = row5[2]
                    PartNumberCol = row5[3]
                    AltPartNumberFile = row5[5]
                    AltPartNumberCol = row5[6]
                    VendorPartNumberFile = row5[8]
                    VendorPartNumberCol = row5[9]
                    AltVendorPartNumberFile = row5[11]
                    AltVendorPartNumberCol = row5[12]
                    DescriptionFile = row5[14]
                    DescriptionCol = row5[15]
                    BrandFile = row5[17]
                    BrandCol = row5[18]
                    CostFile = row5[20]
                    CostCol = row5[21]
                    RetailFile = row5[23]
                    RetailCol = row5[24]
                    StatusFile = row5[26]
                    StatusCol = row5[27]
                    WeightFile = row5[29]
                    WeightCol = row5[30]
                # csv.reader yields strings, so the warehouse counts below are
                # compared against "0".."3" rather than the integers 0..3.
                if i == 2:
                    if row5[2] == "0":
                        NumofOnedaywarehouse = row5[2]
                    if row5[2] == "1":
                        NumofOnedaywarehouse = row5[2]
                        Oneday1WarehouseFile = row5[4]
                        Oneday1WarehouseCol = row5[5]
                    if row5[2] == "2":
                        NumofOnedaywarehouse = row5[2]
                        Oneday1WarehouseFile = row5[4]
                        Oneday1WarehouseCol = row5[5]
                        Oneday2WarehouseFile = row5[7]
                        Oneday2WarehouseCol = row5[8]
                    if row5[2] == "3":
                        NumofOnedaywarehouse = row5[2]
                        Oneday1WarehouseFile = row5[4]
                        Oneday1WarehouseCol = row5[5]
                        Oneday2WarehouseFile = row5[7]
                        Oneday2WarehouseCol = row5[8]
                        Oneday3WarehouseFile = row5[10]
                        Oneday3WarehouseCol = row5[11]
                if i == 3:
                    if row5[2] == "0":
                        NumofTwodaywarehouse = row5[2]
                    if row5[2] == "1":
                        NumofTwodaywarehouse = row5[2]
                        Twoday1WarehouseFile = row5[4]
                        Twoday1WarehouseCol = row5[5]
                    if row5[2] == "2":
                        NumofTwodaywarehouse = row5[2]
                        Twoday1WarehouseFile = row5[4]
                        Twoday1WarehouseCol = row5[5]
                        Twoday2WarehouseFile = row5[7]
                        Twoday2WarehouseCol = row5[8]
                    if row5[2] == "3":
                        NumofTwodaywarehouse = row5[2]
                        Twoday1WarehouseFile = row5[4]
                        Twoday1WarehouseCol = row5[5]
                        Twoday2WarehouseFile = row5[7]
                        Twoday2WarehouseCol = row5[8]
                        Twoday3WarehouseFile = row5[10]
                        Twoday3WarehouseCol = row5[11]
                if i == 4:
                    if row5[2] == "0":
                        NumofThreedaywarehouse = row5[2]
                    if row5[2] == "1":
                        NumofThreedaywarehouse = row5[2]
                        Threeday1WarehouseFile = row5[4]
                        Threeday1WarehouseCol = row5[5]
                    if row5[2] == "2":
                        NumofThreedaywarehouse = row5[2]
                        Threeday1WarehouseFile = row5[4]
                        Threeday1WarehouseCol = row5[5]
                        Threeday2WarehouseFile = row5[7]
                        Threeday2WarehouseCol = row5[8]
                    if row5[2] == "3":
                        NumofThreedaywarehouse = row5[2]
                        Threeday1WarehouseFile = row5[4]
                        Threeday1WarehouseCol = row5[5]
                        Threeday2WarehouseFile = row5[7]
                        Threeday2WarehouseCol = row5[8]
                        Threeday3WarehouseFile = row5[10]
                        Threeday3WarehouseCol = row5[11]
                if i == 5:
                    if row5[2] == "0":
                        NumofFourdaywarehouse = row5[2]
                    if row5[2] == "1":
                        NumofFourdaywarehouse = row5[2]
                        Fourday1WarehouseFile = row5[4]
                        Fourday1WarehouseCol = row5[5]  # was Threeday1WarehouseCol (copy-paste slip)
                    if row5[2] == "2":
                        NumofFourdaywarehouse = row5[2]
                        Fourday1WarehouseFile = row5[4]
                        Fourday1WarehouseCol = row5[5]
                        Fourday2WarehouseFile = row5[7]
                        Fourday2WarehouseCol = row5[8]
                    if row5[2] == "3":
                        NumofFourdaywarehouse = row5[2]
                        Fourday1WarehouseFile = row5[4]
                        Fourday1WarehouseCol = row5[5]
                        Fourday2WarehouseFile = row5[7]
                        Fourday2WarehouseCol = row5[8]
                        Fourday3WarehouseFile = row5[10]
                        Fourday3WarehouseCol = row5[11]
                if i == 6:
                    if row5[2] == "0":
                        NumofFivedaywarehouse = row5[2]
                    if row5[2] == "1":
                        NumofFivedaywarehouse = row5[2]
                        Fiveday1WarehouseFile = row5[4]
                        Fiveday1WarehouseCol = row5[5]
                    if row5[2] == "2":
                        NumofFivedaywarehouse = row5[2]
                        Fiveday1WarehouseFile = row5[4]
                        Fiveday1WarehouseCol = row5[5]
                        Fiveday2WarehouseFile = row5[7]
                        Fiveday2WarehouseCol = row5[8]
                    if row5[2] == "3":
                        NumofFivedaywarehouse = row5[2]
                        Fiveday1WarehouseFile = row5[4]
                        Fiveday1WarehouseCol = row5[5]
                        Fiveday2WarehouseFile = row5[7]
                        Fiveday2WarehouseCol = row5[8]
                        Fiveday3WarehouseFile = row5[10]
                        Fiveday3WarehouseCol = row5[11]
                i = i+1
        """print(file1Del)
        PartNumberFile = 1
        PartNumberCol = 1
        CostFile = 2
        CostCol = 2
        SelectedFile = PartNumberFile
        SelectedCol = PartNumberCol
        number = 1"""
        #Program to grab Part Number
        with open(file1) as f:
            r = csv.reader(f, delimiter = file1Del)
            if file1titles == "Y":
                file = next(r)  # skip the title row
            for row in r:
                PartNumberobject = row[int(file1PartNumber)]
                # start of data collection, save variables as SelectedFile and SelectedCol. Run PartSearch() then save Variable SelectedObject
                SelectedFile = PartNumberFile
                SelectedCol = PartNumberCol
                PartSearch()
                FPartNumber = SelectedObject
                SelectedFile = AltPartNumberFile
                SelectedCol = AltPartNumberCol
                PartSearch()
                FAltPartNumber = SelectedObject
                SelectedFile = VendorPartNumberFile
                SelectedCol = VendorPartNumberCol
                PartSearch()
                FVendorPartNumber = SelectedObject
                SelectedFile = AltVendorPartNumberFile
                SelectedCol = AltVendorPartNumberCol
                PartSearch()
                FAltVendorPartNumber = SelectedObject
                SelectedFile = DescriptionFile
                SelectedCol = DescriptionCol
                PartSearch()
                FDescription = SelectedObject
                SelectedFile = BrandFile
                SelectedCol = BrandCol
                PartSearch()
                FBrand = SelectedObject
                SelectedFile = CostFile
                SelectedCol = CostCol
                PartSearch()
                FCost = SelectedObject
                SelectedFile = RetailFile
                SelectedCol = RetailCol
                PartSearch()
                FRetail = SelectedObject
                SelectedFile = StatusFile
                SelectedCol = StatusCol
                PartSearch()
                FStatus = SelectedObject
                SelectedFile = WeightFile
                SelectedCol = WeightCol
                PartSearch()
                FWeight = SelectedObject
                print(DistributorName, PartNumberobject, FAltPartNumber, FVendorPartNumber, FAltVendorPartNumber, FDescription, FBrand, FCost, FRetail, FStatus, FWeight)
Answer 0 (score: 1)
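If your parts file is small enough to fit in memory, you can speed this up by loading it into a dictionary (an efficient data structure with fast lookups). When you loop through file2, you are looking for the row where row[2] == partnumber and then (presumably) using row[4], so a dictionary with row[2] as the key and row[4] as the value makes the lookup very fast: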
import csv

parts = {}
with open(file2, newline="") as f:  # file2 and file2Del as defined in the question
    for row in csv.reader(f, delimiter=file2Del):
        parts[row[2]] = row[4]
Then, instead of re-opening the file every time, just do:
data = parts[partnumber]
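A slightly fuller sketch of the same idea, assuming the part number sits in column 2 and the wanted value in column 4 as in the question's pseudocode. The `build_index` helper and the sample rows are invented for illustration, and `io.StringIO` stands in for a real file. It keeps the first match per part number (like the original loop's `break`) and uses `dict.get` so a missing part yields the question's "X" marker instead of raising `KeyError`.

```python
import csv
import io

def build_index(lines, key_col, value_col, delimiter=","):
    """Read the CSV once and map the key column to the value column."""
    index = {}
    for row in csv.reader(lines, delimiter=delimiter):
        # setdefault keeps the first row seen for each key,
        # mirroring the `break` on first match in the question.
        index.setdefault(row[key_col], row[value_col])
    return index

# Hypothetical stand-in for file2; real code would pass open(path, newline="").
file2_data = io.StringIO(
    "a,b,P100,c,12.50\n"
    "a,b,P200,c,7.25\n"
    "a,b,P100,c,99.99\n"
)
parts = build_index(file2_data, key_col=2, value_col=4)

print(parts.get("P100", "X"))  # value from the first P100 row
print(parts.get("P999", "X"))  # fallback for an unknown part
```

Building the dictionary costs one pass over the file; every lookup after that takes roughly constant time, rather than another scan of up to 170,000 rows.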
Edit: there are a few other things you can do to improve this code:
Use True and False for boolean values rather than the strings "Y" and "N".
part_exists = False
if some_condition:
    part_exists = True
if not part_exists:
    selected_object = "X" # not clear what this does so I'm not messing with it
When you unpack an array into variables, you can do it more simply:
for row4 in r4:
    file1, file1Del, file1titles, file1PartNumber = row4[1:5]
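You have repeated a lot of code to handle the one-, two-, three-, and four-row cases. Consider using a loop and a list here. That would also let you get rid of eval.
This may seem pointless, but code that doesn't repeat itself is much easier to improve.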
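As a rough sketch of the loop-and-list suggestion: the per-file settings can go into a list of dictionaries, so the Nth file is `configs[N - 1]` and no `eval("file" + str(N))` is needed. The column layout (path, delimiter, titles flag, part-number column in fields 1 to 4) follows the question's code; the `load_file_configs` name, the dictionary keys, and the sample listing are assumptions made for illustration.

```python
import csv
import io

def load_file_configs(lines):
    """Return one settings dict per row of a tab-separated file listing."""
    configs = []
    for row in csv.reader(lines, delimiter="\t"):
        path, delimiter, has_titles, part_col = row[1:5]
        configs.append({
            "path": path,
            "delimiter": delimiter,
            "has_titles": has_titles == "Y",  # real boolean instead of "Y"/"N"
            "part_col": int(part_col),
        })
    return configs

# Hypothetical two-file listing standing in for "<distributor>files.txt".
listing = io.StringIO("1\tstock.csv\t,\tY\t0\n2\tprices.txt\t;\tN\t2\n")
configs = load_file_configs(listing)

selected_file = 2                  # 1-based, as in the question
cfg = configs[selected_file - 1]   # replaces eval("file" + str(selected_file))
print(cfg["path"], cfg["delimiter"], cfg["part_col"])
```

The same function handles one, two, three, or four rows without a separate branch for each count.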
Answer 1 (score: 0)
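Since you haven't really given any details, I can only answer in very general terms.
Given that you only want to work with a handful of CSV files, and that they are around 170k rows each, I assume the data fits in memory. If that is the case, and you want to work in Python (a very reasonable choice), I strongly recommend spending some time learning pandas. Pandas gives you plenty of options for working with tabular data, including powerful filtering and database-style merge and join operations.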
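To give a feel for the database-style merge mentioned above, here is a minimal sketch using `pandas.read_csv` and `DataFrame.merge`. The column names (`part`, `description`, `sku`, `cost`) and the sample data are invented; real distributor files would need their own column names and delimiters.

```python
import io
import pandas as pd

# Hypothetical stand-ins for two distributor files.
parts_csv = io.StringIO("part,description\nP100,Brake pad\nP200,Oil filter\n")
costs_csv = io.StringIO("sku,cost\nP100,12.50\nP300,3.10\n")

# dtype=str keeps part numbers such as "00123" from being parsed as integers.
parts = pd.read_csv(parts_csv, dtype=str)
costs = pd.read_csv(costs_csv, dtype=str)

# A left join keeps every row of `parts`; parts with no cost row get a
# missing value, which is then replaced by the question's "X" marker.
merged = parts.merge(costs, how="left", left_on="part", right_on="sku")
merged = merged.drop(columns="sku").fillna("X")
print(merged)
```

One merge replaces the whole inner search loop, and pandas performs the matching with hashed lookups rather than rescanning the second file for each part.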
Once you come up with a more specific question, I'm sure we can offer further help.