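I want to split values given as a range in a CSV column, adding a new row for each number in the range while keeping the data in all the other columns matched.
It is important that I keep the data from the other columns (Job ID and so on) for every number in the (x-y) range, so the resulting CSV will obviously be much longer than the original.
I want the output CSV to have a separate row for each number in ranges such as 26-29, 66-67, and so on. So I want an output CSV file in which, for example:
Job ID 21879 is represented four times, once each for 26, 27, 28, and 29.
I would like to do this before the following steps in my script, but I am stuck at this point.
The rest of the script splits the date values on (/), appends the parts to each row as new fields, and concatenates them with the page number field. It is this page number field that I want to split wherever a range is shown.
The list this script produces contains only the required values from the Job ID column, with the concatenated date and page fields in second position. This part works fine; it is in the final CSV file that I need each number of a given range represented individually.
Any help splitting these ranges of values while preserving the other data fields would be appreciated.
A subset of my input data is as follows: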
Job ID Job summary Link Locality Received Job status Asset Date Page No
21879 Addition Documents Link CBD 15/06/2018 Completed Water 28/06/2018 26-29
21878 Addition Documents Link CBD 28/06/2018 Completed Water
21877 Addition Documents Link CBD 28/06/2018 Completed Water
21876 Addition Documents Link CBD 28/06/2018 Completed Water
21875 Addition Documents Link CBD 28/06/2018 Completed Water
21874 Addition Documents Link CBD 28/06/2018 Completed Water 26/07/2018 42-43
21873 Addition Documents Link CBD 27/06/2018 Completed Water 26/07/2018
21872 Addition Documents Link CBD 27/06/2018 Completed Water 26/07/2018 66-67
21871 Addition Documents Link CBD 27/06/2018 Completed Water 26/07/2018 07-08
21870 Addition Documents Link CBD 27/06/2018 Completed Water 28/06/2018 59
21869 Addition Documents Link CBD 27/06/2018 Completed Water 28/06/2018 58
21868 Addition Documents Link CBD 26/06/2018 Completed Water
21867 Addition Documents Link CBD 26/06/2018 Completed Water
My desired output is:
Job ID Job summary Link Locality Received Job status Asset Date Page No
21879 Addition Documents Link CBD 15/06/2018 Completed Water 28/06/2018 26
21879 Addition Documents Link CBD 15/06/2018 Completed Water 28/06/2018 27
21879 Addition Documents Link CBD 15/06/2018 Completed Water 28/06/2018 28
21879 Addition Documents Link CBD 15/06/2018 Completed Water 28/06/2018 29
21878 Addition Documents Link CBD 28/06/2018 Completed Water
21877 Addition Documents Link CBD 28/06/2018 Completed Water
21876 Addition Documents Link CBD 28/06/2018 Completed Water
21875 Addition Documents Link CBD 28/06/2018 Completed Water
21874 Addition Documents Link CBD 28/06/2018 Completed Water 26/07/2018 42
21874 Addition Documents Link CBD 28/06/2018 Completed Water 26/07/2018 43
21873 Addition Documents Link CBD 27/06/2018 Completed Water 26/07/2018
21872 Addition Documents Link CBD 27/06/2018 Completed Water 26/07/2018 66
21872 Addition Documents Link CBD 27/06/2018 Completed Water 26/07/2018 67
21871 Addition Documents Link CBD 27/06/2018 Completed Water 26/07/2018 07
21871 Addition Documents Link CBD 27/06/2018 Completed Water 26/07/2018 08
21870 Addition Documents Link CBD 27/06/2018 Completed Water 28/06/2018 59
21869 Addition Documents Link CBD 27/06/2018 Completed Water 28/06/2018 58
21868 Addition Documents Link CBD 26/06/2018 Completed Water
21867 Addition Documents Link CBD 26/06/2018 Completed Water
The current script is:
import os
import csv
with open('CSV_File.csv','r') as csvinput:
    with open('temp__spreadsheet_cache_1.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput)
        for row in csv.reader(csvinput):
            if row[7] == "Date":
                writer.writerow(row+["day"])
            else:
                writer.writerow(row+row[4].split('/'))
with open('temp__spreadsheet_cache_1.csv','r') as csvinput:
    with open('temp__spreadsheet_cache_2.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput)
        for row in csv.reader(csvinput):
            if row[7] == "Date":
                writer.writerow(row+["month"])
            else:
                writer.writerow(row+row[4].split('/'))
with open('temp__spreadsheet_cache_2.csv','r') as csvinput:
    with open('temp__spreadsheet_cache_3.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput)
        for row in csv.reader(csvinput):
            if row[7] == "Date":
                writer.writerow(row+["year"])
            else:
                writer.writerow(row+row[4].split('/'))
with open('temp__spreadsheet_cache_3.csv','r') as csvinput:
    with open('temp__spreadsheet_cache_4.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput)
        for row in csv.reader(csvinput):
            if row[7] == "Date":
                writer.writerow(row+["Concatenation"])
            else:
                writer.writerow(row+row[4].split('/'))
#---Using Current output (temp__spreadsheet_cache_4.csv) to create new list--
blank =[]
with open (r'temp__spreadsheet_cache_4.csv', 'r') as NEW_CSV:
    csvReader = csv.reader(NEW_CSV, delimiter=',', quotechar='"')
    header = next(csvReader)
    JobIndex = header.index("Job ID")
    PageIndex = header.index("Page No")
    DayIndex = header.index("day")
    MonthIndex = header.index("month")
    YearIndex = header.index("year")
    Summary = header.index("Job summary")
    StatusIndex = header.index("Job status")
    class_1 = header.index("Asset")
    for row in csvReader:
        Page = row[PageIndex]
        Day = row[DayIndex]
        Month = row[MonthIndex]
        Year = row[YearIndex]
        JobID = row[JobIndex]
        To_be_overridden_concat = row[PageIndex]
        Type = row[Summary]
        Status = row[StatusIndex]
        waterclass = row[class_1]
        if waterclass == 'Water':
            blank.append([JobID,Day,Month,Year,Page,To_be_overridden_concat])
str(blank)
for column in blank:
    column[1] = column[1].lstrip('0')
    column[2] = column[2].lstrip('0')
    column[3] = column[3].lstrip('0')
    column[4] = column[4].lstrip('0')
for column in blank:
    column[0] = column[0].lstrip()
    column[1] = column[1].lstrip()
    column[2] = column[2].lstrip()
    column[3] = column[3].lstrip()
    column[4] = column[4].lstrip()
for column in blank:
    column[0] = column[0].rstrip()
    column[1] = column[1].rstrip()
    column[2] = column[2].rstrip()
    column[3] = column[3].rstrip()
    column[4] = column[4].rstrip()
    column[5] = column[1]+column[2]+column[3]+column[4]
##os.remove("temp__spreadsheet_cache_4.csv")
os.remove("temp__spreadsheet_cache_3.csv")
os.remove("temp__spreadsheet_cache_2.csv")
os.remove("temp__spreadsheet_cache_1.csv")
for row in blank:
    del row[1:5]
print(blank[0:10])
Answer 0: (score: 0)
First, I need to assume that you have a standard CSV file with comma-separated fields, for example:
Job ID,Job summary,Link,Locality,Received,Job status,Asset,Date,Page No
21879,Addition,Documents,Link,CBD,15/06/2018,Completed,Water,28/06/2018,26-29
21878,Addition,Documents,Link,CBD,28/06/2018,Completed,Water,,
21874,Addition,Documents,Link,CBD,28/06/2018,Completed,Water,26/07/2018,42-43
21873,Addition,Documents,Link,CBD,27/06/2018,Completed,Water,26/07/2018,1
In that case, your data can be fixed as follows:
from datetime import datetime
import csv
fieldnames = ["Job ID", "Job summary", "Link", "Locality", "ReceivedDay", "ReceivedMonth", "ReceivedYear", "Job status", "Asset", "Day", "Month", "Year", "Page No"]
# Python 2: csv files are opened in binary mode.
# On Python 3, open them with "r"/"w" and newline="" instead.
with open("CSV_File.csv", "rb") as f_input, open("output.csv", "wb") as f_output:
    csv_input = csv.reader(f_input)
    next(csv_input) # skip the header
    csv_output = csv.writer(f_output)
    csv_output.writerow(fieldnames)
    for row in csv_input:
        date_received = row[5].split('/')
        # An empty date becomes three empty fields
        if len(row[8]):
            date = row[8].split('/')
        else:
            date = ["", "", ""]
        if row[9].find('-') != -1:
            # A range such as 26-29: write one row per page, inclusive
            pages = [int(p) for p in row[9].split("-")]
            for page in range(pages[0], pages[1] + 1):
                output_row = row[:5] + date_received + row[6:8] + date + [page]
                csv_output.writerow(output_row)
        else:
            # A single page or an empty field is copied unchanged
            output_row = row[:5] + date_received + row[6:8] + date + [row[9]]
            csv_output.writerow(output_row)
This would give you an output file starting:
Job ID,Job summary,Link,Locality,ReceivedDay,ReceivedMonth,ReceivedYear,Job status,Asset,Day,Month,Year,Page No
21879,Addition,Documents,Link,CBD,15,06,2018,Completed,Water,28,06,2018,26
21879,Addition,Documents,Link,CBD,15,06,2018,Completed,Water,28,06,2018,27
21879,Addition,Documents,Link,CBD,15,06,2018,Completed,Water,28,06,2018,28
21879,Addition,Documents,Link,CBD,15,06,2018,Completed,Water,28,06,2018,29
21878,Addition,Documents,Link,CBD,28,06,2018,Completed,Water,,,,
21877,Addition,Documents,Link,CBD,28,06,2018,Completed,Water,,,,
21876,Addition,Documents,Link,CBD,28,06,2018,Completed,Water,,,,
21875,Addition,Documents,Link,CBD,28,06,2018,Completed,Water,,,,
21874,Addition,Documents,Link,CBD,28,06,2018,Completed,Water,26,07,2018,42
21874,Addition,Documents,Link,CBD,28,06,2018,Completed,Water,26,07,2018,43
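It works by first skipping the input header and writing a suitable output header. The received date is assumed to always be present. split('/') is used to break each date into its three parts. If the page number contains a - character, split('-') is used to get the two parts, which are then converted into two integers. One output row is then written for every page from the first number to the second, inclusive.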
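The page-range step can also be isolated in a small helper. The following is a minimal sketch, assuming Python 3; the name expand_page_range is hypothetical and not part of the code above. Unlike that code, it keeps any leading zeros, so 07-08 yields 07 and 08 as in the desired output shown in the question.

```python
def expand_page_range(page_field):
    """Expand a page field such as '26-29' into ['26', '27', '28', '29'].

    Hypothetical helper for illustration. A single page ('59') or an
    empty field ('') is returned unchanged as a one-item list.
    """
    page_field = page_field.strip()
    if '-' not in page_field:
        return [page_field]
    first, last = (part.strip() for part in page_field.split('-', 1))
    width = len(first)  # keep zero padding, e.g. '07' stays two digits wide
    return [str(page).zfill(width) for page in range(int(first), int(last) + 1)]


print(expand_page_range('26-29'))
print(expand_page_range('07-08'))
```

Returning a list in every case means the calling loop can treat single pages, empty fields, and ranges the same way.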
Each output row is created by combining parts of the input row with the two sets of date parts.
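On Python 3, where csv files are opened in text mode, the same row-by-row approach could look like the sketch below. It is only an illustration under stated assumptions: the function name split_page_rows is hypothetical, the page field is assumed to be the last column of every row, the date columns are left unsplit, and the sample data uses a shortened, made-up column layout. Pages are written as integers, so zero padding is dropped.

```python
import csv
import io


def split_page_rows(f_input, f_output):
    """Copy CSV rows, writing one output row per page of a 'first-last' range.

    Hypothetical sketch: assumes the page field is the last column.
    """
    reader = csv.reader(f_input)
    writer = csv.writer(f_output, lineterminator='\n')
    writer.writerow(next(reader))  # copy the header unchanged
    for row in reader:
        if not row:
            continue  # skip blank lines
        page_field = row[-1]
        if '-' in page_field:
            first, last = (int(p) for p in page_field.split('-'))
            for page in range(first, last + 1):
                # All other columns are repeated for every page
                writer.writerow(row[:-1] + [page])
        else:
            writer.writerow(row)  # single page or empty field


# In-memory sample with a shortened, illustrative layout
sample = io.StringIO(
    "Job ID,Asset,Date,Page No\n"
    "21879,Water,28/06/2018,26-29\n"
    "21878,Water,,\n"
    "21870,Water,28/06/2018,59\n"
)
result = io.StringIO()
split_page_rows(sample, result)
print(result.getvalue())
```

With real files, the two arguments would be file objects opened with newline='' as the csv module documentation recommends, for example open('CSV_File.csv', newline='').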