Here is my function:
def prepare_file(time, mkt):
    # renames file to corresponding market name
    global previous_time
    for file in glob.glob(os.getcwd()+'\Reports\*'):
        # if it's the most recently downloaded file
        if time > previous_time:
            previous_time = time
            # remove rows for properties that have not changed status
            sheet = pyexcel.get_sheet(file_name=file)
            for row in sheet:
                if row[1] in changed_addresses:
                    pass
                else:
                    del row
            # save file as correct name
            sheet.save_as(
                os.getcwd() + '\\Reports\\' + mkt[0] + '.csv'
            )
            os.remove(file)
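The idea is to find the most recently downloaded file in a directory, open it, remove every row whose address is not in the changed_addresses list, and save it under the name given by the string in the mkt list.
Everything works except deleting the rows. The loop iterates over them correctly and recognizes when a row should be deleted, but the output file still contains all the rows that should be gone.
Is del row the correct command for this situation?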
Answer 0 (score: 0)
Using csv
I think this should work:
import csv
import os
import glob

def prepare_file(time, mkt):
    # renames file to corresponding market name
    global previous_time
    for file in glob.glob(os.getcwd() + '\\Reports\\*'):
        # if it's the most recently downloaded file
        if time > previous_time:
            previous_time = time
            out_path = os.getcwd() + '\\Reports\\' + mkt[0] + '.csv'
            # copy only the rows for properties that have changed status
            with open(file, 'r', newline='') as fin, \
                    open(out_path, 'w', newline='') as fout:
                reader = csv.reader(fin)
                writer = csv.writer(fout)
                for row in reader:
                    # keep a row only if its address is in changed_addresses
                    if row[1] in changed_addresses:
                        writer.writerow(row)
            # both files are closed when the with block exits
            # remove original
            os.remove(file)
This first opens the data file named file, then saves the filtered rows under the new name.
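To try the filtering step on its own, here is a minimal sketch that runs the same csv.reader / csv.writer loop on in-memory text instead of files. The helper name keep_changed_rows and the sample addresses are made up for illustration, and it assumes, as in the question, that the address sits in the second column.

```python
import csv
import io


def keep_changed_rows(csv_text, changed_addresses):
    """Return csv_text reduced to the rows whose second column is in changed_addresses."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # row[1] is assumed to hold the address, as in the question
        if row[1] in changed_addresses:
            writer.writerow(row)
    return out.getvalue()


# Hypothetical sample data: id, address, status
sample = "1,10 Main St,sold\n2,22 Oak Ave,active\n3,5 Elm Rd,pending\n"
result = keep_changed_rows(sample, {"10 Main St", "5 Elm Rd"})
print(result)
```

Using a set for changed_addresses keeps each membership test cheap when the list of addresses is long; a list would work too.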
Answer 1 (score: 0)
With pyexcel, you need to use the following syntax:
del sheet.row[index] or del sheet.row[index1, index2, index3]
Here is the sample code:
def prepare_file(time, mkt):
    # renames file to corresponding market name
    global previous_time
    for file in glob.glob(os.getcwd() + '\\Reports\\*'):
        # if it's the most recently downloaded file
        if time > previous_time:
            previous_time = time
            # remove rows for properties that have not changed status
            sheet = pyexcel.get_sheet(file_name=file)
            indices_to_be_removed = []  # <-
            for index, row in enumerate(sheet):
                if row[1] in changed_addresses:
                    pass
                else:
                    # remember the row index instead of calling del row
                    indices_to_be_removed.append(index)  # <-
            # delete all collected rows from the sheet in one go
            del sheet.row[indices_to_be_removed]  # <-
            # save file as correct name
            sheet.save_as(
                os.getcwd() + '\\Reports\\' + mkt[0] + '.csv'
            )
            os.remove(file)
Alternatively, you can write a filter. The advantage of this alternative is that it can handle HUGE data files with a custom memory footprint:
def filter_rows(file_name, changed_addresses):
    # lazily yield only the rows whose address has changed status
    for row in pyexcel.iget_array(file_name=file_name):
        if row[1] in changed_addresses:
            yield row

def prepare_file(time, mkt):
    # renames file to corresponding market name
    global previous_time
    for file in glob.glob(os.getcwd() + '\\Reports\\*'):
        # if it's the most recently downloaded file
        if time > previous_time:
            previous_time = time
            # remove rows for properties that have not changed status
            pyexcel.isave_as(array=filter_rows(file, changed_addresses),
                             dest_file_name=os.getcwd() + '\\Reports\\' + mkt[0] + '.csv')
            os.remove(file)
But remember to call the following at the end of your code. It closes all csv file handles:
pyexcel.free_resources()
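As background on why the original del row had no effect: inside a for loop, del row only unbinds the loop variable; it does not touch the container being iterated over. The sketch below illustrates this with a plain list of lists standing in for the sheet (no pyexcel involved; the sample data and the helper remove_unchanged are hypothetical). It then removes rows by index, going from the end so that earlier deletions do not shift the indices still to be deleted.

```python
# A plain list of lists stands in for the sheet (hypothetical data).
rows = [
    ["1", "10 Main St", "sold"],
    ["2", "22 Oak Ave", "active"],
    ["3", "5 Elm Rd", "pending"],
]
changed_addresses = {"10 Main St", "5 Elm Rd"}

# Attempt 1: `del row` only removes the local name `row`; `rows` is untouched.
for row in rows:
    if row[1] not in changed_addresses:
        del row
count_after_del = len(rows)  # nothing was removed from `rows`


def remove_unchanged(table, changed):
    """Delete, in place, every row whose second column is not in `changed`."""
    to_remove = [i for i, r in enumerate(table) if r[1] not in changed]
    # Delete from the end so the remaining indices stay valid.
    for i in reversed(to_remove):
        del table[i]
    return table


# Attempt 2: collect indices first, then delete by index.
remove_unchanged(rows, changed_addresses)
print(rows)
```

Collecting the indices first and deleting afterwards also avoids modifying the container while it is being iterated over.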