Question

Here is my function:

    def prepare_file(time, mkt):
        # renames file to corresponding market name
        global previous_time
        for file in glob.glob(os.getcwd()+'\Reports\*'):
            # if it's the most recently downloaded file
            if time > previous_time:
                previous_time = time
                # remove rows for properties that have not changed status
                sheet = pyexcel.get_sheet(file_name=file)
                for row in sheet:
                    if row[1] in changed_addresses:
                        pass
                    else:
                        del row
                # save file as correct name
                sheet.save_as(
                    os.getcwd() + '\\Reports\\' + mkt[0] + '.csv'
                )
                os.remove(file)

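The idea is to find the most recently downloaded file in the directory, open it, remove every row whose address is not in the list changed_addresses, and save the result under the name given by the string in the list mkt.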
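One alternative way to pick out the most recently downloaded file (not the asker's approach) is to compare file modification times instead of tracking a global previous_time. This is a minimal sketch that assumes the newest download is the file with the latest modification time. most_recent_file is a hypothetical helper name:

```python
import glob
import os


def most_recent_file(directory):
    """Return the path of the most recently modified file in `directory`,
    or None if it contains no files (hypothetical helper)."""
    paths = [p for p in glob.glob(os.path.join(directory, '*'))
             if os.path.isfile(p)]
    if not paths:
        return None
    # os.path.getmtime returns the last-modification time as a float timestamp
    return max(paths, key=os.path.getmtime)
```

With this helper, the loop over every file in Reports is no longer needed. You would call most_recent_file once and process only that path.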

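Everything works except deleting the rows. The loop iterates over them correctly and recognizes when a row should be deleted, but the output file still contains all the rows that should be gone.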

Is del row not the correct command for this situation?
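The reason the rows survive can be shown with a plain list standing in for the sheet. The sample rows and addresses below are made up. del row only unbinds the loop variable and never touches the container the row came from:

```python
rows = [['1', '12 Oak St'], ['2', '9 Elm St'], ['3', '4 Pine Rd']]
changed_addresses = ['9 Elm St']

for row in rows:
    if row[1] not in changed_addresses:
        del row  # removes the name `row` only; `rows` is unchanged

print(len(rows))  # 3

# Building a new list of the rows to keep is what actually drops the others
rows = [row for row in rows if row[1] in changed_addresses]
print(rows)  # [['2', '9 Elm St']]
```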

Answer 1

Using the csv module, I think this should work:

    import csv
    import os
    import glob

    def prepare_file(time, mkt):
        # renames file to corresponding market name
        global previous_time
        for file in glob.glob(os.getcwd() + r'\Reports\*'):
            # if it's the most recently downloaded file
            if time > previous_time:
                previous_time = time
                out_name = os.getcwd() + '\\Reports\\' + mkt[0] + '.csv'
                # keep only rows for properties that have changed status;
                # the with-block closes both files even if an error occurs
                with open(file, 'r', newline='') as fin, \
                        open(out_name, 'w', newline='') as fout:
                    reader = csv.reader(fin)
                    writer = csv.writer(fout)
                    for row in reader:
                        if row[1] in changed_addresses:
                            writer.writerow(row)

                # remove original (both files are closed at this point)
                os.remove(file)

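This opens the data file named by file, writes only the matching rows to a new file named after the market, and then deletes the original.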
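The filtering step can also be pulled out into a function that takes already-opened file objects, which makes it easy to test with in-memory text. This is a minimal sketch. filter_csv and its address_col parameter are hypothetical names, and column 1 is assumed to hold the address, as in the question's row[1]:

```python
import csv
import io


def filter_csv(src, dst, keep_addresses, address_col=1):
    """Copy rows from the csv file object `src` to `dst`, keeping only rows
    whose value in column `address_col` is in `keep_addresses`.
    Returns the number of rows written (hypothetical helper)."""
    keep = set(keep_addresses)  # set membership is O(1) per row
    reader = csv.reader(src)
    writer = csv.writer(dst)
    written = 0
    for row in reader:
        # skip short or blank rows instead of raising IndexError
        if len(row) > address_col and row[address_col] in keep:
            writer.writerow(row)
            written += 1
    return written


src = io.StringIO('1,12 Oak St\n2,9 Elm St\n3,4 Pine Rd\n')
dst = io.StringIO(newline='')
count = filter_csv(src, dst, ['9 Elm St', '4 Pine Rd'])
print(count)                 # 2
print(repr(dst.getvalue()))  # '2,9 Elm St\r\n3,4 Pine Rd\r\n'
```

Inside prepare_file you would pass the fin and fout objects opened there. Converting changed_addresses to a set keeps each lookup cheap when the address list is long.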

Answer 2

With pyexcel, you need to use this syntax:

    del sheet.row[index]

or, to delete several rows at once:

    del sheet.row[index1, index2, index3]

Here is the example code:

    import glob
    import os

    import pyexcel

    def prepare_file(time, mkt):
        # renames file to corresponding market name
        global previous_time
        for file in glob.glob(os.getcwd() + r'\Reports\*'):
            # if it's the most recently downloaded file
            if time > previous_time:
                previous_time = time
                # remove rows for properties that have not changed status
                sheet = pyexcel.get_sheet(file_name=file)
                indices_to_be_removed = [] # <-
                for index, row in enumerate(sheet):
                    if row[1] not in changed_addresses:
                        indices_to_be_removed.append(index) # <-
                # delete all collected rows in one go, after the loop
                if indices_to_be_removed:
                    del sheet.row[indices_to_be_removed] # <-
                # save file as correct name
                sheet.save_as(
                    os.getcwd() + '\\Reports\\' + mkt[0] + '.csv'
                )
                os.remove(file)
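pyexcel handles the multi-row deletion itself, but the collect-then-delete idea behind this answer can be illustrated with a plain list of rows. The sample data is made up, and delete_rows is a hypothetical helper. Indices are recorded in a first pass and removed afterwards, highest first, so earlier deletions never shift the positions of rows still waiting to be deleted:

```python
def delete_rows(rows, indices):
    """Delete the rows at `indices` from the list `rows` in place.
    Going from the highest index down keeps the remaining indices valid."""
    for i in sorted(set(indices), reverse=True):
        del rows[i]


rows = [['1', '12 Oak St'], ['2', '9 Elm St'], ['3', '4 Pine Rd']]
changed_addresses = {'9 Elm St'}

# First pass: only record which rows should go, without mutating `rows`
indices_to_be_removed = [i for i, row in enumerate(rows)
                         if row[1] not in changed_addresses]
print(indices_to_be_removed)  # [0, 2]

delete_rows(rows, indices_to_be_removed)
print(rows)  # [['2', '9 Elm St']]
```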

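Alternatively, you can write a filter. The advantage of this approach is that it can process HUGE data files without loading the whole file into memory: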

    import glob
    import os

    import pyexcel

    def filter_rows(file_name, changed_addresses):
        # lazily yield only the rows whose address has changed status
        for row in pyexcel.iget_array(file_name=file_name):
            if row[1] in changed_addresses:
                yield row


    def prepare_file(time, mkt):
        # renames file to corresponding market name
        global previous_time
        for file in glob.glob(os.getcwd() + r'\Reports\*'):
            # if it's the most recently downloaded file
            if time > previous_time:
                previous_time = time
                # remove rows for properties that have not changed status
                pyexcel.isave_as(array=filter_rows(file, changed_addresses),
                                 dest_file_name=os.getcwd() + '\\Reports\\' + mkt[0] + '.csv')
                # release the handle left open by iget_array so that the
                # source file can be deleted (Windows refuses to remove open files)
                pyexcel.free_resources()
                os.remove(file)
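The filter-as-generator idea does not depend on pyexcel. Here is a minimal stdlib sketch of the same streaming pipeline using the csv module. keep_changed and stream_filter are hypothetical names, and the address is again assumed to be in column 1:

```python
import csv
import io


def keep_changed(rows, changed_addresses, address_col=1):
    """Lazily yield only the rows whose address is in `changed_addresses`."""
    for row in rows:
        if len(row) > address_col and row[address_col] in changed_addresses:
            yield row


def stream_filter(src, dst, changed_addresses):
    """Read `src` row by row and write the kept rows straight to `dst`."""
    csv.writer(dst).writerows(keep_changed(csv.reader(src), changed_addresses))


src = io.StringIO('1,12 Oak St\n2,9 Elm St\n3,4 Pine Rd\n')
dst = io.StringIO(newline='')
stream_filter(src, dst, {'12 Oak St'})
print(repr(dst.getvalue()))  # '1,12 Oak St\r\n'
```

csv.reader pulls rows from the file on demand, and writerows consumes the generator one row at a time, so only the current row needs to be held in memory.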

But remember to call this at the end of your code. It closes all the csv file handles:

    pyexcel.free_resources()

Unable to delete rows from a .csv file with pyexcel
