Question

I have a Python program that reads an Excel document. I need to keep only the first occurrence of certain column combinations. For example:


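  Row |  A  |  B
  ----+-----+-----
   1  | 200 | 201
   2  | 200 | 202
   3  | 200 | 201
   4  | 200 | 203
   5  | 201 | 201
   6  | 201 | 202
  ... | ... | ...

I want to delete/skip the duplicate found in row 3 (200 | 201, which repeats row 1) and write the remaining rows to a CSV file. This is the function I have tried so far, but it does not work.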

Answer 1

You use mylist = [] twice, and appending single values to it makes this difficult. It should be something like this:

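# Runs inside the loop over xls.sheets(), where sheet and number_of_rows are defined
mylist = []
for row in range(1, number_of_rows):  # row 0 is the header
    # Store each (A, B) pair as a tuple so it is hashable and can go into a set
    mylist.append((sheet.cell_value(row, 0), sheet.cell_value(row, 1)))

myset = set(mylist)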

Note that a set is unordered. If you need the results in their original order, please also check this.
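To make the ordering point concrete, here is a minimal sketch that keeps only the first occurrence of each (A, B) pair while preserving row order, using a separate seen set for fast membership checks. The sample data is the table from the question, hard-coded instead of read with xlrd, and first_occurrences is a hypothetical helper name.

```python
def first_occurrences(pairs):
    """Return pairs in their original order, keeping only the first occurrence of each."""
    seen = set()      # pairs already kept; set lookups are O(1) on average
    result = []
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            result.append(pair)
    return result


# The (A, B) values from the example table in the question
rows = [(200, 201), (200, 202), (200, 201), (200, 203), (201, 201), (201, 202)]

print(first_occurrences(rows))
# [(200, 201), (200, 202), (200, 203), (201, 201), (201, 202)]
```

The third row, (200, 201), is dropped because the same pair was already seen in row 1.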

Answer 2

This works for me in Python 2.7:

import xlrd

def validateExcel(filename):
    xls = xlrd.open_workbook(filename)
    for sheet in xls.sheets():
        number_of_rows = sheet.nrows
        mylist = []
        # Skip the header row (row 0) and collect the (A, B) pairs as tuples
        for row in range(1, number_of_rows):
            mylist.append((sheet.cell_value(row, 0), sheet.cell_value(row, 1)))
        # Deduplicate with a set, then restore the original order by sorting
        # each unique pair on the index of its first occurrence in mylist
        myset = sorted(set(mylist), key=mylist.index)
        # Note: this returns inside the loop, so only the first sheet is processed
        return myset
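sorted(set(mylist), key=mylist.index) restores the original order, but every mylist.index call scans the list, so the sort is quadratic in the number of rows. On Python 3.7+, where dicts preserve insertion order, dict.fromkeys gives the same first-occurrence order in a single pass. A minimal comparison on the question's sample pairs:

```python
pairs = [(200, 201), (200, 202), (200, 201), (200, 203), (201, 201), (201, 202)]

# The approach above: dedupe with a set, then sort by each pair's first index
via_sorted = sorted(set(pairs), key=pairs.index)

# Single-pass alternative: dict keys are unique and keep insertion order (Python 3.7+)
via_dict = list(dict.fromkeys(pairs))

print(via_sorted == via_dict)  # True
print(via_dict)
# [(200, 201), (200, 202), (200, 203), (201, 201), (201, 202)]
```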

Answer 3

Here is my solution. It removes the duplicates and creates a new file without them.

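The approach described here (drop the duplicates, then write what is left to a new file) can be sketched with the standard csv module. This assumes the rows have already been read into Python lists, for example with xlrd as in the other answers; write_unique_rows is a hypothetical helper, and it takes an open file object so any writable text file works.

```python
import csv
import io


def write_unique_rows(rows, out_file, header=None):
    """Write each distinct row once, in first-seen order, to a CSV file object."""
    writer = csv.writer(out_file)
    if header is not None:
        writer.writerow(header)
    seen = set()
    for row in rows:
        key = tuple(row)          # lists are unhashable, so remember rows as tuples
        if key not in seen:
            seen.add(key)
            writer.writerow(row)


# Example with an in-memory buffer; for a real file use something like
# open("output.csv", "w", newline="") (the file name is hypothetical)
buffer = io.StringIO()
rows = [[200, 201], [200, 202], [200, 201], [200, 203], [201, 201], [201, 202]]
write_unique_rows(rows, buffer, header=["A", "B"])
print(buffer.getvalue())
```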

Answer 4

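This appends each row (called sublist here) to mylist if it is not already in it, which gives you a deduplicated list in the order the rows appear in the xlsx file. If you can, it may be worth looking at the pandas library. If not, this should help: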

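import xlrd

# Note: xlrd 2.0 and later only read .xls files; for .xlsx use an older xlrd or another reader such as openpyxl
def validateExcel(filename):

    xls = xlrd.open_workbook(filename)

    for sheet in xls.sheets():

        number_of_rows = sheet.nrows
        number_of_columns = sheet.ncols

        mylist = []

        # Skip the header row and build each row as a list of all its cell values
        for row in range(1, number_of_rows):
            sublist = [sheet.cell_value(row, col) for col in range(number_of_columns)]

            # Keep only the first occurrence of each full row
            if sublist not in mylist:
                mylist.append(sublist)

        print(mylist)

    # mylist is reset for every sheet, so this returns the last sheet's rows
    return mylist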
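sublist not in mylist compares whole rows and scans the list on every check. Since the question asks for the first occurrence of a particular column combination, a variant can build the comparison key from selected columns only while still keeping every column of the row. A minimal sketch, where dedupe_by_columns and the third sample column are hypothetical:

```python
def dedupe_by_columns(rows, key_columns):
    """Keep the first row for each distinct combination of values in key_columns.

    rows: iterable of sequences (e.g. lists built from sheet.cell_value)
    key_columns: indexes of the columns that define a duplicate
    """
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[i] for i in key_columns)   # hashable key from the selected columns
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Columns A and B decide uniqueness; the third column is made-up extra data
rows = [
    [200, 201, "x"],
    [200, 202, "y"],
    [200, 201, "z"],   # same (A, B) as the first row, so it is skipped
    [200, 203, "w"],
]
print(dedupe_by_columns(rows, key_columns=(0, 1)))
# [[200, 201, 'x'], [200, 202, 'y'], [200, 203, 'w']]
```

Using a set of tuple keys keeps each membership check constant-time on average, instead of rescanning the kept rows.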

Edit:

If you have an xlsx file with multiple sheets, you can store the deduplicated row data in a dict keyed by sheet name, then pass that dict to a CSV-writing function:

import xlrd

def validateExcel(filename):

    outputDict = {}

    xls = xlrd.open_workbook(filename)

    sheetCount = 0

    for sheet in xls.sheets():

        number_of_rows = sheet.nrows
        number_of_columns = sheet.ncols

        # Fall back to the sheet's position if it has no name
        sheetname = sheet.name
        if not sheetname:
            sheetname = str(sheetCount)

        outputDict[sheetname] = []

        # Skip the header row and keep only the first occurrence of each full row
        for row in range(1, number_of_rows):
            sublist = [sheet.cell_value(row, col) for col in range(number_of_columns)]

            if sublist not in outputDict[sheetname]:
                outputDict[sheetname].append(sublist)

        print(outputDict[sheetname])

        sheetCount += 1

    return outputDict

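import csv

# Goes through the generated dictionary and writes each sheet's rows to its own CSV file
def writeToFiles(generatedDictionary):

    for key in generatedDictionary:
        # Open for writing (Python 3); newline="" stops csv from adding blank lines on Windows
        with open(key + ".csv", "w", newline="") as csvFile:
            writer = csv.writer(csvFile)
            writer.writerows(generatedDictionary[key])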

If you can use pandas, something like this could work:

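import pandas as pd

# filename is the path to the workbook; pd.ExcelFile exposes sheet_names and parse()
xls = pd.ExcelFile(filename)

for name in xls.sheet_names:

    sheetDataFrame = xls.parse(name)
    # keep="first" (the default) keeps the first occurrence of each duplicate;
    # pass subset=[...] to compare only specific columns
    filtered = sheetDataFrame.drop_duplicates()

    # index=False leaves the DataFrame's row index out of the CSV
    filtered.to_csv(name + ".csv", index=False)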

Remove duplicate rows with a specific column combination from Excel using Python
