将列表中的数据排序为xlsx文件的列

时间:2017-08-09 13:13:55

标签: python python-3.x list pandas openpyxl

我有一个看起来像这样的列表。 (数据来自几个xlsx文件):

[['A B 10', 2, 'A B 10', 3, 1, AC], ['A B 104', 3, 'A B 104', 2, -1, 'AC']]
[['D B 126', 3, 'D B 126', 2, -1, 'EFG 1'], ['D B 15', 3, 'D B 15', 2, -1, 'EFG 1']
[]
[]
[['D B 544', 2, 'D B 544', 1, -1, 'EFG 11'], ['D B 152', 3, 'D B 152', 2, -1, 'EFG 11'], ['D B 682', 3, 'D B 682', 2, -1, 'EFG 11']

我想将此信息放入新的xlsx文件中,但首先我需要将数据排序为行和列。我希望每个子列表中的所有第一个字符串都添加到第一列,第二列中的数字等等。所以列是逗号的位置。我也不希望列表有子列表,所以一切都应该在同一个列表中。像这样:

['A B 10', 2, 'A B 10', 3, 1, AC, 
 'A B 104', 3, 'A B 104', 2, -1, 'AC'
 'D B 126', 3, 'D B 126', 2, -1, 'EFG 1'
 'D B 15', 3, 'D B 15', 2, -1, 'EFG 1'
 'D B 544', 2, 'D B 544', 1, -1, 'EFG 11'
 'D B 152', 3, 'D B 152', 2, -1, 'EFG 11' 
 'D B 682', 3, 'D B 682', 2, -1, 'EFG 11']

这是我的代码走了多远:

import pandas as pd
from openpyxl import Workbook, load_workbook
import glob
from openpyxl.utils.dataframe import dataframe_to_rows

numbers = []
os.chdir(r'C:Myfolder')
files = glob.glob('*.xlsx')
print(files)
for file in files: #Getting the data from xlsx files and to the numbers-list
    df = pd.read_excel(file)
    m = (df.iloc[:,4] - df.iloc[:,1]) != 0
    pos = [0,1,3,4,6,7]
    numbers = (df.loc[m, df.columns[pos]].values.tolist())
    print(numbers)
excel_input = load_workbook(rapp) #Going from using pandas and dataframes to working with openpyxl (Just because I am not that familiar with pandas).
ws = excel_input.active
for r in dataframe_to_rows(df, index=True, header=True):
    ws.append(r)
else:
    pass

col1 = [] #Creating open lists to put the data into columns
col2 = []
col4 = []
col5 = []
col7 = []
col8 = []

mainlist = []
try:
    for row in numbers: #Putting the data into the columns lists
        col1.append(ws.cell(row=row, column=1).value) #This is wrong, and is throwing the error
        col2.append(ws.cell(row=row, column=2).value) #Wrong
        col4.append(ws.cell(row=row, column=4).value) #Wrong
        col5.append(ws.cell(row=row, column=5).value) #Wrong
        col7.append(ws.cell(row=row, column=7).value) #Wrong
        col8.append(ws.cell(row=row, column=8).value) #Wrong
except AttributeError:
    logging.error('Something is wrong')
finally:
    columns = zip(col1, col2, col4, col5, col7, col8) #Zipping the lists
    for column in columns:
        mainlist.append(column)
  

回溯:           col1.append(ws.cell(row = row,column = 1).value)       文件“C:\ Python3 \ lib \ site-packages \ openpyxl \ worksheet \ worksheet.py”,行   306,在单元格中           如果行< 1或列< 1:       TypeError:unorderable类型:list()< INT()

有没有人知道怎么做而不会收到错误?我在代码中评论了错误。

编辑:

通过使用@ stovfl的方法,我能够将一些数据放入xlsx文件中,但由于我的列表是列表列表,因此只有最后一个列表被添加到我的xlsx文件中。 我用了这段代码:

from openpyxl import load_workbook

report = load_workbook(r"C:\Myworkbook.xlsx")
ws = report.create_sheet('My sheet')
for _list in numbers:
    for row_data in _list:
        ws.append([row_data])
        print(row_data)
report.save(r"C\Myworkbook.xlsx")

印刷品:

A B 10
2
A B 10
3
1
AC

xlsx文件中的输出:

Output Myworkbook.xlsx

问题是只添加了最后一个列表,而不是整个列表列表,我希望输出看起来像这样:

How I need it to be Myworkbook.xlsx

3 个答案:

答案 0 :(得分:1)

  

问题:...只将最后一个列表添加到我的xlsx文件

第一个 for loop是您的for file in files: 我在下面更新了我的代码。

  

问题:如何在不收到错误的情况下执行此操作?

  1. 使用openpyxl编写列表列表的解决方案,例如

    newWorkbook = False
    newWorksheet = False
    
    if newWorkbook:
        from openpyxl import Workbook
        wb = Workbook()
        # Select First Worksheet
        ws = wb.worksheets[0]
    else:
        from openpyxl import load_workbook
        wb = load_workbook("mySortData.xlsx")
        if newWorksheet:
            # Create a New Worksheet in this Workbook
            ws = wb.create_chartsheet('Sheet 2')
        else:
            # Select a Worksheet by Name in this Workbook
            ws = wb['Sheet']
    
    for file in files:  # Getting the data from xlsx files and to the numbers-list
        df = pd.read_excel(file)
        m = (df.iloc[:, 4] - df.iloc[:, 1]) != 0
        pos = [0, 1, 3, 4, 6, 7]
        numbers = (df.loc[m, df.columns[pos]].values.tolist())
    
        # numbers is a List[Row Data] of List[Columns]
        # Iterate List of Row Data
        for row_data in numbers:
            ws.append(row_data)
    
    wb.save("mySortData.xlsx")
    
      

    <强>输出
      enter image description here

  2. 解决方案将您的列表列表直接写入CSV,例如:

    # Data == List of List
    data = [[['A B 10', 2, 'A B 10', 3, 1, 'AC'], ['A B 104', 3, 'A B 104', 2, -1, 'AC']],
    [['D B 126', 3, 'D B 126', 2, -1, 'EFG 1'], ['D B 15', 3, 'D B 15', 2, -1, 'EFG 1']],
    [],
    [],
    [['D B 544', 2, 'D B 544', 1, -1, 'EFG 11'], ['D B 152', 3, 'D B 152', 2, -1, 'EFG 11'], ['D B 682', 3, 'D B 682', 2, -1, 'EFG 11']],
    ]
    
    import csv
    # Write to File
    with open('Output.csv', 'w') as csv_file:
        writer = csv.writer(csv_file)
        for _list in data:
            for row_data in _list:
                writer.writerow(row_data)
    
      

    Qutput

    A B 10,2,A B 10,3,1,AC
    A B 104,3,A B 104,2,-1,AC
    D B 126,3,D B 126,2,-1,EFG 1
    D B 15,3,D B 15,2,-1,EFG 1
    D B 544,2,D B 544,1,-1,EFG 11
    D B 152,3,D B 152,2,-1,EFG 11
    D B 682,3,D B 682,2,-1,EFG 11  
    
  3. 使用Python测试:3.4.2 - openpyxl:2.4.1

答案 1 :(得分:0)

我不知道我是否理解得很好,但如果你有一个如下列表:

l = [rows, rows, rows, ...]

为了创建数据框,您可以迭代列表l的每个元素,例如:

df = pd.DataFrame()
for rows in l:
    for row in rows:
        df = pd.concat([df, row])

在您的情况下,它给出了以下输出:

    0       1   2       3   4   5
0   A B 10  2   A B 10  3   1   AC
1   A B 104 3   A B 104 2   -1  AC
0   D B 126 3   D B 126 2   -1  EFG 1
1   D B 15  3   D B 15  2   -1  EFG 1
0   D B 544 2   D B 544 1   -1  EFG 11
1   D B 152 3   D B 152 2   -1  EFG 11
2   D B 682 3   D B 682 2   -1  EFG 11

答案 2 :(得分:0)

看起来您可以很好地将数据输入Excel文件。那么为什么不使用你使用的东西:

col1 = [cell.value for cell in ws['A']]

ws.iter_cols(min_col=x, max_col=y)

如果这不是你想要的,那么请重新提问,指出问题所在。