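Nested iteration in pandas

Time: 2018-08-30 23:26:38

Tags: python pandas

I have a file with about 25 sheets, each containing 5-30 columns whose headers are system names. I want to iterate over a list of about 170 systems (the list is on one sheet of the main file) and, for each system, search every tab for a column whose header matches that system. The code below works well on the first iteration, but for some reason, after it has gone through all the sheets and moved on to the second system, it pulls the sheet name instead of the second system name. Does anyone see what I am doing wrong?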

import pandas as pd

matrix = pd.ExcelFile('file')
names_tab = pd.read_excel(matrix, sheet_name='Name_Test')

sheets_list = {}

for (y, sysRows) in names_tab.iterrows():
    print(sysRows['header'])

    for sheets in matrix.sheet_names[1:]:
        sheets_list['{}'.format(sheets)] = pd.read_excel(matrix, sheet_name='{}'.format(sheets), skiprows=2)
        print(sheets)

        for column in sheets_list[sheets]:

            if column == sysRows['header']:
                # Series.iteritems() was removed in pandas 2.0; items() is the equivalent
                for idx, row in sheets_list[sheets][column].items():
                    if sheets_list[sheets].iloc[idx][column] == 'x':
                        print('{} has X in row {} column {} on sheet {}'
                              .format(sysRows['header'], idx, column, sheets))
                    elif sheets_list[sheets].iloc[idx][column] == 'X':
                        print('{} has X in row {} column {} on sheet {}'
                              .format(sysRows['header'], idx, column, sheets))
                print(column + ' works')
            else:
                print(column + ' doesnt work')

1 Answer:

Answer 0 (score: 0)

I'm not quite sure this gives the same result you are after, but hopefully it is a starting point (I doubt you need 4 for loops):

import pandas as pd
import numpy as np

names_tab = pd.DataFrame({'header':['System1','System2','System3'], 'some_other_column':['foo','bar','foobar']})
sheet1 = pd.DataFrame({'System1':['x','X'], 'System2':['x','X'], 'System4':['X','x']})
sheet2 = pd.DataFrame({'System2':['X','x'], 'System8':['x','x'], 'System3':['x','X']})
sheets = [sheet1, sheet2]
for i, sheet in enumerate(sheets):
    print("Sheet", i + 1)
    common_columns = list(set(sheet.columns.tolist()).intersection(names_tab['header'].tolist()))
    df = sheet[common_columns]
    print("Here are all the 'x' values in Sheet", i + 1)
    print(df.where(df == 'x'))
    # To get your behavior
    # np.where returns a (row_indices, column_indices) pair; zip them to walk cell by cell
    positions = np.where(df.values == 'x')
    for idx, col in zip(*positions):
        print('{} has x in row {} column {} on sheet {}'.format(df.columns[col], idx, col, str(i+1)))
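
Building on the snippet above, here is a minimal sketch of the same idea with each sheet visited once and the 'x'/'X' comparison made case-insensitive, as the question asks. The helper name find_marks and the sample data are hypothetical. The sketch assumes the sheets arrive as a dict mapping sheet name to DataFrame, which is what pd.read_excel(..., sheet_name=None) returns; with the real workbook you would probably also pass skiprows=2 and drop the sheet that holds the system list.

```python
import pandas as pd


def find_marks(sheets, systems, mark="x"):
    """Return (system, sheet_name, row_label) for each cell equal to `mark`, ignoring case."""
    wanted = set(systems)
    hits = []
    for sheet_name, df in sheets.items():
        # Only look at columns whose header is one of the listed systems.
        for system in [c for c in df.columns if c in wanted]:
            # Normalise to lowercase text so 'x' and 'X' compare equal.
            col = df[system].astype(str).str.strip().str.lower()
            for row_label in col[col == mark].index:
                hits.append((system, sheet_name, row_label))
    return hits


# Hypothetical stand-in for the dict that
# pd.read_excel(path, sheet_name=None, skiprows=2) would return.
sheets = {
    "Sheet1": pd.DataFrame({"System1": ["x", None], "System4": ["X", "x"]}),
    "Sheet2": pd.DataFrame({"System2": ["X", "x"], "System3": [None, "X"]}),
}
systems = ["System1", "System2", "System3"]

for system, sheet_name, row_label in find_marks(sheets, systems):
    print("{} has X in row {} on sheet {}".format(system, row_label, sheet_name))
```

Reading every sheet once up front avoids re-parsing the workbook for each of the roughly 170 systems, which the nested loops in the question do on every pass.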

Perhaps you could provide a Minimal, Complete, and Verifiable example.