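I have a workbook with about 25 sheets, each containing 5-30 columns whose headers are system names. I want to iterate over a list of about 170 systems (the list lives on one sheet of the main file) and, for each system, search every tab for a column whose header matches that system. The code below works well on the first iteration, but for some reason, after it has gone through all the sheets and moved on to the second system, it pulls the sheet name instead of the second system name. Can anyone see what I'm doing wrong?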
import pandas as pd

matrix = pd.ExcelFile('file')
names_tab = pd.read_excel(matrix, sheet_name='Name_Test')
sheets_list = {}

for (y, sysRows) in names_tab.iterrows():
    print(sysRows['header'])
    # Every sheet after the first is re-read for each system
    for sheets in matrix.sheet_names[1:]:
        sheets_list['{}'.format(sheets)] = pd.read_excel(matrix, sheet_name='{}'.format(sheets), skiprows=2)
        print(sheets)
        for column in sheets_list[sheets]:
            if column == sysRows['header']:
                # Series.iteritems() was removed in pandas 2.0; items() is the replacement
                for idx, row in sheets_list[sheets][column].items():
                    if sheets_list[sheets].iloc[idx][column] == 'x':
                        print('{} has X in row {} column {} on sheet {}'
                              .format(sysRows['header'], idx, column, sheets))
                    elif sheets_list[sheets].iloc[idx][column] == 'X':
                        print('{} has X in row {} column {} on sheet {}'
                              .format(sysRows['header'], idx, column, sheets))
                print(column + ' works')
            else:
                print(column + ' doesnt work')
Answer 0 (score: 0)
I'm not entirely sure this produces the same result you're after, but hopefully it's a starting point (I doubt you need four for loops):
import pandas as pd
import numpy as np

# Stand-ins for the names tab and two of the sheets
names_tab = pd.DataFrame({'header': ['System1', 'System2', 'System3'], 'some_other_column': ['foo', 'bar', 'foobar']})
sheet1 = pd.DataFrame({'System1': ['x', 'X'], 'System2': ['x', 'X'], 'System4': ['X', 'x']})
sheet2 = pd.DataFrame({'System2': ['X', 'x'], 'System8': ['x', 'x'], 'System3': ['x', 'X']})
sheets = [sheet1, sheet2]

for i, sheet in enumerate(sheets):
    print("Sheet", i + 1)
    # Columns of this sheet that are also listed as systems
    common_columns = list(set(sheet.columns.tolist()).intersection(names_tab['header'].tolist()))
    df = sheet[common_columns]
    print("Here are all the 'x' values in Sheet", i + 1)
    print(df.where(df == 'x'))
    # To get your behavior (this matches lowercase 'x' only)
    positions = np.where(df.values == 'x')
    # np.where returns a (row_indices, column_indices) tuple, so pair them up with zip
    for idx, col in zip(*positions):
        print('{} has x in row {} column {} on sheet {}'.format(df.columns[col], idx, col, str(i + 1)))
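If you need both 'x' and 'X' to count, as in your original code, here is a rough sketch of one way to do it. The helper name `find_marks` and the sheet names `Sheet1`/`Sheet2` are made up for illustration, and it assumes each sheet has unique column headers. It takes a dict of sheet name to DataFrame, which is also what `pd.read_excel(..., sheet_name=None)` returns, so each sheet only has to be read once.

```python
import pandas as pd


def find_marks(sheets, systems, mark="x"):
    """Return (system, sheet_name, row_label) for each cell equal to `mark`, ignoring case.

    `find_marks` is a hypothetical helper, not part of pandas.
    Assumes column headers are unique within each sheet.
    """
    wanted = set(systems)
    hits = []
    for sheet_name, df in sheets.items():
        for column in df.columns:
            if column not in wanted:
                continue
            # Normalise the cells so 'x', 'X' and ' x ' all compare equal to the mark
            cells = df[column].astype(str).str.strip().str.lower()
            for row_label in cells[cells == mark].index:
                hits.append((column, sheet_name, row_label))
    return hits


# Sample data standing in for the real workbook. With the real file, something like
# pd.read_excel('file', sheet_name=None, skiprows=2) would give a similar dict
# (the names sheet would have to be dropped from it first).
systems = ["System1", "System2", "System3"]
sheets = {
    "Sheet1": pd.DataFrame({"System1": ["x", "X"], "System2": ["x", "X"], "System4": ["X", "x"]}),
    "Sheet2": pd.DataFrame({"System2": ["X", "x"], "System8": ["x", "x"], "System3": ["x", "X"]}),
}

hits = find_marks(sheets, systems)
for system, sheet_name, row_label in hits:
    print("{} has X in row {} on sheet {}".format(system, row_label, sheet_name))
```

Because the outer loop runs over sheets rather than systems, no sheet is re-read for each of the ~170 systems, and columns that are not in the system list are simply skipped.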