I am working with an Excel file that contains multiple (36) sheets, one per month_year (Sep 18 to Oct 15), and I read them into a dictionary.
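I need to set the names of all 14 columns at once, but I get ValueError: Length mismatch.
As the column counts further down show, some sheets contain only 13 columns.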
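The error itself can be reproduced on a tiny in-memory frame. This is only a sketch: the frame and the `new_names` list are made up for illustration (3 columns standing in for 13, 4 names standing in for 14).

```python
import pandas as pd

# Toy frame standing in for one of the 13-column sheets (3 columns for brevity).
df = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "c"])

# Hypothetical target names: one more name than the frame has columns.
new_names = ["w", "x", "y", "z"]

error_message = None
try:
    # Assigning a list of the wrong length to .columns raises ValueError.
    df.columns = new_names
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

So the column count has to match the length of the name list before the names are assigned.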
import pandas as pd

fileName = 'project_dropColumnICSv2.xlsx'
df = pd.ExcelFile(fileName)
sheetNames = df.sheet_names

# Read every sheet into a dict keyed by "the_<sheet name>"
vars_dict = {}
for sheetName in sheetNames:
    vars_dict["the_{0}".format(sheetName)] = pd.read_excel(fileName, sheet_name=sheetName, index_col=False)

# Collect the dict keys for the loops that follow
mykeys = []
for key, value in vars_dict.items():
    mykeys.append(key)
I tried just adding another column. The column count of each sheet is:
for mykey in mykeys:
    print("'{}' contains {} columns".format(mykey, len(vars_dict[mykey].columns)))
'the_Sep 18' contains 14 columns
'the_Aug 18' contains 14 columns
'the_Jul 18' contains 14 columns
'the_Jun 18' contains 14 columns
'the_May 18' contains 14 columns
'the_April 18' contains 14 columns
'the_March 18' contains 14 columns
'the_February 18' contains 13 columns
'the_January 18' contains 14 columns
'the_December 17' contains 13 columns
'the_November 17' contains 13 columns
'the_October 17' contains 13 columns
'the_September 17' contains 13 columns
'the_August 17' contains 14 columns
'the_July 17' contains 14 columns
'the_June 17' contains 14 columns
'the_May 17' contains 14 columns
'the_April 17' contains 14 columns
'the_MARCH 17' contains 14 columns
'the_February17' contains 14 columns
'the_January17' contains 14 columns
'the_December16' contains 14 columns
'the_November16' contains 14 columns
'the_October 16' contains 14 columns
'the_September' contains 14 columns
'the_August' contains 15 columns
'the_July' contains 14 columns
'the_June' contains 14 columns
'the_May' contains 14 columns
'the_April' contains 14 columns
'the_March' contains 13 columns
'the_February' contains 13 columns
'the_January' contains 13 columns
'the_December' contains 13 columns
'the_November' contains 14 columns
'the_October' contains 13 columns
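I used a for loop to change the column names, but the result had the wrong fields in the first columns; in short, the data was not aligned.
Assuming I have an array of my column names, how can I do this?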
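# Attempt at adding a column to the 13-column sheets.
# Note: 'Nan' is a literal string here, not a missing value (NaN).
for mykey in mykeys:
    if len(vars_dict.get(mykey).columns) == 13:
        vars_dict.get(mykey)['Another Column'] = 'Nan'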
P.S. One sheet contains 15 columns; I just need to drop its last column.
Answer 0 (score: 0)
I think you need the parameter sheet_name=None in read_excel to convert all sheets into an OrderedDict of DataFrames:
import numpy as np
import pandas as pd
fileName = 'project_dropColumnICSv2.xlsx'
dfs = pd.read_excel(fileName, sheet_name=None)
Then use a dictionary comprehension to check the number of columns, add the missing column with assign, and create a new dictionary:
dfs = {k: v.assign(New = np.nan) if len(v.columns) == 13 else v for k, v in dfs.items()}
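To see what the comprehension does without the Excel file, here is a minimal sketch on made-up data. The toy `dfs` dict, its keys, and the `target` width are assumptions for illustration; 2 and 3 columns play the role of 13 and 14.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the dict that read_excel(..., sheet_name=None) returns.
dfs = {
    "short": pd.DataFrame({"a": [1], "b": [2]}),
    "full": pd.DataFrame({"a": [1], "b": [2], "c": [3]}),
}
target = 3  # plays the role of 14

# assign returns a new DataFrame, so the frames in dfs are left unchanged.
padded = {
    k: v.assign(New=np.nan) if len(v.columns) == target - 1 else v
    for k, v in dfs.items()
}

widths = {k: len(v.columns) for k, v in padded.items()}
print(widths)
```

Note that assign appends the new column at the end, so this only lines up with a 14-name list if the missing column really is the last one.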
If you also need to change the keys:
dfs = {f'the_{k}': v.assign(New=np.nan)
       if len(v.columns) == 13
       else v for k, v in dfs.items()}
Then select each DataFrame by its key:
print(dfs['Sep 18'])  # use dfs['the_Sep 18'] if the keys were renamed as above
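The question also asks about the 15-column sheet and about setting all names from one array. A possible sketch that handles both, on toy data: the helper `normalise_columns`, the `names` list, and the sample frames are all hypothetical, and it assumes that any missing or surplus column is the last one.

```python
import numpy as np
import pandas as pd

def normalise_columns(df, names):
    """Return a copy of df with exactly len(names) columns, named by `names`."""
    n = len(names)
    out = df.iloc[:, :n].copy()        # drop surplus trailing columns
    for i in range(out.shape[1], n):   # pad missing trailing columns with NaN
        out[f"_pad_{i}"] = np.nan
    out.columns = names                # lengths now match, so this cannot raise
    return out

# Hypothetical names; 3 stands in for the 14 real column names.
names = ["id", "name", "amount"]

# Toy stand-ins for sheets with 13, 14 and 15 columns.
dfs = {
    "short": pd.DataFrame([[1, "x"]], columns=["c1", "c2"]),
    "full": pd.DataFrame([[1, "x", 2.5]], columns=["c1", "c2", "c3"]),
    "wide": pd.DataFrame([[1, "x", 2.5, "junk"]], columns=["c1", "c2", "c3", "extra"]),
}

result = {k: normalise_columns(v, names) for k, v in dfs.items()}
for key, frame in result.items():
    print(key, list(frame.columns))
```

If a 13-column sheet is missing a column in the middle rather than at the end, padding at the end shifts every later name by one, which might be the misalignment described in the question. In that case the sheets would need to be matched on their original headers instead.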