My input file is a JSON file:
{ "infile":"c:/tmp/cust-in-sample.xlsx",
"SheetName":"Sheet1",
"CleanColumns":[1,2],
"DeleteColumns":[3,5],
"outfile":"c:/tmp/out-cust-in-sample.csv"
}
I want to specify in the JSON which columns to clean and which to delete, but I am getting a pandas string error.
This is the code I am trying:
import json
import pandas as pd
import gzip
import shutil
import sys
zJsonFile = sys.argv[-1]
iCount = len(sys.argv)
if iCount == 2:
    print "json file path " ,zJsonFile
else:
    print "need a json file path ending the script"
    sys.exit()
with open(zJsonFile,'rb') as zTestJson:
    decoded = json.load(zTestJson)
#Parameterizing the code, reading each key from 'decoded' variable and putting it into another variable for the purpose
#of parameterizing
Infile = decoded.get('infile')
print Infile
Outfile = decoded.get('outfile')
print Outfile
Sheetname = decoded.get('SheetName')
print Sheetname
# this is a list
deletecols = decoded.get('DeleteColumns')
print deletecols
#this is a list
cleancols = decoded.get('CleanColumns')
print cleancols
input_sheet = pd.ExcelFile(Infile)
dfs = {}
for x in [Sheetname]:
    dfs[x] = input_sheet.parse(x)
print dfs
df = pd.DataFrame(dfs[x]) # Converting dict to dataframe
print df
deletecols = df.columns.values.tolist()
cleancols = df.columns.values.tolist()
for idx,item in enumerate(deletecols):
    df.pop(item)
#df.drop(df.columns[deletecols],axis=1,inplace=True)
#Cleaning the code
#cleancols=[]
for x in cleancols:
    df[x] = df[x].str.replace(to_replace = '"', value = '', regex = True)
    df[x] = df[x].str.replace(to_replace = "'", value = '', regex = True)
    df[x] = df[x].str.replace(to_replace = ",", value = '', regex = True)
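I tried df.pop and df.drop, but neither seems to work for me, and I have not managed to write a loop that goes through the listed columns and cleans them.
Any help is greatly appreciated!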
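
For reference, here is a minimal sketch of the behavior described above, not a definitive fix. It assumes the numbers in "CleanColumns" and "DeleteColumns" are 0-based column positions (the JSON does not say), and the function name clean_and_drop is made up for illustration. It avoids three likely sources of trouble in the posted code: the lists read from the JSON are overwritten with every column name before use, Series.str.replace takes pat and repl rather than to_replace and value, and the .str accessor fails on non-string columns.

```python
import pandas as pd


def clean_and_drop(df, clean_cols, delete_cols):
    """Strip quotes and commas from some columns, then drop others.

    clean_cols and delete_cols are lists of 0-based column positions
    (an assumption; the JSON in the question does not specify this).
    """
    # Resolve positions to labels before dropping anything, so that
    # removing a column cannot shift the remaining positions.
    clean_labels = [df.columns[i] for i in clean_cols]
    delete_labels = [df.columns[i] for i in delete_cols]

    out = df.copy()
    for label in clean_labels:
        # astype(str) sidesteps the ".str accessor" error on non-string
        # columns; note that it also turns missing values into "nan".
        out[label] = out[label].astype(str).str.replace("[\"',]", "", regex=True)
    return out.drop(columns=delete_labels)


# Small made-up frame standing in for the Excel sheet.
df = pd.DataFrame({
    "id": [1, 2],
    "name": ['"Ann"', "O'Neil"],
    "city": ["Paris, FR", "Rome"],
    "tmp": [0, 0],
})
result = clean_and_drop(df, clean_cols=[1, 2], delete_cols=[3])
```

With the real file, df would come from input_sheet.parse(Sheetname), and the two lists from decoded.get('CleanColumns') and decoded.get('DeleteColumns'). If the positions in the JSON are meant to be 1-based, subtract 1 from each before the lookup.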