Question

My input file is a JSON file:

{ "infile":"c:/tmp/cust-in-sample.xlsx",
   "SheetName":"Sheet1",
   "CleanColumns":[1,2],
   "DeleteColumns":[3,5],
   "outfile":"c:/tmp/out-cust-in-sample.csv"            
}

I want to specify in the JSON which columns to clean and which to delete, but I get a pandas string error.

This is the code I am trying:

import json
import pandas as pd
import gzip
import shutil
import sys

zJsonFile = sys.argv[-1]

iCount = len(sys.argv)


if iCount == 2:
    print "json file path " ,zJsonFile
else:
    print "need a json file path ending the script"
    sys.exit()


with open(zJsonFile,'rb') as zTestJson:
    decoded = json.load(zTestJson)

#Parameterizing the code, reading each key from 'decoded' variable and putting it into another variable for the purpose 
#of parameterizing

Infile = decoded.get('infile')
print Infile


Outfile = decoded.get('outfile')
print Outfile


Sheetname = decoded.get('SheetName')
print Sheetname

# this is a list
deletecols = decoded.get('DeleteColumns')
print deletecols

#this is a list
cleancols = decoded.get('CleanColumns')
print cleancols
input_sheet = pd.ExcelFile(Infile)                            
dfs = {}                                                     
for x in [Sheetname]:                          
    dfs[x] = input_sheet.parse(x)
    print dfs                                             

df = pd.DataFrame(dfs[x])             # Converting the parsed sheet stored in the dict to a dataframe
print df
deletecols = df.columns.values.tolist()
cleancols = df.columns.values.tolist()
for idx,item in enumerate(deletecols):
    df.pop(item)
#df.drop(df.columns[deletecols],axis=1,inplace=True)   
#Cleaning the code
#cleancols=[]
for x in cleancols:                                                                       
    df[x] = df[x].str.replace(to_replace = '"', value = '', regex = True)
    df[x] = df[x].str.replace(to_replace = "'", value = '', regex = True)
    df[x] = df[x].str.replace(to_replace = ",", value = '', regex = True)

I have tried df.pop and df.drop, but neither seems to work for me, and looping over the columns to clean my file does not work either.
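One possible reason the delete step misbehaves: in the code above, `deletecols` is reassigned to `df.columns.values.tolist()`, so the loop pops every column instead of only the ones listed in the JSON. Below is a minimal sketch of dropping columns by position. It assumes the numbers in `DeleteColumns` are 0-based positions (the question does not say), it is written for Python 3, and the helper name `drop_by_position` and the sample frame are made up for illustration.

```python
import pandas as pd


def drop_by_position(df, positions):
    """Return a copy of df without the columns at the given 0-based positions.

    Hypothetical helper; assumes `positions` comes from the JSON
    "DeleteColumns" list and is 0-based.
    """
    # Resolve the positions to column labels first, then drop by label.
    labels = df.columns[positions]
    return df.drop(columns=labels)


# Illustrative data only; the real frame would come from the Excel sheet.
df = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4], "e": [5], "f": [6]})
result = drop_by_position(df, [3, 5])
print(list(result.columns))
```

If the JSON positions are meant to be 1-based, subtract 1 from each value before indexing.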

Any help is much appreciated!

Pandas error: Can only use .str accessor with string values, which use np.object_ dtype in pandas
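This error is typically raised when `.str` is used on a column that does not hold strings, for example a numeric column read from Excel. Note also that `Series.str.replace` takes `pat` and `repl`, whereas `to_replace` and `value` are parameters of `DataFrame.replace`. The sketch below cleans only string-typed columns and leaves the rest untouched. It assumes `CleanColumns` holds 0-based positions and is written for Python 3; `clean_columns` and the sample data are hypothetical.

```python
import pandas as pd

# Character class matching the double quote, single quote, and comma
# that the original loop tries to remove one at a time.
PATTERN = "[\"',]"


def clean_columns(df, positions):
    """Return a copy of df with quotes and commas stripped from the
    string columns at the given 0-based positions (assumed convention)."""
    out = df.copy()
    for label in out.columns[positions]:
        col = out[label]
        # Skip numeric, datetime, etc.: .str only works on string data.
        if pd.api.types.is_string_dtype(col.dtype):
            out[label] = col.str.replace(PATTERN, "", regex=True)
    return out


# Illustrative data only.
df = pd.DataFrame({
    "id": [1, 2],
    "name": ['Ann "A"', "O'Neil, Bob"],
    "qty": [3, 4],
})
cleaned = clean_columns(df, [1, 2])
print(cleaned)
```

If columns are also dropped by position, running the cleaning step first keeps the positions in the JSON aligned with the original sheet. Casting with `astype(str)` before `.str.replace` is an alternative, at the cost of turning numbers and missing values into text.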

0 answers: