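TypeError: string indices must be integers, not str, when reading a dictionary from a DataFrame column

Time: 2017-09-07 11:03:06

Tags: python pandas dictionary dataframe

I am trying to get a list of columns and values from a DataFrame column whose cells hold a nested dictionary.

The DataFrame column looks like this: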

{"id":"0","request":{"plantSearch":"true","maxResults":"51","caller":"WMS","companyCode":"GB54","purchOrg":"UPSO","Code":"5852","confidential":"false","flag":"true","service":"false","Item":"false","mastered":"true","copas":"false","pscmBlock":"false","descOperator":"CO","assocManuf":"PETK"},"response":{"hasMoreResults":"false","resultsCount":"0","execTime":"878 ms"}}

This is the code I am writing:

s1.columns = ['data']
l2 = []
for idx, row in s1['data'].iteritems():
    tempdf = pd.DataFrame(row['request']['plantSearch'])
    tempdf['maxResults'] = row['maxResults']
    l2.append(tempdf)


pd.concat(l2,axis = 0)

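The problem is that Python treats 'row' as a string, even though it is a dictionary.

1 Answer:

Answer 0: (score: 0)

I think you can use json.loads to parse each string, then pass all the data under the request key to the DataFrame constructor: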

df = pd.DataFrame({'data':['{"id":"0","request":{"plantSearch":"true","maxResults":"51","caller":"WMS","companyCode":"GB54","purchOrg":"UPSO","Code":"5852","confidential":"false","flag":"true","service":"false","Item":"false","mastered":"true","copas":"false","pscmBlock":"false","descOperator":"CO","assocManuf":"PETK"},"response":{"hasMoreResults":"false","resultsCount":"0","execTime":"878 ms"}}','{"id":"0","request":{"plantSearch":"true","maxResults":"51","caller":"WMS","companyCode":"GB54","purchOrg":"UPSO","Code":"5852","confidential":"false","flag":"true","service":"false","Item":"false","mastered":"true","copas":"false","pscmBlock":"false","descOperator":"CO","assocManuf":"PETK"},"response":{"hasMoreResults":"false","resultsCount":"0","execTime":"878 ms"}}']})
print (df)

                                                data
0  {"id":"0","request":{"plantSearch":"true","max...
1  {"id":"0","request":{"plantSearch":"true","max...
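import json

# Parse each JSON string with the standard-library json module, keep only the 'request' dict
df1 = pd.DataFrame(df['data'].apply(lambda x: json.loads(x)['request']).values.tolist())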
print (df1)

   Code   Item assocManuf caller companyCode confidential  copas descOperator  \
0  5852  false       PETK    WMS        GB54        false  false           CO   
1  5852  false       PETK    WMS        GB54        false  false           CO   

   flag mastered maxResults plantSearch pscmBlock purchOrg service  
0  true     true         51        true     false     UPSO   false  
1  true     true         51        true     false     UPSO   false  

A similar solution, using a list comprehension:

import json

df = pd.DataFrame([json.loads(x)['request'] for x in df['data']])
print (df)

   Code   Item assocManuf caller companyCode confidential  copas descOperator  \
0  5852  false       PETK    WMS        GB54        false  false           CO   
1  5852  false       PETK    WMS        GB54        false  false           CO   

   flag mastered maxResults plantSearch pscmBlock purchOrg service  
0  true     true         51        true     false     UPSO   false  
1  true     true         51        true     false     UPSO   false  
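
If the response fields are needed as well, the same parsing step can feed a flattening function instead of picking out a single key. The following is only a sketch: it assumes pandas 1.0 or later (where pd.json_normalize is available at the top level), and the names raw, raw_df and flat, as well as the shortened sample string, are made up for illustration.

```python
import json
import pandas as pd

# Hypothetical, shortened sample with the same shape as the question's data.
raw = ('{"id":"0",'
       '"request":{"plantSearch":"true","maxResults":"51"},'
       '"response":{"hasMoreResults":"false","execTime":"878 ms"}}')
raw_df = pd.DataFrame({'data': [raw, raw]})

# Parse every JSON string into a dict, then flatten nested keys
# into dotted column names such as 'request.plantSearch'.
flat = pd.json_normalize([json.loads(x) for x in raw_df['data']])
print(flat.columns.tolist())
```

Each nested key becomes its own column, prefixed with the name of its parent key, so request and response values end up side by side in one row per record.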

Finally, you can select a subset of the columns:

cols = ['plantSearch','maxResults']
df2 = df[cols]
print (df2)
  plantSearch maxResults
0        true         51
1        true         51
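
For comparison, here is a rough sketch of how the loop from the question could be repaired under the same idea. It assumes every cell in the column is a JSON string (which is why indexing it with 'request' raised the TypeError) and that maxResults lives under the request key, as in the sample data. The shortened sample string and the names raw, records and result are hypothetical.

```python
import json
import pandas as pd

# Hypothetical, shortened sample with the same shape as the question's data.
raw = ('{"id":"0",'
       '"request":{"plantSearch":"true","maxResults":"51"},'
       '"response":{"resultsCount":"0"}}')
s1 = pd.DataFrame({'data': [raw, raw]})

# The cells are str objects, not dicts, so row['request'] cannot work directly.
cell_is_str = isinstance(s1['data'].iloc[0], str)

records = []
for row in s1['data']:
    request = json.loads(row)['request']  # parse the string first, then index the dict
    records.append({'plantSearch': request['plantSearch'],
                    'maxResults': request['maxResults']})

# Build the DataFrame once from the collected dicts instead of concatenating per row.
result = pd.DataFrame(records)
print(result)
```

Collecting plain dicts and calling the DataFrame constructor once avoids building a temporary DataFrame for every row.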