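I am trying to get a list of columns and values from a DataFrame column that contains a nested dictionary.
The DataFrame column looks like this: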
{"id":"0","request":{"plantSearch":"true","maxResults":"51","caller":"WMS","companyCode":"GB54","purchOrg":"UPSO","Code":"5852","confidential":"false","flag":"true","service":"false","Item":"false","mastered":"true","copas":"false","pscmBlock":"false","descOperator":"CO","assocManuf":"PETK"},"response":{"hasMoreResults":"false","resultsCount":"0","execTime":"878 ms"}}
This is the code I am writing:
s1.columns = ['data']
l2 = []
for idx, row in s1['data'].iteritems():
    tempdf = pd.DataFrame(row['request']['plantSearch'])
    tempdf['maxResults'] = row['maxResults']
    l2.append(tempdf)
pd.concat(l2, axis=0)
The problem is that Python treats 'row' as a string, even though it is a dictionary.
Answer 0 (score: 0)
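I think you can use json.loads to convert each string to a dict, then pass the resulting dicts to the DataFrame constructor to parse all the data under the request key: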
import pandas as pd
df = pd.DataFrame({'data':['{"id":"0","request":{"plantSearch":"true","maxResults":"51","caller":"WMS","companyCode":"GB54","purchOrg":"UPSO","Code":"5852","confidential":"false","flag":"true","service":"false","Item":"false","mastered":"true","copas":"false","pscmBlock":"false","descOperator":"CO","assocManuf":"PETK"},"response":{"hasMoreResults":"false","resultsCount":"0","execTime":"878 ms"}}','{"id":"0","request":{"plantSearch":"true","maxResults":"51","caller":"WMS","companyCode":"GB54","purchOrg":"UPSO","Code":"5852","confidential":"false","flag":"true","service":"false","Item":"false","mastered":"true","copas":"false","pscmBlock":"false","descOperator":"CO","assocManuf":"PETK"},"response":{"hasMoreResults":"false","resultsCount":"0","execTime":"878 ms"}}']})
print (df)
                                                data
0  {"id":"0","request":{"plantSearch":"true","max...
1  {"id":"0","request":{"plantSearch":"true","max...
import json  # standard library; pd.io.json.loads may not be available in newer pandas versions
df1 = pd.DataFrame(df['data'].apply(lambda x: json.loads(x)['request']).values.tolist())
print (df1)
   Code   Item assocManuf caller companyCode confidential  copas descOperator  \
0  5852  false       PETK    WMS        GB54        false  false           CO
1  5852  false       PETK    WMS        GB54        false  false           CO

   flag mastered maxResults plantSearch pscmBlock purchOrg service
0  true     true         51        true     false     UPSO   false
1  true     true         51        true     false     UPSO   false
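The original loop most likely failed because each cell holds JSON text (a str) rather than a dict, so indexing it with a key such as 'request' raises a TypeError. Below is a minimal sketch of that check, assuming a shortened sample record; the names raw and parsed are illustrative and not from the question.

```python
import json

# Shortened sample record (assumption: real cells carry more keys).
raw = '{"id":"0","request":{"plantSearch":"true","maxResults":"51"},"response":{"resultsCount":"0"}}'

# The cell is plain text, so raw['request'] would raise a TypeError.
print(type(raw).__name__)                 # str

# json.loads parses the text into nested dicts.
parsed = json.loads(raw)
print(type(parsed).__name__)              # dict

# After parsing, nested keys can be reached normally.
print(parsed['request']['maxResults'])    # 51
```

Note that maxResults sits under request, so row['maxResults'] from the question would still fail after parsing; it has to be row['request']['maxResults'].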
A similar solution, using a list comprehension:
import json  # standard library
df = pd.DataFrame([json.loads(x)['request'] for x in df['data']])
print (df)
   Code   Item assocManuf caller companyCode confidential  copas descOperator  \
0  5852  false       PETK    WMS        GB54        false  false           CO
1  5852  false       PETK    WMS        GB54        false  false           CO

   flag mastered maxResults plantSearch pscmBlock purchOrg service
0  true     true         51        true     false     UPSO   false
1  true     true         51        true     false     UPSO   false
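If the response fields are needed as well, one possible alternative (not part of the answer above) is pd.json_normalize, which flattens nested dicts into dotted column names. This is a sketch assuming pandas 1.0 or later, where json_normalize is available at the top level; the shortened record and the names df_raw, records and flat are illustrative.

```python
import json
import pandas as pd

# Shortened sample record (assumption: real cells carry more keys).
raw = ('{"id":"0","request":{"plantSearch":"true","maxResults":"51"},'
       '"response":{"resultsCount":"0","execTime":"878 ms"}}')
df_raw = pd.DataFrame({'data': [raw, raw]})

# Parse every cell into a dict, then flatten all nesting levels at once.
records = [json.loads(x) for x in df_raw['data']]
flat = pd.json_normalize(records)

# Nested keys become dotted column names such as 'request.plantSearch'.
print(flat.columns.tolist())
print(flat[['request.plantSearch', 'request.maxResults', 'response.execTime']])
```

The dotted names keep request and response fields distinguishable, which helps if both sections ever share a key name.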
Finally, you can select a subset of the columns:
cols = ['plantSearch','maxResults']
df2 = df[cols]
print (df2)
  plantSearch maxResults
0        true         51
1        true         51
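The selected values are still strings ("true", "51"), because that is how they appear in the JSON. If typed columns are wanted, a sketch along these lines may help; it assumes plantSearch only ever holds "true" or "false" and that maxResults is always numeric text, and the name typed is illustrative.

```python
import pandas as pd

# Stand-in for the df2 built above (same two columns, string values).
df2 = pd.DataFrame({'plantSearch': ['true', 'true'],
                    'maxResults': ['51', '51']})

typed = df2.assign(
    # Map the JSON-style text flags to Python booleans.
    plantSearch=df2['plantSearch'].map({'true': True, 'false': False}),
    # Convert numeric text to a numeric dtype.
    maxResults=pd.to_numeric(df2['maxResults']),
)
print(typed)
print(typed.dtypes)
```

Any value outside the mapping becomes NaN in plantSearch, so it is worth checking for missing values after the conversion.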