My data structure is:
ds = [{
    "name": "groupA",
    "subGroups": [123, 456]
},
{
    "name": "groupB",
    "subGroups": ['aaa', 'bbb', 'ccc']
}]
This gives the following DataFrame:
import pandas as pd
df = pd.DataFrame(ds)
     name        subGroups
0  groupA       [123, 456]
1  groupB  [aaa, bbb, ccc]
I want:
     name subGroupsFlattend
0  groupA               123
1  groupA               456
2  groupB               aaa
3  groupB               bbb
4  groupB               ccc
Any ideas?
Answer 0 (score: 4)
Use DataFrame.explode (available in pandas 0.25 and later) on the subGroups column.
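A minimal sketch of the explode approach, assuming pandas 0.25 or later. The rename and reset_index calls are additions made here so that the column name and the 0..4 index match what the question asks for; they are not part of the original answer.

```python
import pandas as pd

ds = [
    {"name": "groupA", "subGroups": [123, 456]},
    {"name": "groupB", "subGroups": ["aaa", "bbb", "ccc"]},
]
df = pd.DataFrame(ds)

# explode() emits one row per list element and repeats the other columns.
flat = (
    df.explode("subGroups")
      .rename(columns={"subGroups": "subGroupsFlattend"})
      .reset_index(drop=True)  # explode keeps the original index (0, 0, 1, 1, 1)
)
print(flat)
```

Without reset_index(drop=True), the rows keep the index of the row they came from, which can be handy for joining back to the original frame.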
Answer 1 (score: 2)
You can use json_normalize:
# pandas >= 1.0 exposes json_normalize at the top level;
# the old pandas.io.json import path is deprecated.
df = pd.json_normalize(ds, ['subGroups'], 'name').rename(columns={0:'subGroupsFlattend'})
print (df)
  subGroupsFlattend    name
0               123  groupA
1               456  groupA
2               aaa  groupB
3               bbb  groupB
4               ccc  groupB
An alternative solution that flattens the dictionaries with a list comprehension:
L = [y for x in ds for y in zip(x["subGroups"], [x["name"]] * len(x["subGroups"]))]
print (L)
[(123, 'groupA'), (456, 'groupA'), ('aaa', 'groupB'), ('bbb', 'groupB'), ('ccc', 'groupB')]
df = pd.DataFrame(L, columns=['subGroupsFlattend','name'])
print (df)
  subGroupsFlattend    name
0               123  groupA
1               456  groupA
2               aaa  groupB
3               bbb  groupB
4               ccc  groupB
Edit:
from itertools import chain
df = pd.DataFrame(ds)
# 'name' is listed first so the column order matches the output below
df1 = pd.DataFrame({
    'name' : df['name'].values.repeat(df['subGroups'].str.len()),
    'subGroups' : list(chain.from_iterable(df['subGroups'].tolist()))
})
print (df1)
     name subGroups
0  groupA       123
1  groupA       456
2  groupB       aaa
3  groupB       bbb
4  groupB       ccc
Answer 2 (score: 2)
You can build the desired output as follows:
pd.DataFrame({'name':df.name.repeat(df.subGroups.str.len()),'subGroup':df.subGroups.sum()})
Out[364]:
     name subGroup
0  groupA      123
0  groupA      456
1  groupB      aaa
1  groupB      bbb
1  groupB      ccc
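Answer 3 (score: 0)
Same as YOBEN_S's solution, but more efficient for large DataFrames: summing lists builds a new list at every step, whereas chain.from_iterable visits each element once.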
from itertools import chain
pd.DataFrame({'name':df.name.repeat(df.subGroups.str.len()),
'subGroup':list(chain.from_iterable(df.subGroups.to_list()))})
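A small sketch for checking that the sum-based and chain-based variants produce the same rows. The helper names flatten_sum and flatten_chain are hypothetical, introduced only for this illustration, and the reset_index call is an addition that yields the 0..4 index shown in the question.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame([
    {"name": "groupA", "subGroups": [123, 456]},
    {"name": "groupB", "subGroups": ["aaa", "bbb", "ccc"]},
])


def flatten_sum(frame):
    # Hypothetical helper: Series.sum() concatenates the lists one by one,
    # creating a new list at every step.
    return pd.DataFrame({
        "name": frame["name"].repeat(frame["subGroups"].str.len()),
        "subGroup": frame["subGroups"].sum(),
    }).reset_index(drop=True)


def flatten_chain(frame):
    # Hypothetical helper: chain.from_iterable walks every element once.
    return pd.DataFrame({
        "name": frame["name"].repeat(frame["subGroups"].str.len()),
        "subGroup": list(chain.from_iterable(frame["subGroups"])),
    }).reset_index(drop=True)


result_sum = flatten_sum(df)
result_chain = flatten_chain(df)
print(result_chain)
```

On a frame this small the two are indistinguishable in speed; the difference should only become visible with many rows or long lists.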