Flattening a list of elements in a Pandas DataFrame

Date: 2018-03-23 14:30:08

Tags: python-3.x pandas dataframe

My data structure is:

ds = [{
    "name": "groupA",
    "subGroups": [123,456]
},
{
    "name": "groupB",
    "subGroups": ['aaa', 'bbb' , 'ccc']
}]

This gives the following DataFrame:

import pandas as pd

df = pd.DataFrame(ds)

    name    subGroups
0   groupA  [123, 456]
1   groupB  [aaa, bbb, ccc]   

I want:

    name    subGroupsFlattend
0   groupA  123
1   groupA  456
2   groupB  aaa
3   groupB  bbb
4   groupB  ccc

Any ideas?

4 answers:

Answer 0 (score: 4)

Use explode:
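The answer's original code did not survive on this page, so the following is only a minimal sketch of what an explode-based solution might look like, assuming pandas 0.25 or later (where DataFrame.explode was introduced). The rename to subGroupsFlattend and the reset_index call are added here to match the output requested in the question; they are not confirmed to be part of the original answer.

```python
import pandas as pd

ds = [
    {"name": "groupA", "subGroups": [123, 456]},
    {"name": "groupB", "subGroups": ["aaa", "bbb", "ccc"]},
]
df = pd.DataFrame(ds)

# explode() emits one row per list element and repeats the other columns.
flat = (
    df.explode("subGroups")
      .rename(columns={"subGroups": "subGroupsFlattend"})
      .reset_index(drop=True)  # replace the repeated labels 0, 0, 1, 1, 1 with 0..4
)
print(flat)
```

Without reset_index, each exploded row keeps the index label of the row it came from.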

Answer 1 (score: 2)

You can use json_normalize:

import pandas as pd

# json_normalize is available at the top level since pandas 1.0;
# importing it from pandas.io.json is deprecated.
df = pd.json_normalize(ds, ['subGroups'], 'name').rename(columns={0:'subGroupsFlattend'})
print (df)
  subGroupsFlattend    name
0               123  groupA
1               456  groupA
2               aaa  groupB
3               bbb  groupB
4               ccc  groupB

An alternative solution that flattens the dictionaries:

L = [y for x in ds for y in zip(x["subGroups"], [x["name"]] * len(x["subGroups"]))]
print (L)
[(123, 'groupA'), (456, 'groupA'), ('aaa', 'groupB'), ('bbb', 'groupB'), ('ccc', 'groupB')]

df = pd.DataFrame(L, columns=['subGroupsFlattend','name'])
print (df)
  subGroupsFlattend    name
0               123  groupA
1               456  groupA
2               aaa  groupB
3               bbb  groupB
4               ccc  groupB

Edit:

from itertools import chain
df = pd.DataFrame(ds)

df1 = pd.DataFrame({
    'subGroups' : list(chain.from_iterable(df['subGroups'].tolist())), 
    'name' : df['name'].values.repeat(df['subGroups'].str.len())
})
print (df1)
     name subGroups
0  groupA       123
1  groupA       456
2  groupB       aaa
3  groupB       bbb
4  groupB       ccc

Answer 2 (score: 2)

You can fix the output with:

pd.DataFrame({'name':df.name.repeat(df.subGroups.str.len()),'subGroup':df.subGroups.sum()})
Out[364]: 
     name subGroup
0  groupA      123
0  groupA      456
1  groupB      aaa
1  groupB      bbb
1  groupB      ccc

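Answer 3 (score: 0)

The same as YOBEN_S's solution, but more efficient for large DataFrames: sum() concatenates the lists one at a time, building a new list at every step, whereas chain.from_iterable walks each list once.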

from itertools import chain
pd.DataFrame({'name':df.name.repeat(df.subGroups.str.len()),
              'subGroup':list(chain.from_iterable(df.subGroups.to_list()))})
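Both repeat-based answers keep the original row labels (0, 0, 1, 1, 1), while the question shows a fresh 0..4 index. A minimal sketch, assuming that fresh index is wanted, is to add reset_index(drop=True) to the same construction. The variable name out is only illustrative.

```python
from itertools import chain

import pandas as pd

ds = [
    {"name": "groupA", "subGroups": [123, 456]},
    {"name": "groupB", "subGroups": ["aaa", "bbb", "ccc"]},
]
df = pd.DataFrame(ds)

out = pd.DataFrame({
    # Repeat each name once per element of its list.
    "name": df["name"].repeat(df["subGroups"].str.len()),
    # Flatten all lists into one list in a single pass.
    "subGroup": list(chain.from_iterable(df["subGroups"])),
}).reset_index(drop=True)  # drop the repeated labels in favour of 0..4
print(out)
```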