Parsing a set of dictionaries into single rows in pandas (Python)

Time: 2018-08-01 03:14:42

Tags: python pandas dataframe

Hi, I have a pandas DataFrame similar to the one below:

information         record
name                apple
size                {'weight':{'gram':300,'oz':10.5},'description':{'height':10,'width':15}}
country             America
partiesrelated      [{'nameOfFarmer':'John Smith'},{'farmerID':'A0001'}]

I would like to convert it into another DataFrame like this:

information                  record
name                         apple
size_weight_gram             300
size_weight_oz               10.5
size_description_height      10
size_description_width       15 
country                      America
partiesrelated_nameOfFarmer  John Smith
partiesrelated_farmerID      A0001

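In this case, each dictionary is parsed into single rows: a row such as size_weight_gram carries the flattened key path as its label and holds the corresponding value.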

Code for df:
import pandas as pd

df = pd.DataFrame({'information': ['name', 'size', 'country', 'partiesrealated'], 
                   'record': ['apple', {'weight':{'gram':300,'oz':10.5},'description':{'height':10,'width':15}}, 'America', [{'nameOfFarmer':'John Smith'},{'farmerID':'A0001'}]]})
df = df.set_index('information')

1 answer:

Answer 0 (score: 2)

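IIUC, you can define a recursive function that unnests your sequences/dicts until you are left with a flat list of key/value pairs, which is both valid input for the pd.DataFrame constructor and formatted the way you described.

Take a look at this solution: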

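import itertools
import collections.abc

import pandas as pd

# Flatten one level of nesting: [[a, b], [c]] -> [a, b, c]
ch = lambda ite: list(itertools.chain.from_iterable(ite))

def isseq(obj):
    # Strings are sequences too, but here they must count as leaf values
    if isinstance(obj, str): return False
    return isinstance(obj, collections.abc.Sequence)

def unnest(k, v):
    # Sequence: recurse into each element, keeping the current key
    if isseq(v): return ch([unnest(k, v_) for v_ in v])
    # Dict: recurse into each value, appending its key to the current key
    if isinstance(v, dict): return ch([unnest("_".join([k, k_]), v_) for k_, v_ in v.items()])
    # Leaf: a single (key, value) pair
    return k,v

def pairwise(i):
    # Regroup the flat [k1, v1, k2, v2, ...] list into (key, value) tuples
    _a = iter(i)
    return list(zip(_a, _a))

# d is the dict of lists behind the question's df (index moved back to a column)
d = df.reset_index().to_dict('list')
a = ch([(unnest(k, v)) for k, v in zip(d['information'], d['record'])])
pd.DataFrame(pairwise(a))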

    0                                 1
0   name                              apple
1   size_weight_gram                  300
2   size_weight_oz                    10.5
3   size_description_height           10
4   size_description_width            15
5   country                           America
6   partiesrealated_nameOfFarmer      John Smith
7   partiesrealated_farmerID          A0001
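
The same recursion can also be written as a generator that yields (key, value) pairs directly, which avoids flattening and then re-pairing the list. This is only a minimal sketch of that variation, not part of the solution above: iter_pairs is a hypothetical name, and the d below is a hand-written copy of the question's data with the label spelled partiesrelated.

```python
from collections.abc import Mapping, Sequence


def iter_pairs(key, value, sep="_"):
    """Yield (flattened_key, leaf_value) pairs for an arbitrarily nested value.

    Hypothetical helper for illustration; not part of pandas.
    """
    if isinstance(value, Mapping):
        # Dicts extend the current key with each sub-key.
        for sub_key, sub_value in value.items():
            yield from iter_pairs(f"{key}{sep}{sub_key}", sub_value, sep)
    elif isinstance(value, Sequence) and not isinstance(value, (str, bytes)):
        # Lists/tuples keep the current key and recurse into each element.
        for item in value:
            yield from iter_pairs(key, item, sep)
    else:
        # Anything else (str, int, float, ...) is treated as a leaf.
        yield key, value


# Assumed input: the question's data as a plain dict of lists.
d = {
    "information": ["name", "size", "country", "partiesrelated"],
    "record": [
        "apple",
        {"weight": {"gram": 300, "oz": 10.5},
         "description": {"height": 10, "width": 15}},
        "America",
        [{"nameOfFarmer": "John Smith"}, {"farmerID": "A0001"}],
    ],
}

pairs = [
    pair
    for k, v in zip(d["information"], d["record"])
    for pair in iter_pairs(k, v)
]
print(pairs)
```

To get the layout asked for in the question, the resulting list can be passed to pd.DataFrame(pairs, columns=['information', 'record']).set_index('information').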

Because the solution is recursive, the algorithm unnests your data no matter how deeply it is nested. For example:

d={
  'information': [
    'row1',
    'row2',
    'row3',
    'row4'
  ],
  'record': [
    'val1',
    {
      'val2': {
        'a': 300,
        'b': [
          {
            "b1": 10.5
          },
          {
            "b2": 2
          }
        ]
      },
      'val3': {
        'a': 10,
        'b': 15
      }
    },
    'val4',
    [
      {
        'val5': [
          {
            'a': {
              'c': [
                {
                  'd': {
                    'e': [
                      {
                        'f': 1
                      },
                      {
                        'g': 3
                      }
                    ]
                  }
                }
              ]
            }
          }
        ]
      },
      {
        'b': 'bar'
      }
    ]
  ]
}

Running the same code on this d gives:

    0                    1
0   row1                 val1
1   row2_val2_a          300
2   row2_val2_b_b1       10.5
3   row2_val2_b_b2       2
4   row2_val3_a          10
5   row2_val3_b          15
6   row3                 val4
7   row4_val5_a_c_d_e_f  1
8   row4_val5_a_c_d_e_g  3
9   row4_b               bar
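
One thing to keep in mind with this approach: every element of a list inherits the parent key, so two dicts in the same list that share a sub-key would produce two rows with the same label. If that matters for your data, one possible variation is to put the list position into the key. The sketch below assumes that behaviour is wanted; iter_pairs_indexed and the farmers input are hypothetical and only serve as illustration.

```python
from collections.abc import Mapping, Sequence


def iter_pairs_indexed(key, value, sep="_"):
    """Yield (flattened_key, leaf_value) pairs, adding list positions to the key.

    Hypothetical variation for illustration; not part of pandas.
    """
    if isinstance(value, Mapping):
        # Dicts extend the current key with each sub-key.
        for sub_key, sub_value in value.items():
            yield from iter_pairs_indexed(f"{key}{sep}{sub_key}", sub_value, sep)
    elif isinstance(value, Sequence) and not isinstance(value, (str, bytes)):
        # Lists/tuples extend the key with the element's position.
        for position, item in enumerate(value):
            yield from iter_pairs_indexed(f"{key}{sep}{position}", item, sep)
    else:
        # Leaf value: emit the accumulated key with it.
        yield key, value


# Assumed input: two list elements that share the sub-key "name".
record = [{"name": "John"}, {"name": "Jane"}]
pairs = list(iter_pairs_indexed("farmers", record))
print(pairs)
```

The trade-off is that labels for single-element lists also gain a position, so row4_b in the output above would become row4_1_b.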