Convert a list of uniform dicts to a pandas DataFrame, with nested dicts becoming a MultiIndex

Time: 2016-01-27 05:53:39

Tags: python dictionary pandas

Despite a lot of searching and experimenting, I'm still a bit lost. Given this:

dictA = {'order': '1',
         'char': {'glyph': 'A',
                  'case': 'upper',
                  'vowel': True}
         }
dictB = {'order': '2',
         'char': {'glyph': 'B',
                  'case': 'upper',
                  'vowel': False}
         }
dictC = {'order': '3',
         'char': {'glyph': 'C',
                  'case': 'upper',
                  'vowel': False}
         }
dictD = {'order': '4',
         'char': {'glyph': 'd',
                  'case': 'lower',
                  'vowel': False}
         }
dictE = {'order': '5',
         'char': {'glyph': 'e',
                  'case': 'lower',
                  'vowel': True}
         }
letters = [dictA, dictB, dictC, dictD, dictE]

How do I turn letters into this (the first column is the index):

   order              char 
          glyph       case      vowel
0      1      A      upper       True
1      2      B      upper      False
2      3      C      upper      False
3      4      d      lower      False
4      5      e      lower       True

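...and, as a bonus, then be able to operate on this frame to count/plot the number of uppercase entries, the number of vowel entries, and so on.

Any ideas?

Edit: my original example may have been too simple, but I'll leave it for posterity.

Suppose: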

import re

class Glyph(dict):

    def __init__(self, glyph):
        super(Glyph, self).__init__()
        order = ord(glyph)
        self['glyph'] = glyph
        self['order'] = order
        kind = {'type': None}
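        if re.search(r'\s+', glyph):
            kind = {'type': 'whitespace'}

        # list(...) is needed on Python 3, where range objects cannot be added;
        # the + 1 makes the ranges include 'z' and 'Z'
        elif order in (list(range(ord('a'), ord('z') + 1)) +
                       list(range(ord('A'), ord('Z') + 1))
                       ):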

            lowercase = glyph.lower()
            kind = {
                'type': lowercase,
                'vowel': lowercase in ['a', 'e', 'i', 'o', 'u'],
                'case': ['upper', 'lower'][lowercase == glyph],
                'number': (ord(lowercase) - ord('a') + 1)
            }
        self['kind'] = kind

chars = [Glyph(x) for x in 'Hello World']

I can do this:

import pandas as pd
df = pd.DataFrame(chars) # dataframe where 'order' & 'glyph' are OK...
# unpack 'kind' Series into list of dicts and use those to make a table 
kindDf = pd.DataFrame(data=[x for x in df['kind']])

My gut tells me I should then be able to do this:

df['kind'] = kindDf

...but that only adds the first column of my kindDf and places it under 'kind' in df. Next attempt:

df.pop('kind') # get rid of this column of dicts
joined = df.join(kindDf)  # flattens 'kind'... 

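joined is so close! The trouble is that I want those columns grouped under a 'kind' hierarchy rather than flat (as the joined result is). I've tried stack/unstack magic, but I can't get the hang of it. Do I need a MultiIndex?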

1 Answer:

Answer 0: (score: 1)

This gets you close on the first part:
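
As a minimal sketch of one way to get there (not necessarily the code this answer originally showed): collect the values column by column, keying each column by an (outer, inner) tuple, and let pandas turn the tuple keys into a two-level column index. The helper name to_multiindex_frame and the empty-string second level for 'order' are assumptions made for illustration, and the sketch assumes every dict has the same keys and only one level of nesting.

```python
import pandas as pd

letters = [
    {'order': '1', 'char': {'glyph': 'A', 'case': 'upper', 'vowel': True}},
    {'order': '2', 'char': {'glyph': 'B', 'case': 'upper', 'vowel': False}},
    {'order': '3', 'char': {'glyph': 'C', 'case': 'upper', 'vowel': False}},
    {'order': '4', 'char': {'glyph': 'd', 'case': 'lower', 'vowel': False}},
    {'order': '5', 'char': {'glyph': 'e', 'case': 'lower', 'vowel': True}},
]


def to_multiindex_frame(records):
    """Hypothetical helper: one level of nesting becomes a second column level."""
    columns = {}
    for record in records:
        for key, value in record.items():
            if isinstance(value, dict):
                # Nested dict: one (outer, inner) column per inner key
                for inner_key, inner_value in value.items():
                    columns.setdefault((key, inner_key), []).append(inner_value)
            else:
                # Flat value: pad the second level with an empty string
                columns.setdefault((key, ''), []).append(value)
    frame = pd.DataFrame(columns)
    # Make the two-level column index explicit
    frame.columns = pd.MultiIndex.from_tuples(list(frame.columns))
    return frame


result = to_multiindex_frame(letters)
print(result)
```

With this layout, result['char'] selects the glyph/case/vowel sub-frame, and result[('char', 'vowel')] selects a single column.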

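For the second part, you can lean on groupby and then on the built-in plotting function for quick visuals. Omit the plot call after size() if you just want to see the counts.

result.groupby(result.char.vowel).size().plot(kind='bar', figsize=[8,6])
title('Glyphs are awesome')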
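
To make the counting step concrete without any plotting, here is a small sketch. It rebuilds by hand a frame shaped like the desired output in the question (the literal values are copied from there) and then groups by a single MultiIndex column; the variable names vowel_counts and case_counts are illustrative only.

```python
import pandas as pd

# Frame with two-level columns, built directly for illustration
result = pd.DataFrame(
    [['1', 'A', 'upper', True],
     ['2', 'B', 'upper', False],
     ['3', 'C', 'upper', False],
     ['4', 'd', 'lower', False],
     ['5', 'e', 'lower', True]],
    columns=pd.MultiIndex.from_tuples(
        [('order', ''), ('char', 'glyph'), ('char', 'case'), ('char', 'vowel')]
    ),
)

# Group the rows by one column and count the rows in each group
vowel_counts = result.groupby(result[('char', 'vowel')]).size()
case_counts = result.groupby(result[('char', 'case')]).size()

print(vowel_counts)
print(case_counts)
```

Each result is a Series indexed by the group value, so appending .plot(kind='bar') to either one gives the bar chart described above.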

