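Extracting values from a heavily nested list of dictionaries with repeated key-value pairs

Date: 2016-06-01 20:38:00

Tags: python dictionary pandas

I'm trying to extract the total cash and cash equivalents value from a complex, messy list of dictionaries. A shortened version of the structure is below.

I have tried map, DataFrame.from_dict, and DataFrame.from_records. I would like to avoid regular expressions if possible.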

I'm stumped.




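1 Answer:

Answer 0: (score: 1)

If you know the data will always have the format above and you really only need these two values, you can access them directly (assuming data is the structure above):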

print(data[0]['Rows'][2]['Rows'][3]['Cells'][1]['Value'])
print(data[0]['Rows'][2]['Rows'][3]['Cells'][2]['Value'])

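However, this is error-prone, both in writing out the correct expression and with respect to changes in list order (which your format may not guarantee). Because the data has a semantic structure behind it, you can convert the raw data back into objects that are easy to access. You may want to change some details, but this is a good starting point: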

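from collections import OrderedDict
from collections.abc import Mapping  # Python 3.3+; plain `collections` on Python 2

import pandas as pd


class Report(Mapping):
    def __init__(self, data):
        data = dict(data)  # shallow copy so the caller's dict is not mutated
        self.sections = OrderedDict()
        # Dispatch each top-level row to make_Header / make_Section by its RowType.
        for row in data.pop('Rows'):
            getattr(self, 'make_%s' % row['RowType'])(row)
        # Expose the remaining top-level keys (e.g. ReportName) as attributes.
        self.__dict__.update(data)

    def make_Header(self, row):
        self.header = [c['Value'] for c in row['Cells']]

    def make_Section(self, sec):
        def make_row(row):
            # First cell is the row label; the remaining cells are numeric values.
            cells = [c['Value'] for c in row['Cells']]
            return pd.Series([float(v) for v in cells[1:]], name=cells[0])

        self.sections[sec['Title']] = pd.DataFrame([make_row(r) for r in sec['Rows']])

    def __getitem__(self, item):
        return self.sections[item]

    def __len__(self):
        return len(self.sections)

    def __iter__(self):
        return iter(self.sections)


report = Report(data[0])
print(report.ReportName)
print(report['Cash and Cash Equivalents'])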
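
If you only need one row and do not want pandas, a lighter option is to look the row up by its labels instead of by list position. The following is a minimal sketch, not part of the original answer: the helper name find_row is hypothetical, and the sample data is a made-up miniature that only assumes the keys used above ('Rows', 'RowType', 'Title', 'Cells', 'Value'). The row label 'Total Cash and Cash Equivalents' and all numbers are invented for illustration.

```python
def find_row(report, section_title, row_label):
    """Return the values of the row whose first cell equals row_label, or None."""
    for section in report.get('Rows', []):
        # Skip header rows and sections with a different title.
        if section.get('RowType') != 'Section' or section.get('Title') != section_title:
            continue
        for row in section.get('Rows', []):
            values = [cell.get('Value') for cell in row.get('Cells', [])]
            if values and values[0] == row_label:
                return values[1:]  # drop the label, keep the data cells
    return None


# Hypothetical miniature of the structure; every label and number is made up.
sample = [{
    'ReportName': 'Example Report',
    'Rows': [
        {'RowType': 'Header',
         'Cells': [{'Value': ''}, {'Value': '2016'}, {'Value': '2015'}]},
        {'RowType': 'Section', 'Title': 'Cash and Cash Equivalents', 'Rows': [
            {'RowType': 'Row',
             'Cells': [{'Value': 'Checking'}, {'Value': '100.00'}, {'Value': '80.00'}]},
            {'RowType': 'SummaryRow',
             'Cells': [{'Value': 'Total Cash and Cash Equivalents'},
                       {'Value': '150.50'}, {'Value': '120.25'}]},
        ]},
    ],
}]

totals = find_row(sample[0], 'Cash and Cash Equivalents',
                  'Total Cash and Cash Equivalents')
print(totals)
```

Because the lookup matches on the section title and the row label, it keeps working if the order of sections or rows changes, which is the weakness of the index-based expression. The values come back as strings, so convert them with float() if you need numbers.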