Question

I am trying to organize data from an XML file into a dictionary. It will be used to run Monte Carlo simulations.

Here is a sample of a couple of entries from the XML:

<retirement>
    <item>
        <low>-0.34</low>
        <high>-0.32</high>
        <freq>0.0294117647058824</freq>
        <variable>stock</variable>
        <type>historic</type>
    </item>
    <item>
        <low>-0.32</low>
        <high>-0.29</high>
        <freq>0</freq>
        <variable>stock</variable>
        <type>historic</type>
    </item>
</retirement>

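My current data set has only two variables, and the type can be one of 3, possibly 4, discrete types. Hard-coding two variables is not a problem, but I want to start working with data that has many more variables and automate this process. My goal is to import this XML data into a dictionary automatically so that I can manipulate it further later without hard-coding the array titles and variables.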

Here is what I have so far:

# Import XML Parser
import xml.etree.ElementTree as ET

# Parse XML directly from the file path
tree = ET.parse('xmlfile')

# Create iterable item list
Items = tree.findall('item')

# Create Master Dictionary
masterDictionary = {}

# Assign variables to dictionary
for Item in Items:
    thisKey = Item.find('variable').text
    if thisKey in masterDictionary == False:
        masterDictionary[thisKey] = []
    else:
        pass

thisList = masterDictionary[thisKey]
newDataPoint = DataPoint(float(Item.find('low').text), float(Item.find('high').text), float(Item.find('freq').text))
thisSublist.append(newDataPoint)

I get a KeyError at thisList = masterDictionary[thisKey]

I am also trying to create a class to handle some of the other elements of the XML:

# Define a class for each data point that contains low, hi and freq attributes
class DataPoint:
 def __init__(self, low, high, freq):
  self.low = low
  self.high = high
  self.freq = freq

Would I then be able to check a value with something like:

masterDictionary['stock'][0].freq

Any and all help is appreciated.

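Update

Thanks for the help, John. The indentation problems were sloppiness on my part. This is my first time posting on Stack and I just didn't get the copy/paste right. The part after the else: is in fact indented as part of the for loop, and the class is indented with four spaces in my code; it was just a bad post here. I will keep the capitalization convention in mind. Your suggestion did indeed work, and now the commands: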

print masterDictionary.keys()
print masterDictionary['stock'][0].low

yield:

['inflation', 'stock']
-0.34

Those are indeed my two variables, and the value matches the XML listed at the top.

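Update 2

Well, I thought I had this figured out, but I was careless again, and it turns out I had not completely fixed the problem. The previous solution ended up writing all of the data to both of my dictionary keys, so that I have two identical lists containing all of the data, assigned to two different dictionary keys. The idea is to assign each distinct set of data from the XML to its matching dictionary key. Here is the current code: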

# Import XML Parser
import xml.etree.ElementTree as ET

# Parse XML directly from the file path
tree = ET.parse(xml file)

# Create iterable item list
items = tree.findall('item')

# Create class for historic variables
class DataPoint:
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq

# Create Master Dictionary and variable list for historic variables
masterDictionary = {}
thisList = []

# Loop to assign variables as dictionary keys and associate their values with them
for item in items:
    thisKey = item.find('variable').text 
    masterDictionary[thisKey] = thisList
    if thisKey not in masterDictionary:
        masterDictionary[thisKey] = []
    newDataPoint = DataPoint(float(item.find('low').text), float(item.find('high').text), float(item.find('freq').text))
    thisList.append(newDataPoint)

When I enter:

print masterDictionary['stock'][5].low
print masterDictionary['inflation'][5].low
print len(masterDictionary['stock'])
print len(masterDictionary['inflation'])

the results are identical for both keys ('stock' and 'inflation'):

-.22
-.22
56
56

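There are 27 items tagged stock and 29 tagged inflation in the XML file. How can I make the list assigned to each dictionary key pull only its own data in the loop?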

Update 3

It seems to work with 2 loops, but I don't know how, or why it won't work in a single loop. I managed this by accident:

# Import XML Parser
import xml.etree.ElementTree as ET

# Parse XML directly from the file path
tree = ET.parse(xml file)

# Create iterable item list
items = tree.findall('item')

# Create class for historic variables
class DataPoint:
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq

# Create Master Dictionary and variable list for historic variables
masterDictionary = {}

# Loop to assign variables as dictionary keys and associate their values with them
for item in items:
    thisKey = item.find('variable').text
    thisList = []
    masterDictionary[thisKey] = thisList

for item in items:
    thisKey = item.find('variable').text
    newDataPoint = DataPoint(float(item.find('low').text), float(item.find('high').text), float(item.find('freq').text))
    masterDictionary[thisKey].append(newDataPoint)

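I have tried a huge number of permutations to make it happen in one loop, but no luck. I can either get all of the data listed under both keys (identical arrays of all the data, which is not very helpful), or get the data sorted correctly into 2 different arrays for the two keys but with only the last data entry in each (the loop overwrites itself each time, leaving only one entry in the array).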
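For reference, both symptoms described above can be reproduced in a few lines without any XML. This is only a sketch with made-up keys and values (the tuples stand in for the parsed items), written for Python 3: in the first case every key is bound to the same list object, and in the second a fresh list replaces the stored one on every pass.

```python
# Hypothetical stand-in for the parsed items: (variable, value) pairs.
pairs = [('stock', 1), ('inflation', 2), ('stock', 3)]

# Case 1: one list created before the loop and assigned to every key.
shared = []
shared_result = {}
for key, value in pairs:
    shared_result[key] = shared      # every key refers to the same list object
    shared.append(value)
print(shared_result['stock'] is shared_result['inflation'])  # True
print(shared_result)  # {'stock': [1, 2, 3], 'inflation': [1, 2, 3]}

# Case 2: a new list created on every pass of the loop.
overwritten = {}
for key, value in pairs:
    overwritten[key] = []            # discards whatever list was stored before
    overwritten[key].append(value)
print(overwritten)  # {'stock': [3], 'inflation': [2]}
```

The two-loop version works because the first loop only creates the (separate) lists and the second only appends to them.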

Answer 1

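You have a serious indentation problem after the (unnecessary) else: pass. Fix that and try again. Does the problem occur with your sample input data? Other data? The first time around the loop? What is the value of thisKey that is causing the problem [hint: it is reported in the KeyError error message]? What are the contents of masterDictionary just before the error happens [hint: sprinkle a few print statements around your code]?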

Some other remarks not related to your problem:

Instead of if thisKey in masterDictionary == False: consider using if thisKey not in masterDictionary: ... Comparisons against True or False are almost always redundant and/or a bit of a "code smell".

Python convention is to reserve names with an initial capital letter (like Item) for classes.

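Using only one space per indentation level makes code almost illegible and is severely deprecated. Always use 4 (unless you have a very good reason, but I have never heard of one).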

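Update: I was wrong: thisKey in masterDictionary == False is worse than I thought. Because in is a relational operator, chained evaluation is used (as in a <= b < c), so what you have is (thisKey in masterDictionary) and (masterDictionary == False), which always evaluates to False, and so the dictionary is never updated. The fix is as I suggested: use if thisKey not in masterDictionary: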
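The chained evaluation can be checked directly. The snippet below is a small illustration in Python 3 with made-up variable names (master, key), starting from an empty dictionary as the loop in the question does:

```python
master = {}      # empty dictionary, as at the start of the loop
key = 'stock'

# Parsed as (key in master) and (master == False)
chained = key in master == False
# What was probably intended
explicit = (key in master) == False
# The recommended spelling
idiomatic = key not in master

print(chained, explicit, idiomatic)  # False True True
```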

It also looks like thisList (initialised but not used) should be thisSublist (used but not initialised).

Answer 2

Change:

if thisKey in masterDictionary == False:

to

if thisKey not in masterDictionary:

That appears to be why you were getting that error. Also, you need to assign something to 'thisSublist' before trying to append to it. Try:

thisSublist = []
thisSublist.append(newDataPoint)
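Note that for the data to end up in the dictionary, the list being appended to has to be the one stored under the key. A minimal single-loop sketch of that idea in Python 3 follows. It uses dict.setdefault, which is one possible approach rather than the only one; the inline XML is modelled on the sample in the question, and the 'inflation' item and its numbers are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample modelled on the question; the 'inflation' item is hypothetical.
XML_TEXT = """
<retirement>
    <item><low>-0.34</low><high>-0.32</high><freq>0.0294117647058824</freq>
        <variable>stock</variable><type>historic</type></item>
    <item><low>-0.32</low><high>-0.29</high><freq>0</freq>
        <variable>stock</variable><type>historic</type></item>
    <item><low>0.01</low><high>0.02</high><freq>0.1</freq>
        <variable>inflation</variable><type>historic</type></item>
</retirement>
"""


class DataPoint:
    """One histogram bin: low and high bounds plus a frequency."""

    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq


root = ET.fromstring(XML_TEXT)
master_dictionary = {}
for item in root.findall('item'):
    key = item.find('variable').text
    point = DataPoint(float(item.find('low').text),
                      float(item.find('high').text),
                      float(item.find('freq').text))
    # setdefault returns the list already stored under key, or stores a
    # new empty list there first, so each key gets its own list.
    master_dictionary.setdefault(key, []).append(point)

print(sorted(master_dictionary))             # ['inflation', 'stock']
print(master_dictionary['stock'][0].low)     # -0.34
print(len(master_dictionary['stock']), len(master_dictionary['inflation']))  # 2 1
```

collections.defaultdict(list) would behave the same way here if you prefer not to call setdefault on every pass.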

Answer 3

The if statement in the for loop is wrong. Instead of

if thisKey in masterDictionary == False:

write

if (thisKey in masterDictionary) == False:

Given the rest of your original code, you will then be able to access the data like this:

masterDictionary['stock'][0].freq

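John Machin makes some valid points about style and smells (and you should consider the changes he suggests), but those things will come with time and experience.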

Organizing XML data into a dictionary
