Question

I have a list of the following form:

   [{'yarn.scheduler.capacity.maximum-applications': ['capacity-scheduler.xml', 'name']},{'yarn.nodemanager.env-whitelist': ['yarn-site.xml', 'name']},{'*': ['capacity-scheduler.xml', 'value']},{'100': ['capacity-scheduler.xml', 'value']},{'java_home,hadoop_common_home,hadoop_hdfs_home,hadoop_conf_dir,classpath_prepend_distcache,hadoop_yarn_home,hadoop_mapred_home': ['yarn-site.xml', 'value']},{'\n      default acl for management operations for all key acls that are not\n      explicitly defined.\n    ': ['kms-acls.xml', 'description']}]

This is my attempt at building an inverted index over parts of several XML documents, where the parts look something like this:

In file: capacity-scheduler.xml

<name> yarn.scheduler.capacity.maximum-applications </name>

In file: yarn-site.xml

<name> yarn.nodemanager.env-whitelist </name>

In file: capacity-scheduler.xml

<value>100</value>

etc.

I built this inverted index by iterating over the XML files in a folder. Now I am trying to use it to get a list in the following format:

[{'yarn':['capacity-scheduler.xml','name'],'scheduler':['capacity-scheduler.xml','name'],'capacity':['capacity-scheduler.xml','name'],'maximum':['capacity-scheduler.xml','name'],'applications':['capacity-scheduler.xml','name'], '*':['capacity-scheduler.xml','value'],'100':['capacity-scheduler.xml','value'],...} so on and so forth

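I can't figure out how to "split" the keys this way, because calling split on a dict object raises an error. I also want to keep words that contain underscores intact, for example: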

 [{'java_home':[file,element], 'hadoop_home':[file,element]}]
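One possible way to do the split, as a minimal sketch rather than a definitive answer: iterate over each one-entry dict, split the key string (not the dict) with a regular expression, and attach the same `[file, element]` value to every resulting token. The function name `split_index` and the separator set (whitespace, `.`, `,` and `-`) are assumptions made for illustration; underscores and `*` are deliberately not treated as separators, so `java_home` and `*` survive as tokens. Because the same token can occur in more than one file, this sketch maps each token to a list of `(file, element)` pairs instead of a single pair.

```python
import re
from collections import defaultdict

# Assumption: tokens are separated by whitespace, '.', ',' or '-'.
# '_' and '*' are not separators, so 'java_home' and '*' stay whole.
SEPARATORS = re.compile(r"[\s.,\-]+")


def split_index(entries):
    """Turn [{key: [file, elem]}, ...] into {token: [(file, elem), ...]}."""
    tokens = defaultdict(list)
    for entry in entries:
        for key, (filename, elem) in entry.items():
            for token in SEPARATORS.split(key):
                # Skip empty strings produced by leading/trailing separators
                # and avoid recording the same location twice.
                if token and (filename, elem) not in tokens[token]:
                    tokens[token].append((filename, elem))
    return dict(tokens)


entries = [
    {'yarn.scheduler.capacity.maximum-applications': ['capacity-scheduler.xml', 'name']},
    {'yarn.nodemanager.env-whitelist': ['yarn-site.xml', 'name']},
    {'*': ['capacity-scheduler.xml', 'value']},
    {'java_home,hadoop_common_home': ['yarn-site.xml', 'value']},
]
index = split_index(entries)
print(index['yarn'])       # both files that contain the token 'yarn'
print(index['java_home'])  # underscore word kept intact
```

If exactly one location per token is really wanted, the list could be replaced by a plain assignment, at the cost of later occurrences overwriting earlier ones.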

I am doing this so that I can write the results to an output.xml file that shows where each of these tokens comes from (which file) and from which element.

For example:

<token>
        <value>yarn</value>
        <where>capacity-scheduler.xml</where>
        <elem>name</elem>
        <where>yarn-site.xml</where>
        <elem>name</elem>
</token>
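A rough sketch of how such a file could be produced with the standard-library `xml.etree.ElementTree` module, assuming the tokens are already available as a dict of the shape `{token: [(file, element), ...]}`. The function name `build_tokens_xml` and the wrapping root element `<tokens>` are assumptions for illustration (a well-formed XML file needs a single root); the `<token>`, `<value>`, `<where>` and `<elem>` names follow the example above.

```python
import xml.etree.ElementTree as ET


def build_tokens_xml(index):
    """Build a <tokens> tree with one <token> per key of the index."""
    root = ET.Element("tokens")  # hypothetical root element name
    for token, places in index.items():
        tok = ET.SubElement(root, "token")
        ET.SubElement(tok, "value").text = token
        # One <where>/<elem> pair per location of the token.
        for filename, elem in places:
            ET.SubElement(tok, "where").text = filename
            ET.SubElement(tok, "elem").text = elem
    return root


index = {"yarn": [("capacity-scheduler.xml", "name"), ("yarn-site.xml", "name")]}
root = build_tokens_xml(index)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# To write the file instead of printing:
# ET.ElementTree(root).write("output.xml", encoding="utf-8", xml_declaration=True)
```

ElementTree escapes special characters in the text for you, so a token such as `*` or a value containing `<` or `&` should not break the output.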

How can I split the keys so that I get separate key-value pairs, each with the same dictionary value as the original key?

0 answers: