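When I pass the payload to the method that converts the object to JSON, the namespaces are stripped from the elements. I want to preserve the namespaces in the serialized JSON object.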
<?xml version="1.0" encoding="UTF-8"?><html lang="en">
<head>
<title>jahaahahjjajajajajjajaja</title>
</head>
<body id="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem"><a name="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem"><!-- --></a>
<main role="main"><article role="article" aria-labelledby="ariaid-title1">
<h1 class="title topictitle1" id="ariaid-title1">jahaahahjjajajajajjajaja</h1>
<content class="body conbody"><p class="shortdesc">Overview of the full tool chain for jahaahahjjajajajajjajaja UA content development. Describes the
purpose of each tool and its intended end user.</p>
<p class="p">The jahaahahjjajajajajjajaja User Assistance ecosystem is being updated to employ modern tools for
structured content development, management, and delivery. The new tool chain combines
several tools that enable the jahaahahjjajajajajjajaja information developer to create, publish, and
maintain jahaahahjjajajajajjajaja UA content. </p>
<p class="p">The new tools are grouped by function, enabling you to <a class="xref" href="#c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__section_gqw_vkq_lgb">develop,</a>
<a class="xref" href="#c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__section_btp_xkq_lgb">review,</a>
<a class="xref" href="#c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__section_evf_zkq_lgb">manage,</a> and <a class="xref" href="#c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__section_bmm_1lq_lgb">deliver</a> consistent, accurate, and personalized UA content to
jahaahahjjajajajajjajaja customers.</p>
<p class="p">The new tools are shown in the diagram below, and explained more thoroughly in the
Writer's Toolbox documentation.</p>
<figure class="fig fignone" id="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__fig_j4y_qby_lgb"><a name="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__fig_j4y_qby_lgb"><!-- --></a>
<a name="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__image_pl2_pc4_kgb"><!-- --></a>
<ac:image xmlns:ac="urn:ac" xmlns:ri="urn:ri" xmlns:mf="urn:mf" id="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__image_pl2_pc4_kgb"><ri:attachment ri:filename="g_tool_chain.jpg"/></ac:image>
</figure>
<section class="section" id="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__section_gqw_vkq_lgb"><a name="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__section_gqw_vkq_lgb"><!-- --></a><h2 class="title sectiontitle">Content Development</h2>
<p class="p">jahaahahjjajajajajjajaja is authoring content in the Darwin Information Typing Architecture (jahaahahjjajajajajjajaja), a
technical communications XML standard, and thus requires a jahaahahjjajajajajjajaja-compliant XML
Editor. jahaahahjjajajajajjajaja has chosen the jahaahahjjajajajajjajaja tool set for to creating its UA content in
jahaahahjjajajajajjajaja XML.</p>
<dl class="dl">
<dt class="dt dlterm">jahaahahjjajajajajjajaja Editor</dt>
<dd class="dd"> jahaahahjjajajajajjajaja Editor is a desktop editor that should be used by any information
developer whose main job is to create UA content.</dd>
<dt class="dt dlterm">jahaahahjjajajajajjajaja Web Author</dt>
<dd class="dd"> jahaahahjjajajajajjajaja Web Author is a browser-based editor that should be used by any
content contributor, such as a Subject Matter Expert (SME), who does not
write full-time and does not typically have the need nor desire to learn
jahaahahjjajajajajjajaja XML.</dd>
</dl>
</section>
<section class="section" id="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__section_btp_xkq_lgb"><a name="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__section_btp_xkq_lgb"><!-- --></a><h2 class="title sectiontitle">Content Review</h2>
<p class="p">Because jahaahahjjajajajajjajaja is a topic-based architecture, jahaahahjjajajajajjajaja needs a review platform that is
both lightweight and allows for topic-based reviews, as opposed to reviews of full
books or chapters. jahaahahjjajajajajjajaja's jahaahahjjajajajajjajaja platform meets these requirements and
will be the main platform for reviewing UA content.</p>
<dl class="dl">
<dt class="dt dlterm">jahaahahjjajajajajjajaja</dt>
<dd class="dd">
<p class="p">The jahaahahjjajajajajjajaja platform has two components: an "add-on" that is part
of the jahaahahjjajajajajjajaja Editor desktop application, and a web interface where
reviewers can add their comments and even make changes.</p>
<p class="p">The add-on is used by content owners to put their topics into review, get
a URL, and share the URL with chosen content reviewers.</p>
</dd>
</dl>
</section>
<section class="section" id="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__section_evf_zkq_lgb"><a name="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__section_evf_zkq_lgb"><!-- --></a><h2 class="title sectiontitle">Content Management</h2>
<p class="p">jahaahahjjajajajajjajaja UA content will be stored centrally in a Git repository, Bitbucket, and
managed locally with the SourceTree client application. Working copies of content
will reside on client (local) machines and be pushed to the shared repository when
ready to be shared. </p>
<dl class="dl">
<dt class="dt dlterm">Bitbucket</dt>
<dd class="dd">Bitbucket is a Git repository that provides jahaahahjjajajajajjajaja UA a central, shared
repository for content. Its main interface is a browser-based web interface,
although it can also be accessed via command line and desktop applications
such as SourceTree. jahaahahjjajajajajjajaja authors will use Bitbucket web client to
collaborate with one another on the shared repository. </dd>
<dt class="dt dlterm">SourceTree</dt>
<dd class="dd">SourceTree is a client application that connects to Git repositories.
jahaahahjjajajajajjajaja authors will use SourceTree to manage both remote and local versions
of their content. Because it is a client application, SourceTree has the
advantage of being able to track activity at the local level. </dd>
<dt class="dt dlterm">File Explorer</dt>
<dd class="dd">Windows Explorer (Windows) or Finder (Mac) will be used by jahaahahjjajajajajjajaja authors
to store and organize local versions of their content before pushing to the
shared repository.</dd>
</dl>
</section>
<section class="section" id="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__section_bmm_1lq_lgb"><a name="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__section_bmm_1lq_lgb"><!-- --></a><h2 class="title sectiontitle">Content Delivery</h2>
<p class="p">jahaahahjjajajajajjajaja's jahaahahjjajajajajjajaja content will be published through the open source jahaahahjjajajajajjajaja Open Toolkit
(jahaahahjjajajajajjajaja-OT). The jahaahahjjajajajajjajaja-OT will be kicked off via the jahaahahjjajajajajjajaja Editor interface.</p>
<dl class="dl">
<dt class="dt dlterm">jahaahahjjajajajajjajaja Open Toolkit</dt>
<dd class="dd">The jahaahahjjajajajajjajaja-OT transforms jahaahahjjajajajajjajaja XML to different formats for consumption by a
customer. jahaahahjjajajajajjajaja will use the jahaahahjjajajajajjajaja-OT to produce PDF, WebHelp, Word, and
CHM formats.</dd>
</dl>
</section>
</content>
</article></main></body>
</html>
import json
import xml.etree.ElementTree as ET


class Page:
    def __init__(self, type, title, space, body):
        self.type = type
        self.title = title
        self.space = space
        self.body = body

    def getPageTitle(self):
        return self.title

    def getType(self):
        return self.type

    def getContent(self):
        # The page content is stored in self.body; self.content is never set.
        return self.body

    def getJSONObject(self):
        # Serialize all instance attributes to a JSON string.
        jsonobj = json.dumps(self.__dict__)
        return jsonobj


class childPage(Page):
    def __init__(self, type, title, ancestors, space, body):
        self.type = type
        self.title = title
        self.ancestors = ancestors
        self.space = space
        self.body = body


def getContent(file):
    # Parse the file and return the page title and the serialized <content> element.
    tree = ET.parse(file)
    root = tree.getroot()
    title2 = findTitle(root)
    body2 = findContent(root)
    print(body2)
    return title2, body2


def findTitle(root):
    # Return the text of the first <head>/<title> element.
    for e in root.findall('head'):
        title3 = e.find('title').text
        return title3


def findContent(root):
    # Return the first <content> element under <body>, serialized as a str.
    for e in root.findall('body'):
        body3 = e.find('main/article/content')
        return ET.tostring(body3).decode("utf-8")


title, value = getContent("test.html")
space = {"key": "TOOL"}
ancestors = [{"id": 245}]
body = {"storage": {"value": value, "representation": "storage"}}
pageob = childPage("page", title, ancestors, space, body)
print(pageob.getJSONObject())
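This code works. However, when the bytes object is decoded, the namespaces are stripped and replaced with unexpected characters.
I am not a professional developer, so please forgive any mistakes in the code. Can you help me solve this problem? Thanks in advance.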
Answer 0: (score: 0)
The problem went away once I registered the namespaces. I found the answer here: How to preserve namespaces when parsing xml via ElementTree in Python
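A minimal sketch of that fix, assuming the prefixes and URIs declared on the `ac:image` element in the sample above (`urn:ac`, `urn:ri`, `urn:mf`). The `SAMPLE` string and the `serialize_fragment` helper are hypothetical stand-ins for `test.html` and `findContent`; only `ET.register_namespace` is the actual change. It serializes the same fragment before and after registration so the difference is visible:

```python
import xml.etree.ElementTree as ET

# Prefix -> URI pairs copied from the xmlns declarations in the sample document.
NAMESPACES = {"ac": "urn:ac", "ri": "urn:ri", "mf": "urn:mf"}

# Hypothetical, shortened stand-in for the <content> element in test.html.
SAMPLE = (
    '<content><figure>'
    '<ac:image xmlns:ac="urn:ac" xmlns:ri="urn:ri" xmlns:mf="urn:mf">'
    '<ri:attachment ri:filename="g_tool_chain.jpg"/>'
    '</ac:image></figure></content>'
)


def serialize_fragment(xml_text, path):
    """Parse xml_text, find the element at path, and return it as a str."""
    root = ET.fromstring(xml_text)
    element = root.find(path)
    return ET.tostring(element).decode("utf-8")


# Without registration, ElementTree invents its own prefixes (ns0, ns1, ...).
before = serialize_fragment(SAMPLE, "figure")

# register_namespace is process-wide and affects every later serialization.
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)

after = serialize_fragment(SAMPLE, "figure")

print(before)
print(after)
```

In the code from the question, the registration loop would go anywhere before `ET.tostring` is called in `findContent`, for example right after the imports. The string passed to `json.dumps` then carries the original `ac:` and `ri:` prefixes instead of generated ones.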