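How do I prevent JSON serialization from removing prefixes or namespaces?

Asked: 2019-04-04 05:23:02

Tags: python json serialization

When I pass the payload to the method that converts the object to JSON, the namespaces are removed from the elements. I want to keep the namespaces in the serialized JSON object.

Input HTML file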

<?xml version="1.0" encoding="UTF-8"?><html lang="en">
<head>
<title>jahaahahjjajajajajjajaja</title>
</head>
<body id="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem"><a name="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem"><!-- --></a>
<main role="main"><article role="article" aria-labelledby="ariaid-title1">
    <h1 class="title topictitle1" id="ariaid-title1">jahaahahjjajajajajjajaja</h1>


    <content class="body conbody"><p class="shortdesc">Overview of the full tool chain for jahaahahjjajajajajjajaja UA content development. Describes the
        purpose of each tool and its intended end user.</p>

        <p class="p">The jahaahahjjajajajajjajaja User Assistance ecosystem is being updated to employ modern tools for
            structured content development, management, and delivery. The new tool chain combines
            several tools that enable the jahaahahjjajajajajjajaja information developer to create, publish, and
            maintain jahaahahjjajajajajjajaja UA content. </p>

        <p class="p">The new tools are grouped by function, enabling you to  <a class="xref" href="#c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__section_gqw_vkq_lgb">develop,</a>
            <a class="xref" href="#c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__section_btp_xkq_lgb">review,</a>
            <a class="xref" href="#c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__section_evf_zkq_lgb">manage,</a> and <a class="xref" href="#c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__section_bmm_1lq_lgb">deliver</a> consistent, accurate, and personalized UA content to
            jahaahahjjajajajajjajaja customers.</p>

        <p class="p">The new tools are shown in the diagram below, and explained more thoroughly in the
            Writer's Toolbox documentation.</p>

        <figure class="fig fignone" id="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__fig_j4y_qby_lgb"><a name="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__fig_j4y_qby_lgb"><!-- --></a>
            <a name="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__image_pl2_pc4_kgb"><!-- --></a>
            <ac:image xmlns:ac="urn:ac" xmlns:ri="urn:ri" xmlns:mf="urn:mf" id="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__image_pl2_pc4_kgb"><ri:attachment ri:filename="g_tool_chain.jpg"/></ac:image>
        </figure>

        <section class="section" id="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__section_gqw_vkq_lgb"><a name="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__section_gqw_vkq_lgb"><!-- --></a><h2 class="title sectiontitle">Content Development</h2>

            <p class="p">jahaahahjjajajajajjajaja is authoring content in the Darwin Information Typing Architecture (jahaahahjjajajajajjajaja), a
                technical communications XML standard, and thus requires a jahaahahjjajajajajjajaja-compliant XML
                Editor. jahaahahjjajajajajjajaja has chosen the jahaahahjjajajajajjajaja tool set for   to creating its UA content in
                jahaahahjjajajajajjajaja XML.</p>

            <dl class="dl">

                    <dt class="dt dlterm">jahaahahjjajajajajjajaja Editor</dt>

                    <dd class="dd"> jahaahahjjajajajajjajaja Editor is a desktop editor that should be used by any information
                        developer whose main job is to create UA content.</dd>



                    <dt class="dt dlterm">jahaahahjjajajajajjajaja Web Author</dt>

                    <dd class="dd"> jahaahahjjajajajajjajaja Web Author is a browser-based editor that should be used by any
                        content contributor, such as a Subject Matter Expert (SME), who does not
                        write full-time and does not typically have the need nor desire to learn
                        jahaahahjjajajajajjajaja XML.</dd>


            </dl>

        </section>

        <section class="section" id="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__section_btp_xkq_lgb"><a name="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__section_btp_xkq_lgb"><!-- --></a><h2 class="title sectiontitle">Content Review</h2>

            <p class="p">Because jahaahahjjajajajajjajaja is a topic-based architecture, jahaahahjjajajajajjajaja needs a review platform that is
                both lightweight and allows for topic-based reviews, as opposed to reviews of full
                books or chapters. jahaahahjjajajajajjajaja's jahaahahjjajajajajjajaja platform meets these requirements and
                will be the main platform for reviewing UA content.</p>

            <dl class="dl">

                    <dt class="dt dlterm">jahaahahjjajajajajjajaja</dt>

                    <dd class="dd">
                        <p class="p">The jahaahahjjajajajajjajaja platform has two components: an "add-on" that is part
                            of the jahaahahjjajajajajjajaja Editor desktop application, and a web interface where
                            reviewers can add their comments and even make changes.</p>

                        <p class="p">The add-on is used by content owners to put their topics into review, get
                            a URL, and share the URL with chosen content reviewers.</p>

                    </dd>


            </dl>

        </section>

        <section class="section" id="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__section_evf_zkq_lgb"><a name="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__section_evf_zkq_lgb"><!-- --></a><h2 class="title sectiontitle">Content Management</h2>

            <p class="p">jahaahahjjajajajajjajaja UA content will be stored centrally in a Git repository, Bitbucket, and
                managed locally with the SourceTree client application. Working copies of content
                will reside on client (local) machines and be pushed to the shared repository when
                ready to be shared. </p>

            <dl class="dl">

                    <dt class="dt dlterm">Bitbucket</dt>

                    <dd class="dd">Bitbucket is a Git repository that provides jahaahahjjajajajajjajaja UA a central, shared
                        repository for content. Its main interface is a browser-based web interface,
                        although it can also be accessed via command line and desktop applications
                        such as SourceTree. jahaahahjjajajajajjajaja authors will use Bitbucket web client to
                        collaborate with one another on the shared repository. </dd>



                    <dt class="dt dlterm">SourceTree</dt>

                    <dd class="dd">SourceTree is a client application that connects to Git repositories.
                        jahaahahjjajajajajjajaja authors will use SourceTree to manage both remote and local versions
                        of their content. Because it is a client application, SourceTree has the
                        advantage of being able to track activity at the local level. </dd>



                    <dt class="dt dlterm">File Explorer</dt>

                    <dd class="dd">Windows Explorer (Windows) or Finder (Mac) will be used by jahaahahjjajajajajjajaja authors
                        to store and organize local versions of their content before pushing to the
                        shared repository.</dd>


            </dl>

        </section>

        <section class="section" id="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__section_bmm_1lq_lgb"><a name="c_jahaahahjjajajajajjajaja_ua_tools_ecosystem__section_bmm_1lq_lgb"><!-- --></a><h2 class="title sectiontitle">Content Delivery</h2>

            <p class="p">jahaahahjjajajajajjajaja's jahaahahjjajajajajjajaja content will be published through the open source jahaahahjjajajajajjajaja Open Toolkit
                (jahaahahjjajajajajjajaja-OT). The jahaahahjjajajajajjajaja-OT will be kicked off via the jahaahahjjajajajajjajaja Editor interface.</p>

            <dl class="dl">

                    <dt class="dt dlterm">jahaahahjjajajajajjajaja Open Toolkit</dt>

                    <dd class="dd">The jahaahahjjajajajajjajaja-OT transforms jahaahahjjajajajajjajaja XML to different formats for consumption by a
                        customer. jahaahahjjajajajajjajaja will use the jahaahahjjajajajajjajaja-OT to produce PDF, WebHelp, Word, and
                        CHM formats.</dd>

            </dl>

        </section>

    </content>

</article></main></body>
</html>

Python code that reads the HTML file, retrieves the elements, and then builds a JSON string.

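import json
import xml.etree.ElementTree as ET


class Page:
    """Plain container for the fields of a page payload."""

    def __init__(self, type, title, space, body):
        self.type = type
        self.title = title
        self.space = space
        self.body = body

    def getPageTitle(self):
        return self.title

    def getType(self):
        return self.type

    def getContent(self):
        # The page content is stored in `body`; there is no `content` attribute.
        return self.body

    def getJSONObject(self):
        # Serialize every instance attribute into a single JSON string.
        return json.dumps(self.__dict__)


class childPage(Page):
    """A page that also records its ancestor pages."""

    def __init__(self, type, title, ancestors, space, body):
        # Attributes are assigned in this order so the JSON keys keep it.
        self.type = type
        self.title = title
        self.ancestors = ancestors
        self.space = space
        self.body = body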


def getContent(file):
    # Parse the file and return the page title and the serialized content.
    tree = ET.parse(file)
    root = tree.getroot()
    title2 = findTitle(root)
    body2 = findContent(root)
    print(body2)
    return title2, body2


def findTitle(root):
    # Return the text of <title> inside the first <head>, if there is one.
    for e in root.findall('head'):
        title3 = e.find('title').text
        return title3


def findContent(root):
    # Return the <content> element of the first <body> as a UTF-8 string.
    for e in root.findall('body'):
        body3 = e.find('main/article/content')
        return ET.tostring(body3).decode("utf-8")


title, value = getContent("test.html")
space = {"key": "TOOL"}
ancestors = [{"id": 245}]
body = {"storage": {"value": value, "representation": "storage"}}
pageob = childPage("page", title, ancestors, space, body)
print(pageob.getJSONObject())

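This code works. However, when the bytes object is decoded, the namespaces are stripped and replaced with unexpected characters.

I am not a professional developer, so please forgive any mistakes in the code. Can you help me solve this problem? Thank you in advance.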

1 answer:

Answer 0: (score: 0)

The problem went away once I registered the namespaces. I found the answer here: How to preserve namespaces when parsing xml via ElementTree in Python
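
For anyone landing here later, the following is a minimal sketch of what registering the namespaces can look like. It is not the exact code from the linked answer. `SAMPLE` is a trimmed-down stand-in for the `<figure>` element in the question's input file, and `register_all_namespaces` is a hypothetical helper name chosen for illustration. The sketch serializes the same tree before and after calling `ET.register_namespace`, so the effect on the prefixes is visible.

```python
import io
import json
import xml.etree.ElementTree as ET

# Assumption: a trimmed-down stand-in for the <figure> element in the question.
SAMPLE = b"""<figure>
<ac:image xmlns:ac="urn:ac" xmlns:ri="urn:ri" xmlns:mf="urn:mf">
<ri:attachment ri:filename="g_tool_chain.jpg"/>
</ac:image>
</figure>"""


def register_all_namespaces(source):
    """Hypothetical helper: register every prefix declared in `source`.

    iterparse() with the "start-ns" event yields a (prefix, uri) pair for
    each xmlns declaration it encounters while reading the document.
    """
    declared = {}
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        ET.register_namespace(prefix, uri)
        declared[prefix] = uri
    return declared


root = ET.parse(io.BytesIO(SAMPLE)).getroot()

# Without registration, ElementTree invents its own prefixes (ns0, ns1, ...).
before = ET.tostring(root).decode("utf-8")

# After registration, the original prefixes are reused on output.
declared = register_all_namespaces(io.BytesIO(SAMPLE))
after = ET.tostring(root).decode("utf-8")

# Same payload shape as in the question.
payload = json.dumps({"storage": {"value": after, "representation": "storage"}})

print(before)
print(after)
print(payload)
```

A few things to keep in mind. `ET.register_namespace` changes a process-wide mapping, so it affects every later serialization in the same program. ElementTree also writes the `xmlns:` declarations on the outermost element it serializes rather than on `ac:image`, which is equivalent XML but not byte-for-byte identical to the input. Finally, `json.dumps` escapes the double quotes inside the markup as `\"`; that is normal JSON string escaping, and `json.loads` returns the original string.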