Converting RSS Feed XML to JSON with Java shows special characters

Time: 2017-10-09 22:46:37

Tags: java json xml jackson rss

I created a Spring MVC based RESTful controller that takes a hard-coded RSS HTTP URL and converts it from XML to JSON:

RssFeedController:

import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

import org.apache.commons.io.IOUtils;
import org.apache.log4j.Logger;
import org.json.JSONObject;
import org.json.XML;

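import com.fasterxml.jackson.databind.ObjectMapper;

import org.springframework.http.HttpHeaders;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;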

@RestController
public class RssFeedController {

    private HttpHeaders headers = null;

    public RssFeedController() {
        headers = new HttpHeaders();
        headers.add("Content-Type", "application/json");
    }

    @RequestMapping(value = "/v2/convertToJson", method = RequestMethod.GET, produces = "application/json")
    public String getRssFeedAsJson() throws IOException {
        InputStream xml = getInputStreamForURLData("http://www.samplefeed.com/feed");
        String xmlString = IOUtils.toString(xml);
        JSONObject jsonObject = XML.toJSONObject(xmlString);
        ObjectMapper objectMapper = new ObjectMapper();
        Object json = objectMapper.readValue(jsonObject.toString(), Object.class);
        String response = objectMapper.writeValueAsString(json);
        return response;
    }

    public static InputStream getInputStreamForURLData(String targetUrl) {
        URL url = null;
        HttpURLConnection httpConnection = null;
        InputStream content = null;

        try {
            url = new URL(targetUrl);
            URLConnection conn = url.openConnection();
            conn.setRequestProperty("User-Agent", "Mozilla/5.0");
            httpConnection = (HttpURLConnection) conn;
            int responseCode = httpConnection.getResponseCode();
            content = (InputStream) httpConnection.getInputStream();
        } 
        catch (MalformedURLException e) {
            e.printStackTrace();
        } 
        catch (IOException e) {
            e.printStackTrace();
        }
        return content;
    }
}

pom.xml:

    <dependency>
        <groupId>org.json</groupId>
        <artifactId>json</artifactId>
        <version>20170516</version>
    </dependency>

    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.5</version>
    </dependency>

So the original RSS feed has the following content:

<item>
    <title>October Fest Weekend</title>
    <link>http://www.samplefeed.com/feed/OctoberFestWeekend</link>
    <comments>http://www.samplefeed.com/feed/OctoberFestWeekend/#comments</comments>
    <pubDate>Wed, 04 Oct 2017 17:08:48 +0000</pubDate>
    <dc:creator><![CDATA[John Doe]]></dc:creator>
            <category><![CDATA[Uncategorized]]></category>

    <guid isPermaLink="false">http://www.samplefeed.com/feed/?p=9227</guid>
    <description><![CDATA[<p>
</p>
<p>Doors Open:6:30pm<br />
Show Begins:  7:30pm<br />
Show Ends (Estimated time): 11:00pm<br />
Location: Staples Center</p>
<p>Directions</p>
<p>Map of ...</p>
<p>The post <a rel="nofollow" href="http://www.samplefeed.com/feed/OctoberFestWeekend/">OctoberFest Weekend</a> appeared first on <a rel="nofollow" href="http://www.samplefeed.com">SampleFeed</a>.</p>
]]></description>

It is rendered as JSON like this:

{
    "guid": {
        "content": "http://www.samplefeed.com/feed/?p=9227",
        "isPermaLink": false
    },
    "pubDate": "Wed, 04 Oct 2017 17:08:48 +0000",
    "category": "Uncategorized",
    "title": "October Fest Weekend",
    "description": "<p>\n??</p>\n<p>Doors Open:6:30pm<br />\nShow Begins:?? 7:30pm<br />\nShow Ends (Estimated time):??11:00pm<br />\nLocation: Staples Center</p>\n<p>Directions</p>\n<p>Map of ...</p>\n<p>The post <a rel=\"nofollow\" href=\"http://www.samplefeed.com/feed/OctoberFestWeekend/\">OctoberFest Weekend</a> appeared first on <a rel=\"nofollow\" href=\"http://www.samplefeed.com\">Sample Feed</a>.</p>\n",
    "dc:creator": "John Doe",
    "link": "http://www.samplefeed.com/feed/OctoberFestWeekend",
    "comments": "http://www.samplefeed.com/feed/OctoberFestWeekend/#comments"
}

Notice that in the rendered JSON there are two question marks ("??") inside the value of the "description" key, as shown here:

"description": "<p>\n??</p>\n

There are also two question marks after "Show Begins:":

<br />\nShow Begins:??

And also before 11:00pm:

Show Ends (Estimated time):??11:00pm<br />

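This is not the only pattern in which special characters show up: in other places three question marks (???) are generated, and in some places five (?????).

For example: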

<title>Today’s 20th Annual Karaoke</title>

is rendered in JSON as:

"title": "Today???s 20th Annual Karaoke"

and

<content:encoded><![CDATA[(Monte Vista High School, NY.).  </span></p>]]></content:encoded>

is rendered in JSON like this:

"content:encoded":  "(Monte Vista High School, NY.).????</span></p>

In some places the XML contains a dash ("-"):

<strong>Welcome</strong> – Welcome to the Party!

which is rendered in JSON as:

<strong>Welcome</strong>????? Welcome to the Party!

Does anyone know how to set the correct encoding in my code so that I can avoid these bad/special character rendering issues?
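One possible explanation, not confirmed anywhere in this thread, is that the number of question marks equals the number of UTF-8 bytes of the affected character: the right single quote (’) and the en dash (–) each take three bytes. The sketch below reproduces that effect with plain JDK calls. It assumes the feed bytes are UTF-8, that they get decoded with a charset that behaves like US-ASCII (for example the platform default on some servers), and that the response is then written as ISO-8859-1. The class name QuestionMarkDemo and the method simulate are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {

    // Simulates one way the "?" runs could appear (an assumption, see the text above).
    public static String simulate(String original) {
        // Bytes as they would travel over the wire, assuming a UTF-8 feed.
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);

        // Assumption: the reader decodes with US-ASCII. Every byte >= 0x80
        // is malformed input and becomes the replacement character U+FFFD.
        String misdecoded = new String(wire, StandardCharsets.US_ASCII);

        // Assumption: the response is encoded as ISO-8859-1, which cannot
        // represent U+FFFD, so each one is written as '?'.
        byte[] response = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(response, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+2019 is the right single quote from the <title> example.
        System.out.println(simulate("Today\u2019s 20th Annual Karaoke"));
    }
}
```

If this is what happens in the controller, both the decoding of the feed and the encoding of the response would need an explicit charset.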

2 Answers:

Answer 0: (score: 0)

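Regarding "Converting RSS Feed XML to JSON with Java shows special characters":

After going through your code line by line, I found a solution to the special-character problem in the response and have updated my answer accordingly.

Change this line of code:

@RequestMapping(value = "/v2/convertToJson", method = RequestMethod.GET, produces = "application/json")

to:

@RequestMapping(value = "/v2/convertToJson", method = RequestMethod.GET, produces = "application/json;charset=UTF-8")

You need to specify the UTF-8 charset together with the JSON media type in the value of the produces parameter. I apologize for my earlier, mistaken answer; it is now updated.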
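The charset in produces only controls how the response is encoded. On the input side, IOUtils.toString(InputStream) without a charset argument decodes with the platform default, so the feed may already be damaged before it reaches the JSON step. Below is a minimal sketch that reads a stream with an explicit charset using only the JDK. It assumes the feed really is UTF-8; the class name Utf8StreamReader and the method readUtf8 are hypothetical and not part of the original controller.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8StreamReader {

    // Reads the whole stream and decodes it as UTF-8, independent of the
    // platform default charset. Assumes the sender actually used UTF-8.
    public static String readUtf8(InputStream in) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the HTTP response body of the feed.
        byte[] feed = "<title>Today\u2019s 20th Annual Karaoke</title>"
                .getBytes(StandardCharsets.UTF_8);
        String xml = readUtf8(new ByteArrayInputStream(feed));
        // True when the right single quote survived decoding.
        System.out.println(xml.contains("\u2019"));
    }
}
```

With Commons IO already on the classpath, the overload IOUtils.toString(xml, StandardCharsets.UTF_8) should serve the same purpose in the controller.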

Answer 1: (score: 0)

I got rid of the unknown characters (???) like this:

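import static java.nio.charset.StandardCharsets.ISO_8859_1;
import static java.nio.charset.StandardCharsets.UTF_8;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

import org.json.JSONObject;
import org.json.XML;

// The two methods below belong inside the RssFeedController class.
@RequestMapping(value = "/v2/convertToJson", method = RequestMethod.GET, produces = "application/json;charset=UTF-8")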
public String getRssFeedAsJson() throws IOException, IllegalArgumentException {
    String xmlString = readUrlToString("http://www.sample.com/feed");
    JSONObject xmlJSONObj = XML.toJSONObject(xmlString);
    byte[] ptext = xmlJSONObj.toString().getBytes(ISO_8859_1); 
    String jsonResponse = new String(ptext, UTF_8); 
    return jsonResponse;
}

public static String readUrlToString(String url) {
    BufferedReader reader = null;
    String result = null;
    String retValue = null;
    try {
        URL u = new URL(url);
        HttpURLConnection conn = (HttpURLConnection) u.openConnection();
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        conn.setRequestMethod("GET");
        conn.setDoOutput(true);
        conn.setReadTimeout(2 * 1000);
        conn.connect();
        reader = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        StringBuilder builder = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            builder.append(line).append("\n");
        }
        result = builder.toString();
        retValue = result.replaceAll("[^\\x00-\\x7F]", "");
    } 
    catch (IOException e) {
        e.printStackTrace();
    } 
    finally {
        if (reader != null) {
            try {
                reader.close();
            } 
            catch (IOException ignoreOnClose) {
            }
        }
    }
    return retValue;
}
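Note that the replaceAll call in readUrlToString does not decode the special characters; it deletes every character outside the ASCII range, so quotes and dashes vanish from the output. The sketch below shows that effect next to a possible alternative that is not part of the answer: writing non-ASCII characters as JSON backslash-u escapes, which keeps the output ASCII-only without losing information. The class and method names are invented for illustration.

```java
public class NonAsciiHandling {

    // Same regular expression as in the answer above: removes every
    // character outside the range 0x00-0x7F.
    public static String stripNonAscii(String s) {
        return s.replaceAll("[^\\x00-\\x7F]", "");
    }

    // Hypothetical alternative: replace each non-ASCII char with a
    // backslash-u escape (four hex digits), which JSON parsers decode
    // back to the original character.
    public static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String title = "Today\u2019s 20th Annual Karaoke";
        System.out.println(stripNonAscii(title));   // the quote is dropped
        System.out.println(escapeNonAscii(title));  // the quote is escaped
    }
}
```

Escaping is only appropriate for text that ends up inside JSON string values; applying it to the raw XML before parsing would change the content.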

It is frustrating that nobody except SamDev tried to help...