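I'm trying to decode the JSON that Facebook gives you when you download your data. I'm using Node JS. The data contains a lot of strange Unicode escapes that don't really make sense. Example: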
"messages": [
{
"sender_name": "Emily Chadwick",
"timestamp_ms": 1480314292125,
"content": "So sorry that was in my pocket \u00f0\u009f\u0098\u0082\u00f0\u009f\u0098\u0082\u00f0\u009f\u0098\u0082",
"type": "Generic"
}
]
This should decode to So sorry that was in my pocket 😂😂😂. Using fs.readFileSync(filename, "utf8") instead gives me So sorry that was in my pocket ððð, which is mojibake.
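The mojibake can be reproduced with a short sketch. JSON.parse() turns each \u00XX escape into its own character, so the single emoji becomes four characters (ð followed by three invisible C1 control characters). Reinterpreting those characters as bytes with Node's built-in Buffer and decoding the bytes as UTF-8 restores the emoji. The helper name fixMojibake is hypothetical.

```javascript
// Hypothetical helper: treat each character (all <= U+00FF here) as one raw
// byte via "latin1", then decode the resulting bytes as UTF-8.
function fixMojibake(str) {
  return Buffer.from(str, "latin1").toString("utf8");
}

// What JSON.parse() produces from "\u00f0\u009f\u0098\u0082":
// four separate characters, one per UTF-8 byte of the emoji.
const garbled = JSON.parse('"\\u00f0\\u009f\\u0098\\u0082"');
console.log(garbled.length);       // 4
console.log(fixMojibake(garbled)); // 😂
```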
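This question mentions that the text has been mangled through latin1, and that you can fix it by encoding it as latin1 and then decoding it as utf8. I tried to do that with: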
import fs from 'fs';
import iconv from 'iconv-lite';
function readFileSync_fixed(filename) {
  // "binary" is an alias for "latin1" in Node
  var content = fs.readFileSync(filename, "binary");
  return iconv.decode(iconv.encode(content, "latin1"), "utf-8");
}
console.log(JSON.parse(readFileSync_fixed(filename)));
But I still get the mojibake version. Can anyone point me in the right direction? I'm not familiar with how iconv works here.
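A minimal sketch of why the round trip above has no visible effect. It assumes the export stores non-ASCII text only as \u00XX escapes, as in the example, uses a made-up inline string in place of the file contents, and uses Buffer in place of iconv-lite (assumed equivalent here, since every character involved is in the U+0000 to U+00FF range). The raw file text is pure ASCII, so a latin1/utf8 round trip returns it unchanged; the mis-decoded characters only exist once JSON.parse() has expanded the escapes.

```javascript
// Stand-in for the file contents: each UTF-8 byte is written as its own
// \u00XX escape, so the raw text contains only ASCII characters.
const rawText = '{"content": "pocket \\u00f0\\u009f\\u0098\\u0082"}';

// latin1-encode, then utf8-decode (same idea as the iconv-lite calls)
const roundTrip = (s) => Buffer.from(s, "latin1").toString("utf8");

// Before parsing: round-tripping ASCII text is a no-op.
console.log(roundTrip(rawText) === rawText); // true

// After parsing: the escapes have become U+00F0, U+009F, ... and the
// round trip reassembles them into the original UTF-8 character.
const parsed = JSON.parse(rawText);
console.log(roundTrip(parsed.content)); // pocket 😂
```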
Answer 0 (score: 1)
There is a very simple solution for this. First install the utf8 package:
npm i utf8
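How the utf8 package is called is not shown here. As a dependency-free alternative (an assumption, not the answerer's code), the sketch below performs the same byte reinterpretation with the standard TextDecoder, which is a global in current Node versions. Its fatal mode makes it throw instead of silently inserting replacement characters when a string is not actually mojibake. The name decodeUtf8Bytes is hypothetical.

```javascript
// Hypothetical strict decoder: every character must fit in one byte,
// and the resulting bytes must form valid UTF-8, otherwise it throws.
function decodeUtf8Bytes(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code > 0xff) {
      throw new RangeError(`character at index ${i} is not a single byte`);
    }
    bytes[i] = code;
  }
  // fatal: true makes decode() throw a TypeError on malformed UTF-8
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

console.log(decodeUtf8Bytes("\u00f0\u009f\u0098\u0082")); // 😂
```

Calling it on text that was never mangled, such as decodeUtf8Bytes("café"), throws, because the lone byte 0xE9 is not valid UTF-8.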
Answer 1 (score: 0)
Sort of solved it. If there's a better way, please let me know.
So, here are the modified functions:
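import fs from 'fs';
import iconv from 'iconv-lite';
// Read the export as ordinary UTF-8 text and parse it
function readFacebookJson(filename) {
  const content = fs.readFileSync(filename, "utf8");
  return JSON.parse(content);
}
// Undo the mojibake in one parsed string: latin1-encode, then utf8-decode
function fixEncoding(string) {
  return iconv.decode(iconv.encode(string, "latin1"), "utf8");
}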
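It isn't readFileSync() that causes the problem; the mojibake appears at JSON.parse(). The file stores each UTF-8 byte as its own \u00XX escape, and JSON.parse() turns every escape into a separate character. So we read the file as utf8 as usual, but then apply the latin1 encode/utf8 decode to the strings that are now properties of the parsed object, rather than to the whole JSON file before parsing. I did this with map():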
const messages = readFacebookJson(filename).messages.map(message => {
  // Copy the message so the parsed object is not modified in place
  const toReturn = { ...message };
  toReturn.sender_name = fixEncoding(message.sender_name);
  // Some messages have no content field
  if (typeof message.content !== "undefined") {
    toReturn.content = fixEncoding(message.content);
  }
  return toReturn;
});
The problem here, of course, is that some properties may be missing, so make sure you know which properties contain what.
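One way to avoid listing the affected properties by hand is to repair every string value during parsing with the optional reviver argument of JSON.parse(), which is called for each key/value pair in the result. A minimal sketch, assuming every non-ASCII string in the export is mangled in the same way (object keys are left untouched) and using Buffer instead of iconv-lite; repairString and parseFacebookJson are hypothetical names, and the sample text is made up.

```javascript
// Repair a string whose characters are really individual UTF-8 bytes.
function repairString(str) {
  return Buffer.from(str, "latin1").toString("utf8");
}

// Hypothetical parser: the reviver sees every parsed value, so optional
// fields such as content are handled wherever they appear.
function parseFacebookJson(text) {
  return JSON.parse(text, (key, value) =>
    typeof value === "string" ? repairString(value) : value
  );
}

const sample =
  '{"messages": [' +
  '{"sender_name": "Emily Chadwick", "content": "\\u00f0\\u009f\\u0098\\u0082", "type": "Generic"},' +
  '{"sender_name": "Emily Chadwick", "type": "Generic"}' + // no content field
  "]}";

const { messages } = parseFacebookJson(sample);
console.log(messages[0].content); // 😂
console.log(messages[1].content); // undefined
```

ASCII-only strings such as "Generic" pass through the round trip unchanged, so applying it to every string does not affect plain-English text.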