Question

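The code below does quite a few things, but mainly it goes into the archive of the web page time.mk and extracts a large number of links over a 7-month period. Afterwards I filter the link data down to the web pages I want (time.mk compiles news from many different web pages):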

                ---------------Test.js file-----------
function openWin() {
    // Open (or reuse) the popup window that displays the scrolling banner.
    var myWindow = window.open("", "MsgWindow", "width=2000,height=200,location=no,toolbar=no,menubar=no,scrollbars=yes,left=0,top=950");
    if (!myWindow) {
        return; // the popup was blocked, so there is nothing to write into
    }
    // Read the banner settings from the form fields in Banner.html.
    var text = document.getElementById('notebox');
    var image1 = document.getElementById('image1');
    var image2 = document.getElementById('image2');
    var background = document.getElementById('background');
    var fontsize = document.getElementById('fontsize');
    var fontcolor = document.getElementById('fontcolor');
    var fonttype = document.getElementById('fonttype');
    var fontspeed = document.getElementById('fontspeed');
    // Build the banner markup from the field values and write it into the popup.
    myWindow.document.write("<marquee behavior='scroll' direction='left' scrollamount='" + fontspeed.value + "' BGCOLOR='" + background.value + "'><h1 style='font-size:" + fontsize.value + "px;color:" + fontcolor.value + ";font-family:" + fonttype.value + ";'><img src='" + image1.value + "' style='width:100px;height:100px;' > " + text.value + " <img src='" + image2.value + "' style='width:100px;height:100px;' ></h1></marquee>");
    myWindow.document.close();
}

// Request Banner.html (creating it if needed) from the root of a file system object.
function SaveDatFileBro(localstorage) {
    // The buttons in Banner.html call this with no argument, so return early
    // instead of throwing a TypeError on an undefined file system object.
    if (!localstorage || !localstorage.root) {
        return;
    }
    localstorage.root.getFile("Banner.html", {create: true});
}


--------------------------------Banner.html---------------------------------
<!DOCTYPE html>
<html>
<head>
<title>
</title>
</head>
<body>

<script src="Test.js"></script>

<table style="margin:0px auto 0px auto"> 
    <tr>
        <td><input type="button" value=" Open Banner" onclick="openWin();SaveDatFileBro()"/></td>
        <td><input type="text" id="notebox" value="Enter Notification" size="120"/></td>
    </tr>
    <tr>
        <td><input type="button" value=" Save Banner" onclick="SaveDatFileBro()"/></td>
        <td>Font Color:<input type="text" id="fontcolor" value="Red"/> Font Type:<input type="text" id="fonttype" value="Times New Roman"/> Font Size:<input type="text" id="fontsize" value="130" /> Scroll Speed:<input type="text" id="fontspeed" value="25" /></td>
    </tr>
    <tr>
        <td></td>
        <td>Background Color:<input type="text" id="background" value="White" />  Leading Image:<input type="text" id="image1" value="https://vignette2.wikia.nocookie.net/uncyclopedia/images/4/44/White_square.png/revision/latest/scale-to-width-down/200?cb=20061003200043" /> Trailing Image:<input type="text" id="image2" value="https://vignette2.wikia.nocookie.net/uncyclopedia/images/4/44/White_square.png/revision/latest/scale-to-width-down/200?cb=20061003200043" /></td>
    </tr>
</table>


</body>
</html>

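The problem is that once I have retrieved only the links I need and try to scrape just the text from them, R returns this error:

Error in open.connection(x, "rb") : Unrecognized or bad HTTP Content or Transfer-Encoding

As far as I can tell, this happens because the links are broken when I open them in my browser, and (almost!) every link returns something like this: WDR $&)/ YFR

After I refresh the link 5-6 times in my browser, it loads normally.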

The retrieved links look like this:

[965] "http://a1on.mk/wordpress/archives/655118"
[967] "http://a1on.mk/wordpress/archives/654641"

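I'm really not sure what the problem is here, and I'd like to know how to tell R to keep running the text-extraction code until it succeeds. Functions like try and tryCatch have not been useful here.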

When a scraped link is opened in the browser it shows as broken; after refreshing the page, it works.
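
One way to express "keep trying until it works" is a bounded retry loop. The sketch below is only an illustration, written in JavaScript to match the Test.js code in this post. `retryUntilSuccess` and `makeFlakyFetcher` are hypothetical names, and the fetcher merely simulates a link that fails several times before loading. It is not the scraping code from the question.

```javascript
// Hypothetical helper (not from the original post): call `action` repeatedly
// until it returns without throwing, or until `maxAttempts` is used up.
function retryUntilSuccess(action, maxAttempts) {
    let lastError = null;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            // Success: report the value and how many attempts it took.
            return { value: action(attempt), attempts: attempt };
        } catch (err) {
            lastError = err; // remember the failure and try again
        }
    }
    const reason = lastError ? lastError.message : "no attempts were made";
    throw new Error("gave up after " + maxAttempts + " attempts: " + reason);
}

// Stand-in for the flaky download (an assumption for this sketch): it throws
// on the first `failures` calls and returns placeholder text afterwards.
// A real scraper would download and parse the URL here instead.
function makeFlakyFetcher(failures) {
    let calls = 0;
    return function fetchText(url) {
        calls += 1;
        if (calls <= failures) {
            throw new Error("Unrecognized or bad HTTP Content or Transfer-Encoding");
        }
        return "text of " + url;
    };
}

const fetchText = makeFlakyFetcher(5);
const url = "http://a1on.mk/wordpress/archives/655118";
const result = retryUntilSuccess(() => fetchText(url), 10);
console.log(result.attempts, result.value);
```

The control flow is deliberately synchronous so that it mirrors a blocking call. The same loop shape could presumably be written in R by wrapping the extraction call in `tryCatch` inside a `for` loop, with a `Sys.sleep` pause between attempts, and a cap on the number of attempts keeps a permanently dead link from looping forever.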

0 answers: