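Which file encoding do I have to use to save this vector correctly in an R script (taken from "Matching complex URLs within text blocks (R)")? The special characters and Chinese characters seem to complicate things.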
x <- c("http://foo.com/blah_blah",
"http://foo.com/blah_blah/",
"(Something like http://foo.com/blah_blah)",
"http://foo.com/blah_blah_(wikipedia)",
"http://foo.com/more_(than)_one_(parens)",
"(Something like http://foo.com/blah_blah_(wikipedia))",
"http://foo.com/blah_(wikipedia)#cite-1",
"http://foo.com/blah_(wikipedia)_blah#cite-1",
"http://foo.com/unicode_(✪)_in_parens",
"http://foo.com/(something)?after=parens",
"http://foo.com/blah_blah.",
"http://foo.com/blah_blah/.",
"<http://foo.com/blah_blah>",
"<http://foo.com/blah_blah/>",
"http://foo.com/blah_blah,",
"http://www.extinguishedscholar.com/wpglob/?p=364.",
"http://✪df.ws/1234",
"rdar://1234",
"rdar:/1234",
"x-yojimbo-item://6303E4C1-6A6E-45A6-AB9D-3A908F59AE0E",
"message://%3c330e7f840905021726r6a4ba78dkf1fd71420c1bf6ff@mail.gmail.com%3e",
"http://➡.ws/䨹",
"www.c.ws/䨹",
"<tag>http://example.com</tag>",
"Just a www.example.com link.",
"http://example.com/something?with,commas,in,url, but not at end",
"What about <mailto:gruber@daringfireball.net?subject=TEST> (including brokets).",
"mailto:name@example.com",
"bit.ly/foo",
"“is.gd/foo/”",
"WWW.EXAMPLE.COM",
"http://www.asianewsphoto.com/(S(neugxif4twuizg551ywh3f55))/Web_ENG/View_DetailPhoto.aspx?PicId=752",
"http://www.asianewsphoto.com/(S(neugxif4twuizg551ywh3f55))",
"http://lcweb2.loc.gov/cgi-bin/query/h?pp/horyd:@field(NUMBER+@band(thc+5a46634))")
I appreciate any help.
Answer 0 (score: 0)
Running your example,
source('file.R', encoding="unknown")
works fine, and so does saving the vector as an R object and reloading it:
save(x, file='kk.Rd')
load('kk.Rd')
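The round trip above can be sketched in a self-contained way. This is only an illustration under my own assumptions: the two-element vector, the temporary file names, and the scratch environments are hypothetical, and the non-ASCII star is written as a `\u272a` escape so that the script file stays pure ASCII and does not depend on the file encoding at all (the answer itself does not state this).

```r
# Hypothetical two-element stand-in for the vector; the non-ASCII star
# (U+272A) is written as a \u escape so the script file is pure ASCII.
script_lines <- c(
  'x <- c("http://foo.com/blah_blah",',
  '       "http://foo.com/unicode_(\\u272a)_in_parens")'
)
script_path <- tempfile(fileext = ".R")
writeLines(script_lines, script_path)

# Source into a scratch environment, using the same encoding argument
# as in the answer.
env <- new.env()
source(script_path, local = env, encoding = "unknown")

# Save to a temporary file and reload into a second environment.
rdata_path <- tempfile(fileext = ".RData")
save(list = "x", file = rdata_path, envir = env)
loaded <- new.env()
load(rdata_path, envir = loaded)

identical(loaded$x, env$x)  # did the round trip preserve the vector?
Encoding(env$x)             # how R marked each element
```

If the script contains the literal characters instead of escapes, the file has to be read with the encoding it was actually saved in.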
You can use iconvlist() to get all the different encodings and test every one of them, for example:
# Try sourcing the script with every encoding iconv knows; NULL marks a failure.
vals <- lapply(iconvlist(), function(enc)
  tryCatch(source('file.R', encoding = enc),
           error = function(e) NULL))
where file.R is your script, then
iconvlist()[!sapply(vals, is.null)]
gives you all the encodings that did not throw an error when loading.
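An encoding that loads without an error can still decode the characters wrongly, so a stricter variant of the test above might compare the sourced value against a known reference. The following is a rough sketch, not a definitive check: the one-line script, the `reference` string, the helper name `sources_correctly`, and the short `candidates` list are all made up for illustration, and the outcome depends on the locale of the R session.

```r
# Reference value the script is supposed to produce; \u272a is the
# U+272A star character from the question's vector.
reference <- "http://foo.com/unicode_(\u272a)_in_parens"

# Write a one-line script to a temporary file as UTF-8 bytes
# (a hypothetical stand-in for file.R).
script_path <- tempfile(fileext = ".R")
writeLines(enc2utf8(paste0('x <- "', reference, '"')), script_path,
           useBytes = TRUE)

# TRUE only if sourcing with `enc` raises no error or warning AND
# reproduces the reference string exactly.
sources_correctly <- function(enc) {
  env <- new.env()
  tryCatch({
    source(script_path, local = env, encoding = enc)
    identical(env$x, reference)
  },
  warning = function(w) FALSE,
  error = function(e) FALSE)
}

# Replace with iconvlist() to test every encoding, as in the answer.
candidates <- c("UTF-8", "latin1")
ok <- vapply(candidates, sources_correctly, logical(1))
candidates[ok]
```

Treating warnings as failures is a deliberate choice here, since an invalid byte sequence is often reported as a warning rather than an error.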
Does this help?