I am reading some text line by line from HDFS in my mapper code (written in R). The loop that reads the input is:
input <- file("stdin", "r")
while(length(line <- readLines(input, n=1, warn=FALSE)) > 0)
{
}
close(input)
A sample line of the text looks like this:
15059773^A3872^A\N^A2015-09-05^A\N^A2015-09-01^A3^A0^A0^A\N^Ashirts adult male^Axl^A\N^A5183656^Ac1 13 me ult tee c^Ablue^Awatersport blue^A\N^A\N^A\N^A0^A\N^A3^Amn^A45.05273^A-93.365555^A100^A131^A27.0^A13.0^A8.0^A85.0^A57.0^A21.0^A1012.0^A0^A0^A1^A0^A43^A3^A4
In the text above, ^A is my field separator and \N marks blank values (R's NA). I am able to split on ^A using \001 (I am not sure how that works), but I am having trouble replacing \N. I tried the suggestions in:
remove all line breaks (enter symbols) from the string using R
and some others, but nothing worked. I also tried \\N, but that did not work either.
As I process this line by line, my expected output for the first line is:
"15059773" "3872" NA "2015-09-05" NA "2015-09-01" "3" "0" "0" NA "shirts adult male" "xl" NA "5183656" "c1 13 me ult tee c" "blue" "watersport blue" NA NA NA "0" NA "3" "mn" "45.05273" "-93.365555" "100" "131" "27.0" "13.0" "8.0" "85.0" "57.0" "21.0" "1012.0" "0" "0" "1" "0" "43" "3" "4"
Answer 0 (score: 1)
This seems to work:
ifelse(strsplit(string, "\\^A")[[1]] == "\\N", NA, strsplit(string, "\\^A")[[1]])
[1] "15059773" "3872" NA "2015-09-05" NA
[6] "2015-09-01" "3" "0" "0" NA
[11] "shirts adult male" "xl" NA "5183656" "c1 13 me ult tee c"
[16] "blue" "watersport blue" NA NA NA
[21] "0" NA "3" "mn" "45.05273"
[26] "-93.365555" "100" "131" "27.0" "13.0"
[31] "8.0" "85.0" "57.0" "21.0" "1012.0"
[36] "0" "0" "1" "0" "43"
[41] "3" "4"
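A note on why \001 works in the question: in an R string, "\001" is an octal escape for the single byte with value 1 (Ctrl-A), which many tools display as ^A. The pattern "\\^A" used above instead matches a literal caret followed by the letter A, which fits data that was pasted as visible text. If the real input carries the control character, a variant along the following lines may be closer to what is needed. This is only a sketch: split_fields and its sep argument are hypothetical names introduced here, and it assumes the missing-value marker is exactly the two characters backslash and N.

```r
# Hypothetical helper (not from the original post): split one line on a
# literal separator and turn \N markers into NA.
split_fields <- function(line, sep = "\001") {
  # fixed = TRUE treats sep as a literal string, not a regular expression
  fields <- strsplit(line, sep, fixed = TRUE)[[1]]
  # "\\N" in R source is the two characters backslash + N
  fields[fields == "\\N"] <- NA
  fields
}

# Build a small sample line that uses the real Ctrl-A control character
sample_line <- paste(c("15059773", "3872", "\\N", "2015-09-05"),
                     collapse = "\001")
print(split_fields(sample_line))

# The same helper also handles the pasted, visible form of the separator
print(split_fields("15059773^A3872^A\\N", sep = "^A"))
```

Splitting once and replacing by logical index avoids calling strsplit twice, and fixed = TRUE sidesteps regex escaping of the separator.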
Data:
cat(string)
15059773^A3872^A\N^A2015-09-05^A\N^A2015-09-01^A3^A0^A0^A\N^Ashirts adult male^Axl^A\N^A5183656^Ac1 13 me ult tee c^Ablue^Awatersport blue^A\N^A\N^A\N^A0^A\N^A3^Amn^A45.05273^A-93.365555^A100^A131^A27.0^A13.0^A8.0^A85.0^A57.0^A21.0^A1012.0^A0^A0^A1^A0^A43^A3^A4
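To tie this back to the read loop in the question, the split and the NA replacement could be placed inside the loop roughly as follows. This is a sketch under assumptions: process_stream is a hypothetical name, the separator is assumed to be the real Ctrl-A byte ("\001"), and the rows are simply collected in a list, whereas a real mapper would more likely emit each record as it is read.

```r
# Hypothetical mapper skeleton; process_stream() and its arguments are
# illustrative names, not part of the original post.
process_stream <- function(con, sep = "\001") {
  rows <- list()
  # Read one line at a time until the connection is exhausted
  while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
    fields <- strsplit(line, sep, fixed = TRUE)[[1]]
    fields[fields == "\\N"] <- NA  # literal backslash-N marks a missing value
    rows[[length(rows) + 1]] <- fields
  }
  rows
}

# In a streaming job the connection would be stdin, for example:
#   input <- file("stdin", "r"); rows <- process_stream(input); close(input)
# Here a textConnection stands in for stdin so the sketch runs on its own.
demo_lines <- c(paste(c("1", "2", "\\N"), collapse = "\001"),
                paste(c("\\N", "x", "y"), collapse = "\001"))
demo <- textConnection(demo_lines)
rows <- process_stream(demo)
close(demo)
print(rows)
```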