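I have a question about removing non-alphanumeric characters from a list in R. My list contains all sorts of odd characters, whitespace, and so on, and I would like to remove them. I can usually remove what I want with the tm package in R, but I fiddled with it and got nowhere, so going back to the list is probably the place to start.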
The list:
list("\n \n", "\n\n ", "\n ", " ", "\n ",
"\n \n ", "\n ", "Home", "\n", "Expertise",
"Question & Research Design", "\n", "Survey Development & Validation",
"\n", "Data Processing", "\n", "Statistical Analysis", "\n",
"Publications & Grants", "\n", "Evaluation", "\n", "\n",
"Consulting Areas", "Business", "\n", "Education", "K-12",
"\n", "Â ", " Â Â Â Â", " | ")
Expected output:
[1] "" "" ""
[4] "" "" ""
[7] "" "Home" ""
[10] "Expertise" "Question Research Design" ""
[13] "Survey Development Validation" "" "Data Processing"
[16] "" "Statistical Analysis" ""
[19] "Publications Grants" "" "Evaluation"
[22] "" "" "Consulting Areas"
[25] "Business" "" "Education"
[28] "K12" "" ""
[31] "" ""
Answer 0 (score: 4)
I strongly suggest simply using
gsub("[^a-zA-Z0-9]","",x)
where x is the name of your list.
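Presumably you included the foreign characters at the end of the list because you want those removed as well; the command above takes care of that too. Briefly: the square brackets in the command define a set of characters, and the ^ symbol means "not", so every character outside the specified set of 62 characters (lowercase a to z, uppercase A to Z, and the digits 0 to 9) is replaced by the empty string "" (that is, deleted).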
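To see the character set in isolation, here is a minimal sketch that runs the same command on a few hand-picked elements of the question's list (the selection is illustrative, not from the answer). Because `gsub()` coerces a non-character input with `as.character()`, passing a list returns a plain character vector rather than a list.

```r
# A few elements from the question's list (illustrative selection),
# stored as `x`, the name the answer assumes
x <- list("\n \n", "Home", "Question & Research Design", "K-12", " | ")

# [^a-zA-Z0-9] matches any single character NOT among the 62 listed;
# every such match is replaced by "", i.e. deleted
cleaned <- gsub("[^a-zA-Z0-9]", "", x)

# gsub() converted the list to a character vector via as.character()
print(cleaned)
# -> "" "Home" "QuestionResearchDesign" "K12" ""
is.character(cleaned)
# -> TRUE
```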
Here is the output for the full list:
[1] "" "" ""
[4] "" "" ""
[7] "" "Home" ""
[10] "Expertise" "QuestionResearchDesign" ""
[13] "SurveyDevelopmentValidation" "" "DataProcessing"
[16] "" "StatisticalAnalysis" ""
[19] "PublicationsGrants" "" "Evaluation"
[22] "" "" "ConsultingAreas"
[25] "Business" "" "Education"
[28] "K12" "" ""
[31] "" ""
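The output above also deletes the spaces between words, so it does not quite match the expected output in the question ("QuestionResearchDesign" rather than "Question Research Design"). A minimal sketch of one way to keep single spaces, which is not part of the original answer: allow the plain space in the character set, then collapse repeated spaces and trim the ends with base R's `trimws()`. The list is reproduced from the question, with `Â` written as the escape `"\u00c2"`; treating the blanks next to it as ordinary spaces is an assumption.

```r
# The full list from the question; "\u00c2" is the character shown as "Â"
x <- list("\n \n", "\n\n ", "\n ", " ", "\n ",
          "\n \n ", "\n ", "Home", "\n", "Expertise",
          "Question & Research Design", "\n", "Survey Development & Validation",
          "\n", "Data Processing", "\n", "Statistical Analysis", "\n",
          "Publications & Grants", "\n", "Evaluation", "\n", "\n",
          "Consulting Areas", "Business", "\n", "Education", "K-12",
          "\n", "\u00c2 ", " \u00c2 \u00c2 \u00c2 \u00c2", " | ")

# Step 1: delete everything except ASCII letters, digits and the plain space
step1 <- gsub("[^a-zA-Z0-9 ]", "", x)

# Step 2: collapse runs of spaces left behind, e.g. "Question  Research"
step2 <- gsub(" +", " ", step1)

# Step 3: strip leading/trailing spaces, so blank-only entries become ""
cleaned <- trimws(step2)

print(cleaned)
```

Collapsing runs before trimming matters for entries such as "Question & Research Design", where deleting "&" leaves two adjacent spaces in the middle of the string.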
Answer 1 (score: 0)
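I'm not sure this will remove everything you want removed, but ?regexp
describes various handy broad character classes you can use. For what you describe, I think you want: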
gsub('[[:space:]|[:punct:]]+', '', yourlist)
This gives:
[1] "" "" "" ""
[5] "" "" "" "Home"
[9] "" "Expertise" "QuestionResearchDesign" ""
[13] "SurveyDevelopmentValidation" "" "DataProcessing" ""
[17] "StatisticalAnalysis" "" "PublicationsGrants" ""
[21] "Evaluation" "" "" "ConsultingAreas"
[25] "Business" "" "Education" "K12"
[29] "" "Â" "ÂÂÂÂ" ""
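Two details of this pattern may be worth spelling out. Inside a bracket expression `|` is an ordinary literal character, not alternation, and since it is already covered by `[:punct:]` it is redundant here. Also, `Â` survives in the output above because it is a letter, neither whitespace nor punctuation. A minimal sketch, using ASCII-only inputs (an assumption made so the result does not depend on the locale's classification of non-ASCII letters):

```r
# ASCII-only elements taken from the question's list
x <- list("\n \n", "Question & Research Design", "K-12", " | ")

# The answer's pattern: one or more characters that are whitespace,
# punctuation, or a literal "|" (already included in [:punct:])
with_pipe <- gsub("[[:space:]|[:punct:]]+", "", x)

# The same pattern without the redundant "|"
without_pipe <- gsub("[[:space:][:punct:]]+", "", x)

print(with_pipe)
# -> "" "QuestionResearchDesign" "K12" ""
identical(with_pipe, without_pipe)
# -> TRUE
```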