A string pulled directly from the source data does not seem to match the same string in the source data

Asked: 2018-01-22 18:11:16

Tags: r string string-matching

I have a string that will not evaluate as matching itself. I am trying to do a simple subset based on one of 8 possible values in a column:

out <- df[df$`Var name` == "string",] 

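I have done this many times with different strings, but for some reason this one fails. I tried to get the exact string from the source in the four ways shown further down (thinking there might be a character-encoding issue), without success. Even when I explicitly call a cell that I know contains the string and copy the output into the comparison, it fails: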

> df[i,j]
[1] "string"
df[i,j]=="string"  # pasted from above line

I don't understand how I can paste exactly the output I was just given and have it not match.

## attempts to get exact string to paste into subset statement    
# from dput 
"IF APPLICABLE – Which of the following best characterizes the expectations with"

# from calling a specific row/col (df[i, j])
[1] "IF APPLICABLE – Which of the following best characterizes the expectations with"

# from the source pane of rstudio
IF APPLICABLE – Which of the following best characterizes the expectations with

# from the source excel file
IF APPLICABLE – Which of the following best characterizes the expectations with

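I have no idea what could be happening here. I am taking the string straight from the data, and it still does not evaluate to TRUE. Is there something going on in the background that I am not seeing? Am I overlooking something absurdly simple?

Edit:

I subsetted the data another way to isolate the row; below is the dput of that input and an actual example of what I am doing: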

> dput(temp)
structure(list(`Item Stem` = "IF APPLICABLE – Which of the following best characterizes the expectations with", 
    `Item Response` = "It was required.", orgchar_group = "locale", 
    `Org Characteristic` = "Rural", N = 487, percent = 34.5145287030475, 
    `Graphs note` = NA_character_, `Report note` = NA_character_, 
    `Other note` = NA_character_, subsig = 1, overall = 0, varname = NA_character_, 
    statsig = NA_real_, use = NA_real_, difference = 9.16044821292665), .Names = c("Item Stem", 
"Item Response", "orgchar_group", "Org Characteristic", "N", 
"percent", "Graphs note", "Report note", "Other note", "subsig", 
"overall", "varname", "statsig", "use", "difference"), row.names = 288L, class = "data.frame")
> temp[1,1]
[1] "IF APPLICABLE – Which of the following best characterizes the expectations with"
> temp[1,1] == "IF APPLICABLE – Which of the following best characterizes the expectations with"
[1] FALSE

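1 answer:

Answer 0 (score: 0)

It turns out this really was a non-printable character. Shout-out to the commenters for helping me figure it out by 1) suggesting it and 2) showing that the comparison worked for them.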
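As a minimal illustration of how this kind of thing can happen (a sketch with made-up strings, not my actual data): two strings can look the same when printed and still differ by a character you cannot easily see. Here I assume a non-breaking space, which is only one of several possible culprits.

```r
# Hypothetical example strings, not taken from the real data frame:
# `a` uses an ordinary space (U+0020), `b` uses a non-breaking space (U+00A0).
a <- "IF APPLICABLE"
b <- "IF\u00a0APPLICABLE"

# The two typically look identical on screen, but they are not equal.
same <- a == b

# Comparing the Unicode code points shows where they differ.
cp_a <- utf8ToInt(a)
cp_b <- utf8ToInt(b)
diff_pos <- which(cp_a != cp_b)

print(same)
print(diff_pos)
print(cp_b[diff_pos])
```

The same code-point comparison can be run on a cell value and on the literal you pasted, which may show whether they really contain the same characters.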

I was able to solve this using insights from here (& here) and here.

I used the grepl command (from @Tyler Rinker) to confirm that there were indeed non-ASCII characters in my string, and the stringi command (from @hadley) to determine the encoding. I then used the base R solution from @Josh O'Brien to remove them. It turned out to be the hyphen (the "–" character after "IF APPLICABLE").

# working in the temp df
> x <- temp[1,1]
> grepl("[^ -~]", x)
[1] TRUE
> stringi::stri_enc_mark(x)
[1] "UTF-8"
> iconv(x, "UTF-8", "ASCII", sub="")
[1] "IF APPLICABLE Which of the following best characterizes the expectations with"
# set x as df$`Var name` and reassign it to fix
df$`Var name` <- iconv(df$`Var name`, "UTF-8", "ASCII", sub="")

I still don't fully understand why it happened, but it is solved now.
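For anyone who wants to see exactly which characters are involved instead of just stripping them, here is a rough sketch. It assumes the string contains the en dash (U+2013) shown in the dput output and nothing else unusual; the real data may well contain other characters, and the names `cps`, `bad`, `codes` and `fixed` are only illustrative.

```r
# The string from the question, with the dash written as an explicit escape.
x <- "IF APPLICABLE \u2013 Which of the following best characterizes the expectations with"

# Convert to Unicode code points and find everything outside the ASCII range.
cps <- utf8ToInt(x)
bad <- which(cps > 127L)

# Report the offending code points in the usual U+XXXX notation.
codes <- sprintf("U+%04X", cps[bad])

# Replace the en dash with a plain ASCII hyphen instead of deleting it.
fixed <- gsub("\u2013", "-", x, fixed = TRUE)

print(bad)
print(codes)
print(fixed)
```

Replacing the character keeps the text readable, whereas iconv with sub="" simply drops it. Either way, the literal used in the subset has to match the cleaned column. This only identifies the character; it does not by itself explain why a pasted copy failed to compare equal.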