Question

I am currently trying to subset a data set based on words contained in a string. Using the stringr package, I tried subsetting with str_detect as follows:

subdat <- dat %>% filter(str_detect(de, index$Full[1]))

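This produces the correct data table: the subset of rows in which the first "Full" entry of index is detected. However, when the same code is put inside a for loop, with "i" replacing the literal index so that I can iterate over all the names, the subset no longer detects the correct string.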

for (i in 1){
  subdat <- dat %>% filter(str_detect(de, index$Full[i]))
}

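On top of that, every iteration detects the same incorrect subset. When I test the "i" variable outside the for loop, the same problem shows up in str_detect. When I run the following code with i equal to 1, R returns TRUE: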

index$Full[i] == index$Full[1]

However, the following two lines still return different data sets:

subdati <- dat %>% filter(str_detect(de, index$Full[i]))
subdat1 <- dat %>% filter(str_detect(de, index$Full[1]))

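Since my index is about 70 entries long, I would like to get the for loop working so that I can eventually write a CSV for each subset (that part is not a coding problem). I hope this is enough information. This is my first time asking, so I am happy to clarify anything if needed.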

Added dput output for a reproducible example:

> dput(droplevels(dat))
structure(list(evt = structure(c(3L, 4L, 1L, 5L, 2L), .Label = c("112", 
"150", "22", "41", "320"), class = "factor"), cl = structure(c(2L, 
1L, 5L, 4L, 3L), .Label = c("08:49", "10:32", "11:21", "10:31", 
"02:28"), class = "factor"), de = c("[BOS] Tatum Foul: Defense 3 Second (1 PF) (1 FTA) (B Forte)", 
"[BOS] Hayward Foul: Shooting (1 PF) (2 FTA) (B Forte)", "[OKC] Westbrook Foul: Shooting (1 PF) (1 FTA) (K Scott)", 
"[SAS] Paul Foul: Personal (2 PF) (B Forte)", "[DAL] Harris Foul: Shooting (2 PF) (1 FTA) (B Forte)"
), i = c(1, 1, 36, 383, 461)), .Names = c("evt", "cl", "de", 
"i"), row.names = c(1L, 4L, 1599L, 16358L, 18269L), class = "data.frame")
> dput(droplevels(index))
structure(list(First = structure(1:2, .Label = c("B", "K"), class = "factor"), 
    Last = structure(1:2, .Label = c("Forte", "Scott"), class = "factor"), 
    Full = c("B Forte", "K Scott")), .Names = c("First", "Last", 
"Full"), row.names = c(1L, 36L), class = "data.frame")

With this data, I get the following output:

> subdat <- dat %>% filter(str_detect(de, index$Full[1]))
> subdat
  evt    cl                                                          de   i
1  22 10:32 [BOS] Tatum Foul: Defense 3 Second (1 PF) (1 FTA) (B Forte)   1
2  41 08:49       [BOS] Hayward Foul: Shooting (1 PF) (2 FTA) (B Forte)   1
3 320 10:31                  [SAS] Paul Foul: Personal (2 PF) (B Forte) 383
4 150 11:21        [DAL] Harris Foul: Shooting (2 PF) (1 FTA) (B Forte) 461

> for (i in 1){
+   subdatloop <- dat %>% filter(str_detect(de, index$Full[i]))
+ }
> subdatloop
  evt    cl                                                          de i
1  22 10:32 [BOS] Tatum Foul: Defense 3 Second (1 PF) (1 FTA) (B Forte) 1
2  41 08:49       [BOS] Hayward Foul: Shooting (1 PF) (2 FTA) (B Forte) 1

> index$Full[i] == index$Full[1]
[1] TRUE
> subdati <- dat %>% filter(str_detect(de, index$Full[i]))
> subdati
  evt    cl                                                          de i
1  22 10:32 [BOS] Tatum Foul: Defense 3 Second (1 PF) (1 FTA) (B Forte) 1
2  41 08:49       [BOS] Hayward Foul: Shooting (1 PF) (2 FTA) (B Forte) 1
> subdat1
  evt    cl                                                          de   i
1  22 10:32 [BOS] Tatum Foul: Defense 3 Second (1 PF) (1 FTA) (B Forte)   1
2  41 08:49       [BOS] Hayward Foul: Shooting (1 PF) (2 FTA) (B Forte)   1
3 320 10:31                  [SAS] Paul Foul: Personal (2 PF) (B Forte) 383
4 150 11:21        [DAL] Harris Foul: Shooting (2 PF) (1 FTA) (B Forte) 461
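
One possible explanation, offered as a sketch rather than a confirmed diagnosis: dat has a column that is also named i, and inside filter() a bare name is looked up in the data frame's columns before the surrounding environment. If that is what happens here, index$Full[i] is evaluated with the column dat$i (1, 1, 36, 383, 461) instead of the loop variable, which would match the two-row result shown above. The names masked_pattern, k, pattern, and subsets below are hypothetical and only used for illustration. The data is rebuilt from the dput output, keeping only the columns that matter.

```r
library(dplyr)
library(stringr)

# Rebuilt from the dput output above (only the de and i columns of dat,
# and only the Full column of index).
dat <- data.frame(
  de = c("[BOS] Tatum Foul: Defense 3 Second (1 PF) (1 FTA) (B Forte)",
         "[BOS] Hayward Foul: Shooting (1 PF) (2 FTA) (B Forte)",
         "[OKC] Westbrook Foul: Shooting (1 PF) (1 FTA) (K Scott)",
         "[SAS] Paul Foul: Personal (2 PF) (B Forte)",
         "[DAL] Harris Foul: Shooting (2 PF) (1 FTA) (B Forte)"),
  i = c(1, 1, 36, 383, 461),
  stringsAsFactors = FALSE
)
index <- data.frame(Full = c("B Forte", "K Scott"), stringsAsFactors = FALSE)

i <- 1

# What the pattern looks like if `i` resolves to the column dat$i:
# positions past the end of index$Full come back as NA.
masked_pattern <- index$Full[dat$i]
print(masked_pattern)

# Same filter as in the question; rows whose pattern is NA are dropped.
subdati <- dat %>% filter(str_detect(de, index$Full[i]))

# Assumed workaround: use a loop variable that is not a column name and
# look the name up before the pipe, so filter() only sees a plain string.
subsets <- list()
for (k in seq_len(nrow(index))) {
  pattern <- index$Full[k]
  subsets[[pattern]] <- dat %>% filter(str_detect(de, pattern))
}
```

Under this assumption, renaming either the loop variable or the i column should be enough. Recent dplyr versions also accept .env$i inside filter() to refer explicitly to the variable in the calling environment.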

Edit: added a reproducible example and the expected output.

str_detect inside a for loop produces a different data set than the same value outside the loop

0 answers: