Iterate over each row and compare the values of multiple columns within that row

Time: 2018-06-20 13:48:47

Tags: r

I have the following data frame, yearly:

ID   Jan Feb March April May Jun Jul Aug Sept Oct Nov Dec
ABC   0  0    0     1    0   0    0   0  1     0   0  0
DEF   0  0    0     1    1   0    0   0  1     0   0  0
GHI   0  0    0     1    0   1    0   0  0     1   0  0
MNO   0  0    0     1    0   1    0   0  1     0   0  0
QAL   0  1    1     1    0   0    1   0  0     1   0  0

I want to iterate over each row and find the column after which the next three columns are 0. I would like to get something like this, which shows the months after which there were no 1s for at least three months:

ID    col1    col2
ABC   April   Sept
DEF   May     Sept
GHI   Jun     N/A
MNO   Sept    N/A
QAL   N/A     N/A

I have figured out how to iterate over a vector and get the index, but I am finding it a little difficult to link that back to the original data frame and get the column. Is there a function or resource that could guide me?

3 answers:

Answer 0 (score: 2)

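Since the number of answers per row is variable, I opted for a list. This approach uses rle to find runs of zeros and then checks whether a run is longer than 2. It then returns the names of the months that immediately precede those runs.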

# Data
df <- read.table(text = "ID   Jan Feb March April May Jun Jul Aug Sept Oct Nov Dec
ABC   0  0    0     1    0   0    0   0  1     0   0  0
           DEF   0  0    0     1    1   0    0   0  1     0   0  0
           GHI   0  0    0     1    0   1    0   0  0     1   0  0
           MNO   0  0    0     1    0   1    0   0  1     0   0  0
           QAL   0  1    1     1    0   0    1   0  0    1   0  0",
           header = TRUE)

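# Repackage as list (rows become elements of list).
# Note: rownames(df$ID) is NULL, so the list ends up unnamed;
# pass df$ID instead to label each element with its ID.
df_list <- setNames(split(df[, -1], seq(nrow(df))), rownames(df$ID))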

# Count function
morpheus_count <- function(x){
  #Run Length Encoding
  tmp <- rle(x)

  # Return months preceding a run of three (or greater) zeroes
  names(tmp$values)[which(tmp$values==0 & tmp$lengths>2)-1]
}

# Run on list
lapply(df_list, morpheus_count)

Result:

# [[1]]
# [1] "April" "Sept" 
# 
# [[2]]
# [1] "May"  "Sept"
# 
# [[3]]
# [1] "Jun"
# 
# [[4]]
# [1] "Sept"
# 
# [[5]]
# character(0)
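
To see what rle contributes here, a minimal sketch on a single row, written as a plain named vector rather than the one-row data frame used above. The names x, runs, starts, hit, and months_before are illustrative and not part of the answer:

```r
# Row ABC from the question as a named numeric vector
x <- c(Jan = 0, Feb = 0, March = 0, April = 1, May = 0, Jun = 0,
       Jul = 0, Aug = 0, Sept = 1, Oct = 0, Nov = 0, Dec = 0)

runs   <- rle(unname(x))              # run lengths and the value of each run
ends   <- cumsum(runs$lengths)        # position where each run ends
starts <- ends - runs$lengths + 1     # position where each run starts

# Zero runs of length >= 3 that have a month in front of them
hit <- which(runs$values == 0 & runs$lengths >= 3 & starts > 1)

# The month immediately before each qualifying run
months_before <- names(x)[starts[hit] - 1]
months_before
# [1] "April" "Sept"
```

The leading run Jan to March is dropped by the starts > 1 condition, which plays the same role as the index 0 being silently discarded in the answer's version.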

Answer 1 (score: 2)

There are several ways to solve this:

String matching

This approach uses string matching and therefore relies on every value being exactly one character long:

library(data.table)
library(magrittr)

yearly[, 
       {
         Reduce(paste0, .SD) %>% 
           stringr::str_locate_all("1000") %>% 
           as.data.table()
       }, 
       .SDcols = -"ID", by = "ID"][
         , .(ID, month = names(yearly)[start + 1L])]
    ID month
1: ABC April
2: ABC  Sept
3: DEF   May
4: DEF  Sept
5: GHI   Jun
6: MNO  Sept

This can be reshaped to wide format, as requested by the OP:

yearly[, 
       {
         Reduce(paste0, .SD) %>% 
           stringr::str_locate_all("1000") %>% 
           as.data.table()
       }, 
       .SDcols = -"ID", by = "ID"][
         , .(ID, month = names(yearly)[start + 1L])][
           , dcast(.SD, ID ~ rowid(ID, prefix = "col"))][
             yearly[, ID], on = "ID"]
    ID  col1 col2
1: ABC April Sept
2: DEF   May Sept
3: GHI   Jun <NA>
4: MNO  Sept <NA>
5: QAL  <NA> <NA>

Joining columns in a rolling window in wide format

This approach is somewhat similar to the string matching approach. It looks for matches by inner joins on four consecutive columns, moved as a rolling window across the columns of yearly. That is, it first tries to find matches in the columns

Jan, Feb, March, April

then in the columns

Feb, March, April, May

and so on, until it finally reaches the columns

Sept, Oct, Nov, Dec

Data

yearly is the data frame from the question, stored as a data.table.
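
A minimal sketch of how yearly could be created, assuming it is simply the question's table read with data.table::fread. This is a reconstruction for illustration, not the answer's own code:

```r
library(data.table)

# Assumption: yearly holds exactly the table shown in the question
yearly <- fread(text = "
ID   Jan Feb March April May Jun Jul Aug Sept Oct Nov Dec
ABC   0  0    0     1    0   0    0   0  1     0   0  0
DEF   0  0    0     1    1   0    0   0  1     0   0  0
GHI   0  0    0     1    0   1    0   0  0     1   0  0
MNO   0  0    0     1    0   1    0   0  1     0   0  0
QAL   0  1    1     1    0   0    1   0  0     1   0  0")

str(yearly)
```

With yearly defined this way, ID is a character column and the twelve month columns are integers, which is what the string matching code above expects.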

Answer 2 (score: 1)

Data:

library(magrittr)   # provides %>%, which is needed already at this point
df <- data.table::fread("
ID   Jan Feb March April May Jun Jul Aug Sept Oct Nov Dec
ABC   0  0    0     1    0   0    0   0  1     0   0  0
DEF   0  0    0     1    1   0    0   0  1     0   0  0
GHI   0  0    0     1    0   1    0   0  0     1   0  0
MNO   0  0    0     1    0   1    0   0  1     0   0  0
QAL   0  1    1     1    0   0    1   0  0     1   0  0") %>% data.table::setDF()

Code:

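library(magrittr)
rowNames <- df[, 1, drop = TRUE]   # IDs, used to label the result
months   <- names(df[, -1])
fun1 <- function(x) {
    n      <- 3 # at least 3 zeros (change if needed)
    # positions that start a group: position 1 plus every position holding a 1
    pos    <- c(-1, cumsum(x)) %>% diff %>% as.logical %>% which
    # group sizes (start plus following zeros); keep groups that start with a 1
    # and are longer than n, i.e. a 1 followed by at least n zeros
    counts <- table(cumsum(x)) %>% as.numeric %>% {. > n & as.logical(x[pos])}
    return(months[pos[counts]])
}

res <- apply(df[, -1], 1, fun1)
names(res) <- rowNames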

Result:

$ABC
[1] "April" "Sept" 

$DEF
[1] "May"  "Sept"

$GHI
[1] "Jun"

$MNO
[1] "Sept"

$QAL
character(0)

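Please note:

  • Make sure the data is of type data.frame.
  • Make sure fun1 is applied only to the 0/1 data. That is why it is called on df[,-1].
  • You can change n inside fun1 to use a different condition.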
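
If the wide col1/col2 layout from the question is wanted, the list can be padded with NA and bound into a data frame. A minimal sketch in base R, assuming res looks like the result above (re-typed by hand here so the example is self-contained; width, padded, and wide are illustrative names):

```r
# Assumption: res has the same shape as the list printed above
res <- list(ABC = c("April", "Sept"),
            DEF = c("May", "Sept"),
            GHI = "Jun",
            MNO = "Sept",
            QAL = character(0))

width <- max(lengths(res))   # largest number of matches in any row

# Indexing past the end of a character vector yields NA, which pads each row
padded <- do.call(rbind, lapply(res, function(m) m[seq_len(width)]))
colnames(padded) <- paste0("col", seq_len(width))
rownames(padded) <- NULL

wide <- data.frame(ID = names(res), padded, stringsAsFactors = FALSE)
wide
```

Because wide keeps one row per ID in the original order, it can be joined back to the source data with merge(df, wide, by = "ID") if the 0/1 columns are needed alongside the month names.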