I have the following data frame, yearly:
ID Jan Feb March April May Jun Jul Aug Sept Oct Nov Dec
ABC 0 0 0 1 0 0 0 0 1 0 0 0
DEF 0 0 0 1 1 0 0 0 1 0 0 0
GHI 0 0 0 1 0 1 0 0 0 1 0 0
MNO 0 0 0 1 0 1 0 0 1 0 0 0
QAL 0 1 1 1 0 0 1 0 0 1 0 0
I want to go through each row and find every column that is followed by three columns of 0. I would like to get something like this, which shows the months that are followed by at least three months of zeros:
ID col1 col2
ABC April Sept
DEF May Sept
GHI Jun N/A
MNO Sept N/A
QAL N/A N/A
I have worked out how to iterate over a vector and get the indices, but I am finding it hard to link them back to the original data frame and get the column names. Is there a function or resource that could guide me?
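Answer 0 (score: 2)
Since the number of answers per row varies, I opted for a list. This method uses rle to find runs of zeros, then checks whether a run contains more than 2 of them. It then returns the names of the months that precede those runs.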
# Data
df <- read.table(text = "ID Jan Feb March April May Jun Jul Aug Sept Oct Nov Dec
ABC 0 0 0 1 0 0 0 0 1 0 0 0
DEF 0 0 0 1 1 0 0 0 1 0 0 0
GHI 0 0 0 1 0 1 0 0 0 1 0 0
MNO 0 0 0 1 0 1 0 0 1 0 0 0
QAL 0 1 1 1 0 0 1 0 0 1 0 0",
                 header = TRUE)
# Repackage as list (rows become elements of list).
# The elements are named by ID; rownames(df$ID) would return NULL
# and leave the list unnamed.
df_list <- setNames(split(df[, -1], seq(nrow(df))), as.character(df$ID))
# Count function
morpheus_count <- function(x){
  # Run Length Encoding
  tmp <- rle(x)
  # Return months preceding a run of three (or more) zeroes
  names(tmp$values)[which(tmp$values == 0 & tmp$lengths > 2) - 1]
}
# Run on list
lapply(df_list, morpheus_count)
Result:
# $ABC
# [1] "April" "Sept"
#
# $DEF
# [1] "May" "Sept"
#
# $GHI
# [1] "Jun"
#
# $MNO
# [1] "Sept"
#
# $QAL
# character(0)
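To see why the rle step works, it can help to trace it on a single row. The following is a minimal sketch, assuming the row is passed as a named numeric vector rather than a one-row data frame; the variable names (abc, runs, long_zero_runs, preceding) are illustrative and not part of the answer.

```r
# Row ABC from the question as a named vector (illustrative)
abc <- c(Jan = 0, Feb = 0, March = 0, April = 1, May = 0, Jun = 0,
         Jul = 0, Aug = 0, Sept = 1, Oct = 0, Nov = 0, Dec = 0)

runs <- rle(abc)
# runs$lengths holds the length of each run of equal values.
# runs$values holds the value of each run and carries the name of
# the LAST element of that run.

# Index of every run of zeros that has at least three elements
long_zero_runs <- which(runs$values == 0 & runs$lengths > 2)

# The run just before each of those ends in the month of interest.
# An index of 0 (a zero-run at the very start) is dropped by `[`.
preceding <- names(runs$values)[long_zero_runs - 1]
preceding
```

Because the name attached to each run is that of its last element, stepping back one run from a long run of zeros lands on the last non-zero month before it.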
Answer 1 (score: 2)
There are several ways to solve this.
The first approach uses string matching, so it relies on every value being exactly one character long:
library(data.table)
library(magrittr)
# yearly must be a data.table here, e.g. after setDT(yearly)
yearly[,
       {
         Reduce(paste0, .SD) %>%
           stringr::str_locate_all("1000") %>%
           as.data.table()
       },
       .SDcols = -"ID", by = "ID"][
         , .(ID, month = names(yearly)[start + 1L])]
    ID month
1: ABC April
2: ABC  Sept
3: DEF   May
4: DEF  Sept
5: GHI   Jun
6: MNO  Sept
The result can be reshaped to wide format, as requested by the OP:
yearly[,
       {
         Reduce(paste0, .SD) %>%
           stringr::str_locate_all("1000") %>%
           as.data.table()
       },
       .SDcols = -"ID", by = "ID"][
         , .(ID, month = names(yearly)[start + 1L])][
           , dcast(.SD, ID ~ rowid(ID, prefix = "col"))][
             yearly[, ID], on = "ID"]
    ID  col1 col2
1: ABC April Sept
2: DEF   May Sept
3: GHI   Jun <NA>
4: MNO  Sept <NA>
5: QAL  <NA> <NA>
The second approach is somewhat similar to string matching. It finds matches through inner joins on four consecutive columns, moving across the columns of yearly in a rolling window. That is, it looks for matches in columns Jan, Feb, March, April, then in columns Feb, March, April, May, and so on, until it finally reaches columns Sept, Oct, Nov, Dec.
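The rolling-window idea can be illustrated without joins. The following is a rough base R sketch, not the join-based code the answer refers to: it slides a four-column window across the month columns and compares each window with the pattern 1, 0, 0, 0. The names pattern, m, starts, hits, idx and result are assumptions made for this illustration.

```r
yearly <- read.table(text = "ID Jan Feb March April May Jun Jul Aug Sept Oct Nov Dec
ABC 0 0 0 1 0 0 0 0 1 0 0 0
DEF 0 0 0 1 1 0 0 0 1 0 0 0
GHI 0 0 0 1 0 1 0 0 0 1 0 0
MNO 0 0 0 1 0 1 0 0 1 0 0 0
QAL 0 1 1 1 0 0 1 0 0 1 0 0",
                     header = TRUE, stringsAsFactors = FALSE)

pattern <- c(1, 0, 0, 0)        # a 1 followed by three zeros
m <- as.matrix(yearly[, -1])    # month columns only, one row per ID

# Window start positions: Jan..April up to Sept..Dec
starts <- seq_len(ncol(m) - length(pattern) + 1)

# For each window, flag the rows whose four values equal the pattern
hits <- sapply(starts, function(s) {
  window <- m[, s:(s + length(pattern) - 1), drop = FALSE]
  apply(window, 1, function(r) all(r == pattern))
})

# Turn the logical matrix into (row, window start) pairs, ordered by row
idx <- which(hits, arr.ind = TRUE)
idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]

result <- data.frame(ID = yearly$ID[idx[, "row"]],
                     month = colnames(m)[idx[, "col"]],
                     stringsAsFactors = FALSE)
result
```

The window start column is the month holding the 1, so it can be used directly as the column index of the reported month.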
Answer 2 (score: 1)
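Data:
library(magrittr)  # provides %>%, so it has to be loaded before the pipe below
df <- data.table::fread("
ID Jan Feb March April May Jun Jul Aug Sept Oct Nov Dec
ABC 0 0 0 1 0 0 0 0 1 0 0 0
DEF 0 0 0 1 1 0 0 0 1 0 0 0
GHI 0 0 0 1 0 1 0 0 0 1 0 0
MNO 0 0 0 1 0 1 0 0 1 0 0 0
QAL 0 1 1 1 0 0 1 0 0 1 0 0") %>% data.table::setDF()
Code:
rowNames <- df[, 1, drop = TRUE]
months <- names(df[, -1])
fun1 <- function(x) {
  n <- 3  # at least 3 zeros (change if needed)
  # Position 1 plus the position of every 1 in the row
  pos <- c(-1, cumsum(x)) %>% diff %>% as.logical %>% which
  # Each cumsum group is a 1 with its trailing zeros; keep groups with more
  # than n elements that really start with a 1 (drops a leading run of zeros)
  counts <- table(cumsum(x)) %>% as.numeric %>% {. > n & as.logical(x[pos])}
  return(months[pos[counts]])
}
res <- apply(df[, -1], 1, fun1)
names(res) <- rowNames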
Result:
$ABC
[1] "April" "Sept"
$DEF
[1] "May" "Sept"
$GHI
[1] "Jun"
$MNO
[1] "Sept"
$QAL
character(0)
Please note:
The data is converted to a data.frame.
fun1 is applied to 0,1 data. That is why df[,-1] is passed to it.
Change n inside fun1 for other conditions.
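The cumsum grouping inside fun1 can be traced on a single row. This is a small sketch, assuming a plain 0/1 vector for row ABC; the intermediate names (groups, group_sizes, keep, found) are illustrative and do not appear in the answer.

```r
months <- c("Jan", "Feb", "March", "April", "May", "Jun",
            "Jul", "Aug", "Sept", "Oct", "Nov", "Dec")
x <- c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0)   # row ABC

# Every 1 starts a new group; the zeros after it share its group number
groups <- cumsum(x)

# Number of elements in each group (the 1 itself plus its trailing zeros)
group_sizes <- as.numeric(table(groups))

# Position 1 plus the position of every 1, i.e. where each group starts
pos <- which(as.logical(diff(c(-1, groups))))

# Keep groups with more than 3 elements that start with a 1;
# the leading run of zeros starts with a 0 and is dropped
keep <- group_sizes > 3 & as.logical(x[pos])

found <- months[pos[keep]]
found
```

A group size above 3 means a 1 followed by at least three zeros, which is why n is compared with a strict inequality.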