I have something like this:
ISBN Date Quantity
3457 2004 10
3457 2004 6
3457 2004 10
3457 2005 7
3457 2005 12
9885 2013 10
9885 2013 6
9855 2013 10
9885 2014 7
9885 2014 12
I would like to get:
ISBN Date Quantity Year
3457 2004 10 1st Year
3457 2004 6 1st Year
3457 2004 10 1st Year
3457 2005 7 2nd Year
3457 2005 12 2nd Year
9885 2013 10 1st Year
9885 2013 6 1st Year
9855 2013 10 1st Year
9885 2014 7 2nd Year
9885 2014 12 2nd Year
I have this code:
df<-df %>% group_by(ISBN) %>% mutate(Year = ifelse(DateYear > DateYear,"1st Year","2nd Year"))
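But I get "2nd Year" everywhere, so I guess the comparison inside ifelse does not actually compare the rows of the Date column with one another. I thought I would have to use a for loop, but I believe R handles this a different way. How can I get the result I need?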
Answer 0 (score: 1)
As mentioned in the comments, if you have more cases, you can do this:
library(dplyr)
library(toOrdinal)
df %>%
group_by(ISBN) %>%
mutate(Year = paste(sapply(cumsum(Date != lag(Date, default = 0)), toOrdinal), "Year"))
For example, with:
# ISBN Date Quantity
#1 3457 2004 10
#2 3457 2004 6
#3 3457 2005 10
#4 3457 2006 7
#5 3457 2007 12
#6 9885 2013 10
#7 9885 2014 6
#8 9855 2015 10
#9 9885 2015 7
#10 9885 2016 12
this gives:
#Source: local data frame [10 x 4]
#Groups: ISBN [3]
#
# ISBN Date Quantity Year
# <int> <int> <int> <chr>
#1 3457 2004 10 1st Year
#2 3457 2004 6 1st Year
#3 3457 2005 10 2nd Year
#4 3457 2006 7 3rd Year
#5 3457 2007 12 4th Year
#6 9885 2013 10 1st Year
#7 9885 2014 6 2nd Year
#8 9855 2015 10 1st Year
#9 9885 2015 7 3rd Year
#10 9885 2016 12 4th Year
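To make the counter in the mutate() call easier to follow, here is a minimal sketch of what cumsum(Date != lag(Date, default = 0)) computes for a single ISBN. The vector below is made up for illustration and is assumed to be already sorted by year; if the same year reappeared after a different one, it would be counted as a new year.

```r
library(dplyr)

# Dates for one ISBN, assumed to be in chronological order (illustrative data)
Date <- c(2004, 2004, 2005, 2006, 2007)

# TRUE wherever the year differs from the previous row;
# default = 0 makes the first row count as a change
changed <- Date != lag(Date, default = 0)

# The running total of changes is the year index within the group
year_index <- cumsum(changed)

print(changed)     # TRUE FALSE TRUE TRUE TRUE
print(year_index)  # 1 1 2 3 4
```

toOrdinal then only turns each index into "1st", "2nd", and so on before "Year" is pasted on.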
Answer 1 (score: 0)
library(dplyr)
library(readr)
df_foo = read.table(textConnection("ISBN Date Quantity
3457 2004 10
3457 2004 6
3457 2004 10
3457 2005 7
3457 2005 12
9885 2013 10
9885 2013 6
9855 2013 10
9885 2014 7
9885 2014 12"), header = TRUE, stringsAsFactors = FALSE)
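df_foo %>%
  group_by(ISBN) %>%
  # note: arrange() ignores the grouping unless .by_group = TRUE is set
  arrange(Date) %>%
  mutate(
    # name the new column; every row after the first year of an ISBN becomes "2nd Year"
    Year = ifelse(
      cumsum(Date != lag(Date, default = first(Date))) > 0,
      "2nd Year", "1st Year"
    )
  )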
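The ifelse() above only distinguishes a first year from everything after it. If an ISBN can span more than two years, one possible sketch is to rank the distinct years with dplyr's dense_rank() and attach a suffix. The ordinal_label() helper and the shortened data frame are my own illustrative assumptions, not part of the answer above.

```r
library(dplyr)

# Hypothetical helper (an assumption, not from the thread):
# 1 -> "1st", 2 -> "2nd", 3 -> "3rd", 11 -> "11th", ...
ordinal_label <- function(n) {
  suffix <- ifelse((n %% 100) %in% 11:13, "th",
            ifelse(n %% 10 == 1, "st",
            ifelse(n %% 10 == 2, "nd",
            ifelse(n %% 10 == 3, "rd", "th"))))
  paste0(n, suffix)
}

# Shortened, illustrative data with three years for one ISBN
df_foo <- read.table(text = "ISBN Date Quantity
3457 2004 10
3457 2005 7
3457 2006 12
9885 2013 10
9885 2014 6", header = TRUE)

result <- df_foo %>%
  group_by(ISBN) %>%
  # dense_rank() numbers the distinct years 1, 2, 3, ... within each ISBN
  mutate(Year = paste(ordinal_label(dense_rank(Date)), "Year")) %>%
  ungroup()

print(result)
```

Because dense_rank() works on the values rather than on row order, this variant does not need the rows to be sorted first.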
Answer 2 (score: 0)
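For completeness, and because I personally prefer this kind of solution, here is one that uses only base R, relying on lapply and within to achieve the result.
# example data (note possible error on line 8, ISBN==9855)
dat <- read.table(text="ISBN Date Quantity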
3457 2004 10
3457 2004 6
3457 2004 10
3457 2005 7
3457 2005 12
9885 2013 10
9885 2013 6
9855 2013 10
9885 2014 7
9885 2014 12", header = TRUE)
# treat separately (loop using 'lapply')
datlist <- split(dat,dat$ISBN)
datlist <- lapply(datlist,
function(x) within(x, Year <- as.numeric(as.factor(Date))))
# bind together
dat <- do.call(rbind, datlist)
rownames(dat) <- NULL
In effect, this loops over the distinct ISBN values.
Output:
# ISBN Date Quantity Year
# 1 3457 2004 10 1
# 2 3457 2004 6 1
# 3 3457 2004 10 1
# 4 3457 2005 7 2
# 5 3457 2005 12 2
# 6 9855 2013 10 1
# 7 9885 2013 10 1
# 8 9885 2013 6 1
# 9 9885 2014 7 2
# 10 9885 2014 12 2
Note that this method reorders the dataset so that the rows are sorted by ISBN. Also, I did not encode the Year column as 1st Year, 2nd Year, ... instead of 1, 2, ..., because I don't really see the value in that beyond easier formatting.
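If the reordering is a problem, a possible base R variation is to compute the index with ave(), which returns a vector aligned with the original rows. This is only a sketch of an alternative, not the approach used above; it assumes Date is numeric so that ave() can write the index back into a vector of the same type.

```r
# Same example data as in the question
dat <- read.table(text = "ISBN Date Quantity
3457 2004 10
3457 2004 6
3457 2004 10
3457 2005 7
3457 2005 12
9885 2013 10
9885 2013 6
9855 2013 10
9885 2014 7
9885 2014 12", header = TRUE)

# Within each ISBN, map every Date to its position among
# that ISBN's sorted distinct dates (1 for the earliest year, and so on)
dat$Year <- ave(dat$Date, dat$ISBN,
                FUN = function(d) match(d, sort(unique(d))))

print(dat)
```

The rows stay in their original order, so the lone 9855 row remains in position 8 with Year 1.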