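My dataset is shown below. I want to create a new column holding the size of each unique quarter (QUART) group. In other words, for each row I want a new value equal to the number of times that row's QUART appears in the dataset.
So row 1 should get a new column with the value 4, because 1992.2 occurs 4 times.
My data looks like this: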
ID QUART Trasaction New Column (I want)
1 1992.2 Company 1 4
2 1992.2 Company 2 4
3 1992.2 Company 3 4
4 1992.2 Company 4 4
5 1992.3 Company 5 1
6 1992.4 Company 6 1
7 1993.1 Company 7 1
Thanks
Answer 0 (score: 1)
You can use dplyr::group_by together with n() to count the number of identical entries per QUART:
library(tidyverse)
df %>%
group_by(QUART) %>%
mutate(count = n())
## A tibble: 7 x 4
## Groups: QUART [4]
# ID QUART Trasaction count
# <int> <dbl> <fct> <int>
#1 1 1992. Company 1 4
#2 2 1992. Company 2 4
#3 3 1992. Company 3 4
#4 4 1992. Company 4 4
#5 5 1992. Company 5 1
#6 6 1992. Company 6 1
#7 7 1993. Company 7 1
Sample data:
df <- read.table(text =
"ID QUART Trasaction
1 1992.2 'Company 1'
2 1992.2 'Company 2'
3 1992.2 'Company 3'
4 1992.2 'Company 4'
5 1992.3 'Company 5'
6 1992.4 'Company 6'
7 1993.1 'Company 7'", header = TRUE)
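If loading the tidyverse is not an option, the same per-group count can probably be obtained in base R. The following is a minimal sketch using ave(), not part of the original answer; it rebuilds the sample data so it runs on its own, and the column name count is simply the one used above.

```r
# Sample data copied from the question (the column name "Trasaction" is kept as-is)
df <- read.table(text = "
ID QUART Trasaction
1 1992.2 'Company 1'
2 1992.2 'Company 2'
3 1992.2 'Company 3'
4 1992.2 'Company 4'
5 1992.3 'Company 5'
6 1992.4 'Company 6'
7 1993.1 'Company 7'",
header = TRUE, stringsAsFactors = FALSE)

# ave() applies FUN within each QUART group and returns a vector
# aligned with the original rows, so it can be assigned as a column.
df$count <- ave(seq_along(df$QUART), df$QUART, FUN = length)

print(df)
```

Like group_by() plus n(), this keeps all rows and only adds the group size next to each one.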
Answer 1 (score: 0)
Using mapply:
df$count <- mapply(function(x)sum(df$QUART == x), df$QUART)
# ID QUART Trasaction count
# 1 1 1992.2 Company 1 4
# 2 2 1992.2 Company 2 4
# 3 3 1992.2 Company 3 4
# 4 4 1992.2 Company 4 4
# 5 5 1992.3 Company 5 1
# 6 6 1992.4 Company 6 1
# 7 7 1993.1 Company 7 1
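Note: QUART is of type numeric / double. My suggestion is therefore not to compare with ==, but to compare the difference between the two values against a small tolerance that allows for the limits of double precision. To address this, a solution could be: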
mapply(function(x)sum(abs(df$QUART - x) <= 0.000001), df$QUART)
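To illustrate why the note above matters, here is a small sketch of how == can behave on doubles, followed by one possible way to sidestep the issue by grouping on a formatted text key. The variable names and the one-decimal format "%.1f" are assumptions made for this example; the format only suits values shaped like the QUART column in the question.

```r
# Two doubles that are mathematically equal but differ in their last bits
a <- 0.1 + 0.2
b <- 0.3

exact  <- a == b                    # exact comparison: FALSE
within <- abs(a - b) <= 1e-6        # tolerance comparison: TRUE
base   <- isTRUE(all.equal(a, b))   # base R's tolerance-based check: TRUE

# Alternative sketch: turn each quarter into a text key with one decimal
# (assumption: QUART always has the form year.quarter), then count keys.
quart  <- c(1992.2, 1992.2, 1992.2, 1992.2, 1992.3, 1992.4, 1993.1)
key    <- sprintf("%.1f", quart)
counts <- as.integer(table(key)[key])

print(c(exact = exact, within = within, base = base))
print(counts)
```

The table() lookup also counts each group once rather than rescanning the whole column for every row, which may help on larger data.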
Data
df <- read.table(text = "
ID QUART Trasaction
1 1992.2 'Company 1'
2 1992.2 'Company 2'
3 1992.2 'Company 3'
4 1992.2 'Company 4'
5 1992.3 'Company 5'
6 1992.4 'Company 6'
7 1993.1 'Company 7'",
header = TRUE, stringsAsFactors = FALSE)