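I have a dataset with more than 100,000 rows. For each row, I want to count how many times the value in a specific column appears in that column, and save the count in another column (see the example below).
I could loop over the whole dataset for every row, but that would be 100k * 100k iterations. Is there a more efficient way?
Input dataset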
A B
1 6
3 1
2 6
4 2
1 4
9 1
Output dataset
A B number_of_appearances (based on column B)
1 6 2
3 1 2
2 6 2
4 2 1
1 4 1
9 1 2
Answer 0 (score: 1)
You can use dplyr:
library(dplyr)
a <- c(2,1,2,3,4,3,2,1,4)
b <- c(3,2,1,2,3,4,3,2,1)
df <- data.frame(a, b)
df %>%
group_by(b) %>%
mutate(appearances_in_b = n())
Source: local data frame [9 x 3]
Groups: b [4]
a b appearances_in_b
<dbl> <dbl> <int>
1 2 3 3
2 1 2 3
3 2 1 2
4 3 2 3
5 4 3 3
6 3 4 1
7 2 3 3
8 1 2 3
9 4 1 2
Answer 1 (score: 1)
Without dplyr:
# create the dataframe
x = sample(1:3, 10, TRUE);
y = sample(c("a","b","c"), 10, TRUE);
d = data.frame(x,y);
# get the frequencies of y
tb = table(d$y);
tb = as.data.frame(tb);
# make an "SQL join-like" merging of the two data-frames
res = merge(d,tb,by.x="y",by.y="Var1", sort=FALSE);
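Note that merge() leaves the rows in an unspecified order when sort=FALSE, so the result may not line up with the original data. If the row order matters, one option is to look up each row's count in the frequency table by name instead. A minimal sketch, where the column name n and the seed are only illustrative choices:

```r
# example data with the same shape as above (seed only for reproducibility)
set.seed(1)
d <- data.frame(x = sample(1:3, 10, TRUE),
                y = sample(c("a", "b", "c"), 10, TRUE))

# count each distinct value of y once
tb <- table(d$y)

# look up every row's count by name; the row order of d is unchanged
d$n <- as.vector(tb[as.character(d$y)])
```

This builds the table once and then does a single vectorised lookup, so it avoids the row-by-row loop described in the question.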
Answer 2 (score: 1)
We can use ave from base R:
df1$appearance_in_b <- with(df1, ave(B, B, FUN=length))
df1$appearance_in_b
#[1] 2 2 2 1 1 2
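For a self-contained run, here is a sketch that assumes df1 is the input dataset from the question and builds it explicitly before calling ave:

```r
# df1 is assumed to be the question's input data
df1 <- data.frame(A = c(1, 3, 2, 4, 1, 9),
                  B = c(6, 1, 6, 2, 4, 1))

# ave() applies FUN within each group of B and returns a vector
# aligned with the original rows
df1$appearance_in_b <- with(df1, ave(B, B, FUN = length))
```

Because the result is aligned with the rows of df1, no merge or reordering step is needed.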
Answer 3 (score: 0)
Just to add a data.table approach:
library(data.table)
dt <- data.table(A = c(1, 3, 2, 4, 1, 9), B = c(6, 1, 6, 2, 4, 1))
dt[, number_of_appearances := .N, by = "B"]
print(dt)
A B number_of_appearances
1: 1 6 2
2: 3 1 2
3: 2 6 2
4: 4 2 1
5: 1 4 1
6: 9 1 2