Apologies if the title of the question is not very clear.
I have two data frames, as shown below:
df1
NAME FOLLOWS
san big supa
san EAU
san simulate
san spang
glyn guido
glyn claire
glyn vincent
glyn dan
glyn peter
glyn EAU
df2
FOLLOWS
guido
vincent
EAU
EUSC
brian
simulate
peter
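I would like to count the matches between df1$FOLLOWS and df2$FOLLOWS for each NAME in df1, as well as the length of df1$FOLLOWS for each NAME in df1. For these data frames, I expect output like this:
df3
NAME LENGTH_FOLLOWS COUNT_Match
san 4 2
glyn 6 4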
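Answer 0 (score: 1)
You can first left-join df1 to df2, so that every row of df1 is kept and rows without a match in df2 get a missing value in the joined column. Then you can simply count the instances per NAME: all rows give the length, and the non-missing joined values give the number of matches.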
library(sqldf)
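# Inner query: left join keeps all df1 rows; Actual_F is NULL when there is no match.
# Outer query: count(NAME) counts all rows, count(Actual_F) counts only non-NULL matches.
sqldf('select NAME,
              count(NAME) as LENGTH_FOLLOWS,
              count(Actual_F) as COUNT_Match
       from (select t1.*, t2.FOLLOWS as Actual_F
             from df1 t1
             left join df2 t2 on t1.FOLLOWS = t2.FOLLOWS)
       group by NAME')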
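The same left-join idea can be sketched in base R with merge(), without sqldf. This is only a minimal sketch under a few assumptions: the data frames are rebuilt by hand from the question, and the names df2_flagged, matched, joined and groups are hypothetical helpers introduced for illustration.

```r
# Rebuild the example data from the question
df1 <- data.frame(
  NAME = c(rep("san", 4), rep("glyn", 6)),
  FOLLOWS = c("big supa", "EAU", "simulate", "spang",
              "guido", "claire", "vincent", "dan", "peter", "EAU"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  FOLLOWS = c("guido", "vincent", "EAU", "EUSC", "brian", "simulate", "peter"),
  stringsAsFactors = FALSE
)

# Add a flag column to df2 (hypothetical name 'matched');
# unique() guards against duplicated FOLLOWS values multiplying rows in the join
df2_flagged <- data.frame(
  FOLLOWS = unique(df2$FOLLOWS),
  matched = TRUE,
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every row of df1 (a left join);
# rows with no partner in df2 get matched = NA
joined <- merge(df1, df2_flagged, by = "FOLLOWS", all.x = TRUE)

# Per NAME: number of rows, and number of rows with a non-NA flag
groups <- split(joined$matched, joined$NAME)
df3 <- data.frame(
  NAME = names(groups),
  LENGTH_FOLLOWS = unname(sapply(groups, length)),
  COUNT_Match = unname(sapply(groups, function(x) sum(!is.na(x))))
)
print(df3)
```

Note that split() orders the groups alphabetically, so the row order of df3 may differ from the order in which the names first appear in df1.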
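Or using base R:
# Position of each df1$FOLLOWS value in df2$FOLLOWS (NA when there is no match)
df1$index <- match(df1$FOLLOWS, df2$FOLLOWS)
# Per NAME, count the non-NA entries: every FOLLOWS value, then only the matched ones
aggregate(cbind(LENGTH_FOLLOWS = df1$FOLLOWS, COUNT_Match = df1$index), by = list(NAME = df1$NAME), FUN = function(x) length(x[!is.na(x)]))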
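For a self-contained check of the base R idea, here is a small sketch that swaps match() for %in%, which returns TRUE/FALSE directly instead of positions. It assumes the data from the question, rebuilt by hand; is_match and name_order are hypothetical names used only for illustration.

```r
# Rebuild the example data from the question
df1 <- data.frame(
  NAME = c(rep("san", 4), rep("glyn", 6)),
  FOLLOWS = c("big supa", "EAU", "simulate", "spang",
              "guido", "claire", "vincent", "dan", "peter", "EAU"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  FOLLOWS = c("guido", "vincent", "EAU", "EUSC", "brian", "simulate", "peter"),
  stringsAsFactors = FALSE
)

# TRUE where a df1$FOLLOWS value also appears in df2$FOLLOWS
is_match <- df1$FOLLOWS %in% df2$FOLLOWS

# Keep the NAMEs in order of first appearance in df1
name_order <- unique(as.character(df1$NAME))

df3 <- data.frame(
  NAME = name_order,
  # table() counts rows per NAME; indexing by name restores the original order
  LENGTH_FOLLOWS = as.vector(table(df1$NAME)[name_order]),
  # summing a logical vector counts its TRUE values
  COUNT_Match = as.vector(tapply(is_match, df1$NAME, sum)[name_order])
)
print(df3)
```

Because %in% never returns NA, the match count is a plain sum per group and no NA filtering is needed.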
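Answer 1 (score: 1)
Here is an option using data.table. Convert the first data.frame to a 'data.table' (setDT(df1)) and join with 'df2' using on to create an indicator column ('ind'), which is 1 for rows of 'df1' whose FOLLOWS value is found in 'df2' and NA otherwise. Then, grouped by 'NAME', we get the number of rows (.N) and the sum of the logical vector marking the non-NA elements of 'ind'.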
library(data.table)
setDT(df1)[df2, ind := 1, on = .(FOLLOWS)]
df1[, .(LENGTH_FOLLOWS = .N, COUNT_MATCH = sum(!is.na(ind))), NAME]
# NAME LENGTH_FOLLOWS COUNT_MATCH
#1: san 4 2
#2: glyn 6 4