Question

I have a list containing 2 datasets.

a = data.frame(c(1,1,1,1,1,2,2,2,2,2), c("a","b", "c", "d","e","e","f", "g", "h","i"))
colnames(a) = c("Numbers","Letters")
c = data.frame(c(3,3,3,3,3,4,4,4,4,4), c("q","r", "s", "t","u","u","v", "w", "x","y"))
colnames(c) = c("Numbers","Letters")
my.list = list(a,c)
my.list

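I am interested in returning only the letters that are common across the unique numbers of each dataset. The desired result is as follows: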

new_a = data.frame(c(1,2),c("e","e"))
new_c = data.frame(c(3,4),c("u","u"))
colnames(new_a) = c("Numbers","Letters")
colnames(new_c) = c("Numbers","Letters")
my.new.list = list(new_a,new_c)
my.new.list

As you can see, the letter "e" is the only letter that numbers "1" and "2" share in dataset 1, while the letter "u" is the only common letter shared by numbers 3 and 4 in dataset 2.

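I am trying to do this for a very large list. To give you an idea of my real problem: I have a list in which each element is a state. Within each state I have multiple asset managers, or "accounts", and each account has multiple codes. I am trying to find the codes that the accounts have in common within each geography. In the example above, the numbers would be the accounts, the letters would be the codes, and the two datasets in the list would be two different states. I hope this helps clarify my problem.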

Thanks!

Answer 1

library(data.table)
a <- as.data.table(a)
a[, if(.N > 1) .SD, by = list(Letters)]
#    Letters Numbers
# 1:       e       1
# 2:       e       2

Explanation: take table a, group it by the column Letters (by = list(Letters)), and return each group's subset of data (.SD) only if the number of rows in that group (.N) is > 1.
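The same grouping idea can be sketched in base R and applied to every element of the list rather than to a alone. This is a minimal sketch, not the answerer's code: keep_repeated_letters is a hypothetical helper name, and counting rows per letter only matches "shared by several numbers" under the assumption that each (Numbers, Letters) pair appears at most once, as it does in the example data.

```r
# Example data from the question, built directly as a list
my.list <- list(
  data.frame(Numbers = c(1,1,1,1,1,2,2,2,2,2),
             Letters = c("a","b","c","d","e","e","f","g","h","i")),
  data.frame(Numbers = c(3,3,3,3,3,4,4,4,4,4),
             Letters = c("q","r","s","t","u","u","v","w","x","y"))
)

# Hypothetical helper: keep rows whose letter occurs in more than one row.
# Assumes each (Numbers, Letters) pair occurs at most once.
keep_repeated_letters <- function(x) {
  # ave() returns, for every row, the size of that row's Letters group
  rows_per_letter <- ave(seq_len(nrow(x)), x$Letters, FUN = length)
  x[rows_per_letter > 1, ]
}

result <- lapply(my.list, keep_repeated_letters)
result
```

If a letter can repeat within the same number, counting distinct numbers per letter is the safer test.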

Answer 2

We can use intersect with Reduce in base R

 lapply(my.list, function(x) x[with(x, Letters %in%
                 Reduce(intersect, split(Letters, Numbers))),])
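To see what the one-liner above does, it may help to run its pieces one at a time on the first dataset. This is only an illustrative breakdown; the intermediate names letters_by_number and common are introduced here and are not part of the answer.

```r
# First dataset from the question
a <- data.frame(Numbers = c(1,1,1,1,1,2,2,2,2,2),
                Letters = c("a","b","c","d","e","e","f","g","h","i"))

# Step 1: one vector of letters per distinct number
letters_by_number <- split(a$Letters, a$Numbers)

# Step 2: fold intersect() over those vectors, leaving only the
# letters that appear under every number
common <- Reduce(intersect, letters_by_number)
common  # "e"

# Step 3: keep the rows whose letter is in the common set
a[a$Letters %in% common, ]
```

Because Reduce folds intersect over however many vectors split returns, this approach does not need to know in advance how many distinct numbers a dataset has.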

Or using dplyr

 library(dplyr)
 lapply(my.list, function(x)
                    x %>% 
                        group_by(Letters) %>% 
                        filter(n_distinct(Numbers)==2))

Instead of using a list, it can be changed to a single dataset with an additional grouping column, and then the same operation applies,

 library(tidyr)
 unnest(my.list, group) %>%
            group_by(group, Letters) %>%
            filter(n_distinct(Numbers)==2)
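Depending on the installed tidyr version, unnest() may not accept a plain list as its first argument. As a hedged alternative, the single-dataset idea can be sketched in base R. The names combined, n_numbers and shared are introduced here for illustration, and the group column simply holds each data frame's position in the list.

```r
# Example data from the question
my.list <- list(
  data.frame(Numbers = c(1,1,1,1,1,2,2,2,2,2),
             Letters = c("a","b","c","d","e","e","f","g","h","i")),
  data.frame(Numbers = c(3,3,3,3,3,4,4,4,4,4),
             Letters = c("q","r","s","t","u","u","v","w","x","y"))
)

# Tag each data frame with its list position, then stack them into one
combined <- do.call(
  rbind,
  Map(function(x, g) cbind(group = g, x), my.list, seq_along(my.list))
)

# For every row, count the distinct Numbers in its (group, Letters) cell
n_numbers <- ave(combined$Numbers, combined$group, combined$Letters,
                 FUN = function(v) length(unique(v)))

# Keep letters seen under exactly 2 distinct numbers within a group
shared <- combined[n_numbers == 2, ]
shared
```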

If we do not know the number of unique Numbers in each list element

  unnest(my.list, group) %>% 
              group_by(group) %>% 
              mutate(n= n_distinct(Numbers)) %>%
              group_by(Letters, add=TRUE) %>% 
              filter(n_distinct(Numbers)==n) %>%
              select(-n)
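The comparison used above (distinct numbers per letter versus distinct numbers in the whole group) can also be sketched per list element in base R. This is an illustration under stated assumptions: keep_shared_by_all is a hypothetical helper name, and the third data frame, with three numbers, is made-up data added only to show a case where the count is not 2.

```r
my.list <- list(
  data.frame(Numbers = c(1,1,1,1,1,2,2,2,2,2),
             Letters = c("a","b","c","d","e","e","f","g","h","i")),
  data.frame(Numbers = c(3,3,3,3,3,4,4,4,4,4),
             Letters = c("q","r","s","t","u","u","v","w","x","y")),
  # Hypothetical extra element with three distinct numbers
  data.frame(Numbers = c(5,5,6,6,7,7),
             Letters = c("m","n","m","o","m","n"))
)

# Hypothetical helper: keep letters that occur under every distinct number
keep_shared_by_all <- function(x) {
  n_total <- length(unique(x$Numbers))            # distinct numbers overall
  n_per_letter <- ave(x$Numbers, x$Letters,       # distinct numbers per letter
                      FUN = function(v) length(unique(v)))
  x[n_per_letter == n_total, ]
}

result <- lapply(my.list, keep_shared_by_all)
result
```

In the made-up third element, "n" appears under 5 and 7 but not 6, so only "m" is kept.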
