Subsetting based on 2 conditions

Time: 2015-05-22 20:15:53

Tags: r list duplicates subset

I have a list containing 2 datasets.

a = data.frame(c(1,1,1,1,1,2,2,2,2,2), c("a","b", "c", "d","e","e","f", "g", "h","i"))
colnames(a) = c("Numbers","Letters")
c = data.frame(c(3,3,3,3,3,4,4,4,4,4), c("q","r", "s", "t","u","u","v", "w", "x","y"))
colnames(c) = c("Numbers","Letters")
my.list = list(a,c)
my.list

Within each dataset, I want to return only the letters that are common to all of the unique numbers. The desired result is:

new_a = data.frame(c(1,2),c("e","e"))
new_c = data.frame(c(3,4),c("u","u"))
colnames(new_a) = c("Numbers","Letters")
colnames(new_c) = c("Numbers","Letters")
my.new.list = list(new_a,new_c)
my.new.list

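As you can see, the letter "e" is the only letter shared by numbers 1 and 2 in dataset 1, and the letter "u" is the only letter shared by numbers 3 and 4 in dataset 2.

I am trying to do this for a very large list. To give you an idea of my real problem: I have a list in which each element is a state. Within each state, I have multiple asset managers, or "accounts", and each account has multiple codes. I am trying to find the codes that the accounts have in common within each geography. In the example above, the numbers would be the accounts, the letters would be the codes, and the two datasets in the list would be two different states. I hope this helps clarify my problem.

Thanks!

2 Answers:

Answer 0: (score: 2)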

library(data.table)
a <- as.data.table(a)
a[, if(.N > 1) .SD, by = list(Letters)]
#    Letters Numbers
# 1:       e       1
# 2:       e       2

Explanation: take table a, group it by the column Letters (by = list(Letters)), and return each group's subset of the data (.SD) only when the number of rows in the group (.N) is > 1.
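The answer above handles a single table and counts rows per letter. As a hedged sketch of how the same idea might be applied to every element of my.list, the block below wraps it in a helper and compares distinct numbers per letter against the total number of distinct numbers. The helper name keep_shared, the variable c2 (used here instead of c), and the use of uniqueN() are assumptions for illustration, not part of the original answer.

```r
library(data.table)

# Rebuild the question's example data; stringsAsFactors = FALSE keeps Letters as character
a <- data.frame(Numbers = c(1,1,1,1,1,2,2,2,2,2),
                Letters = c("a","b","c","d","e","e","f","g","h","i"),
                stringsAsFactors = FALSE)
c2 <- data.frame(Numbers = c(3,3,3,3,3,4,4,4,4,4),
                 Letters = c("q","r","s","t","u","u","v","w","x","y"),
                 stringsAsFactors = FALSE)
my.list <- list(a, c2)

# Hypothetical helper: keep only the letters that occur under every distinct number
keep_shared <- function(df) {
  dt <- as.data.table(df)
  n_numbers <- uniqueN(dt$Numbers)   # how many distinct numbers this table has
  # Return the group's rows only if the letter appears with all distinct numbers
  dt[, if (uniqueN(Numbers) == n_numbers) .SD, by = Letters]
}

result <- lapply(my.list, keep_shared)
```

Comparing distinct numbers rather than raw row counts means a letter repeated under the same number would not be mistaken for a shared one, which may matter on real account data.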

Answer 1: (score: 1)

We can use intersect together with Reduce in base R:

 lapply(my.list, function(x) x[with(x, Letters %in%
                 Reduce(intersect, split(Letters, Numbers))),])
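To make the one-liner above easier to follow, here is a minimal sketch that runs the same steps one at a time on the first dataset only. The intermediate names by_number and shared are introduced here for illustration.

```r
# First dataset from the question; stringsAsFactors = FALSE keeps Letters as character
a <- data.frame(Numbers = c(1,1,1,1,1,2,2,2,2,2),
                Letters = c("a","b","c","d","e","e","f","g","h","i"),
                stringsAsFactors = FALSE)

# Step 1: one vector of letters per distinct number
by_number <- split(a$Letters, a$Numbers)

# Step 2: fold intersect() over those vectors, keeping letters present in all of them
shared <- Reduce(intersect, by_number)

# Step 3: keep only the rows whose letter is in the shared set
new_a <- a[a$Letters %in% shared, ]
```

Because Reduce folds intersect over however many vectors split returns, this version does not need to know the number of distinct numbers in advance.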

Or use dplyr:

 library(dplyr)
 lapply(my.list, function(x)
                    x %>% 
                        group_by(Letters) %>% 
                        filter(n_distinct(Numbers)==2))

Instead of using a list, we can convert it to a single dataset with an additional grouping column and then do the same thing:

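 # Note: unnest(my.list, group) relied on an early tidyr interface that accepted
 # a plain list; bind_rows() from dplyr is used here as a stand-in that stacks
 # the list and records each element's index in a "group" column.
 bind_rows(my.list, .id = "group") %>%
            group_by(group, Letters) %>%
            filter(n_distinct(Numbers)==2)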

If we do not know the number of unique numbers in each list element:

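  bind_rows(my.list, .id = "group") %>%   # stand-in for the original unnest(my.list, group)
              group_by(group) %>% 
              mutate(n= n_distinct(Numbers)) %>%   # distinct numbers per list element
              group_by(Letters, .add=TRUE) %>%     # .add replaces the deprecated add argument (dplyr >= 1.0)
              filter(n_distinct(Numbers)==n) %>%
              select(-n)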