I have a list that contains 2 data sets.
a = data.frame(c(1,1,1,1,1,2,2,2,2,2), c("a","b", "c", "d","e","e","f", "g", "h","i"))
colnames(a) = c("Numbers","Letters")
c = data.frame(c(3,3,3,3,3,4,4,4,4,4), c("q","r", "s", "t","u","u","v", "w", "x","y"))
colnames(c) = c("Numbers","Letters")
my.list = list(a,c)
my.list
I want to return only the letters that are shared by all the unique numbers within each data set. The desired result is:
new_a = data.frame(c(1,2),c("e","e"))
new_c = data.frame(c(3,4),c("u","u"))
colnames(new_a) = c("Numbers","Letters")
colnames(new_c) = c("Numbers","Letters")
my.new.list = list(new_a,new_c)
my.new.list
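As you can see, the letter "e" is the only letter shared by numbers 1 and 2 in data set 1, while the letter "u" is the only letter shared by numbers 3 and 4 in data set 2.
I'm trying to do this for a very large list. To give you an idea of my real problem: I have a list in which each element is a state. In each state I have multiple asset managers, or "accounts", and each account has multiple codes. I'm trying to find the codes that the accounts have in common within each geography. In the example above, the numbers would be the accounts, the letters would be the codes, and the two data sets in the list would be two different states. I hope this clarifies my problem.
Thanks!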
Answer 0 (score: 2)
library(data.table)
a <- as.data.table(a)
a[, if(.N > 1) .SD, by = list(Letters)]
# Letters Numbers
# 1: e 1
# 2: e 2
Explanation: take table a, group it by the column Letters (by = list(Letters)), and return each group's subset of the data (.SD) only when the group has more than one row (.N > 1). Because .N counts rows, this relies on each number listing a given letter at most once and on the data set containing exactly two numbers.
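The .N > 1 condition above only identifies "shared by all numbers" when there are exactly two numbers per data set. A minimal sketch of a more general data.table version, assuming the Numbers/Letters column names from the question: it keeps a letter only when the count of distinct Numbers in its group (uniqueN) equals the total count of distinct Numbers in that data set, and applies this to every element of the list. The helper name common_letters is hypothetical, and c_df stands in for the question's c.

```r
library(data.table)

# Example data from the question (c_df stands in for the question's `c`)
a <- data.frame(Numbers = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                Letters = c("a", "b", "c", "d", "e", "e", "f", "g", "h", "i"),
                stringsAsFactors = FALSE)
c_df <- data.frame(Numbers = c(3, 3, 3, 3, 3, 4, 4, 4, 4, 4),
                   Letters = c("q", "r", "s", "t", "u", "u", "v", "w", "x", "y"),
                   stringsAsFactors = FALSE)
my.list <- list(a, c_df)

# Hypothetical helper: keep the letters that occur under every distinct Number
common_letters <- function(df) {
  dt <- as.data.table(df)
  n_numbers <- uniqueN(dt$Numbers)  # distinct Numbers in this data set
  # Within each Letters group, return the group's rows only if all Numbers appear
  dt[, if (uniqueN(Numbers) == n_numbers) .SD, by = Letters]
}

new.list <- lapply(my.list, common_letters)
new.list
```

Each element of new.list is a data.table with columns Letters and Numbers: the two "e" rows for the first data set and the two "u" rows for the second. Because the test compares distinct counts rather than row counts, it also works when a state has more than two accounts.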
Answer 1 (score: 1)
We can use Reduce with intersect in base R:
lapply(my.list, function(x) x[with(x, Letters %in%
Reduce(intersect, split(Letters, Numbers))),])
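To show what the base R one-liner does, here is the same computation on the first data set, broken into its intermediate steps. The variable names groups, shared, and filtered are only for illustration.

```r
# First data set from the question
a <- data.frame(Numbers = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                Letters = c("a", "b", "c", "d", "e", "e", "f", "g", "h", "i"),
                stringsAsFactors = FALSE)

# Step 1: one vector of letters per number
groups <- split(a$Letters, a$Numbers)
# groups$`1` is "a" "b" "c" "d" "e"; groups$`2` is "e" "f" "g" "h" "i"

# Step 2: fold intersect() over all the vectors; with two numbers this is
# simply intersect(groups[[1]], groups[[2]])
shared <- Reduce(intersect, groups)
# shared is "e"

# Step 3: keep the rows whose letter is shared by every number
filtered <- a[a$Letters %in% shared, ]
filtered
```

Since Reduce folds intersect over however many vectors split returns, this approach does not depend on the number of distinct Numbers in each data set.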
Or using dplyr:
library(dplyr)
lapply(my.list, function(x)
x %>%
group_by(Letters) %>%
filter(n_distinct(Numbers)==2))
Instead of working with a list, we can stack it into a single data set with an extra grouping column and then do the same thing:
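library(dplyr)
# bind_rows() stacks the list; .id adds a "group" column ("1", "2", ...)
bind_rows(my.list, .id = "group") %>%
  group_by(group, Letters) %>%
  filter(n_distinct(Numbers) == 2)
If we don't know the number of unique numbers in each list element:
bind_rows(my.list, .id = "group") %>%
  group_by(group) %>%
  mutate(n = n_distinct(Numbers)) %>%
  group_by(Letters, .add = TRUE) %>%
  filter(n_distinct(Numbers) == n) %>%
  select(-n)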
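The question asks for a list back (like my.new.list), while the stacked version returns one combined table. A minimal sketch, assuming dplyr 1.0 or later (for .add = TRUE): filter the stacked data, then split it by group into one data frame per original list element. Building the split factor from seq_along(my.list) keeps the original order (so "10" does not sort before "2") and yields an empty data frame for any state with no common codes. The name back_to_list is hypothetical.

```r
library(dplyr)

# Example data from the question
my.list <- list(
  data.frame(Numbers = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
             Letters = c("a", "b", "c", "d", "e", "e", "f", "g", "h", "i"),
             stringsAsFactors = FALSE),
  data.frame(Numbers = c(3, 3, 3, 3, 3, 4, 4, 4, 4, 4),
             Letters = c("q", "r", "s", "t", "u", "u", "v", "w", "x", "y"),
             stringsAsFactors = FALSE)
)

# Stack, then keep letters shared by all distinct Numbers within each group
result <- bind_rows(my.list, .id = "group") %>%
  group_by(group) %>%
  mutate(n = n_distinct(Numbers)) %>%
  group_by(Letters, .add = TRUE) %>%
  filter(n_distinct(Numbers) == n) %>%
  ungroup() %>%
  select(-n)

# Split back into one data frame per original list element, in original order
group_levels <- as.character(seq_along(my.list))
back_to_list <- lapply(split(result, factor(result$group, levels = group_levels)),
                       function(d) as.data.frame(select(d, -group)))
back_to_list
```

Each element has the columns Numbers and Letters, matching the layout of my.new.list in the question.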