按频率减少列表元素的最快方法是什么?

时间:2017-02-21 17:45:15

标签: r list

假设我有一个与此类似的列表:

set.seed(12731)
out <- lapply(1:sample.int(10, 1), function(x){sample(letters[1:4], x, replace = T)})

[[1]]
[1] "b"

[[2]]
[1] "d" "c"

[[3]]
[1] "b" "a" "a"

[[4]]
[1] "d" "d" "b" "c"

[[5]]
[1] "d" "d" "c" "c" "b"

[[6]]
[1] "b" "d" "b" "d" "c" "c"

[[7]]
[1] "a" "b" "d" "d" "b" "a" "d"

我想在列表中使用较高频率的元素给出长度为1的向量。请注意, 可以包含长度为&gt;的向量。如果没有重复,则为1 。频率表如下:

table(unlist(out))[order(table(unlist(out)), decreasing = T)]

 b  c  d  a 
16 14 13 12 

示例的结果是这样的:

list("b", "c", "b", "b", "b", "b", "b")

REMARK 可以具有长度> 1的向量。如果没有重复,则为1。

out <- lapply(1:sample.int(10, 1), function(x){sample(letters[1:4], x, replace = T)})
length(out)
[1] 10
out[[length(out)+1]] <- c("L", "K")
out
[[1]]
[1] "c"

[[2]]
[1] "d" "a"

[[3]]
[1] "c" "b" "a"

[[4]]
[1] "b" "c" "b" "c"

[[5]]
[1] "a" "a" "d" "c" "d"

[[6]]
[1] "d" "b" "d" "d" "d" "a"

[[7]]
[1] "d" "b" "c" "c" "d" "c" "a"

[[8]]
[1] "d" "a" "d" "b" "d" "a" "b" "d"

[[9]]
[1] "a" "b" "b" "b" "c" "c" "a" "c" "d"

[[10]]
 [1] "d" "d" "d" "a" "d" "d" "c" "c" "a" "c"

[[11]]
[1] "L" "K"

预期结果:

list("c", "d", "c", "c", "d", "d", "d", "d", "d", "d", c("L", "K"))

2 个答案:

答案 0 :(得分:1)

我相信这应该适合您所寻找的目标。

myRanks

 b  c  d  a 
10  9  5  4 


# calculate if most popular, then second most popular, ... item shows up for each list item
sapply(out, function(i) names(myRanks)[min(match(i, names(myRanks)))])
[1] "b" "b" "b" "c" "b" "b" "b"

这会产生

sapply

这里,min遍历每个列表项并返回一个向量。它应用一个函数,使用match选择列表元素中出现的myRanks表的第一个元素的名称(通过sapply(out, function(i) { intersect(names(myRanks)[myRanks == max(unique(myRanks[match(i, names(myRanks))]))], i)}) )。

如果myRanks表中的多个元素具有相同的计数(重复),则以下代码应返回每个列表项的顶部观察列表:

class SchoolAdminForm(forms.ModelForm):
    students = forms.ModelMultipleChoiceField(
        queryset=Student.objects.all(),
        widget=FilteredSelectMultiple(verbose_name='students', is_stacked=False))

    class Meta:
        model = School
        fields = ['your_school_fields_go_here']

    def __init__(self, *args, **kwargs):
        super(SchoolAdminForm, self).__init__(*args, **kwargs)
        if self.instance:
            # fill initial related values
            self.fields['students'].initial = self.instance.student_set.all()

class SchoolAdmin(admin.ModelAdmin):
   form = SchoolAdminForm

   def save_model(self, request, obj, form, change):
       original_students = obj.student_set.all()
       new_students = form.cleaned_data['students']
       remove_qs = original_students.exclude(id__in=new_students.values('id'))
       add_qs = new_students.exclude(id__in=original_students.values('id'))
       for item in remove_qs:
           obj.student_set.remove(item)
       for item in add_qs:
           obj.student_set.add(item)
       obj.save()

这里,与myRanks中具有最高值的列表项中的值具有相同值的myRanks的名称与列表项中存在的名称相交,以便仅返回两个集合中的值。

答案 1 :(得分:0)

这应该有效:

set.seed(12731)
out <- lapply(1:sample.int(10, 1), function(x){sample(letters[1:4], x, replace = T)})
out
#[[1]]
#[1] "b"

#[[2]]
#[1] "c" "b"

#[[3]]
#[1] "b" "b" "b"

#[[4]]
#[1] "d" "c" "c" "d"

#[[5]]
#[1] "d" "b" "a" "a" "c"

#[[6]]
#[1] "a" "b" "c" "b" "c" "c"

#[[7]]
#[1] "a" "c" "d" "b" "d" "c" "b"

tbl <- table(unlist(out))[order(table(unlist(out)), decreasing = T)]
sapply(out, function(x) intersect(names(tbl), x)[1])
# [1] "b" "b" "b" "c" "b" "b" "b"

<强> [编辑]

set.seed(12731)
out <- lapply(1:sample.int(10, 1), function(x){sample(letters[1:4], x, replace = T)})
out[[length(out)+1]] <- c("L", "K")
out
#[[1]]
#[1] "b"

#[[2]]
#[1] "c" "b"

#[[3]]
#[1] "b" "b" "b"

#[[4]]
#[1] "d" "c" "c" "d"

#[[5]]
#[1] "d" "b" "a" "a" "c"

#[[6]]
#[1] "a" "b" "c" "b" "c" "c"

#[[7]]
#[1] "a" "c" "d" "b" "d" "c" "b"

#[[8]]
#[1] "L" "K"

tbl <- table(unlist(out))[order(table(unlist(out)), decreasing = T)]

#tbl
#b  c  d  a  K  L 
#10  9  5  4  1  1 

lapply(out, function(x) names(tbl[tbl==max(tbl[names(tbl) %in%  intersect(names(tbl), x)])]))

#[[1]]
#[1] "b"

#[[2]]
#[1] "b"

#[[3]]
#[1] "b"

#[[4]]
#[1] "c"

#[[5]]
#[1] "b"

#[[6]]
#[1] "b"

#[[7]]
#[1] "b"

#[[8]]
#[1] "K" "L"