I have a list of lists and want to create a data frame containing the count of each unique element. Here is my test data:
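test = [["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
        ["P1", "P1", "P1"],
        ["P1", "P1", "P1", "P2"],
        ["P4"],
        ["P1", "P4", "P2"],
        ["P1", "P1", "P1"]]
I can count the elements of each sub-list by using Counter in a for loop, but how can I summarize the results of that loop in a new data frame?
The expected output is a data frame with one column per unique element that holds its total count across all sub-lists (for the data above: P1 = 15, P2 = 4, P3 = 1, P4 = 2).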
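For reference, the per-sub-list loop mentioned in the question could look roughly like the sketch below. The asker's actual loop is not shown, so the loop body and the name per_list_counts are assumptions made for illustration.

```python
from collections import Counter

test = [["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
        ["P1", "P1", "P1"],
        ["P1", "P1", "P1", "P2"],
        ["P4"],
        ["P1", "P4", "P2"],
        ["P1", "P1", "P1"]]

# Assumed reconstruction of the loop: one Counter per sub-list.
# per_list_counts is a hypothetical name, not taken from the question.
per_list_counts = []
for item in test:
    per_list_counts.append(Counter(item))

# Each sub-list is counted on its own; nothing is combined yet.
for counts in per_list_counts:
    print(dict(counts))
```

This gives six separate Counter objects, which is why a further step is needed to merge them into one set of totals.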
Answer 0 (score: 6)
Here is one way.
from collections import Counter
from itertools import chain
test = [["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
["P1", "P1", "P1"],
["P1", "P1", "P1", "P2"],
["P4"],
["P1", "P4", "P2"],
["P1", "P1", "P1"]]
c = Counter(chain.from_iterable(test))
for k, v in c.items():
    print(k, v)
# P1 15
# P2 4
# P3 1
# P4 2
Output as a data frame:
import pandas as pd
df = pd.DataFrame.from_dict(c, orient='index').transpose()
# P1 P2 P3 P4
# 0 15 4 1 2
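As an alternative sketch (not part of the answer above), the counts can also be handed to the DataFrame constructor directly. The names wide and tall and the column labels "element" and "count" are chosen here for illustration only.

```python
from collections import Counter
from itertools import chain

import pandas as pd

test = [["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
        ["P1", "P1", "P1"],
        ["P1", "P1", "P1", "P2"],
        ["P4"],
        ["P1", "P4", "P2"],
        ["P1", "P1", "P1"]]

c = Counter(chain.from_iterable(test))

# Wide layout: a single row with one column per unique element.
# Sorting the items first makes the column order deterministic.
wide = pd.DataFrame([dict(sorted(c.items()))])

# Tall layout: one row per unique element (column names are illustrative).
tall = pd.DataFrame(sorted(c.items()), columns=["element", "count"])

print(wide)
print(tall)
```

The wide layout matches the shape asked for in the question; the tall layout is often easier to filter or sort when there are many distinct elements.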
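Answer 1 (score: 5)
For better performance, use collections.Counter together with itertools.chain.from_iterable:
>>> from collections import Counter
>>> from itertools import chain
>>> Counter(chain.from_iterable(test))
Counter({'P1': 15, 'P2': 4, 'P4': 2, 'P3': 1})
Alternatively, use collections.Counter with a list comprehension (one import fewer, since itertools is not needed, with about the same performance):
>>> from collections import Counter
>>> Counter([x for a in test for x in a])
Counter({'P1': 15, 'P2': 4, 'P4': 2, 'P3': 1})
Read on for more alternative solutions and a performance comparison (skip otherwise).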
Approach 1: Concatenate your sub-lists into a single list and find the counts using collections.Counter.
Solution 1: Concatenate the lists using itertools.chain.from_iterable and find the counts using collections.Counter:
test = [
    ["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
    ["P1", "P1", "P1"],
    ["P1", "P1", "P1", "P2"],
    ["P4"],
    ["P1", "P4", "P2"],
    ["P1", "P1", "P1"]
]
from itertools import chain
from collections import Counter
my_counter = Counter(chain.from_iterable(test))
Solution 2: Combine the lists using a list comprehension:
from collections import Counter
my_counter = Counter([x for a in test for x in a])
Solution 3: Concatenate the lists using sum:
from collections import Counter
my_counter = Counter(sum(test, []))
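The three solutions in this approach differ only in how the nested list is flattened before counting. The sketch below puts the three flattening steps side by side; the variable names are illustrative. chain.from_iterable returns a lazy iterator, the list comprehension builds one new list, and sum(test, []) builds a new, longer list for every sub-list it adds.

```python
from collections import Counter
from itertools import chain

test = [["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
        ["P1", "P1", "P1"],
        ["P1", "P1", "P1", "P2"],
        ["P4"],
        ["P1", "P4", "P2"],
        ["P1", "P1", "P1"]]

flat_iter = chain.from_iterable(test)     # lazy iterator, no list is built
flat_comp = [x for a in test for x in a]  # one new list holding every element
flat_sum = sum(test, [])                  # [] + test[0] + test[1] + ...

# Counter consumes any iterable, so all three inputs give the same counts.
counts_iter = Counter(flat_iter)
counts_comp = Counter(flat_comp)
counts_sum = Counter(flat_sum)

# The iterator has been consumed by Counter and is now empty.
leftover = list(flat_iter)
print(counts_iter, leftover)
```

Because each list addition copies everything accumulated so far, the sum variant does more work as the number of sub-lists grows, while the other two touch each element once.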
Approach 2: Count the elements of each sub-list using collections.Counter, then sum the Counter objects in the list.
Solution 4: Count the elements of each sub-list using collections.Counter and map:
from collections import Counter
my_counter = sum(map(Counter, test), Counter())
Solution 5: Count the elements of each sub-list using a list comprehension:
from collections import Counter
my_counter = sum([Counter(t) for t in test], Counter())
In all of the above solutions, my_counter will hold the value:
>>> my_counter
Counter({'P1': 15, 'P2': 4, 'P4': 2, 'P3': 1})
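Approach 2 relies on Counter objects supporting addition. The sketch below shows that behaviour on its own and, as an assumption-labelled alternative that is not part of the answer, merges the sub-lists in place with Counter.update so that no intermediate Counter is created per addition.

```python
from collections import Counter

test = [["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
        ["P1", "P1", "P1"],
        ["P1", "P1", "P1", "P2"],
        ["P4"],
        ["P1", "P4", "P2"],
        ["P1", "P1", "P1"]]

# Adding two Counters adds the counts key by key and returns a new Counter.
pair = Counter(["P1", "P2"]) + Counter(["P1", "P4"])

# sum() needs Counter() as its start value; the default start of 0 would
# raise a TypeError because an int cannot be added to a Counter.
summed = sum(map(Counter, test), Counter())

# Alternative (not from the answer): update() adds counts into a single
# Counter in place instead of building a fresh Counter for every addition.
merged = Counter()
for sub_list in test:
    merged.update(sub_list)

print(pair, summed, merged)
```

Both routes end with the same totals; the in-place version simply avoids repeated copying of the running result.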
Below is the timeit comparison on Python 3 for a list of 1000 sub-lists with 100 elements in each sub-list:
Fastest is using chain.from_iterable (17.1 msec):
mquadri$ python3 -m timeit "from collections import Counter; from itertools import chain; my_list = [list(range(100)) for i in range(1000)]" "Counter(chain.from_iterable(my_list))"
100 loops, best of 3: 17.1 msec per loop
Second on the list is using a list comprehension to combine the lists and then applying Counter (similar result to the above, but without the additional import of itertools) (18.36 msec):
mquadri$ python3 -m timeit "from collections import Counter; my_list = [list(range(100)) for i in range(1000)]" "Counter([x for a in my_list for x in a])"
100 loops, best of 3: 18.36 msec per loop
Third in terms of performance is using Counter on each sub-list within a list comprehension (162 msec):
mquadri$ python3 -m timeit "from collections import Counter; my_list = [list(range(100)) for i in range(1000)]" "sum([Counter(t) for t in my_list], Counter())"
10 loops, best of 3: 162 msec per loop
Fourth on the list is using Counter with map (the result is quite similar to the list comprehension version above) (176 msec):
mquadri$ python3 -m timeit "from collections import Counter; my_list = [list(range(100)) for i in range(1000)]" "sum(map(Counter, my_list), Counter())"
10 loops, best of 3: 176 msec per loop
The solution using sum to concatenate the lists is too slow (526 msec):
mquadri$ python3 -m timeit "from collections import Counter; my_list = [list(range(100)) for i in range(1000)]" "Counter(sum(my_list, []))"
10 loops, best of 3: 526 msec per loop
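To repeat a comparison like this from a script rather than the shell, something along the lines of the sketch below could be used. It is not the benchmark from the answer: the input is smaller (200 sub-lists) so it finishes quickly, the names candidates, reference and timings are illustrative, and absolute numbers will vary by machine and Python version.

```python
from collections import Counter
from itertools import chain
from timeit import timeit

# Synthetic input shaped like the benchmark above, but smaller.
my_list = [list(range(100)) for _ in range(200)]

candidates = {
    "chain.from_iterable": lambda: Counter(chain.from_iterable(my_list)),
    "list comprehension": lambda: Counter([x for a in my_list for x in a]),
    "sum of Counters (list comp)": lambda: sum([Counter(t) for t in my_list], Counter()),
    "sum of Counters (map)": lambda: sum(map(Counter, my_list), Counter()),
    "sum(my_list, [])": lambda: Counter(sum(my_list, [])),
}

# All strategies must agree before their timings are worth comparing.
reference = candidates["chain.from_iterable"]()
results_match = all(func() == reference for func in candidates.values())

# Time each strategy over a few runs and report the slowest last.
runs = 3
timings = {name: timeit(func, number=runs) for name, func in candidates.items()}
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:30s} {seconds / runs * 1000:8.2f} msec per run")
```

Unlike the shell commands above, this keeps the list construction out of the timed statement, so only the counting strategies themselves are measured.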
Answer 2 (score: 1)
Here is an approach using itertools.groupby:
>>> from itertools import groupby, chain
>>> out = [(k, len(list(g))) for k, g in groupby(sorted(chain(*test)))]
>>> out
[('P1', 15), ('P2', 4), ('P3', 1), ('P4', 2)]
Convert it to a dict like this:
>>> dict(out)
{'P2': 4, 'P3': 1, 'P1': 15, 'P4': 2}
To convert it to a data frame, use:
>>> import pandas as pd
>>> pd.DataFrame(dict(out), index=[0])
   P1  P2  P3  P4
0  15   4   1   2
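The sorted call in this answer matters, because groupby only groups consecutive equal items. The sketch below, with illustrative variable names, contrasts the result with and without sorting.

```python
from itertools import chain, groupby

test = [["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
        ["P1", "P1", "P1"],
        ["P1", "P1", "P1", "P2"],
        ["P4"],
        ["P1", "P4", "P2"],
        ["P1", "P1", "P1"]]

flat = list(chain.from_iterable(test))

# Without sorting, groupby starts a new group whenever the value changes,
# so "P1" shows up in several separate runs.
unsorted_runs = [(key, len(list(group))) for key, group in groupby(flat)]

# After sorting, equal values are adjacent and each key appears exactly once.
sorted_runs = [(key, len(list(group))) for key, group in groupby(sorted(flat))]

print(unsorted_runs)
print(sorted_runs)
```

Sorting costs O(n log n), which is one reason the Counter-based answers tend to be preferred for plain counting.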
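Answer 3 (score: 0)
The set function keeps only the unique elements of a list, so len(set(mylist)) gives the number of unique elements in that list. Then you just need to iterate over the sub-lists.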
dict_nb_item = {}
for i, test_item in enumerate(test):
    # number of distinct elements in sub-list i
    dict_nb_item[i] = len(set(test_item))
print(dict_nb_item)