我有大量的两元子列表,它们是名为mylist
的列表的成员:
mylist = [['AB001', 22100],
['AB001', 32935],
['XC013', 99834],
['VD126', 18884],
['AB001', 34439],
['XC013', 86701]]
我想根据子列表是否包含与第一个项目相同的字符串,将mylist
排序到新的子列表中。例如,这就是我要输出的代码:
newlist = [['AB001', 22100], ['AB001', 32935], ['AB001', 34439]],
[['XC013', 99834], ['XC013', 86701]],
[['VD126', 18884]]
以下是我尝试对此进行编码的方式:
mylist = sorted(mylist)
newlist = []
for sublist in mylist:
id = sublist[0]
if id == next.id:
newlist.append(id)
print newlist
我还试图了解itertools.groupby()
是否是解决此问题的正确工具。有人可以帮我解决这个问题吗?提前谢谢。
答案 0 :(得分:4)
请注意,您不能使用以前导零开头的十进制文字。我假设这实际上存储在一个变量中,因此没有前导零。
你认为这是groupby
的工作是正确的:
from itertools import groupby
from operator import itemgetter
mylist = [['AB001', 22100],
['AB001', 32935],
['XC013', 99834],
['VD126', 18884],
['AB001', 4439],
['XC013', 86701]]
print [list(value) for key, value in groupby(sorted(mylist), key=itemgetter(0))]
这将为您提供一个列表,按子列表中的第一项分组。
[[['AB001', 4439], ['AB001', 22100], ['AB001', 32935]],
[['VD126', 18884]],
[['XC013', 86701], ['XC013', 99834]]]
答案 1 :(得分:1)
有许多替代方法可以解决此问题:
def regroup_by_di(items, key=None):
result = {}
callable_key = callable(key)
for item in items:
key_value = key(item) if callable_key else item
if key_value not in result:
result[key_value] = []
result[key_value].append(item)
return result
import collections
def regroup_by_dd(items, key=None):
result = collections.defaultdict(list)
callable_key = callable(key)
for item in items:
result[key(item) if callable_key else item].append(item)
return dict(result) # to be in line with other solutions
def regroup_by_sd(items, key=None):
result = {}
callable_key = callable(key)
for item in items:
key_value = key(item) if callable_key else item
result.setdefault(key_value, []).append(item)
return result
import itertools
def regroup_by_it(items, key=None):
seq = sorted(items, key=key)
result = {
key_value: list(group)
for key_value, group in itertools.groupby(seq, key)}
return result
def group_by(
seq,
key=None):
items = iter(seq)
try:
item = next(items)
except StopIteration:
return
else:
callable_key = callable(key)
last = key(item) if callable_key else item
i = j = 0
for i, item in enumerate(items, 1):
current = key(item) if callable_key else item
if last != current:
yield last, seq[j:i]
last = current
j = i
if i >= j:
yield last, seq[j:i + 1]
def regroup_by_gb(items, key=None):
return dict(group_by(sorted(items, key=key), key))
这些可以分为两类:
dict
的结构(regroup_by_di()
,regroup_by_dd()
,regroup_by_sd()
)uniq
的函数(例如itertools.groupby()
)(regroup_by_it()
,regroup_by_gb()
)第一类方法的计算复杂度为O(n)
,而第二类方法的计算复杂度为O(n log n)
。
所有提议的方法都需要指定key
。
对于OP的问题,可以使用operators.itemgetter(0)
或lambda x: x[0]
。此外,要获得OP的理想结果,应该只获得list(dict.values())
,例如:
from operator import itemgetter
mylist = [['AB001', 22100],
['AB001', 32935],
['XC013', 99834],
['VD126', 18884],
['AB001', 4439],
['XC013', 86701]]
print(list(regroup_by_di(mylist, key=itemgetter(0)).values()))
# [[['AB001', 22100], ['AB001', 32935], ['AB001', 4439]], [['XC013', 99834], ['XC013', 86701]], [['VD126', 18884]]]
对于所有基于dict
的(第一类)解决方案,计时速度更快,而对于所有基于groupby
的第二类解决方案的计时速度都慢。
在基于dict
的解决方案中,它们的性能将在一定程度上取决于“碰撞率” ,后者与新项目创建新对象的次数成正比。
对于较高的碰撞率,regroup_by_di()
可能是最快的,而对于较低的碰撞率,regroup_by_dd()
可能是最快的。
基准测试如下:
(更多详细信息,here。)
答案 2 :(得分:0)
collections.defaultdict
import { ZonedDateTime, LocalDate, ZoneId} from "js-joda";
import 'js-joda-timezone';
const aTimeWhenLondonIsAlreadyInTomorrow = "2019-04-15T00:00:05.000Z";
const inBetweenTimeInLondon = ZonedDateTime.parse(aTimeWhenLondonIsAlreadyInTomorrow);
const inBetweenTimeInNYC = inBetweenTimeInLondon.withZoneSameInstant(ZoneId.of("America/New_York"))
const dayInLondon = inBetweenTimeInLondon.toLocalDate();
const dayInNYC = inBetweenTimeInNYC.toLocalDate();
console.log(inBetweenTimeInLondon.toString()); // "2019-04-15T00:00:05Z"
console.log(dayInLondon.toString()); // "2019-04-15"
console.log(inBetweenTimeInNYC.toString()) // "2019-04-14T20:00:05-04:00[America/New_York]"
console.log(dayInNYC.toString()); // "2019-04-14"
解决方案将产生O( n log n )成本,因为必须先对输入进行 排序。您可以使用itertools.groupby
的列表来保证O( n )的解决方案:
defaultdict
答案 3 :(得分:0)
.get
来确定密钥是否存在,如果密钥不存在,请返回某个指定的值library(ComplexHeatmap)
library(RColorBrewer)
Import data:
x = readRDS("data.rds")
#Select number of clusters to plot:
clusters = 7
#Generate dendrogram to make block annotation:
dend = as.dendrogram(hclust(as.dist(1- cor(t(x))),"complete"), hang=-1)
#Create annotation block for heatmap
cl_num = cutree(dend, k = clusters)
cl_col = brewer.pal(clusters,"Set2")
names(cl_col) = unique(cl_num)
cl_col = list(cl_num=cl_col)
row_ha = rowAnnotation(cl_num = cl_num,
col = cl_col,
annotation_legend_param = list(title = "Cluster"),
show_annotation_name = F)
###Create heatmap:
Heatmap(x,
clustering_distance_columns = "euclidean",
clustering_method_columns = "complete",
clustering_distance_rows = "pearson",
clustering_method_rows = "complete",
row_split = clusters,
left_annotation = row_ha,
show_row_names = F,
heatmap_legend_param = list(title = "Z-score"),
column_title = "Treatment",
row_title = "Genes",
use_raster = F,
column_dend_height = unit(0.1, "cm"),
column_names_gp = gpar(fontsize = 16),
column_names_rot = 0,
column_names_centered = TRUE)
。dict.get
的默认值为None
,因此此方法从不引发KeyError。
None
是字典中的值,则更改None
返回的默认值。
.get
test.get(t[0], 'something here')
,然后将key
,list
添加为t
值。dict