I'm trying to build one sorted master list from several smaller lists. Either Python or R is fine.
In R, I have:
l1<-c("a","c","d")
l2<-c("a","b","e")
l3<-c("a","c","e")
l4<-c("a","b","c","e")
l5<-c("b","c","d")
m<-unique(c(l1,l2,l3,l4,l5))
My expected output is a,b,c,d,e.
In Python:
l1=["a","c","d"]
l2=["a","b","e"]
l3=["a","c","e"]
l4=["a","b","c","e"]
l5=["b","c","d"]
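The expected result is ["a","b","c","d","e"].
I started with a set, iterating over each list and checking indices, but it quickly got complicated.
Thanks for your help.
Edit:
I think using alphabetically sorted lists caused confusion. The list items can be arbitrary, so the actual order is not necessarily a,b,c,d,e.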
l1=["e","a"]
l2=["e","b","d"]
l3=["b","d","a"]
In this case the expected order is ["e","b","d","a"],
not ["a","b","d","e"].
To make this clearer, imagine a few people trying to name US states in order from east to west.
Person 1 says: Florida, Louisiana, Nevada, California
Person 2 says: Alabama, Mississippi, Louisiana, New Mexico, Nevada
Person 3 says: Florida, Alabama, Texas, New Mexico, California
Person 4 says: Alabama, Mississippi, Texas, Nevada
Person 5 says: Mississippi, Louisiana, Nevada
I'm trying to work out the correct order from this information.
So we would start with Florida, Louisiana, Nevada, California.
Adding person 2 gives (Alabama, Florida), Louisiana, New Mexico, Nevada, California.
Adding person 3 (which breaks the Alabama/Florida tie) gives Florida, Alabama, Louisiana, Texas, New Mexico, Nevada, California.
Adding person 4 gives Florida, Alabama, (Mississippi/Louisiana), Texas, New Mexico, Nevada, California.
Person 5 breaks the Mississippi/Louisiana tie.
Answer 0 (score: 3)
Aaand the actual answer: https://www.python.org/doc/essays/graphs/
Happy hunting! :D
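The linked essay is about representing graphs as Python dicts. One way to apply that to this question (my choice here, not something the essay spells out) is to read each person's list as a chain of "comes before" constraints and run a topological sort over them. Below is a minimal sketch using Kahn's algorithm; the function name `merge_orders` is hypothetical, and breaking ties with a first-in, first-out queue is an arbitrary choice.

```python
from collections import deque


def merge_orders(lists):
    """Merge several partial orders into one list (Kahn's topological sort).

    Each list is read as a chain of "comes before" constraints between
    consecutive items. Raises ValueError if the lists contradict each other.
    """
    successors = {}  # item -> {later item: None}, a dict used as an ordered set
    indegree = {}    # item -> number of distinct items that must come before it
    for lst in lists:
        for item in lst:
            successors.setdefault(item, {})
            indegree.setdefault(item, 0)
        for before, after in zip(lst, lst[1:]):
            if after not in successors[before]:
                successors[before][after] = None
                indegree[after] += 1

    # Start from items that nothing has to precede, in first-seen order.
    ready = deque(item for item, count in indegree.items() if count == 0)
    order = []
    while ready:
        item = ready.popleft()  # FIFO queue: an arbitrary tie-break
        order.append(item)
        for later in successors[item]:
            indegree[later] -= 1
            if indegree[later] == 0:
                ready.append(later)

    if len(order) != len(indegree):
        raise ValueError("lists contradict each other (cycle)")
    return order
```

With the edited example, `merge_orders([["e","a"], ["e","b","d"], ["b","d","a"]])` returns `['e', 'b', 'd', 'a']`. When several items are ready at once, the queue simply takes the one that became ready first, so a different tie-break rule would go in place of the deque.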
This should work for your original question:
l1=["a","c","d"]
l2=["a","b","e"]
l3=["a","c","e"]
l4=["a","b","c","e"]
l5=["b","c","d"]
s = set()
s.update(l1, l2, l3, l4, l5)
l = sorted(s)
l
#['a', 'b', 'c', 'd', 'e']
For your edited question, consider a slight variation of the second example:
l1=["e","a"]
l2=["e","b","d"]
l3=["b","c","a"]
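(squint at l3). In this case the set of lists is underdetermined, because there is no unique order between d and c. Without a rule for breaking ties, no algorithm can pick a single correct answer.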
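To see the underdetermination concretely, a small brute-force sketch can list every order that respects all of the lists. The helper `all_orders` is hypothetical, and its backtracking is exponential in the worst case, so it is only meant for toy inputs like these.

```python
def all_orders(lists):
    """Return every ordering of the items that is consistent with all lists."""
    items = []  # all items, in first-seen order
    preds = {}  # item -> set of items that must come before it
    for lst in lists:
        for item in lst:
            if item not in preds:
                preds[item] = set()
                items.append(item)
        for before, after in zip(lst, lst[1:]):
            preds[after].add(before)

    results = []

    def extend(prefix, placed):
        if len(prefix) == len(items):
            results.append(list(prefix))
            return
        for item in items:
            # An item may go next once everything that must precede it is placed.
            if item not in placed and preds[item] <= placed:
                prefix.append(item)
                placed.add(item)
                extend(prefix, placed)
                placed.remove(item)
                prefix.pop()

    extend([], set())
    return results
```

For the squinted example (`["e","a"]`, `["e","b","d"]`, `["b","c","a"]`) it returns three orders: e b d c a, e b c a d and e b c d a. For the question's edited example it returns only `['e', 'b', 'd', 'a']`.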
Answer 1 (score: 2)
For Python:
# Create list of lists
lsts = [l1, l2, l3, l4, l5]
s = set()
# Add lists to set
for lst in lsts:
    s.update(lst)
# Sort set
sorted(s)
Edit: after the OP's update:
def sort_lists(lsts):
    # Map each item to its position within each list
    list_of_hashes = []
    for lst in lsts:
        list_of_hashes.append({k: v for v, k in enumerate(lst)})
    # Sum each item's positions across all lists
    result_hash = dict()
    for hash_item in list_of_hashes:
        for key, value in hash_item.items():
            if key in result_hash:
                result_hash[key] += value
            else:
                result_hash[key] = value
    print(result_hash)
    # Order items by their summed position
    sorted_results = sorted(result_hash.items(), key=lambda kv: kv[1])
    print(sorted_results)
    return [tup[0] for tup in sorted_results]
# Test Case 1
l1=["e","a"]
l2=["e","b","d"]
l3=["b","d","a"]
print(sort_lists([l1,l2,l3]))
>> ['e', 'b', 'a', 'd']
# Test Case 2
s1 = ['Florida', 'Louisiana', 'Nevada', 'California']
s2 = ['Alabama', 'Mississippi', 'Louisiana', 'New Mexico', 'Nevada']
s3 = ['Florida', 'Alabama', 'Texas', 'New Mexico', 'California']
s4 = ['Alabama', 'Mississippi', 'Texas', 'Nevada']
s5 = ['Mississippi', 'Louisiana', 'Nevada']
print(sort_lists([s1,s2,s3,s4,s5]))
>> ['Florida', 'Alabama', 'Mississippi', 'Louisiana', 'Texas', 'New Mexico', 'California', 'Nevada']
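Summing positions is a heuristic rather than a guarantee, so it can produce an order that one of the lists contradicts. A quick way to check a candidate order is to compare every consecutive pair of each list against it. The `violations` helper below is a hypothetical sketch and assumes every item appears in `order`.

```python
def violations(order, lists):
    """Return the (before, after) pairs from `lists` that `order` puts the wrong way round."""
    position = {item: i for i, item in enumerate(order)}
    bad = []
    for lst in lists:
        # Checking consecutive pairs is enough: if each is in order, the whole list is.
        for before, after in zip(lst, lst[1:]):
            if position[before] > position[after]:
                bad.append((before, after))
    return bad
```

Applied to the Test Case 2 output above with `[s1, s2, s3, s4, s5]`, it reports `[('Nevada', 'California')]`: person 1 names Nevada before California, but the summed positions put California first.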
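Answer 2 (score: 1)
Here is an approach in R that uses tidygraph
to turn the vectors into a directed acyclic graph, then uses node_topo_order
to get the implied node order. Using the east-to-west states example: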
l1 <- c("Florida", "Louisiana", "Nevada", "California")
l2 <- c("Alabama", "Mississippi", "Louisiana", "New Mexico", "Nevada" )
l3 <- c("Florida", "Alabama", "Texas", "New Mexico", "California")
l4 <- c("Alabama", "Mississippi", "Texas", "Nevada")
l5 <- c("Mississippi", "Louisiana", "Nevada")
library(tidyverse)
library(tidygraph)
ew_graph <- list(l1, l2, l3, l4, l5) %>%
map_dfr(~tibble(east = ., west = lead(.))) %>% # turn vectors into edge table
filter(!is.na(west)) %>%
as_tbl_graph()
ew_graph %>% # Now we can order nodes and extract their names as output
arrange(node_topo_order()) %>%
pull(name)
#> [1] "Florida" "Alabama" "Mississippi" "Louisiana" "Texas"
#> [6] "New Mexico" "Nevada" "California"
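Note that there can be more than one valid order, and this returns only one of them. If needed, we can also plot the graph to see the relationships more clearly. The plot shows that in this data there is a tie between Louisiana and Texas (you can't trace a path from one to the other), despite how the example was constructed; it just happens that we get them in the "true" order. If you need to define a separate way of breaking ties, this approach will need some tweaking.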
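The Louisiana/Texas tie can also be found without plotting, by listing the pairs that have no path between them in either direction. A minimal Python sketch, with the hypothetical helper `incomparable_pairs` and a depth-first search for reachability:

```python
from itertools import combinations


def incomparable_pairs(lists):
    """Pairs of items whose relative order the lists do not determine."""
    successors = {}  # item -> set of items listed directly after it
    for lst in lists:
        for item in lst:
            successors.setdefault(item, set())
        for before, after in zip(lst, lst[1:]):
            successors[before].add(after)

    def reachable(start):
        # Iterative depth-first search over the "comes before" edges.
        seen, stack = set(), [start]
        while stack:
            for nxt in successors[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reach = {item: reachable(item) for item in successors}
    return [
        (a, b)
        for a, b in combinations(successors, 2)  # items in first-seen order
        if b not in reach[a] and a not in reach[b]
    ]
```

On the five states lists it returns `[('Louisiana', 'Texas')]`, the only pair the data leaves open.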
library(ggraph)
ggraph(ew_graph) +
geom_node_label(aes(label = name)) +
geom_edge_link(
mapping = aes(start_cap = label_rect(node1.name),
end_cap = label_rect(node2.name)),
arrow = arrow(length = unit(4, 'mm'))
)
Created on 2019-05-28 by the reprex package (v0.3.0)
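For comparison, the same idea can be sketched in Python with the standard-library `graphlib.TopologicalSorter` (Python 3.9+), which plays the role of `node_topo_order` here and raises `CycleError` when the lists contradict each other. The function name `topo_order` is hypothetical, and like the R version it returns just one of the valid orders.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def topo_order(lists):
    """One order consistent with all lists, or None if they contradict each other."""
    sorter = TopologicalSorter()
    for lst in lists:
        for item in lst:
            sorter.add(item)  # register items even if they have no predecessors
        for before, after in zip(lst, lst[1:]):
            sorter.add(after, before)  # `before` must come before `after`
    try:
        return list(sorter.static_order())
    except CycleError:
        return None
```

For the states lists it returns one of the two orders that differ only in where Louisiana and Texas go; for contradictory input such as `[["a", "b"], ["b", "a"]]` it returns `None`.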
Answer 3 (score: 0)
My solution is O(n). The other solutions can be O(n log n):
Python (similar in R):
l1=["a","c","d"]
l2=["a","b","e"]
l3=["a","c","e"]
l4=["a","b","c","e"]
l5=["b","c","d"]
lsts = [l1, l2, l3, l4, l5]
solve = []
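# One counter per character code
for p in range(130):
    solve.append(0)
# Count each character by its code
for lst in lsts:
    for p in lst:
        solve[ord(p)] += 1
# Walking the table in code order yields the characters sorted
for idx, value in enumerate(solve):
    if value != 0:
        print(chr(idx))
This solution relies on the values in the ASCII table, so it only works when every item is a single character whose code is below 130.
For your update: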
l1=["z","c","d"]
l2=["a","b","e"]
l3=["a","c","e"]
l4=["a","b","c","e"]
l5=["b","c","d"]
mySet = set()
mySet.update(l1, l2, l3, l4, l5)
result = sorted(mySet)
print(result)