Python sorting problem - given a list of ['url','tag1','tag2',...] and a search spec ['tag3','tag1',...], return a list of relevant urls

Time: 2010-12-12 01:29:37

Tags: python

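I'm very new to programming, so I'm sure there is a cleaner way to structure this, but I'm trying to create a personal bookmarking program. I have a number of urls, and each one has a list of tags ordered by relevance. I want to run a search made up of a list of tags and get back a list of the most relevant urls. My first solution was to give the first search tag a value of 1, the second a value of 2, and so on, and let Python's list sort function do the rest. Two questions:

1) Is there a more elegant/efficient way to do this (embarrass me!)?
2) Given the inputs described above, are there any other general approaches to sorting by relevance?

Much obliged.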

# Given a list of saved urls each with a corresponding user-generated taglist 
# (ordered by relevance), the user enters a "search" list-of-tags, and is 
# returned a sorted list of urls. 

# Generate a sample "content" list of dictionaries. The rationale is to
# be able to add things like 'title' etc. at later stages and to
# treat each url/note as an independent entity. But would a single-dictionary
# approach like "note['url1']=['b','a','c','d']" work better?

content = []
note = {'url':'url1', 'taglist':['b','a','c','d']}
content.append(note)
note = {'url':'url2', 'taglist':['c','a','b','d']}
content.append(note)
note = {'url':'url3', 'taglist':['a','b','c','d']}
content.append(note)
note = {'url':'url4', 'taglist':['a','b','d','c']}
content.append(note)
note = {'url':'url5', 'taglist':['d','a','c','b']}
content.append(note)

# An example search term of tags, ordered by importance
# I'm using a dictionary with an ordinal number system 
# This seems clumsy
search = {'d':1,'a':2,'b':3}

# Create a tagCloud with one entry for each tag that occurs
tagCloud = []
for note in content:
    for tag in note['taglist']:
        if tagCloud.count(tag) == 0:
            tagCloud.append(tag)

# Create a dictionary that associates an integer value denoting
# relevance (1 is most relevant etc) for each existing tag

d={}            
for tag in tagCloud:
    try:
        d[tag]=search[tag]
    except KeyError:
        d[tag]=100

# Create a [[relevance, tag],[],[],...] result list & sort 
result=[]    
for note in content:
    resultNote=[]
    for tag in note['taglist']:
        resultNote.append([d[tag],tag])
    resultNote.append(note['url'])
    result.append(resultNote)
result.sort()

# Remove the relevance values & recreate a list containing
# the url string followed by its tags.
# It's so hacky I've forgotten how it works!
# It's mostly for display, but are there suggestions on "best practice"
# for storing the data in intermediate form?

finalResult=[]
for note in result:
    temp=[]
    temp.append(note.pop())
    for tag in note:
        temp.append(tag[1])
    finalResult.append(temp)

print "Content: ", content
print "Search: ", search
print "Final Result: ", finalResult
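Here is why `result.sort()` produces a relevance order. Python compares lists element by element, and the first element that differs decides the result. Each entry of `result` is a list of `[relevance, tag]` pairs followed by the url. Notes are therefore ordered by the relevance of their first tag, ties are broken by the second tag, and so on. The url only matters if every pair is equal. The small Python 3 check below uses two entries written out by hand as the loop above would build them for `url3` and `url4` (the variable names are just for illustration):

```python
# Entries as the question's loop builds them, with search = {'d': 1, 'a': 2, 'b': 3}
# and relevance 100 for 'c', which is not part of the search.
url3_entry = [[2, 'a'], [3, 'b'], [100, 'c'], [1, 'd'], 'url3']
url4_entry = [[2, 'a'], [3, 'b'], [1, 'd'], [100, 'c'], 'url4']

# The first two pairs are equal, so the third pair decides: [1, 'd'] < [100, 'c'].
print(url4_entry < url3_entry)  # True

# Sorting therefore puts url4 (searched tag 'd' third) ahead of url3 ('c' third).
print([entry[-1] for entry in sorted([url3_entry, url4_entry])])  # ['url4', 'url3']
```

The trailing url string never gets compared with a list here, because the comparison is always decided within the tag pairs first. For the same reason, all notes need to have the same number of tags for the url to act as a final tie-breaker.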

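2 Answers:

Answer 0 (score: 2)

  

1) Is there a more elegant/efficient way to do this (embarrass me!)

Sure. The basic idea: stop trying to tell Python how to do things, and just ask for what you want.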

content = [
    {'url':'url1', 'taglist':['b','a','c','d']},
    {'url':'url2', 'taglist':['c','a','b','d']},
    {'url':'url3', 'taglist':['a','b','c','d']},
    {'url':'url4', 'taglist':['a','b','d','c']},
    {'url':'url5', 'taglist':['d','a','c','b']}
]

search = {'d' : 1, 'a' : 2, 'b' : 3}

# We can create the tag cloud like this:
# tagCloud = set(sum((note['taglist'] for note in content), []))
# But we don't actually need it: instead, we'll just use a default value
# when looking things up in the 'search' dict.

# Create a [[relevance, tag],[],[],...] result list & sort 
result = sorted(
    [
        [search.get(tag, 100), tag]
        for tag in note['taglist']
    ] + [[note['url']]]
    # The result will look like [ [relevance, tag],... , [url] ]
    # Note that the url is wrapped in a list too. This makes the
    # last processing step easier: we just take the last element of
    # each nested list.
    for note in content
)

# Remove the relevance values. Taking the last element of each nested list
# leaves the tags followed by the url string (the url ends up last).
finalResult = [
    [x[-1] for x in note]
    for note in result
]

print "Content: ", content
print "Search: ", search
print "Final Result: ", finalResult
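The `key` argument of `sorted` takes the "ask for what you want" idea one step further. You describe how to rank a single note, and `sorted` handles the comparisons, so there are no relevance values to strip out afterwards. Below is a minimal Python 3 sketch (the answer above is Python 2 code) that assumes the same data. `search_order`, `UNRANKED` and `note_key` are hypothetical names. The search dict is derived from an ordered list with `enumerate`, which replaces the hand-written ordinals the question found clumsy. The sketch should give the same order as the answer's `result`, but it keeps each url in front of its tags, as in the question's output.

```python
content = [
    {'url': 'url1', 'taglist': ['b', 'a', 'c', 'd']},
    {'url': 'url2', 'taglist': ['c', 'a', 'b', 'd']},
    {'url': 'url3', 'taglist': ['a', 'b', 'c', 'd']},
    {'url': 'url4', 'taglist': ['a', 'b', 'd', 'c']},
    {'url': 'url5', 'taglist': ['d', 'a', 'c', 'b']},
]

# The search as a plain ordered list; ranks 1, 2, 3, ... come from enumerate,
# giving the same {'d': 1, 'a': 2, 'b': 3} dict as above.
search_order = ['d', 'a', 'b']
search = {tag: rank for rank, tag in enumerate(search_order, start=1)}
UNRANKED = 100  # rank for tags that are not searched for (same default as the answer)

def note_key(note):
    """Sort key: the (rank, tag) pairs of the note's tags in order, then the url."""
    pairs = [(search.get(tag, UNRANKED), tag) for tag in note['taglist']]
    return pairs, note['url']

# Sort the notes themselves; no intermediate lists need unpacking afterwards.
finalResult = [[note['url']] + note['taglist']
               for note in sorted(content, key=note_key)]

for row in finalResult:
    print(row)
```

With this data the urls come out as url5, url4, url3, url1, url2. Because the key is a function of a single note, you can add fields like 'title' to each note later without touching the sorting code.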

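Answer 1 (score: 0)

I suggest you also give each tag a weight that depends on how rare it is (e.g. a "tarantula" tag should weigh more than a "nature" tag¹). For a given url, rare tags that it has in common with other urls should indicate stronger relevance, while common tags that it shares with another url should count for less.

Turning the rules described above into a numerical relevance score against every other url is straightforward.

¹ Unless all of your urls are about tarantulas, of course :)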
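The following Python 3 sketch shows one way to turn these rules into numbers. It assumes an IDF-style weight: the log of the number of urls divided by the number of urls that carry the tag. That formula is an assumption here, not something the answer specifies. The relatedness of two urls is the sum of the weights of the tags they share, so shared rare tags count for a lot and a tag that every url carries counts for nothing. The `bookmarks` data and the helper names are hypothetical. The tags are stored as unordered sets, so this sketch ignores the per-url tag order from the question.

```python
import math
from collections import Counter

# Hypothetical bookmarks: url -> set of tags (order ignored in this sketch).
bookmarks = {
    'url1': {'nature', 'spiders', 'tarantula'},
    'url2': {'nature', 'birds'},
    'url3': {'nature', 'spiders', 'tarantula', 'pets'},
    'url4': {'nature', 'birds', 'pets'},
}

# Document frequency: how many urls carry each tag.
doc_freq = Counter(tag for tags in bookmarks.values() for tag in tags)
n_urls = len(bookmarks)

def tag_weight(tag):
    # Assumed IDF-style weight: rare tags weigh more; a tag on every url weighs 0.
    return math.log(n_urls / doc_freq[tag])

def relatedness(url_a, url_b):
    # Sum of the weights of the tags the two urls have in common.
    shared = bookmarks[url_a] & bookmarks[url_b]
    return sum(tag_weight(tag) for tag in shared)

def related_urls(url):
    # All other urls, most related first; ties are broken by url name.
    others = [u for u in bookmarks if u != url]
    return sorted(others, key=lambda u: (-relatedness(url, u), u))

print(related_urls('url1'))  # ['url3', 'url2', 'url4']
```

In this data every url is tagged "nature", so that tag gets weight 0. url1 is most related to url3 because they share the rarer "spiders" and "tarantula" tags. The same weights could also scale the per-tag ranks of a search, but how to combine the two is a design choice the answer leaves open.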