Put bar at the end of every line that includes foo

Date: 2009-10-05 21:04:32

Tags: python text-processing

I have a list with a large number of lines, each in subject-verb-object form, e.g.:

Jane likes Fred
Chris dislikes Joe
Nate knows Jill

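To draw a network graph that shows the different relationships between nodes as directed, color-coded edges, I need to replace the verb with an arrow and put a color code at the end of each line, like this (somewhat simplified):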

Jane -> Fred red;
Chris -> Joe blue;
Nate -> Jill black;

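There are only a few verbs, so replacing them with arrows is just a matter of a few search-and-replace commands. Before doing that, however, I need to add a color code at the end of each line that corresponds to that line's verb. I'd like to do this with Python.

These are my baby steps in programming, so please be explicit and include the code for reading in the text file.

Thanks for your help!

7 Answers:

Answer 0: (score: 5)

It sounds like you will want to look into dictionaries and string formatting. In general, if you need help programming, break whatever problem you have down into extremely small, discrete chunks, search for those chunks independently, and then you should be able to assemble it all into a larger answer. Stack Overflow is a great resource for this kind of searching.

Also, if you have any general curiosity about Python, search or browse the official Python documentation. If you often find yourself not knowing where to begin, read the Python tutorial or find a book to work through. A week or two invested in getting a good grounding in what you are doing will pay off over and over again as you complete your work.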

verb_color_map = {
    'likes': 'red',
    'dislikes': 'blue',
    'knows': 'black',
}

with open('infile.txt') as infile: # assuming you've stored your data in 'infile.txt'
    for line in infile:
        # Python uses the name object, so I use object_
        subject, verb, object_ = line.split()
        print "%s -> %s %s;" % (subject, object_, verb_color_map[verb])

Answer 1: (score: 3)

Simple enough; assuming the list of verbs is fixed and small, this is easy to do with a dictionary and a for loop:

VERBS = {
    "likes": "red"
  , "dislikes": "blue"
  , "knows": "black"
  }

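def replace_verb (line):
    # Drop the trailing newline so the color stays on the same line
    words = line.split ()
    for verb, color in VERBS.items():
        # Compare whole words: a substring test finds "likes" inside "dislikes"
        if verb in words:
            return "%s %s;" % (
                  " ".join ("->" if word == verb else word for word in words)
                , color
                )
    return line.strip ()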

def main ():
    filename = "my_file.txt"
    with open (filename, "r") as fp:
        for line in fp:
            print replace_verb (line)

# Allow the module to be executed directly on the command line
if __name__ == "__main__":
    main ()

Answer 2: (score: 2)

verbs = {"dislikes":"blue", "knows":"black", "likes":"red"}
for s in open("/tmp/infile"):
  words = s.split()
  for verb in verbs.keys():
    # compare whole words: a substring test finds "likes" inside "dislikes"
    if verb in words:
      print " ".join("->" if w == verb else w for w in words)+" "+verbs[verb]+";"
      break

Edit: use "for s in open" instead.

Answer 3: (score: 1)

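Are you sure this isn't a bit of homework? :) If it is, it's fine to say so. Without going into too much detail, think about the task you are trying to do:

For each line:

  1. Read it
  2. Split it into words (on whitespace, with .split())
  3. Convert the middle word to a color (based on a mapping; cf. Python dict())
  4. Print the first word, an arrow, the third word, and the color

Here is code that uses NetworkX (networkx.lanl.gov/):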

    '''
    plot relationships in a social network
    '''
    
    import networkx
    ## make a fake file 'ex.txt' in this directory
    ## then write fake relationships to it.
    example_relationships = file('ex.txt','w') 
    print >> example_relationships, '''\
    Jane Doe likes Fred
    Chris dislikes Joe
    Nate knows Jill \
    '''
    example_relationships.close()
    
    rel_colors = {
        'likes':  'blue',
        'dislikes' : 'black',
        'knows'   : 'green',
    }
    
    def split_on_verb(sentence):
        ''' we know the verb is the only lower cased word
    
        >>> split_on_verb("Jane Doe likes Fred")
        ('Jane Doe', 'Fred', 'likes')
    
        '''
        words = sentence.strip().split()  # take off any outside whitespace, then split
                                           # on whitespace
        if not words:
            return None  # if there aren't any words, just return nothing
    
        verbs = [x for x in words if x.islower()]
        verb = verbs[0]  # we want the '1st' one (python numbers from 0,1,2...)
        verb_index = words.index(verb) # where is the verb?
        subject = ' '.join(words[:verb_index])
        obj =  ' '.join(words[(verb_index+1):])  # 'object' is already used in python
        return (subject, obj, verb)
    
    
    def graph_from_relationships(fh,color_dict):
        '''
        fh:  a filehandle, i.e., an opened file, from which we can read lines
            and loop over
        '''
        G = networkx.DiGraph()
    
        for line in fh:
            if not line.strip():  continue # move on to the next line,
                                             # if our line is empty-ish
            (subj,obj,verb) = split_on_verb(line)
            color = color_dict[verb]
            # cf: python 'string templates', there are other solutions here
            # this is the 
            print "'%s' -> '%s' [color='%s'];" % (subj,obj,color)
            G.add_edge(subj,obj,color=color)  # store the color as an edge attribute
            # 
    
        return G
    
    G = graph_from_relationships(file('ex.txt'),rel_colors)
    print G.edges()
    # from here you can use the various networkx plotting tools on G, as you're inclined.
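
The edge lines printed by the code above look like Graphviz DOT edge statements. If Graphviz is the target (an assumption, since the question does not name a tool), a complete graph description needs a `digraph { ... }` wrapper and double-quoted names. The following is a minimal sketch in Python 3, not part of the original answer. The function name `to_dot` is hypothetical, the colors are taken from the question, and one-word subjects and objects are assumed.

```python
# Sketch: turn "subject verb object" lines into a complete DOT graph.
# Assumes one-word subjects and objects; colors follow the question's example.
VERB_COLORS = {"likes": "red", "dislikes": "blue", "knows": "black"}

def to_dot(lines, verb_colors=VERB_COLORS):
    """Return DOT source with one colored, directed edge per input line."""
    edges = []
    for line in lines:
        words = line.split()
        if len(words) != 3 or words[1] not in verb_colors:
            continue  # skip blank lines and lines this sketch cannot parse
        subject, verb, obj = words
        edges.append('    "%s" -> "%s" [color="%s"];'
                     % (subject, obj, verb_colors[verb]))
    return "digraph relationships {\n" + "\n".join(edges) + "\n}"

dot_source = to_dot(["Jane likes Fred", "Chris dislikes Joe", "Nate knows Jill"])
print(dot_source)
```

To read from a file instead of a list, pass the open file object, for example `to_dot(open("ex.txt"))`, since iterating over a file yields its lines.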
    

Answer 4: (score: 0)

Python 2.5:

import sys
from collections import defaultdict

codes = defaultdict(lambda: ("---", "Missing action!"))
codes["likes"] =    ("-->", "red")
codes["dislikes"] = ("-/>", "green")
codes["loves"] =    ("==>", "blue")

for line in sys.stdin:
    subject, verb, object_ = line.strip().split(" ")
    arrow, color = codes[verb]
    print subject, arrow, object_, color, ";"

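Answer 5: (score: 0)

In addition to the question, Karasu also said (in a comment on one of the answers): "In the actual input, subjects and objects vary unpredictably between one and two words."

Okay, here is how I would solve it.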

color_map = \
{
    "likes" : "red",
    "dislikes" : "blue",
    "knows" : "black",
}

def is_verb(word):
    return word in color_map

def make_noun(lst):
    if not lst:
        return "--NONE--"
    elif len(lst) == 1:
        return lst[0]
    else:
        return "_".join(lst)


for line in open("filename").readlines():
    words = line.split()
    # subject could be one or two words
    if is_verb(words[1]):
        # subject was one word
        s = words[0]
        v = words[1]
        o = make_noun(words[2:])
    else:
        # subject was two words
        assert is_verb(words[2])
        s = make_noun(words[0:2])
        v = words[2]
        o = make_noun(words[3:])
    color = color_map[v]
    print "%s -> %s %s;" % (s, o, color)

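A few notes:

0) We don't really need "with" for this problem, and writing it this way makes the program more portable to older versions of Python. This should work with Python 2.2 and newer, I think (I only tested it on Python 2.6).

1) You can change make_noun() to use whatever strategy you find useful for handling multiple words. I showed simply chaining them together with underscores, but you could have a dictionary of adjectives and throw those out, have a dictionary of nouns and pick those, or whatever.

2) You could also use regular expressions for fuzzy matching. Instead of simply using the color_map dictionary, you could have a list of tuples, each pairing a regular expression with a replacement color, and then substitute the color when the regular expression matches.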
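
As an illustration of note 1, here is one way make_noun() might discard adjectives before joining the remaining words. This is only a sketch in Python 3. The `ADJECTIVES` set is a made-up example, and a set is used in place of a dictionary because only membership matters here.

```python
# Hypothetical adjective list, for illustration; replace with words from your data.
ADJECTIVES = {"old", "young", "big", "little"}

def make_noun(lst, adjectives=ADJECTIVES):
    """Join the words of a noun phrase with underscores, dropping known adjectives."""
    kept = [word for word in lst if word.lower() not in adjectives]
    if not kept:
        return "--NONE--"
    return "_".join(kept)

print(make_noun(["Old", "Fred"]))   # the adjective is dropped
print(make_noun(["Jane", "Doe"]))   # both words are kept and joined
```

The comparison is lowercased so that a capitalized adjective at the start of a line is still recognized.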

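Answer 6: (score: 0)

Here is an improved version of my previous answer. This one uses regular expression matching to do fuzzy matching on the verb. These all work: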

Steve loves Denise
Bears love honey
Maria interested Anders
Maria interests Anders

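The regular expression pattern "loves?" matches "love" plus an optional "s". The pattern "interest.*" matches "interest" plus anything. A pattern with multiple alternatives separated by vertical bars matches if any one of the alternatives matches.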
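
The matching behavior described above can be checked with a short Python 3 sketch. One detail worth knowing: `match()` anchors the pattern at the start of the word but not at the end, so a word such as "lovely" also matches "loves?". If that is too loose for your data, `fullmatch()` (available from Python 3.4) requires the whole word to match. The word list here is made up for illustration.

```python
import re

pattern = re.compile(r"likes?|loves?|interest.*")

# Collect (word, match result, fullmatch result) for a few sample words.
results = []
for word in ["love", "loves", "interested", "dislikes", "lovely"]:
    results.append((word,
                    bool(pattern.match(word)),       # anchored at the start only
                    bool(pattern.fullmatch(word))))  # must cover the whole word

for row in results:
    print(row)
```

"dislikes" is rejected by both calls because neither starts searching in the middle of the word; `re.search()` would behave differently.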

import re

re_map = \
[
    ("likes?|loves?|interest.*", "red"),
    ("dislikes?|hates?", "blue"),
    ("knows?|tolerates?|ignores?", "black"),
]

# compile the regular expressions one time, then use many times
pat_map = [(re.compile(s), color) for s, color in re_map]

# We don't use is_verb() in this version, but here it is.
# A word is a verb if any of the patterns match.
def is_verb(word):
    return any(pat.match(word) for pat, color in pat_map)

# Return color from matched verb, or None if no match.
# This detects whether a word is a verb, and looks up the color, at the same time.
def color_from_verb(word):
    for pat, color in pat_map:
        if pat.match(word):
            return color
    return None

def make_noun(lst):
    if not lst:
        return "--NONE--"
    elif len(lst) == 1:
        return lst[0]
    else:
        return "_".join(lst)


for line in open("filename"):
    words = line.split()
    # subject could be one or two words
    color = color_from_verb(words[1])
    if color:
        # subject was one word
        s = words[0]
        o = make_noun(words[2:])
    else:
        # subject was two words
        color = color_from_verb(words[2])
        assert color
        s = make_noun(words[0:2])
        o = make_noun(words[3:])
    print "%s -> %s %s;" % (s, o, color)

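I hope it is clear how to take this answer and extend it. You can easily add more patterns to match more verbs. You could add logic to detect "is" and "in" and discard them, so that "Anders is interested in Maria" would match. And so on.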
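
For instance, discarding "is" and "in" could be done as a filtering step before looking for the verb. This is a rough Python 3 sketch. The `FILLER_WORDS` set and the `strip_fillers` name are hypothetical, and the pattern is copied from the first entry of re_map.

```python
import re

# Hypothetical filler-word list, for illustration only.
FILLER_WORDS = {"is", "are", "in"}
VERB_PATTERN = re.compile(r"likes?|loves?|interest.*")

def strip_fillers(words, fillers=FILLER_WORDS):
    """Return the words with filler words such as "is" and "in" removed."""
    return [word for word in words if word.lower() not in fillers]

words = strip_fillers("Anders is interested in Maria".split())
print(words)
print(bool(VERB_PATTERN.match(words[1])))  # the verb is now the second word
```

After filtering, the line has the same subject-verb-object shape as the other examples, so the rest of the loop can stay unchanged.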

If you have any questions, I'd be happy to explain further. Good luck.