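Python: tokenizing a sentence with optional key/val pairs

Date: 2013-07-22 18:50:59

Tags: python regex tokenize text-parsing

I am trying to parse a sentence (or line of text) where you have a sentence, optionally followed by some key/value pairs on the same line. Not only are the key/value pairs optional, they are also dynamic. I am looking for a result like this:

Input: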

"There was a cow at home. home=mary cowname=betsy date=10-jan-2013"

Output:

Values = {'theSentence' : "There was a cow at home.",
          'home' : "mary",
          'cowname' : "betsy",
          'date' : "10-jan-2013"
         }

Input:

"Mike ordered a large hamburger. lastname=Smith store=burgerville"

Output:

Values = {'theSentence' : "Mike ordered a large hamburger.",
          'lastname' : "Smith",
          'store' : "burgerville"
         }

Input:

"Sam is nice."

Output:

Values = {'theSentence' : "Sam is nice."}

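Thanks for any input/direction. I know the sentences make this look like a homework problem, but I am just a Python newbie. I know it is probably a regex solution, but I am not the best with regex.

9 answers:

Answer 0 (score: 3)

I would use re.sub: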

import re

s = "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"

d = {}

def add(m):
    # store the pair and replace it with nothing in the sentence
    d[m.group(1)] = m.group(2)
    return ''

s = re.sub(r'(\w+)=(\S+)', add, s)
d['theSentence'] = s.strip()

print(d)

Here is a more compact version, if you prefer:

d = {}
d['theSentence'] = re.sub(r'(\w+)=(\S+)',
    lambda m: d.setdefault(m.group(1), m.group(2)) and '',
    s).strip()

Or perhaps findall is a better option:

rx = r'(\w+)=(\S+)|(\S.+?)(?=\w+=|$)'
d = {
    a or 'theSentence': (b or c).strip()
    for a, b, c in re.findall(rx, s)
}
print(d)
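
To try the re.sub approach on all three inputs from the question, it can be wrapped in a function. This is only a sketch: parse_with_sub and PAIR are names made up for this example, and it assumes keys consist of word characters and values contain no whitespace.

```python
import re

# Same pattern as above: a word-character key, '=', then a run of non-whitespace.
PAIR = re.compile(r'(\w+)=(\S+)')

def parse_with_sub(line):
    """Hypothetical helper: return the sentence and its key/value pairs as a dict."""
    values = {}

    def collect(match):
        values[match.group(1)] = match.group(2)
        return ''  # remove the pair from the sentence text

    values['theSentence'] = PAIR.sub(collect, line).strip()
    return values

samples = [
    "There was a cow at home. home=mary cowname=betsy date=10-jan-2013",
    "Mike ordered a large hamburger. lastname=Smith store=burgerville",
    "Sam is nice.",
]
for sample in samples:
    print(parse_with_sub(sample))
```

Using a fresh dict per call avoids sharing a module-level d between lines.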

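Answer 1 (score: 1)

The first step is to do

inputStr = "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"
theSentence, others = inputStr.split('.')

Then you will want to break up "others". Play around with split() (the argument you pass in tells Python what to split the string on) and see what you can do. :)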

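Answer 2 (score: 1)

If your sentence is guaranteed to end with a ., then you could follow this approach.

>>> inputString = "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"
>>> Values = {}
>>> testList = inputString.split('.')
>>> Values['theSentence'] = testList[0]+'.'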

For the rest of the values, just do this.

>>> for elem in testList[1].split():
        key, val = elem.split('=')
        Values[key] = val

This gives you Values like so:

>>> Values
{'date': '10-jan-2013', 'home': 'mary', 'cowname': 'betsy', 'theSentence': 'There was a cow at home.'}
>>> Values2
{'lastname': 'Smith', 'theSentence': 'Mike ordered a large hamburger.', 'store': 'burgerville'}
>>> Values3
{'theSentence': 'Sam is nice.'}

Answer 3 (score: 1)

Assuming there is only one dot, and that it separates the sentence from the assignment pairs:

input = "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"
sentence, assignments = input.split(". ")

result = {'theSentence': sentence + "."}
for item in assignments.split():
    key, value = item.split("=")
    result[key] = value

print(result)

Prints:

{'date': '10-jan-2013', 
 'home': 'mary', 
 'cowname': 'betsy', 
 'theSentence': 'There was a cow at home.'}
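
Unpacking the result of split(". ") raises a ValueError for a line such as "Sam is nice.", which has no pairs after it. A possible variation uses str.partition instead. This is a sketch under the same one-dot assumption, and parse_with_partition is a name made up for this example.

```python
def parse_with_partition(line):
    # partition() always returns three parts, so a line without ". " does not
    # fail the way unpacking the result of split(". ") would.
    sentence, sep, assignments = line.partition(". ")
    if sep:
        sentence += "."  # restore the period consumed by the separator
    result = {"theSentence": sentence}
    for item in assignments.split():
        key, value = item.split("=", 1)  # split on the first '=' only
        result[key] = value
    return result

print(parse_with_partition("There was a cow at home. home=mary cowname=betsy date=10-jan-2013"))
print(parse_with_partition("Sam is nice."))
```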

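Answer 4 (score: 0)

This assumes that = does not appear in the sentence itself, which seems a more valid assumption than the sentence ending with a ..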

s = "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"

eq_loc = s.find('=')
if eq_loc > -1:
    meta_loc = s[:eq_loc].rfind(' ')
    metastr = s[meta_loc + 1:]  # take the pairs before truncating s
    s = s[:meta_loc]

    metadict = dict(m.split('=') for m in metastr.split())
else:
    metadict = {}

metadict["theSentence"] = s
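
The same idea can be written as a function and tried on a sentence that contains several periods, which the period-based answers would split in the wrong place. This is a sketch only: split_on_first_pair and the sample sentence are made up for this example, and it still assumes that = never occurs in the sentence.

```python
def split_on_first_pair(line):
    """Hypothetical helper: everything before the first key=value pair is the sentence."""
    eq_loc = line.find('=')
    if eq_loc == -1:
        return {'theSentence': line}
    # The first key starts right after the last space before the first '='.
    # rfind returns -1 when there is no such space, which makes start 0.
    start = line.rfind(' ', 0, eq_loc) + 1
    result = dict(pair.split('=', 1) for pair in line[start:].split())
    result['theSentence'] = line[:start].rstrip()
    return result

print(split_on_first_pair("Dr. Who met Mr. Smith. lastname=Smith store=burgerville"))
print(split_on_first_pair("Sam is nice."))
```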

Answer 5 (score: 0)

As usual, there are plenty of ways to do this. Here is a regex-based approach that looks for key=value pairs:

import re

sentence = "..."

values = {}
for match in re.finditer(r"(\w+)=(\S+)", sentence):
    if not values:
        # everything to the left of the first key/value pair is the sentence
        values["theSentence"] = sentence[:match.start()].strip()
    # record every pair, including the first one
    key, value = match.groups()
    values[key] = value
if not values:
    # no key/value pairs, keep the entire sentence
    values["theSentence"] = sentence

This assumes that the keys are Python-style identifiers and that each value consists of one or more non-whitespace characters.
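
A quick way to see what those assumptions mean in practice is to run the pattern on inputs that break them. The inputs below are made up for illustration.

```python
import re

PAIR = r'(\w+)=(\S+)'

# A hyphen is not a word character, so only the part after it is taken as the key,
# and the value stops at the first whitespace.
print(re.findall(PAIR, 'first-name=Mary Ann'))   # [('name', 'Mary')]

# \w+ also accepts a leading digit, which a real Python identifier would not.
print(re.findall(PAIR, '2nd=place'))             # [('2nd', 'place')]
print('2nd'.isidentifier())                      # False
```

If keys or values can contain such characters, the pattern would need to be adjusted accordingly.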

Answer 6 (score: 0)

Assuming that the first period separates the sentence from the values, you could use something like this:

#! /usr/bin/python3

a = "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"

values = (lambda s, tail: (lambda d, kv: (d, d.update (kv) ) ) ( {'theSentence': s}, {k: v for k, v in (x.split ('=') for x in tail.strip ().split (' ') ) } ) ) (*a.split ('.', 1) ) [0]

print (values)
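
For readers who find the nested lambdas hard to follow, the same steps can be unrolled roughly as below. This is a sketch, not the answer's own code; like the one-liner, it drops the period from the sentence.

```python
a = "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"

# Split on the first period only; the period itself is dropped from the sentence.
sentence, tail = a.split('.', 1)

values = {'theSentence': sentence}
# dict.update accepts an iterable of [key, value] pairs.
# tail.split() with no argument also copes with an empty tail, unlike split(' ').
values.update(pair.split('=', 1) for pair in tail.split())

print(values)
```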

Answer 7 (score: 0)

Nobody has posted an understandable one-liner. The question is answered, but it has to be done in one line, that is the Python way!

dict({"theSentence": sentence.split(".")[0]}, **{item.split("=")[0]: item.split("=")[1] for item in sentence.split(".")[1].split()})

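Uh, not super elegant, but it is entirely on one line. Not even an import.

Answer 8 (score: 0)

Use regex findall. The first capture group is the sentence. | is the or condition for the second capture group: one or more whitespace characters, one or more characters, an equals sign, and one or more non-whitespace characters.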
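
A pattern along the lines described could look like the sketch below. The exact regex is an assumption reconstructed from the description alone (it expects the sentence to end with a period and to contain no =), and parse_with_findall is a name made up for this example.

```python
import re

# Assumed pattern, reconstructed from the description only:
#   group 1: everything from the start of the line up to a period, with no '=' inside
#   group 2: after one or more whitespace characters, key characters, '=', non-whitespace
PATTERN = r'(^[^=]+\.)|\s+(\w+=\S+)'

def parse_with_findall(line):
    values = {}
    # findall yields (group 1, group 2) tuples; the group that did not match is ''.
    for sentence, pair in re.findall(PATTERN, line):
        if sentence:
            values['theSentence'] = sentence
        else:
            key, val = pair.split('=', 1)
            values[key] = val
    return values

print(parse_with_findall("Mike ordered a large hamburger. lastname=Smith store=burgerville"))
print(parse_with_findall("Sam is nice."))
```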
