Accessing a text fragment in a Python object

Time: 2016-04-15 15:32:26

Tags: python parsing

I have an object that looks like this:

block = [{'id':'10001', 'date':'2016-01-11', 'text':'this is some text. grab 40'},{'id':'10002', 'date':'2014-03-12', 'text':'this is some more text. grab 60'}]

I want to grab the number that follows "grab" in text and reformat my object so that it looks like this:

block = [{'id':'10001', 'date':'2016-01-11', 'text':'this is some text. grab 40', 'grabbed': '40'},{'id':'10002', 'date':'2014-03-12', 'text':'this is some more text. grab 60', 'grabbed': '60'}]

I tried:

for item in block:
    if "grab" in item['text']:
        m = re.search('grab (..)', line)
    print m

but got this error:

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 146, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or buffer

4 answers:

Answer 0: (score: 1)

No regular expression is needed. You can do this:

for b in block:
    b["grabbed"] = b["text"].rstrip().rsplit(" ",1)[-1]

In [205]: block
Out[205]:
[{'date': '2016-01-11',
  'grabbed': '40',
  'id': '10001',
  'text': 'this is some text. grab 40'},
 {'date': '2014-03-12',
  'grabbed': '60',
  'id': '10002',
  'text': 'this is some more text. grab 60'}]
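One caveat worth illustrating: this expression returns the last whitespace-separated token, whatever it is, so it only works when the number is the final word of text. A minimal sketch, where `last_token` is a hypothetical helper name and the second and third sample strings are made up for illustration:

```python
def last_token(text):
    # Hypothetical helper wrapping the same expression as in the answer above.
    return text.rstrip().rsplit(" ", 1)[-1]

samples = [
    'this is some text. grab 40',     # from the question
    'this is some text. grab 40   ',  # made up: trailing spaces are stripped first
    'grab 40 and then more text',     # made up: the number is not the last word
]

results = [last_token(s) for s in samples]
print(results)
```

For the first two strings the result is '40'; for the third it is 'text', which is why a regular expression is the safer choice if the number can appear mid-sentence.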

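Answer 1: (score: 0)

Assuming that exactly 2 digits follow "grab", and that each string contains only one "grab xx":

import re

for item in block:
    # Capture exactly two digits following "grab".
    found = re.findall(r'grab (\d{2})', item['text'])
    if found:
        m = found[0]
        print(m)

Or, assuming that at least one digit always follows "grab":

for item in block:
    # Capture one or more digits following "grab".
    found = re.findall(r'grab (\d+)', item['text'])
    if found:
        m = found[0]
        print(m)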
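The two assumptions only differ when the number is not exactly two digits long. A small sketch of that difference, using a made-up three-digit input that does not appear in the question:

```python
import re

text = 'grab 123'  # hypothetical input, not from the question

# \d{2} stops after exactly two digits.
two_digits = re.search(r'grab (\d{2})', text).group(1)

# \d+ consumes every consecutive digit.
any_digits = re.search(r'grab (\d+)', text).group(1)

print(two_digits, any_digits)
```

With the question's data ('grab 40', 'grab 60') both patterns capture the same text.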

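Answer 2: (score: 0)

Hi, it looks like the input to your regex is off:

m=re.search('grab (..)',line)

Where does `line` come from? Is it a string? Don't you want to search item['text'] instead? Also note that re.search does not return the matched text itself; it returns a match object (or None when nothing matches), so read the captured group with m.group(1), or use e.g. re.findall().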
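A minimal sketch of both points, using the first sample text from the question. What `line` actually held is not shown in the question, so passing `None` here is only an assumption used to trigger the same kind of TypeError:

```python
import re

text = 'this is some text. grab 40'

# re.search returns a match object, or None when nothing matches.
m = re.search(r'grab (\d+)', text)
grabbed_via_search = m.group(1) if m else None

# re.findall returns a list with the captured group of every match.
grabbed_via_findall = re.findall(r'grab (\d+)', text)

# Searching a non-string value (assumed here to be None) raises TypeError.
try:
    re.search(r'grab (\d+)', None)
    raised = False
except TypeError:
    raised = True

print(grabbed_via_search, grabbed_via_findall, raised)
```

Searching item['text'] instead of an undefined or non-string `line` avoids the error in the first place.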

Answer 3: (score: 0)

This program modifies block as you describe in the question:

from pprint import pprint
import re

block = [{'id':'10001', 'date':'2016-01-11', 'text':'this is some text. grab 40'},{'id':'10002', 'date':'2014-03-12', 'text':'this is some more text. grab 60'}]


pprint("Before:")
pprint(block)

for item in block:
    grab = re.search(r"grab\s+(\d+)", item['text'])
    if grab:
        item['grabbed'] = grab.groups()[0]

pprint("After:")
pprint(block)