Processing multi-line data as a single record

Time: 2014-11-17 06:24:33

Tags: python

This code works as expected and returns the date.

str='Date : {{2014,8,7},{8,48,48}} :: Connected to ["280",46,"179",46,"67",46,"194",58,"2345"]'

a = str.split(':')[1].split(',')[0][-4:]+'-'+str.split(':')[1].split(',')[1].zfill(2)+'-'+str.split(':')[1].split(',')[2].replace('}', '').zfill(2)

import datetime
datetime.datetime.strptime(a, '%Y-%m-%d')

(Is there a simpler way to extract the date + time?)

I have the same string in a text file, but split across 2 lines.

Date : {{2014,8,7},{8,48,48}} ::  
     Connected to ["280",46,"179",46,"67",46,"194",58,"2345"]

How can I process these two lines as a single record? A record can also span 3 lines:

Date : {{2014,8,7},{11,6,49}} :: Queue initailized !!! [{rps,30},
                                                        {queue_file,
                                                         "./sample_esme.dqueue"}] 

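I can't process the file line by line, because then I can't link the date stamp to the connected server or the initialized queue.

6 answers:

Answer 0: (score: 1)

Something like this: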

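l = []
for line in text:  # text: any iterable of lines, e.g. an open file
    if line.startswith('Date') and l:
        multiline = "".join(l)
        # ... process the completed record in `multiline` here ...
        l = []
    l.append(line)
if l:
    # The last record has no following 'Date' line, so flush it after the loop.
    multiline = "".join(l)
    # ... process the final record here ...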
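
The same grouping idea can be wrapped in a generator, so each completed record is handed to the caller instead of being processed inside the loop. This is only a sketch: the name `iter_records`, the `marker` parameter, and the sample lines are made up for illustration, and it assumes every record starts with a line beginning with 'Date'.

```python
def iter_records(lines, marker='Date'):
    """Yield each record as one string; a line starting with `marker` opens a new record."""
    buf = []
    for line in lines:
        if line.startswith(marker) and buf:
            # A new record begins, so emit the one collected so far.
            yield ''.join(buf)
            buf = []
        buf.append(line)
    if buf:
        # Emit the final record, which no later marker line will flush.
        yield ''.join(buf)


# Hypothetical sample, shaped like the log lines in the question.
sample = [
    'Date : {{2014,8,7},{8,48,48}} ::\n',
    '     Connected to ["280",46,"179",46,"67",46,"194",58,"2345"]\n',
    'Date : {{2014,8,7},{11,6,49}} :: Queue initailized !!! [{rps,30},\n',
    '    {queue_file,\n',
    '     "./sample_esme.dqueue"}]\n',
]
records = list(iter_records(sample))
print(len(records))  # 2
```

Because it is a generator, it can be fed an open file object directly and never holds more than one record in memory.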

Answer 1: (score: 1)

You can create an iterator that joins every line starting with a space onto the line before it. Like this:

def join_start_with_whitespace(it):
    cur = []

    for line in it:
        if line.startswith(' '):
            cur.append(line.strip())
        elif line:
            if cur: yield ''.join(cur)
            cur = [line.strip()]
    if cur: yield ''.join(cur)

Demo:

data = '''
Date : {{2014,8,7},{11,6,49}} :: Queue initailized !!! [{rps,30},
                                                        {queue_file,
                                                         "./sample_esme.dqueue"}]
Date : {{2014,8,7},{11,6,50}} :: Queue initailized !!! [{rps,30},
                                                        {queue_file,
                                                         "./sample_esme.dqueue"}]
Date : {{2014,8,7},{11,6,51}} :: Queue initailized !!! [{rps,30},
                                                        {queue_file,
                                                         "./sample_esme.dqueue"}]
'''.split('\n')

print(list(join_start_with_whitespace(data)))

Output:

['Date : {{2014,8,7},{11,6,49}} :: Queue initailized !!! [{rps,30},{queue_file,"./sample_esme.dqueue"}]',
 'Date : {{2014,8,7},{11,6,50}} :: Queue initailized !!! [{rps,30},{queue_file,"./sample_esme.dqueue"}]',
 'Date : {{2014,8,7},{11,6,51}} :: Queue initailized !!! [{rps,30},{queue_file,"./sample_esme.dqueue"}]']
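
When the lines come from a real file rather than from `split('\n')`, each one still ends in a newline, so a blank line is no longer an empty string. A possible variant that copes with that, and also treats tabs as indentation, is sketched below. The name `join_continuation_lines` and the `io.StringIO` stand-in for the file are assumptions for illustration only.

```python
import io


def join_continuation_lines(lines):
    """Yield one string per record; indented lines continue the previous record."""
    cur = []
    for raw in lines:
        line = raw.rstrip('\n')
        if not line.strip():
            continue  # ignore blank or whitespace-only lines
        if line[0].isspace():
            # Indented (space or tab): continuation of the current record.
            cur.append(line.strip())
        else:
            if cur:
                yield ''.join(cur)
            cur = [line.strip()]
    if cur:
        yield ''.join(cur)


# io.StringIO stands in for an open text file here.
fake_file = io.StringIO(
    'Date : {{2014,8,7},{8,48,48}} ::\n'
    '     Connected to ["280",46]\n'
    '\n'
    'Date : {{2014,8,7},{11,6,49}} :: Queue initailized !!! [{rps,30},\n'
    '\t{queue_file,\n'
    '\t "./sample_esme.dqueue"}]\n'
)
records = list(join_continuation_lines(fake_file))
for record in records:
    print(record)
```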

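Answer 2: (score: 1)

Use the re module. If the line matches the date pattern, re.findall returns a non-empty list. I won't give you a complete solution, but here is how re makes this job easy.

After joining the 3 lines into 1, as suggested in another answer: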

>>> import re
>>> re.findall(r'^.*{{(\d+),(\d+),(\d+)},.*$', line)
[('2014', '8', '7')]

>>> mydate = re.findall(r'^.*{{(\d+),(\d+),(\d+)},.*$', line)
>>> '-'.join(mydate[0])
'2014-8-7'
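
To get the time as well, the pattern can be extended to six capture groups and the matches passed straight to the `datetime.datetime` constructor. A minimal sketch, assuming the stamp always has the `{{Y,M,D},{h,m,s}}` shape shown in the question; `STAMP` and `parse_stamp` are names made up for this example.

```python
import re
import datetime

# Six numeric groups: year, month, day, hour, minute, second.
STAMP = re.compile(r'\{\{(\d+),(\d+),(\d+)\},\{(\d+),(\d+),(\d+)\}\}')


def parse_stamp(line):
    """Return the datetime found in `line`, or None if there is no stamp."""
    m = STAMP.search(line)
    if m is None:
        return None
    return datetime.datetime(*map(int, m.groups()))


line = 'Date : {{2014,8,7},{8,48,48}} :: Connected to ["280",46,"179",46,"67",46,"194",58,"2345"]'
print(parse_stamp(line))  # 2014-08-07 08:48:48
```

Returning None for lines without a stamp lets the caller tell a continuation line from the start of a record.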

Answer 3: (score: 1)

To extract the datetime, you can use a regular expression:

import re, datetime

s = 'Date : {{2014,8,7},{8,48,48}} :: Connected to ["280",46,"179",46,"67",46,"194",58,"2345"]'

# Capture only the two {...} groups, so braces later in the line are not swallowed.
regex = re.compile(r'Date\s*:\s*\{(?P<val>\{[^}]*\},\{[^}]*\})\}')
val = regex.search(s).group('val')
print(datetime.datetime.strptime(val, '{%Y,%m,%d},{%H,%M,%S}'))

Output:

2014-08-07 08:48:48

Answer 4: (score: 1)

Using a regex is probably the better option. Here is an example.

import re

s = """Date : {{2014,8,7},{8,48,48}} ::  Connected to ["280",46,"179",46,"67",46,"194",58,"2345"]"""

# Match the time part generically instead of hard-coding {8,48,48}.
m = re.match(r"^Date : \{\{(?P<year>\d+),(?P<month>\d+),(?P<date>\d+)\},\{\d+,\d+,\d+\}\}", s)

print(m.group('year'))
print(m.group('month'))
print(m.group('date'))

Answer 5: (score: 0)

You can use this if the file (test1.txt) is small enough to read into memory in one go.

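import re, datetime

# Read the whole file once, then split it into records on the 'Date :' marker.
with open('test1.txt', 'r') as f:
    mylist = f.read().split('Date :')

regex = re.compile(r'\{(?P<val>.+)\}')
final = []

for item in mylist:
    try:
        # Split on the first '::' only; fragments without one raise ValueError.
        stamp, text = item.split('::', 1)
        s = regex.search(stamp).group('val')
        myd = datetime.datetime.strptime(s, '{%Y,%m,%d},{%H,%M,%S}')
    except (AttributeError, ValueError):
        continue  # skip fragments without a parsable date stamp
    final.append((myd, text))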

import pymysql
conn = pymysql.connect(host='localhost', port=3306, user='dba', passwd='dba', db='test')
cur = conn.cursor()
query="""INSERT INTO logs  (mydate, mytext) VALUES (%s, %s) """    
cur.executemany(query, final)
conn.commit()
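
If the file is too large to hold as one list, the rows could be inserted in batches instead of with a single `executemany` call. The helper below is a sketch of that idea and not part of the answer above: `chunked` is a made-up name, the batch size of 1000 is arbitrary, and the commented usage assumes `rows` is an iterator of (datetime, text) tuples while `cur`, `conn` and `query` are the objects created above.

```python
import itertools


def chunked(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk


print(list(chunked(range(5), 2)))  # [[0, 1], [2, 3], [4]]

# Usage sketch (assumes `rows`, `cur`, `conn`, `query` exist as described above):
# for chunk in chunked(rows, 1000):
#     cur.executemany(query, chunk)
# conn.commit()
```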