I have this string:
"01:00 AM ART Partly Cloudy 14C 01:00 PM ART Mostly Sunny 25C 06:00 PM ART Mostly Cloudy 23C"
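I want to split it at each time stamp (01:00 AM, 01:00 PM and 06:00 PM); the times may be different each time.
So I tried converting it to a list to loop over, which gave me the following list: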
[u'', u'01:00', u'AM', u'ART', u'', u'Partly', u'Cloudy', u'14C', u'01:00', u'PM', u'ART', u'', u'Mostly', u'Sunny', u'25C', u'06:00', u'PM', u'ART', u'', u'Mostly', u'Cloudy', u'23C', u'']
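I want to remove the spaces and empty strings (although you can't see them here) and build, from that whole list, another list with three items: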
"01:00 AM ART Partly Cloudy 14C"
"01:00 PM ART Mostly Sunny 25C"
"06:00 PM ART Mostly Cloudy 23C"
Of course, depending on how many time "words" are found in the string, there could be one, two, or even more than three items. This is what I have tried so far:
w_table = soup.find("table", border="0", width="650", cellspacing="0", cellpadding="0")
w_text = w_table.text.split(" ")
refined_w = ""
for word_w in w_text:
    # Note: with `or` this test is always true, so empty strings are
    # never filtered out (`and` was probably intended).
    if word_w != " " or word_w != "":
        refined_w += word_w.strip() + " "
print refined_w
w_list = refined_w.split(" ")
print w_list
found_w = []
for element_w in w_list:
    if validate_date(element_w):
        for index in range(len(w_list)):
            if w_list[index] == element_w and index not in found_w:
                print index
                found_w.append(index)
print found_w
for i in found_w:
    print w_list[i:]
Thanks in advance!
Answer 0 (score: 0)
I'm not sure this is the best solution, but it gets the job done.
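import re
try:
    from itertools import zip_longest  # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2

def grouper(n, iterable, fillvalue=None):
    # Collect the items of `iterable` into tuples of length n.
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

def split_by_date(input_data):
    # Both capture groups end up in the result: the full time and AM/PM.
    splitted = re.split(r'([0-9]{2}:[0-9]{2} (AM|PM))', input_data)
    # Drop the empty string before the first time (input must start with a time).
    splitted.remove('')
    return ['{}{}'.format(a, c).strip() for a, _, c in grouper(3, splitted)]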
Sample usage:
>>> split_by_date('01:00 AM ART Partly Cloudy 14C 01:00 PM ART Mostly Sunny 25C 06:00 PM ART Mostly Cloudy 23C')
['01:00 AM ART Partly Cloudy 14C', '01:00 PM ART Mostly Sunny 25C', '06:00 PM ART Mostly Cloudy 23C']
>>> split_by_date('01:35 PM some very random string 16:65 AM Yet another string')
['01:35 PM some very random string', '16:65 AM Yet another string']
The try block is only there to ensure Python 2/3 compatibility. The grouper function is a recipe from the itertools module documentation (https://docs.python.org/3/library/itertools.html#itertools-recipes).
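To see why the list comprehension unpacks three values per chunk, it may help to look at what re.split returns when the pattern has two capture groups. The following is only an illustrative sketch for Python 3; the names text, parts and chunks are made up for this example, and the string is shortened to two forecasts.

```python
import re
from itertools import zip_longest

text = '01:00 AM ART Partly Cloudy 14C 01:00 PM ART Mostly Sunny 25C'

# Each match contributes both captured groups (the full time and the
# AM/PM marker), followed by the text up to the next match.
parts = re.split(r'([0-9]{2}:[0-9]{2} (AM|PM))', text)

# The string starts with a time, so the first element is empty.
parts.remove('')

# The grouper trick: the same iterator passed three times, so every
# output tuple consumes three consecutive items.
it = iter(parts)
chunks = list(zip_longest(it, it, it))

for time_part, _, rest in chunks:
    print((time_part + rest).strip())
```

The middle element of each tuple is the inner (AM|PM) group, which is why it is discarded with _.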
Edit
I was able to get rid of the grouper function by changing the regex slightly and using itertools.islice. I hope that makes the code more readable.
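import re
from itertools import islice

def split_by_date(input_data):
    # A single capture group, so only the full time is kept in the result.
    splitted = re.split(r'([0-9]{2}:[0-9]{2} AM|[0-9]{2}:[0-9]{2} PM)', input_data)
    # Drop the empty string before the first time (input must start with a time).
    splitted.remove('')
    iter_a = islice(splitted, 0, None, 2)  # the times
    iter_b = islice(splitted, 1, None, 2)  # the text following each time
    return ['{}{}'.format(a, b).strip() for a, b in zip(iter_a, iter_b)]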
Usage is the same.
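One caveat with both versions: splitted.remove('') raises ValueError when the input does not begin with a time and contains no empty piece. A possible variant, shown here only as a sketch and not as part of the answer above, is to locate the start of each time with re.finditer and slice the string between those positions. The names TIME_RE and split_by_time are hypothetical, and any text before the first time is assumed to be unwanted and is dropped.

```python
import re

# Same time format as above: two digits, a colon, two digits, then AM or PM.
TIME_RE = re.compile(r'[0-9]{2}:[0-9]{2} (?:AM|PM)')

def split_by_time(text):
    # Positions where each time stamp starts.
    starts = [m.start() for m in TIME_RE.finditer(text)]
    if not starts:
        return []
    # Each piece runs up to the next time stamp, or to the end of the string.
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]

forecast = ('01:00 AM ART Partly Cloudy 14C 01:00 PM ART Mostly Sunny 25C '
            '06:00 PM ART Mostly Cloudy 23C')
for piece in split_by_time(forecast):
    print(piece)
```

This returns an empty list when no time is found, rather than raising.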