Question

I have this string:

"01:00 AM ART  Partly Cloudy 14C 01:00 PM ART  Mostly Sunny 25C 06:00 PM ART  Mostly Cloudy 23C"

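I want to split it on the time format (01:00 AM, 01:00 PM and 06:00 PM); the actual times may differ from one string to the next. So I tried converting it to a list to loop through, which gives me the following list: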

[u'', u'01:00', u'AM', u'ART', u'', u'Partly', u'Cloudy', u'14C', u'01:00', u'PM', u'ART', u'', u'Mostly', u'Sunny', u'25C', u'06:00', u'PM', u'ART', u'', u'Mostly', u'Cloudy', u'23C', u'']

I want to remove the spaces and empty strings (you can't see them here) and build, from the whole list, another list containing three items:

First item: "01:00 AM ART Partly Cloudy 14C"
Second item: "01:00 PM ART Mostly Sunny 25C"
Third item: "06:00 PM ART Mostly Cloudy 23C"

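Of course, depending on how many time "words" are found in the string, there could be one item, two, or even more than three. This is what I have tried so far: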

w_table = soup.find("table", border="0", width="650", cellspacing="0", cellpadding="0")
w_text = w_table.text.split(" ")
refined_w = ""
for word_w in w_text:
    if word_w != " " or word_w != "":
        refined_w += word_w.strip() + " "
print refined_w
w_list = refined_w.split(" ")
print w_list
found_w = []
for element_w in w_list:
    if validate_date(element_w):
        for index in range(len(w_list)):
            if w_list[index] == element_w and index not in found_w:
                print index
                found_w.append(index)
print found_w
for i in found_w:
    print w_list[i:]

Thanks in advance!

Answer 1

I'm not sure this is the best solution, but it gets the job done.

import re

try:
    from itertools import zip_longest
except ImportError:
    from itertools import izip_longest as zip_longest

def grouper(n, iterable, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

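def split_by_date(input_data):
    # Both capture groups end up in the result: the timestamp and AM/PM.
    splitted = re.split(r'([0-9]{2}:[0-9]{2} (AM|PM))', input_data)
    # Drop the text before the first timestamp (usually ''); unlike
    # remove(''), slicing cannot raise ValueError when no '' is present.
    splitted = splitted[1:]

    # Each group of three is (timestamp, AM/PM, following text).
    return ['{}{}'.format(a, c).strip() for a, _, c in grouper(3, splitted)]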

Sample usage:

>>> split_by_date('01:00 AM ART  Partly Cloudy 14C 01:00 PM ART  Mostly Sunny 25C 06:00 PM ART  Mostly Cloudy 23C')
['01:00 AM ART  Partly Cloudy 14C', '01:00 PM ART  Mostly Sunny 25C', '06:00 PM ART  Mostly Cloudy 23C']
>>> split_by_date('01:35 PM some very random string 16:65 AM Yet another string')
['01:35 PM some very random string', '16:65 AM Yet another string']

The try block is only there to ensure Python 2/3 compatibility. The grouper function is a recipe from the itertools module documentation (https://docs.python.org/3/library/itertools.html#itertools-recipes).
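To see why the results are consumed in groups of three, it may help to look at what re.split returns for this pattern. The following is a minimal sketch, assuming Python 3 and a shortened input string; the names parts, rest and chunks are only illustrative, and plain slicing stands in for grouper.

```python
import re

text = "01:00 AM ART  Partly Cloudy 14C 01:00 PM ART  Mostly Sunny 25C"

# Capture groups are returned alongside the text between matches, so each
# match contributes two elements: the full timestamp and the AM/PM part.
parts = re.split(r'([0-9]{2}:[0-9]{2} (AM|PM))', text)

# parts[0] is the text before the first timestamp (empty here); skip it.
rest = parts[1:]

# Walk the remainder three elements at a time.
chunks = [tuple(rest[i:i + 3]) for i in range(0, len(rest), 3)]
for timestamp, meridiem, following in chunks:
    print(repr(timestamp), repr(meridiem), repr(following))
```

Each chunk holds the timestamp, the AM/PM marker that the inner group duplicates, and the text that follows, which is why only the first and third elements are joined.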

Edit

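I was able to get rid of the grouper function by changing the regular expression slightly and using itertools.islice. I hope this makes the code more readable.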

import re
from itertools import islice


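def split_by_date(input_data):
    # A single capture group keeps each timestamp in the split result.
    splitted = re.split(r'([0-9]{2}:[0-9]{2} AM|[0-9]{2}:[0-9]{2} PM)', input_data)
    # Drop the text before the first timestamp (usually '').
    splitted = splitted[1:]

    iter_a = islice(splitted, 0, None, 2)  # timestamps
    iter_b = islice(splitted, 1, None, 2)  # text following each timestamp

    return ['{}{}'.format(a, b).strip() for a, b in zip(iter_a, iter_b)]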

Usage is the same.
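For comparison, here is a sketch of a different technique that is not part of the code above: collapse runs of whitespace first, then split on the single space that precedes each timestamp by using a lookahead. The names TIME_PATTERN and split_by_time are hypothetical, and the sketch assumes the timestamps always look like "HH:MM AM" or "HH:MM PM". Any text before the first timestamp stays in the first item, and an empty input yields [''].

```python
import re

# Hypothetical helper names; the pattern mirrors the one used above.
TIME_PATTERN = r'\d{2}:\d{2} (?:AM|PM)'


def split_by_time(text):
    # Collapse repeated whitespace and trim the ends.
    normalized = ' '.join(text.split())
    # Split on a space only when a timestamp follows; the lookahead keeps
    # the timestamp itself in the next item.
    return re.split(r' (?=' + TIME_PATTERN + r')', normalized)


result = split_by_time(
    '01:00 AM ART  Partly Cloudy 14C 01:00 PM ART  Mostly Sunny 25C '
    '06:00 PM ART  Mostly Cloudy 23C'
)
print(result)
```

Because the whitespace is normalized up front, the items come out with single spaces, which matches the three items listed in the question.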

Python: How do I split a string by time format?
