Formatting a raw string in Python

Time: 2017-12-29 14:50:52

Tags: python string format scrape

I have a raw string in Python that was retrieved via an IMAP library.

It contains lines like this:

DATE: 02/13/2018 * - ( previously 02/06/2018 )

What is the best way to scrape the data after each label (a label being something like DATE:)? For example, the date would be assigned to a variable such as date, so that print(date) would output 02/13/2018 * - ( previously 02/06/2018 ).

I tried the following code, but it printed one character per line. Thanks!

for line in message:
    if "DATE:" in line:
        print line

5 answers:

Answer 0 (score: 5)

You can use a regular expression and a dictionary:

import re
s = """
Season: Winter 2017-18
Activity: Basketball - Boys JV
*DATE: 02/13/2018 * - ( previously 02/06/2018 )
Event type: Game
Home/Host: Clear Lake
Opponent: Webster City
*START TIME: 6:15PM CST* - ( previously 4:30PM CST )
Location: Clear Lake High School, 125 N. 20th Street, Clear Lake, IA
"""
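# Split each non-empty line at the first ": "; a raw string avoids invalid-escape warnings.
pairs = [re.split(r':\s', line, maxsplit=1) for line in s.split('\n') if line]
# Keep only real label/value pairs and drop a leading "*" from the label.
final_dict = {
    (key[1:] if key.startswith('*') else key).strip('\r'): value.strip('\r')
    for key, value in (pair for pair in pairs if len(pair) > 1)
}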

Output:

{'Home/Host': 'Clear Lake', 'Season': 'Winter 2017-18', 'START TIME': '6:15PM CST* - ( previously 4:30PM CST )', 'Location': 'Clear Lake High School, 125 N. 20th Street, Clear Lake, IA', 'Activity': 'Basketball - Boys JV', 'DATE': '02/13/2018 * - ( previously 02/06/2018 )', 'Event type': 'Game', 'Opponent': 'Webster City'}

Answer 1 (score: 3)

You can split the string into lines with str.splitlines(), then iterate over those lines and use a regular expression to extract the data, for example:

import re

for line in message.splitlines():
    match = re.match(r'\*DATE: (.*)', line)
    if match:
        date = match.group(1)
        print(date)
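
The same idea can be wrapped in a small helper that works for any label. This is only a sketch: the function name `extract_field` and the sample `message` are made up for illustration, and it assumes each label sits at the start of its own line, optionally preceded by an asterisk.

```python
import re

def extract_field(message, label):
    """Return the text after 'label:' on the first matching line, or None."""
    # Allow an optional leading "*" and escape the label so it is matched literally.
    pattern = re.compile(r'\*?' + re.escape(label) + r':\s*(.*)')
    for line in message.splitlines():
        match = pattern.match(line.strip())
        if match:
            return match.group(1).strip()
    return None

# Hypothetical sample built from two lines of the example data.
message = "Event type: Game\r\n*DATE: 02/13/2018 * - ( previously 02/06/2018 )\r\n"
date = extract_field(message, "DATE")
print(date)
```

Returning None for a missing label lets the caller decide how to handle absent fields instead of raising an exception.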

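Answer 2 (score: 2)

`for line in message` iterates over each item in message. Put simply, message is a string and its items are characters, which is why the loop visits one character at a time.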
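
A short illustration of the difference, using a made-up two-line `message` (an assumption, not the asker's real data):

```python
# Hypothetical two-line message standing in for the real data.
message = "Event type: Game\n*DATE: 02/13/2018"

# Iterating over a string yields one character at a time.
chars = list(message)
print(chars[:5])

# Iterating over splitlines() yields whole lines instead.
lines = list(message.splitlines())
print(lines)
```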

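Splitting is a simple, naive way to solve the problem, but it will probably work as long as your data does not get much more complicated:

Use message.split("\n") to split the string on newlines and iterate over the result. You can then use line.strip().strip("*").split(":", maxsplit=1) to separate the key from the value. The first strip() removes any extra whitespace that may be left over (for example a stray "\r"), and the second removes the extra asterisks. maxsplit=1 stops at the first colon (which could be a problem if your data has colons as part of the label).

I say key/value because you probably don't need (or want) to assign the pairs dynamically to actual variables; you can simply store them in a dict and look them up as needed.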

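output = dict()
for line in message.split("\n"):  # Split into lines
    if ":" not in line:
        continue  # Skip blank lines and lines without a label
    # Remove extra whitespace and asterisks, then split at the first colon
    key, value = line.strip().strip("*").split(":", maxsplit=1)
    output[key] = value.strip()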

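Edit: I was under the impression that "date" was just your example, but if that is all you are looking for, then simply add the line if key == "DATE" and return/print/etc. the value.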
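
A minimal sketch of that check, assuming a made-up `message` and stopping at the first DATE line:

```python
# Hypothetical sample; real data would come from the IMAP message.
message = "Opponent: Webster City\n*DATE: 02/13/2018 * - ( previously 02/06/2018 )\n"

date = None
for line in message.split("\n"):
    if ":" not in line:
        continue  # Skip blank or unlabeled lines
    key, value = line.strip().strip("*").split(":", maxsplit=1)
    if key == "DATE":
        date = value.strip()
        break  # Stop at the first match

print(date)
```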

Answer 3 (score: 0)

If your data is in a file called datafile.txt, you can try:
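
The following is only a sketch of reading labeled lines from such a file, not the answerer's own code. The helper name `read_labels` is hypothetical, the file is assumed to be UTF-8 text with one `label: value` pair per line, and the demo writes a throwaway file in place of datafile.txt.

```python
import os
import tempfile

def read_labels(path):
    """Parse 'label: value' lines from a text file into a dict."""
    result = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # partition() splits at the first colon and never raises.
            key, sep, value = line.strip().partition(":")
            if sep:  # Only keep lines that actually contain a colon
                result[key.strip("* ")] = value.strip()
    return result

# Demo with a temporary file standing in for datafile.txt.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "datafile.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("Season: Winter 2017-18\n")
        handle.write("*DATE: 02/13/2018 * - ( previously 02/06/2018 )\n")
    data = read_labels(path)

print(data["DATE"])
```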


Answer 4 (score: 0)

This solution works (and is fairly "Pythonic", I think):

lines = message.split("\n")  # Split the message into lines
# Split each labeled line at the first "colon space"; skip lines without one
sections = [line.split(": ", 1) for line in lines if ": " in line]
# Dict comprehension mapping keys to values; also strips leading whitespace from the keys
message_dict = {section[0].lstrip(' '): section[1] for section in sections}