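How do I extract specific words from a string?

Time: 2017-03-31 06:02:57

Tags: python mysql

I have a file containing many lines, and I want to extract the first three words of each line.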

str = []

str = [
Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: 192.168.28.12 - 200 - "{\x0A    \x22identifier\x22: {\x0A        \x22company_code\x22: \x22TSC\x22,\x0A        \x22product_type\x22: \x22airtime-ctg\x22,\x0A        \x22host_type\x22: \x22android\x22\x0A    },\x0A    \x22id\x22: {\x0A        \x22type\x22: \x22guest\x22,\x0A        \x22group\x22: \x22guest\x22,\x0A        \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242ac110003\x22,\x0A        \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A    },\x0A    \x22stats\x22: [\x0A        {\x0A            \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A            \x22software_id\x22: \x22A-ACTG\x22,\x0A            \x22action_id\x22: \x22open_app\x22,\x0A            \x22values\x22: {\x0A                \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A                \x22language\x22: \x22en\x22\x0A            }\x0A        }\x0A    ]\x0A}"

Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: 192.168.28.12 - 200 - "{\x0A    \x22identifier\x22: {\x0A        \x22company_code\x22: \x22TSC\x22,\x0A        \x22product_type\x22: \x22airtime-ctg\x22,\x0A        \x22host_type\x22: \x22android\x22\x0A    },\x0A    \x22id\x22: {\x0A        \x22type\x22: \x22guest\x22,\x0A        \x22group\x22: \x22guest\x22,\x0A        \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242ac110003\x22,\x0A        \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A    },\x0A    \x22stats\x22: [\x0A        {\x0A            \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A            \x22software_id\x22: \x22A-ACTG\x22,\x0A            \x22action_id\x22: \x22open_app\x22,\x0A            \x22values\x22: {\x0A                \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A                \x22language\x22: \x22en\x22\x0A            }\x0A        }\x0A    ]\x0A}"

Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: 192.168.28.12 - 200 - "{\x0A    \x22identifier\x22: {\x0A        \x22company_code\x22: \x22TSC\x22,\x0A        \x22product_type\x22: \x22airtime-ctg\x22,\x0A        \x22host_type\x22: \x22android\x22\x0A    },\x0A    \x22id\x22: {\x0A        \x22type\x22: \x22guest\x22,\x0A        \x22group\x22: \x22guest\x22,\x0A        \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242ac110003\x22,\x0A        \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A    },\x0A    \x22stats\x22: [\x0A        {\x0A            \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A            \x22software_id\x22: \x22A-ACTG\x22,\x0A            \x22action_id\x22: \x22open_app\x22,\x0A            \x22values\x22: {\x0A                \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A                \x22language\x22: \x22en\x22\x0A            }\x0A        }\x0A    ]\x0A}"

Feb 17 07:10:07 afg-prod-web1 journal: afg-prod-web1 statistics: 192.168.28.12 - 200 - "{\x0A    \x22identifier\x22: {\x0A        \x22company_code\x22: \x22TSC\x22,\x0A        \x22product_type\x22: \x22airtime-ctg\x22,\x0A        \x22host_type\x22: \x22android\x22\x0A    },\x0A    \x22id\x22: {\x0A        \x22type\x22: \x22guest\x22,\x0A        \x22group\x22: \x22guest\x22,\x0A        \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242ac110003\x22,\x0A        \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A    },\x0A    \x22stats\x22: [\x0A        {\x0A            \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A            \x22software_id\x22: \x22A-ACTG\x22,\x0A            \x22action_id\x22: \x22open_app\x22,\x0A            \x22values\x22: {\x0A                \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A                \x22language\x22: \x22en\x22\x0A            }\x0A        }\x0A    ]\x0A}"]

I want to extract the date, i.e. Feb 17 07:10:07, from each line and put it into an array.

I tried using a for loop, but it fails with an error:

IndexError: list index out of range

The code I tried:

for i in splitdata:
            abc  = splitdata[logcount]
            aa = abc.split()
            if(aa[0] == "Feb"):
                aaa = "".join([aa[0],' ',aa[1],' ',aa[2]])
                logtime.append(aaa)
                logcount += 2   
            else:
                pass
        print logtime

2 answers:

Answer 0: (score: 0)

If your log is saved in a file named log.log, you can get the dates by doing the following:

with open('log.log') as f: 
    log_time = []
    for line in f:
        log_time.append(line[:15])
print(log_time) 
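
One thing to watch for: the sample in the question has blank lines between the entries, and for a blank line `line[:15]` is just a newline character. Below is a minimal sketch that skips such lines. It assumes every real entry starts with a 15-character timestamp such as `Feb 17 07:10:07`; the helper name `read_log_times` and the shortened sample lines are made up for illustration.

```python
def read_log_times(lines):
    """Collect the leading 15-character timestamp of every non-blank line."""
    log_time = []
    for line in lines:
        if not line.strip():  # skip blank separator lines
            continue
        log_time.append(line[:15])
    return log_time


# Shortened stand-ins for the log lines in the question (assumption).
sample = [
    "Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: ...\n",
    "\n",
    "Feb 17 07:10:07 afg-prod-web1 journal: afg-prod-web1 statistics: ...\n",
]
print(read_log_times(sample))
```

Since iterating over an open file yields its lines, the file object `f` from the `with` block can be passed to the function directly.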

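Answer 1: (score: 0)

You just need to check the len() of the split string to avoid this kind of error. There is a lot of room to improve the code.

  • Use reusable functions
  • Check the length of the list before accessing it by index
  • You don't need parentheses around an if condition in Python
  • Use list comprehensions wisely
  • The code you use to join the list shows that you still have a lot to learn in Python. Good luck!
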
In [1]: sample_text = """Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: 192.168.28.12 - 200 - "{\x0A
   ...:  \x22identifier\x22: {\x0A        \x22company_code\x22: \x22TSC\x22,\x0A        \x22product_type\x22: \x22airtime
   ...: -ctg\x22,\x0A        \x22host_type\x22: \x22android\x22\x0A    },\x0A    \x22id\x22: {\x0A        \x22type\x22: \
   ...: x22guest\x22,\x0A        \x22group\x22: \x22guest\x22,\x0A        \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242a
   ...: c110003\x22,\x0A        \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A    },\x0A    \x22stats\x22: [\x0A
   ...: {\x0A            \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A            \x22software_id\x22: \x22A-A
   ...: CTG\x22,\x0A            \x22action_id\x22: \x22open_app\x22,\x0A            \x22values\x22: {\x0A
   ...: \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A                \x22language\x22: \x22en\x22\x0A            }\x0A
   ...:         }\x0A    ]\x0A}"""

In [2]: def get_time_from_log(log_text):
   ...:     log_text_split = log_text.split(" ")
   ...:     if len(log_text_split) < 3:
   ...:         pass
   ...:     elif log_text_split[0] == "Feb":
   ...:         return " ".join(log_text_split[0:3])
   ...:

In [3]: get_time_from_log(sample_text)
Out[3]: 'Feb 17 07:10:07'
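
To apply the same idea to a whole list of lines, here is a rough sketch that combines the points above: a reusable function, a length check before indexing, and a list comprehension. The names `get_time_from_line` and `extract_log_times` are hypothetical, and the sample lines are shortened stand-ins for the data in the question. It uses `split()` without an argument, which splits on any run of whitespace and returns an empty list for a blank line. In the question's loop, both `aa[0]` on a blank line and `splitdata[logcount]` after `logcount` has grown past the end of the list can raise `IndexError`; iterating over the lines directly avoids the manual counter.

```python
def get_time_from_line(line):
    """Return the first three words of line if it starts with "Feb", else None."""
    words = line.split()  # a blank line gives an empty list
    if len(words) >= 3 and words[0] == "Feb":
        return " ".join(words[:3])
    return None


def extract_log_times(lines):
    """Return the timestamps of all lines that look like log entries."""
    # Iterate over the lines themselves; no manual index is needed.
    times = (get_time_from_line(line) for line in lines)
    return [t for t in times if t is not None]


# Shortened stand-ins for the entries in the question (assumption).
splitdata = [
    "Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: ...",
    "",
    "Feb 17 07:10:07 afg-prod-web1 journal: afg-prod-web1 statistics: ...",
]
print(extract_log_times(splitdata))
```

Blank lines and lines with fewer than three words are skipped rather than raising an error.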