Given the text:
start_KA03MM7155_RKMS121MI4-4.21005_NEW_end, 2018-01-02 09:48:23
How can I use Python to extract 2018-01-02 as 020118, and 09:48:23 as 094823 in another variable?
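Answer 0 (score: 2)
If the date in your string follows the YYYY-MM-DD pattern (the regex below only checks digit counts, so it matches any 4-2-2 digit layout),
you can extract the date field with this code: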
import re
text = 'start_KA03MM7155_RKMS121MI4-4.21005_NEW_end, 2018-01-02 09:48:23'
# Raw string so that \d reaches the regex engine unchanged
result = re.search(r'(\d{4}-\d{2}-\d{2})', text).group(0)
print('result:', result)
Output:
result: 2018-01-02
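You can then manipulate the string to get the desired output. In your case:
split_data = result.split('-')  # split the string: ['2018', '01', '02']
# day + month + last two digits of the year
date_pattern = split_data[-1] + split_data[-2] + split_data[-3][-2:]
print('date pattern:', date_pattern)
Output:
date pattern: 020118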
With a small change to the regex pattern, you can extract the time as well:
time_pattern = re.search(r'(\d{2}:\d{2}:\d{2})', text).group(0).replace(':', '')
print('time_pattern:', time_pattern)
Output:
time_pattern: 094823
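As an alternative to slicing strings by hand, the standard `datetime` module can parse the timestamp and re-format it. This is a different technique from the one in the answer, shown only as a minimal sketch. It assumes the timestamp is always the part after the last `', '` in the string, which is a guess based on the single sample line.

```python
from datetime import datetime

text = 'start_KA03MM7155_RKMS121MI4-4.21005_NEW_end, 2018-01-02 09:48:23'

# Assumption: the timestamp is everything after the last ', ' separator.
stamp = text.rsplit(', ', 1)[-1]

# Parse once, then format the two pieces independently.
parsed = datetime.strptime(stamp, '%Y-%m-%d %H:%M:%S')
date_pattern = parsed.strftime('%d%m%y')   # day, month, two-digit year
time_pattern = parsed.strftime('%H%M%S')   # hours, minutes, seconds

print(date_pattern, time_pattern)  # 020118 094823
```

Unlike the regex, `strptime` raises `ValueError` for impossible values such as month 13, which may or may not be what you want.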
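Brief explanation:
\d matches a single digit
\d{4} matches exactly four digits
(\d{4}-\d{2}-\d{2}) captures a group of the form (4 digits)-(2 digits)-(2 digits)
To learn more about regular expressions, see the official documentation.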
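The pieces above can be tried one at a time on the sample string. The sketch below is only an illustration; the variable names are made up for the example. It also shows why the hyphens in the full pattern matter: `\d{4}` on its own matches an earlier run of digits in the identifier, not the year.

```python
import re

text = 'start_KA03MM7155_RKMS121MI4-4.21005_NEW_end, 2018-01-02 09:48:23'

first_digit = re.search(r'\d', text).group(0)       # first single digit in the string
first_four = re.search(r'\d{4}', text).group(0)     # first run of four digits
date_match = re.search(r'(\d{4}-\d{2}-\d{2})', text)

print(first_digit)            # 0
print(first_four)             # 7155
print(date_match.group(0))    # 2018-01-02
# The parentheses wrap the whole pattern, so group(1) equals group(0) here.
print(date_match.group(1))    # 2018-01-02
```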
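Answer 1 (score: 1)
A quick and dirty way:
import re
a = 'start_KA03MM7155_RKMS121MI4-4.21005_NEW_end, 2018-01-02 09:48:23'
date = re.sub('-', '', re.findall(r'\d{4}-\d{2}-\d{2}', a)[0])  # '20180102'
time = re.sub(':', '', re.findall(r'\d{2}:\d{2}:\d{2}', a)[0])  # '094823'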
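Note that the quick version yields the date as '20180102' (YYYYMMDD), while the question asks for '020118' (DDMMYY), and `[0]` raises `IndexError` when nothing matches. One possible way to handle both is sketched below. The helper name `compact_stamp` and the choice to return `None` on no match are assumptions made for illustration, not part of either answer.

```python
import re

# One pattern with a capture group per field: date, a space, then time.
STAMP = re.compile(r'(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})')


def compact_stamp(text):
    """Return (DDMMYY, HHMMSS) for the first timestamp in text, or None."""
    match = STAMP.search(text)
    if match is None:
        return None
    year, month, day, hour, minute, second = match.groups()
    # Reorder to day, month, two-digit year; concatenate the time fields.
    return day + month + year[-2:], hour + minute + second


a = 'start_KA03MM7155_RKMS121MI4-4.21005_NEW_end, 2018-01-02 09:48:23'
print(compact_stamp(a))  # ('020118', '094823')
```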