I am processing, with python, a long list of data that looks like this
The digraphs are probably due to encoding problems. (I am not sure whether these characters will be preserved in this site)
29/07/2016 04:00:12 0.125143
Now, when I read such file into a script using something like open
and readlines
, there is an error, reading
SyntaxError: EOL while scanning string literal
I know (or may look up usage of) replace and regex functions, but I cannot do them in my script. The biggest problem is that anywhere I include or read such strange character, error occurs, pointing on the very line it is read. So I cannot do anything to them.
答案 0 :(得分:1)
Are you reading a file? If so, try to extract values using regexps, not to remove extra characters:
re.search(r'^([\d/: ]{19})', line).group(1)
re.search(r'([\d.]{7})', line).group(1)
答案 1 :(得分:0)
我发现re.findall
有效。 (对不起,我没有时间测试所有其他方法,因为这项工作的意义已经消失,我甚至忘记了这个问题。)
def extract_numbers(str_i):
pat="(\d+)/(\d+)/(\d+)\D*(\d+):(\d+):(\d+)\D*(\d+)\.(\d+)"
match_h = re.findall(pat, str_i)
return match_h[0]
# ....
# `f` is the handle of the file in question
lines =f.readlines()
for l in lines:
ls_f =extract_numbers(l)
# process them....