I have a text file containing a large number of lines like these:
NOTE: Variable Variable_S1 already exists on file D1.D, using Var_S8 instead.
NOTE: The variable name more_than_eight_letters_m has been truncated to ratio_s.
NOTE: Variable ratio_s already exists on file D1.D, using Var_S9 instead.
I am trying to create a list with two columns:
Variable_S1 Var_S8
more_than_eight_letters Var_S9
Can anyone tell me how to do this with sed, Python, or even R?
Answer 0 (score: 1)
I don't know sed or R, but in Python:
>>> import re
>>> i = """NOTE: Variable Variable_S1 already exists on file D1.D, using Var_S8 instead.
NOTE: The variable name more_than_eight_letters_m has been truncated to ratio_s.
NOTE: Variable ratio_s already exists on file D1.D, using Var_S9 instead."""
>>> print(re.findall(r'(\w+_\w+)', i))
['Variable_S1', 'Var_S8', 'more_than_eight_letters_m', 'ratio_s', 'ratio_s', 'Var_S9']
Here is an improved version that gives you the variables found on each line, grouped per line:
>>> print([re.findall(r'(\w+_\w+)', line) for line in i.split('\n')])
[['Variable_S1', 'Var_S8'],
['more_than_eight_letters_m', 'ratio_s'],
['ratio_s', 'Var_S9']]
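The per-line groups above still leave the truncation step unresolved: `ratio_s` is only an intermediate name. A minimal sketch of one way to get the two-column list follows, assuming the log always uses exactly the two message formats shown in the question. The names `pair_variables`, `TRUNCATED`, `RENAMED`, and `log` are hypothetical and chosen only for illustration.

```python
import re

log = """NOTE: Variable Variable_S1 already exists on file D1.D, using Var_S8 instead.
NOTE: The variable name more_than_eight_letters_m has been truncated to ratio_s.
NOTE: Variable ratio_s already exists on file D1.D, using Var_S9 instead."""

# Assumption: these two patterns cover every message format in the file.
TRUNCATED = re.compile(r"variable name (\w+) has been truncated to (\w+)")
RENAMED = re.compile(r"Variable (\w+) already exists on file \S+ using (\w+) instead")

def pair_variables(text):
    """Return (original_name, replacement_name) pairs, following truncations."""
    pairs = []
    truncated_from = {}  # truncated name -> original long name
    for line in text.splitlines():
        m = TRUNCATED.search(line)
        if m:
            truncated_from[m.group(2)] = m.group(1)
            continue
        m = RENAMED.search(line)
        if m:
            old, new = m.groups()
            # pop() so a later reuse of the same truncated name
            # is not attributed to an earlier long name
            pairs.append((truncated_from.pop(old, old), new))
    return pairs

for original, replacement in pair_variables(log):
    print(original, replacement)
```

For the sample text this prints `Variable_S1 Var_S8` and `more_than_eight_letters_m Var_S9`. The long name keeps its trailing `_m` because that is how it appears in the log; the desired output in the question shows it without that suffix, and the sketch does not guess how it should be stripped. To run it on a real file, read the file contents into `log` first.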