Question

I have a file in which every line has the following format:

"('-1259656819525938837', 598679497)\t0.036787946"    # "\t" within the string is the tab sign

I need to extract the components:

-1259656819525938837  # string: the content between the single quotes
598679497     # long
0.036787946   # float

Python 2.6

Answer 1

You can use a regular expression from the re module:

import re
s = "('-1259656819525938837', 598679497)\t0.036787946"
re.findall(r'[-+]?[0-9]*\.?[0-9]+', s)
# gives: ['-1259656819525938837', '598679497', '0.036787946']
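
The matches come back as strings, so they still need converting to the types asked for in the question. A minimal sketch, assuming every line contains exactly the three fields shown and no scientific notation such as 1e-05 (the helper name parse_line is made up for illustration):

```python
import re

# Same pattern as above, compiled once for reuse on every line.
NUMBER = re.compile(r'[-+]?[0-9]*\.?[0-9]+')

def parse_line(line):
    """Return (id_string, integer, float) for one line of the file."""
    # Unpacking raises ValueError if a line does not yield exactly three matches.
    key, count, score = NUMBER.findall(line)
    # key stays a string; on Python 2, int() returns a long when the value needs it.
    return key, int(count), float(score)

s = "('-1259656819525938837', 598679497)\t0.036787946"
result = parse_line(s)
```

Here result is a tuple holding the string, the integer, and the float, in that order.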

Answer 2

"2.7.0_bf4fda703454".split("_") gives a list of strings:

In [1]: "2.7.0_bf4fda703454".split("_")
Out[1]: ['2.7.0', 'bf4fda703454']

This splits the string at every underscore. If you want to stop after the first split, use "2.7.0_bf4fda703454".split("_", 1).

If you know for a fact that the string contains an underscore, you can even unpack the LHS and RHS into separate variables:

In [8]: lhs, rhs = "2.7.0_bf4fda703454".split("_", 1)

In [9]: lhs
Out[9]: '2.7.0'

In [10]: rhs
Out[10]: 'bf4fda703454'
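
The same split-and-unpack idea might be applied to the line from the question by splitting on the tab instead of an underscore. This is only a sketch, assuming the part before the tab is always a valid Python tuple literal; ast.literal_eval is available from Python 2.6 onward:

```python
import ast

line = "('-1259656819525938837', 598679497)\t0.036787946"

# Split once on the tab: lhs is the tuple text, rhs is the float text.
lhs, rhs = line.split("\t", 1)

# literal_eval parses the tuple literal without executing arbitrary code.
key, count = ast.literal_eval(lhs)

# float() ignores surrounding whitespace, so a trailing newline is harmless.
score = float(rhs)
```

After this, key is the quoted string, count is the integer, and score is the float.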

Answer 3

You can use a regular expression to extract the integers and floats from the string:

>>> import re
>>> a = "('-1259656819525938837', 598679497)\t0.036787946"
>>> re.findall(r'[-?\d\.\d]+', a)
['-1259656819525938837', '598679497', '0.036787946']
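
Note that [-?\d\.\d] is a character class, so it matches any run of minus signs, question marks, digits, and dots rather than a well-formed number. If stricter matching is wanted, one option is a single pattern for the whole line. The sketch below assumes the exact layout shown in the question; the name LINE is made up for illustration:

```python
import re

# One pattern for the whole line: quoted id, integer, tab, then a float.
LINE = re.compile(r"\('(-?\d+)',\s*(-?\d+)\)\t(-?\d+(?:\.\d+)?)")

a = "('-1259656819525938837', 598679497)\t0.036787946"
m = LINE.match(a)  # m would be None for a line that does not fit the layout
key, count, score = m.group(1), int(m.group(2)), float(m.group(3))

# The character class on its own also accepts runs that are not numbers.
loose = re.findall(r'[-?\d\.\d]+', "1.2.3 --")
```

With this input, loose contains '1.2.3' and '--', which shows why the looser pattern is only safe when the input is known to be clean.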

How do I parse this string in Python?
