Question

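I am trying to parse the different parts of a comma-separated string.

Here are two example strings:

Plant communities and soils in cryoturbated tundra along a bioclimate gradient in the Low Arctic, Alaska,Phytocoenologia, v.35, 2005, p. 761.

Visualizing Frost Boils,Challenges in Science and Engineering, v.13, 2005, p. 18.

I need to store the page number, year, volume (v.13), journal, and title in separate variables. I want to process these strings from the back, because the title may contain commas (I plan to split on commas) and the back end of the string is very consistent. Any pointers on how to work through this backward would be very helpful. Thanks!

For the second example: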

page = 'p.18'
year = '2005'
volume = 'v.13'
journal = 'Challenges in Science and Engineering' 
title = 'Visualizing Frost Boils'

Answer 1

title,journal,vol,year,page = my_string.rsplit(',',4)

I think this is what you want.

Answer 2

You can use rsplit():

>>> s = 'Visualizing Frost Boils,Challenges in Science and Engineering, v.13, 2005, p. 18.'
>>> title, journal, volume, year, page = [entry.strip() for entry in  s.rsplit(',', 4)]
>>> page
'p. 18.'
>>> year
'2005'
>>> volume
'v.13'
>>> journal
'Challenges in Science and Engineering'
>>> title
'Visualizing Frost Boils'

rsplit(',', 4) splits the string on commas starting from the right and limits the number of splits to 4. entry.strip() removes the whitespace around each entry.
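As a rough sketch of how this could be applied to the first example string, whose title itself contains a comma, the helper below wraps the same rsplit() call. The name parse_citation is made up for illustration, and the code assumes every string ends with the volume, year, and page fields in that order.

```python
def parse_citation(s):
    """Split a citation into (title, journal, volume, year, page).

    Hypothetical helper: assumes the last four commas separate the fields.
    """
    # Split at most 4 times, starting from the right, then trim whitespace.
    parts = [entry.strip() for entry in s.rsplit(',', 4)]
    if len(parts) != 5:
        raise ValueError('expected at least 4 commas in %r' % s)
    return tuple(parts)


first = ('Plant communities and soils in cryoturbated tundra along a '
         'bioclimate gradient in the Low Arctic, Alaska,Phytocoenologia, '
         'v.35, 2005, p. 761.')
title, journal, volume, year, page = parse_citation(first)
```

Because the split starts from the right, the comma inside the title ("Low Arctic, Alaska") is left alone.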

Answer 3

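If the number of commas is always the same, you could write a function that finds the indices of the commas and then returns the substrings between those indices.

For example, if we count 4 commas, we have:

title = string[:comma_index1]
journal = string[comma_index1 + 1:comma_index2]  # + 1 skips the comma itself
volume = string[comma_index2 + 1:comma_index3]
year = string[comma_index3 + 1:comma_index4]
page = string[comma_index4 + 1:]

This is probably a naive way to do it.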
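One way the comma indices might be collected is sketched below. It assumes the fields are delimited by the last four commas, so it searches from the right with str.rfind(); the function name split_on_last_commas is hypothetical.

```python
def split_on_last_commas(s, n=4):
    """Cut s at its last n commas and return the n + 1 stripped pieces."""
    pieces = []
    end = len(s)
    for _ in range(n):
        # Find the right-most comma before the current end position.
        idx = s.rfind(',', 0, end)
        if idx == -1:
            raise ValueError('fewer than %d commas in %r' % (n, s))
        pieces.append(s[idx + 1:end].strip())
        end = idx
    # Whatever remains on the left is the title, commas included.
    pieces.append(s[:end].strip())
    return pieces[::-1]


s = ('Visualizing Frost Boils,Challenges in Science and Engineering, '
     'v.13, 2005, p. 18.')
title, journal, volume, year, page = split_on_last_commas(s)
```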

Answer 4

For this, I would use a regular expression.

>>> import re
>>> c = re.compile(r'(.*), v\.(\d*), (\d*), p\. (\d*)\.')
>>> c.match('Plant communities and soils in cryoturbated tundra along a bioclimate gradient in the Low Arctic, Alaska,Phytocoenologia, v.35, 2005, p. 761.').group(1,2,3,4)

('Plant communities and soils in cryoturbated tundra along a bioclimate gradient in the Low Arctic, Alaska,Phytocoenologia', '35', '2005', '761')
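Note that the first group in the result above still holds the title and the journal together. A possible variation, offered only as a sketch, adds a group for the journal, on the assumption that the journal name itself never contains a comma; the pattern is illustrative and not part of the original answer.

```python
import re

# Groups: title (may contain commas), journal (no commas), volume, year, page.
pattern = re.compile(
    r'(.*),\s*([^,]+),\s*v\.\s*(\d+),\s*(\d+),\s*p\.\s*(\d+)\.')

s = ('Plant communities and soils in cryoturbated tundra along a '
     'bioclimate gradient in the Low Arctic, Alaska,Phytocoenologia, '
     'v.35, 2005, p. 761.')
match = pattern.match(s)
title, journal, volume, year, page = match.groups()
```

Here the volume, year, and page come back as bare digit strings, without the 'v.' and 'p.' prefixes.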

How to process a string from the back end using the split command
