Python library: splitting publication citations

Asked: 2019-02-20 11:56:30

Tags: python citations biblatex

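I have a batch of citation strings that I want to split into individual citations. Below is a sample I found on the OWL citation site. My data mixes citation styles (MLA, APA, etc.). Is there a Python library or other application that can split these strings into elements of a list? Because the citation styles vary so much, I have tried to avoid regular expressions. I also tried splitting on "\n", but some of my strings do not contain the "\n" delimiter, so you can see the problem. I would like to know whether there is a better way to capture them. I am not trying to extract names, dates, or titles (I have already found a library that does that); I only need to separate the strings. Any help would be greatly appreciated. Thanks!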

Input string (example)

Dean, Cornelia. "Executive on a Mission: Saving the Planet." The New York Times, 22 May 2007, www.nytimes.com/2007/05/22/science/earth/22ander.html?_r=0. Accessed 12 May 2016.

Ebert, Roger. Review of An Inconvenient Truth, directed by Davis Guggenheim. rogerebert.com, 1 June 2006, www.rogerebert.com/reviews/an-inconvenient-truth-2006. Accessed 15 June 2016.

Output (sample)

['Dean, Cornelia. "Executive on a Mission: Saving the Planet." The New York Times, 22 May 2007, www.nytimes.com/2007/05/22/science/earth/22ander.html?_r=0. Accessed 12 May 2016.',
'Ebert, Roger. Review of An Inconvenient Truth, directed by Davis Guggenheim. rogerebert.com, 1 June 2006, www.rogerebert.com/reviews/an-inconvenient-truth-2006. Accessed 15 June 2016.']

2 answers:

Answer 0 (score: 0)

Try split, then use filter to remove the empty elements:

string = '''Dean, Cornelia. "Executive on a Mission: Saving the Planet." The New York Times, 22 May 2007, www.nytimes.com/2007/05/22/science/earth/22ander.html?_r=0. Accessed 12 May 2016.

Ebert, Roger. Review of An Inconvenient Truth, directed by Davis Guggenheim. rogerebert.com, 1 June 2006, www.rogerebert.com/reviews/an-inconvenient-truth-2006. Accessed 15 June 2016.'''

result = list(filter(None, string.split('\n')))

Output:

['Dean, Cornelia. "Executive on a Mission: Saving the Planet." The New York Times, 22 May 2007, www.nytimes.com/2007/05/22/science/earth/22ander.html?_r=0. Accessed 12 May 2016.', 'Ebert, Roger. Review of An Inconvenient Truth, directed by Davis Guggenheim. rogerebert.com, 1 June 2006, www.rogerebert.com/reviews/an-inconvenient-truth-2006. Accessed 15 June 2016.']
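One caveat with filter(None, ...): it drops only empty strings, so a line holding just spaces or tabs survives as an element. The following is a minimal sketch of a variant that strips each line first. The helper name split_citations and the shortened sample text are illustrative assumptions, not part of the question's data.

```python
def split_citations(text):
    """Split text on newlines, dropping empty and whitespace-only lines."""
    # line.strip() is falsy for '' and for lines containing only whitespace
    return [line.strip() for line in text.split('\n') if line.strip()]


# Shortened, made-up sample with a whitespace-only line between the entries
sample = 'Dean, Cornelia. Accessed 12 May 2016.\n   \n\nEbert, Roger. Accessed 15 June 2016.\n'

citations = split_citations(sample)
naive = list(filter(None, sample.split('\n')))  # keeps the '   ' line
```

This still relies on a newline between citations, so it does not help with strings where that delimiter is missing.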

Answer 1 (score: 0)

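If you want to split the string s on the newline delimiter \n, you can use the string method splitlines() together with a list comprehension to filter out the empty elements: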

[i for i in s.splitlines() if i]
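A practical difference from split('\n') is that splitlines() also recognizes other line boundaries such as \r\n, which matters if the text came from a Windows file or a web form. A small sketch, using a made-up two-line string as the assumed input:

```python
# Hypothetical input with Windows-style line endings
s = 'first citation.\r\n\r\nsecond citation.\r\n'

# split('\n') leaves a trailing '\r' on each piece, so the "blank" line is
# '\r' rather than '' and is not filtered out
with_split = [i for i in s.split('\n') if i]

# splitlines() treats '\r\n' as a single line boundary
with_splitlines = [i for i in s.splitlines() if i]
```

Here with_splitlines contains only the two citation strings, while with_split keeps the stray carriage returns.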