AWK: How do I clean up a BibTeX file?

Time: 2015-09-29 07:22:12

Tags: awk bibtex

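I have a BibTeX file (exported from Zotero) that I would like to clean up by removing specific fields.

For example, removing the file field from the following entry: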

@inproceedings{sridharan_fast_2008,
    title = {Fast {Rates} for {Regularized} {Objectives}.},
    urldate = {2014-03-26},
    booktitle = {{NIPS}},
    author = {Sridharan, Karthik and Shalev-Shwartz, Shai and Srebro, Nathan},
    year = {2008},
    pages = {1545--1552},
    file = {3400-fast-rates-for-regularized-objectives.pdf:/home/johnros/.zotero/zotero/66g0wvis.default/zotero/storage/6ND67P5F/3400-fast-rates-for-regularized-objectives.pdf:application/pdf}
}

3 answers:

Answer 0 (score: 3)

You can do this quite easily with grep:

grep -v "^\s*file =" bibtext.txt

The trailing comma left on the preceding field should not be a problem.

Or, if you are really keen on awk:

awk '!/file = /' bibtext.txt
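
Note that the pattern /file = / is not anchored, so it would also drop a line such as title = {A file = B}, or a field whose name merely ends in "file". A minimal sketch of a tighter variant follows, assuming each field sits on a single line as in the Zotero export above; the function name strip_file_lines is hypothetical and exists only for illustration.

```shell
# Hypothetical helper: print a .bib file without its single-line "file" fields.
strip_file_lines() {
    # $1: path to the .bib file
    # Skip only lines whose first token is the field name "file",
    # optionally indented and optionally followed by spaces before "=".
    awk '!/^[ \t]*file[ \t]*=/' "$1"
}
```

Usage: strip_file_lines bibtext.txt > cleaned.bib. A field value that spans several lines would need a different approach.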

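Answer 1 (score: 2)

I am not familiar with the BibTeX format; if there are dedicated tools that edit it better, you should use those.

If you want to handle it with awk, here is a GNU awk one-liner: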

awk -v RS=',\n\\s*file\\s*=\\s[^\\n]*' '7' file

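Basically, it sets the record separator RS to a regular expression that matches the file = line together with the comma "," that ends the preceding line, so both are removed and the output should still be valid BibTeX (at least I hope so). The pattern 7 is simply a true value, so every record is printed.

Tested with your example: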

kent$  cat f
@inproceedings{sridharan_fast_2008,
    title = {Fast {Rates} for {Regularized} {Objectives}.},
    urldate = {2014-03-26},
    booktitle = {{NIPS}},
    author = {Sridharan, Karthik and Shalev-Shwartz, Shai and Srebro, Nathan},
    year = {2008},
    pages = {1545--1552},
    file = {3400-fast-rates-for-regularized-objectives.pdf:/home/johnros/.zotero/zotero/66g0wvis.default/zotero/storage/6ND67P5F/3400-fast-rates-for-regularized-objectives.pdf:application/pdf}
}

kent$  awk -v RS=',\n\\s*file\\s*=\\s[^\\n]*' '7' f
@inproceedings{sridharan_fast_2008,
    title = {Fast {Rates} for {Regularized} {Objectives}.},
    urldate = {2014-03-26},
    booktitle = {{NIPS}},
    author = {Sridharan, Karthik and Shalev-Shwartz, Shai and Srebro, Nathan},
    year = {2008},
    pages = {1545--1552}

}
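
As far as I can tell, the RS one-liner above also swallows the removed field's own trailing comma, so it suits the case where file is the last field of an entry, and it needs GNU awk for the regex RS. A rough sketch of a line-based alternative that should run in any POSIX awk is given here; it holds back one line so it can strip the preceding comma only when the removed field was the last one. The function name drop_bib_field and its arguments are assumptions made for this example, and single-line fields are assumed.

```shell
# Hypothetical helper: drop a single-line field from every BibTeX entry.
drop_bib_field() {
    # $1: field name (e.g. file), $2: path to the .bib file
    awk -v field="$1" '
        # True when the text before the first "=" is exactly the field name.
        function is_field(line,    eq, name) {
            eq = index(line, "=")
            if (eq == 0) return 0
            name = substr(line, 1, eq - 1)
            sub(/^[ \t]+/, "", name)
            sub(/[ \t]+$/, "", name)
            return tolower(name) == tolower(field)
        }
        {
            if (is_field($0)) {
                # Removed line has no trailing comma: it was the last field,
                # so the previously kept line gives up its comma instead.
                if ($0 !~ /,[ \t]*$/ && have_prev) sub(/,[ \t]*$/, "", prev)
                next
            }
            if (have_prev) print prev   # emit the line held back so far
            prev = $0
            have_prev = 1
        }
        END { if (have_prev) print prev }
    ' "$2"
}
```

Usage: drop_bib_field file f > cleaned.bib. Unlike the RS version, this should not leave a blank line before the closing brace.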

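Answer 2 (score: 1)

I know this is an older question, but for those who still come across it: an extension for Zotero (Zotero Better BibTeX) lets you do this from within Zotero. Full disclosure: I am the author of this extension.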