Splitting a column in Excel using different characters as delimiters

Date: 2013-12-27 15:17:48

Tags: python excel excel-vba citations vba

I have an Excel sheet with a few thousand sources formatted like this:

Example 1:

Abbott KW, Snidal D (2009) The Governance Triangle: Regulatory Standards Institutions and the Shadow of the State. In: Mattli W , Woods N (eds) The Politics of Global Regulation, pp. 44–88. Princeton University Press, Princeton, NJ

Example 2:

Moschella M , Tsingou E (eds) (2013) Great Expectations, Slow Transformations: Incremental Change in Financial Governance. ECPR Press, Colchester

I need to split this data into 7 columns:

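  1. First author
  2. Second author
  3. Third to Nth authors
  4. Publication year
  5. Source article title
  6. Published in (not always present, but always starts with "In:")
  7. More info: everything after the source article title (if it is not part of a larger publication) or after the "In:" part (if it is)

I tried using Excel's Text to Columns tool, but the data varies so much that I couldn't make it work reliably. Does anyone know how to solve this?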

2 Answers:

Answer 0 (score: 1)

See How to split Bibiliography MLA string into BibTex using c#?, where I link to several dedicated tools for extracting bibliographic information from formatted text.

Answer 1 (score: 0)

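Try this VBA macro. It uses a regular expression to parse the different segments, but it will fail on any entry that is not laid out the way you presented your data. If an entry fails, check how it differs from my assumptions or from the examples you gave.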

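The macro assumes the data is in column A, starting in A1, with no label in row 1. The results are written to column B onward, with labels in row 1, but you can change where they go.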

This code goes in a regular module.

```vb
Option Explicit

Sub ParseBiblio()
    Dim vData As Variant
    Dim vBiblios() As Variant
    Dim vOne As Variant
    Dim rRes As Range
    Dim re As Object, mc As Object
    Dim I As Long, J As Long

    'Assume the data is in column A of the active sheet, starting in A1.
    'Start at row 2 instead if there is a label row.
    vData = Range("A1", Cells(Rows.Count, "A").End(xlUp)).Value

    'A single-cell range returns a scalar rather than an array, so wrap it
    If Not IsArray(vData) Then
        vOne = vData
        ReDim vData(1 To 1, 1 To 1)
        vData(1, 1) = vOne
    End If

    'Results start in column B, with labels in row 1
    Set rRes = Range("B1")

    'Late-bound VBScript regular expression (no library reference needed)
    Set re = CreateObject("VBScript.RegExp")
    With re
        .MultiLine = True
        .Global = True
        .IgnoreCase = True
        'SubMatches: 0 first author, 1 second author, 2 remaining authors,
        '3 year, 4 title, 5 "In:" part (optional), 6 everything else
        .Pattern = "(^[^,]+),?\s*([^,]+?)(?:,\s*([^(]+))?\s*\((\d{4})\)\s*(.*?\.)\s*(?:In:\s*(.*)\.)?\s*(.*)"
    End With

    'Results array: one label row plus one row per entry
    ReDim vBiblios(1 To UBound(vData) + 1, 1 To 7)
    vBiblios(1, 1) = "First Author"
    vBiblios(1, 2) = "Second Author"
    vBiblios(1, 3) = "Other Authors"
    vBiblios(1, 4) = "Publication Year"
    vBiblios(1, 5) = "Title"
    vBiblios(1, 6) = "Published In"
    vBiblios(1, 7) = "More Info"

    For I = 1 To UBound(vData)
        Set mc = re.Execute(vData(I, 1))
        If mc.Count > 0 Then
            With mc(0)
                'Trim removes stray spaces captured next to the commas
                For J = 0 To 6
                    vBiblios(I + 1, J + 1) = Trim(.SubMatches(J))
                Next J
            End With
        End If
        'Entries that don't match are left blank
    Next I

    'Clear any previous output, then write the whole array in one step
    Set rRes = rRes.Resize(RowSize:=UBound(vBiblios, 1), ColumnSize:=UBound(vBiblios, 2))
    rRes.EntireColumn.Clear
    rRes.Value = vBiblios
    With rRes
        With .Rows(1)
            .Font.Bold = True
            .HorizontalAlignment = xlCenter
        End With
        .EntireColumn.AutoFit
    End With
End Sub
```
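Since the question is also tagged python, here is a rough sketch (not part of the original answer) that ports the same regular expression to Python's `re` module. It can be handy for checking how particular entries will be split before running the macro on thousands of rows. It assumes one reference per string; `parse_biblio`, `to_rows`, `BIBLIO_RE` and `LABELS` are hypothetical names made up for this example, and unmatched optional groups come back as empty strings.

```python
import re

# Same pattern as the VBA macro's .Pattern. Python's re module supports the
# constructs it uses (lazy quantifiers, non-capturing groups, \d, \s).
BIBLIO_RE = re.compile(
    r"(^[^,]+),?\s*([^,]+?)(?:,\s*([^(]+))?\s*\((\d{4})\)"
    r"\s*(.*?\.)\s*(?:In:\s*(.*)\.)?\s*(.*)",
    re.IGNORECASE | re.MULTILINE,
)

# Column labels, in the same order as the pattern's seven capture groups.
LABELS = ["First Author", "Second Author", "Other Authors",
          "Publication Year", "Title", "Published In", "More Info"]


def parse_biblio(entry):
    """Split one reference into a dict of the seven fields.

    Returns None when the pattern does not match, mirroring the macro,
    which leaves such rows blank.
    """
    m = BIBLIO_RE.search(entry)
    if m is None:
        return None
    # Optional groups that did not take part in the match are None; use "".
    # strip() drops stray spaces some groups capture (e.g. "Moschella M ").
    return {label: (value or "").strip()
            for label, value in zip(LABELS, m.groups())}


def to_rows(entries):
    """Yield a header row, then one 7-column row per entry."""
    yield LABELS
    for entry in entries:
        fields = parse_biblio(entry)
        if fields is None:
            yield [""] * len(LABELS)  # no match: blank row, like the macro
        else:
            yield [fields[label] for label in LABELS]
```

For Example 1 this gives "Abbott KW", "Snidal D", "", "2009", the title up to its first period, "Mattli W , Woods N (eds) The Politics of Global Regulation, pp. 44–88" and "Princeton University Press, Princeton, NJ". To get something Excel can open, the rows could be passed to `csv.writer(f).writerows(...)`. Like the macro, this inherits every assumption the pattern makes about the layout, for example that the title ends at its first period.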