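I have a lot of publications, and all the data for each one sits in a single cell. I would like to split the data in each cell into separate columns: one for authors, title, year, journal, date, and so on.
For example, in one cell I have:
Plant RA, Page JP, Bonham JH, Jones JP Stairway to Heaven (1971) Led Zepplin Dec 5; 1(39):14802-14084
Here Plant RA, Page JP, Bonham JH, Jones JP are the authors' names, Stairway to Heaven is the title, (1971) the year, Led Zepplin the journal name, Dec 5 the date, 1(39) the issue, and 14802-14084 the pages.
Each citation differs in length, number of authors, and so on. Is it still possible to automate this process? With formulas such as LEFT(B1,SEARCH(" ",B1)-1)
there are too many variables for some of the fields, but I can't help feeling this must be a problem people run into often. Is a simple solution possible?
Thanks in advance.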
Answer 0 (score: 0)
It depends on how much your citation formats vary, but regular expressions are a very useful tool for automating complicated string handling. The code here shows how one possible solution could be structured: GetFormat works out which known citation format a string follows, and GetYear then extracts the year using the pattern for that format. Run the Test() subroutine to see the process.
MSDN Regular Expression Quick Reference
Sub Test()
    Dim s(1 To 3) As String
    Dim test_string As String
    'Format A
    s(1) = "Plant RA, Page JP, Bonham JH, Jones JP Stairway to Heaven (1971) Led Zepplin Dec 5;1(39):14802-14084"
    'Format B
    s(2) = "Plant RA, Page JP, Bonham JH, Jones JP Stairway to Heaven [1971] Led Zepplin Dec 5;1(39):14802-14084"
    'Unknown Format
    s(3) = "Plant RA, Page JP, Bonham JH, Jones JP Stairway to Heaven (1971) Led Zepplin Dec 5-1(39):14802-14084"
    'Change the index to try the other sample strings
    test_string = s(1)
    MsgBox GetFormat(test_string) & Chr(10) & GetYear(test_string)
End Sub
Function GetYear(ByVal s As String) As String
    'Returns the year found in citation (s), or a message if it cannot be determined
    Dim YearPattern As Object, Result As Object, r As Object
    Dim F As String
    Set YearPattern = CreateObject("Scripting.Dictionary")
    YearPattern.Add "FormatA", "\(\d{4}\)"
    YearPattern.Add "FormatB", "\[\d{4}\]"
    F = GetFormat(s)
    If F = "Unknown Format" Then
        GetYear = "Error: Format not recognized"
    Else
        Set Result = FindPattern(s, YearPattern(F))
        If Result.Count = 0 Then
            GetYear = "![No Result]"
        ElseIf Result.Count = 1 Then
            'Clean only the matched text, using the "Year" rules
            GetYear = Clean(Result(0).Value, "Year")
        Else
            'List every candidate year, separated by commas
            For Each r In Result
                GetYear = GetYear & ", " & Clean(r.Value, "Year")
            Next
            GetYear = "![Multiple results]: " & Mid(GetYear, 3)
        End If
    End If
End Function
Function GetFormat(ByVal s As String) As String
    'Returns the name of the first known citation format that (s) matches
    Dim FormatPatterns As Object
    Set FormatPatterns = CreateObject("Scripting.Dictionary")
    FormatPatterns.Add "FormatA", ",+.*\(\d{4}\).*;.*\):"
    FormatPatterns.Add "FormatB", ",+.*\[\d{4}\].*;.*\):"
    If FindPattern(s, FormatPatterns("FormatA")).Count > 0 Then
        GetFormat = "FormatA"
    ElseIf FindPattern(s, FormatPatterns("FormatB")).Count > 0 Then
        GetFormat = "FormatB"
    Else
        GetFormat = "Unknown Format"
    End If
End Function
Function FindPattern(ByVal s As String, ByVal p As String) As Object
    'Argument 1: The string to execute regular expressions on (s)
    'Argument 2: A pattern string to execute (p)
    'Return Value: A MatchCollection holding every match of (p) in (s)
    Dim r As Object
    Set r = CreateObject("vbscript.regexp")
    r.Global = True
    r.IgnoreCase = True
    r.MultiLine = True
    r.Pattern = p
    Set FindPattern = r.Execute(s)
End Function
Function Clean(ByVal s As String, Optional ByVal CType As String) As String
    'Removes unwanted characters from a string (s)
    'based on the specified type of string (CType)
    Select Case CType
        Case "Year"
            'Strip the round or square brackets around the year
            s = Replace(Replace(s, "(", ""), ")", "")
            s = Replace(Replace(s, "[", ""), "]", "")
            Clean = Replace(s, ": ,", ": ")
        Case Else
            Clean = Replace(s, ": ,", ": ")
    End Select
End Function
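
The same idea can be taken one step further: a single pattern with capture groups can split a whole citation into its fields. What follows is only a minimal sketch, and it assumes the layout of the example in the question: authors written as "Surname INITIALS" and separated by commas, a four-digit year in round brackets, an abbreviated English month followed by the day, then issue and pages. ParseCitation and DemoParseCitation are illustrative names that do not appear in the code above, and citations in any other layout would need their own patterns.

```vba
' Splits one citation into its parts with a single regular expression.
' Assumed layout (taken from the example in the question):
'   Authors Title (Year) Journal Mon D;Vol(Issue):First-Last
' Returns a 0-based array: authors, title, year, journal, date, issue, pages.
' Returns Empty when the citation does not match the assumed layout.
Function ParseCitation(ByVal s As String) As Variant
    Dim authorPat As String, monthPat As String
    Dim re As Object, found As Object
    Dim parts(0 To 6) As String
    Dim i As Long

    ' One author: capitalised surname, a space, one to three capital initials
    authorPat = "[A-Z][A-Za-z'-]+ [A-Z]{1,3}"
    ' Abbreviated month names only (an assumption about the data)
    monthPat = "(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"

    Set re = CreateObject("VBScript.RegExp")
    re.Global = False
    re.IgnoreCase = False   ' case matters: initials must be upper case
    re.Pattern = "^\s*((?:" & authorPat & ",\s*)*" & authorPat & ")" & _
                 "\s+(.+?)\s*\((\d{4})\)" & _
                 "\s*(.+?)\s+(" & monthPat & "\s+\d{1,2})" & _
                 "\s*;\s*(\d+\(\d+\))\s*:\s*(\d+-\d+)\s*$"

    Set found = re.Execute(s)
    If found.Count = 0 Then Exit Function   ' return value stays Empty

    ' Each capture group becomes one field
    For i = 0 To 6
        parts(i) = found(0).SubMatches(i)
    Next i
    ParseCitation = parts
End Function

Sub DemoParseCitation()
    Dim fields As Variant
    fields = ParseCitation("Plant RA, Page JP, Bonham JH, Jones JP " & _
        "Stairway to Heaven (1971) Led Zepplin Dec 5;1(39):14802-14084")
    If IsEmpty(fields) Then
        Debug.Print "Format not recognized"
    Else
        Debug.Print Join(fields, " | ")
    End If
End Sub
```

To fill worksheet columns, a result that is not Empty can be written out in one step, for example Range("C1").Resize(1, 7).Value = fields, assuming the citation sits in B1 as in the question. The split between authors and title relies entirely on the "Surname INITIALS" convention, so names that break it (lower-case prefixes, multi-word surnames) would need a looser author pattern.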