String matching/searching in Python

Time: 2018-06-10 20:09:05

Tags: python regex wikipedia pywikibot

I am trying to scrape and clean Wikipedia data. I have a data field containing dimensions, with values like the samples listed below.


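Extracting the dimensions is easy, but given how much the entries vary, extracting the units is quite hard. What is the best way to approach this?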

I have started with:

["112 x 76 yards (102.4m x 69.4m)", "104.5 x 70.3 m", "107m x 72m", 
 "109×73 yds / 100×67 m", "{{convert|105|x|68|m|yd|1}}", "100 metres by 70 metres"]

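This should extract all the dimensions. I would then keep only the first 2 number matches and the first unit match ('m', 'metres', 'meters', 'y', 'yards', 'yds', 'yard', 'ft', ...), after which I can convert everything to metres.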

I am just not sure how to save the first unit match.
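
One possible way to keep only the first unit match is `re.search`, which stops at the first hit, while `re.findall` collects the numbers. The sketch below is illustrative only: the names `NUMBER`, `UNIT`, `TO_METRES` and `parse_dimensions` are made up for this example, the unit spellings are assumed from the list above, and it assumes the first unit found applies to the first two numbers.

```python
import re

# Sample values copied from the question.
SAMPLES = [
    "112 x 76 yards (102.4m x 69.4m)",
    "104.5 x 70.3 m",
    "107m x 72m",
    "109×73 yds / 100×67 m",
    "{{convert|105|x|68|m|yd|1}}",
    "100 metres by 70 metres",
]

# Integer or decimal number, e.g. "112" or "102.4".
NUMBER = re.compile(r"\d+(?:\.\d+)?")

# A unit token that is not glued to other letters, so the "y" in "by"
# is not mistaken for yards. Digits may precede it ("107m").
UNIT = re.compile(
    r"(?<![a-z])"
    r"(metres|meters|metre|meter|yards|yard|yds|yd|feet|foot|ft|m|y)"
    r"(?![a-z])"
)

# Conversion factors to metres (assumed spellings, extend as needed).
TO_METRES = {}
TO_METRES.update(dict.fromkeys(["metres", "meters", "metre", "meter", "m"], 1.0))
TO_METRES.update(dict.fromkeys(["yards", "yard", "yds", "yd", "y"], 0.9144))
TO_METRES.update(dict.fromkeys(["feet", "foot", "ft"], 0.3048))


def parse_dimensions(text):
    """Return (length, width) in metres, or None if parsing fails."""
    text = text.lower()
    numbers = NUMBER.findall(text)[:2]   # keep the first two numbers only
    unit = UNIT.search(text)             # search() yields the first match only
    if len(numbers) < 2 or unit is None:
        return None
    factor = TO_METRES[unit.group(1)]
    return tuple(round(float(n) * factor, 1) for n in numbers)


if __name__ == "__main__":
    for sample in SAMPLES:
        print(sample, "->", parse_dimensions(sample))
```

Entries that mix units within the first two numbers (for example a length in yards and a width in metres) would need a per-number unit lookup instead.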

0 answers:

No answers