Get country ISO codes from a website

Posted: 2017-11-18 12:38:06

Tags: powershell web-scraping powershell-v5.0

I am trying to retrieve the ISO country codes in CSV or JSON format. My code is as follows:

# ############################
$logFile = "$env:USERPROFILE\desktop\ISOCountry.log"
Start-Transcript -Path $logFile -Append
#########################################

$WebResponse = Invoke-WebRequest "http://kirste.userpage.fu-berlin.de/diverse/doc/ISO_3166.html"
#$WebResponse = Invoke-WebRequest "https://en.wikipedia.org/wiki/ISO_3166-1"
$PRETAG = $WebResponse.ParsedHtml.getElementsByTagName("PRE") | select -expand innertext
$PRETAG
$JsonText = $PRETAG | ConvertTo-csv
$JsonText
# end logging 
###########################
Stop-Transcript
###########################

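The data sits inside the PRE tag and is probably tab-delimited. I need help. I'm using this site because it is free.

I also tried retrieving the data from Wikipedia, but the following code does not return it: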
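`ConvertTo-Csv` serializes the properties of each input object, so piping a plain string into it produces a `Length` column rather than the table. A minimal sketch of turning the PRE text into objects first, assuming (not verified against the page) that each data line ends with a two-letter code, a three-letter code and a three-digit numeric code separated by whitespace. The sample lines and the `ConvertFrom-IsoPreText` name are illustrative only:

```powershell
# Parse lines such as "AFGHANISTAN   AF   AFG   004" into objects.
# The column layout is an assumption; adjust the regex to the real page.
function ConvertFrom-IsoPreText {
    param([Parameter(Mandatory)][string]$Text)

    # Case-sensitive match so ordinary lowercase words are not taken as codes.
    $pattern = '^\s*(?<Name>.+?)\s+(?<Alpha2>[A-Z]{2})\s+(?<Alpha3>[A-Z]{3})\s+(?<Numeric>\d{3})\s*$'
    foreach ($line in $Text -split '\r?\n') {
        if ($line -cmatch $pattern) {
            [pscustomobject]@{
                Country = $Matches['Name']
                Alpha2  = $Matches['Alpha2']
                Alpha3  = $Matches['Alpha3']
                Numeric = $Matches['Numeric']
            }
        }
    }
}

# Hypothetical sample in the assumed layout (tabs or spaces both match \s+).
# The header line is skipped because "A2" does not match [A-Z]{2}.
$sample = "Country`tA2`tA3`tNum`nAFGHANISTAN`tAF`tAFG`t004`nALBANIA`tAL`tALB`t008"

$countries = ConvertFrom-IsoPreText -Text $sample
$countries | ConvertTo-Csv -NoTypeInformation
$countries | ConvertTo-Json
```

Once the rows are objects, the same collection can go to `Export-Csv` or `ConvertTo-Json` unchanged.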

$URI = "https://en.wikipedia.org/wiki/ISO_3166-1"
$HTML = Invoke-WebRequest -Uri $URI
($HTML.ParsedHtml.getElementsByTagName('table') | Where{ $_.className -eq 'wikitable sortable' }).innerText

I'm still facing the same problem. Any help is appreciated.
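One possible cause, not confirmed here, is the exact comparison `$_.className -eq 'wikitable sortable'`: it fails whenever the table carries an extra class or the classes appear in a different order. A sketch of a token-based check instead; `Test-HtmlClass` is a hypothetical helper, and the objects below only imitate the `className` and `innerText` properties of the `ParsedHtml` table elements:

```powershell
# Returns $true when the whitespace-separated class list contains $ClassName.
function Test-HtmlClass {
    param(
        [string]$ClassList,
        [Parameter(Mandatory)][string]$ClassName
    )
    # Split on any whitespace and drop empty tokens before comparing.
    $tokens = $ClassList -split '\s+' | Where-Object { $_ }
    return $tokens -contains $ClassName
}

# Stand-ins for table elements; with ParsedHtml these would come from
# $HTML.ParsedHtml.getElementsByTagName('table').
$tables = @(
    [pscustomobject]@{ className = 'wikitable sortable extra-class'; innerText = 'codes' }
    [pscustomobject]@{ className = 'infobox';                        innerText = 'other' }
)

$tables | Where-Object { Test-HtmlClass -ClassList $_.className -ClassName 'wikitable' } |
    ForEach-Object { $_.innerText }
```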

2 answers:

Answer 0 (score: 1)

Here is what I did with HtmlAgilityPack. You can download the package from http://html-agility-pack.net/. It is a well-known and widely used library for scraping websites with XPath.

cls
Add-Type -Path "C:\temp\HtmlAgilityPack\lib\Net20\HtmlAgilityPack.dll"
$web = New-Object HtmlAgilityPack.HtmlWeb
[HtmlAgilityPack.HtmlDocument]$doc = $web.Load("https://en.wikipedia.org/wiki/ISO_3166-1")

## FILTER NEEDED CONTENT THROUGH X-PATH
[HtmlAgilityPack.HtmlNodeCollection]$country = $doc.DocumentNode.SelectNodes("//table[2]//tr//td[1]")
[HtmlAgilityPack.HtmlNodeCollection]$iso = $doc.DocumentNode.SelectNodes("//table[2]//tr//td[5]")

# go through both node lists in step and build one object per row
# (-lt, not -le, so the index never runs past the last node)
$rowCount = [Math]::Min($country.Count, $iso.Count)
$output = for ($i = 0; $i -lt $rowCount; $i++) {
    [pscustomobject]@{
        country = $country[$i].InnerText
        iso     = $iso[$i].InnerText
    }
}
# export csv
$output | ConvertTo-Csv -Delimiter ";" -NoTypeInformation | out-file C:\temp\iso.csv  -Force

This gives you the following output:

"country";"iso"
"Afghanistan";"ISO 3166-2:AF"
"Aland Islands !Åland Islands";"ISO 3166-2:AX"
"Albania";"ISO 3166-2:AL"
"Algeria";"ISO 3166-2:DZ"
"American Samoa";"ISO 3166-2:AS"

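The `iso` column above holds the text of the ISO 3166-2 link rather than the bare code, and a country cell can include what appears to be a hidden sort key before `!`. A small post-processing sketch, assuming those two patterns hold for every row; `Repair-IsoRow` is a hypothetical name:

```powershell
# Clean one scraped row: keep only the text after the last '!' in the
# country name and strip the "ISO 3166-2:" prefix from the code.
function Repair-IsoRow {
    param([Parameter(Mandatory, ValueFromPipeline)]$Row)
    process {
        [pscustomobject]@{
            country = ($Row.country -split '!')[-1].Trim()
            iso     = $Row.iso -replace '^ISO 3166-2:', ''
        }
    }
}

# Rows copied from the output shown above.
$rows = @(
    [pscustomobject]@{ country = 'Afghanistan';                  iso = 'ISO 3166-2:AF' }
    [pscustomobject]@{ country = 'Aland Islands !Åland Islands'; iso = 'ISO 3166-2:AX' }
)

$rows | Repair-IsoRow | Format-Table
```

Because the function accepts pipeline input, it can sit between `$output` and `ConvertTo-Csv` in the script above.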
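Edit: found a more efficient way.

Answer 1 (score: 0)

Thanks for the HtmlAgilityPack suggestion. I don't have admin rights to install modules, so I did it this way instead: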

<################################################CODE HEADER############################################
SCRIPT NAME       : ISO Country Code.ps1
DESCRIPTION       : 
RUNTIME PARAMETERS: 
INPUT PARAMETERS  : 
OUTPUT PARAMETERS : 
Date                                                     Developer                                    Description 
--------------------------------------  ----------------------------------------------    --------------------------------- 

################################################CODE HEADER############################################>

<################################################SAMPLE DATA############################################
name           : Zimbabwe
topLevelDomain : {.zw}
alpha2Code     : ZW
alpha3Code     : ZWE
callingCodes   : {263}
capital        : Harare
altSpellings   : {ZW, Republic of Zimbabwe}
region         : Africa
subregion      : Eastern Africa
population     : 14240168
latlng         : {-20.0, 30.0}
demonym        : Zimbabwean
area           : 390757.0
gini           : 
timezones      : {UTC+02:00}
borders        : {BWA, MOZ, ZAF, ZMB}
nativeName     : Zimbabwe
numericCode    : 716
currencies     : {@{code=BWP; name=Botswana pula; symbol=P}, @{code=GBP; name=British pound; symbol=£}, @{code=CNY; name=Chinese yuan; symbol=¥}, @{code=EUR; name=Euro; 
                 symbol=€}...}
languages      : {@{iso639_1=en; iso639_2=eng; name=English; nativeName=English}, @{iso639_1=sn; iso639_2=sna; name=Shona; nativeName=chiShona}, @{iso639_1=nd; iso639_2=nde; 
                 name=Northern Ndebele; nativeName=isiNdebele}}
translations   : @{de=Simbabwe; es=Zimbabue; fr=Zimbabwe; ja=ジンバブエ; it=Zimbabwe; br=Zimbabwe; pt=Zimbabué; nl=Zimbabwe; hr=Zimbabve; fa=زیمباوه}
flag           : https://restcountries.eu/data/zwe.svg
regionalBlocs  : {@{acronym=AU; name=African Union; otherAcronyms=System.Object[]; otherNames=System.Object[]}}
cioc           : ZIM
###############################################################################################################################>

# ############################
$logFile = 'D:\03_PowerShell\PowerShell Scripts\ISO Country Code\ISOCountryCodes.log'
Start-Transcript -Path $logFile -Append

#\h WRITES A LITERAL "h"; A BARE h WOULD INSERT THE 12-HOUR CLOCK VALUE
$isodate = Get-Date -Format 'ddMMMyyyy_HH\hmmss'

#BUILDING FILE PATH
$FilePath = "D:\03_PowerShell\PowerShell Scripts\ISO Country Code\"

#BUILDING JSON FILE PATH
$JsonFileName = "ISOCountryCode_$isodate"
$JsonFileExtn = ".json"
$JsonFileOutputPath = $FilePath+$JsonFileName+$JsonFileExtn

#BUILDING CSV FILE PATH
$CsvFileName = "ISOCountryCode_$isodate"
$CsvFileExtn = ".csv"

#CSV File Path
$CsvFileOutputPath = $FilePath+$CsvFileName+$CsvFileExtn

"The Log File is placed at: - $logFile"
"The ISO Date is: - $isodate "
"The Full File Path for the JSON File is: - $JsonFileOutputPath"
"The Full File Path for the CSV File is: - $CsvFileOutputPath"

#CLEAR RESULT SCREEN
cls

#INVOKE REST METHOD TO RETRIEVE DATA

$ISOCountryCode = Invoke-RestMethod "https://restcountries.eu/rest/v2/all"
#JOIN ARRAY FIELDS SUCH AS topLevelDomain, OTHERWISE Export-Csv WRITES "System.Object[]"
$ISOCountryCodeFormatted = $ISOCountryCode | Select-Object @{Name="Country Name";Expression={$_."name"}} `
                                                          ,@{Name = "Internet Domain"; Expression={$_."topLevelDomain" -join ", "}} `
                                                          ,@{Name = "Alpha 2 Code"; Expression={$_."alpha2Code"}} `
                                                          ,@{Name = "Alpha 3 Code"; Expression={$_."alpha3Code"}} `
                                                          ,@{Name = "Capital"; Expression={$_."capital"}} `
                                                          ,@{Name = "Continent"; Expression={$_."region"}} `
                                                          ,@{Name = "Area (LandMass)"; Expression={$_."area"}} `
                                                          ,@{Name = "Numeric Code"; Expression={$_."numericCode"}}

<#BELOW COLUMNS AND VALUES TO BE USED AS REQUIRED (ARRAY FIELDS NEED -join AS ABOVE FOR CSV)
#,@{Name = "population"; Expression={$_."population"}} `
#,@{Name = "Languages Used"; Expression={$_."languages"}} `
#,@{Name = "International Dialing Code"; Expression={$_."callingCodes"}} `
#,@{Name = "Latitude/Longitude"; Expression={$_."latlng"}} `
#,@{Name = "Translations"; Expression={$_."translations"}} `
#,@{Name = "Country Direction/Location"; Expression={$_."subregion"}} `
#,@{Name = "Timezones"; Expression={$_."timezones"}} `
#,@{Name = "Borders"; Expression={$_."borders"}}
#>

#TEST THE CODE TO CONVERT RESULTSET TO JSON
#$JsonText = $ISOCountryCodeFormatted | ConvertTo-Json
#$JsonText

$ISOCountryCodeFormatted | ft

#BUILD THE JSON FILE: ROOT OBJECT "ISOCountryCodes" WRAPPING THE DATA ARRAY
#(-Depth 5 AVOIDS THE DEFAULT DEPTH OF 2 TRUNCATING NESTED VALUES)
@{ ISOCountryCodes = @($ISOCountryCodeFormatted) } |
    ConvertTo-Json -Depth 5 |
    Set-Content -Path $JsonFileOutputPath -Encoding UTF8

#GENERATE CSV FILE
$ISOCountryCodeFormatted |Export-Csv $CsvFileOutputPath -encoding "unicode" -NoTypeInformation

#END TRANSCRIPT FOR LOGGING DATA FLOW
Stop-Transcript
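The export steps can be tried offline with a record shaped like the SAMPLE DATA block. The sketch below exercises three details of the script: array properties such as `topLevelDomain` need joining before `Export-Csv` (otherwise the cell reads `System.Object[]`), a literal `h` in a date format string has to be escaped, and wrapping the rows in one object lets `ConvertTo-Json` write the whole file in a single call. It assumes the system temp directory as the output folder instead of the author's `D:\` path, and uses a fixed date so the file name is predictable:

```powershell
# One record shaped like the SAMPLE DATA block (subset of its fields).
$record = [pscustomobject]@{
    name           = 'Zimbabwe'
    topLevelDomain = @('.zw')
    alpha2Code     = 'ZW'
    alpha3Code     = 'ZWE'
    numericCode    = '716'
}

# Calculated properties; -join turns arrays into a single CSV-friendly string.
$formatted = $record | Select-Object @{ Name = 'Country Name';    Expression = { $_.name } },
                                     @{ Name = 'Internet Domain'; Expression = { $_.topLevelDomain -join ', ' } },
                                     @{ Name = 'Alpha 2 Code';    Expression = { $_.alpha2Code } },
                                     @{ Name = 'Alpha 3 Code';    Expression = { $_.alpha3Code } },
                                     @{ Name = 'Numeric Code';    Expression = { $_.numericCode } }

# \h is a literal "h"; an unescaped h would insert the 12-hour clock value.
$stamp = ([datetime]'2017-11-18 12:38:06').ToString('ddMMMyyyy_HH\hmmss', [cultureinfo]::InvariantCulture)

$dir      = [System.IO.Path]::GetTempPath()
$csvPath  = Join-Path $dir "ISOCountryCode_$stamp.csv"
$jsonPath = Join-Path $dir "ISOCountryCode_$stamp.json"

$formatted | Export-Csv -Path $csvPath -NoTypeInformation -Encoding UTF8

# Wrap the rows in a root object and serialize them in one call.
@{ ISOCountryCodes = @($formatted) } | ConvertTo-Json -Depth 5 |
    Set-Content -Path $jsonPath -Encoding UTF8
```

Reading both files back with `Import-Csv` and `ConvertFrom-Json` is a quick way to confirm the joined domain column and the JSON root property.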