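PowerShell: storing a website locally while keeping the object type
I want to save a website offline as an object. I am using PowerShell 5.1.19041.546 on Windows 10.
# Online analysis (works)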
$website = Invoke-WebRequest https://www.w3schools.com/html/html_tables.asp
$website | gm
# I get a Microsoft.PowerShell.Commands.HtmlWebResponseObject object
# Next I pass $website to a function (I call it Get-WebRequestTable) that expects a [Microsoft.PowerShell.Commands.HtmlWebResponseObject] $WebRequest input object: https://www.leeholmes.com/blog/2015/01/05/extracting-tables-from-powershells-invoke-webrequest/
# Offline analysis: saving the website locally and importing it with Get-Content (does not work)
#saving the website locally
Invoke-WebRequest -Uri https://www.w3schools.com/html/html_tables.asp -OutFile C:\temp\website
#writing the website back to a variable
$offlinedata = Get-Content C:\temp\website
#I get a string object
$offlinedata | gm
# The string cannot be used in the function: Get-WebRequestTable : Cannot process argument transformation on parameter 'WebRequest'. Cannot convert the "System.Object[]" value of type "System.Object[]" to type "Microsoft.PowerShell.Commands.HtmlWebResponseObject".
Get-WebRequestTable -WebRequest $offlinedata
# Offline analysis: saving the website as XML (does not work)
Invoke-WebRequest -Uri https://www.w3schools.com/html/html_tables.asp | Export-Clixml C:\temp\website.xml
This runs for a very long time, and I get the following XML (abbreviated):
<Objs Version="1.1.0.1" xmlns="http://schemas.microsoft.com/powershell/2004/04">
[...] <S>System.__ComObject</S>
<S>System.__ComObject</S>
This seems to cause an infinite loop:
<S>System.__ComObject</S>
# Converting it to JSON to store it locally (does not work)
$website = Invoke-WebRequest -Uri https://www.w3schools.com/html/html_tables.asp
$website | ConvertTo-Json
I get:
ConvertTo-Json : An item with the same key has already been added.
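Does anyone know how to store a website locally and later restore a [Microsoft.PowerShell.Commands.HtmlWebResponseObject] object for further processing?
Solution
This code loads local HTML into an HTML document object ([HTMLDocumentClass]), which offers the same DOM methods as the ParsedHtml property of an HtmlWebResponseObject: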
function convert-localhtml($localhtmlpath) {
    $HTML = New-Object -ComObject "HTMLFile"
    $website = Get-Content "$localhtmlpath" -Raw -ErrorAction Stop
    # Write HTML content according to DOM Level2
    $HTML.IHTMLDocument2_write($website)
    $HTML
}
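The -Raw switch in the function above matters: without it, Get-Content returns one string per line, which is where the "System.Object[]" in the question's error message comes from. The following is a minimal sketch of the difference; the three-line HTML sample and the temporary file are made up for illustration.

```powershell
# Hypothetical three-line HTML sample written to a temporary file.
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value '<table>', '<tr><td>1</td></tr>', '</table>'

# Default behaviour: one string per line, collected in an array.
$lines = Get-Content -Path $path
$lines.GetType().Name      # Object[]
$lines.Count               # 3

# With -Raw: the whole file as a single string, suitable for IHTMLDocument2_write.
$text = Get-Content -Path $path -Raw
$text.GetType().Name       # String

# Clean up the temporary file.
Remove-Item -Path $path
```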
Credit to Prateek Singh: https://ridicurious.com/2017/01/24/powershell-tip-parsing-html-from-a-local-file-or-a-string/
I slightly modified Lee Holmes's code so that it can handle both object types: [Microsoft.PowerShell.Commands.HtmlWebResponseObject] if you use Invoke-WebRequest, and [HTMLDocumentClass] if you use convert-localhtml.
https://www.leeholmes.com/blog/2015/01/05/extracting-tables-from-powershells-invoke-webrequest/
Kudos to him for the excellent table extraction code.
function Get-WebRequestTable {
    param(
        [Parameter(Mandatory = $true)]
        $WebRequest,

        [Parameter(Mandatory = $true)]
        [int]$TableNumber
    )

    # Ensure that a supported type was passed.
    if (($WebRequest.GetType().Name -ne "HTMLDocumentClass") -and
        ($WebRequest.GetType().Name -ne "HtmlWebResponseObject")) {
        throw "Unsupported argument type. Need [Microsoft.PowerShell.Commands.HtmlWebResponseObject] or [HTMLDocumentClass]."
    }

    if ($WebRequest -is [Microsoft.PowerShell.Commands.HtmlWebResponseObject]) {
        $tables = @($WebRequest.ParsedHtml.getElementsByTagName("TABLE"))
    }
    else {
        # An [HTMLDocumentClass] argument was given.
        $tables = @($WebRequest.getElementsByTagName("TABLE"))
    }

    ## Extract the requested table
    $table = $tables[$TableNumber]
    $titles = @()
    $rows = @($table.Rows)

    ## Go through all of the rows in the table
    foreach ($row in $rows) {
        $cells = @($row.Cells)

        ## If we've found a table header, remember its titles
        if ($cells[0].tagName -eq "TH") {
            $titles = @($cells | ForEach-Object { ("" + $_.InnerText).Trim() })
            continue
        }

        ## If we haven't found any table headers, make up names "P1", "P2", etc.
        if (-not $titles) {
            $titles = @(1..($cells.Count + 2) | ForEach-Object { "P$_" })
        }

        ## Now go through the cells in the row. For each, try to find the
        ## title that represents that column and create a hashtable mapping those
        ## titles to content
        $resultObject = [Ordered]@{}
        for ($counter = 0; $counter -lt $cells.Count; $counter++) {
            $title = $titles[$counter]
            if (-not $title) { continue }
            $resultObject[$title] = ("" + $cells[$counter].InnerText).Trim()
        }

        ## And finally cast that hashtable to a PSCustomObject
        [pscustomobject]$resultObject
    }
}
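A possible end-to-end usage sketch, assuming both functions above are already loaded in a Windows PowerShell 5.1 session and that the folder C:\temp exists. The URL and file path come from the question; the table index 0 is an assumption, and which table it selects depends on the page.

```powershell
# One-time download while online (URL and path taken from the question).
Invoke-WebRequest -Uri 'https://www.w3schools.com/html/html_tables.asp' -OutFile 'C:\temp\website'

# Later, offline: rebuild a DOM document from the saved file.
$offlineDoc = convert-localhtml 'C:\temp\website'

# Extract a table from the offline document (index 0 is an assumption).
$rows = Get-WebRequestTable -WebRequest $offlineDoc -TableNumber 0
$rows | Format-Table -AutoSize

# The same function still accepts a live response object when online.
$online = Invoke-WebRequest -Uri 'https://www.w3schools.com/html/html_tables.asp'
Get-WebRequestTable -WebRequest $online -TableNumber 0 | Format-Table -AutoSize
```

Saving only the raw HTML sidesteps the serialization problems from the question, because the COM-based document is rebuilt on demand instead of being written to disk.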