How to fix an Etsy product scraper that only extracts one row of data
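I'm trying to extract some product data from Etsy.com. I can't tell whether the extraction fails because my parent class is wrong or because of some other problem. I've tried several classes as the parent; the current one lets me pull a single row.
I wait for the page to load and scroll down so the listings load fully rather than lazily, but I still only get one row of data.
The code below usually works for me: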
Set Html = objIE.document
' Parent container class
Set elements = Html.getElementsByClassName("bg-white display-block pb-xs-2 mt-xs-0")

For Each element In elements
    ' Element 1: product URL (first <a> inside the listing card)
    If element.getElementsByClassName("js-merch-stash-check-listing v2-listing-card position-relative flex-xs-none ")(0).getElementsByTagName("a")(0) Is Nothing Then
        ' Not found: write a hyphen in the next empty cell of column A
        wsSheet.Cells(wsSheet.Cells(wsSheet.Rows.Count, "A").End(xlUp).Row + 1, "A").Value = "-"
    Else
        HtmlText = element.getElementsByClassName("js-merch-stash-check-listing v2-listing-card position-relative flex-xs-none ")(0).getElementsByTagName("a")(0).href
        ' Found: write the URL in the next empty cell of column A
        wsSheet.Cells(wsSheet.Cells(wsSheet.Rows.Count, "A").End(xlUp).Row + 1, "A").Value = HtmlText
    End If

    ' Element 2: product title
    If element.getElementsByClassName("text-gray text-truncate mb-xs-0 text-body")(0) Is Nothing Then
        wsSheet.Cells(wsSheet.Cells(wsSheet.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = "-"
    Else
        HtmlText = element.getElementsByClassName("text-gray text-truncate mb-xs-0 text-body")(0).innerText
        wsSheet.Cells(wsSheet.Cells(wsSheet.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = HtmlText
    End If

    ' Element 3 ...
Next element
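EDIT 2
I thought I had solved the problem, so I didn't post the original question above. With the parent class below, I was able to pull a full page of 50+ items into column A. I haven't changed anything since then, but I can't reproduce that result: all I get is one row, and I don't understand why. I've been trying to solve this for a while and can't work out what's wrong. The class below worked once and pulled 50+ results; now it only returns 1 row. I've cleared the entire browser cache and restarted the PC.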
Set Html = objIE.document
Set elements = Html.getElementsByClassName("wt-grid wt-grid--block wt-pl-xs-0 tab-reorder-container") ' parent CLASS
'FOR LOOP
For Each element In elements
I tried the following classes; only two produced any results, as noted in the comments:
'wt-mt-xs-2 wt-text-black
'col-group pl-xs-0 search-listings-group pr-xs-1
'col-xs-12 pl-xs-1 pl-md-3
'responsive-listing-grid wt-grid wt-grid--block wt-justify-content-flex-start wt-list-unstyled pl-xs-0
'bg-white display-block pb-xs-2 mt-xs-0
'''''wt-grid wt-grid--block wt-pl-xs-0 tab-reorder-container 'Can only do 1 row
'''''wt-grid wt-grid--block wt-pl-xs-0 tab-reorder-container 'I was able to pull of 50+ items Now not working
'wt-list-unstyled wt-grid__item-xs-6 wt-grid__item-md-4 wt-grid__item-lg-3 wt-grid__item-xl-3 wt-order-xs-0 wt-order-md-0 wt-order-lg-0 wt-order-xl-0 wt-show-xs wt-show-md wt-show-lg wt-show-xl tab-reorder
'js-merch-stash-check-listing v2-listing-card position-relative flex-xs-none
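One way to check which of these classes can serve as a per-item container is to count how many elements each one matches. The sketch below is a hypothetical diagnostic (the sub name CountClassMatches is made up, and the class list is only a sample taken from the list above); it assumes the page has already loaded and been scrolled. If a class reports 1, a For Each over it runs only once, and an (0) index inside the loop then picks just the first listing, which would produce exactly one row.

```vba
' Hypothetical diagnostic: prints how many elements each candidate class string matches.
' A container that matches only once makes For Each iterate once.
Sub CountClassMatches(ByVal doc As Object)
    Dim candidates As Variant, c As Variant

    ' Sample candidates copied from the list above (assumption: adjust as needed)
    candidates = Array( _
        "wt-grid wt-grid--block wt-pl-xs-0 tab-reorder-container", _
        "bg-white display-block pb-xs-2 mt-xs-0", _
        "v2-listing-card", _
        "v2-listing-card__info")

    For Each c In candidates
        ' Match count first, then the class string, in the Immediate window
        Debug.Print doc.getElementsByClassName(c).Length, c
    Next c
End Sub
```

For example, call CountClassMatches objIE.document right after the scroll step.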
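Each item has its own li element; see the markup below for details.
Question: can someone tell me what I'm doing wrong? (I managed to get 50+ results with the second parent class, but now it only gets 1 row and I can't work out why.)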
<li class="wt-list-unstyled wt-grid__item-xs-6 wt-grid__item-md-4 wt-grid__item-lg-3 wt-grid__item-xl-3 wt-order-xs-0 wt-order-md-0 wt-order-lg-0 wt-order-xl-0 wt-show-xs wt-show-md wt-show-lg wt-show-xl tab-reorder">
<div class="js-merch-stash-check-listing v2-listing-card position-relative flex-xs-none " data-palette-listing-id="973170689" data-shop-id="" data-listing-id="973170689" data-behat-listing-card="" data-listing-card-v2="">
<a class="6dd4c4354676ccda display-inline-block listing-link logged" data-listing-id="973170689" data-palette-listing-image="" href="https://www.etsy.com/uk/listing/973170689/deconstructed-iphone-5-artwork?ga_order=most_relevant&ga_search_type=all&ga_view_type=gallery&ga_search_query=phones&ref=sc_gallery-1-1&plkey=247d3e6c1599979de70c884db995d78e95827f21%3A973170689&frs=1"
data-display-loc="w.0" data-page-num="1" data-position-num="1" data-logging-key="247d3e6c1599979de70c884db995d78e95827f21:973170689" target="etsy.973170689" title="Deconstructed iPhone 5 artwork">
<div class="v2-listing-card__img position-relative">
<div data-listing-card-image="">
<div class="placeholder placeholder-landscape ">
<div class="placeholder-content ">
<div class="placeholder vertically-centered-placeholder placeholder-landscape">
<div class="height-placeholder">
<img data-listing-card-listing-image="" src="https://i.etsystatic.com/27880825/c/2250/1788/0/538/il/116587/2961533797/il_340x270.2961533797_r4pc.jpg" class="width-full wt-height-full display-block position-absolute " alt="">
</div>
</div>
</div>
</div>
</div>
</div>
<div class="v2-listing-card__info
">
<div>
<h3 class="text-gray text-truncate mb-xs-0 text-body ">
Deconstructed iPhone 5 artwork
</h3>
<p>
</p>
<div class="v2-listing-card__shop">
<p class="text-gray-lighter text-body-smaller display-inline-block" aria-hidden="true"><span class="p06299890 c968b3da8">E</span><span class="c968b3da8 s0cd3f014">A</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span>
<span
class="p06299890 c968b3da8">E</span><span class="c968b3da8 s0cd3f014">d</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="c968b3da8 s0cd3f014"> </span><span class="p06299890 c968b3da8">E</span>
<span
class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="c968b3da8 s0cd3f014">b</span>
<span
class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="c968b3da8 s0cd3f014">y</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="c968b3da8 s0cd3f014"> </span>
<span
class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span>dissectProjects</p>
<p class="screen-reader-only">Ad from shop dissectProjects</p>
<span class="v2-listing-card__rating icon-t-2 display-block">
</span>
</div>
<span class="n-listing-card__price text-gray mt-xs-0 strong display-block
text-body-larger
">
<span class="currency-symbol">£</span><span class="currency-value">120.00</span>
<span class="text-body-smaller no-wrap">
<span class="wt-badge wt-badge--small wt-badge--sale-01">
FREE UK delivery</span>
</span>
</span>
<p></p>
</div>
</div>
</a>
<div data-favorite-button-wrapper="" class="v2-listing-card__actions z-index-1 position-absolute">
<button class="inline-overlay-trigger favorite-item-action position-absolute favorite-listing-button p-xs-1 has-hover-state z-index-1 btn-transparent position-right in-search v2-listing-card__favorite" data-ui="favorite-listing-button" data-listing-id="973170689"
data-accessible-btn-fave="" data-favorite-label="Add to Favourites" data-favorited-label="Remove from Favourites">
<div data-source="search" data-btn-fave="" data-neu-fave="">
<span class="favorite-listing-button-icon-container icon-circle-container bg-white icon-group p-xs-1
" data-favorite-icon-container="">
<span class="etsy-icon icon-smaller text-gray wt-display-block
" data-not-favorited-icon=""><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" focusable="false"><path d="M12,21C10.349,21,2,14.688,9,5.579,4.364,3,7.5,3A6.912,6.912,1,12,5.051,6.953,16.5,3C19.636,22,13.651,21ZM7.5,5C5.472,5,4,6.683,9c0,4.108,6.432,9.325,8,10,1.564-.657,8-5.832,8-10,0-2.317-1.472-4-3.5-4-1.979,0-3.7,2.105-3.721,2.127L11.991,8.1,11.216,7.12C11.186,7.083,9.5,5Z"></path></svg></span>
<span class="etsy-icon icon-smaller text-red wt-display-none
" data-favorited-icon=""><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" focusable="false"><path d="M16.5,3A6.953,3C4.364,5.688,8.349,12S22,9C22,19.636,3Z"></path></svg></span>
</span>
</div>
<!--icon font and display:none; elements -->
<span aria-hidden="true" class="icon"></span>
<span class="screen-reader-only default" data-a11y-label="">
Add to Favourites
</span>
</button>
</div>
</div>
</li>
Update after trying SIM's code
I use this to scroll down the browser:
objIE.document.parentwindow.Scroll 0&,9999 ' Scrolls Down the browser
''##################### Update today #####################
I'm guessing the parent class is v2-listing-card__info,
but if I'm not mistaken the PRODUCT URL isn't inside it, so how do I get that?
''###################### Update 19/3/2021 #####################
Many thanks to SIM for the support, and thanks to Qharr for the input. I finally got it working, thank you all.
As always, thanks in advance.
Solution
Try this:
' Requires references to "Microsoft Internet Controls" and "Microsoft HTML Object Library"
Sub GetTitles()
    Dim IE As New InternetExplorer, HTML As HTMLDocument
    Dim posts As Object, post As Object, startTime As Double
    Dim timeout As Integer, prevlen&, curlen&

    timeout = 5 ' seconds to wait for new listings before stopping

    With IE
        .Visible = True
        .navigate "https://www.etsy.com/uk/search?q=phones"
        While .Busy = True Or .readyState < 4: DoEvents: Wend
        Set HTML = .document
    End With

    prevlen = HTML.getElementsByClassName("v2-listing-card").Length
    startTime = Timer
    ' Keep scrolling until no new listing cards have appeared for `timeout` seconds
    Do
        HTML.parentWindow.scrollBy 0, 99999
        DoEvents
        Set posts = HTML.getElementsByClassName("v2-listing-card")
        curlen = posts.Length
        If curlen > prevlen Then
            startTime = Timer   ' new items loaded: reset the timer
            prevlen = curlen
        End If
    Loop While Round(Timer - startTime, 2) <= timeout

    ' One iteration per listing card, so every product is printed
    For Each post In posts
        Debug.Print post.getElementsByTagName("h3")(0).innerText
        Debug.Print post.getElementsByClassName("listing-link")(0).getAttribute("href")
    Next post

    IE.Quit
End Sub
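The sub above prints to the Immediate window. Since the question writes results to worksheet columns A and B, here is a hedged sketch of a hypothetical helper, WriteListingsToSheet, that could replace the For Each ... Next post block. It assumes posts is the collection built in GetTitles and that the target worksheet is passed in; the helper name and signature are illustrative, not part of the original answer.

```vba
' Hypothetical helper: writes one row per listing card, URL in column A, title in column B.
' Assumes `posts` is the collection from getElementsByClassName("v2-listing-card").
Sub WriteListingsToSheet(ByVal posts As Object, ByVal ws As Worksheet)
    Dim post As Object, links As Object, titles As Object
    Dim r As Long

    ' First empty row below existing data in column A (row 2 if the column is empty)
    r = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row + 1

    For Each post In posts
        Set links = post.getElementsByClassName("listing-link")
        Set titles = post.getElementsByTagName("h3")

        ' Write a hyphen when an element is missing, as the question's code does
        If links.Length > 0 Then
            ws.Cells(r, "A").Value = links(0).getAttribute("href")
        Else
            ws.Cells(r, "A").Value = "-"
        End If

        If titles.Length > 0 Then
            ws.Cells(r, "B").Value = Trim$(titles(0).innerText)
        Else
            ws.Cells(r, "B").Value = "-"
        End If

        r = r + 1 ' one row per product
    Next post
End Sub
```

For example: WriteListingsToSheet posts, ThisWorkbook.Worksheets("Sheet1") (the sheet name is an assumption). Because the row counter advances once per card, each product lands on its own row, and checking .Length before indexing avoids the runtime error that (0) on an empty collection would otherwise raise.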
By the way, if you use v2-listing-card__info as the container, make sure you get the product link with the following line:
post.ParentNode.ParentNode.getElementsByClassName("listing-link")(0).href
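ParentNode.ParentNode depends on the exact nesting shown in the HTML above (the v2-listing-card__info div sits inside the listing-link anchor, which sits inside the v2-listing-card div). A slightly more tolerant alternative, sketched below with the hypothetical helper names HasClass and FindAncestorByClass, walks up the tree until it reaches an element carrying a given class. HasClass compares whole space-separated tokens, so v2-listing-card__info is not mistaken for v2-listing-card.

```vba
' Hypothetical helper: True if the space-separated class list `classAttr` contains `token`.
Function HasClass(ByVal classAttr As String, ByVal token As String) As Boolean
    Dim s As String
    ' Treat line breaks and tabs as separators (some class values span several lines)
    s = Replace(Replace(Replace(classAttr, vbCr, " "), vbLf, " "), vbTab, " ")
    ' Pad with spaces so only whole tokens match
    HasClass = InStr(1, " " & s & " ", " " & token & " ", vbBinaryCompare) > 0
End Function

' Hypothetical helper: returns `el` or its nearest ancestor carrying class `cls`,
' or Nothing if no such element exists.
Function FindAncestorByClass(ByVal el As Object, ByVal cls As String) As Object
    Dim node As Object
    Set node = el
    Do While Not node Is Nothing
        ' Only element nodes (nodeType 1) have a class attribute
        If node.nodeType = 1 Then
            If HasClass(node.className & "", cls) Then
                Set FindAncestorByClass = node
                Exit Function
            End If
        End If
        Set node = node.parentNode
    Loop
End Function
```

For example: Set card = FindAncestorByClass(post, "v2-listing-card"), then read card.getElementsByClassName("listing-link")(0).href.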