Scrapy: CSS selector only extracts the first two rows from the table; I want every row


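I'm trying to extract the text from all td elements of a table on the page below. I'm using a CSS selector for this, but somehow it gives no output. I double-checked the selector in the browser, where it works, but it does not work in Scrapy.

HTML link: https://trusting-sinoussi-0dbf65.netlify.app/

On this HTML page there are two tables with the same name; the first is empty and the second contains all the data. Below is my Scrapy code, which selects the tr elements and then extracts the td text. Because there is more than one table, I select the table by its summary attribute. I use a for loop to extract the td text from every tr: the variable courses holds all the scraped tr elements, and I pass it to the for loop to extract the td from each one. But somehow the output shows only the first tr; it does not select every tr in this table.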

When I extract using

response.css('tr').extract()

the output contains only the first two tr elements.

But I want all the td text from every tr, so I tried selecting td instead of tr:

response.css('td.dddefault ::text').extract()

That returns every td in the table. But I can't understand why, when I use tr as the tag to extract, it outputs only the values of the first two rows.

def course_scrap(self, response):
    print('course selected')
    courses = response.css('table.datadisplaytable[summary="This layout table is used to present the course found"] tr')

    for course in courses:
        trs = course.css('td.dddefault ::text').extract()

    print(trs)

Solution

To get the first actual data row, use:

table.datadisplaytable[summary="This layout table is used to present the course found"] tr:nth-child(3)

And for the second:

table.datadisplaytable[summary="This layout table is used to present the course found"] tr:nth-child(4)

The first two tr elements are header rows.

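The same approach gives a shorter locator for the table. It is written here with an explicit table element, because the CSS-to-XPath translation that Scrapy relies on does not handle :nth-of-type without an element name:

table.datadisplaytable:nth-of-type(2)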

This should get the numbers and their associated text from the table on that site:

import scrapy


class TrustingSpider(scrapy.Spider):
    name = "trusting"
    start_urls = ['https://trusting-sinoussi-0dbf65.netlify.app/']

    def parse(self, response):
        # Only rows that contain td cells, inside the table that has a tbody.
        for item in response.xpath("//table[@class='datadisplaytable'][./tbody]//tr[./td]"):
            # First matching cell holds the number; the cell with a text-align attribute holds the text.
            td_first = item.xpath(".//td[@class='dddefault']/text()").get()
            td_second = item.xpath(".//td[@class='dddefault'][@text-align]/text()").get()
            yield {"Number": td_first, "Text": td_second}

First few results:

051 CS Co-op Work Term #1
052 CS Co-op Work Term #2
053 CS Co-op Work Term #3
054 CS Co-op Work Term #4
055 CS Co-op Work Term #5
100 Introduction to Computers
110 Prog & Problem Solving
115 Object-Oriented Design
201 Intro to Digital System
203 Java Programming and Apps
207 Building Interactive Gadgets
210 Data Structures & Abstractions
215 Web and Database Programming
261 Methods in Numerical Analysis
280 Risk and Reward in Information
310 Discrete Computal Structure
320 Intro Artificial Intelligence
