How do I scrape star ratings with Selenium or BeautifulSoup in Python?

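I am trying to scrape ratings from their star displays. The stars are filled with different colours, which are easy to tell apart in Chrome, but the star tags in the markup all look the same. Is there a way to scrape the rating of each sub-category from the colour of its stars? For example, the rating for Work/Life Balance should be 3.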

The page can be found here: https://www.glassdoor.ca/Reviews/Employee-Review-AAR-RVW40036525.htm

Solution

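The ratings are distinguished by class name: each star level is rendered with a different CSS class. Here are the class names for every rating, with the rating as the key and the class name as the value. This should give you a starting point.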

{
    "one_star": "css-152xdkl",
    "two_star": "css-19o85uz",
    "three_star": "css-1ihykkv",
    "four_star": "css-1c07csa",
    "five_star": "css-1dc0bv4",
}

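For lookups while scraping, it is usually more convenient to go the other way, from class name to number. A minimal sketch, assuming the five class names listed in this answer are still in use (generated names like these tend to change when the site is redeployed); `rating_from_classes` is a hypothetical helper name, not part of any library:

```python
# Class names as listed in this answer, ordered from one to five stars.
STAR_CLASSES = {
    "one_star": "css-152xdkl",
    "two_star": "css-19o85uz",
    "three_star": "css-1ihykkv",
    "four_star": "css-1c07csa",
    "five_star": "css-1dc0bv4",
}

# Invert the table: class name -> numeric rating (1 to 5).
# This relies on the dict above being written in ascending order.
CLASS_TO_RATING = {
    css_class: stars
    for stars, css_class in enumerate(STAR_CLASSES.values(), start=1)
}


def rating_from_classes(classes):
    """Return the rating for the first recognised class, or None."""
    for css_class in classes:
        if css_class in CLASS_TO_RATING:
            return CLASS_TO_RATING[css_class]
    return None


print(rating_from_classes(["css-1ihykkv"]))  # 3
```

Returning `None` for an unrecognised class, rather than raising `KeyError`, makes it easier to notice and skip elements whose class names have changed.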
Here is what I did. I used BeautifulSoup for most of it because I prefer it.

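from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

# Assumes Chrome and a matching driver are installed
driver = webdriver.Chrome()
driver.get("https://www.glassdoor.ca/Reviews/Employee-Review-AAR-RVW40036525.htm")

# Map each star class name to its numeric rating
star_dict = {"css-152xdkl": 1, "css-19o85uz": 2, "css-1ihykkv": 3, "css-1c07csa": 4, "css-1dc0bv4": 5}

# Find all the reviews on the page
# (find_elements_by_class_name was deprecated in Selenium 4 and later removed)
reviews = driver.find_elements(By.CLASS_NAME, "gdReview")

# I used BeautifulSoup to collect the ratings, one dict per review
all_sub_ratings = []
for review in reviews:
    # Convert the Selenium element for a review into a BeautifulSoup object
    review_source = review.get_attribute("innerHTML")
    soup = BeautifulSoup(review_source, "lxml")

    # Find the sub-ratings tag; skip reviews that have none
    sub_ratings_tag = soup.find("div", {"class": "tooltipContainer"})
    if sub_ratings_tag is None:
        continue

    # Loop over each "li" tag and collect the ratings
    sub_rating_dict = {}
    for li_tag in sub_ratings_tag.find_all("li"):
        div_class = sub_cat = None
        for div_tag in li_tag.find_all("div"):
            # The div with a class carries the stars; the one without holds the name
            if div_tag.has_attr("class"):
                div_class = div_tag["class"][0]
            else:
                sub_cat = div_tag.text.strip()
        # Record the rating only when both the name and a known class were found
        if sub_cat and div_class in star_dict:
            sub_rating_dict[sub_cat] = star_dict[div_class]
    all_sub_ratings.append(sub_rating_dict)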
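
The parsing step can be checked without a browser by running it on a static HTML string. The sketch below does that under some assumptions: `SAMPLE_HTML` is hypothetical, simplified markup (not a copy of the real Glassdoor page), `parse_sub_ratings` is an illustrative helper name, and the built-in `html.parser` is used instead of `lxml` so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Class-name-to-rating table from this answer; the site may change these names.
STAR_DICT = {
    "css-152xdkl": 1,
    "css-19o85uz": 2,
    "css-1ihykkv": 3,
    "css-1c07csa": 4,
    "css-1dc0bv4": 5,
}

# Hypothetical, simplified markup standing in for one review's sub-ratings.
SAMPLE_HTML = """
<div class="tooltipContainer">
  <ul>
    <li><div>Work/Life Balance</div><div class="css-1ihykkv"></div></li>
    <li><div>Culture &amp; Values</div><div class="css-1dc0bv4"></div></li>
    <li><div>Career Opportunities</div><div class="css-unknown"></div></li>
  </ul>
</div>
"""


def parse_sub_ratings(html):
    """Return {sub-category: stars} for each <li> with a recognised star class."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", {"class": "tooltipContainer"})
    if container is None:
        return {}

    ratings = {}
    for li_tag in container.find_all("li"):
        name, stars = None, None
        for div_tag in li_tag.find_all("div"):
            classes = div_tag.get("class") or []
            if classes:
                # A div with classes is treated as the star element.
                for css_class in classes:
                    if css_class in STAR_DICT:
                        stars = STAR_DICT[css_class]
                        break
            else:
                # A div without classes is treated as the category name.
                name = div_tag.get_text(strip=True)
        # Entries with an unrecognised class are skipped rather than guessed.
        if name and stars is not None:
            ratings[name] = stars
    return ratings


sub_ratings = parse_sub_ratings(SAMPLE_HTML)
print(sub_ratings)
```

With the sample markup, the third entry is dropped because its class is not in the table, which is the behaviour you would likely want if the site introduces a class name you have not mapped yet.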
