Retrieving links from a Google search with BeautifulSoup in Python

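I am building a Twitter bot with Tweepy and BeautifulSoup4. I want to save the results of a request in a list, but my script no longer works (it did a few days ago). I have been staring at it and cannot figure out why. Here is my function: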

import requests
import tweepy
from bs4 import BeautifulSoup
import urllib
import os
from tweepy import StreamListener
from TwitterEngine import TwitterEngine
from ConfigEngine import TwitterAPIConfig
import urllib.request
import emoji
import random

# desktop user-agent
USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
# mobile user-agent
MOBILE_USER_AGENT = "Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36"




# Retrieve the links
def parseLinks(url):
    headers = {"user-agent": USER_AGENT}
    resp = requests.get(url, headers=headers)
    if resp.status_code == 200:
        soup = BeautifulSoup(resp.content, "html.parser")
        results = []
        for g in soup.find_all('div', class_='r'):
            anchors = g.find_all('a')
            if anchors:
                link = anchors[0]['href']
                results.append(link)
        return results

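In the rest of the code, the "url" argument is 100% correct. As output, I get None. More precisely, execution stops right after the "results = []" line (so it never enters the for loop).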

Any ideas? Thanks a lot in advance!

Solution

It looks like Google changed the HTML markup on the page. Try changing the search from class="r" to class="rc":

import requests
from bs4 import BeautifulSoup


USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"

def parseLinks(url):
    headers = {"user-agent": USER_AGENT}
    resp = requests.get(url, headers=headers)
    if resp.status_code == 200:
        soup = BeautifulSoup(resp.content, "html.parser")
        results = []
        for g in soup.find_all('div', class_='rc'):  # <-- change 'r' to 'rc'
            anchors = g.find_all('a')
            if anchors:
                link = anchors[0]['href']
                results.append(link)
        return results

url = 'https://www.google.com/search?q=tree'
print(parseLinks(url))

Prints:

['https://en.wikipedia.org/wiki/Tree', 'https://simple.wikipedia.org/wiki/Tree', 'https://www.britannica.com/plant/tree', 'https://www.treepeople.org/tree-benefits', 'https://books.google.sk/books?id=yNGrqIaaYvgC&pg=PA20&lpg=PA20&dq=tree&source=bl&ots=_TP8PqSDlT&sig=ACfU3U16j9xRJgr31RraX0HlQZ0ryv9rcA&hl=sk&sa=X&ved=2ahUKEwjOq8fXyKjsAhXhAWMBHToMDw4Q6AEwG3oECAcQAg', 'https://teamtrees.org/', 'https://www.woodlandtrust.org.uk/trees-woods-and-wildlife/british-trees/a-z-of-british-trees/', 'https://artsandculture.google.com/entity/tree/m07j7r?categoryId=other']
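
Since the class name has already changed once, it may change again. As a rough sketch (not something from the original answer), the parsing step could be separated from the HTTP request and given a list of candidate class names, so it can be checked offline against saved markup. The function name extract_links, the container_classes parameter, and the sample HTML below are all assumptions made for illustration; real Google result pages look different and change often.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration only; it is NOT real Google HTML.
SAMPLE_HTML = """
<div class="rc"><a href="https://example.com/a">A</a><a href="https://example.com/cached">Cached</a></div>
<div class="r"><a href="https://example.com/b">B</a></div>
<div class="other"><a href="https://example.com/ignored">X</a></div>
"""


def extract_links(html, container_classes=("rc", "r")):
    """Return the first link of each matching result container ([] if none match)."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # Passing a list to class_ matches a div that carries any of the listed classes.
    for container in soup.find_all("div", class_=list(container_classes)):
        # href=True skips anchors that have no href attribute.
        anchor = container.find("a", href=True)
        if anchor is not None:
            results.append(anchor["href"])
    return results


links = extract_links(SAMPLE_HTML)
print(links)
```

With a helper like this, parseLinks could pass resp.content to extract_links and return an empty list when nothing matches. Note that parseLinks as written returns None implicitly whenever the status code is not 200, so a None result may point to a failed request rather than to a markup change.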
