使用requests-html提取堆栈溢出问题下的答案时要搜索哪些html类/ id

如何解决使用requests-html提取堆栈溢出问题下的答案时要搜索哪些html类/ id

问题简介 语言版本:Python 3.8

操作系统:Windows 10

其他相关软件:Jupyter笔记本和html请求

上下文: 我一直跟随this tutorial来抓取stackoverflow的问题。 我的目标是(从问题的网址中)提取答案以及由谁回答。 但是,我很难确定要搜索的类/ id在问题的html中

我尝试过的事情: 我试图在('.container')下搜索('.post-layout'),'。mb0','#answers'和'#answers-headers'之类的东西,但结果有些许混乱。

我用来解析页面的代码摘录(不是问题)here is the github link

def parse_tagged_page(html):
    question_summaries = html.find(".question-summary")
    key_names = ['question','votes','tags']
    classes_needed = ['.question-hyperlink','.vote','.tags']
    datas = []
    for q_el in question_summaries:
        question_data = {}
        for i,_class in enumerate(classes_needed):
            sub_el = q_el.find(_class,first=True)
            keyname = key_names[i]
            question_data[keyname] = clean_scraped_data(sub_el.text,keyname=keyname)
        datas.append(question_data)
    return datas

下面是我正在寻找的html代码示例。 html代码on this question

<div id="answers">

                    <a name="tab-top"></a>
                    <div id="answers-header">
                        <div class="answers-subheader grid ai-center mb8">
                            <div class="grid--cell fl1">
                                <h2 class="mb0" data-answercount="13">
                                        13 Answers
                                    <span style="display:none;" itemprop="answerCount">13</span>
                                </h2>
                            </div>
                            <div class="grid--cell">
                                <div class=" grid s-btn-group js-filter-btn">
        <a class="grid--cell s-btn s-btn__muted s-btn__outlined" href="/questions/19254583/how-do-i-host-multiple-node-js-sites-on-the-same-ip-server-with-different-domain?answertab=active#tab-top" data-nav-xhref="" title="Answers with the latest activity first" data-value="active" data-shortcut="A">
            Active</a>
        <a class="grid--cell s-btn s-btn__muted s-btn__outlined" href="/questions/19254583/how-do-i-host-multiple-node-js-sites-on-the-same-ip-server-with-different-domain?answertab=oldest#tab-top" data-nav-xhref="" title="Answers in the order they were provided" data-value="oldest" data-shortcut="O">
            Oldest</a>
        <a class="youarehere is-selected grid--cell s-btn s-btn__muted s-btn__outlined" href="/questions/19254583/how-do-i-host-multiple-node-js-sites-on-the-same-ip-server-with-different-domain?answertab=votes#tab-top" data-nav-xhref="" title="Answers with the highest score first" data-value="votes" data-shortcut="V">
            Votes</a>
</div>

                            </div>
                        </div>
                            
                    </div>


                                          
<a name="19254824"></a>
<div id="answer-19254824" class="answer accepted-answer" data-answerid="19254824" itemprop="acceptedAnswer" itemscope="" itemtype="http://schema.org/Answer">
    <div class="post-layout">
        <div class="votecell post-layout--left">
            <div class="js-voting-container grid fd-column ai-stretch gs4 fc-black-200" data-post-id="19254824">
        <button class="js-vote-up-btn grid--cell s-btn s-btn__unset c-pointer" data-controller="s-tooltip" data-s-tooltip-placement="right" aria-pressed="false" aria-label="Up vote" data-selected-classes="fc-theme-primary" aria-describedby="--stacks-s-tooltip-zxmm3912"><svg aria-hidden="true" class="m0 svg-icon iconArrowUpLg" width="36" height="36" viewBox="0 0 36 36"><path d="M2 26h32L18 10 2 26z"></path></svg></button><div id="--stacks-s-tooltip-zxmm3912" class="s-popover s-popover__tooltip pe-none" aria-hidden="true" role="tooltip">This answer is useful<div class="s-popover--arrow"></div></div>
        <div class="js-vote-count grid--cell fc-black-500 fs-title grid fd-column ai-center" itemprop="upvoteCount" data-value="83">83</div>
        <button class="js-vote-down-btn grid--cell s-btn s-btn__unset c-pointer" data-controller="s-tooltip" data-s-tooltip-placement="right" aria-pressed="false" aria-label="Down vote" data-selected-classes="fc-theme-primary" aria-describedby="--stacks-s-tooltip-waz8801n"><svg aria-hidden="true" class="m0 svg-icon iconArrowDownLg" width="36" height="36" viewBox="0 0 36 36"><path d="M2 10h32L18 26 2 10z"></path></svg></button><div id="--stacks-s-tooltip-waz8801n" class="s-popover s-popover__tooltip pe-none" aria-hidden="true" role="tooltip">This answer is not useful<div class="s-popover--arrow"></div></div>

    
            <div class="js-accepted-answer-indicator grid--cell fc-green-500 ta-center py4" data-s-tooltip-placement="right" title="Loading when this answer was accepted…" tabindex="0" role="note" aria-label="Accepted">
                <svg aria-hidden="true" class="svg-icon iconCheckmarkLg" width="36" height="36" viewBox="0 0 36 36"><path d="M6 14l8 8L30 6v8L14 30l-8-8v-8z"></path></svg>
            </div>

    
        <a class="js-post-issue grid--cell s-btn s-btn__unset c-pointer py6 mx-auto" href="/posts/19254824/timeline" data-shortcut="T" data-controller="s-tooltip" data-s-tooltip-placement="right" aria-label="Timeline" aria-describedby="--stacks-s-tooltip-djt8qt69"><svg aria-hidden="true" class="mln2 mr0 svg-icon iconHistory" width="19" height="18" viewBox="0 0 19 18"><path d="M3 9a8 8 0 113.73 6.77L8.2 14.3A6 6 0 105 9l3.01-.01-4 4-4-4h3L3 9zm7-4h1.01L11 9.36l3.22 2.1-.6.93L10 10V5z"></path></svg></a><div id="--stacks-s-tooltip-djt8qt69" class="s-popover s-popover__tooltip pe-none" aria-hidden="true" role="tooltip">Show activity on this post.<div class="s-popover--arrow"></div></div>

</div>

        </div>

        

<div class="answercell post-layout--right">
    
    <div class="s-prose js-post-body" itemprop="text">
<p>Choose one of:</p>

<ul>
<li>Use some other server (<a href="https://stackoverflow.com/a/5015178/436776">like nginx</a>) as a reverse proxy.</li>
<li>Use <a href="https://github.com/nodejitsu/node-http-proxy" rel="noreferrer">node-http-proxy</a> as a reverse proxy.</li>
<li>Use the <a href="http://www.senchalabs.org/connect/vhost.html" rel="noreferrer">vhost middleware</a> if each domain can be served from the same Connect/Express codebase and node.js instance.</li>
</ul>
    </div>
    <div class="mt24">
        <div class="grid fw-wrap ai-start jc-end gs8 gsy">
            <time itemprop="dateCreated" datetime="2013-10-08T17:53:13"></time>
            <div class="grid--cell mr16" style="flex: 1 1 100px;">
                

<div class="post-menu">
    <a href="/a/19254824/14340924" rel="nofollow" itemprop="url" class="js-share-link js-gps-track" title="short permalink to this answer" data-gps-track="post.click({ item: 2,priv: -1,post_type: 2 })" data-controller="se-share-sheet s-popover" data-se-share-sheet-title="Share a link to this answer" data-se-share-sheet-subtitle="(includes your user id)" data-se-share-sheet-post-type="answer" data-se-share-sheet-social="facebook twitter devto" data-se-share-sheet-location="2" data-se-share-sheet-license-url="https%3a%2f%2fcreativecommons.org%2flicenses%2fby-sa%2f3.0%2f" data-se-share-sheet-license-name="CC BY-SA 3.0" data-s-popover-placement="bottom-start" aria-controls="se-share-sheet-1" data-action=" s-popover#toggle se-share-sheet#preventNavigation s-popover:show->se-share-sheet#willShow s-popover:shown->se-share-sheet#didShow">share</a><div class="s-popover z-dropdown" style="width: unset; max-width: 28em;" id="se-share-sheet-1"><div class="s-popover--arrow"></div><div><span class="js-title fw-bold">Share a link to this answer</span> <span class="js-subtitle">(includes your user id)</span></div><div class="my8"><input type="text" class="js-input s-input wmn3 sm:wmn-initial" readonly=""></div><div class="d-flex jc-space-between mbn4"><button class="js-copy-link-btn s-btn s-btn__link">Copy link</button><a href="https://creativecommons.org/licenses/by-sa/3.0/" rel="license" class="s-block-link fc-blue-600 js-license" target="_blank" title="The current license for this post: CC BY-SA 3.0">CC BY-SA 3.0</a><div class="js-social-container"></div></div></div>
        <span class="lsep">|</span>
                <a href="/posts/19254824/edit" class="suggest-edit-post js-gps-track" data-gps-track="post.click({ item: 6,post_type: 2 })" title="revise and improve this post">edit</a>
        <span class="lsep">|</span>
    <button id="btnFollowPost-19254824" class="s-btn s-btn__link fc-black-400 h:fc-black-700 pb2 js-follow-post js-follow-answer js-gps-track" role="button" data-gps-track="post.click({ item: 14,post_type: 2 })" data-controller="s-tooltip " data-s-tooltip-placement="bottom" data-s-popover-placement="bottom" aria-controls="" aria-describedby="--stacks-s-tooltip-nb9azr0k">
        follow
    </button><div id="--stacks-s-tooltip-nb9azr0k" class="s-popover s-popover__tooltip pe-none" aria-hidden="true" role="tooltip">Follow this answer to receive notifications<div class="s-popover--arrow"></div></div>
        <span class="lsep">|</span>
</div>

            </div>
            <div class="post-signature grid--cell fl0">
<div class="user-info user-hover">
    <div class="user-action-time">
        <a href="/posts/19254824/revisions" title="show all edits to this post" class="js-gps-track" data-gps-track="post.click({ item: 4,post_type: 2 })">edited <span title="2017-05-23 11:33:25Z" class="relativetime">May 23 '17 at 11:33</span></a>
    </div>
    <div class="user-gravatar32">
        <a href="/users/-1/community"><div class="gravatar-wrapper-32"><img src="https://www.gravatar.com/avatar/a007be5a61f6aa8f3e85ae2fc18dd66e?s=32&amp;d=identicon&amp;r=PG" alt="" width="32" height="32" class="bar-sm"></div></a>
    </div>
    <div class="user-details">
        <a href="/users/-1/community">Community</a><span class="mod-flair " title="moderator">♦</span>
        <div class="-flair">
            <span class="reputation-score" title="reputation score " dir="ltr">1</span><span title="1 silver badge" aria-hidden="true"><span class="badge2"></span><span class="badgecount">1</span></span><span class="v-visible-sr">1 silver badge</span>
        </div>
    </div>
</div>            </div>


            <div class="post-signature grid--cell fl0">
                <div class="user-info user-hover">
    <div class="user-action-time">
        answered <span title="2013-10-08 17:53:13Z" class="relativetime">Oct 8 '13 at 17:53</span>
    </div>
    <div class="user-gravatar32">
        <a href="/users/201952/josh3736"><div class="gravatar-wrapper-32"><img src="https://i.stack.imgur.com/eLXTL.jpg?s=32&amp;g=1" alt="" width="32" height="32" class="bar-sm"></div></a>
    </div>
    <div class="user-details" itemprop="author" itemscope="" itemtype="http://schema.org/Person">
        <a href="/users/201952/josh3736">josh3736</a><span class="d-none" itemprop="name">josh3736</span>
        <div class="-flair">
            <span class="reputation-score" title="reputation score 119,818" dir="ltr">120k</span><span title="24 gold badges" aria-hidden="true"><span class="badge1"></span><span class="badgecount">24</span></span><span class="v-visible-sr">24 gold badges</span><span title="198 silver badges" aria-hidden="true"><span class="badge2"></span><span class="badgecount">198</span></span><span class="v-visible-sr">198 silver badges</span><span title="245 bronze badges" aria-hidden="true"><span class="badge3"></span><span class="badgecount">245</span></span><span class="v-visible-sr">245 bronze badges</span>
        </div>
    </div>
</div>

            </div>
        </div>
    </div>
    
</div>




                <div class="post-layout--right">
        <div id="comments-19254824" class="comments js-comments-container bt bc-black-2 mt12 " data-post-id="19254824" data-min-length="15">
            <ul class="comments-list js-comments-list" data-remaining-comments-count="0" data-canpost="false" data-cansee="true" data-comments-unavailable="false" data-addlink-disabled="true">

                        <li id="comment-45028507" class="comment js-comment " data-comment-id="45028507">
        <div class="js-comment-actions comment-actions">
            <div class="comment-score js-comment-edit-hide">
                    <span title="number of 'useful comment' votes received" class="cool">3</span>
            </div>
        </div>
        <div class="comment-text js-comment-text-and-form">
            <div class="comment-body js-comment-edit-hide">
                
                <span class="comment-copy">that's a very good and brief list of the options I've read elsewhere. Do you happen to know for each of these solutions which processes would need to be restarted when a new domain is added? For 1) none. For 2) only the node-http-proxy. For 3) the entire thread of all sites would need to be restarted. Is this correct?</span>
                
–&nbsp;<a href="/users/977206/flion" title="8,539 reputation" class="comment-user">Flion</a>
                <span class="comment-date" dir="ltr"><a class="comment-link" href="#comment45028507_19254824"><span title="2015-02-05 10:48:37Z,License: CC BY-SA 3.0" class="relativetime-clean">Feb 5 '15 at 10:48</span></a></span>
            </div>
        </div>
    </li>
    <li id="comment-45045094" class="comment js-comment " data-comment-id="45045094">
        <div class="js-comment-actions comment-actions">
            <div class="comment-score js-comment-edit-hide">
                    <span title="number of 'useful comment' votes received" class="cool">1</span>
            </div>
        </div>
        <div class="comment-text js-comment-text-and-form">
            <div class="comment-body js-comment-edit-hide">
                
                <span class="comment-copy">@Flion: You could write the node-based proxies in such a way that you could reload the domain configuration without requiring a process restart.  It really depends on your app's exact requirements.</span>
                
–&nbsp;<a href="/users/201952/josh3736" title="119,818 reputation" class="comment-user">josh3736</a>
                <span class="comment-date" dir="ltr"><a class="comment-link" href="#comment45045094_19254824"><span title="2015-02-05 17:50:17Z,License: CC BY-SA 3.0" class="relativetime-clean">Feb 5 '15 at 17:50</span></a></span>
            </div>
        </div>
    </li>
    <li id="comment-107457123" class="comment js-comment " data-comment-id="107457123">
        <div class="js-comment-actions comment-actions">
            <div class="comment-score js-comment-edit-hide">
            </div>
        </div>
        <div class="comment-text js-comment-text-and-form">
            <div class="comment-body js-comment-edit-hide">
                
                <span class="comment-copy">Not what was asked.</span>
                
–&nbsp;<a href="/users/5616722/patrick-sturm" title="315 reputation" class="comment-user">Patrick Sturm</a>
                <span class="comment-date" dir="ltr"><a class="comment-link" href="#comment107457123_19254824"><span title="2020-03-18 07:47:44Z,License: CC BY-SA 4.0" class="relativetime-clean">Mar 18 at 7:47</span></a></span>
            </div>
        </div>
    </li>

            </ul>
        </div>

        <div id="comments-link-19254824" data-rep="50" data-reg="true">
                    <a class="js-add-link comments-link disabled-link" title="Use comments to ask for more information or suggest improvements. Avoid comments like “+1” or “thanks”." href="#" role="button">add a comment</a>
                <span class="js-link-separator dno">&nbsp;|&nbsp;</span>
            <a class="js-show-link comments-link dno" title="expand to show all comments on this post" href="#" onclick="" role="button"></a>
        </div>         
    </div>
    </div>
</div>

解决方法

您应该寻找.answercell

版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。

相关推荐


使用本地python环境可以成功执行 import pandas as pd import matplotlib.pyplot as plt # 设置字体 plt.rcParams[&#39;font.sans-serif&#39;] = [&#39;SimHei&#39;] # 能正确显示负号 p
错误1:Request method ‘DELETE‘ not supported 错误还原:controller层有一个接口,访问该接口时报错:Request method ‘DELETE‘ not supported 错误原因:没有接收到前端传入的参数,修改为如下 参考 错误2:cannot r
错误1:启动docker镜像时报错:Error response from daemon: driver failed programming external connectivity on endpoint quirky_allen 解决方法:重启docker -&gt; systemctl r
错误1:private field ‘xxx‘ is never assigned 按Altʾnter快捷键,选择第2项 参考:https://blog.csdn.net/shi_hong_fei_hei/article/details/88814070 错误2:启动时报错,不能找到主启动类 #
报错如下,通过源不能下载,最后警告pip需升级版本 Requirement already satisfied: pip in c:\users\ychen\appdata\local\programs\python\python310\lib\site-packages (22.0.4) Coll
错误1:maven打包报错 错误还原:使用maven打包项目时报错如下 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-resources-plugin:3.2.0:resources (default-resources)
错误1:服务调用时报错 服务消费者模块assess通过openFeign调用服务提供者模块hires 如下为服务提供者模块hires的控制层接口 @RestController @RequestMapping(&quot;/hires&quot;) public class FeignControl
错误1:运行项目后报如下错误 解决方案 报错2:Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile (default-compile) on project sb 解决方案:在pom.
参考 错误原因 过滤器或拦截器在生效时,redisTemplate还没有注入 解决方案:在注入容器时就生效 @Component //项目运行时就注入Spring容器 public class RedisBean { @Resource private RedisTemplate&lt;String
使用vite构建项目报错 C:\Users\ychen\work&gt;npm init @vitejs/app @vitejs/create-app is deprecated, use npm init vite instead C:\Users\ychen\AppData\Local\npm-
参考1 参考2 解决方案 # 点击安装源 协议选择 http:// 路径填写 mirrors.aliyun.com/centos/8.3.2011/BaseOS/x86_64/os URL类型 软件库URL 其他路径 # 版本 7 mirrors.aliyun.com/centos/7/os/x86
报错1 [root@slave1 data_mocker]# kafka-console-consumer.sh --bootstrap-server slave1:9092 --topic topic_db [2023-12-19 18:31:12,770] WARN [Consumer clie
错误1 # 重写数据 hive (edu)&gt; insert overwrite table dwd_trade_cart_add_inc &gt; select data.id, &gt; data.user_id, &gt; data.course_id, &gt; date_format(
错误1 hive (edu)&gt; insert into huanhuan values(1,&#39;haoge&#39;); Query ID = root_20240110071417_fe1517ad-3607-41f4-bdcf-d00b98ac443e Total jobs = 1
报错1:执行到如下就不执行了,没有显示Successfully registered new MBean. [root@slave1 bin]# /usr/local/software/flume-1.9.0/bin/flume-ng agent -n a1 -c /usr/local/softwa
虚拟及没有启动任何服务器查看jps会显示jps,如果没有显示任何东西 [root@slave2 ~]# jps 9647 Jps 解决方案 # 进入/tmp查看 [root@slave1 dfs]# cd /tmp [root@slave1 tmp]# ll 总用量 48 drwxr-xr-x. 2
报错1 hive&gt; show databases; OK Failed with exception java.io.IOException:java.lang.RuntimeException: Error in configuring object Time taken: 0.474 se
报错1 [root@localhost ~]# vim -bash: vim: 未找到命令 安装vim yum -y install vim* # 查看是否安装成功 [root@hadoop01 hadoop]# rpm -qa |grep vim vim-X11-7.4.629-8.el7_9.x
修改hadoop配置 vi /usr/local/software/hadoop-2.9.2/etc/hadoop/yarn-site.xml # 添加如下 &lt;configuration&gt; &lt;property&gt; &lt;name&gt;yarn.nodemanager.res