How do I find the XPath expression for this website?


I want to scrape this web page.

I have a lot of resumes, and my task is to collect the skills from each one. Here is the link to the page: https://www.livecareer.com/resume-search/search?jt=software%20engineer


Solution

Actually, there is no need to use selenium here. You can do this easily with BeautifulSoup. Here is the complete code:

    import requests
    from bs4 import BeautifulSoup

    # Fetch the search results page and parse it.
    r = requests.get('https://www.livecareer.com/resume-search/search?jt=software%20engineer').text
    soup = BeautifulSoup(r, 'html.parser')

    # Each resume is an <li> in this list; the first <li> is skipped.
    ul = soup.find('ul', class_='resume-list list-unstyled')
    li_items = ul.find_all('li')[1:]

    # Build the URL of every resume from its link.
    links = []
    for li in li_items:
        links.append('https://www.livecareer.com/' + li.a['href'])

    # Visit each resume and collect the text of its first 'field singlecolumn' div.
    skills = []
    for link in links:
        r = requests.get(link).text
        soup = BeautifulSoup(r, 'html.parser')
        div = soup.find('div', class_='field singlecolumn')
        skills.append(div.text)

    print(skills)

Output:

['agile,AutoCAD,C++,CAD,Oral,data entry,database,Engineer in Training,EIT,Engineering analysis,XML,functional,GUI,HTML,JavaScript,Team leadership,Lockheed Martin,macros,Manufacturing processes,MATLAB,mechanical,meetings,Excel,Organizational skills,presentations,Process improvement,program management,programming,Project planning,Python,research,scrum,Six Sigma,Software development,Solidworks,SQL,switches,telemetry,video,Web design,website,Written communication',"Senior Outreach at Senior Center\xa0Planned and organized a joint celebration of the Chinese New Year with the collaboration of the Westborough Public Schools. \xa0Promoted cultural awareness and broke the language barrier of different races of backgrounds.Volunteer at ChurchIdentified problems and implemented a process to eliminate a data-entry camp registration process by 100% by building new online Registration Forms for registration and the student's cultural classes arrangement.Designed posters,flyers and presentation slides with graphics and photos for the different organizational events with Microsoft Word and PowerPointDeveloped a structural documentation on publishing an annual report with detailed steps and instructions on the process that are easy to follow and quickly learn by others. \xa0Implemented a 30th Anniversary Special Edition project in a commercial quality of work with excellent time management skills to meet the deadline.Cayenne SoftwareExperienced team spirit in effort of reducing the workload of bugs fixes on the software product.\u200bAllmerica Financial CompanyProvided a sole support to the Hanover 1099 system with strong commitment and responsibility. Fined tuned the system resulting in cost savings for the Allmerica Financial Company. Winner of the Gold Crown Customer Recognition award.",'Motivated Software Engineer seeking employment as part of a dynamic software development team. 
Fluent in C,JAVA and python.','Developed peer-to-peer secure file transfer system in JAVA.This involved the application of symmetric\r\n     and asymmetric key cryptography algorithms,and JAVA concepts like multi-threading,socket\r\n     programming,etc.Implemented a system to query XML in JAVA.The query language was a subset of XPath\r\n    Modeled a project "Personal Health Management System" using UML and implemented it in Visual C#.The code was tested using NUnit.Object oriented software development process was used for this\r\n     project\r\n    Developed a \'license plate game\' in C on LINUX o/s using client/server architecture.This required the\r\n     application of distributed programming concepts like Sockets,RPC,multi-threading,etc.RESEARCH PAPER:\r\n    XMorph: A Shape-Polymorphic,Domain-Specific XML Data Transformation Language,\r\n     International Conference on Data Engineering (ICDE 2010),IEEE CS,Los Angeles,USA,March 2010.','Performance evaluation of In-Kernel System Call Implemented and evaluated In-kernel system call using dynamic loadable kernel module on x_86_64 architecture.Re-Development free approach to migrate Java applications to cloud at College of Engineering,Pune Implemented file access sub-system of a WebJDK which leverages the File-System API provided by HTML5.This allows the use of standard Java APIs for accessing client files.2016 2013.','Accomplished Computer Technician with a rapidly increasing range of industry experience looking to bring strong instincts and a proven record of procedural compliance,process management and strong operational skills to a rapidly growing company. 
','Seeking a fulltime position as a Developer / Systems Admin / DBA for a company needing a hard working,\r\ntaskoriented person with an indepth understanding of software development and database tuning.','3 Years of experience in Information Technology with emphasis on Design,Development and End to End Implementation of Consulting based solutions with expertise on working with Object Orient Analysis and Design using Java/J2EE Technologies viz. JSP/Servlets/EJB,JDBC,Web services,Web sockets,Spring Frameworks,Spring-boot,Angular,JQuery,XML/XSLT,JSON,Integration Developer Service Component Architecture & Service Data Objects,Rational Application Developer,Test Driven Development using JUnit,Jenkins,GIT,Cloud Foundry,Eclipse/Intelij IDE,UNIX,Gradle Scripts,DB2/Oracle/MySQL Databases.','.NET 3.5,.NET,ASP .NET 3.5,ASP.NET 2.0,ASP.NET 3.5,AJAX,ASM,Banking,Basic,Business Objects,c,CSS,CSS 2,customer satisfaction,data analysis,Database,delivery,EBusiness,editor,Electronics,HP,HTML 4,IDE,IIS 7.0,ITIL,C#,C# 3.0,Windows,windows applications,2000,3.1,Windows 98,Enterprise,Oct,Operating systems,Oracle 9,Oracle database,PL/SQL,personnel,recording,reporting,sales,Servers,Service Level Agreement,SLA,Visual SourceSafe,Visual  SourceSafe,SQL Server,technical support,TOAD,vi,Microsoft Visual Studio,Visual studio,Windows server','Represent Stanford  Ballroom Dance team in various competitions in the Bay area.\r\n*Represented University of Maryland in Ballroom dance competitions in UMD,UPenn,MIT,Columbia University & Ohio \r\n*Have a keen interest in photography,especially of dancers in motion.']

You can also create a nice DataFrame from these results to improve readability.
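A minimal sketch of the DataFrame idea, assuming pandas is installed. The `links` and `skills` values here are short made-up stand-ins for the lists built by the scraping code, and the column names are my own choice:

```python
import pandas as pd

# Hypothetical stand-ins for the `links` and `skills` lists from the scraper.
links = ["https://example.com/resume/1", "https://example.com/resume/2"]
skills = ["agile,AutoCAD,C++", "Python,SQL,scrum"]

# One row per resume: where it came from and the raw skills text.
df = pd.DataFrame({"link": links, "skills": skills})

# Split the comma-separated text into a list so single skills are easy to filter.
df["skill_list"] = df["skills"].str.split(",")

print(df)
```

As the output above shows, not every resume keeps a clean comma-separated list in that div, so the split column is only useful for the rows that do.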

Hope this helps!


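You can find the XPath of any HTML element with your browser's developer tools. The steps below are for Chrome, but other browsers work very similarly:

  1. Right-click the item on the page whose XPath you want.
  2. Click "Inspect". This opens the developer tools with the element in question highlighted.
  3. If the element you want is not the one highlighted, navigate through the interactive HTML that is shown. Usually, hovering over an element there highlights the matching item on the page.
  4. Right-click the HTML element in the navigator.
  5. Choose "Copy -> Copy XPath".

The main problem you may run into when scraping these pages is that the target may not be in the same place every time. If these are user-created documents, each may have a different layout, so one XPath may not match the same section on every page. If that is the case, you may need a more sophisticated approach (jQuery, Selenium, Cypress and others all have tools for searching by text content and for navigating between parent and child elements).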
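To illustrate searching by text content rather than by position, here is a rough sketch using only Python's standard library. The two pages are made-up, well-formed markup (not the real site), and `skills_text` is a hypothetical helper. Note that `xml.etree.ElementTree` supports only a subset of XPath and needs well-formed input, so for real HTML you would likely want a proper HTML parser:

```python
import xml.etree.ElementTree as ET

# Two made-up resume pages with the same sections in a different order.
PAGE_A = """<html><body>
<section><h2>Summary</h2><div>Motivated engineer</div></section>
<section><h2>Skills</h2><div>Python, SQL</div></section>
</body></html>"""

PAGE_B = """<html><body>
<section><h2>Skills</h2><div>C++, MATLAB</div></section>
<section><h2>Summary</h2><div>Computer technician</div></section>
</body></html>"""


def skills_text(markup):
    """Return the text of the div inside the section headed 'Skills'."""
    root = ET.fromstring(markup)
    # [h2='Skills'] keeps only sections with an h2 child whose text is "Skills",
    # so the lookup does not depend on where the section sits on the page.
    node = root.find(".//section[h2='Skills']/div")
    if node is None or node.text is None:
        return None
    return node.text.strip()


print(skills_text(PAGE_A))
print(skills_text(PAGE_B))
```

Both calls find the skills block even though it is the second section on one page and the first on the other.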


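Just throwing this out there...

As mentioned above, use Chrome: right-click the element you want, click Inspect, then press Ctrl+F to open the search box.

Then write your XPath so that it returns one (and only one) page object.

For example: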

//a[contains(text(),'my text')] 
//div[@id='myDivID']
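The "one and only one" rule can also be checked in code. This is just a sketch with a made-up page and a hypothetical `find_unique` helper, using the standard library's `xml.etree.ElementTree`. That module only handles a subset of XPath (no `contains()`, and paths start with `.`), so only the id-based expression is tried here:

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample page for illustration only.
SAMPLE = """<html><body>
<div id="myDivID"><a href="/one">my text</a></div>
<div id="other"><a href="/two">something else</a></div>
</body></html>"""


def find_unique(root, path):
    """Return the single element matching path, or raise if there are 0 or many."""
    matches = root.findall(path)
    if len(matches) != 1:
        raise ValueError(f"{path!r} matched {len(matches)} elements, expected exactly 1")
    return matches[0]


root = ET.fromstring(SAMPLE)

# Specific enough: exactly one div has this id.
div = find_unique(root, ".//div[@id='myDivID']")
print(div.attrib["id"])

# Too broad: every link on the page matches, so the helper complains.
try:
    find_unique(root, ".//a")
except ValueError as err:
    print(err)
```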

Write your own XPath for the item. Never use the "Copy XPath" option: it gives you a path like the one below, which is the most brittle XPath you can write.

//*[@id="wrapper"]/div[2]/div[2]/div[1]/aside[1]/div/div/div[2]/div/div[1]/a
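Here is a small sketch of why a positional path like that is brittle. The markup is invented for the example (it is not the real page), and it again uses the standard library's limited XPath support. One extra div added to the page is enough to break the positional path, while the attribute-based path keeps working:

```python
import xml.etree.ElementTree as ET

# Made-up page, and the same page after a banner div is added at the top.
BEFORE = """<html><body>
<div id="wrapper"><div>menu</div><div><a id="profile" href="/me">Profile</a></div></div>
</body></html>"""

AFTER = """<html><body>
<div id="wrapper"><div>banner</div><div>menu</div><div><a id="profile" href="/me">Profile</a></div></div>
</body></html>"""

POSITIONAL = "./body/div/div[2]/a"   # depends on the exact nesting and order
ANCHORED = ".//a[@id='profile']"     # depends only on the link's own id

for label, markup in (("before", BEFORE), ("after", AFTER)):
    root = ET.fromstring(markup)
    found_positional = root.find(POSITIONAL) is not None
    found_anchored = root.find(ANCHORED) is not None
    print(label, "positional:", found_positional, "anchored:", found_anchored)
```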

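If you are not familiar with writing XPath, read the introduction at https://www.w3schools.com/xml/xpath_intro.asp

One problem you have here is that the text sits directly inside a div tag when it really should be in a span. You can try the following: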

//div[@class='field singlecolumn']/text()
