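How do I find the XPath expression for this website?
I want to scrape this web page.
I have a lot of resumes, and my task is to collect the skills from each one. Here is the link to the page ==> https://www.livecareer.com/resume-search/search?jt=software%20engineer
Solution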
Actually, you don't need selenium here. You can do this easily with BeautifulSoup. Here is the complete code:
import requests
from bs4 import BeautifulSoup
# Fetch the search results page and parse it.
r = requests.get('https://www.livecareer.com/resume-search/search?jt=software%20engineer').text
soup = BeautifulSoup(r, 'html.parser')
# The resumes are the <li> items of this list; the first item is skipped.
ul = soup.find('ul', class_='resume-list list-unstyled')
li_items = ul.find_all('li')[1:]
# Build the absolute URL of each resume.
links = []
for li in li_items:
    links.append('https://www.livecareer.com/' + li.a['href'])
# Visit each resume and collect the text of its first 'field singlecolumn' div.
skills = []
for link in links:
    r = requests.get(link).text
    soup = BeautifulSoup(r, 'html.parser')
    div = soup.find('div', class_='field singlecolumn')
    if div is not None:  # skip resumes that lack this div
        skills.append(div.text)
print(skills)
Output:
['agile,AutoCAD,C++,CAD,Oral,data entry,database,Engineer in Training,EIT,Engineering analysis,XML,functional,GUI,HTML,JavaScript,Team leadership,Lockheed Martin,macros,Manufacturing processes,MATLAB,mechanical,meetings,Excel,Organizational skills,presentations,Process improvement,program management,programming,Project planning,Python,research,scrum,Six Sigma,Software development,Solidworks,SQL,switches,telemetry,video,Web design,website,Written communication',"Senior Outreach at Senior Center\xa0Planned and organized a joint celebration of the Chinese New Year with the collaboration of the Westborough Public Schools. \xa0Promoted cultural awareness and broke the language barrier of different races of backgrounds.Volunteer at ChurchIdentified problems and implemented a process to eliminate a data-entry camp registration process by 100% by building new online Registration Forms for registration and the student's cultural classes arrangement.Designed posters,flyers and presentation slides with graphics and photos for the different organizational events with Microsoft Word and PowerPointDeveloped a structural documentation on publishing an annual report with detailed steps and instructions on the process that are easy to follow and quickly learn by others. \xa0Implemented a 30th Anniversary Special Edition project in a commercial quality of work with excellent time management skills to meet the deadline.Cayenne SoftwareExperienced team spirit in effort of reducing the workload of bugs fixes on the software product.\u200bAllmerica Financial CompanyProvided a sole support to the Hanover 1099 system with strong commitment and responsibility. Fined tuned the system resulting in cost savings for the Allmerica Financial Company. Winner of the Gold Crown Customer Recognition award.",'Motivated Software Engineer seeking employment as part of a dynamic software development team. 
Fluent in C,JAVA and python.','Developed peer-to-peer secure file transfer system in JAVA.This involved the application of symmetric\r\n and asymmetric key cryptography algorithms,and JAVA concepts like multi-threading,socket\r\n programming,etc.Implemented a system to query XML in JAVA.The query language was a subset of XPath\r\n Modeled a project "Personal Health Management System" using UML and implemented it in Visual C#.The code was tested using NUnit.Object oriented software development process was used for this\r\n project\r\n Developed a \'license plate game\' in C on LINUX o/s using client/server architecture.This required the\r\n application of distributed programming concepts like Sockets,RPC,multi-threading,etc.RESEARCH PAPER:\r\n XMorph: A Shape-Polymorphic,Domain-Specific XML Data Transformation Language,\r\n International Conference on Data Engineering (ICDE 2010),IEEE CS,Los Angeles,USA,March 2010.','Performance evaluation of In-Kernel System Call Implemented and evaluated In-kernel system call using dynamic loadable kernel module on x_86_64 architecture.Re-Development free approach to migrate Java applications to cloud at College of Engineering,Pune Implemented file access sub-system of a WebJDK which leverages the File-System API provided by HTML5.This allows the use of standard Java APIs for accessing client files.2016 2013.','Accomplished Computer Technician with a rapidly increasing range of industry experience looking to bring strong instincts and a proven record of procedural compliance,process management and strong operational skills to a rapidly growing company. 
','Seeking a fulltime position as a Developer / Systems Admin / DBA for a company needing a hard working,\r\ntaskoriented person with an indepth understanding of software development and database tuning.','3 Years of experience in Information Technology with emphasis on Design,Development and End to End Implementation of Consulting based solutions with expertise on working with Object Orient Analysis and Design using Java/J2EE Technologies viz. JSP/Servlets/EJB,JDBC,Web services,Web sockets,Spring Frameworks,Spring-boot,Angular,JQuery,XML/XSLT,JSON,Integration Developer Service Component Architecture & Service Data Objects,Rational Application Developer,Test Driven Development using JUnit,Jenkins,GIT,Cloud Foundry,Eclipse/Intelij IDE,UNIX,Gradle Scripts,DB2/Oracle/MySQL Databases.','.NET 3.5,.NET,ASP .NET 3.5,ASP.NET 2.0,ASP.NET 3.5,AJAX,ASM,Banking,Basic,Business Objects,c,CSS,CSS 2,customer satisfaction,data analysis,Database,delivery,EBusiness,editor,Electronics,HP,HTML 4,IDE,IIS 7.0,ITIL,C#,C# 3.0,Windows,windows applications,2000,3.1,Windows 98,Enterprise,Oct,Operating systems,Oracle 9,Oracle database,PL/SQL,personnel,recording,reporting,sales,Servers,Service Level Agreement,SLA,Visual SourceSafe,Visual SourceSafe,SQL Server,technical support,TOAD,vi,Microsoft Visual Studio,Visual studio,Windows server','Represent Stanford Ballroom Dance team in various competitions in the Bay area.\r\n*Represented University of Maryland in Ballroom dance competitions in UMD,UPenn,MIT,Columbia University & Ohio \r\n*Have a keen interest in photography,especially of dancers in motion.']
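You can also create a DataFrame from these results by adding a few lines to your code, which improves readability.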
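The lines that build the DataFrame are missing from this copy of the answer. As a minimal sketch, assuming pandas is installed and that `links` and `skills` are equal-length lists like the ones built by the scraper, something along these lines could work. The sample values and the column names `link` and `skills` are placeholders for illustration, not the author's.

```python
import pandas as pd

# Placeholder data standing in for the `links` and `skills` lists from the scraper.
links = [
    "https://example.com/resume-a",  # hypothetical URL
    "https://example.com/resume-b",  # hypothetical URL
]
skills = ["agile,AutoCAD,C++", "Python,SQL,scrum"]

# One row per resume; the column names are an arbitrary choice.
df = pd.DataFrame({"link": links, "skills": skills})

# Show full cell contents instead of truncating long skill strings.
pd.set_option("display.max_colwidth", None)
print(df)
```

Building the frame from a dict of lists keeps each resume URL next to its skills text, which makes it easier to spot resumes where the scraper picked up the wrong section.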
Hope this helps!
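You can use your browser's developer tools to find the XPath of any HTML element. The following steps are for Chrome, but other browsers work very similarly:
- Right-click the item on the page whose XPath you want.
- Click "Inspect". This opens the developer tools with the element in question highlighted.
- If the highlighted element is not the one you are looking for, navigate through the interactive HTML that is shown. Usually, hovering over an element there highlights the matching item on the main page.
- Right-click the HTML element in the navigator.
- Select "Copy -> Copy XPath".
The main problem you are likely to run into when scraping these pages is that the target may not be in the same place every time. If these are user-created documents, each one may have a different layout, so the same XPath may not match the same section on every page. If that is the case, you may need a more sophisticated approach (jQuery, Selenium, Cypress and the like all have tools for searching by text content and for navigating between parent and child elements).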
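To illustrate searching by content rather than by position, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, which supports only a small subset of XPath. The two documents, the `section`/`h2` structure, and the helper `skills_text` are invented for the example; real resume pages will differ and may need a full HTML parser.

```python
import xml.etree.ElementTree as ET

# Two hypothetical resumes whose "Skills" section sits at different positions.
PAGE_A = """
<html><body>
  <section><h2>Summary</h2><div>Motivated engineer</div></section>
  <section><h2>Skills</h2><div>Python, SQL</div></section>
</body></html>
"""
PAGE_B = """
<html><body>
  <section><h2>Skills</h2><div>C++, MATLAB</div></section>
  <section><h2>Experience</h2><div>Three years</div></section>
</body></html>
"""


def skills_text(page):
    """Return the text of the div inside the section headed 'Skills', or None."""
    root = ET.fromstring(page)
    # [h2='Skills'] selects the section by its heading text, not by its index.
    div = root.find(".//section[h2='Skills']/div")
    return div.text if div is not None else None


print(skills_text(PAGE_A))
print(skills_text(PAGE_B))
```

Because the section is picked by its heading text, the same expression works on both layouts, and the helper returns `None` when a page has no such section.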
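Just throwing this out there....
As mentioned above, use Chrome:
- Right-click the element you want.
- Click "Inspect".
- Press Ctrl + F to open the search box.
Then write your XPath so that it returns one (and only one) page object.
For example: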
//a[contains(text(),'my text')]
//div[@id='myDivID']
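One way to check that an expression matches exactly one element is to count the matches before using it. The sketch below does this with the standard-library `xml.etree.ElementTree`. Note that it implements only a subset of XPath, so an attribute test like `[@id='myDivID']` works but `contains(text(), ...)` does not (a full XPath 1.0 engine, such as the one in lxml, is needed for that). The sample markup and the helper `find_unique` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment for illustration.
PAGE = """
<html><body>
  <div id="myDivID">target</div>
  <div class="row">first</div>
  <div class="row">second</div>
</body></html>
"""


def find_unique(root, path):
    """Return the single element matching path; raise if there are 0 or 2+ matches."""
    matches = root.findall(path)
    if len(matches) != 1:
        raise ValueError(f"{path!r} matched {len(matches)} elements, expected 1")
    return matches[0]


root = ET.fromstring(PAGE)

# ElementTree paths are relative, so the browser's //div becomes .//div here.
print(find_unique(root, ".//div[@id='myDivID']").text)

# An ambiguous expression is rejected instead of silently returning the first hit.
try:
    find_unique(root, ".//div[@class='row']")
except ValueError as err:
    print(err)
```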
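That is how you create the XPath for your item. Never use the "Copy XPath" option. It gives you a path like the one below, which is the most brittle XPath you can write.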
//*[@id="wrapper"]/div[2]/div[2]/div[1]/aside[1]/div/div/div[2]/div/div[1]/a
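To see why a copied positional path is fragile, the sketch below runs a positional path and an attribute-based path against two versions of a made-up page, where the second version simply gains a banner `div` at the top. It uses the standard-library `xml.etree.ElementTree` (a subset of XPath); the markup, the `sidebar` id, and both paths are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

BEFORE = """
<body>
  <div>header</div>
  <div id="sidebar"><a href="/resume/1">resume</a></div>
</body>
"""
# The same page after a banner is added above the header.
AFTER = """
<body>
  <div>banner</div>
  <div>header</div>
  <div id="sidebar"><a href="/resume/1">resume</a></div>
</body>
"""

POSITIONAL = "./div[2]/a"                 # depends on the div being second
BY_ATTRIBUTE = ".//div[@id='sidebar']/a"  # depends only on the id


def href(page, path):
    """Return the href of the first element matching path, or None."""
    link = ET.fromstring(page).find(path)
    return link.get("href") if link is not None else None


for name, page in (("before", BEFORE), ("after", AFTER)):
    print(name, href(page, POSITIONAL), href(page, BY_ATTRIBUTE))
```

On the second page the positional path lands on the header `div`, which has no link, so it returns `None`, while the attribute-based path still finds the link.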
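If you are new to writing XPath, read the introduction at https://www.w3schools.com/xml/xpath_intro.asp
One problem you have here is that the text sits directly inside the div tag, when it really should be in a span. You can try the following: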
//div[@class='field singlecolumn']/text()
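The `text()` step selects only the text nodes that are direct children of the `div`. The standard-library `xml.etree.ElementTree` cannot evaluate a `text()` step, so the sketch below approximates it by collecting the element's own `.text` and its children's `.tail`, then splits the comma-separated result into individual skills. The sample markup and the helpers `direct_text` and `split_skills` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the skills are bare text in the div, next to a nested label.
FRAGMENT = """
<body>
  <div class="field singlecolumn"><b>Skills:</b> agile, C++, Python,<br/> SQL</div>
</body>
"""


def direct_text(elem):
    """Join the text nodes that are direct children of elem (like XPath text())."""
    parts = [elem.text or ""]
    parts.extend(child.tail or "" for child in elem)
    return "".join(parts)


def split_skills(text):
    """Turn 'a, b, c' into ['a', 'b', 'c'], dropping empty entries."""
    return [item.strip() for item in text.split(",") if item.strip()]


div = ET.fromstring(FRAGMENT).find(".//div[@class='field singlecolumn']")
print(split_skills(direct_text(div)))
```

The nested `Skills:` label is left out because it lives inside the `b` element rather than directly in the `div`.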