如何使用python的sklearn在文本文件中查找关键字

如何解决如何使用python的sklearn在文本文件中查找关键字

我想创建一种使用python脚本优化简历的方法。为此，我试图找到工作清单中使用的关键字，可以将其添加到我的简历中，以使其在通过ATS运行时脱颖而出。目前，我正在使用以下代码查找与我的简历相匹配的职位百分比。如何使用此比较并找到如何使用职位列表中的特定关键字来改善我的简历？

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

resume = open("resume.txt",encoding='latin-1')
reference = open("reference.txt",encoding='latin-1')
compare = [resume.read(),reference.read()]
cMatrix = CountVectorizer().fit_transform(compare)

#prints how well the resume matches as a percentage
matPercent = cosine_similarity(cMatrix)[0][1] * 100
matPercent = round(matPercent,2) # round to two decimal
print("Resume is a "+ str(matPercent)+ "% match to the job.")

我正在使用以下内容来生成关键字，但是，这省略了一些重要的单词，并且我认为可以使用sklearn更好地对其进行优化。我不使用FindKeywords()，而是如何从CountVectorizer().fit_transform(compare)

访问信息

def FindKeywords():
    file = open("reference.txt",encoding='latin-1')
    string = file.read().replace("\n"," ").replace("\t"," ").lower()
    kwDict = {}
    avoidables = set(['skilled','skills','skill','minimum','tools','work','features','looking','highly','',' ','the','of','to','and','a','in','is','it','you','that','he','was','for','on','are','with','as','I','his','they','be','at','one','have','this','from','or','had','by','not','word','but','what','some','we','can','out','other','were','all','there','when','up','use','your','how','said','an','each','she','which','do','their','time','if','will','way','about','many','then','them','write','would','like','so','these','her','long','make','thing','see','him','two','has','look','more','day','Could','go','come','did','number','sound','no','most','people','my','over','kNow','water','than','call','first','who','may','down','side','been','Now','find','any','new','part','take','get','place','made','live','where','after','back','little','only','round','man','year','came','show','every','good','me','give','our','under','name','very','through','just','form','sentence','great','think','say','help','low','line','differ','turn','cause','much','mean','before','move','right','boy','old','too','same','tell','does','set','three','want','air','well','also','play','small','end','put','home','read','hand','port','large','spell','add','even','land','here','must','big','high','such','follow','act','why','ask','men','change','went','light','kind','off','need','house','picture','try','us','again','animal','point','mother','world','near','build','self','earth','father','head','stand','own','page','should','country','found','answer','school','grow','study','still','learn','plant','cover','food','sun','four','between','state','keep','eye','never','last','let','thought','city','tree','cross','farm','hard','start','might','story','saw','far','sea','draw','left','late','run',"don't",'while','press','close','night','real','life','few','north','open','seem','together','next','white','children','begin','got','walk','example','ease','paper','group','always','music','those','both','mark','often','letter','until','mile','river','car','feet','care','second','book','carry','took','science','eat','room','friend','began','idea','fish','mountain','stop','once','base','hear','horse','cut','sure','watch','color','face','wood','main','enough','plain','girl','usual','young','ready','above','ever','red','list','though','feel','talk','bird','soon','body','dog','family','direct','pose','leave','song','measure','door','product','black','short','numeral','class','wind','question','happen','complete','ship','area','half','rock','order','fire','south','problem','piece','told','knew','pass','since','top','whole','king','space','heard','best','hour','better','true','during','hundred','five','remember','step','early','hold','west','ground','interest','reach','fast','verb','sing','listen','six','table','travel','less','morning','ten','simple','several','vowel','toward','war','lay','against','pattern','slow','center','love','person','money','serve','appear','road','map','rain','rule','govern','pull','cold','notice','voice','unit','power','town','fine','certain','fly','fall','lead','cry','dark','machine','note','wait','plan','figure','star','Box','noun','field','rest','correct','able','pound','done','beauty','drive','stood','contain','front','teach','week','final','gave','green','oh','quick','develop','ocean','warm','free','minute','strong','special','mind','behind','clear','tail','produce','fact','street','inch','multiply','nothing','course','stay','wheel','full','force','blue','object','decide','surface','deep','moon','island','foot','system','busy','test','record','boat','common','gold','possible','plane','stead','dry','wonder','laugh','thousand','ago','ran','check','game','shape','equate','hot','miss','brought','heat','sNow','tire','bring','yes','distant','fill','east','paint','language','among','grand','ball','yet','wave','drop','heart','am','present','heavy','dance','engine','position','arm','wide','sail','material','size','vary','settle','speak','weight','general','ice','matter','circle','pair','include','divide','syllable','felt','perhaps','pick','sudden','count','square','reason','length','represent','art','subject','region','energy','hunt','probable','bed','brother','egg','ride','cell','believe','fraction','forest','sit','race','window','store','summer','train','sleep','prove','lone','leg','exercise','wall','catch','mount','wish','sky','board','joy','winter','sat','written','wild','instrument','kept','glass','grass','cow','job','edge','sign','visit','past','soft','fun','bright','gas','weather','month','million','bear','finish','happy','hope','flower','clothe','strange','gone','jump','baby','eight','village','meet','root','buy','raise','solve','Metal','whether','push','seven','paragraph','third','shall','held','hair','describe','cook','floor','either','result','burn','hill','safe','cat','century','consider','type','law','bit','coast','copy','phrase','silent','tall','sand','soil','roll','temperature','finger','industry','value','fight','lie','beat','excite','natural','view','sense','ear','else','quite','broke','case','middle','kill','son','lake','moment','scale','loud','spring','observe','child','straight','consonant','nation','dictionary','milk','speed','method','organ','pay','age','section','dress','cloud','surprise','quiet','stone','tiny','climb','cool','design','poor','lot','experiment','bottom','key','iron','single','stick','flat','twenty','skin','smile','crease','hole','Trade','melody','trip','office','receive','row','mouth','exact','symbol','die','least','trouble','shout','except','wrote','seed','tone','join','suggest','clean','break','lady','yard','rise','bad','blow','oil','blood','touch','grew','cent','mix','team','wire','cost','lost','brown','wear','garden','equal','sent','choose','fell','fit','flow','fair','bank','collect','save','control','decimal','gentle','woman','captain','practice','separate','difficult','doctor','please','protect','noon','whose','locate','ring','character','insect','caught','period','indicate','radio','spoke','atom','human','history','effect','electric','expect','crop','modern','element','hit','student','corner','party','supply','bone','rail','imagine','provide','agree','thus','capital',"won't",'chair','danger','fruit','rich','thick','soldier','process','operate','guess','necessary','sharp','wing','create','neighbor','wash','bat','rather','crowd','corn','compare','poem','string','bell','depend','meat','rub','tube','famous','dollar','stream','fear','sight','thin','triangle','planet','hurry','chief','colony','clock','mine','tie','enter','major','fresh','search','send','yellow','gun','allow','print','dead','spot','desert','suit','current','lift','rose','continue','block','chart','hat','sell','success','company','subtract','event','particular','deal','swim','term','opposite','wife','shoe','shoulder','spread','arrange','camp','invent','cotton','born','determine','quart','nine','truck','noise','level','chance','gather','shop','stretch','throw','shine','property','column','molecule','select','wrong','gray','repeat','require','broad','prepare','salt','nose','plural','anger','claim','continent','oxygen','sugar','death','pretty','women','season','solution','magnet','silver','thank','branch','match','suffix','especially','fig','afraid','huge','sister','steel','discuss','forward','similar','guide','experience','score','apple','bought','led','pitch','coat','mass','card','band','rope','slip','win','dream','evening','condition','Feed','tool','total','basic','smell','valley','nor','double','seat','arrive','master','track','parent','shore','division','sheet','substance','favor','connect','post','spend','chord','fat','glad','original','share','station','dad','bread','charge','proper','bar','offer','segment','slave','duck','instant','market','degree','populate','chick','dear','enemy','reply','drink','occur','support','speech','nature','range','steam','motion','path','liquid','log','meant','quotient','teeth','shell','neck'])
    for word in string.split(' '):
        if word not in kwDict and word not in avoidables:
            kwDict[word] = 1
        elif word not in avoidables:
            kwDict[word] += 1
    returns = [key for key in kwDict.keys() if kwDict[key]>0]
    return [kw for kw in returns if kw not in avoidables]

解决方法

您可以按照here中的说明使用CountVectorizer中的origin/X方法。

因此，通过代码中的一个具体示例（稍作调整），它看起来可能像这样：

val df = spark.read.option("header","true").csv("C:spark\\sample_data\\*.csv)

get_feature_names()

然后获取关键字：

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reference = "This is an example resume for a job"
reference = "This is an example reference for a job advertisement"
compare = [resume,reference]
cVect = CountVectorizer()
cMatrix = cVect.fit_transform(compare)

#prints how well the resume matches as a percentage
matPercent = cosine_similarity(cMatrix)[0][1] * 100
matPercent = round(matPercent,2) # round to two decimal
print("Resume is a "+ str(matPercent)+ "% match to the job.")

返回的关键字：

Resume is a 80.18% match to the job.

如果您只希望简历或参考文献中没有其他关键字，那么您可以仅对数据进行cVect.get_feature_names()另一个['advertisement','an','example','for','is','job','reference','resume','this']，然后从中获取关键字。

要记住的重要事情是，您需要“保存”训练有素的CountVectorizer，因此，

fit_transform()

您需要使用

CountVectorizer()

这样您以后就可以访问您的CountVectorizer().fit_transform(compare)实例。

如何使用python的sklearn在文本文件中查找关键字

如何解决如何使用python的sklearn在文本文件中查找关键字

解决方法

相关推荐