How to get the MeSH (Medical Subject Headings) dataset mesh.nt working with RDFLib in Python
I am trying to use the MeSH dataset in N-Triples format to measure RDFLib's loading time, traversal time and query response time.
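All three measurements can be taken with one small helper. The sketch below assumes Python 3.7+ (the version in the traceback). The names `timed` and `timings` are hypothetical. It uses `time.perf_counter()` instead of `time.time()` because `perf_counter` is a monotonic, high-resolution clock intended for measuring intervals.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Record how long the with-block took, in seconds, under `label` in `timings`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # The elapsed time is stored even if the block raised an exception.
        timings[label] = time.perf_counter() - start


timings = {}
with timed("loading", timings):
    time.sleep(0.01)  # stand-in for g.parse("mesh.nt", format="nt")
print("--- Loading Time approx %.2f s ---" % timings["loading"])
```

You can wrap the loading, traversal and query steps in their own `with timed(...)` blocks and print `timings` once at the end.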
I have been trying to run the query for two days, with no luck.
The code is as follows:
```python
import time
from rdflib import ConjunctiveGraph


def mem_func():
    query1 = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>
PREFIX mesh: <http://id.nlm.nih.gov/mesh/>
PREFIX mesh2015: <http://id.nlm.nih.gov/mesh/2015/>
PREFIX mesh2016: <http://id.nlm.nih.gov/mesh/2016/>
PREFIX mesh2017: <http://id.nlm.nih.gov/mesh/2017/>
SELECT ?d ?dName ?c ?cName
FROM <http://id.nlm.nih.gov/mesh>
WHERE {
  ?d a meshv:Descriptor .
  ?d meshv:active 1 .
  ?d meshv:concept ?c .
  ?d rdfs:label ?dName .
  ?c rdfs:label ?cName
  FILTER(REGEX(?dName, "infection", "i") || REGEX(?cName, "infection", "i"))
}
ORDER BY ?d
"""
    g = ConjunctiveGraph()

    # 1. Loading: parse the whole N-Triples dump into memory.
    print("RDFlib DS-2 Loading Started:")
    start_time = time.time()
    g.parse("mesh.nt", format="nt")
    print("RDFlib DS-2 Loading Finished:")
    print("--- Loading Time approx %s ---" % (time.time() - start_time))

    # 2. "Traversal": re-read the file as text and print every line.
    print("RDF DS-2 Traversal Time Started ")
    start_time_traversal = time.time()
    with open("mesh.nt", "r", encoding="utf-8") as file1:
        for line in file1:
            print(line.strip())
    print("RDF DS-2 Traversal Time Finished ")
    print("--- Traversal Time approx %s ---" % (time.time() - start_time_traversal))

    # 3. Query: this is the call that raises the exception below.
    results = g.query(query1)
    for row in results:
        print(row)


if __name__ == "__main__":
    mem_func()
```
When my program reaches the line that executes the query, it raises an exception:
```
https://id.nlm.nih.gov/mesh/!DOCTYPE html does not look like a valid URI, trying to serialize this will break.
https://id.nlm.nih.gov/mesh/html lang="en" does not look like a valid URI, trying to serialize this will break.
Traceback (most recent call last):
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\plugins\parsers\ntriples.py", line 154, in parse
    self.parseline()
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\plugins\parsers\ntriples.py", line 197, in parseline
    subject = self.subject()
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\plugins\parsers\ntriples.py", line 224, in subject
    subj = self.uriref() or self.nodeid()
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\plugins\parsers\ntriples.py", line 243, in uriref
    uri = self.eat(r_uriref).group(1)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\plugins\parsers\ntriples.py", line 218, in eat
    raise ParseError("Failed to eat %s at %s" % (pattern.pattern, self.line))
rdflib.plugins.parsers.ntriples.ParseError: Failed to eat <([^:]+:[^\s"<>]*)> at

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\plugins\sparql\sparql.py", line 285, in _load
    return graph.load(source, format='nt', **kwargs)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\graph.py", line 1085, in load
    self.parse(source, publicID, format)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\graph.py", line 1549, in parse
    context.parse(source, publicID=publicID, format=format, **args)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\graph.py", line 1078, in parse
    parser.parse(source, self, **args)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\plugins\parsers\nt.py", line 26, in parse
    parser.parse(f)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\plugins\parsers\ntriples.py", line 156, in parse
    raise ParseError("Invalid line: %r" % self.line)
rdflib.plugins.parsers.ntriples.ParseError: Invalid line: ''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\ToP_RDF_research\rdflib_python_eclipse\src\pyRDF_D2.py", line 135, in <module>
    mem_func()
  File "E:\ToP_RDF_research\rdflib_python_eclipse\src\pyRDF_D2.py", line 107, in mem_func
    results = g.query(query1)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\graph.py", line 1131, in query
    return result(processor.query(query_object, initBindings, initNs, **kwargs))
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\plugins\sparql\processor.py", line 80, in query
    return evalQuery(self.graph, query, base)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\plugins\sparql\evaluate.py", line 526, in evalQuery
    ctx.load(d.default, default=True)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\plugins\sparql\sparql.py", line 299, in load
    _load(self.graph, source)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\plugins\sparql\sparql.py", line 289, in _load
    source))
Exception: Could not load http://id.nlm.nih.gov/mesh as either RDF/XML, N3 or NTriples
```
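So far, all I know is that loading takes about 3 hours and traversal about 14 minutes.
Where am I going wrong, and how can I get this to run successfully?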
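Each full load takes about three hours. One way to iterate on the query faster is to develop against the first N lines of mesh.nt and use the full file only for the final timing run. N-Triples puts exactly one statement per line, so any prefix of the file's lines is still valid N-Triples. A subset may not contain every label or concept of a given descriptor, so treat empty results from it with care. This is a minimal sketch; `head_ntriples` and its 100000-line default are hypothetical choices, not part of RDFLib.

```python
from itertools import islice


def head_ntriples(path, n=100000):
    """Return the first `n` lines of an N-Triples file as a single string.

    N-Triples is line-based (one statement per line), so the result is still
    valid N-Triples and can be passed to Graph.parse(data=..., format="nt").
    """
    with open(path, encoding="utf-8") as f:
        return "".join(islice(f, n))
```

Usage: `g.parse(data=head_ntriples("mesh.nt"), format="nt")` loads only the sample into `g`.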