MeSH (Medical Subject Headings) dataset mesh.nt not working with RDFLib in Python

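I am trying to use the MeSH dataset in N-Triples format to measure RDFLib's loading time, traversal time, and query response time.

I have been trying to get the query to run for two days, with no luck.

The code is as follows: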

import time
from rdflib import ConjunctiveGraph


def mem_func():
    # SPARQL query: descriptors and concepts whose labels match the filter.
    query1 = """
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>
    PREFIX mesh: <http://id.nlm.nih.gov/mesh/>
    PREFIX mesh2015: <http://id.nlm.nih.gov/mesh/2015/>
    PREFIX mesh2016: <http://id.nlm.nih.gov/mesh/2016/>
    PREFIX mesh2017: <http://id.nlm.nih.gov/mesh/2017/>

    SELECT ?d ?dName ?c ?cName
    FROM <http://id.nlm.nih.gov/mesh>

    WHERE {
      ?d a meshv:Descriptor .
      ?d meshv:active 1 .
      ?d meshv:concept ?c .
      ?d rdfs:label ?dName .
      ?c rdfs:label ?cName
      FILTER(REGEX(?dName,"infection","i") || REGEX(?cName,"i"))
    }
    ORDER BY ?d
    """
    g = ConjunctiveGraph()

    # Loading time: parse the N-Triples file into the in-memory graph.
    print("RDFlib DS-2 Loading Started:")
    start_time = time.time()
    g.parse("mesh.nt", format="nt")
    print("RDFlib DS-2 Loading Finished:")
    print("--- Loading Time approx %s ---" % (time.time() - start_time))

    # Traversal time: read the source file line by line and print each line.
    print("RDF DS-2 Traversal Time Started ")
    start_time_traversal = time.time()
    with open("mesh.nt", "r") as file1:
        lines = file1.readlines()
    for line in lines:
        print(line.strip())
    print("RDF DS-2 Traversal Time Finished ")
    print("--- Traversal Time approx %s ---" % (time.time() - start_time_traversal))

    # Query: this is the call that raises the exception shown below.
    results = g.query(query1)
    for row in results:
        print(row)


if __name__ == "__main__":
    mem_func()

When my program reaches the line that executes the query, it raises an exception:

https://id.nlm.nih.gov/mesh/!DOCTYPE html does not look like a valid URI, trying to serialize this will break.
https://id.nlm.nih.gov/mesh/html lang="en" does not look like a valid URI, trying to serialize this will break.
Traceback (most recent call last):
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\plugins\parsers\ntriples.py", line 154, in parse
    self.parseline()
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\plugins\parsers\ntriples.py", line 197, in parseline
    subject = self.subject()
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\plugins\parsers\ntriples.py", line 224, in subject
    subj = self.uriref() or self.nodeid()
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\plugins\parsers\ntriples.py", line 243, in uriref
    uri = self.eat(r_uriref).group(1)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\plugins\parsers\ntriples.py", line 218, in eat
    raise ParseError("Failed to eat %s at %s" % (pattern.pattern, self.line))
rdflib.plugins.parsers.ntriples.ParseError: Failed to eat <([^:]+:[^\s"<>]*)> at

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\plugins\sparql\sparql.py", line 285, in _load
    return graph.load(source, format='nt', **kwargs)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\graph.py", line 1085, in load
    self.parse(source, publicID, format)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\graph.py", line 1549, in parse
    context.parse(source, publicID=publicID, format=format, **args)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\graph.py", line 1078, in parse
    parser.parse(source, self, **args)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\plugins\parsers\nt.py", line 26, in parse
    parser.parse(f)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\plugins\parsers\ntriples.py", line 156, in parse
    raise ParseError("Invalid line: %r" % self.line)
rdflib.plugins.parsers.ntriples.ParseError: Invalid line: ''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\ToP_RDF_research\rdflib_python_eclipse\src\pyRDF_D2.py", line 135, in <module>
    mem_func()
  File "E:\ToP_RDF_research\rdflib_python_eclipse\src\pyRDF_D2.py", line 107, in mem_func
    results = g.query(query1)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\graph.py", line 1131, in query
    return result(processor.query(query_object, initBindings, initNs, **kwargs))
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\plugins\sparql\processor.py", line 80, in query
    return evalQuery(self.graph, query, base)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\plugins\sparql\evaluate.py", line 526, in evalQuery
    ctx.load(d.default, default=True)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\plugins\sparql\sparql.py", line 299, in load
    _load(self.graph, source)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\python37_64\lib\site-packages\rdflib\plugins\sparql\sparql.py", line 289, in _load
    source))
Exception: Could not load http://id.nlm.nih.gov/mesh as either RDF/XML, N3 or NTriples

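So far, all I know is that loading takes about 3 hours and traversal takes about 14 minutes.

Where am I going wrong, and how can I get this to run successfully?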
