Solution
Sample dih-config.xml
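Here is a sample dih-config.xml from an actual working site (no pseudo-samples here, my friend). Note that it picks up XML files from a local directory on the LAMP server. If you would rather post XML files directly over HTTP, you would need to configure a ContentStreamDataSource instead.
In this example the incoming XML happens to be in standard Solr update XML format already, so all the XSL does is remove empty field nodes. The real transforms, such as building the content of "ispartof_t" from "ignored_seriestitle", "ignored_seriesvolume", and "ignored_seriesissue", are done with DIH Regex transformers. (The XSLT runs first, and its output is then handed to the DIH transformers.) The attribute "useSolrAddSchema" tells DIH that the XML is already in standard Solr XML format. If that were not the case, another attribute, "xpath", would be required on the fields of the XPathEntityProcessor entity to select content from the incoming XML document.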
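For contrast, here is a minimal sketch of what the inner entity might look like if the incoming files were not in Solr update format. The record path /records/record and its child element names are hypothetical and only illustrate the idea; "forEach" marks the element that delimits one document, and each field then carries its own "xpath".

```xml
<!-- Sketch only: /records/record and its child elements are hypothetical. -->
<entity name="xml"
        processor="XPathEntityProcessor"
        stream="true"
        forEach="/records/record"
        url="${pickupdir.fileAbsolutePath}">
  <!-- Each field selects its value from the current record with an XPath expression. -->
  <field column="id" xpath="/records/record/id" />
  <field column="title_t" xpath="/records/record/title" />
  <field column="creator_t" xpath="/records/record/creator" />
</entity>
```

The working configuration for this site needs none of this, because it sets useSolrAddSchema: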
<dataConfig>
  <dataSource encoding="UTF-8" type="FileDataSource" />
  <document>
    <!--
      Pickupdir fetches all files matching the filename regex in the supplied directory
      and passes them to other entities which parse the file contents.
    -->
    <entity
      name="pickupdir"
      processor="FileListEntityProcessor"
      rootEntity="false"
      dataSource="null"
      fileName="^[\w\d-]+\.xml$"
      baseDir="/var/lib/tomcat6/solr/cci/import/"
      recursive="true"
      newerThan="${dataimporter.last_index_time}"
    >
      <!--
        Pickupxmlfile parses standard Solr update XML.
        Incoming values are split into multiple tokens when given a splitBy attribute.
        Dates are transformed into valid Solr dates when given a dateTimeFormat to parse.
      -->
      <entity
        name="xml"
        processor="XPathEntityProcessor"
        transformer="RegexTransformer,TemplateTransformer"
        datasource="pickupdir"
        stream="true"
        useSolrAddSchema="true"
        url="${pickupdir.fileAbsolutePath}"
        xsl="xslt/dih.xsl"
      >
        <field column="abstract_t" splitBy="\|" />
        <field column="coverage_t" splitBy="\|" />
        <field column="creator_t" splitBy="\|" />
        <field column="creator_facet" template="${xml.creator_t}" />
        <field column="description_t" splitBy="\|" />
        <field column="format_t" splitBy="\|" />
        <field column="identifier_t" splitBy="\|" />
        <field column="ispartof_t" sourceColName="ignored_seriestitle" regex="(.+)" replaceWith="$1" />
        <field column="ispartof_t" sourceColName="ignored_seriesvolume" regex="(.+)" replaceWith="${xml.ispartof_t}; vol. $1" />
        <field column="ispartof_t" sourceColName="ignored_seriesissue" regex="(.+)" replaceWith="${xml.ispartof_t}; no. $1" />
        <field column="ispartof_t" regex="\|" replaceWith=" " />
        <field column="language_t" splitBy="\|" />
        <field column="language_facet" template="${xml.language_t}" />
        <field column="location_display" sourceColName="ignored_class" regex="(.+)" replaceWith="$1" />
        <field column="location_display" sourceColName="ignored_location" regex="(.+)" replaceWith="${xml.location_display} $1" />
        <field column="location_display" regex="\|" replaceWith=" " />
        <field column="othertitles_display" splitBy="\|" />
        <field column="publisher_t" splitBy="\|" />
        <field column="responsibility_display" splitBy="\|" />
        <field column="source_t" splitBy="\|" />
        <field column="sourceissue_display" sourceColName="ignored_volume" regex="(.+)" replaceWith="vol. $1" />
        <field column="sourceissue_display" sourceColName="ignored_issue" regex="(.+)" replaceWith="${xml.sourceissue_display},no. $1" />
        <field column="sourceissue_display" sourceColName="ignored_year" regex="(.+)" replaceWith="${xml.sourceissue_display} ($1)" />
        <field column="src_facet" template="${xml.src}" />
        <field column="subject_t" splitBy="\|" />
        <field column="subject_facet" template="${xml.subject_t}" />
        <field column="title_t" sourceColName="ignored_title" regex="(.+)" replaceWith="$1" />
        <field column="title_t" sourceColName="ignored_subtitle" regex="(.+)" replaceWith="${xml.title_t} : $1" />
        <field column="title_sort" template="${xml.title_t}" />
        <field column="toc_t" splitBy="\|" />
        <field column="type_t" splitBy="\|" />
        <field column="type_facet" template="${xml.type_t}" />
      </entity>
    </entity>
  </document>
</dataConfig>
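The configuration above points at xslt/dih.xsl, which is not shown here. As a rough idea of what a stylesheet that only removes empty field nodes could look like, here is a minimal XSLT 1.0 sketch. It is an assumption about the approach (an identity transform plus one suppressing template), not the site's actual stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: a guess at the kind of stylesheet described, not the original dih.xsl. -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" />

  <!-- Identity template: copy every node and attribute through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- Drop any <field> whose text content is empty or whitespace only. -->
  <xsl:template match="field[not(normalize-space())]" />
</xsl:stylesheet>
```

The second template wins over the identity template for empty fields because a match pattern with a predicate has a higher default priority than node().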
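To set up DIH:
1. Ensure the DIH jars are referenced from solrconfig.xml, as they are not included in the Solr WAR file by default. One easy way is to create a lib folder in the Solr instance directory and put the DIH jars in it, since solrconfig.xml looks in the lib folder for references by default. The DIH jars are in the apache-solr-x.x.x/dist folder of the downloaded Solr package.
2. Create your dih-config.xml (as above) in the Solr "conf" directory.
3. Add a DIH request handler to solrconfig.xml if it is not there already.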
Request handler
<requestHandler name="/update/dih" startup="lazy" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-config.xml</str>
  </lst>
</requestHandler>
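If you would rather not copy the jars into a lib folder, the same solrconfig.xml can reference them in place with a lib directive. The following is only a sketch: the relative path assumes the instance directory sits two levels below the unpacked Solr package, and the jar name pattern assumes the apache-solr-dataimporthandler naming used in that dist folder. Adjust both to your layout.

```xml
<!-- Sketch only: goes inside <config> in solrconfig.xml. -->
<!-- The dir path and jar name pattern are assumptions; dir is resolved relative to the Solr instance directory. -->
<lib dir="../../dist/" regex="apache-solr-dataimporthandler-.*\.jar" />
```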
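To trigger DIH:
There is more information on full-import vs. delta-import, and on whether to commit, optimize, and so on, in the wiki description of Data Import Handler Commands. The following request triggers the DIH operation without deleting the existing index first, and commits the changes after all the files have been processed. With the sample configuration given above, it collects all the files found in the pickup directory, transforms them, indexes them, and finally commits the updates to the index (which makes them searchable the instant the commit finishes).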
http://localhost:8983/solr/update/dih?command=full-import&clean=false&commit=true