java.lang.OutOfMemoryError when transforming XML in a huge directory

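I want to transform XML files with XSLT 2 in a huge, deeply nested directory tree. There are more than 1 million files, each 4 to 10 kB. After a while I always get java.lang.OutOfMemoryError: Java heap space.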

My command is:
java -Xmx3072M -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=512M ...

Giving the JVM more memory via -Xmx is not a good solution.

Here is my code:

for (File file : dir.listFiles()) {
    if (file.isDirectory()) {
        pushDocuments(file);
    } else {
        indexFiles.index(file);
    }
}

public void index(File file) {
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

    try {
        xslTransformer.xslTransform(outputStream,file);
        outputStream.flush();
        outputStream.close();
    } catch (IOException e) {
        System.err.println(e.toString());
    }
}

The XSLT transformation is done with net.sf.saxon.s9api:

public void xslTransform(ByteArrayOutputStream outputStream,File xmlFile) {
    try {
        XdmNode source = proc.newDocumentBuilder().build(new StreamSource(xmlFile));
        Serializer out = proc.newSerializer();
        out.setOutputStream(outputStream);
        transformer.setInitialContextNode(source);
        transformer.setDestination(out);
        transformer.transform();

        out.close();
    } catch (SaxonApiException e) {
        System.err.println(e.toString());
    }
}

Solution

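My usual advice with the Saxon s9api interface is to reuse the XsltExecutable object but create a new XsltTransformer for each transformation. The XsltTransformer caches the documents you have read in case they are needed again. In this case, that is not what you want: with more than a million input files, the cache keeps growing until the heap is exhausted.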
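The compile-once, fresh-transformer-per-file pattern can be sketched as follows. To keep the example runnable without extra dependencies, it uses the JDK's built-in JAXP API rather than Saxon s9api, so it is an analogy, not the code from the question: Templates stands in for XsltExecutable and Transformer for XsltTransformer. The class name, the XSLT constant and the tiny stylesheet are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; JAXP is used here as a stand-in for Saxon s9api.
class ReuseCompiledStylesheet {

    // Illustrative stylesheet: prints the text content of the root <a> element.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:value-of select='a'/></xsl:template>"
      + "</xsl:stylesheet>";

    // Compile the stylesheet once; the resulting Templates object is reusable.
    static Templates compile(String xslt) throws TransformerException {
        return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xslt)));
    }

    // Create a fresh Transformer for each input so that per-run state
    // is not carried over from one document to the next.
    static String transformOne(Templates templates, String xml) throws TransformerException {
        Transformer transformer = templates.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        Templates templates = compile(XSLT);
        for (String xml : new String[] {"<a>one</a>", "<a>two</a>"}) {
            System.out.println(transformOne(templates, xml));
        }
    }
}
```

With s9api, the corresponding calls would be to keep the XsltExecutable returned by XsltCompiler.compile(...) and call load() on it inside the loop to obtain a new XsltTransformer for each file.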

As an alternative, you can call xsltTransformer.getUnderlyingController().clearDocumentPool() after each transformation.

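(Note that you can ask Saxon questions at saxonica.plan.io, where there is a good chance that we [Saxonica] will notice and answer them. You can also ask them here and tag them "saxon", which means we will probably answer at some point, though not always immediately. If you ask on StackOverflow without a product-specific tag, it is entirely hit-and-miss whether anyone will notice the question.)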
