I'm trying to parse a fairly small (<100MB) XML file:
(require '[clojure.data.xml :as xml]
         '[clojure.java.io :as io])
(xml/parse (io/reader "data/small-sample.xml"))
OutOfMemoryError Java heap space
  clojure.lang.Numbers.byte_array (Numbers.java:1216)
  clojure.tools.nrepl.bencode/read-bytes (bencode.clj:101)
  clojure.tools.nrepl.bencode/read-netstring* (bencode.clj:153)
  clojure.tools.nrepl.bencode/read-token (bencode.clj:244)
  clojure.tools.nrepl.bencode/read-bencode (bencode.clj:254)
  clojure.tools.nrepl.bencode/token-seq/fn--3178 (bencode.clj:295)
  clojure.core/repeatedly/fn--4705 (core.clj:4642)
  clojure.lang.LazySeq.sval (LazySeq.java:42)
  clojure.lang.LazySeq.seq (LazySeq.java:60)
  clojure.lang.RT.seq (RT.java:484)
  clojure.core/seq (core.clj:133)
  clojure.core/take-while/fn--4236 (core.clj:2564)
Here is my project.clj:
(defproject dats "0.1.0-SNAPSHOT"
  ...
  :dependencies [[org.clojure/clojure "1.5.1"]
                 [org.clojure/data.xml "0.0.7"]
                 [criterium "0.4.1"]]
  :jvm-opts ["-Xmx1g"])
I also tried setting LEIN_JVM_OPTS and JVM_OPTS in my .bash_profile, without success.
When I try the following project.clj:
(defproject barber "0.1.0-SNAPSHOT"
  ...
  :dependencies [[org.clojure/clojure "1.5.1"]
                 [org.clojure/data.xml "0.0.7"]
                 [criterium "0.4.1"]]
  :jvm-opts ["-Xms128m"])
I get the following error:
Error occurred during initialization of VM
Incompatible minimum and maximum heap sizes specified
Exception in thread "Thread-5" clojure.lang.ExceptionInfo: Subprocess Failed {:exit-code 1}
Any idea how to increase the heap size for the Leiningen REPL?
Thanks.
Solution
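Because of the print step of the Read-Eval-Print Loop, any form evaluated at the top level of the REPL is fully realized. The result is also kept on the heap so that you can access it later via *1.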
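To make the effect of the print step visible, here is a minimal sketch that uses only clojure.core. The helper counting-seq and the atom realized are names made up for this illustration, and pr-str stands in for what the REPL does when it prints a result.

```clojure
;; Hypothetical helper: a lazy seq that bumps `counter` each time an element is realized.
(defn counting-seq [counter n]
  (lazy-seq
    (when (pos? n)
      (swap! counter inc)
      (cons n (counting-seq counter (dec n))))))

(def realized (atom 0))

;; def only binds the var; the REPL prints the var, not the sequence.
(def s (counting-seq realized 5))
(def after-def @realized)

;; Asking for one element realizes one element.
(first s)
(def after-first @realized)

;; pr-str stands in for the REPL's print step: it walks the whole sequence.
(def printed (pr-str s))
(def after-print @realized)
```

Nothing is realized by the def, a single element is realized by first, and printing realizes everything. That last step is what happens when a bare xml/parse call is evaluated at the top level of the REPL.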
If you store the return value instead, like this:
(def parsed (xml/parse (io/reader "data/small-sample.xml")))
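This returns immediately, even for a file hundreds of megabytes in size (I have verified this locally). You can then traverse the result by iterating over the returned clojure.data.xml.Element tree; each part is realized as it is parsed from the input stream.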
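If you do not hold on to the elements (that is, you do not bind them in a way that keeps them reachable), you can iterate over the whole structure without using more RAM than it takes to hold a single node of the XML tree.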
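As a rough sketch of that kind of traversal, the function below reduces over the root's children and keeps only a small map of counts, so no element stays reachable once it has been processed. The name child-tag-counts is made up for this example, and it assumes the interesting elements are direct children of the root, as in the catalog/book file used in the transcript.

```clojure
(require '[clojure.data.xml :as xml]
         '[clojure.java.io :as io])

;; Hypothetical helper: counts the direct children of the root element by tag.
;; The caller supplies the reader and is responsible for closing it.
(defn child-tag-counts [rdr]
  (reduce (fn [acc el]
            ;; update-in + fnil, because `update` does not exist in Clojure 1.5.1
            (update-in acc [(:tag el)] (fnil inc 0)))
          {}
          ;; text nodes are strings, for which (:tag node) is nil, so they are dropped
          (filter :tag (:content (xml/parse rdr)))))

(comment
  ;; Usage against the file from the question; reduce is eager,
  ;; so all the work finishes before with-open closes the reader.
  (with-open [rdr (io/reader "data/small-sample.xml")]
    (child-tag-counts rdr)))
```

The value returned to the REPL is only the small map of counts, so the print step has nothing large to realize and *1 does not pin the parsed tree.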
user> (time (def n (xml/parse (clojure.java.io/reader "/home/justin/clojure/ok/data.xml"))))
"Elapsed time: 0.739795 msecs"
#'user/n
user> (time (keys n))
"Elapsed time: 0.025683 msecs"
(:tag :attrs :content)
user> (time (-> n :tag))
"Elapsed time: 0.031224 msecs"
:catalog
user> (time (-> n :attrs))
"Elapsed time: 0.136522 msecs"
{}
user> (time (-> n :content first))
"Elapsed time: 0.095145 msecs"
#clojure.data.xml.Element{:tag :book,:attrs {:id "bk101"},:content (
  #clojure.data.xml.Element{:tag :author,:attrs {},:content ("Gambardella,Matthew")}
  #clojure.data.xml.Element{:tag :title,:content ("XML Developer's Guide")}
  #clojure.data.xml.Element{:tag :genre,:content ("Computer")}
  #clojure.data.xml.Element{:tag :price,:content ("44.95")}
  #clojure.data.xml.Element{:tag :publish_date,:content ("2000-10-01")}
  #clojure.data.xml.Element{:tag :description,:content ("An in-depth look at creating applications \n with XML.")})}
user> (time (-> n :content count))
"Elapsed time: 48178.512106 msecs"
459000
user> (time (-> n :content count))
"Elapsed time: 86.931114 msecs"
459000
;; redefining n so that we can test the performance without the pre-parsing done when we counted
user> (time (def n (xml/parse (clojure.java.io/reader "/home/justin/clojure/ok/data.xml"))))
"Elapsed time: 0.702885 msecs"
#'user/n
user> (time (doseq [el (take 100 (drop 100 (-> n :content)))] (println (:tag el))))
:book
:book
.... ;; output truncated
"Elapsed time: 26.019374 msecs"
nil
user>
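Note that the large delay occurs only the first time I ask for the count of n's content, which forces the whole file to be parsed. If I only doseq over part of the structure, it finishes very quickly.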