Removing regex terminals from the AST returned by lark-parser

I'm interested in using lark to parse the typical output of a website crawler. Here is some sample output based on my own GitHub site:

--------------------------------------------------------------------
All found URLs:
https://awa5114.github.io/2021/01/12/#step-2-select-location-and-environment-for-new-project
https://awa5114.github.io/2021/01/12/mypy-pycharm.html
https://awa5114.github.io/2021/01/12/#step-6-test-the-template-on-a-script
--------------------------------------------------------------------
All local URLs:
https://awa5114.github.io/2021/01/12/#step-2-select-location-and-environment-for-new-project
https://awa5114.github.io/2021/01/12/mypy-pycharm.html
--------------------------------------------------------------------
All foreign URLs:
https://github.com/awa5114
https://github.com/jekyll/jekyll
https://github.com/jekyll/minima
--------------------------------------------------------------------
All broken URLs:

I'm using the following grammar:

start: section~4
section: (bar  "All " descriptor " URLs:"  link_list)
link_list: (url)*
descriptor: "found" | "local" | "foreign" | "broken"
url: /.+/
bar: /-{68}/

%import common.NEWLINE
%ignore NEWLINE

Calling pretty() on the resulting tree produces the following:

start
  section
    bar --------------------------------------------------------------------
    descriptor
    link_list
      url   https://awa5114.github.io/2021/01/12/#step-2-select-location-and-environment-for-new-project
      url   https://awa5114.github.io/2021/01/12/mypy-pycharm.html
      url   https://awa5114.github.io/2021/01/12/#step-6-test-the-template-on-a-script
  section
    bar --------------------------------------------------------------------
    descriptor
    link_list
      url   https://awa5114.github.io/2021/01/12/#step-2-select-location-and-environment-for-new-project
      url   https://awa5114.github.io/2021/01/12/mypy-pycharm.html
  section
    bar --------------------------------------------------------------------
    descriptor
    link_list
      url   https://github.com/awa5114
      url   https://github.com/jekyll/jekyll
      url   https://github.com/jekyll/minima
  section
    bar --------------------------------------------------------------------
    descriptor
    link_list

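This looks good, but I want to exclude the bar terminal from my tree. How can I do that? I looked at the docs and tried prefixing bar with an underscore and/or a question mark, but for some reason that did not help...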

Solution

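I just figured it out. Prefixing bar with an underscore is not enough; the name also has to be uppercase. In lark, lowercase names define rules and uppercase names define terminals, and only a terminal whose name starts with an underscore is filtered out of the tree. An underscore on a rule merely inlines the rule's children into its parent, so the matched regex token stays. The corrected grammar: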

start: section~4
section: (_BAR  "All " descriptor " URLs:"  link_list)
link_list: (url)*
descriptor: "found" | "local" | "foreign" | "broken"
url: /.+/
_BAR: /-{68}/

%import common.NEWLINE
%ignore NEWLINE

This results in the following tree:

start
  section
    descriptor
    link_list
      url   https://awa5114.github.io/2021/01/12/#step-2-select-location-and-environment-for-new-project
      url   https://awa5114.github.io/2021/01/12/mypy-pycharm.html
      url   https://awa5114.github.io/2021/01/12/#step-6-test-the-template-on-a-script
  section
    descriptor
    link_list
      url   https://awa5114.github.io/2021/01/12/#step-2-select-location-and-environment-for-new-project
      url   https://awa5114.github.io/2021/01/12/mypy-pycharm.html
  section
    descriptor
    link_list
      url   https://github.com/awa5114
      url   https://github.com/jekyll/jekyll
      url   https://github.com/jekyll/minima
  section
    descriptor
    link_list

It would be nice if the lark-parser documentation stated this explicitly...
