Scala: Group an Iterable into an Iterable of Iterables by a predicate

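I have very large iterators that I want to split into pieces. I have a predicate that looks at an item and returns true if it is the start of a new piece. I need the pieces to be Iterators, because even a single piece will not fit into memory. There are so many pieces that I would be wary of a recursive solution blowing the stack. The situation is similar to this question, but I need Iterators instead of Lists, and the "sentinels" (items for which the predicate is true) occur at the beginning of a piece (and should be included in it). The resulting iterators will only be used in order, though some may not be used at all, and they should use only O(1) memory. I imagine this means they should all share the same underlying iterator. Performance is important.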

If I were to take a stab at a function signature, it would be this:

def groupby[T](iter: Iterator[T])(startsGroup: T => Boolean): Iterator[Iterator[T]] = ...

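I would have loved to use takeWhile, but it loses the last element: the first element that fails the test is consumed and discarded. I looked into span, but it buffers results. My current best idea involves BufferedIterator, but maybe there is a better way.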
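To make the takeWhile problem concrete, here is a minimal sketch. The object name, the helper `firstPieceAndRest`, and the sample data are made up for illustration. Note that the Scala documentation says an iterator should not be reused after a method such as takeWhile has been called on it, so this only shows the loss as it is typically observed, not a supported usage pattern.

```scala
object TakeWhileDemo {
  /** Splits off the first piece with takeWhile and returns (piece, whatever is left).
    * Hypothetical helper; in this made-up example a piece starts at a multiple of 10. */
  def firstPieceAndRest(xs: List[Int]): (List[Int], List[Int]) = {
    val it = xs.iterator
    // takeWhile has to read the first multiple of 10 to learn that the piece has ended...
    val piece = it.takeWhile(_ % 10 != 0).toList
    // ...and that element is not handed back to `it`, so the next piece loses its sentinel.
    val rest = it.toList
    (piece, rest)
  }

  def main(args: Array[String]): Unit = {
    val (piece, rest) = firstPieceAndRest(List(1, 2, 3, 10, 11, 12))
    println(s"piece = $piece, rest = $rest")
  }
}
```

With the sample input, the sentinel 10 ends up in neither list, which is why takeWhile alone cannot produce pieces that begin with their sentinel.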

You'll know you've got it right if something like this doesn't crash your JVM:

groupby((1 to Int.MaxValue).iterator)(_ % (Int.MaxValue / 2) == 0).foreach(group => println(group.sum))
groupby((1 to Int.MaxValue).iterator)(_ % 10 == 0).foreach(group => println(group.sum))

Solution

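You have an inherent problem. Iterable implies that you can get multiple iterators. Iterator implies that you can only pass through once. That means that your Iterable[Iterable[T]] should be able to produce Iterator[Iterable[T]]s. But when one of those returns an element (an Iterable[T]) and that element is asked for multiple iterators, the single underlying iterator can't comply without either caching the results in a list (too big) or going back to the original iterable and running through absolutely everything again (very inefficient).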
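The difference between the two contracts can be seen in a small sketch. The object and method names are invented for this illustration, and the sample data is arbitrary.

```scala
object TraversalDemo {
  // An Iterable can hand out a fresh iterator for every pass.
  def sumTwiceIterable(xs: Iterable[Int]): (Int, Int) =
    (xs.iterator.sum, xs.iterator.sum)

  // An Iterator is consumed by the first pass, so the second pass finds nothing left.
  def sumTwiceIterator(it: Iterator[Int]): (Int, Int) = {
    val first = it.sum
    val second = it.sum
    (first, second)
  }

  def main(args: Array[String]): Unit = {
    println(sumTwiceIterable(List(1, 2, 3)))
    println(sumTwiceIterator(Iterator(1, 2, 3)))
  }
}
```

A group exposed as an Iterable[T] would have to support the first behaviour while being backed by something that only offers the second.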

So, while you could do this, I think you should conceive of your problem in a different way.

If you could start with a Seq instead, you could grab the subsets as ranges.
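One possible reading of that suggestion, as a rough sketch: compute the index range of each group lazily and slice the sequence on demand. The name `groupRanges` is hypothetical, and the sketch assumes an IndexedSeq (for cheap indexed access) and that the first element always starts a group.

```scala
object SeqRanges {
  /** Lazily yields the index range of each group.
    * Assumption: element 0 always starts a group, whatever the predicate says. */
  def groupRanges[T](xs: IndexedSeq[T])(starts: T => Boolean): Iterator[Range] =
    if (xs.isEmpty) Iterator.empty
    else {
      // Indices at which a group begins, followed by the end of the sequence.
      val begins = Iterator(0) ++ (1 until xs.length).iterator.filter(i => starts(xs(i)))
      val bounds = begins ++ Iterator(xs.length)
      // Each pair of consecutive boundaries delimits one group.
      bounds.sliding(2).map(w => w.head until w.last)
    }

  def main(args: Array[String]): Unit = {
    val xs = Vector(1, 2, 10, 11, 20)
    groupRanges(xs)(_ % 10 == 0).foreach { r =>
      println(xs.slice(r.start, r.end))
    }
  }
}
```

Because a range is only a pair of indices, each group can be traversed as often as needed, which is exactly what a one-shot iterator cannot offer.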

If you already know how you want to use your iterable, you could write a method

def process[T](source: Iterable[T])(starts: T => Boolean)(handlers: T => Unit *)

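that steps through the set of handlers each time starts returns true. If there is any way you can do your processing in one sweep, something like this is the way to go. (Your handlers will have to save state via mutable data structures or variables, however.)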
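A minimal sketch of what such a method might look like follows. The answer gives only the signature, so the body is a guess at the intent. Two details are assumptions of this sketch: the first element always opens the first group, and the handlers wrap around once they run out.

```scala
object ProcessSketch {
  /** Feeds every element to the handler of the group it belongs to.
    * Assumptions: the first element always opens group 0, and handlers are
    * reused cyclically when there are more groups than handlers. */
  def process[T](source: Iterable[T])(starts: T => Boolean)(handlers: (T => Unit)*): Unit =
    if (handlers.nonEmpty) {
      var group = -1 // index of the current group; -1 means nothing has been seen yet
      for (x <- source) {
        if (group < 0 || starts(x)) group += 1
        handlers(group % handlers.length)(x)
      }
    }

  def main(args: Array[String]): Unit = {
    // Handlers keep their state in mutable variables, as the answer notes.
    var evenGroups = 0L
    var oddGroups = 0L
    process(List(1, 2, 10, 11, 20))(_ % 10 == 0)(x => evenGroups += x, x => oddGroups += x)
    println(s"evenGroups = $evenGroups, oddGroups = $oddGroups")
  }
}
```

Everything happens in a single pass over the source and no group is ever materialized, so memory use stays constant apart from whatever the handlers keep.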

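If you can permit iteration over the outer list to break the inner lists, you could have an Iterable[Iterator[T]] with the additional constraint that once you move on to a later sub-iterator, all previous sub-iterators are invalid.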

Here's a solution of the last sort (from Iterator[T] to Iterator[Iterator[T]]; one can wrap this to make the outer layer Iterable instead).

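class GroupedBy[T](source: Iterator[T])(starts: T => Boolean)
extends Iterator[Iterator[T]] {
  private val underlying = source
  private var saved: T = _          // most recently read element, meaningful while `cached`
  private var cached = false        // `saved` has been read but not yet handed out
  private var starting = false      // `saved` begins a group that has not been returned yet
  // Reads one element from the source and records whether it starts a group.
  private def cacheNext(): Unit = {
    saved = underlying.next()
    starting = starts(saved)
    cached = true
  }
  // Result type Nothing lets this be used in a branch that must otherwise yield a T.
  private def oops(): Nothing = throw new java.util.NoSuchElementException("empty iterator")
  // Comment the next line if you do NOT want the first element to always start a group
  if (underlying.hasNext) { cacheNext(); starting = true }
  def hasNext: Boolean = {
    // Skip whatever was left unread in the previous group until a group start is cached.
    while (!(cached && starting) && underlying.hasNext) cacheNext()
    cached && starting
  }
  def next(): Iterator[T] = {
    if (!(cached && starting) && !hasNext) oops()
    starting = false
    // The sub-iterator shares `saved`, `cached` and `starting` with the outer iterator,
    // so it is only valid until the outer iterator is advanced again.
    new Iterator[T] {
      var presumablyMore = true
      def hasNext: Boolean = {
        if (!cached && !starting && underlying.hasNext && presumablyMore) cacheNext()
        presumablyMore = cached && !starting
        presumablyMore
      }
      def next(): T = {
        if (presumablyMore && (cached || hasNext)) {
          cached = false
          saved
        }
        else oops()
      }
    }
  }
}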
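For comparison, the BufferedIterator idea mentioned in the question can be sketched along the same lines. This is not the class above but a swapped-in variant: it relies on `buffered` and `head` from the standard library to peek at the next element, and it adds a generation counter (an addition of this sketch) so that a stale sub-iterator reports itself as empty instead of reading elements that belong to a later group. It assumes that the first element always starts a group. The predicate may be evaluated more than once per element, which matters if it is expensive.

```scala
object BufferedGrouping {
  def groupby[T](iter: Iterator[T])(startsGroup: T => Boolean): Iterator[Iterator[T]] = {
    val src = iter.buffered
    new Iterator[Iterator[T]] {
      private var generation = 0L       // identifies the group currently handed out
      private var atGroupStart = true   // src.head (if any) begins a group not yet returned
      private var firstPending = false  // the current group's first element is still in src

      def hasNext: Boolean = {
        if (!atGroupStart) {
          // Discard whatever the caller left unread in the previous group.
          if (firstPending && src.hasNext) { src.next(); firstPending = false }
          while (src.hasNext && !startsGroup(src.head)) src.next()
          atGroupStart = true
        }
        src.hasNext
      }

      def next(): Iterator[T] = {
        if (!hasNext) throw new NoSuchElementException("no more groups")
        generation += 1
        atGroupStart = false
        firstPending = true
        val mine = generation
        new Iterator[T] {
          // Valid only while this is the newest group and the next element is not a new start.
          def hasNext: Boolean =
            mine == generation && !atGroupStart && src.hasNext &&
              (firstPending || !startsGroup(src.head))
          def next(): T = {
            if (!hasNext) throw new NoSuchElementException("end of group")
            firstPending = false
            src.next()
          }
        }
      }
    }
  }

  def main(args: Array[String]): Unit =
    groupby(Iterator(1, 2, 10, 11, 20))(_ % 10 == 0).foreach(g => println(g.toList))
}
```

As with GroupedBy, the groups must be consumed in order. Any elements left unread in a group are skipped when the outer iterator advances.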
