Algorithm: how to split list data into fixed-size sub-lists without separating equal numbers

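I have an algorithm question for you. The application context isn't needed, so I'll go straight to an example.

Here is a possible input: input = [ 1,1,2,3,4,4 ]. Let's assume a batch size of 5. The idea is to output lists with a maximum size of 5 without separating equal numbers; in short, two identical numbers must not end up in different sub-lists. Example output: [ [1,1],[2,2],[3,4] ]

Assumptions: the numbers are always sorted, and batch_size is always greater than the number of occurrences of any single number.

Do you have a more elegant solution than the one I just came up with?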

i = 0
batch_size = 5
res = []
while i < len(input):
    # Take the next slice of at most batch_size items
    data = input[i: i + batch_size]
    # Advance the index past that slice
    i += batch_size
    # Peek at what the next slice would look like
    future_data = input[i: i + batch_size]
    if future_data and future_data[0] == data[-1]:
        # The last number of this slice continues into the next one, so count
        # how many times it appears in the current slice and rewind the index
        cp = data.count(data[-1])
        i -= cp
        # Then drop every occurrence of that number from the current slice
        data = data[:-cp]
    res.append(data)

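Edit: following @juanpa.arrivillaga's answer:

Thanks, everyone, for your reactions and answers.

On to episode 2. I gave you a simplified version of my problem here, thinking your solutions would be enough. Despite your answers, I can't actually work out how to adapt @juanpa.arrivillaga's solution to my data format. The input looks more like this: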

input = { 
    'data_1' : { 
        'id': [1,4],'char': ['A','B','C','D','E','F','G','H','I','J','K','L'] 
    }
}

Note: the lists in the 'id' and 'char' values must be the same size!

The output must look like this:

[ 
    [1,'A','C'],'G'],'L'] 
]

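I know the data structure is not optimal; unfortunately I don't control it, so I can't change it...

The constraints are the same as before (the batch size applies only to the ids; am I clear enough?)

Solution

Here is how I would do it in a single pass: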

>>> import itertools
>>> batch_size = 5
>>> result = [[]]
>>> input_data = [ 1,1,2,3,4,4 ]
>>> for _,g in itertools.groupby(input_data):
...     current = list(g)
...     if len(result[-1]) + len(current) <= batch_size:
...         result[-1].extend(current)
...     else:
...         result.append(current)
...
>>> result
[[1, 1, 2, 3], [4, 4]]
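For reuse, the same loop could be wrapped in a function. This is only a sketch: the name `batch_runs` is made up for illustration, and the longer id list is borrowed from the sample input shown further down the page. It assumes, as the question does, that equal numbers are adjacent and that no run is longer than batch_size.

```python
from itertools import groupby


def batch_runs(values, batch_size):
    """Pack runs of equal adjacent values into sublists of at most batch_size items.

    Assumes no single run is longer than batch_size (as stated in the question).
    Note that an empty input yields [[]], exactly like the loop it wraps.
    """
    result = [[]]
    for _, run in groupby(values):
        current = list(run)
        if len(result[-1]) + len(current) <= batch_size:
            # The whole run still fits in the current sublist
            result[-1].extend(current)
        else:
            # Start a new sublist so the run is never split
            result.append(current)
    return result


print(batch_runs([1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4], 5))
# [[1, 1, 1], [2, 2, 2, 2], [3, 3, 4, 4, 4]]
```

The packing is greedy: a run is appended to the current sublist whenever it fits, so the result is valid but not necessarily the one with the fewest sublists.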

Let's break this down into intermediate steps to aid understanding. First, this is what itertools.groupby produces when evaluated eagerly:

>>> import itertools
>>> batch_size = 5
>>> grouped = [list(g) for k,g in itertools.groupby([ 1,4 ])]
>>> grouped
[[1], [4]]

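Then simply build up your result, which is a list of sub-lists. If the group fits into the current sub-list, add it to that sub-list; otherwise, append a new sub-list consisting of that group (which we can assume is no larger than batch_size):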

>>> result = [[]]
>>> for group in grouped:
...     if len(result[-1]) + len(group) <= batch_size:
...         result[-1].extend(group)
...     else:
...         result.append(group[:])
...
>>> result
[[1,4]]

The version above makes two passes over the data; the first example posted makes a single pass.
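To check any result against the two constraints from the question (no sub-list longer than batch_size, and no number appearing in two different sub-lists), a small helper along these lines might be handy. The name `is_valid_batching` is hypothetical, and the sketch assumes the values are hashable.

```python
def is_valid_batching(batches, batch_size):
    """Return True if every batch fits in batch_size and no value spans two batches."""
    seen = set()
    for batch in batches:
        if len(batch) > batch_size:
            return False
        values = set(batch)
        if values & seen:
            # Some value already appeared in an earlier batch
            return False
        seen |= values
    return True


print(is_valid_batching([[1, 1, 2, 3], [4, 4]], 5))   # True
print(is_valid_batching([[1, 1], [1, 2]], 5))         # False: 1 is split
print(is_valid_batching([[1, 2, 3]], 2))              # False: batch too long
```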

Note that if using itertools.groupby feels like "cheating", you can implement something similar relatively easily:

def simple_groupby(data):
    it = iter(data)
    empty = object()
    current = next(it,empty)
    if current is empty:
        return
    prev = current
    acc = [current]
    for current in it:
        if prev == current:
            acc.append(current)
        else:
            yield acc
            acc = [current]
        prev = current
    yield acc
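As a quick sanity check, the generator can be compared against itertools.groupby on the list from the question. The function is repeated here only so the snippet runs on its own. One difference to keep in mind: it yields plain lists of the grouped items, not (key, group) pairs.

```python
from itertools import groupby


def simple_groupby(data):
    # Same generator as above, repeated so this snippet is self-contained
    it = iter(data)
    empty = object()
    current = next(it, empty)
    if current is empty:
        return
    prev = current
    acc = [current]
    for current in it:
        if prev == current:
            acc.append(current)
        else:
            yield acc
            acc = [current]
        prev = current
    yield acc


data = [1, 1, 2, 3, 4, 4]
groups = list(simple_groupby(data))
print(groups)  # [[1, 1], [2], [3], [4, 4]]

# Same grouping as itertools.groupby once each group is materialized
assert groups == [list(g) for _, g in groupby(data)]
# An empty input produces no groups at all
assert list(simple_groupby([])) == []
```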

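You can combine itertools.groupby with a recursive generator function to search for possible merges that satisfy your criteria. This lets you handle more ambiguous cases, where it is unclear which "sibling" group should absorb a two-element group, and lets you truncate results when a group is longer than batch_size: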

from itertools import groupby
data = {'data_1': {'id': [1,4],'char': ['A','B','C','D','E','F','G','H','I','J','K','L']}}
batch_size = 5
def get_groups(d,c = [],p = []):
  if not d and not p and all((l:=len(i)) <= batch_size and i and l != 2 for i in c):
     #found valid combo
     yield c
  elif d:
     _p = (k:=(p+d[0]))[(b:=(l if (l:=len(k)) <= batch_size else -1*(l-batch_size))):]
     if l == 2 and not _p:
        #if group size is two,then we have to find possible merges for group
        yield from get_groups(d[1:],c=c,p = k)
        yield from get_groups([c[-1]+k]+d[1:],c=c[:-1],p = [])
     elif _p:
        #group size > batch_size,need to find possible siblings that can take subgroups of groups
        for i in range(batch_size):
           yield from get_groups(d[1:],c=c+[k[:i]],p = k[i:])
           if c and len(c[-1]) + i <= batch_size:
              yield from get_groups(d[1:],c=c[:-1]+[c[-1]+k[batch_size:batch_size+i]]+[k[:batch_size]],p = k[batch_size+i:])
     yield from get_groups(d[1:],c=c+[k[:b]],p = _p)
  elif p:
     yield from get_groups(d[1:],c=c+[p],p = [])

combo = next(get_groups([list(b) for _,b in groupby(data['data_1']['id'])]))
c_m = iter(data['data_1']['char'])
result = [[i for j in [(x,next(c_m)) for x in y] for i in j] for y in combo]

Output:

[[1,'A','C'],'G'],'L']]
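The final nested comprehension is dense: for each batch of ids it emits the id followed by the next unused char. A plainer equivalent of just that step might look like the sketch below. The `interleave` name and the small `combo` value are made up for illustration, and the chars list is assumed to be at least as long as the total number of ids.

```python
def interleave(combo, chars):
    """Pair every id in every batch with the next char, flattening to [id, char, ...]."""
    char_iter = iter(chars)
    result = []
    for batch in combo:
        row = []
        for id_value in batch:
            row.append(id_value)
            # Raises StopIteration if chars runs out before the ids do
            row.append(next(char_iter))
        result.append(row)
    return result


combo = [[1, 1, 1], [2, 2]]          # hypothetical batches of ids
chars = ['A', 'B', 'C', 'D', 'E']    # one char per id, in the same order
print(interleave(combo, chars))
# [[1, 'A', 1, 'B', 1, 'C'], [2, 'D', 2, 'E']]
```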

If you don't want to use any libraries (modules), this solution is for you.

batch_size = 5
id = list(map(int,input("Enter multiple ids in one line with space between them: ").split()))
char = list(map(str,input("Enter multiple char in one line with space between them: ").split()))
id_char_input = {char[i]: id[i] for i in range(len(id))}
id_unique = list(set([value for key,value in id_char_input.items()]))
group_input = [[] for _ in range(len(id_unique))]
output = []
j = 0

for key,value in id_char_input.items():
    group_input[id_unique.index(value)].append(value)
    group_input[id_unique.index(value)].append(key)

while j < len(group_input):
    if j != len(group_input) - 1:
        if len(group_input[j] + group_input[j + 1]) <= 2*batch_size:
            output.append(group_input[j] + group_input[j + 1])
            j += 2
        else:
            if group_input[j] not in output:
                output.append(group_input[j])
            j += 1
    else:
        output.append(group_input[j])
        break

print(output)

Output:
Enter multiple ids in one line with space between them: 1 1 1 2 2 2 2 3 3 4 4 4
Enter multiple char in one line with space between them: A B C D E F G H I J K L
[[1,'L']]

Edit: you can now take the id and char values from input.
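For comparison, here is another library-free sketch that keeps the ids in their original order and applies batch_size to the ids only, as the question asks. It is not the code above rewritten: `batch_id_char` is a hypothetical name, the interleaved [id, char, ...] output shape is an assumption about the desired format, and each run of equal ids is assumed to be no longer than batch_size.

```python
def batch_id_char(ids, chars, batch_size):
    """Batch (id, char) pairs so that equal adjacent ids never span two batches."""
    if len(ids) != len(chars):
        raise ValueError("'id' and 'char' must have the same length")

    # Step 1: collect runs of equal adjacent ids as lists of (id, char) pairs
    runs = []
    for pair in zip(ids, chars):
        if runs and runs[-1][-1][0] == pair[0]:
            runs[-1].append(pair)
        else:
            runs.append([pair])

    # Step 2: greedily pack whole runs into batches of at most batch_size ids
    batches = []
    for run in runs:
        if batches and len(batches[-1]) + len(run) <= batch_size:
            batches[-1].extend(run)
        else:
            batches.append(list(run))

    # Step 3: flatten each batch of pairs into [id, char, id, char, ...]
    return [[item for pair in batch for item in pair] for batch in batches]


ids = [1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4]
chars = list("ABCDEFGHIJKL")
for row in batch_id_char(ids, chars, 5):
    print(row)
# [1, 'A', 1, 'B', 1, 'C']
# [2, 'D', 2, 'E', 2, 'F', 2, 'G']
# [3, 'H', 3, 'I', 4, 'J', 4, 'K', 4, 'L']
```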
