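Algorithm: how to split list data into fixed-size sublists without separating equal numbers
I have an algorithm question for you. The application context isn't needed, so I'll go straight to an example.
Here is a possible input: input = [1,1,1,2,2,2,2,3,3,4,4,4]. Let's assume the batch size is 5. The idea is to output lists of at most 5 items without separating equal numbers; in short, two identical numbers must never end up in different sublists. Example output: [[1,1,1],[2,2,2,2],[3,3,4,4,4]]
Assumptions: the numbers are always sorted, and batch_size is always greater than the number of times any single number can occur.
Do you have a more elegant solution than the one I just came up with?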
i = 0
batch_size = 5
res = []
while i < len(input):
    # Retrieve the data list according to the batch size
    data = input[i: i + batch_size]
    # Increment the index
    i += batch_size
    # See what the next output looks like
    future_data = input[i: i + batch_size]
    if future_data and future_data[0] == data[-1]:
        # The last number continues into the next batch, so count how many
        # times it appears in the current list and move the index back by that much
        cp = data.count(data[-1])
        i -= cp
        # Then remove every occurrence of that number from the current list
        data = data[:-cp]
    res.append(data)
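Edit, following @juanpa.arrivillaga's answer:
Thanks, everyone, for your reactions and answers.
On to episode 2. I gave you a simplified version of my problem, thinking your solutions would be enough. Despite your answers, I actually don't know how to adapt @juanpa.arrivillaga's solution to my data format. The input looks more like this: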
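input = {
    'data_1': {
        'id': [1,1,1,2,2,2,2,3,3,4,4,4],
        'char': ['A','B','C','D','E','F','G','H','I','J','K','L']
    }
}
Note: the lists under 'id' and 'char' must be the same size!
The output must look like this:
[
    [1,'A',1,'B',1,'C'],
    [2,'D',2,'E',2,'F',2,'G'],
    [3,'H',3,'I',4,'J',4,'K',4,'L']
]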
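I know the data structure isn't optimal. Unfortunately I don't control it, so I can't change it...
The constraints are the same as before (the batch size applies only to id; am I clear enough?)
Solutions
Here is how I would do it in a single pass: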
>>> import itertools
>>> batch_size = 5
>>> result = [[]]
>>> input_data = [1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4]
>>> for _, g in itertools.groupby(input_data):
...     current = list(g)
...     if len(result[-1]) + len(current) <= batch_size:
...         result[-1].extend(current)
...     else:
...         result.append(current)
...
>>> result
[[1, 1, 1], [2, 2, 2, 2], [3, 3, 4, 4, 4]]
Let's break this down into intermediate steps to aid understanding. First, this is what itertools.groupby produces when evaluated eagerly:
>>> import itertools
>>> batch_size = 5
>>> grouped = [list(g) for k, g in itertools.groupby([1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4])]
>>> grouped
[[1, 1, 1], [2, 2, 2, 2], [3, 3], [4, 4, 4]]
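Then simply build up your result, which is a list of sublists. If the group fits into the current sublist, add it to that sublist; otherwise, append a new sublist consisting of that group (which we can assume is no larger than batch_size):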
>>> result = [[]]
>>> for group in grouped:
...     if len(result[-1]) + len(group) <= batch_size:
...         result[-1].extend(group)
...     else:
...         result.append(group[:])
...
>>> result
[[1, 1, 1], [2, 2, 2, 2], [3, 3, 4, 4, 4]]
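The version above makes two passes over the data; the first example posted makes a single pass.
Note that if using itertools.groupby feels like "cheating", you can implement something similar relatively easily: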
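The single-pass loop can also be packaged as a small reusable function. This is only a sketch: the name batch_groups is hypothetical (it does not appear in the answer), and it assumes the input is sorted and that no run of equal values is longer than batch_size. Unlike the interactive version, it starts from an empty result, so an empty input yields [] rather than [[]].

```python
from itertools import groupby


def batch_groups(data, batch_size):
    """Greedily pack runs of equal values into batches of at most batch_size items.

    Assumes data is sorted and no run of equal values exceeds batch_size.
    """
    result = []
    for _, run in groupby(data):
        run = list(run)
        # Start a new batch if there is none yet or the run would not fit.
        if not result or len(result[-1]) + len(run) > batch_size:
            result.append(run)
        else:
            result[-1].extend(run)
    return result


print(batch_groups([1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4], 5))
# prints [[1, 1, 1], [2, 2, 2, 2], [3, 3, 4, 4, 4]]
```

The greedy choice only ever looks at the last batch, so it does not try to find the smallest possible number of batches.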
def simple_groupby(data):
    it = iter(data)
    empty = object()
    current = next(it, empty)
    if current is empty:
        return
    prev = current
    acc = [current]
    for current in it:
        if prev == current:
            acc.append(current)
        else:
            yield acc
            acc = [current]
        prev = current
    yield acc
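For the dictionary format from the question's edit, one possible adaptation of the same greedy idea is to group (id, char) pairs by id and count only the ids against batch_size. This is a sketch under assumptions, not the answerer's code: the function name batch_pairs is hypothetical, the ids are assumed to be sorted, and no id is assumed to occur more than batch_size times.

```python
from itertools import groupby
from operator import itemgetter


def batch_pairs(ids, chars, batch_size):
    """Batch (id, char) pairs so equal ids stay together; batch_size counts ids only."""
    if len(ids) != len(chars):
        raise ValueError("'id' and 'char' must have the same length")
    batches = []  # each batch is a flat list: [id, char, id, char, ...]
    counts = []   # number of ids currently held by each batch
    for _, run in groupby(zip(ids, chars), key=itemgetter(0)):
        run = list(run)
        # Open a new batch if there is none yet or this run of ids would not fit.
        if not batches or counts[-1] + len(run) > batch_size:
            batches.append([])
            counts.append(0)
        for pair in run:
            batches[-1].extend(pair)
        counts[-1] += len(run)
    return batches


data = {
    'data_1': {
        'id': [1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4],
        'char': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L'],
    }
}
entry = data['data_1']
print(batch_pairs(entry['id'], entry['char'], 5))
```

Keeping a separate counts list avoids dividing the flat batch length by two every time and makes it explicit that only ids count toward the limit.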
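You can combine itertools.groupby with a recursive generator function to search for possible merges that satisfy your conditions. This lets you handle more ambiguous cases better: cases where it is unclear which "sibling" group should absorb a two-element group, and cases where a group is longer than batch_size and has to be truncated: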
from itertools import groupby
data = {'data_1': {'id': [1,1,1,2,2,2,2,3,3,4,4,4], 'char': ['A','B','C','D','E','F','G','H','I','J','K','L']}}
batch_size = 5
def get_groups(d, c = [], p = []):
    # d: groups still to place, c: batches built so far, p: pending items carried over
    # (requires Python 3.8+ for the := operator)
    if not d and not p and all((l:=len(i)) <= batch_size and i and l != 2 for i in c):
        #found valid combo
        yield c
    elif d:
        # k: pending items plus the next group; _p: the part of k that overflows batch_size
        _p = (k:=(p+d[0]))[(b:=(l if (l:=len(k)) <= batch_size else -1*(l-batch_size))):]
        if l == 2 and not _p:
            #if group size is two, then we have to find possible merges for group
            yield from get_groups(d[1:], c=c, p = k)
            yield from get_groups([c[-1]+k]+d[1:], c=c[:-1], p = [])
        elif _p:
            #group size > batch_size, need to find possible siblings that can take subgroups of groups
            for i in range(batch_size):
                yield from get_groups(d[1:], c=c+[k[:i]], p = k[i:])
                if c and len(c[-1]) + i <= batch_size:
                    yield from get_groups(d[1:], c=c[:-1]+[c[-1]+k[batch_size:batch_size+i]]+[k[:batch_size]], p = k[batch_size+i:])
        yield from get_groups(d[1:], c=c+[k[:b]], p = _p)
    elif p:
        yield from get_groups(d[1:], c=c+[p], p = [])
combo = next(get_groups([list(b) for _, b in groupby(data['data_1']['id'])]))
# pair every id with the next char, in order, and flatten each batch
c_m = iter(data['data_1']['char'])
result = [[i for j in [(x, next(c_m)) for x in y] for i in j] for y in combo]
Output:
[[1, 'A', 1, 'B', 1, 'C'], [2, 'D', 2, 'E', 2, 'F', 2, 'G'], [3, 'H', 3, 'I', 4, 'J', 4, 'K', 4, 'L']]
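Whichever approach is used, a candidate result can be checked against the constraints as stated in the question: the batches must preserve the original order, none may exceed batch_size, and no value may appear in more than one batch. The helper below is a hypothetical sketch for that check; the name is_valid_batching is not from any answer, it deliberately does not include the extra group-size condition used in the recursive code, and it assumes the values are hashable.

```python
def is_valid_batching(original, batches, batch_size):
    """Check a candidate batching against the question's stated constraints."""
    # 1. Flattening the batches must give back the original sequence.
    flat = [x for batch in batches for x in batch]
    if flat != list(original):
        return False
    # 2. No batch may be empty or hold more than batch_size items.
    if any(not batch or len(batch) > batch_size for batch in batches):
        return False
    # 3. No value may appear in more than one batch.
    seen = set()
    for batch in batches:
        values = set(batch)
        if values & seen:
            return False
        seen |= values
    return True


ids = [1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4]
print(is_valid_batching(ids, [[1, 1, 1], [2, 2, 2, 2], [3, 3, 4, 4, 4]], 5))  # True
print(is_valid_batching(ids, [[1, 1, 1, 2, 2], [2, 2, 3, 3], [4, 4, 4]], 5))  # False: 2 is split
```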
If you don't want to use any library (module), then this solution is for you.
batch_size = 5
# Read the ids and chars, one line each, separated by spaces
ids = list(map(int, input("Enter multiple ids in one line with space between them: ").split()))
chars = input("Enter multiple char in one line with space between them: ").split()
# Map each char to its id (this assumes the chars are unique)
id_char_input = {chars[i]: ids[i] for i in range(len(ids))}
# Sorted unique ids, so the groups come out in id order
id_unique = sorted(set(id_char_input.values()))
group_input = [[] for _ in range(len(id_unique))]
output = []
j = 0
# Build one flat [id, char, id, char, ...] list per unique id
for key, value in id_char_input.items():
    group_input[id_unique.index(value)].append(value)
    group_input[id_unique.index(value)].append(key)
# Merge neighbouring groups pairwise when together they hold at most batch_size ids
# (each id contributes two entries, hence 2*batch_size)
while j < len(group_input):
    if j != len(group_input) - 1:
        if len(group_input[j] + group_input[j + 1]) <= 2*batch_size:
            output.append(group_input[j] + group_input[j + 1])
            j += 2
        else:
            if group_input[j] not in output:
                output.append(group_input[j])
            j += 1
    else:
        output.append(group_input[j])
        break
print(output)
Output:
Enter multiple ids in one line with space between them: 1 1 1 2 2 2 2 3 3 4 4 4
Enter multiple char in one line with space between them: A B C D E F G H I J K L
[[1, 'A', 1, 'B', 1, 'C'], [2, 'D', 2, 'E', 2, 'F', 2, 'G'], [3, 'H', 3, 'I', 4, 'J', 4, 'K', 4, 'L']]
Edit: you can now read the id and char values from user input.