Structuring JSON into a specified data structure


data_list = {
    '__att_names': [
        ['id', 'name'],                  # "__t_idx": 0
        ['location', 'address'],         # "__t_idx": 1
        ['random_key1', 'random_key2'],  # "__t_idx": 2
        ['random_key3', 'random_key4'],  # "__t_idx": 3
    ],
    '__root': {
        'comparables': [
            {
                '__g_id': '153564396',
                '__atts': [
                    1,                 # maps to __att_names[0][0], i.e. 'id'
                    'somerandomname',  # maps to __att_names[0][1], i.e. 'name'
                    {
                        '__atts': [
                            'location_value',  # maps to __att_names[1][0]
                            'address_value',   # maps to __att_names[1][1]
                            {
                                '__atts': [],
                                '__t_idx': 1,  # it can keep getting nested, further and further
                            },
                        ],
                        '__t_idx': 1,
                    },
                    {
                        '__atts': ['random_key3value', 'random_key4value'],
                        '__t_idx': 3,
                    },
                    {
                        '__atts': ['random_key1value', 'random_key2value'],
                        '__t_idx': 2,
                    },
                ],
                '__t_idx': 0,  # maps to the first item in __att_names
            },
        ],
    },
}

The output I want in this case is:

[
    {'id': 1, 'name': 'somerandomname', 'location': 'address_value',
     'random_key1': 'random_key1value', 'random_key2': 'random_key2value',
     'random_key3': 'random_key3value', 'random_key4': 'random_key4value'}
]

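I was able to get this working for the first few nested fields of __att_names, but as the nesting gets deeper my code becomes very long, fragile, and repetitive. I feel there is a cleaner recursive way to solve this.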

This is my current approach. So far, the following code does handle the first nested object:

def recursive_function(index, attributes, payload_names, output):
    # Map the names at payload_names[index] onto the attributes by position.
    category_location = payload_names[index]
    for position, category in enumerate(category_location):
        output[category] = attributes[position]
        if isinstance(attributes[position], dict):
            nested_index = attributes[position].get('__t_idx')
            nested_attributes = attributes[position].get('__atts')
            if nested_attributes and nested_index:
                recursive_function(nested_index, nested_attributes, payload_names, output)


payload_names = data_list['__att_names']
comparable_data = data_list['__root']['comparables']
output_arr = []
for items in comparable_data[:1]:
    output = {}
    index_number = items.get('__t_idx')
    attributes = items.get('__atts')
    if attributes:
        recursive_function(index_number, attributes, payload_names, output)
    output_arr.append(output)

To explain the given example further:


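In particular, for 'location': 'address_value', the value 'address_value' comes from the array under the comparables key, which holds dictionaries with key-value pairs, namely __g_id, __atts, and __t_idx. Note that some of them may not have a __g_id, but whenever there is an __atts key there is also a __t_idx, which maps the index to the corresponding array in __att_names.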

Overall, __att_names holds all the different keys, and every item under comparables -> __atts is a value for one of the key names in __att_names.

__t_idx helps us map the items of an __atts array to __att_names and build the key-value pairs of the resulting dictionary.
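The one-level version of this mapping can be sketched as follows. This is only an illustration, assuming the scalar values in __atts line up positionally with the names in __att_names[__t_idx]; the helper name map_one_level is hypothetical and not part of the original code.

```python
def map_one_level(att_names, node):
    """Pair the scalar values in node['__atts'] with the names at node['__t_idx'].

    Nested dicts are skipped here; only the current level is mapped.
    """
    names = att_names[node["__t_idx"]]
    scalars = [v for v in node["__atts"] if not isinstance(v, dict)]
    return dict(zip(names, scalars))


att_names = [["id", "name"], ["location", "address"]]
node = {
    "__atts": [1, "somerandomname", {"__atts": ["x", "y"], "__t_idx": 1}],
    "__t_idx": 0,
}
print(map_one_level(att_names, node))  # {'id': 1, 'name': 'somerandomname'}
```

The nested dictionary is ignored at this level; handling it is where the recursion (or a loop) comes in.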

Solution

If you want to restructure a complex JSON object, my suggestion is to use jq.

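The data you provided is quite confusing and obfuscated, so I am not sure exactly what kind of filtering your case needs. As far as I can tell, though, your problem involves data nested to an arbitrary depth. So instead of a recursive function, you can write a loop that un-nests the data into the flat structure you want. There is already a question on that topic.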
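One way such a loop could look in Python is sketched below, using an explicit stack in place of recursion. It assumes every node is a dictionary carrying __atts and __t_idx as described in the question, and that each name group receives values at most once per comparable (later values would overwrite earlier ones). The function name flatten_iteratively is hypothetical.

```python
def flatten_iteratively(att_names, root_node):
    """Flatten one '__atts'/'__t_idx' tree into a single dict without recursion."""
    flat = {}
    stack = [root_node]  # nodes still waiting to be processed
    while stack:
        node = stack.pop()
        atts = node.get("__atts", [])
        t_idx = node.get("__t_idx")
        # Scalars belong to this node; dicts are nested nodes handled later.
        scalars = [v for v in atts if not isinstance(v, dict)]
        stack.extend(v for v in atts if isinstance(v, dict))
        if t_idx is not None:
            flat.update(zip(att_names[t_idx], scalars))
    return flat


# Small sample in the same shape as the question's data.
att_names = [["id", "name"], ["location", "address"]]
comparable = {
    "__atts": [1, "somerandomname",
               {"__atts": ["location_value", "address_value"], "__t_idx": 1}],
    "__t_idx": 0,
}
flat = flatten_iteratively(att_names, comparable)
print(flat)
```

Because the stack is last-in, first-out, sibling groups are visited in reverse order, so the key order of the result may differ from the order of __att_names even though the contents are the same.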

You can traverse the structure while keeping track of the __t_idx value that applies to each list element that is not a dictionary:

data_list = {
    '__att_names': [['id', 'name'], ['location', 'address'],
                    ['random_key1', 'random_key2'], ['random_key3', 'random_key4']],
    '__root': {'comparables': [{
        '__g_id': '153564396',
        '__atts': [
            1, 'somerandomname',
            {'__atts': ['location_value', 'address_value', {'__atts': [], '__t_idx': 1}], '__t_idx': 1},
            {'__atts': ['random_key3value', 'random_key4value'], '__t_idx': 3},
            {'__atts': ['random_key1value', 'random_key2value'], '__t_idx': 2},
        ],
        '__t_idx': 0,
    }]},
}

def get_vals(d, f=False, t_idx=None):
    # Yield (value, t_idx) for every scalar found inside an '__atts' list.
    if isinstance(d, dict) and '__atts' in d:
        # A node: visit its values, passing down this node's own '__t_idx'.
        for b in d.values():
            yield from get_vals(b, t_idx=d.get('__t_idx'))
    elif isinstance(d, list):
        # An '__atts' list: f=True marks its elements as attribute values.
        for b in d:
            yield from get_vals(b, f=True, t_idx=t_idx)
    elif f and t_idx is not None:
        yield (d, t_idx)

result = []
for i in data_list['__root']['comparables']:
    new_d = {}
    for a, b in get_vals(i):
        new_d.setdefault(b, []).append(a)  # group the values by '__t_idx'
    # Pair each name group with the values collected for its index.
    result.append({j: v for idx, names in enumerate(data_list['__att_names'])
                   for j, v in zip(names, new_d[idx])})

print(result)

Output:

[
    {'id': 1, 'name': 'somerandomname', 'location': 'location_value', 'address': 'address_value',
     'random_key1': 'random_key1value', 'random_key2': 'random_key2value',
     'random_key3': 'random_key3value', 'random_key4': 'random_key4value'}
]
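Since the question asks for a cleaner recursive approach, here is a minimal sketch of one that writes straight into the output dictionary. It is not the code from the answer above; the function name flatten is hypothetical, and it assumes scalar values appear in the same order as the names in their group. Unlike the version above, which looks up every name group and would raise a KeyError for a comparable that lacks one, this sketch simply leaves missing groups out of the result.

```python
def flatten(att_names, node, out=None):
    """Recursively copy scalar '__atts' values into `out`, keyed by '__att_names'."""
    if out is None:
        out = {}
    t_idx = node.get("__t_idx")
    names = iter(att_names[t_idx]) if t_idx is not None else iter(())
    for value in node.get("__atts", []):
        if isinstance(value, dict):
            flatten(att_names, value, out)  # descend into a nested group
        else:
            name = next(names, None)
            if name is not None:  # ignore surplus values with no matching name
                out[name] = value
    return out


# Sample data: the third name group never appears in the comparable.
att_names = [["id", "name"], ["location", "address"], ["random_key1", "random_key2"]]
comparable = {
    "__g_id": "153564396",
    "__atts": [
        1, "somerandomname",
        {"__atts": ["location_value", "address_value",
                    {"__atts": [], "__t_idx": 1}], "__t_idx": 1},
    ],
    "__t_idx": 0,
}
flattened = flatten(att_names, comparable)
print(flattened)
```

Applied to the full data, `[flatten(data_list['__att_names'], c) for c in data_list['__root']['comparables']]` would give one dictionary per comparable.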