Working With Large Nested JSON Data

Ankush kunwar
2 min read · Jan 8, 2023

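To work with JSON data in Python, you can use the built-in json module. It provides functions for parsing JSON text into Python objects (json.loads, json.load) and for serializing Python objects back to JSON text (json.dumps, json.dump).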

Here is an example of how to parse a JSON string in Python:

import json

# Some JSON data
json_data = '{"name": "John", "age": 30, "city": "New York"}'

# Parse the JSON data
data = json.loads(json_data)

# Print the data
print(data)

This will parse the JSON data and store it in a dictionary. You can access the data in the dictionary like this:

name = data['name']
age = data['age']
city = data['city']
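
Indexing with square brackets raises a KeyError when a key is missing, and json.loads raises json.JSONDecodeError when the input is not valid JSON. Here is a minimal sketch of handling both cases. The safe_parse helper and the sample strings are made up for illustration and are not part of the json module:

```python
import json


def safe_parse(text):
    """Return the parsed JSON value, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


data = safe_parse('{"name": "John", "age": 30, "city": "New York"}')

# dict.get returns a default instead of raising KeyError for a missing key
name = data.get("name")
country = data.get("country", "unknown")

print(name, country)
print(safe_parse("{not valid json}"))
```

Whether to return None or re-raise on bad input depends on the application; returning None is just one simple option.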

Working with large nested JSON data

To extract data from a nested JSON object using recursion, you can use a function that iterates through the object and extracts the desired values. Here is an example of how you might do this:

def extract_values(obj, key):
    """Pull all values of specified key from nested JSON."""
    arr = []

    def extract(obj, arr, key):
        """Recursively search for values of key in JSON tree."""
        if isinstance(obj, dict):
            for k, v in obj.items():
                if isinstance(v, (dict, list)):
                    extract(v, arr, key)
                elif k == key:
                    arr.append(v)
        elif isinstance(obj, list):
            for item in obj:
                extract(item, arr, key)
        return arr

    results = extract(obj, arr, key)
    return results

You can then call the function with a JSON object and the key you want to extract values for, like this:

values = extract_values(json_object, 'key')

This will return a list of all the values for the specified key in the JSON object.
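
One detail worth knowing: in the function above, a matching key whose value is itself a dictionary or list is recursed into rather than collected, so only scalar values (strings, numbers, booleans, null) end up in the result. If container values should be collected too, one possible variant looks like the sketch below. The name extract_all_values and the sample data are hypothetical, chosen only for this illustration:

```python
def extract_all_values(obj, key):
    """Collect every value stored under `key`, including dict and list values."""
    found = []

    def walk(node):
        if isinstance(node, dict):
            for k, v in node.items():
                if k == key:
                    found.append(v)
                # Keep descending so deeper matches are found as well
                walk(v)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(obj)
    return found


sample = {
    "id": 1,
    "tags": ["a", "b"],
    "child": {"id": 2, "tags": ["c"], "child": {"id": 3}},
}

print(extract_all_values(sample, "id"))    # [1, 2, 3]
print(extract_all_values(sample, "tags"))  # [['a', 'b'], ['c']]
```

With the same sample, extract_values(sample, "tags") would return an empty list, because every "tags" value is a list.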

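Second method: using a generator (memory efficient). Instead of building the full result list up front, a generator yields matching values one at a time as the structure is traversed: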

def item_generator(json_input, lookup_key):
    if isinstance(json_input, dict):
        for k, v in json_input.items():
            if k == lookup_key:
                yield v
            else:
                yield from item_generator(v, lookup_key)
    elif isinstance(json_input, list):
        for item in json_input:
            yield from item_generator(item, lookup_key)

You can then iterate over the generator, passing it a JSON object and the key you want to extract values for, like this:

# Suppose this is the nested JSON data
data = {
    "type": "video",
    "videoID": "vid001",
    "links": [
        {"type": "video", "videoID": "vid002", "links": []},
        {"type": "video",
         "videoID": "vid003",
         "links": [
             {"type": "video", "videoID": "vid004"},
             {"type": "video", "videoID": "vid005"},
         ]
         },
        {"type": "video", "videoID": "vid006"},
        {"type": "video",
         "videoID": "vid007",
         "links": [
             {"type": "video", "videoID": "vid008", "links": [
                 {"type": "video",
                  "videoID": "vid009",
                  "links": [{"type": "video", "videoID": "vid010"}]
                  }
             ]}
         ]},
    ]
}


output = []
for i in item_generator(data, "videoID"):
    ans = {"videoID": i}
    output.append(ans)

print(output)
# Output:
# [{'videoID': 'vid001'}, {'videoID': 'vid002'},
#  {'videoID': 'vid003'}, {'videoID': 'vid004'},
#  {'videoID': 'vid005'}, {'videoID': 'vid006'},
#  {'videoID': 'vid007'}, {'videoID': 'vid008'},
#  {'videoID': 'vid009'}, {'videoID': 'vid010'}]
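
The memory benefit of the generator approach shows up mostly when not every match is needed, because the generator can be consumed lazily and stopped early. The sketch below uses itertools.islice and next from the standard library. The item_generator function is repeated so the example runs on its own, and the small videos dictionary is made up for illustration:

```python
from itertools import islice


def item_generator(json_input, lookup_key):
    """Yield values stored under lookup_key anywhere in the nested structure."""
    if isinstance(json_input, dict):
        for k, v in json_input.items():
            if k == lookup_key:
                yield v
            else:
                yield from item_generator(v, lookup_key)
    elif isinstance(json_input, list):
        for item in json_input:
            yield from item_generator(item, lookup_key)


videos = {
    "videoID": "vid001",
    "links": [
        {"videoID": "vid002", "links": []},
        {"videoID": "vid003", "links": [{"videoID": "vid004"}]},
    ],
}

# Take only the first two matches; traversal stops once they are found
first_two = list(islice(item_generator(videos, "videoID"), 2))
print(first_two)  # ['vid001', 'vid002']

# next() with a default avoids StopIteration when there is no match
first_missing = next(item_generator(videos, "title"), None)
print(first_missing)  # None
```

Note that collecting every result into a list, as in the loop above, uses about as much memory as the first method; the saving comes from processing or discarding values as they are yielded.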

Thank you for reading!
