Working With Large Nested JSON Data

Ankush kunwar
2 min read · Jan 8, 2023

To work with JSON data in Python, you can use the built-in json module. It provides functions for parsing JSON text into Python objects and for serializing Python objects back to JSON.

Here is an example of how to parse a JSON string in Python:

import json

# Some JSON data
json_data = '{"name": "John", "age": 30, "city": "New York"}'

# Parse the JSON data
data = json.loads(json_data)

# Print the data
print(data)

This will parse the JSON data and store it in a dictionary. You can access the data in the dictionary like this:

name = data['name']
age = data['age']
city = data['city']
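
Square-bracket lookups raise a KeyError when a key is missing, which is common in real-world JSON. Here is a minimal sketch of safer access using dict.get. The payload and its field names ("address", "email", "zip") are hypothetical and only serve as an illustration:

```python
import json

# Hypothetical payload: "address" is nested and "email" is absent.
json_data = '{"name": "John", "age": 30, "address": {"city": "New York"}}'
data = json.loads(json_data)

# dict.get returns a default instead of raising KeyError for a missing key.
email = data.get("email", "unknown")

# Chain .get with an empty-dict default to reach into nested objects safely.
city = data.get("address", {}).get("city")
zip_code = data.get("address", {}).get("zip")

print(email, city, zip_code)
```

Chained .get calls work for one or two levels; for deeper or irregular structures, a recursive search is usually easier to maintain.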

Working with large nested JSON data

To extract data from a nested JSON object using recursion, you can use a function that iterates through the object and extracts the desired values. Here is an example of how you might do this:

def extract_values(obj, key):
    """Pull all values of specified key from nested JSON."""
    arr = []

    def extract(obj, arr, key):
        """Recursively search for values of key in JSON tree."""
        if isinstance(obj, dict):
            for k, v in obj.items():
                if isinstance(v, (dict, list)):
                    # Containers are searched, not collected, even if k == key.
                    extract(v, arr, key)
                elif k == key:
                    arr.append(v)
        elif isinstance(obj, list):
            for item in obj:
                extract(item, arr, key)
        return arr

    results = extract(obj, arr, key)
    return results

You can then call the function with a JSON object and the key you want to extract values for, like this:

values = extract_values(json_object, 'key')

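This returns a list of the values stored under the specified key at any depth in the JSON object. Note that a value which is itself a dictionary or list is searched rather than returned.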
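
To make the behavior concrete, here is a small self-contained sketch that repeats the same recursive logic and runs it on a made-up document. The sample data and the "id" and "tags" keys are assumptions chosen for illustration:

```python
import json

def extract_values(obj, key):
    """Collect non-container values stored under `key` at any depth."""
    arr = []

    def extract(obj):
        if isinstance(obj, dict):
            for k, v in obj.items():
                if isinstance(v, (dict, list)):
                    extract(v)          # descend into nested containers
                elif k == key:
                    arr.append(v)       # scalar value under the wanted key
        elif isinstance(obj, list):
            for item in obj:
                extract(item)

    extract(obj)
    return arr

# Hypothetical nested document.
json_text = '{"id": 1, "children": [{"id": 2, "children": []}, {"id": 3, "meta": {"id": 4}}]}'
ids = extract_values(json.loads(json_text), "id")
print(ids)

# A key whose value is a list is searched, not returned, so nothing matches.
tags = extract_values({"tags": ["a", "b"]}, "tags")
print(tags)
```

The first call prints [1, 2, 3, 4] and the second prints an empty list, which shows that only scalar values are collected.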

The second method uses a generator, which is more memory efficient because it yields matches one at a time instead of building the whole list first:

def item_generator(json_input, lookup_key):
    if isinstance(json_input, dict):
        for k, v in json_input.items():
            if k == lookup_key:
                yield v
            else:
                yield from item_generator(v, lookup_key)
    elif isinstance(json_input, list):
        for item in json_input:
            yield from item_generator(item, lookup_key)

You can then iterate over the generator with a JSON object and the key you want to extract values for, like this:

# Suppose this is the input data
data = {
    "type": "video",
    "videoID": "vid001",
    "links": [
        {"type": "video", "videoID": "vid002", "links": []},
        {"type": "video",
         "videoID": "vid003",
         "links": [
             {"type": "video", "videoID": "vid004"},
             {"type": "video", "videoID": "vid005"},
         ]
         },
        {"type": "video", "videoID": "vid006"},
        {"type": "video",
         "videoID": "vid007",
         "links": [
             {"type": "video", "videoID": "vid008", "links": [
                 {"type": "video",
                  "videoID": "vid009",
                  "links": [{"type": "video", "videoID": "vid010"}]
                  }
             ]}
         ]},
    ]
}


output = []
for i in item_generator(data, "videoID"):
    ans = {"videoID": i}
    output.append(ans)

print(output)
# Output:
# [{'videoID': 'vid001'}, {'videoID': 'vid002'},
#  {'videoID': 'vid003'}, {'videoID': 'vid004'},
#  {'videoID': 'vid005'}, {'videoID': 'vid006'},
#  {'videoID': 'vid007'}, {'videoID': 'vid008'},
#  {'videoID': 'vid009'}, {'videoID': 'vid010'}]
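
The memory benefit of the generator shows up when you do not need every match. The sketch below, which uses a smaller made-up data set, takes only the first two matches with itertools.islice and stops a membership check as soon as a hit is found. The variable names first_two and has_vid003 are illustrative:

```python
from itertools import islice

def item_generator(json_input, lookup_key):
    """Lazily yield every value stored under lookup_key."""
    if isinstance(json_input, dict):
        for k, v in json_input.items():
            if k == lookup_key:
                yield v
            else:
                yield from item_generator(v, lookup_key)
    elif isinstance(json_input, list):
        for item in json_input:
            yield from item_generator(item, lookup_key)

# Hypothetical, smaller data set in the same shape as the example above.
data = {
    "videoID": "vid001",
    "links": [
        {"videoID": "vid002"},
        {"videoID": "vid003", "links": [{"videoID": "vid004"}]},
    ],
}

# Take only the first two matches; the rest of the tree is never visited.
first_two = list(islice(item_generator(data, "videoID"), 2))
print(first_two)

# any() stops consuming the generator at the first match.
has_vid003 = any(v == "vid003" for v in item_generator(data, "videoID"))
print(has_vid003)
```

With the list-based version, the full result would be built before you could look at the first element.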

Thank you for reading!
