Multiprocessing vs. Multithreading, with Use Cases and Examples: A Must-Know Concept!

Ankush kunwar
4 min read · Aug 10, 2024


1. Understanding Multiprocessing and Multithreading:

Multiprocessing:

  • Definition: Multiprocessing involves the use of two or more CPUs within a single computer system. In Python, this is achieved using the multiprocessing module, which allows multiple processes to run concurrently.
  • How It Works: Each process runs independently, with its own memory space. This means that multiple processes can run in parallel on separate CPUs or cores, allowing for true parallelism.
  • Use Case: Multiprocessing is particularly useful for CPU-bound tasks, such as mathematical computations, data processing, or any operation where the task needs significant CPU time.
  • Example: If you’re running a computation-heavy program, such as processing large datasets, multiprocessing can distribute the workload across multiple processors and speed up execution, as the timing sketch below illustrates.
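
To make the “true parallelism” point concrete, here is a minimal timing sketch (not from the original article; busy_sum and the workload sizes are illustrative assumptions). On a multi-core machine, the pooled run should finish noticeably faster than the serial one:

import multiprocessing
import time

def busy_sum(n):
    # A deliberately CPU-bound task: sum the first n integers in a loop
    total = 0
    for i in range(n):
        total += i
    return total

if __name__ == "__main__":
    workloads = [10_000_000] * 4

    # Serial baseline: one process does all the work
    start = time.perf_counter()
    serial_results = [busy_sum(n) for n in workloads]
    print(f"Serial:   {time.perf_counter() - start:.2f}s")

    # Parallel: each workload runs in its own process, on its own core
    start = time.perf_counter()
    with multiprocessing.Pool(processes=4) as pool:
        parallel_results = pool.map(busy_sum, workloads)
    print(f"Parallel: {time.perf_counter() - start:.2f}s")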

Multithreading:

  • Definition: Multithreading involves running multiple threads (smaller units of a process) within a single process. Python supports multithreading through the threading module.
  • How It Works: Threads share the same memory space but run concurrently. Due to Python’s Global Interpreter Lock (GIL), true parallelism isn’t achieved in multithreading; instead, threads take turns executing.
  • Use Case: Multithreading is most effective for I/O-bound tasks, such as reading/writing files, network operations, or user interactions where the task involves waiting for an external resource.
  • Example: When developing a web scraper that reads data from multiple websites, multithreading lets one thread wait for a network response while another processes data, leading to faster overall execution; the sketch below shows the same effect with simulated waits.
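
Here is a minimal sketch of that effect, using time.sleep as a stand-in for network latency (an assumption for the sake of a self-contained example). Five simulated one-second downloads finish in roughly one second total, because the GIL is released while a thread waits, so the waits overlap:

import threading
import time

def fake_download(site):
    # time.sleep stands in for waiting on a network response
    time.sleep(1)
    print(f"Fetched {site}")

if __name__ == "__main__":
    sites = [f"site-{i}" for i in range(5)]

    start = time.perf_counter()
    threads = [threading.Thread(target=fake_download, args=(s,)) for s in sites]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Roughly 1s instead of 5s, since the five waits overlap
    print(f"Done in {time.perf_counter() - start:.2f}s")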

2. When to Use Multiprocessing vs. Multithreading:

Use Multiprocessing When:

  • The task is CPU-bound.
  • You need true parallelism.
  • The task can be divided into independent processes.
  • Example: Image processing, large-scale data computations.

Use Multithreading When:

  • The task is I/O-bound.
  • You are dealing with tasks that involve waiting (e.g., network requests).
  • The task requires shared resources or memory (see the lock sketch after this list).
  • Example: Real-time applications, web servers, file I/O operations.
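
Because threads share one memory space, access to shared state has to be synchronized. The sketch below (an illustrative addition, not from the original article) uses threading.Lock to protect a shared counter; without the lock, the interleaved read-modify-write of counter += 1 could lose increments:

import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        # The lock makes the read-modify-write on counter atomic
        with lock:
            counter += 1

if __name__ == "__main__":
    threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)  # Always 400000 with the lock in place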

3. Code Examples:

  • Multiprocessing Example:

import multiprocessing

def compute_square(num):
    return num * num

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]
    # Distribute the work across a pool of 4 worker processes
    with multiprocessing.Pool(processes=4) as pool:
        result = pool.map(compute_square, numbers)
    print(result)  # [1, 4, 9, 16, 25]
  • Multithreading Example:

import threading

def print_square(num):
    print(f"Square: {num * num}")

if __name__ == "__main__":
    threads = []
    for i in range(5):
        thread = threading.Thread(target=print_square, args=(i,))
        threads.append(thread)
        thread.start()

    # Wait for every thread to finish before the program exits
    for thread in threads:
        thread.join()

Note that the five lines may print in any order, since the threads run concurrently; and because this task is CPU-bound, the GIL means the threads take turns rather than running in parallel.

Let’s dive a little deeper with a real-world scenario.

Real-world scenario:

Suppose you get a list of IDs from an API, and each ID has 3,000 or more documents in the DB that you have to move from one place to another. How can you do this efficiently? (There are many possible solutions; here I will show just one, using multithreading.)

Solution:

To efficiently handle the task of moving 3000 documents for each ID retrieved from an API, you can use a combination of multithreading and batch processing. The approach involves the following steps:

  1. Fetch IDs from the API: Retrieve the list of IDs from the API.
  2. Fetch Document Paths: For each ID, retrieve or generate the list of document paths to be moved.
  3. Use Multithreading to Move Documents: Move the documents concurrently using multithreading.

Implementation Steps

  1. Fetching IDs and Document Paths:
  • Make an API call to retrieve the list of IDs.
  • For each ID, retrieve the corresponding document paths that need to be moved.

  2. Moving Documents with Multithreading:

  • Use a thread pool to move documents in parallel.
  • Batch the document moves to optimize resource usage.

Implementation

Python Code Example (this is one way to do it; there are multiple approaches):

import requests
import shutil
import threading
from queue import Queue, Empty

# Step 1: Fetch IDs from API
def fetch_ids(api_url):
    response = requests.get(api_url)
    if response.status_code == 200:
        return response.json()  # Assuming the response is a JSON array of IDs
    else:
        response.raise_for_status()

# Step 2: Fetch document paths for each ID
def fetch_document_paths(doc_id):
    # This function should return a list of file paths for the given ID
    # Example:
    return [f"/path_to_source_directory/{doc_id}/doc_{i}.txt" for i in range(3000)]

# Step 3: Move files using multithreading
def move_file(file_path, destination_dir):
    try:
        shutil.move(file_path, destination_dir)
        print(f"Moved: {file_path}")
    except Exception as e:
        print(f"Error moving {file_path}: {e}")

def worker(queue, destination_dir):
    # Keep pulling paths until the queue is drained; get_nowait avoids the
    # race where another thread empties the queue between a check and a get
    while True:
        try:
            file_path = queue.get_nowait()
        except Empty:
            break
        move_file(file_path, destination_dir)
        queue.task_done()

def move_documents_for_id(doc_id, destination_dir, num_threads=10):
    # Get the list of document paths for this ID
    document_paths = fetch_document_paths(doc_id)

    # Create a thread-safe queue
    file_queue = Queue()

    # Add all document paths to the queue
    for file_path in document_paths:
        file_queue.put(file_path)

    # Create threads
    threads = []
    for _ in range(num_threads):
        thread = threading.Thread(target=worker, args=(file_queue, destination_dir))
        thread.start()
        threads.append(thread)

    # Wait for all threads to finish
    for thread in threads:
        thread.join()

    print(f"All documents for ID {doc_id} moved successfully!")

# Main function to execute the full operation
def main(api_url, destination_dir):
    ids = fetch_ids(api_url)

    for doc_id in ids:
        print(f"Processing ID: {doc_id}")
        move_documents_for_id(doc_id, destination_dir)

# Example usage
if __name__ == "__main__":
    API_URL = "https://api.example.com/get_ids"  # Replace with your API endpoint
    DESTINATION_DIR = "/path_to_destination_directory"  # Replace with your destination directory

    main(API_URL, DESTINATION_DIR)
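
As noted above, this is just one way to do it. A shorter variant of step 3 (a sketch under the same assumptions, reusing move_file and fetch_document_paths from the code above) swaps the hand-rolled queue and worker threads for concurrent.futures.ThreadPoolExecutor from the standard library, which manages the thread pool for you:

from concurrent.futures import ThreadPoolExecutor

def move_documents_for_id(doc_id, destination_dir, num_threads=10):
    document_paths = fetch_document_paths(doc_id)

    # The executor feeds paths to a fixed pool of worker threads and
    # waits for all submitted moves when the with-block exits
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        for file_path in document_paths:
            executor.submit(move_file, file_path, destination_dir)

    print(f"All documents for ID {doc_id} moved successfully!")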

Thank you for reading!

If you enjoyed this article and would like to buy me a coffee, please click here.

You can connect with me on LinkedIn.
