A command-line tool to get CSV summaries using Sweetviz

Working with data involves handling a lot of data files, most commonly CSV files.

It can be very helpful to do some exploratory data analysis on a CSV file to find out what kind of data we are working with. Exploratory data analysis can involve finding out how many rows of data there are, how many columns there are and what they are called, and how often certain values appear.
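
To make those checks concrete, here is a minimal sketch of doing them by hand with pandas. The inline data and the column names (species, petal_length) are made up for illustration and are not taken from any real file.

```python
import io

import pandas as pd

# Hypothetical inline data standing in for a real CSV file.
csv_text = """species,petal_length
setosa,1.4
setosa,1.3
versicolor,4.7
virginica,6.0
versicolor,4.5
"""

df = pd.read_csv(io.StringIO(csv_text))

n_rows, n_cols = df.shape                      # how many rows and columns
column_names = list(df.columns)                # what the columns are called
species_counts = df["species"].value_counts()  # how often each value appears

print(f"{n_rows} rows, {n_cols} columns: {column_names}")
print(species_counts)
```

This works for a handful of questions, but it has to be repeated for every column and every new file.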

Luckily, there is a Python package called Sweetviz that calculates many of the things we want to know and presents them in an HTML file.

Incorporating the Sweetviz library into a Python script lets us see the details of a CSV file with a single command in the terminal.

Below is the full code of a Python script called ‘sweetviz_summary.py’ that creates a summary report of a CSV file and saves it as HTML:

# /// script
# requires-python = "==3.12.*"
# dependencies = [
#   "sweetviz",
#   "pandas",
#   "click",
#   "setuptools",
#   "numpy<2",
# ]
# ///

import sweetviz as sv
import pandas as pd
import click
from pathlib import Path
from datetime import datetime


@click.command()
@click.option("--file_path", required=True, help="The path to the CSV file to summarize.")
@click.option(
    "--save_location",
    default="/Users/msmith16/Documents/sweetviz_summaries",
    help="Folder to save the summary report.",
)
def create_summary_report(file_path, save_location):
    """
    Save a summary report of a CSV file using Sweetviz.

    """

    # Ensure the file path refers to a CSV file before doing any work
    if not file_path.endswith(".csv"):
        raise ValueError("The file path must refer to a CSV file.")

    # Get the file name (without extension) from the file path
    my_path = Path(file_path)
    file_name = my_path.stem

    # Create the output folder if it does not exist
    output_folder = Path(save_location)
    output_folder.mkdir(parents=True, exist_ok=True)

    # Current date and time as a string, used to make the report name unique
    now = datetime.now()
    dt_string = now.strftime("%Y%m%d%H%M%S")

    # Read the CSV file into a DataFrame
    df1 = pd.read_csv(file_path)

    # Create the summary report
    report = sv.analyze(df1)

    # Save the report to an HTML file and open it in the default browser
    report.show_html(output_folder / f"{dt_string}-{file_name}.html")


if __name__ == "__main__":
    create_summary_report()
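
The report name in the script combines a timestamp with the stem of the CSV file name. As a rough sketch of just that step, the hypothetical helper below (build_report_path is not part of the script or of Sweetviz) builds the same kind of path using only the standard library. It differs from the script in one way: it checks the suffix case-insensitively, so a file ending in .CSV is also accepted.

```python
from datetime import datetime
from pathlib import Path


def build_report_path(file_path: str, save_location: str, now: datetime) -> Path:
    """Return the HTML report path for a CSV file (hypothetical helper)."""
    source = Path(file_path)
    # Assumption: accept .csv in any letter case, unlike the script's endswith check.
    if source.suffix.lower() != ".csv":
        raise ValueError("The file path must refer to a CSV file.")
    # Same timestamp format as the script: YYYYMMDDHHMMSS.
    timestamp = now.strftime("%Y%m%d%H%M%S")
    return Path(save_location) / f"{timestamp}-{source.stem}.html"


example = build_report_path("data/iris.csv", "reports", datetime(2024, 5, 1, 9, 30, 0))
print(example.name)  # 20240501093000-iris.html
```

Passing the time in as an argument, rather than calling datetime.now() inside the function, keeps the naming logic easy to check with a fixed date.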

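The Python script can be executed with uv run, which reads the inline metadata block at the top of the script and installs the listed dependencies before running it. In this example, the iris.csv file is in the same directory as the script: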

uv run sweetviz_summary.py --file_path iris.csv

An example of the output HTML:
