SweetViz: Streamlining EDA with Elegant Visualizations¶

SweetViz, an open-source Python library, simplifies the process of Exploratory Data Analysis (EDA) with just two lines of code, generating stunning, high-density visualizations. The output is a fully self-contained HTML application, making it a convenient tool for quick insights into data characteristics and comparisons between datasets.

Key Features:¶

Target Analysis: Visualize how a target value (e.g., "Survived" in the Titanic dataset) correlates with other features, aiding in understanding target characteristics.
Visualize and Compare: Effortlessly compare distinct datasets, such as training versus test data, and explore intra-set characteristics (e.g., male versus female).
Mixed-Type Associations: SweetViz seamlessly integrates associations for numerical (Pearson's correlation), categorical (uncertainty coefficient), and categorical-numerical (correlation ratio) datatypes, providing comprehensive information for diverse data types.
Type Inference: Automatically detects numerical, categorical, and text features, with the option for manual overrides if needed.
Summary Information: Gain insights into type, unique values, missing values, duplicate rows, and the most frequent values within the dataset.
Numerical Analysis: Explore numerical features with statistics such as min/max/range, quartiles, mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, and skewness.

Installation:¶

To get started with SweetViz, install the library along with other necessary dependencies using the following command:

In [ ]:

Copied!

%%capture
! pip install -U sweetviz ydata-profiling
%%capture
! pip install -U sweetviz ydata-profiling

Basic Example:¶

Performing a basic EDA with SweetViz involves just two lines of code. In the example below, we analyze a Pokémon dataset and generate a detailed report:

In [ ]:

Copied!





import sweetviz as sv
import pandas as pd
from ydata_profiling.utils.cache import cache_file


file_name = cache_file(
    "pokemon.csv",
    "https://raw.githubusercontent.com/bryanpaget/html/main/pokemon.csv"
)

pokemon_df = pd.read_csv(file_name)

my_report = sv.analyze(pokemon_df)
my_report.show_notebook()
import sweetviz as sv
import pandas as pd
from ydata_profiling.utils.cache import cache_file


file_name = cache_file(
    "pokemon.csv",
    "https://raw.githubusercontent.com/bryanpaget/html/main/pokemon.csv"
)

pokemon_df = pd.read_csv(file_name)

my_report = sv.analyze(pokemon_df)
my_report.show_notebook()

Comparing Two Dataframes:¶

SweetViz extends its capabilities to compare two datasets, such as training and test sets. In this more complex example, we examine the differences in the "Attack" feature:

In [ ]:

Copied!





from sklearn.model_selection import train_test_split

X = pokemon_df[['Total', 'HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']]
y = pokemon_df[['Type 1', 'Type 2']]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

train_df = X_train
test_df = X_test

comparison_report = sv.compare(train_df, test_df, target_feat='Attack')
comparison_report.show_notebook()
from sklearn.model_selection import train_test_split

X = pokemon_df[['Total', 'HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']]
y = pokemon_df[['Type 1', 'Type 2']]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

train_df = X_train
test_df = X_test

comparison_report = sv.compare(train_df, test_df, target_feat='Attack')
comparison_report.show_notebook()

Empower your data exploration journey with SweetViz, creating visually appealing and informative reports with minimal effort. Whether you're a data scientist, student, or researcher, SweetViz offers a user-friendly approach to gain quick insights into your datasets.