Name: Data Visualization with Matplotlib & Seaborn
Author: Alderi KAMTCHOUA

Setup and Data

📖 Term: Figure and Axes

Definition: In Matplotlib, a figure is the top-level container that holds everything drawn (plots, titles, legends). Axes are the individual plot areas inside the figure; a subplot is simply an Axes placed on a regular grid. One figure can hold multiple axes.

Purpose: Organize plots hierarchically — a figure contains one or more axes.

Why here: Understanding the figure/axes distinction is crucial for creating complex layouts. fig, (ax1, ax2) = plt.subplots(1, 2) creates a figure with 2 axes side by side (1 row, 2 columns).

📖 Term: Subplot

Definition: A subplot (or sub-plot) is an individual plot area within a figure. plt.subplots(2, 1) creates a 2×1 grid (2 rows, 1 column) with 2 axes.

Purpose: Arrange several plots in a single layout so visualizations can be compared side by side or stacked.

Why here: Subplots allow showing multiple perspectives of data in one figure — price + volume, distribution + boxplot, etc. It's more informative than a single chart.
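
As a quick illustration of the figure/axes hierarchy, here is a minimal sketch. The variable names (ax_left, ax_right, axes_grid) and the toy data are made up for this example, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# One figure, two axes side by side (1 row, 2 columns)
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))
ax_left.plot([1, 2, 3], [1, 4, 9])
ax_left.set_title("left axes")
ax_right.bar(["a", "b"], [3, 5])
ax_right.set_title("right axes")

# A 2x2 grid comes back as a 2D NumPy array of Axes objects
fig_grid, axes_grid = plt.subplots(2, 2)
print(len(fig.axes), axes_grid.shape)
```

Each Axes is drawn on independently; the figure only owns the overall canvas and the layout.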

setup.py

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
import pandas as pd
import numpy as np

# ── Global style ──
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_theme(style='darkgrid', palette='husl')

COLORS = {
    'primary': '#2196F3',
    'success': '#4CAF50',
    'danger': '#F44336',
    'warning': '#FF9800',
    'dark': '#212121'
}

# ── Generate realistic financial data ──
np.random.seed(42)
dates = pd.date_range('2023-01-01', periods=252, freq='B')  # Business days

# Simulate prices with geometric Brownian motion
returns = np.random.normal(0.0003, 0.015, 252)
prices = 100 * np.exp(np.cumsum(returns))

df = pd.DataFrame({
    'date': dates,
    'close': prices,
    'open': prices * (1 + np.random.normal(0, 0.003, 252)),
    'volume': np.random.randint(500000, 3000000, 252),
    'revenue': np.random.randint(50000, 200000, 252)
})
df['returns'] = df['close'].pct_change()
df['MA20'] = df['close'].rolling(20).mean()
df['MA50'] = df['close'].rolling(50).mean()
df = df.set_index('date')

📖 Term: Geometric Brownian Motion

Definition: Mathematical model for asset prices. Price changes are proportional to the current price, so log-returns are normally distributed and prices are log-normal. Formula used here: price = 100 * exp(cumsum(returns)), where returns are the simulated daily log-returns.

Purpose: Simulate realistic asset prices with trend and realistic volatility.

Why here: GBM is the price model underlying Black-Scholes. It is more realistic than a simple additive random walk: prices can't go negative, and absolute price swings scale with the price level while percentage volatility stays constant.
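
To make the contrast with an additive random walk concrete, here is a small sketch. The seed, the step size of the additive walk (1.5), and all variable names are arbitrary choices for illustration; only the drift and volatility mirror setup.py.

```python
import numpy as np

rng = np.random.default_rng(0)          # seed chosen arbitrarily for this sketch
n_days, start_price = 252, 100.0
mu, sigma = 0.0003, 0.015               # daily drift and volatility, as in setup.py

log_returns = rng.normal(mu, sigma, n_days)

# GBM-style path: compounding log-returns keeps every price strictly positive
gbm_prices = start_price * np.exp(np.cumsum(log_returns))

# Additive random walk for contrast: fixed-size steps are not tied to the price level
additive_prices = start_price + np.cumsum(rng.normal(0.0, 1.5, n_days))

# Log-returns recovered from the GBM path match the simulated ones
recovered = np.diff(np.log(np.concatenate(([start_price], gbm_prices))))
print(gbm_prices.min() > 0, np.allclose(recovered, log_returns))
```

Because the exponential is always positive, the GBM path can never cross zero, whatever the draws are.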

chart_price.py

# ── Figure with 2 subplots (price + volume) ──
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 8),
                                  gridspec_kw={'height_ratios': [3, 1]},
                                  sharex=True)
fig.suptitle('Price Analysis — 2023', fontsize=16, fontweight='bold', y=0.98)

# ── Price with colored zones (positive/negative) ──
ax1.plot(df.index, df['close'], color=COLORS['primary'], linewidth=1.5, label='Price', zorder=3)
ax1.plot(df.index, df['MA20'], color=COLORS['warning'], linewidth=1.2, label='MA20', linestyle='--')
ax1.plot(df.index, df['MA50'], color=COLORS['danger'], linewidth=1.2, label='MA50', linestyle='-.')

# Zone between MA20 and MA50 (crossover signal)
ax1.fill_between(df.index, df['MA20'], df['MA50'],
                  where=(df['MA20'] >= df['MA50']),
                  alpha=0.15, color=COLORS['success'], label='Bullish trend')
ax1.fill_between(df.index, df['MA20'], df['MA50'],
                  where=(df['MA20'] < df['MA50']),
                  alpha=0.15, color=COLORS['danger'], label='Bearish trend')

ax1.set_ylabel('Price ($)', fontsize=12)
ax1.legend(loc='upper left')
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'${x:.0f}'))

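# ── Volume in colored bars ──
# The first return is NaN (no previous close); treat it as 0 so that bar is not flagged as a loss
colors = [COLORS['success'] if r >= 0 else COLORS['danger'] for r in df['returns'].fillna(0)]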
ax2.bar(df.index, df['volume'] / 1e6, color=colors, alpha=0.7, width=0.8)
ax2.set_ylabel('Volume (M)', fontsize=10)

# Format dates on X axis
ax2.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))
ax2.xaxis.set_major_locator(mdates.MonthLocator())
ax2.tick_params(axis='x', labelrotation=45)

plt.tight_layout()
plt.savefig('chart_financial.png', dpi=150, bbox_inches='tight')
plt.show()

This block creates a figure with two stacked subplots (2 rows, 1 column): the price panel is three times as tall as the volume panel (height_ratios [3, 1]). sharex=True makes both panels share the same X axis, so the dates line up. fill_between shades the zone between the two moving averages based on a condition: green where MA20 >= MA50 (bullish trend), red otherwise. Volume bars are colored by daily return (green if non-negative, red if negative).
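
The boolean condition passed to fill_between(where=...) can also be inspected on its own. The sketch below rebuilds a comparable price series (arbitrary seed, hypothetical variable names) and counts how often the fast moving average crosses the slow one.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)          # arbitrary seed for this sketch
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0003, 0.015, 252))))
ma_fast = close.rolling(20).mean()
ma_slow = close.rolling(50).mean()

# Boolean mask of the kind passed to fill_between(where=...).
# Comparisons involving NaN are False, so the warm-up window is never shaded green.
bullish = ma_fast >= ma_slow
valid = ma_slow.notna()

# A crossover is a day where the mask flips relative to the previous valid day
flips = bullish[valid].astype(int).diff().fillna(0)
up_crosses = flips[flips == 1].index.tolist()     # fast MA moves above slow MA
down_crosses = flips[flips == -1].index.tolist()  # fast MA moves below slow MA
print(len(up_crosses), len(down_crosses))
```

The two shaded regions in the chart change color exactly at these crossover days.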

2. Returns Distribution with Seaborn

distribution.py

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# ── Histogram + KDE ──
returns_clean = df['returns'].dropna() * 100
sns.histplot(returns_clean, bins=40, kde=True, ax=axes[0],
             color=COLORS['primary'], alpha=0.7)

# Reference lines for statistics
mean_r = returns_clean.mean()
std_r = returns_clean.std()
axes[0].axvline(mean_r, color='red', linestyle='--', label=f'Mean: {mean_r:.3f}%')
# Parametric 95% VaR under a normal assumption: the 5% quantile is mean - 1.645 * std
axes[0].axvline(mean_r - 1.645*std_r, color='orange', linestyle=':', label=f'VaR 95%: {mean_r-1.645*std_r:.2f}%')
axes[0].set_title('Distribution of Daily Returns', fontsize=13)
axes[0].set_xlabel('Return (%)')
axes[0].legend()

# ── Monthly boxplot ──
df_monthly = df['returns'].dropna() * 100
monthly_data = df_monthly.groupby(df_monthly.index.month).apply(list)
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

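month_ids = sorted(monthly_data.index)
box_data = [monthly_data[m] for m in month_ids]
# Label each box by its actual month number (Matplotlib 3.9+ prefers tick_labels= over labels=)
bp = axes[1].boxplot(box_data, patch_artist=True, labels=[months[m - 1] for m in month_ids])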

for patch, median in zip(bp['boxes'], bp['medians']):
    median_val = median.get_ydata()[0]
    patch.set_facecolor(COLORS['success'] if median_val >= 0 else COLORS['danger'])
    patch.set_alpha(0.7)

axes[1].axhline(y=0, color='black', linestyle='-', linewidth=0.8)
axes[1].set_title('Monthly Returns — Boxplot', fontsize=13)
axes[1].set_ylabel('Return (%)')

plt.tight_layout()
plt.savefig('distribution_returns.png', dpi=150, bbox_inches='tight')
plt.show()

📖 Term: KDE (Kernel Density Estimate)

Definition: Non-parametric technique for estimating probability density. Instead of fixed bars (histogram), KDE smooths the distribution using Gaussian kernels around each data point.

Purpose: Visualize the smooth distribution of data without histogram noise.

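Why here: KDE vs Histogram: a histogram's shape depends on the bin count and bin edges. KDE replaces the bins with a smooth curve, which makes the overall shape easier to read. It is still an estimate, though: its smoothness depends on the bandwidth, much as a histogram depends on its bins.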
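
To see what "Gaussian kernels around each data point" means in practice, here is a hand-rolled sketch in plain NumPy. The helper gaussian_kde_1d is hypothetical (it is not a seaborn or SciPy function), the sample is synthetic, and Silverman's rule of thumb is used as one common bandwidth choice, not necessarily the default seaborn applies.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps of width `bandwidth`, one centered on each sample."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(2)                 # arbitrary seed
samples = rng.normal(0.0, 1.5, 250)            # stand-in for daily returns in %
grid = np.linspace(-10, 10, 2001)

# Silverman's rule of thumb for the bandwidth
q75, q25 = np.percentile(samples, [75, 25])
h = 0.9 * min(samples.std(ddof=1), (q75 - q25) / 1.34) * len(samples) ** -0.2

density_default = gaussian_kde_1d(samples, grid, h)
density_narrow = gaussian_kde_1d(samples, grid, h / 5)   # much wigglier curve

# Both curves integrate to roughly 1, but their shapes differ
step = grid[1] - grid[0]
area_default = density_default.sum() * step
area_narrow = density_narrow.sum() * step
print(round(area_default, 3), round(area_narrow, 3))
```

Shrinking the bandwidth plays the same role as increasing the bin count in a histogram: more detail, more noise.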

The KDE curve (smooth line) shows an estimate of the distribution of returns. Reference lines display the mean (red) and an approximate 95% Value-at-Risk level (orange), which is important in finance for quantifying risk. The boxplot in the right-hand panel shows returns month by month: box = interquartile range (IQR), line = median, whiskers = range of the non-outlier data, with outliers drawn as individual points beyond the whiskers.
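
For reference, a 95% VaR threshold can be estimated in more than one way. The sketch below compares a parametric (normal-assumption) estimate with a historical one on synthetic returns; the data, seed, and variable names are assumptions made for this example.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(3)                   # arbitrary seed
returns_pct = rng.normal(0.03, 1.5, 251)         # stand-in for daily returns in %

mean_r, std_r = returns_pct.mean(), returns_pct.std(ddof=1)

z_95 = NormalDist().inv_cdf(0.05)                # about -1.645
var_parametric = mean_r + z_95 * std_r           # 5% quantile under a normal assumption
var_historical = np.percentile(returns_pct, 5)   # empirical 5% quantile

# A two-standard-deviation cutoff sits further in the tail:
# under normality it corresponds to roughly the 2.3% quantile, not 5%
two_sigma_level = NormalDist().cdf(-2.0)
print(round(var_parametric, 2), round(var_historical, 2))
```

On roughly normal data the two estimates land close together; with fat-tailed returns the historical quantile is typically more extreme.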

📖 Term: Boxplot

Definition: Chart that summarizes the distribution of a continuous variable. The box extends from Q1 (25th percentile) to Q3 (75th percentile), with a line in the middle (median/Q2). "Whiskers" extend to the most extreme data points that still lie within Q1-1.5×IQR and Q3+1.5×IQR; points beyond those limits are drawn individually as outliers.

Purpose: Visualize central tendency, spread, and outliers of a distribution.

Why here: A boxplot is compact and comparative, which makes it ideal for showing multiple distributions side by side (here: by month). The box itself (the IQR) shows at a glance where the middle 50% of the data lies.
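
The boxplot quantities can be computed by hand, which helps when reading the chart. This is a sketch on a small made-up array, using the 1.5×IQR rule described above.

```python
import numpy as np

data = np.array([-4.0, -1.2, -0.8, -0.3, 0.0, 0.2, 0.5, 0.9, 1.1, 6.5])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Whiskers stop at the most extreme points that are still inside the fences
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Everything outside the fences is drawn as an individual outlier point
outliers = data[(data < lower_fence) | (data > upper_fence)]
print(whisker_low, whisker_high, outliers)
```

Note that the whiskers end at actual data points (-1.2 and 1.1 here), not at the fence values themselves.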

3. Correlation Heatmap

heatmap.py

# Simulate a multi-asset portfolio
assets = ['AAPL', 'GOOG', 'MSFT', 'TSLA', 'AMZN', 'META', 'NVDA', 'BTC']
# Build a symmetric positive-definite covariance (A @ A.T plus a diagonal term),
# which is what np.linalg.cholesky requires
A = np.random.randn(8, 8)
cov = A @ A.T / 8 + np.eye(8) * 0.5
portfolio = pd.DataFrame(
    np.random.randn(252, 8) @ np.linalg.cholesky(cov).T,
    columns=assets
)

corr = portfolio.corr()

fig, ax = plt.subplots(figsize=(10, 8))
mask = np.triu(np.ones_like(corr, dtype=bool))  # Mask upper triangle
sns.heatmap(corr,
            mask=mask,
            annot=True, fmt='.2f',
            cmap='RdYlGn',  # Red (negative correlation) → green (positive)
            vmin=-1, vmax=1, center=0,
            square=True,
            linewidths=0.5,
            ax=ax)
ax.set_title('Portfolio Correlation Matrix', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.savefig('correlation_heatmap.png', dpi=150, bbox_inches='tight')
plt.show()

📖 Term: Heatmap

Definition: Chart displaying a 2D matrix with colors representing values. Colors typically range from blue (low) to red (high) or red (negative) to green (positive).

Purpose: Visualize patterns and magnitudes in 2D tabular data — ideal for correlation matrices, cross-tabulations, etc.

Why here: An 8×8 correlation matrix is hard to read in raw numbers. The RdYlGn heatmap immediately shows: red = negative correlation (assets inversely related), green = positive correlation (assets linked).

The heatmap displays the symmetric correlation matrix. The mask hides the upper triangle and the diagonal: the upper triangle duplicates the lower one, and the diagonal is always 1. Annotations (fmt='.2f') display exact values. The RdYlGn colormap (Red-Yellow-Green) with center=0 makes it intuitive: red = negative correlation (diversification), green = positive correlation (concentrated risk).
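
To check that correlated data really reproduces a chosen correlation structure, here is a small sketch using a Cholesky factor. The 3×3 target matrix, the seed, and the sample size are made-up values for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)                     # arbitrary seed
target_corr = np.array([[ 1.0, 0.8, -0.3],
                        [ 0.8, 1.0,  0.0],
                        [-0.3, 0.0,  1.0]])        # symmetric and positive definite

# If z has independent standard-normal columns, z @ L.T has correlation L @ L.T
L = np.linalg.cholesky(target_corr)
samples = rng.standard_normal((20000, 3)) @ L.T

sample_corr = np.corrcoef(samples, rowvar=False)

# Same mask as in the heatmap: True marks the cells to hide (upper triangle + diagonal)
mask = np.triu(np.ones_like(sample_corr, dtype=bool))
lower_only = np.where(mask, np.nan, sample_corr)
print(np.round(sample_corr, 2))
```

With enough samples the estimated matrix lands close to the target; with only 252 rows, as in the portfolio simulation, expect visibly more noise.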

For PDF reports or presentations, export with dpi=300 for print quality (sharp, readable even zoomed). For web and email, dpi=72 suffices and generates lighter files. The bbox_inches='tight' option removes white margins around the chart.
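
The effect of dpi is easy to verify: pixel dimensions are figsize (in inches) times dpi. The sketch below saves the same figure at two resolutions into memory and reads the size from the PNG header. The helper png_pixel_size is hypothetical, and bbox_inches='tight' is deliberately left out because it crops the canvas and changes the final size.

```python
import io
import struct
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

def png_pixel_size(fig, dpi):
    """Save `fig` as PNG in memory and return (width, height) in pixels."""
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", dpi=dpi)
    # PNG layout: 8-byte signature, 4-byte length, 4-byte "IHDR", then width and height
    width, height = struct.unpack(">II", buffer.getvalue()[16:24])
    return width, height

fig, ax = plt.subplots(figsize=(10, 8))
ax.plot([0, 1], [0, 1])

web_size = png_pixel_size(fig, dpi=72)      # 10 in x 72 dpi by 8 in x 72 dpi
print_size = png_pixel_size(fig, dpi=300)   # 10 in x 300 dpi by 8 in x 300 dpi
print(web_size, print_size)
```

Going from 72 to 300 dpi multiplies the pixel count by about 17, which is why print-quality exports are so much heavier.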
