Visualization is the art of transforming raw data into understandable insights. This tutorial produces professional charts for financial data, the kind you see in analysis reports and dashboards.
Definition: In Matplotlib, a figure is the overall window containing everything. Axes (or subplots) are individual plot areas inside the figure. You can have multiple axes in one figure.
Purpose: Organize plots hierarchically: a figure contains one or more axes.
Why here: Understanding the figure/axes distinction is crucial for creating complex layouts. fig, (ax1, ax2) = plt.subplots(1, 2) creates a figure with 2 axes side by side (1 row, 2 columns).
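The figure/axes hierarchy described above can be checked directly on the objects returned by plt.subplots. This is a minimal sketch; the names ax_left and ax_right and the plotted values are illustrative only, and the Agg backend is chosen just so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt

# One figure holding two axes side by side (1 row, 2 columns)
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# Each axes is an independent plot area with its own data and title
ax_left.plot([1, 2, 3], [1, 4, 9])
ax_left.set_title("Left axes")
ax_right.bar(["a", "b"], [3, 5])
ax_right.set_title("Right axes")

# The figure keeps a list of its axes, and each axes points back to its figure
n_axes = len(fig.axes)
same_parent = ax_left.figure is fig and ax_right.figure is fig
print(n_axes, same_parent)

plt.close(fig)
```

Titles, labels, and tick formatting are set on an axes, while things like the overall size and the suptitle belong to the figure.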
Definition: A subplot (or sub-plot) is an individual plot area within a figure. plt.subplots(2, 1) creates a 2×1 grid (2 rows, 1 column) with 2 axes.
Purpose: Create multiple layouts to compare several visualizations side by side or stacked.
Why here: Subplots allow showing multiple perspectives of data in one figure (price + volume, distribution + boxplot, etc.). It's more informative than a single chart.
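A common stumbling block with subplot grids is the shape of the object that plt.subplots returns, which changes with the grid size. The sketch below shows the cases you are likely to meet; the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt

# 2 rows x 1 column: a 1-D array of 2 axes (stacked layout)
fig_stacked, axes_stacked = plt.subplots(2, 1, sharex=True)

# 2 rows x 2 columns: a 2-D array, indexed as axes_grid[row, col]
fig_grid, axes_grid = plt.subplots(2, 2)

# No grid arguments: a single Axes object, not an array
fig_single, ax_single = plt.subplots()

# squeeze=False always returns a 2-D array, which helps in generic code
fig_raw, axes_raw = plt.subplots(1, 1, squeeze=False)

shapes = (axes_stacked.shape, axes_grid.shape, axes_raw.shape)
print(shapes)

plt.close("all")
```

For a 2-D grid, axes_grid.flat iterates over all axes in row-major order, which is convenient when looping over several charts.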
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
import pandas as pd
import numpy as np
# -- Global style --
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_theme(style='darkgrid', palette='husl')
COLORS = {
    'primary': '#2196F3',
    'success': '#4CAF50',
    'danger': '#F44336',
    'warning': '#FF9800',
    'dark': '#212121'
}
# -- Generate realistic financial data --
np.random.seed(42)
dates = pd.date_range('2023-01-01', periods=252, freq='B')  # Business days
# Simulate prices with geometric Brownian motion
returns = np.random.normal(0.0003, 0.015, 252)
prices = 100 * np.exp(np.cumsum(returns))
df = pd.DataFrame({
    'date': dates,
    'close': prices,
    'open': prices * (1 + np.random.normal(0, 0.003, 252)),
    'volume': np.random.randint(500000, 3000000, 252),
    'revenue': np.random.randint(50000, 200000, 252)
})
df['returns'] = df['close'].pct_change()
df['MA20'] = df['close'].rolling(20).mean()
df['MA50'] = df['close'].rolling(50).mean()
df = df.set_index('date')
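Definition: Geometric Brownian motion (GBM) is a mathematical model for asset prices. Prices change proportionally to themselves, and log-returns are normally distributed. In the code above: prices = 100 * exp(cumsum(returns)), where 100 is the starting price and returns holds the daily log-returns.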
Purpose: Simulate realistic asset prices with trend and realistic volatility.
Why here: GBM is the price model underlying the Black-Scholes option pricing model. It is more realistic than an additive random walk: prices can't go negative, and the absolute size of price moves scales with the price level, because volatility applies to returns rather than to prices.
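To make the GBM construction concrete, here is a small sketch that rebuilds a price path from normally distributed log-returns and contrasts it with an additive random walk. The parameter names (mu, sigma, s0) and the use of np.random.default_rng are choices made for this illustration, not part of the tutorial's code.

```python
import numpy as np

rng = np.random.default_rng(42)

mu, sigma = 0.0003, 0.015   # assumed daily drift and volatility of log-returns
n_days, s0 = 252, 100.0     # one trading year, starting price of 100

# GBM in discrete time: cumulative log-returns, then exponentiate
log_returns = rng.normal(mu, sigma, n_days)
prices = s0 * np.exp(np.cumsum(log_returns))

# Additive random walk for contrast: steps do not scale with the price level,
# so nothing prevents the path from crossing zero
walk = s0 + np.cumsum(rng.normal(0.0, sigma * s0, n_days))

# Taking log differences of the GBM path recovers the original log-returns
recovered = np.diff(np.log(np.concatenate(([s0], prices))))

print(prices.min() > 0, np.allclose(recovered, log_returns))
```

Because the exponential is always positive, the GBM path stays above zero regardless of the random draws, which is the property the explanation above relies on.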
# -- Figure with 2 subplots (price + volume) --
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 8),
                               gridspec_kw={'height_ratios': [3, 1]},
                               sharex=True)
fig.suptitle('Price Analysis - 2023', fontsize=16, fontweight='bold', y=0.98)
# -- Price with colored zones (positive/negative) --
ax1.plot(df.index, df['close'], color=COLORS['primary'], linewidth=1.5, label='Price', zorder=3)
ax1.plot(df.index, df['MA20'], color=COLORS['warning'], linewidth=1.2, label='MA20', linestyle='--')
ax1.plot(df.index, df['MA50'], color=COLORS['danger'], linewidth=1.2, label='MA50', linestyle='-.')
# Zone between MA20 and MA50 (crossover signal)
ax1.fill_between(df.index, df['MA20'], df['MA50'],
                 where=(df['MA20'] >= df['MA50']),
                 alpha=0.15, color=COLORS['success'], label='Bullish trend')
ax1.fill_between(df.index, df['MA20'], df['MA50'],
                 where=(df['MA20'] < df['MA50']),
                 alpha=0.15, color=COLORS['danger'], label='Bearish trend')
ax1.set_ylabel('Price ($)', fontsize=12)
ax1.legend(loc='upper left')
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'${x:.0f}'))
# -- Volume in colored bars --
colors = [COLORS['success'] if r >= 0 else COLORS['danger'] for r in df['returns']]
ax2.bar(df.index, df['volume'] / 1e6, color=colors, alpha=0.7, width=0.8)
ax2.set_ylabel('Volume (M)', fontsize=10)
# Format dates on X axis
ax2.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))
ax2.xaxis.set_major_locator(mdates.MonthLocator())
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('chart_financial.png', dpi=150, bbox_inches='tight')
plt.show()
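As a standalone illustration of the conditional shading used in the chart above, the sketch below applies fill_between with a boolean where mask to two tiny hand-written series. The series fast and slow are made-up stand-ins for MA20 and MA50, and interpolate=True is an optional extra that extends the shading to the exact crossing points.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for a fast and a slow moving average
fast = pd.Series([1.0, 2.0, 3.0, 2.0, 1.0, 2.0])
slow = pd.Series([2.0, 2.0, 2.0, 2.5, 2.5, 1.5])

# Boolean mask: True wherever the fast average is at or above the slow one
bullish = fast >= slow

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(fast.index, fast, label="fast")
ax.plot(slow.index, slow, label="slow")

# Shade only the segments where the mask is True (green) or False (red)
ax.fill_between(fast.index, fast, slow, where=bullish,
                color="green", alpha=0.15, interpolate=True)
ax.fill_between(fast.index, fast, slow, where=~bullish,
                color="red", alpha=0.15, interpolate=True)
ax.legend()

print(bullish.tolist())
plt.close(fig)
```

The two calls use complementary masks, so every point of the band is shaded in exactly one color.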
sharex=True aligns the X axes. fill_between colors the zone between two lines based on a condition: green if MA20 > MA50 (bullish trend), red otherwise. Volume bars are also colored by daily return (green if positive, red if negative).
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# -- Histogram + KDE --
returns_clean = df['returns'].dropna() * 100
sns.histplot(returns_clean, bins=40, kde=True, ax=axes[0],
             color=COLORS['primary'], alpha=0.7)
# Reference lines for statistics
mean_r = returns_clean.mean()
std_r = returns_clean.std()
axes[0].axvline(mean_r, color='red', linestyle='--', label=f'Mean: {mean_r:.3f}%')
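# Parametric (normal) 95% VaR: the 5% quantile lies 1.645 standard deviations below the mean
axes[0].axvline(mean_r - 1.645*std_r, color='orange', linestyle=':', label=f'VaR 95%: {mean_r-1.645*std_r:.2f}%')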
axes[0].set_title('Distribution of Daily Returns', fontsize=13)
axes[0].set_xlabel('Return (%)')
axes[0].legend()
# -- Monthly boxplot --
df_monthly = df['returns'].dropna() * 100
monthly_data = df_monthly.groupby(df_monthly.index.month).apply(list)
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
box_data = [monthly_data[m] for m in sorted(monthly_data.index)]
bp = axes[1].boxplot(box_data, patch_artist=True, labels=months[:len(box_data)])
for patch, median in zip(bp['boxes'], bp['medians']):
    median_val = median.get_ydata()[0]
    patch.set_facecolor(COLORS['success'] if median_val >= 0 else COLORS['danger'])
    patch.set_alpha(0.7)
axes[1].axhline(y=0, color='black', linestyle='-', linewidth=0.8)
axes[1].set_title('Monthly Returns - Boxplot', fontsize=13)
axes[1].set_ylabel('Return (%)')
plt.tight_layout()
plt.savefig('distribution_returns.png', dpi=150, bbox_inches='tight')
plt.show()
Definition: Kernel density estimation (KDE) is a non-parametric technique for estimating a probability density. Instead of counting points in fixed bins as a histogram does, KDE smooths the distribution by placing a kernel (typically Gaussian) on each data point and averaging them.
Purpose: Visualize the smooth distribution of data without histogram noise.
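Why here: KDE vs histogram: a histogram shows discrete bins, so the result depends on the bin count and bin edges. A KDE curve is smooth and has no bins, which makes the overall shape of the distribution easier to see, although its smoothness depends on the chosen bandwidth.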
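The idea behind a Gaussian KDE fits in a few lines of NumPy. This is a hand-rolled sketch for intuition only; it is not claimed to match what sns.histplot does internally. The function name gaussian_kde_manual, the synthetic sample, and the normal-reference rule of thumb used for the bandwidth are all assumptions made for the illustration.

```python
import numpy as np

def gaussian_kde_manual(sample, grid, bandwidth):
    """Average of Gaussian bumps centered on each data point."""
    # z[i, j] = scaled distance between grid point i and data point j
    z = (grid[:, None] - sample[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
sample = rng.normal(0.0, 1.5, 500)   # synthetic "returns" in percent

# Rule-of-thumb bandwidth for roughly normal data: 1.06 * std * n^(-1/5)
bandwidth = 1.06 * sample.std(ddof=1) * len(sample) ** (-1 / 5)

grid = np.linspace(-10.0, 10.0, 2001)
density = gaussian_kde_manual(sample, grid, bandwidth)

# A density should integrate to about 1 (simple Riemann sum on a uniform grid)
area = density.sum() * (grid[1] - grid[0])
print(round(bandwidth, 3), round(area, 3))
```

A smaller bandwidth produces a wigglier curve and a larger one a flatter curve, which is the KDE counterpart of choosing a bin count.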
Definition: A boxplot is a chart that summarizes the distribution of a continuous variable. The box extends from Q1 (25th percentile) to Q3 (75th percentile), with a line in the middle (median/Q2). With the default settings, "whiskers" extend to the most extreme data points within Q1-1.5×IQR and Q3+1.5×IQR, where IQR = Q3 - Q1; points beyond those limits are drawn as outliers.
Purpose: Visualize central tendency, spread, and outliers of a distribution.
Why here: A boxplot is compact and comparative, ideal for showing multiple distributions side by side (here: by month). The box (the IQR) quickly shows where the middle 50% of the data lies.
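The numbers a boxplot draws can be computed by hand, which helps when reading one. The sketch below works through the quartiles, the 1.5×IQR fences, the whisker ends, and the outliers for a small made-up data set, assuming the default whisker factor of 1.5 and linear-interpolation percentiles.

```python
import numpy as np

# Made-up sample with one obviously extreme value
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Fences: limits beyond which points count as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points inside the fences
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

outliers = data[(data < lower_fence) | (data > upper_fence)]

print(q1, median, q3, iqr)            # 3.25 5.5 7.75 4.5
print(lower_fence, upper_fence)       # -3.5 14.5
print(whisker_low, whisker_high)      # 1.0 9.0
print(outliers)                       # [30.]
```

Note that the upper whisker ends at 9, the largest value inside the fence, rather than at the fence value of 14.5 itself.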
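# Simulate a multi-asset portfolio with correlated returns
assets = ['AAPL', 'GOOG', 'MSFT', 'TSLA', 'AMZN', 'META', 'NVDA', 'BTC']
A = np.random.randn(8, 8)
# A @ A.T is symmetric positive semi-definite; adding 0.5 * I makes it positive definite,
# which np.linalg.cholesky requires
cov = A @ A.T / 8 + np.eye(8) * 0.5
chol = np.linalg.cholesky(cov)  # cov = chol @ chol.T
portfolio = pd.DataFrame(np.random.randn(252, 8) @ chol.T, columns=assets)
corr = portfolio.corr()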
fig, ax = plt.subplots(figsize=(10, 8))
mask = np.triu(np.ones_like(corr, dtype=bool)) # Mask upper triangle
sns.heatmap(corr,
            mask=mask,
            annot=True, fmt='.2f',
            cmap='RdYlGn',  # Red (negative correlation) to green (positive)
            vmin=-1, vmax=1, center=0,
            square=True,
            linewidths=0.5,
            ax=ax)
ax.set_title('Portfolio Correlation Matrix', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.savefig('correlation_heatmap.png', dpi=150, bbox_inches='tight')
plt.show()
Definition: A heatmap is a chart displaying a 2D matrix with colors representing values. Colors typically range from blue (low) to red (high) or red (negative) to green (positive).
Purpose: Visualize patterns and magnitudes in 2D tabular data; ideal for correlation matrices, cross-tabulations, etc.
Why here: An 8×8 correlation matrix is hard to read in raw numbers. The RdYlGn heatmap immediately shows: red = negative correlation (assets move in opposite directions), green = positive correlation (assets move together).
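A correlation matrix is symmetric with ones on the diagonal, so half of it is redundant; that is why the heatmap code masks the upper triangle. The sketch below shows what the mask keeps, using four hypothetical series (A to D) built so that the sign of each correlation is known in advance.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
base = rng.normal(size=300)

# Hypothetical return series: A and B follow the same driver, C moves against it,
# D is independent noise
returns = pd.DataFrame({
    "A": base + 0.3 * rng.normal(size=300),
    "B": base + 0.3 * rng.normal(size=300),
    "C": -base + 0.3 * rng.normal(size=300),
    "D": rng.normal(size=300),
})
corr = returns.corr()

# True on and above the diagonal: those cells are hidden in the heatmap
mask = np.triu(np.ones(corr.shape, dtype=bool))

# Same effect on the data itself: masked cells become NaN
lower = corr.mask(mask)
visible = int(lower.notna().sum().sum())

print(lower.round(2))
print(visible)  # n * (n - 1) / 2 = 6 cells remain for n = 4
```

In a red-to-green colormap, the A/B cell would come out green, the A/C and B/C cells red, and the cells involving D close to the neutral midpoint.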
Use dpi=300 for print quality (sharp, readable even when zoomed). For web and email, dpi=72 suffices and generates lighter files; the examples above use dpi=150, a value in between. The bbox_inches='tight' option trims the excess white margins around the chart.
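The effect of dpi is easy to quantify: the pixel size of the saved image is the figure size in inches multiplied by the dpi. The sketch below saves the same small figure at two resolutions into memory and reads the dimensions from the PNG header. The helper png_pixel_size is a hypothetical function written for this illustration, and bbox_inches='tight' is left out on purpose so the pixel size stays exactly figsize × dpi.

```python
import io
import struct

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt

def png_pixel_size(fig, dpi):
    """Save fig as PNG in memory; return (width_px, height_px, size_in_bytes)."""
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", dpi=dpi)
    data = buffer.getvalue()
    # PNG layout: 8-byte signature, 4-byte length, b"IHDR", then width and height
    width, height = struct.unpack(">II", data[16:24])
    return width, height, len(data)

fig, ax = plt.subplots(figsize=(4, 3))  # 4 x 3 inches
ax.plot([0, 1, 2], [0, 1, 4])

web = png_pixel_size(fig, dpi=72)             # 288 x 216 pixels
print_quality = png_pixel_size(fig, dpi=300)  # 1200 x 900 pixels
plt.close(fig)

print(web[:2], print_quality[:2])
```

Going from 72 to 300 dpi multiplies the pixel count by about 17, which is why the print-quality file is noticeably heavier.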