Python for Data Analysis
Learn pandas, NumPy, and visualization for data roles
A practical introduction to Python data analysis with pandas and NumPy. Learn to load, clean, analyze, and visualize data — the core skills for data science and analytics roles.
STEP-BY-STEP GUIDE
How to Use Python for Data Analysis
Set Up Your Python Data Environment
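Install Python 3.10+ and run: pip install pandas numpy matplotlib seaborn jupyter. Use Jupyter Notebooks for interactive data exploration: you run one cell at a time and see its output immediately, which suits iterative analysis. Cells share the same kernel state, so a variable defined in one cell is available in the cells you run afterwards. Alternatively, Google Colab gives you a free hosted Jupyter environment with no setup.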
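To confirm the install worked, a quick version check can help. The sketch below uses only the standard library, so it runs even when a package is missing; the `installed_versions` helper and the `PACKAGES` list are illustrative names, not part of any library.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Packages named in the pip install command above.
PACKAGES = ["pandas", "numpy", "matplotlib", "seaborn", "jupyter"]


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print("Python", sys.version.split()[0])
for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:<12}{ver or 'NOT INSTALLED'}")
```

Any line that reports NOT INSTALLED points to a package to install again with pip.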
Load and Inspect Data with pandas
pandas DataFrames are the core data structure. Key operations:
import pandas as pd
# Load data
df = pd.read_csv("data.csv")
# Inspect
df.head() # First 5 rows
df.shape # (rows, columns)
df.dtypes # Column types
df.describe() # Summary statistics
df.isnull().sum() # Missing value counts
Clean and Transform Data
Real data is messy. Common cleaning operations:
# Drop duplicates
df.drop_duplicates(inplace=True)
# Fill missing values (assign the result back; inplace=True on a selected column is deprecated)
df['salary'] = df['salary'].fillna(df['salary'].median())
# Drop columns with too many nulls (keep columns that are at least 70% non-null)
df.dropna(axis=1, thresh=int(len(df) * 0.7), inplace=True)
# Convert types
df['date'] = pd.to_datetime(df['date'])
df['salary'] = df['salary'].astype(float)
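To see these steps end to end, here is a minimal sketch that loads a tiny CSV from memory instead of a file. The rows and the `name` and `notes` columns are made up for illustration; `salary` and `date` mirror the column names used above.

```python
import io

import pandas as pd

# Made-up data: one exact duplicate row, one missing salary,
# and a "notes" column that is mostly empty.
raw = io.StringIO(
    "name,salary,date,notes\n"
    "Ana,50000,2024-01-05,\n"
    "Ana,50000,2024-01-05,\n"
    "Ben,,2024-02-10,\n"
    "Cy,70000,2024-03-15,ok\n"
)
df = pd.read_csv(raw)

print(df.shape)           # rows and columns before cleaning
print(df.isnull().sum())  # missing values per column

# Drop exact duplicate rows and renumber the index.
df = df.drop_duplicates().reset_index(drop=True)

# Fill the missing salary with the median of the remaining salaries.
df["salary"] = df["salary"].fillna(df["salary"].median())

# Keep only columns that are at least 70% non-null.
df = df.dropna(axis=1, thresh=int(len(df) * 0.7))

# Parse the date strings into datetime values.
df["date"] = pd.to_datetime(df["date"])

print(df)
print(df.dtypes)
```

Printing the shape and null counts before and after each step is a simple way to document what every transformation changed.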
Analyze Data with groupby and pivot tables
The most powerful analysis patterns:
# Group by and aggregate
df.groupby('department')['salary'].agg(['mean', 'median', 'count'])
# Pivot table
pd.pivot_table(df,
               values='revenue',
               index='region',
               columns='product',
               aggfunc='sum',
               fill_value=0)
Visualize with matplotlib and seaborn
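The plotting snippets in this section assume a DataFrame `df` that already has `date`, `revenue`, and `salary` columns. As a self-contained sketch of the same idea, the following builds a small made-up DataFrame (the values are illustrative, not real data), sums revenue per day, and saves the trend chart to a file. The `Agg` backend and the output filename are assumptions for running outside a notebook; in Jupyter you can drop the backend line and call `plt.show()` instead.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter

import matplotlib.pyplot as plt
import pandas as pd

# Made-up daily sales records; two rows share the first date.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-03"]
    ),
    "revenue": [100.0, 50.0, 120.0, 90.0],
})

# Total revenue per day, indexed by date.
daily = df.groupby("date")["revenue"].sum()

# Draw on an explicit Axes so labels and title attach to the right plot.
fig, ax = plt.subplots(figsize=(6, 3))
daily.plot(ax=ax, marker="o")
ax.set_title("Revenue Over Time")
ax.set_xlabel("Date")
ax.set_ylabel("Revenue")
fig.tight_layout()
fig.savefig("revenue_over_time.png")
```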
import matplotlib.pyplot as plt
import seaborn as sns
# Line chart
df.groupby('date')['revenue'].sum().plot()
plt.title("Revenue Over Time")
plt.show()
# Distribution
sns.histplot(df['salary'], bins=30)
# Correlation heatmap
sns.heatmap(df.select_dtypes('number').corr(),
            annot=True, cmap='coolwarm')
PRACTICE
Exercises
Load the Titanic dataset from Kaggle and calculate survival rates by passenger class.
Clean a messy CSV with missing values and duplicates. Document every transformation.
Build a pivot table analysis from sales data to show revenue by product and region.
Create a matplotlib visualization showing a trend over time from any open dataset.
Write a reusable data cleaning function that handles common issues in your domain.
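As a starting point for the pivot table exercise, here is a minimal sketch on made-up sales records. The `sales` DataFrame and its values are invented for illustration; swap in your own dataset and column names.

```python
import pandas as pd

# Made-up sales records: one row per sale.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "product": ["A", "B", "A", "A", "B"],
    "revenue": [100, 150, 200, 50, 300],
})

# Summary statistics per region.
by_region = sales.groupby("region")["revenue"].agg(["sum", "mean", "count"])
print(by_region)

# Revenue by region (rows) and product (columns); empty cells become 0.
table = pd.pivot_table(
    sales,
    values="revenue",
    index="region",
    columns="product",
    aggfunc="sum",
    fill_value=0,
)
print(table)
```

Here the two West sales of product A are summed into a single cell, which is the aggregation step that distinguishes a pivot table from a plain reshape.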
CAREER IMPACT
Career Paths That Use This Skill
| Career Path | How It's Used | Salary Range |
|---|---|---|
| Data Analyst | Primary tool for data manipulation and reporting | $75K–$120K |
| Data Scientist | Foundation for ML feature engineering and EDA | $120K–$190K |
| ML Engineer | Data preprocessing and pipeline development | $140K–$250K |
| Finance Analyst (AI) | Financial data modeling and automation | $90K–$150K |
FAQ
Common Questions
What version of Python should I use?
Is pandas enough for data science jobs?
How long to become proficient enough for a data analyst role?
hirekit.co — AI-powered job search platform