TUTORIAL · Beginner

Python for Data Analysis

Learn pandas, NumPy, and visualization for data roles

A practical introduction to Python data analysis with pandas and NumPy. Learn to load, clean, analyze, and visualize data — the core skills for data science and analytics roles.

25 min · 5 steps · Updated 2026-01-30
Prerequisites: Basic Python familiarity (variables, loops, functions)

STEP-BY-STEP GUIDE

How to Use Python for Data Analysis

1

Set Up Your Python Data Environment

Install Python 3.10+ and run: pip install pandas numpy matplotlib seaborn jupyter. Use Jupyter Notebooks for interactive data exploration: you can run and re-run cells one at a time while they share the same session state, which suits iterative analysis. Alternatively, Google Colab gives you a free hosted Jupyter environment with no setup.
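A quick way to confirm the install worked is to import each library and print its version. This is a minimal sketch, assuming the packages above were installed into the environment you are running it from; a package that fails to import is reported rather than raising an error.

```python
import importlib

# Libraries from the pip install command above. Jupyter is left out because
# it is a launcher, not something you import in analysis code.
packages = ["pandas", "numpy", "matplotlib", "seaborn"]

versions = {}
for name in packages:
    try:
        module = importlib.import_module(name)
        versions[name] = getattr(module, "__version__", "unknown")
    except ImportError:
        versions[name] = "not installed"

for name, version in versions.items():
    print(f"{name}: {version}")
```

If any line prints "not installed", re-run the pip command in the same environment your notebook kernel uses.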

2

Load and Inspect Data with pandas

pandas DataFrames are the core data structure. Key operations:

import pandas as pd

# Load data
df = pd.read_csv("data.csv")

# Inspect
df.head()          # First 5 rows
df.shape           # (rows, columns)
df.dtypes          # Column types
df.describe()      # Summary statistics
df.isnull().sum()  # Missing value counts
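To try the inspection calls without a file on disk, you can read CSV text from memory. The sketch below uses a small made-up dataset (the names, departments, and salaries are hypothetical) in place of data.csv.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for data.csv
csv_text = """name,department,salary
Ana,Engineering,95000
Ben,Sales,
Cara,Engineering,105000
Dev,Sales,70000
"""

# read_csv accepts any file-like object, not just a path
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)            # 4 rows, 3 columns
print(df.dtypes)           # salary is float64 because of the missing value
missing = df.isnull().sum()
print(missing)             # one missing salary (Ben)
```

Note that a single blank cell is enough to turn an integer column into float64, which is why checking dtypes and null counts together is useful.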
3

Clean and Transform Data

Real data is messy. Common cleaning operations:

# Drop duplicates
df.drop_duplicates(inplace=True)

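# Fill missing values (assign the result back; inplace on a selected column is unreliable in pandas 2.x+)
df['salary'] = df['salary'].fillna(df['salary'].median())

# Drop columns with too many nulls (keep columns that are at least 70% non-null)
df.dropna(axis=1, thresh=int(len(df) * 0.7), inplace=True)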

# Convert types
df['date'] = pd.to_datetime(df['date'])
df['salary'] = df['salary'].astype(float)
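The cleaning steps can also be wrapped in a function that returns a new DataFrame instead of modifying the input. This is a sketch under assumptions: the sample data, the clean function, and its min_non_null parameter are all hypothetical, and the function expects 'salary' and 'date' columns like the ones used in this step.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data: one duplicate row, missing salaries,
# and a mostly empty 'notes' column
raw = pd.DataFrame({
    "name": ["Ana", "Ben", "Ben", "Cara", "Dev"],
    "date": ["2024-01-05", "2024-02-10", "2024-02-10", "2024-03-15", "2024-04-20"],
    "salary": [95000, np.nan, np.nan, 105000, 70000],
    "notes": [None, None, None, None, "contractor"],
})


def clean(df, min_non_null=0.7):
    """Return a cleaned copy of df; the input is left unchanged."""
    out = df.drop_duplicates()
    # Keep only columns with at least min_non_null share of non-null values
    out = out.dropna(axis=1, thresh=int(len(out) * min_non_null))
    out = out.assign(
        # Fill missing salaries with the median of the known ones
        salary=out["salary"].fillna(out["salary"].median()),
        # Parse date strings into datetime64 values
        date=pd.to_datetime(out["date"]),
    )
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Returning a new DataFrame keeps the raw data available for comparison, which helps when you need to document each transformation.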
4

Analyze Data with groupby and pivot tables

Two of the most widely used analysis patterns:

# Group by and aggregate
df.groupby('department')['salary'].agg(['mean', 'median', 'count'])

# Pivot table
pd.pivot_table(df,
  values='revenue',
  index='region',
  columns='product',
  aggfunc='sum',
  fill_value=0)
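Both patterns are easier to follow on a dataset small enough to check by hand. The sales figures below are invented for illustration.

```python
import pandas as pd

# Hypothetical sales records
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "East"],
    "product": ["A", "B", "A", "A", "A"],
    "revenue": [100, 150, 200, 50, 25],
})

# One row per region, one column per aggregate
by_region = sales.groupby("region")["revenue"].agg(["mean", "sum", "count"])
print(by_region)

# Regions as rows, products as columns, summed revenue in each cell.
# West sold no product B, so fill_value=0 puts 0 there instead of NaN.
pivot = pd.pivot_table(
    sales,
    values="revenue",
    index="region",
    columns="product",
    aggfunc="sum",
    fill_value=0,
)
print(pivot)
```

groupby returns a long result (one row per group), while pivot_table spreads a second grouping key across columns, which is usually the easier shape to read in a report.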
5

Visualize with matplotlib and seaborn

import matplotlib.pyplot as plt
import seaborn as sns

# Line chart
df.groupby('date')['revenue'].sum().plot()
plt.title("Revenue Over Time")
plt.show()

# Distribution
sns.histplot(df['salary'], bins=30)
plt.show()

# Correlation heatmap (separate figure from the histogram)
sns.heatmap(df.select_dtypes('number').corr(),
  annot=True, cmap='coolwarm')
plt.show()
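To place several charts in one figure, you can create the axes explicitly and draw each chart on its own axis. This sketch uses only pandas and matplotlib with randomly generated revenue data (an assumption for illustration, not real figures); seaborn functions accept the same ax argument.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily revenue; the seed makes the data reproducible
rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=90, freq="D")
df = pd.DataFrame({
    "date": dates,
    "revenue": rng.normal(1000, 100, size=90).round(2),
})

# Resample daily revenue to monthly totals so the trend is easier to read
monthly = df.set_index("date")["revenue"].resample("MS").sum()

# Two side-by-side axes in a single figure
fig, (ax_trend, ax_dist) = plt.subplots(1, 2, figsize=(10, 4))

monthly.plot(ax=ax_trend, marker="o")
ax_trend.set_title("Monthly Revenue")
ax_trend.set_xlabel("Month")
ax_trend.set_ylabel("Revenue")

ax_dist.hist(df["revenue"], bins=30)
ax_dist.set_title("Daily Revenue Distribution")
ax_dist.set_xlabel("Revenue")
ax_dist.set_ylabel("Days")

fig.tight_layout()
```

In a Jupyter notebook the figure renders when the cell finishes; in a script, call plt.show() or fig.savefig(...) at the end.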

PRACTICE

Exercises

Load the Titanic dataset from Kaggle and calculate survival rates by passenger class.

Clean a messy CSV with missing values and duplicates. Document every transformation.

Build a pivot table analysis from sales data to show revenue by product and region.

Create a matplotlib visualization showing a trend over time from any open dataset.

Write a reusable data cleaning function that handles common issues in your domain.

CAREER IMPACT

Career Paths That Use This Skill

Career Path           | How It's Used                                     | Salary Range
Data Analyst          | Primary tool for data manipulation and reporting  | $75K–$120K
Data Scientist        | Foundation for ML feature engineering and EDA     | $120K–$190K
ML Engineer           | Data preprocessing and pipeline development       | $140K–$250K
Finance Analyst (AI)  | Financial data modeling and automation            | $90K–$150K

FAQ

Common Questions

What version of Python should I use?
Python 3.10 or later. Install via python.org or use Anaconda for a data science-ready environment.
Is pandas enough for data science jobs?
pandas + NumPy + matplotlib/seaborn covers the core data manipulation and visualization skills. For ML roles, add scikit-learn and PyTorch/TensorFlow.
How long does it take to become proficient enough for a data analyst role?
3-6 months of consistent practice on real datasets. Build 3-5 portfolio projects analyzing publicly available data relevant to your target industry.


hirekit.co — AI-powered job search platform