This cheat sheet is designed to help you quickly reference important SQL commands, best practices, and analytical techniques specific to the role of a SQL Data Analyst. It covers everything from querying data to advanced SQL functions and real-world data analysis scenarios. Whether you’re preparing for an interview or enhancing your day-to-day skills, this resource will serve as a handy guide.
1. SQL Query Basics
- SELECT: Used to retrieve data from a table.
SELECT column1, column2
FROM table_name;
- WHERE: Filters records that meet specific conditions.
SELECT *
FROM employees
WHERE age > 30;
- ORDER BY: Sorts the result set in ascending or descending order.
SELECT *
FROM products
ORDER BY price DESC;
- LIMIT: Limits the number of records returned (MySQL, PostgreSQL, and SQLite; SQL Server uses TOP and Oracle uses FETCH FIRST instead).
SELECT *
FROM customers
LIMIT 10;
2. Aggregate Functions
- COUNT(): Returns the number of rows.
SELECT COUNT(*)
FROM orders;
- SUM(): Adds up the values in a numeric column.
SELECT SUM(total_sales)
FROM sales;
- AVG(): Returns the average value of a numeric column.
SELECT AVG(salary)
FROM employees;
- MIN() / MAX(): Finds the minimum or maximum value.
SELECT MIN(salary), MAX(salary)
FROM employees;
3. Data Filtering & Conditions
- AND / OR: Combines multiple conditions.
SELECT *
FROM employees
WHERE age > 25 AND department = 'IT';
- IN: Matches values within a list.
SELECT *
FROM customers
WHERE country IN ('USA', 'UK', 'Canada');
- BETWEEN: Filters values within a range.
SELECT *
FROM orders
WHERE order_date BETWEEN '2023-01-01' AND '2023-01-31';
- LIKE: Searches for a specified pattern.
SELECT *
FROM products
WHERE product_name LIKE 'A%';
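To see how these filters behave on actual rows, here is a minimal sketch that runs them with Python's built-in sqlite3 module against an in-memory database. The orders table and its sample rows are invented for illustration. It shows two details that are easy to miss: BETWEEN includes both endpoints, and AND binds more tightly than OR, so mixed conditions usually need parentheses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, order_date TEXT, country TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "2023-01-01", "USA", 50.0),
        (2, "2023-01-31", "UK", 200.0),
        (3, "2023-02-01", "USA", 300.0),
        (4, "2023-01-15", "Canada", 20.0),
    ],
)

# BETWEEN is inclusive: both 2023-01-01 and 2023-01-31 match.
january = [r[0] for r in conn.execute(
    "SELECT order_id FROM orders "
    "WHERE order_date BETWEEN '2023-01-01' AND '2023-01-31' ORDER BY order_id")]

# AND binds tighter than OR, so this reads as: USA OR (UK AND amount > 100).
unparenthesized = [r[0] for r in conn.execute(
    "SELECT order_id FROM orders "
    "WHERE country = 'USA' OR country = 'UK' AND amount > 100 ORDER BY order_id")]

# Parentheses apply the amount filter to both countries.
parenthesized = [r[0] for r in conn.execute(
    "SELECT order_id FROM orders "
    "WHERE (country = 'USA' OR country = 'UK') AND amount > 100 ORDER BY order_id")]

print(january)          # [1, 2, 4]
print(unparenthesized)  # [1, 2, 3]
print(parenthesized)    # [2, 3]
```

The dates are stored as ISO-8601 text here, which sorts correctly as strings; a database with a native DATE type would compare them as dates.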
4. JOINS
- INNER JOIN: Returns only matching records from both tables.
SELECT orders.order_id, customers.customer_name
FROM orders
INNER JOIN customers
ON orders.customer_id = customers.customer_id;
- LEFT JOIN: Returns all records from the left table and matching records from the right table.
SELECT orders.order_id, customers.customer_name
FROM orders
LEFT JOIN customers
ON orders.customer_id = customers.customer_id;
- RIGHT JOIN: Returns all records from the right table and matching records from the left table.
SELECT employees.name, departments.department_name
FROM employees
RIGHT JOIN departments
ON employees.department_id = departments.department_id;
- FULL JOIN: Returns all records from both tables, matching rows where possible and filling in NULLs where no match exists on either side.
SELECT employees.name, projects.project_name
FROM employees
FULL JOIN projects
ON employees.project_id = projects.project_id;
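The difference between join types is easiest to see when one row has no match. The sketch below uses Python's built-in sqlite3 module with two small, invented tables, where order 103 refers to a customer that does not exist. It covers INNER JOIN and LEFT JOIN only, because RIGHT and FULL joins are available only in recent SQLite releases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
-- order 103 points at a customer_id that is missing from customers
INSERT INTO orders VALUES (101, 1), (102, 2), (103, 99);
""")

# INNER JOIN: the unmatched order disappears.
inner_rows = conn.execute("""
    SELECT o.order_id, c.customer_name
    FROM orders AS o
    INNER JOIN customers AS c ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

# LEFT JOIN: the unmatched order is kept, with NULL (None) for the name.
left_rows = conn.execute("""
    SELECT o.order_id, c.customer_name
    FROM orders AS o
    LEFT JOIN customers AS c ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

# LEFT JOIN plus IS NULL is a common way to list unmatched rows.
orphans = conn.execute("""
    SELECT o.order_id
    FROM orders AS o
    LEFT JOIN customers AS c ON o.customer_id = c.customer_id
    WHERE c.customer_id IS NULL
""").fetchall()

print(inner_rows)  # [(101, 'Ada'), (102, 'Grace')]
print(left_rows)   # [(101, 'Ada'), (102, 'Grace'), (103, None)]
print(orphans)     # [(103,)]
```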
5. Subqueries
- Single-Row Subquery: Used when the subquery returns a single value.
SELECT name
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
- Multiple-Row Subquery: When the subquery returns multiple rows.
SELECT product_name
FROM products
WHERE category_id IN (SELECT category_id FROM categories WHERE category_name = 'Electronics');
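As a quick illustration, the sketch below runs the single-value subquery from above on four invented employee rows, using Python's built-in sqlite3 module. It also adds a correlated subquery, a variant not listed above in which the inner query refers to the outer row, so each salary is compared with the average of its own department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
    ('Ann', 'IT', 90000), ('Bob', 'IT', 60000),
    ('Cy', 'HR', 50000), ('Di', 'HR', 40000);
""")

# Scalar subquery: every row is compared with the company-wide average (60000).
above_company_avg = [r[0] for r in conn.execute("""
    SELECT name FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
""")]

# Correlated subquery: the inner query depends on the outer row's department,
# so IT salaries are compared with 75000 and HR salaries with 45000.
above_dept_avg = [r[0] for r in conn.execute("""
    SELECT e.name FROM employees AS e
    WHERE e.salary > (SELECT AVG(d.salary) FROM employees AS d
                      WHERE d.department = e.department)
    ORDER BY e.name
""")]

print(above_company_avg)  # ['Ann']
print(above_dept_avg)     # ['Ann', 'Cy']
```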
6. Grouping & Aggregating
- GROUP BY: Groups rows sharing the same values.
SELECT department, COUNT(*)
FROM employees
GROUP BY department;
- HAVING: Filters groups after aggregation.
SELECT department, COUNT(*)
FROM employees
GROUP BY department
HAVING COUNT(*) > 10;
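A common point of confusion is when to use WHERE and when to use HAVING. The sketch below, which uses Python's built-in sqlite3 module and an invented employees table, applies both in one query: WHERE removes rows before the groups are formed, and HAVING removes whole groups after aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
    ('Ann', 'IT', 90000), ('Bob', 'IT', 60000), ('Eve', 'IT', 30000),
    ('Cy', 'HR', 50000), ('Di', 'HR', 40000);
""")

rows = conn.execute("""
    SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
    FROM employees
    WHERE salary >= 40000      -- row filter: drops Eve before grouping
    GROUP BY department
    HAVING COUNT(*) >= 2       -- group filter: keeps departments with 2+ rows left
    ORDER BY department
""").fetchall()

print(rows)  # [('HR', 2, 45000.0), ('IT', 2, 75000.0)]
```

Because Eve is filtered out first, the IT average is computed over Ann and Bob only.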
7. Window Functions
- ROW_NUMBER(): Assigns a unique number to rows within a result set.
SELECT name, ROW_NUMBER() OVER (ORDER BY salary DESC)
FROM employees;
- RANK() / DENSE_RANK(): Assigns a rank to rows within a partition. Tied rows share a rank; RANK() leaves gaps after ties, while DENSE_RANK() does not.
SELECT name, RANK() OVER (ORDER BY salary DESC)
FROM employees;
- LEAD() / LAG(): Accesses data from the subsequent or preceding row.
SELECT name, salary, LAG(salary, 1) OVER (ORDER BY salary)
FROM employees;
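- SUM() OVER(): Computes a sum across a window without collapsing rows. With PARTITION BY alone, as below, every row shows its partition total; adding ORDER BY inside OVER() turns it into a running total.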
SELECT department, SUM(salary) OVER (PARTITION BY department)
FROM employees;
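The ranking functions differ only when there are ties, so the sketch below uses invented data in which two employees share a salary. It runs through Python's built-in sqlite3 module and assumes the bundled SQLite is version 3.25 or later, the first release with window functions. The last column shows SUM() with an ORDER BY inside OVER(), which produces a running total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, salary INTEGER);
INSERT INTO employees VALUES
    ('Ann', 90000), ('Bob', 80000), ('Cy', 80000), ('Di', 70000);
""")

ranked = conn.execute("""
    SELECT name,
           ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num,
           RANK()       OVER (ORDER BY salary DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC)       AS dense_rnk,
           SUM(salary)  OVER (ORDER BY salary DESC, name) AS running_total
    FROM employees
    ORDER BY salary DESC, name
""").fetchall()

for row in ranked:
    print(row)
# ('Ann', 1, 1, 1, 90000)
# ('Bob', 2, 2, 2, 170000)
# ('Cy', 3, 2, 2, 250000)
# ('Di', 4, 4, 3, 320000)
```

Bob and Cy tie on salary: ROW_NUMBER() still numbers them 2 and 3, RANK() gives both 2 and then skips to 4, and DENSE_RANK() gives both 2 and continues with 3.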
8. Data Modification Commands
- INSERT INTO: Adds new records to a table.
INSERT INTO employees (name, department, salary)
VALUES ('John Doe', 'Finance', 60000);
- UPDATE: Modifies existing records in a table.
UPDATE employees
SET salary = salary * 1.05
WHERE department = 'IT';
- DELETE: Removes records from a table.
DELETE FROM employees
WHERE department = 'HR';
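UPDATE and DELETE are unforgiving when the WHERE clause is wrong, so one cautious pattern is to preview the affected rows, run the statement inside a transaction, and commit only if the row count is what you expected. The sketch below shows that pattern with Python's built-in sqlite3 module. The table, the rows, and the `expected` value are invented, and `expected` is deliberately wrong so that the rollback path runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "IT", 1000.0), ("Bob", "HR", 2000.0), ("Cy", "HR", 3000.0)],
)
conn.commit()

# 1. Preview how many rows the WHERE clause will hit.
preview = conn.execute(
    "SELECT COUNT(*) FROM employees WHERE department = 'HR'").fetchone()[0]

# 2. Run the DELETE. In its default mode, sqlite3 opens a transaction
#    implicitly before INSERT, UPDATE, and DELETE statements.
deleted = conn.execute(
    "DELETE FROM employees WHERE department = 'HR'").rowcount

# 3. Commit only if the count matches what was expected; otherwise undo.
expected = 1  # hypothetical expectation, deliberately wrong for this demo
if deleted == expected:
    conn.commit()
else:
    conn.rollback()

rows_after = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(preview, deleted, rows_after)  # 2 2 3
```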
9. Views and Stored Procedures
- Creating a View: Stores a saved query as a virtual table.
CREATE VIEW high_earners AS
SELECT name, salary
FROM employees
WHERE salary > 80000;
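- Stored Procedure: Encapsulates SQL logic for reuse. The example uses SQL Server syntax; MySQL, PostgreSQL, and Oracle each have their own procedure syntax.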
CREATE PROCEDURE GetEmployeeCountByDepartment
AS
BEGIN
SELECT department, COUNT(*)
FROM employees
GROUP BY department;
END;
10. Indexing and Optimization
- Creating an Index: Speeds up the retrieval of data.
CREATE INDEX idx_employee_name
ON employees (name);
- Query Optimization Tips:
- Avoid SELECT *; always specify the needed columns.
- Use EXPLAIN or EXPLAIN ANALYZE to view the query execution plan.
- Ensure the proper use of indexes for faster querying.
- Avoid complex joins and subqueries when simple queries suffice.
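To make the index and execution-plan tips concrete, the sketch below compares the plan for the same query before and after creating an index. It uses Python's built-in sqlite3 module, so it relies on SQLite's EXPLAIN QUERY PLAN rather than the EXPLAIN ANALYZE found in other databases. The table contents and the `plan` helper are invented for illustration, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees (name, salary) VALUES (?, ?)",
    [(f"emp{i}", 40000 + i) for i in range(1000)],
)

def plan(sql):
    """Return the readable 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT name FROM employees WHERE name = 'emp500'"

before = plan(query)  # without an index, the plan is a full table scan
conn.execute("CREATE INDEX idx_employee_name ON employees (name)")
after = plan(query)   # with the index, the plan searches idx_employee_name

print(before)
print(after)
```

The same before-and-after comparison works in other databases with their own EXPLAIN syntax.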
11. Handling NULL Values
- IS NULL / IS NOT NULL: Checks for NULL values.
SELECT *
FROM employees
WHERE department IS NULL;
- COALESCE(): Returns the first non-NULL value.
SELECT name, COALESCE(salary, 0) AS salary
FROM employees;
- NULLIF(): Returns NULL if two expressions are equal.
SELECT NULLIF(salary, 0)
FROM employees;
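NULL handling changes results in ways that are easy to overlook. The sketch below, using Python's built-in sqlite3 module and an invented sales table, shows three of them: aggregates skip NULLs unless you substitute a value with COALESCE(), NULLIF() can guard a division against a zero divisor, and a comparison with = NULL never matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (rep TEXT, revenue REAL, deals INTEGER);
INSERT INTO sales VALUES ('Ann', 1000, 4), ('Bob', NULL, 2), ('Cy', 500, 0);
""")

# AVG skips NULLs (1500 / 2); COALESCE counts Bob's NULL as zero (1500 / 3).
avg_skip_null, avg_null_as_zero = conn.execute(
    "SELECT AVG(revenue), AVG(COALESCE(revenue, 0)) FROM sales").fetchone()

# NULLIF(deals, 0) turns a zero divisor into NULL, so the division yields NULL.
# Several databases raise an error on division by zero without this guard.
per_deal = conn.execute(
    "SELECT rep, revenue / NULLIF(deals, 0) FROM sales ORDER BY rep").fetchall()

# '= NULL' evaluates to unknown for every row; IS NULL is the correct test.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM sales WHERE revenue = NULL").fetchone()[0]
is_null = conn.execute(
    "SELECT COUNT(*) FROM sales WHERE revenue IS NULL").fetchone()[0]

print(avg_skip_null, avg_null_as_zero)  # 750.0 500.0
print(per_deal)                         # [('Ann', 250.0), ('Bob', None), ('Cy', None)]
print(equals_null, is_null)             # 0 1
```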
12. Common SQL Interview Questions Cheat Sheet
- What is the difference between INNER JOIN and LEFT JOIN?
- INNER JOIN returns only matching rows between two tables, while LEFT JOIN returns all rows from the left table and the matched rows from the right table. If no match exists, NULLs are returned from the right table.
- How do you optimize a slow SQL query?
- Use indexes, avoid SELECT *, optimize joins, and analyze execution plans with EXPLAIN.
- What is the difference between DELETE and TRUNCATE?
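- DELETE removes the rows that match a condition and can be rolled back. TRUNCATE removes all rows from a table and is usually faster; whether it can be rolled back depends on the database (it is transactional in PostgreSQL and SQL Server, but commits implicitly in MySQL and Oracle).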
- How do you handle duplicate rows in SQL?
- Use the DISTINCT keyword to hide duplicates in a result set, or identify them with GROUP BY and HAVING COUNT(*) > 1 to isolate and remove them from the table.
- What are window functions, and when would you use them?
- Window functions allow you to perform calculations across a set of rows while retaining individual rows. Use them for running totals, rankings, and comparisons.
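For the duplicate-rows question, it helps to have a concrete two-step answer ready. The sketch below is one possible approach, shown with Python's built-in sqlite3 module and an invented contacts table: first find the duplicated values with GROUP BY and HAVING, then delete every copy except the one with the lowest id. Which copy to keep is an assumption; real data may call for a different rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY, email TEXT);
INSERT INTO contacts (email) VALUES
    ('a@example.com'), ('b@example.com'), ('a@example.com'), ('a@example.com');
""")

# Step 1: list values that occur more than once.
duplicates = conn.execute("""
    SELECT email, COUNT(*) AS copies
    FROM contacts
    GROUP BY email
    HAVING COUNT(*) > 1
""").fetchall()

# Step 2: keep the lowest id per email and delete the other copies.
conn.execute("""
    DELETE FROM contacts
    WHERE id NOT IN (SELECT MIN(id) FROM contacts GROUP BY email)
""")
remaining = conn.execute("SELECT id, email FROM contacts ORDER BY id").fetchall()

print(duplicates)  # [('a@example.com', 3)]
print(remaining)   # [(1, 'a@example.com'), (2, 'b@example.com')]
```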
13. Real-World Data Analysis Scenarios
- Sales Analysis:
- Identify top-selling products using GROUP BY and SUM():
SELECT product_name, SUM(quantity_sold) AS total_sold
FROM sales
GROUP BY product_name
ORDER BY total_sold DESC;
- Customer Segmentation:
- Segment customers by total spend:
SELECT customer_id, SUM(total_amount) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(total_amount) > 1000;
- Churn Prediction:
- Identify customers who haven’t made a purchase in over a year:
SELECT customer_id, MAX(order_date) AS last_order
FROM orders
GROUP BY customer_id
HAVING MAX(order_date) < CURRENT_DATE - INTERVAL '1 YEAR';
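The churn query is easier to check with a fixed reference date than with CURRENT_DATE. The sketch below runs the same idea through Python's built-in sqlite3 module, so the date arithmetic uses SQLite's date() function instead of INTERVAL. The orders rows and the `churned_customers` helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer_id INTEGER, order_date TEXT);
INSERT INTO orders VALUES
    (1, '2022-03-10'), (1, '2024-05-01'),
    (2, '2022-11-20'),
    (3, '2023-06-02');
""")

def churned_customers(as_of):
    """Customers whose most recent order is more than one year before as_of."""
    cursor = conn.execute("""
        SELECT customer_id, MAX(order_date) AS last_order
        FROM orders
        GROUP BY customer_id
        HAVING MAX(order_date) < date(?, '-1 year')
        ORDER BY customer_id
    """, (as_of,))
    return cursor.fetchall()

# Cutoff is 2023-06-01: customer 1 ordered in 2024, customer 3 ordered one
# day after the cutoff, so only customer 2 counts as churned.
churned = churned_customers("2024-06-01")
print(churned)  # [(2, '2022-11-20')]
```

Passing the reference date as a parameter keeps the query repeatable, which is useful when a report has to be re-run for a past period.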
14. Best Practices for SQL Data Analysts
- Write Clean Queries:
- Use clear aliases, indentation, and comments to make your SQL queries readable and maintainable.
- Validate Data Early:
- Always check for NULL values, duplicates, and incorrect formats before performing analysis.
- Avoid Complex Queries When Possible:
- Break down large queries into smaller, modular parts using CTEs (Common Table Expressions) or views.
- Automate Recurring Tasks:
- Use stored procedures or scheduled tasks to automate frequent reports and data transformations.
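As an example of breaking a query into modular parts, the sketch below rewrites a customer segmentation as two CTEs followed by a short final SELECT. It runs through Python's built-in sqlite3 module. The orders rows, the 1000 threshold, and the segment labels are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer_id INTEGER, total_amount REAL);
INSERT INTO orders VALUES (1, 700), (1, 600), (2, 200), (3, 1500), (3, 100);
""")

segments = conn.execute("""
    WITH customer_totals AS (      -- step 1: one row per customer
        SELECT customer_id, SUM(total_amount) AS total_spent
        FROM orders
        GROUP BY customer_id
    ),
    labeled AS (                   -- step 2: attach a segment label
        SELECT customer_id,
               CASE WHEN total_spent > 1000 THEN 'high' ELSE 'standard' END AS segment
        FROM customer_totals
    )
    SELECT segment, COUNT(*) AS customers   -- step 3: summarize
    FROM labeled
    GROUP BY segment
    ORDER BY segment
""").fetchall()

print(segments)  # [('high', 2), ('standard', 1)]
```

Each CTE can be run on its own while debugging, which is the main practical benefit over one deeply nested query.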
This SQL Data Analyst cheat sheet provides essential SQL commands, tips, and real-world examples to help streamline your workflow and enhance your analytical capabilities. Keep this guide handy for quick reference during interviews, on-the-job tasks, or when learning new concepts!