Exam: PCAD-31-02
Status: ACTIVE
The PCAD™-31-02 exam consists of single-select and multiple-select items designed to evaluate a candidate’s ability to acquire, process, analyze, model, and communicate data using Python and SQL.
Each item is worth a maximum of 1 point. After the exam is completed, the candidate’s raw score is normalized, and the final result is expressed as a percentage.
The exam is divided into five blocks, each covering a specific area of data analytics and core technical skills in programming and SQL querying.
The distribution of items and their corresponding weights reflects the relative emphasis placed on each area in professional practice.
The table below summarizes the distribution of exam items and their corresponding weight within the total exam.
| Block Number | Block Name | Number of Items | Weight |
|---|---|---|---|
| 1 | Data Acquisition and Pre-Processing | 14 | 29.2% |
| 2 | Programming and Database Skills | 16 | 33.3% |
| 3 | Statistical Analysis | 4 | 8.3% |
| 4 | Data Analysis and Modeling | 9 | 18.8% |
| 5 | Data Communication and Visualization | 5 | 10.4% |
| Total | | 48 | 100% |
Last updated: July 15, 2025
Aligned with Exam PCAD-31-02
Block 1 – Data Acquisition and Pre-Processing: 14 objectives & 54 sub-objectives covered by the block → 14 exam items
Objective 1.1.1 – Explain and compare data collection methods and their use in research, business, and analytics.
Objective 1.1.2 – Aggregate data from multiple sources and integrate them into datasets.
Objective 1.1.3 – Explain various data storage solutions.
Objective 1.2.1 – Understand structured and unstructured data and their implications in data analysis.
Objective 1.2.2 – Identify, rectify, or remove erroneous data.
Objective 1.2.3 – Understand data normalization and scaling.
Objective 1.2.4 – Apply data cleaning and standardization techniques.
Objective 1.3.1 – Execute and understand basic data validation methods.
Objective 1.3.2 – Establish and maintain data integrity through clear validation rules.
Objective 1.4.1 – Understand file formats in data acquisition.
Objective 1.4.2 – Access, manage, and effectively utilize datasets.
Objective 1.4.3 – Extract data from various sources.
Objective 1.4.4 – Apply spreadsheet best practices for readability and formatting.
Objective 1.4.5 – Prepare, adapt, and pre-process data for analysis.
Block 2 – Programming and Database Skills: 16 objectives & 39 sub-objectives covered by the block → 16 exam items
Objective 2.1.1 – Apply Python syntax and control structures to solve data-related problems.
Objective 2.1.2 – Analyze and create Python functions.
Objective 2.1.3 – Evaluate and navigate the Python Data Science ecosystem.
Objective 2.1.4 – Organize and manipulate data using Python's core data structures.
Objective 2.1.5 – Explain and implement Python scripting best practices.
Objective 2.2.1 – Import modules and manage Python packages using pip.
Objective 2.2.2 – Apply basic exception handling and maintain script robustness.
Objective 2.3.1 – Apply basic object-oriented programming to structure and model data.
Objective 2.3.2 – Apply object-oriented patterns to enhance code reuse and clarity in analysis workflows.
Objective 2.3.3 – Manage object identity and comparisons in data pipelines.
Objective 2.4.1 – Perform SQL queries to retrieve and manipulate data.
Objective 2.4.2 – Execute fundamental SQL commands to create, read, update, and delete data in database tables.
Objective 2.4.3 – Establish connections to databases using Python.
Objective 2.4.4 – Execute parameterized SQL queries through Python to safely interact with databases.
Objective 2.4.5 – Understand, manage, and convert SQL data types appropriately within Python scripts.
Objective 2.4.6 – Understand essential database security concepts, including strategies to prevent SQL query injection.
Block 3 – Statistical Analysis: 4 objectives & 17 sub-objectives covered by the block → 4 exam items
Objective 3.1.1 – Understand and apply statistical measures in data analysis.
Objective 3.1.2 – Analyze and evaluate data relationships.
Objective 3.2.1 – Understand and apply bootstrapping for sampling distributions.
Objective 3.2.2 – Explain when and how to use linear and logistic regression, including appropriateness and limitations.
Block 4 – Data Analysis and Modeling: 9 objectives & 19 sub-objectives covered by the block → 9 exam items
Objective 4.1.1 – Organize and clean data using Pandas.
Objective 4.1.2 – Merge and reshape datasets using Pandas.
Objective 4.1.3 – Understand the relationship between Series and DataFrames.
Objective 4.1.4 – Access and manipulate data using locators and slicing.
Objective 4.1.5 – Perform array operations and distinguish between core data structures.
Objective 4.1.6 – Group, summarize, and extract insights from data.
Objective 4.2.1 – Apply Python's descriptive statistics for dataset analysis.
Objective 4.2.2 – Recognize the importance of test datasets in model evaluation.
Objective 4.2.3 – Analyze and evaluate supervised learning algorithms and model accuracy.
Block 5 – Data Communication and Visualization: 5 objectives & 25 sub-objectives covered by the block → 5 exam items
Objective 5.1.1 – Demonstrate essential proficiency in data visualization with Matplotlib and Seaborn.
Objective 5.1.2 – Assess the pros and cons of different data representations.
Objective 5.1.3 – Label, annotate, and refine data visualizations for clarity and insight.
Objective 5.2.1 – Tailor communication to different audience needs, and combine visualizations and text for clear data presentation.
Objective 5.2.2 – Summarize key findings and support claims with evidence and reasoning.
A Minimally Qualified Candidate (MQC) for the PCAD™ – Certified Associate Data Analyst with Python certification is expected to demonstrate the essential skills and knowledge required to support junior-level data analysis tasks in professional settings.
The candidate should understand how to acquire, clean, prepare, analyze, model, and communicate data using Python, SQL, and widely used data tools and libraries. They must be able to connect to data sources, including databases, spreadsheets, APIs, and HTML web pages, and collect relevant data using tools and libraries such as requests and BeautifulSoup.
The MQC is proficient in writing and organizing Python scripts that utilize variables, functions, control flow, and data structures like lists, dictionaries, and sets. They apply best practices in Python programming, including documentation, error handling, and modular design, and are able to manage packages using pip.
They can use libraries such as pandas, numpy, and statistics to clean, reshape, and analyze structured datasets, as well as calculate descriptive statistics, correlations, and basic aggregations. They should also be able to execute SQL queries to retrieve and manipulate data and use sqlite3 to connect Python scripts to databases. They understand how to use parameterized queries to ensure data integrity and avoid SQL injection.
The candidate can perform basic statistical modeling, including linear and logistic regression, and apply inferential techniques such as bootstrapping. They should recognize the importance of model validation, test data splitting, and the risks of overfitting.
Finally, the MQC is able to create clear, insightful, and audience-appropriate visualizations using Matplotlib and Seaborn, structure effective data stories, and present findings in both written and verbal formats, using design and communication best practices.
This profile represents a blend of technical proficiency, analytical thinking, and communication skills crucial for navigating the complexities of data-driven environments.
Block 1 – Data Acquisition and Pre-Processing. Weight: 29.2% of total exam (14 items)
The MQC understands what data is, how it is structured, and how it is transformed into usable information for analysis. They can describe different data types – including structured, semi-structured, and unstructured – and explain how these formats influence storage, processing, and analytical techniques. The MQC is able to explain and compare various data collection methods such as surveys, interviews, APIs, and web scraping using tools like BeautifulSoup, and understands how these methods are used in research, business, and analytics contexts.
They can identify appropriate storage options for different data types, including CSV, JSON, Excel, databases, data lakes, and warehouses, and explain the role of cloud-based storage in modern data ecosystems. They recognize how poor data collection or storage practices can lead to quality issues and errors later in the analytics process.
The MQC is capable of integrating data from multiple sources and resolving inconsistencies during aggregation. They understand the implications of format alignment, type mismatches, and schema discrepancies. They can apply basic data cleaning techniques, including identifying and correcting missing, duplicate, or invalid values. They understand the importance of encoding categorical data, scaling numerical values, and formatting date-time values for consistency.
They can apply basic validation methods (such as type checks, range checks, and cross-reference logic) to ensure data quality and integrity. They are also able to prepare data for analysis by sorting, filtering, reshaping (wide vs. long), and splitting data into training and testing sets, particularly in preparation for modeling tasks. The MQC also understands ethical and legal responsibilities when working with personal data, including anonymization, consent, and compliance with frameworks such as GDPR and HIPAA.
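The validation methods listed above can be sketched with plain Python. This is a minimal illustration, not the exam's required approach; the field names (`age`, `country`) and the country lookup table are invented for the example.

```python
# Minimal sketch of record-level validation: a type check, a range check,
# and a cross-reference check against a lookup table.
VALID_COUNTRIES = {"US", "DE", "PL"}  # illustrative reference data

def validate_record(record):
    """Return a list of validation errors (an empty list means the record is valid)."""
    errors = []
    # Type check: age must be an integer
    if not isinstance(record.get("age"), int):
        errors.append("age: expected an integer")
    # Range check: age must be plausible
    elif not 0 <= record["age"] <= 120:
        errors.append("age: out of range 0-120")
    # Cross-reference check: country code must exist in the lookup table
    if record.get("country") not in VALID_COUNTRIES:
        errors.append("country: unknown code")
    return errors

print(validate_record({"age": 34, "country": "PL"}))   # []
print(validate_record({"age": 200, "country": "XX"}))  # two errors
```

Collecting errors in a list rather than raising on the first failure makes it easy to report every problem with a record at once, which is usually what data-quality work needs.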
Block 2 – Programming and Database Skills. Weight: 33.3% of total exam (16 items)
The MQC is proficient in Python and uses the language to support data processing tasks. They can define and manipulate variables, use core data types (integers, floats, strings, booleans), and work with fundamental data structures such as lists, dictionaries, tuples, and sets. They write clear and reusable code using functions with parameters and return values, and they apply control flow constructs such as conditional statements and loops to process and analyze data efficiently.
They demonstrate awareness of clean coding practices, including proper indentation, naming conventions, and documentation using PEP 8 and PEP 257 standards. They are familiar with Python’s standard library modules – such as csv, os, math, statistics, datetime, and collections – and know how to install and manage third-party packages using pip.
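A quick tour of three of the standard-library modules named above; nothing here requires a third-party package, and the sample values are made up.

```python
# statistics, collections, and datetime from the Python standard library
import statistics
from collections import Counter
from datetime import date

values = [4, 8, 15, 16, 23, 42]
print(statistics.mean(values))    # 18
print(statistics.median(values))  # 15.5

words = ["sql", "python", "sql", "pandas", "sql"]
print(Counter(words).most_common(1))  # [('sql', 3)]

print(date(2025, 7, 15).isoformat())  # 2025-07-15
```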
The MQC is comfortable using object-oriented programming (OOP) concepts to structure and encapsulate data. They can define classes, create objects, and organize attributes and methods in a way that supports reusability and clarity in data workflows.
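As a sketch of that idea, the hypothetical class below encapsulates one dataset column behind attributes and methods; the class name and its API are illustrative, not part of the syllabus.

```python
# A small class that bundles a column's name and values with the
# operations an analysis workflow repeats: dropping missing values
# and computing a summary statistic.
class Column:
    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def dropna(self):
        """Return a new Column with None values removed."""
        return Column(self.name, [v for v in self.values if v is not None])

    def mean(self):
        vals = self.dropna().values
        return sum(vals) / len(vals)

    def __repr__(self):
        return f"Column({self.name!r}, n={len(self.values)})"

col = Column("price", [10.0, None, 14.0, 12.0])
print(col)         # Column('price', n=4)
print(col.mean())  # 12.0
```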
In addition to Python, the MQC is expected to use SQL to query and manipulate structured data. They can retrieve and filter data using SELECT, WHERE, and various types of JOIN clauses, and they are capable of aggregating and grouping data using GROUP BY, HAVING, and ORDER BY. They understand how to perform CRUD operations and can write SQL statements to insert, update, and delete data.
They can connect Python scripts to relational databases using libraries such as sqlite3, and they understand how to execute parameterized queries to protect against SQL injection and ensure data integrity. The MQC also understands how to convert data types appropriately between SQL and Python when extracting or inserting data.
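The sqlite3 workflow above can be shown end to end in a few lines. The table and column names are invented; the key point is the `?` placeholders, which bind values separately from the SQL text so user input is never spliced into the query.

```python
# sqlite3 ships with Python, so this runs with no setup.
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()
cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")

rows = [("north", 120.0), ("south", 80.0), ("north", 45.5)]
cur.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Parameterized SELECT: the region value is bound, not concatenated
cur.execute("SELECT SUM(amount) FROM sales WHERE region = ?", ("north",))
total = cur.fetchone()[0]
print(total)  # 165.5
conn.close()
```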
Block 3 – Statistical Analysis. Weight: 8.3% of total exam (4 items)
The MQC has a solid grasp of foundational statistical concepts and can apply descriptive statistics to summarize datasets. They understand and can calculate measures of central tendency (mean, median, mode) and dispersion (standard deviation, variance), and they can describe basic distribution types, such as normal and uniform distributions.
They can assess relationships between variables using correlation analysis, particularly Pearson's r, and can identify outliers both visually and statistically. The MQC understands that visualizations such as histograms, boxplots, and scatterplots support exploratory understanding of data distributions and trends.
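These measures are one-liners with NumPy; the data values below are invented for the example.

```python
# Central tendency, dispersion, and Pearson's correlation coefficient
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])  # roughly y = 2x

print(x.mean())                  # 3.0
print(np.median(x))              # 3.0
print(round(x.std(ddof=1), 4))   # 1.5811  (sample standard deviation)

r = np.corrcoef(x, y)[0, 1]      # Pearson's r
print(round(r, 3))               # close to 1.0: strong positive linear relationship
```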
In terms of inferential statistics, the MQC is familiar with bootstrapping as a method for estimating sampling distributions, especially when theoretical distributions are unknown. They understand the difference between discrete and continuous data and recognize when bootstrapping is appropriate for evaluating reliability.
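A bare-bones bootstrap looks like this; the sample data, seed, and iteration count are arbitrary, and the percentile method shown is only one of several ways to form an interval.

```python
# Bootstrapping: estimate the sampling distribution of the mean by
# resampling the observed data with replacement.
import random
import statistics

random.seed(42)
sample = [12, 15, 9, 22, 18, 14, 11, 20, 16, 13]

boot_means = []
for _ in range(5000):
    resample = random.choices(sample, k=len(sample))  # draw with replacement
    boot_means.append(statistics.mean(resample))

boot_means.sort()
# Percentile method: the middle 95% of bootstrap means approximates a 95% CI
low, high = boot_means[int(0.025 * 5000)], boot_means[int(0.975 * 5000)]
print(statistics.mean(sample))  # 15
print(low, high)                # rough 95% confidence interval for the mean
```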
They can explain and apply linear and logistic regression, understand the assumptions behind these models, and interpret their outputs, including coefficients and model fit statistics. The MQC is also aware of common limitations, such as overfitting, and can discuss the importance of model validation.
Block 4 – Data Analysis and Modeling. Weight: 18.8% of total exam (9 items)
The MQC is proficient in using Pandas and NumPy for data analysis. They can clean and organize data using functions like dropna(), fillna(), sort_values(), and replace(), and they can reshape and restructure data using methods such as pivot(), melt(), groupby(), and merge(). They understand the difference between DataFrame and Series objects and can use .loc and .iloc to access and modify data precisely.
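Several of those operations appear together in this small sketch; the DataFrame contents are invented for the example.

```python
# Cleaning, grouping, and reshaping with Pandas
import pandas as pd

df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Lima", "Lima"],
    "year": [2023, 2024, 2023, 2024],
    "sales": [100.0, None, 80.0, 95.0],
})

df["sales"] = df["sales"].fillna(df["sales"].mean())  # impute the missing value
df = df.sort_values("sales")

# groupby + aggregation: total sales per city
totals = df.groupby("city")["sales"].sum()
print(totals)

# pivot: one row per city, one column per year (long -> wide)
wide = df.pivot(index="city", columns="year", values="sales")
print(wide)
```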
Using NumPy, the MQC performs numerical operations, including element-wise calculations, aggregations, and array broadcasting. They understand the performance advantages of NumPy arrays over native Python lists for large datasets.
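Broadcasting is easiest to see in action: below, a 1-D row of column means is "stretched" across every row of a 2-D array without an explicit loop.

```python
# Element-wise operations and broadcasting with NumPy
import numpy as np

data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

col_means = data.mean(axis=0)   # array([ 2., 20.])
centered = data - col_means     # broadcasting: shape (3, 2) minus shape (2,)
print(centered)
print(centered.mean(axis=0))    # [0. 0.] - every column is now centered
```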
They can compute descriptive summaries and perform basic feature engineering tasks such as bucketing, scaling, and encoding in preparation for modeling. The MQC is capable of analyzing datasets with grouped summaries and conditional filters, using combinations of Pandas and NumPy functionality to extract insights.
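Two of those feature-engineering steps, bucketing and encoding, can be sketched with `pd.cut` and `pd.get_dummies`; the column names and bin edges are invented for the example.

```python
# Bucketing a numeric column and one-hot encoding a categorical one
import pandas as pd

df = pd.DataFrame({"age": [15, 34, 58, 71],
                   "plan": ["basic", "pro", "basic", "pro"]})

# Bucketing: turn a continuous variable into ordered categories
df["age_group"] = pd.cut(df["age"], bins=[0, 18, 65, 120],
                         labels=["minor", "adult", "senior"])

# Encoding: expand the categorical column into 0/1 indicator columns
encoded = pd.get_dummies(df, columns=["plan"])
print(df["age_group"].tolist())   # ['minor', 'adult', 'adult', 'senior']
print(encoded.columns.tolist())
```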
They understand the structure and purpose of supervised learning workflows and can apply train/test splits to evaluate models. They are able to fit and interpret basic linear and logistic regression models and understand key modeling concepts such as accuracy, overfitting, underfitting, and the bias-variance tradeoff.
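A minimal version of that workflow, kept dependency-free with NumPy (in practice scikit-learn's `train_test_split` and `LinearRegression` would typically be used): split synthetic data, fit a line by least squares, and compare train and test error.

```python
# Train/test split and a least-squares linear fit with NumPy only
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(0, 1.0, size=50)  # noisy line y = 3x + 2

# 80/20 train/test split on shuffled indices
idx = rng.permutation(50)
train, test = idx[:40], idx[40:]

slope, intercept = np.polyfit(x[train], y[train], deg=1)  # least-squares fit

def mse(a, b):
    """Mean squared error between two arrays."""
    return float(np.mean((a - b) ** 2))

pred = lambda xs: slope * xs + intercept
print(round(slope, 2), round(intercept, 2))  # close to 3.0 and 2.0
print(mse(y[train], pred(x[train])), mse(y[test], pred(x[test])))
```

Comparing the two error values is the core idea behind detecting overfitting: a model that scores far better on the training set than on held-out data has learned noise, not signal.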
Block 5 – Data Communication and Visualization. Weight: 10.4% of total exam (5 items)
The MQC is able to interpret and create effective visual representations of data using Matplotlib and Seaborn. They understand how to choose appropriate chart types based on data type and purpose, and can generate bar charts, histograms, scatterplots, boxplots, line charts, and correlation heatmaps to highlight trends and patterns.
They know how to enhance chart readability by adding labels, titles, legends, gridlines, and appropriate color schemes. They can annotate graphs to draw attention to key insights and customize chart aesthetics to improve clarity.
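A minimal Matplotlib example of those labeling and annotation techniques (Seaborn builds on the same `Axes` objects); the data and file name are invented, and the Agg backend is selected so the figure renders off-screen.

```python
# Labeled, annotated bar chart with Matplotlib
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; no display needed
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
revenue = [12, 17, 15, 23]

fig, ax = plt.subplots()
ax.bar(range(4), revenue, color="steelblue")
ax.set_xticks(range(4))
ax.set_xticklabels(months)
ax.set_title("Monthly Revenue")
ax.set_xlabel("Month")
ax.set_ylabel("Revenue (kUSD)")
ax.grid(axis="y", linestyle="--", alpha=0.4)

# Annotation: call out the best month explicitly
ax.annotate("record month", xy=(3, 23), xytext=(1, 24),
            arrowprops=dict(arrowstyle="->"))

fig.savefig("monthly_revenue.png")  # or plt.show() in an interactive session
```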
The MQC understands how to communicate findings to different audiences by combining visuals with concise and informative summaries. They can adapt their messaging to suit both technical stakeholders (e.g., data teams) and non-technical audiences (e.g., managers, clients).
They are able to structure a data narrative that connects analysis to business or research questions, ensuring that conclusions are supported by data and that recommendations are evidence-based. They demonstrate awareness of design principles for slide decks and written reports, and avoid clutter, ambiguity, and misleading visuals.
To pass the PCAD exam, a candidate must achieve a cumulative average score of at least 75% across all exam blocks.