PCAD™ – Certified Associate Data Analyst with Python: EXAM SYLLABUS

Exam: PCAD-31-02
Status: ACTIVE


The PCAD™-31-02 exam consists of single-select and multiple-select items designed to evaluate a candidate’s ability to acquire, process, analyze, model, and communicate data using Python and SQL.

Each item is worth a maximum of 1 point. After the exam is completed, the candidate’s raw score is normalized, and the final result is expressed as a percentage.

The exam is divided into five blocks, each covering a specific area of data analytics and core technical skills in programming and SQL querying.

The distribution of items and their corresponding weights reflects the relative emphasis placed on each area in professional practice.


The table below summarizes the distribution of exam items and their corresponding weight within the total exam.

Block Number   Block Name                             Number of Items   Weight
1              Data Acquisition and Pre-Processing    14                29.2%
2              Programming and Database Skills        16                33.3%
3              Statistical Analysis                   4                 8.3%
4              Data Analysis and Modeling             9                 18.8%
5              Data Communication and Visualization   5                 10.4%
Total                                                 48                100%

Exam Syllabus

Last updated: July 15, 2025
Aligned with Exam PCAD-31-02



Block 1: Data Acquisition and Pre-Processing

14 objectives & 54 sub-objectives covered by the block → 14 exam items

Data Collection, Integration, and Storage

Objective 1.1.1 – Explain and compare data collection methods and their use in research, business, and analytics.

  1. Explore different techniques: surveys, interviews, and web scraping.
  2. Discuss representative sampling, challenges in data collection, and differences between qualitative and quantitative research.
  3. Examine legal and ethical considerations in data collection.
  4. Explain the importance of data anonymization in maintaining privacy and confidentiality, particularly with personally identifiable information (PII).
  5. Investigate the impact of data collection on business strategy formation, market research accuracy, risk assessment, policy-making, and business decisions.
  6. Explain the process and methodologies of data collection, including survey design, audience selection, and structured interviews.

Objective 1.1.2 – Aggregate data from multiple sources and integrate them into datasets.

  1. Explain techniques for combining data from various sources, such as databases, APIs, and file-based storage.
  2. Address challenges in data aggregation, including data format disparities and alignment issues.
  3. Understand the importance of data consistency and accuracy in aggregated datasets.

Objective 1.1.3 – Explain various data storage solutions.

  1. Understand various data storage methods and their appropriate applications.
  2. Distinguish between the concepts of data warehouses, data lakes, and file-based storage options like CSV and Excel.
  3. Explain the concepts of cloud storage solutions and their growing role in data management.

Data Cleaning and Standardization

Objective 1.2.1 – Understand structured and unstructured data and their implications in data analysis.

  1. Recognize the characteristics of structured data, such as databases and spreadsheets, and their straightforward use in analysis.
  2. Understand unstructured data, including text, images, and videos, and the additional processing required for analysis.
  3. Explore how the data structure impacts data storage, retrieval, and analytical methods.

Objective 1.2.2 – Identify, rectify, or remove erroneous data.

  1. Identify data errors and inconsistencies through various diagnostic methods.
  2. Address missing, inaccurate, or misleading information.
  3. Tackle specific data quality issues: numerical data problems, duplicate records, invalid data entries, and missing values.
  4. Explain different types of missingness (MCAR, MAR, MNAR), and their implications for data analysis.
  5. Explore various techniques for dealing with missing data, including data imputation methods.
  6. Understand the implications of data correction or removal on overall data integrity and analysis outcomes.
  7. Explain the importance of data collection in the context of outlier detection.
  8. Explain why high-quality data is crucial for accurate outlier detection.
  9. Explain how different data types (numerical, categorical) may influence outlier detection strategies.
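
The sketch below (a pandas-based illustration with an invented dataset and column names) shows how this objective might look in practice: diagnosing missing values and duplicates, marking invalid entries, and imputing gaps with a simple median strategy.

    import pandas as pd

    # Illustrative dataset with a missing value, a duplicate row, and an invalid entry.
    df = pd.DataFrame({
        "customer_id": [101, 102, 102, 103, 104],
        "age": [34, None, None, 29, -5],   # None = missing, -5 = invalid
        "segment": ["A", "B", "B", "A", "C"],
    })

    # Diagnose: count missing values per column and flag duplicate rows.
    print(df.isna().sum())
    print(df.duplicated().sum())

    # Rectify or remove: drop exact duplicates, treat invalid ages as missing,
    # then impute the remaining gaps with the column median (one simple strategy).
    clean = df.drop_duplicates()
    clean.loc[clean["age"] < 0, "age"] = None
    clean["age"] = clean["age"].fillna(clean["age"].median())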

Objective 1.2.3 – Understand data normalization and scaling.

  1. Understand the necessity of data normalization to bring different variables onto a similar scale for comparative analysis.
  2. Understand various scaling methods like Min-Max scaling and Z-score normalization.
  3. Explain encoding categorical variables for quantitative analysis, including one-hot encoding and label encoding methods.
  4. Explain the pros and cons of data reduction (reducing the number of variables under consideration or simplifying models versus losing data explainability).
  5. Explain methods for handling outliers, including detection and treatment techniques to ensure data quality.
  6. Understand the importance of data format standardization across different datasets for consistency, especially when dealing with date-time formats and numerical values.
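
A minimal pandas sketch of Min-Max scaling, Z-score normalization, and one-hot encoding follows; the income/city dataset is invented for illustration only.

    import pandas as pd

    df = pd.DataFrame({
        "income": [32000, 45000, 61000, 87000],
        "city": ["Oslo", "Lima", "Oslo", "Pune"],
    })

    # Min-Max scaling: rescale values into the [0, 1] range.
    span = df["income"].max() - df["income"].min()
    df["income_minmax"] = (df["income"] - df["income"].min()) / span

    # Z-score normalization: centre on the mean, scale by the standard deviation.
    df["income_z"] = (df["income"] - df["income"].mean()) / df["income"].std()

    # One-hot encoding: expand the categorical column into binary indicator columns.
    df = pd.get_dummies(df, columns=["city"], prefix="city")
    print(df)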

Objective 1.2.4 – Apply data cleaning and standardization techniques.

  1. Perform data imputation techniques, string manipulation, data format standardization, boolean normalization, string case normalization, and string-to-number conversions.
  2. Discuss the pros and cons of imputation vs. exclusion and their impact on the reliability and validity of the analysis.
  3. Explain the concept of One-Hot Encoding and its application in transforming categorical variables into a binary format, and preparing data for machine learning algorithms.
  4. Explain the concept of bucketization and its application in transforming continuous variables into categorical variables.

Data Validation and Integrity

Objective 1.3.1 – Execute and understand basic data validation methods.

  1. Define the main validation types (type, range, and cross-field checks) and match them to appropriate tools (Python logic, schema checks).
  2. Perform type, range, and cross-reference checks.
  3. Explain the benefit of early type checks in ingestion scripts.
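
A minimal sketch of type, range, and cross-field checks in plain Python is shown below; the order record and its validation rules are hypothetical examples, not part of the exam specification.

    from datetime import date

    def validate_order(record: dict) -> list:
        """Return a list of validation errors for a single order record."""
        errors = []

        # Type check: quantity must be an integer.
        if not isinstance(record.get("quantity"), int):
            errors.append("quantity must be an integer")

        # Range check: unit price must be positive and below an agreed ceiling.
        price = record.get("unit_price", 0)
        if not (0 < price <= 10_000):
            errors.append("unit_price out of range")

        # Cross-field check: shipping cannot precede the order date.
        if record.get("shipped_on") and record["shipped_on"] < record["order_date"]:
            errors.append("shipped_on precedes order_date")

        return errors

    order = {"quantity": 3, "unit_price": 19.99,
             "order_date": date(2025, 1, 10), "shipped_on": date(2025, 1, 8)}
    print(validate_order(order))   # ['shipped_on precedes order_date']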

Objective 1.3.2 – Establish and maintain data integrity through clear validation rules.

  1. Understand the concept of data integrity and its importance in maintaining reliable and accurate databases.
  2. Apply clear validation rules that enforce the correctness and consistency of data.

Data Preparation Techniques

Objective 1.4.1 – Understand file formats in data acquisition.

  1. Explain the roles and characteristics of common data file formats: CSV for tabular data, JSON for structured data, XML for hierarchically organized data, and TXT for unstructured text.
  2. Understand basic methods for importing and exporting these file types in data analysis tools, focusing on practical applications.

Objective 1.4.2 – Access, manage, and effectively utilize datasets.

  1. Understand the basics of accessing datasets from various sources like local files, databases, and online repositories.
  2. Understand the principles of data management, including organizing, sorting, and filtering data in preparation for analysis.

Objective 1.4.3 – Extract data from various sources.

  1. Explain fundamental techniques for extracting data from various sources, emphasizing methods to retrieve and collate data from databases, APIs, and online services.
  2. Extract data from HTML using Python tools and libraries (BeautifulSoup, requests).
  3. Understand basic challenges and considerations in data extraction, such as data compatibility and integrity.
  4. Discuss ethical web scraping practices, including respect for robots.txt and rate-limiting.
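
The following sketch uses the requests and BeautifulSoup libraries named in sub-objective 2; the URL is a placeholder, and the robots.txt check and pause illustrate one possible way to honour the ethical-scraping considerations in sub-objective 4.

    import time
    import requests
    from bs4 import BeautifulSoup
    from urllib.robotparser import RobotFileParser

    URL = "https://example.com/catalog"   # placeholder address

    # Ethical scraping: consult robots.txt before fetching anything.
    robots = RobotFileParser()
    robots.set_url("https://example.com/robots.txt")
    robots.read()

    if robots.can_fetch("*", URL):
        response = requests.get(URL, timeout=10)
        response.raise_for_status()

        soup = BeautifulSoup(response.text, "html.parser")
        # Collect the text and target of every anchor tag on the page.
        links = [(a.get_text(strip=True), a.get("href")) for a in soup.find_all("a")]
        print(links[:5])

        time.sleep(1)   # simple rate limiting between successive requests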

Objective 1.4.4 – Apply spreadsheet best practices for readability and formatting.

  1. Improve the readability and usability of data in spreadsheets, focusing on layout adjustments, formatting best practices, and basic formula applications.

Objective 1.4.5 – Prepare, adapt, and pre-process data for analysis.

  1. Understand the importance of the surrounding context, objectives, and stakeholder expectations to guide the preparation steps.
  2. Understand basic concepts of data pre-processing, including sorting, filtering, and preparing data sets for analytical work.
  3. Discuss the importance of proper data formatting for analysis, such as ensuring consistency in date-time formats and aligning data structures.
  4. Introduce concepts of dataset structuring, including the basics of transforming data into a format suitable for analysis (e.g., wide vs. long formats).
  5. Explain the concept of splitting data into training and testing sets, particularly for machine learning projects, emphasizing the importance of this step for model validation.
  6. Understand the impact of outlier management on data quality in preprocessing.
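
As one possible illustration of sub-objectives 4 and 5, the sketch below reshapes an invented wide table into long format with pandas and then holds out a test set. The scikit-learn train_test_split helper is used here only as a common convenience; the syllabus does not mandate a specific library, and a manual split would serve equally well.

    import pandas as pd
    from sklearn.model_selection import train_test_split   # scikit-learn assumed

    wide = pd.DataFrame({
        "store": ["A", "B"],
        "sales_2023": [120, 95],
        "sales_2024": [140, 110],
    })

    # Wide -> long: one row per store/year observation.
    tidy = wide.melt(id_vars="store", var_name="year", value_name="sales")

    # Hold out a test set before modeling (25% here, fixed seed for repeatability).
    train, test = train_test_split(tidy, test_size=0.25, random_state=42)
    print(len(train), len(test))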

Block 2: Programming and Database Skills

16 objectives & 39 sub-objectives covered by the block → 16 exam items

Core Python Proficiency

Objective 2.1.1 – Apply Python syntax and control structures to solve data-related problems.

  1. Accurately use basic Python syntax for variables, scopes, and data types.
  2. Implement control structures like loops and conditionals to manage data flow.

Objective 2.1.2 – Analyze and create Python functions.

  1. Design functions with clear purpose, using both positional and keyword arguments.
  2. Differentiate between optional and required arguments and apply them effectively.

Objective 2.1.3 – Evaluate and navigate the Python Data Science ecosystem.

  1. Identify key Python libraries and tools essential for data science tasks.
  2. Critically assess the suitability of various Python resources for different data analysis scenarios.

Objective 2.1.4 – Organize and manipulate data using Python's core data structures.

  1. Effectively use tuples, sets, lists, dictionaries, and strings for data organization and manipulation.
  2. Solve complex data handling tasks by choosing appropriate data structures.

Objective 2.1.5 – Explain and implement Python scripting best practices.

  1. Understand and apply PEP 8 guidelines for Python coding style.
  2. Comprehend and utilize PEP 257 for effective docstring conventions to enhance code documentation.

Module Management and Exception Handling

Objective 2.2.1 – Import modules and manage Python packages using PIP.

  1. Apply different types of module imports (standard imports, selective imports, aliasing).
  2. Understand importing modules from different sources (Python Standard Library, via package managers like PIP, and from locally developed modules/packages).
  3. Identify and import necessary Python modules for specific tasks, understanding the functionality and purpose of each.
  4. Demonstrate proficiency in managing Python packages using PIP, including installing, updating, and removing packages.

Objective 2.2.2 – Apply basic exception handling and maintain script robustness.

  1. Implement basic exception handling techniques to manage and respond to errors in Python scripts.
  2. Predict common errors in Python code and develop strategies to handle them effectively.
  3. Interpret error messages to diagnose and resolve issues, enhancing the robustness and reliability of Python scripts.
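
A short sketch of defensive error handling around a data-loading step is given below; the file name, column name, and the decision to skip malformed rows are illustrative assumptions.

    import csv

    def load_rows(path):
        """Read a CSV file, converting one column to float and skipping bad rows."""
        rows = []
        try:
            with open(path, newline="", encoding="utf-8") as handle:
                for record in csv.DictReader(handle):
                    try:
                        record["amount"] = float(record["amount"])
                        rows.append(record)
                    except (KeyError, ValueError) as exc:
                        print(f"Skipping malformed row: {exc}")
        except FileNotFoundError:
            print(f"Input file not found: {path}")
        return rows

    data = load_rows("transactions.csv")   # hypothetical input file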

Object-Oriented Programming for Data Modeling

Objective 2.3.1 – Apply basic object-oriented programming to structure and model data.

  1. Define and instantiate classes that represent structured data records, including constructors and instance variables.
  2. Organize attributes and behaviors within objects using constructors and instance methods.
  3. Apply encapsulation principles by using naming conventions (e.g., _protected, __private) and method-based access (getters and setters) to manage internal object state and support clean design.

Objective 2.3.2 – Apply object-oriented patterns to enhance code reuse and clarity in analysis workflows.

  1. Use composition to group related data models (e.g., nesting a User object inside a Response object).
  2. Extend base classes using inheritance and override methods for specialized behavior (e.g., multiple exporter classes).
  3. Demonstrate polymorphism by calling the same method (e.g., .process(), .export()) on different subclasses within a data workflow.
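
A compact sketch of these patterns, using the User/Response and exporter examples mentioned above, might look as follows; the class and method names beyond those are invented for illustration.

    class User:
        def __init__(self, name):
            self.name = name

    class Response:
        """Composition: a survey response holds a User object."""
        def __init__(self, user, answers):
            self.user = user
            self.answers = answers

    class Exporter:
        def export(self, response):
            raise NotImplementedError

    class CsvExporter(Exporter):
        """Inheritance: specialise the base exporter for CSV output."""
        def export(self, response):
            return ",".join([response.user.name, *map(str, response.answers)])

    class JsonExporter(Exporter):
        def export(self, response):
            return {"user": response.user.name, "answers": response.answers}

    response = Response(User("Ada"), [4, 5, 3])
    # Polymorphism: the same .export() call works on every subclass.
    for exporter in (CsvExporter(), JsonExporter()):
        print(exporter.export(response))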

Objective 2.3.3 – Manage object identity and comparisons in data pipelines.

  1. Use reference variables and understand shared vs independent object behavior (e.g., mutation of lists inside objects).
  2. Compare objects using == (value equality) and is (identity), and implement custom equality with __eq__().

SQL for Data Analysts

Objective 2.4.1 – Perform SQL queries to retrieve and manipulate data.

  1. Compose and execute SQL queries to extract data from database tables.
  2. Apply SQL functions and clauses to manipulate and filter data effectively.
  3. Construct and execute SQL queries using SELECT, FROM, JOINS (INNER, LEFT, RIGHT, FULL), WHERE, GROUP BY, HAVING, ORDER BY, and LIMIT.
  4. Analyze data retrieval needs and apply appropriate clauses from the SFJWGHOL set (SELECT, FROM, JOIN, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT) to meet those requirements effectively, as sketched below.
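
The sketch below runs the full clause set against an in-memory SQLite database; the customers/orders schema and the threshold values are invented purely to exercise each clause.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
        INSERT INTO customers VALUES (1, 'North'), (2, 'South'), (3, 'North');
        INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 200.0), (4, 3, 50.0);
    """)

    # SELECT ... FROM ... JOIN ... WHERE ... GROUP BY ... HAVING ... ORDER BY ... LIMIT
    query = """
        SELECT c.region, COUNT(o.id) AS order_count, SUM(o.total) AS revenue
        FROM customers AS c
        INNER JOIN orders AS o ON o.customer_id = c.id
        WHERE o.total > 40
        GROUP BY c.region
        HAVING SUM(o.total) > 100
        ORDER BY revenue DESC
        LIMIT 5;
    """
    for row in conn.execute(query):
        print(row)
    conn.close()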

Objective 2.4.2 – Execute fundamental SQL commands to create, read, update, and delete data in database tables.

  1. Demonstrate the ability to use CRUD operations (Create, Read, Update, Delete) in SQL.
  2. Construct SQL statements for data insertion, retrieval, updating, and deletion.

Objective 2.4.3 – Establish connections to databases using Python.

  1. Understand and implement methods to establish database connections using Python libraries (e.g., sqlite3, pymysql).
  2. Analyze and resolve common issues encountered while connecting Python scripts to databases.

Objective 2.4.4 – Execute parameterized SQL queries through Python to safely interact with databases.

  1. Develop and execute parameterized SQL queries in Python to interact with databases securely.
  2. Evaluate the advantages of parameterized queries in preventing SQL injection and maintaining data integrity.
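
A minimal sqlite3 sketch contrasting an unsafe, string-built query with a parameterized one follows; the users table and the hostile input string are illustrative.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute("INSERT INTO users (email) VALUES (?)", ("ana@example.com",))
    conn.commit()

    requested_email = "ana@example.com'; DROP TABLE users; --"   # hostile input

    # Unsafe: building SQL by string interpolation lets the input rewrite the query.
    # conn.executescript(f"SELECT * FROM users WHERE email = '{requested_email}'")

    # Safe: a ? placeholder passes the value as data, never as executable SQL.
    rows = conn.execute(
        "SELECT id, email FROM users WHERE email = ?",
        (requested_email,),
    ).fetchall()
    print(rows)   # [] -- the hostile string matches nothing and executes nothing
    conn.close()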

Objective 2.4.5 – Understand, manage, and convert SQL data types appropriately within Python scripts.

  1. Identify and understand various SQL data types and their counterparts in Python.
  2. Practice converting data types appropriately when transferring data between SQL databases and Python scripts.

Objective 2.4.6 – Understand essential database security concepts, including strategies to prevent SQL query injection.

  1. Comprehend fundamental database security principles, including measures to prevent SQL injection attacks.
  2. Assess and apply strategies for writing secure SQL queries within Python environments.

Block 3: Statistical Analysis

4 objectives & 17 sub-objectives covered by the block → 4 exam items

Descriptive Statistics

Objective 3.1.1 – Understand and apply statistical measures in data analysis.

  1. Understand and describe measures of central tendency and spread.
  2. Identify fundamental statistical distributions (Gaussian, Uniform) and interpret their trends in various contexts (over time, univariate, bivariate, multivariate).
  3. Apply confidence measures in statistical calculations to assess data reliability.

Objective 3.1.2 – Analyze and evaluate data relationships.

  1. Analyze datasets to identify outliers and evaluate negative and positive correlations using Pearson’s R coefficient.
  2. Interpret and critically assess information presented in various types of plots and graphs, including Boxplots, Histograms, Scatterplots, Lineplots, and Correlation heatmaps.
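
The short sketch below computes Pearson's R and applies a simple IQR rule for flagging potential outliers; the synthetic study-hours data is invented for illustration.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    hours = rng.uniform(0, 10, 50)
    score = 5 * hours + rng.normal(0, 4, 50)
    score[0] = 95                      # inject one suspicious value

    # Pearson's R between the two variables.
    r = np.corrcoef(hours, score)[0, 1]
    print(f"Pearson's R: {r:.2f}")

    # A simple IQR rule for flagging potential outliers in one variable.
    s = pd.Series(score)
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    outliers = s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)]
    print(outliers)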

Inferential Statistics

Objective 3.2.1 – Understand and apply bootstrapping for sampling distributions.

  1. Understand the theoretical basis and statistical principles underlying bootstrapping.
  2. Differentiate between discrete and continuous data types in the context of bootstrapping.
  3. Recognize situations and data types where bootstrapping is an effective method for estimating sampling distributions.
  4. Demonstrate proficiency in applying bootstrapping methods using Python to generate and analyze sampling distributions.
  5. Analyze the reliability and validity of results obtained from bootstrapping in various statistical scenarios.
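
One way to implement a basic bootstrap in Python with NumPy is sketched below; the sample values and the choice of a 95% percentile interval are illustrative.

    import numpy as np

    rng = np.random.default_rng(42)
    sample = np.array([12.1, 9.8, 11.4, 14.2, 10.7, 13.3, 9.5, 12.8, 11.1, 10.2])

    # Resample with replacement many times and record the statistic of interest.
    boot_means = np.array([
        rng.choice(sample, size=sample.size, replace=True).mean()
        for _ in range(5000)
    ])

    # The spread of the bootstrap distribution estimates the sampling distribution.
    low, high = np.percentile(boot_means, [2.5, 97.5])
    print(f"Sample mean: {sample.mean():.2f}")
    print(f"95% bootstrap CI for the mean: [{low:.2f}, {high:.2f}]")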

Objective 3.2.2 – Explain when and how to use linear and logistic regression, including appropriateness and limitations.

  1. Comprehend the theory, assumptions, and mathematical foundation of linear regression.
  2. Explain the concepts, use cases, and statistical underpinnings of logistic regression.
  3. Develop the ability to choose between linear and logistic regression based on the nature of the data and the research question.
  4. Apply the concepts of discrete and continuous data in choosing and implementing linear and logistic regression models.
  5. Demonstrate the application of linear and logistic regression models on datasets using Python, including parameter estimation and model fitting.
  6. Accurately interpret the outcomes of regression analyses, including coefficients and model fit statistics.
  7. Identify limitations, assumptions, and potential biases in linear and logistic regression models and their impact on results.
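
The sketch below fits both model types on synthetic data. The syllabus does not mandate a particular library; scikit-learn is used here only as one common choice, and the study-hours scenario is invented.

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    rng = np.random.default_rng(1)
    hours = rng.uniform(0, 10, 100).reshape(-1, 1)

    # Linear regression: continuous outcome (an exam score).
    score = 50 + 4 * hours.ravel() + rng.normal(0, 5, 100)
    lin = LinearRegression().fit(hours, score)
    print("slope:", lin.coef_[0], "intercept:", lin.intercept_)

    # Logistic regression: binary outcome (pass / fail).
    passed = (score > 75).astype(int)
    clf = LogisticRegression().fit(hours, passed)
    print("P(pass | 6 hours):", clf.predict_proba([[6.0]])[0, 1])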

Block 4: Data Analysis and Modeling

9 objectives & 19 sub-objectives covered by the block → 9 exam items

Data Analysis with Pandas and NumPy

Objective 4.1.1 – Organize and clean data using Pandas.

  1. Use Pandas to filter, sort, and manage missing or inconsistent values in tabular datasets.
  2. Prepare raw data for analysis by applying foundational data cleaning techniques.

Objective 4.1.2 – Merge and reshape datasets using Pandas.

  1. Apply advanced data manipulation techniques such as merging, joining, pivoting, and reshaping data frames.
  2. Structure datasets appropriately to support specific analysis workflows.
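
A brief pandas sketch of merging, pivoting, and melting follows; the sales/regions tables are invented for illustration.

    import pandas as pd

    sales = pd.DataFrame({
        "store": ["A", "A", "B", "B"],
        "quarter": ["Q1", "Q2", "Q1", "Q2"],
        "revenue": [100, 120, 90, 110],
    })
    regions = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})

    # Merge (join) the region lookup onto the sales records.
    merged = sales.merge(regions, on="store", how="left")

    # Pivot long data into a wide store-by-quarter table ...
    wide = merged.pivot(index="store", columns="quarter", values="revenue")
    print(wide)

    # ... and melt it back into long (tidy) format when an analysis expects one row per observation.
    tidy = wide.reset_index().melt(id_vars="store", var_name="quarter", value_name="revenue")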

Objective 4.1.3 – Understand the relationship between Series and DataFrames.

  1. Explain the conceptual differences and connections between Pandas Series and DataFrames.
  2. Use indexing techniques and vectorized functions to navigate and transform data.

Objective 4.1.4 – Access and manipulate data using locators and slicing.

  1. Retrieve and modify data accurately using .loc, .iloc, slicing, and conditional selection.
  2. Apply indexing strategies to ensure efficient and accurate data access.

Objective 4.1.5 – Perform array operations and distinguish between core data structures.

  1. Use NumPy to execute array-based operations including arithmetic, broadcasting, and aggregations.
  2. Differentiate between arrays, lists, Series, DataFrames, and NDArrays, and evaluate their use cases and performance.
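
A short NumPy sketch of element-wise arithmetic, broadcasting, and axis-based aggregation follows, with a one-line contrast against a plain Python list; the price and tax figures are invented.

    import numpy as np

    prices = np.array([[10.0, 12.5, 9.0],
                       [11.0, 13.0, 8.5]])        # 2 stores x 3 products
    tax_rate = np.array([0.05, 0.10, 0.08])       # one rate per product

    # Broadcasting: the 1-D rate vector is applied across every row.
    gross = prices * (1 + tax_rate)

    # Aggregations along an axis: per-store totals and per-product averages.
    print(gross.sum(axis=1))    # total per store
    print(gross.mean(axis=0))   # average per product

    # Unlike arrays, arithmetic on Python lists is not element-wise.
    print([1, 2, 3] * 2)            # list repetition: [1, 2, 3, 1, 2, 3]
    print(np.array([1, 2, 3]) * 2)  # element-wise:    [2 4 6]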

Objective 4.1.6 – Group, summarize, and extract insights from data.

  1. Group data using groupby() and create summary tables using pivot and cross-tabulation techniques.
  2. Calculate descriptive statistics using Pandas and NumPy to identify trends, detect anomalies, and support decision-making.
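
A minimal sketch of grouped summaries, a pivot table, and a cross-tabulation in pandas is shown below; the region/channel dataset is invented.

    import pandas as pd

    df = pd.DataFrame({
        "region": ["North", "North", "South", "South", "South"],
        "channel": ["web", "store", "web", "web", "store"],
        "revenue": [120, 80, 200, 150, 90],
    })

    # Grouped summary statistics per region.
    print(df.groupby("region")["revenue"].agg(["count", "mean", "sum"]))

    # Pivot table: mean revenue by region and channel.
    print(df.pivot_table(index="region", columns="channel",
                         values="revenue", aggfunc="mean"))

    # Cross-tabulation: how often each region/channel combination occurs.
    print(pd.crosstab(df["region"], df["channel"]))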

Statistical Methods and Machine Learning

Objective 4.2.1 – Apply Python's descriptive statistics for dataset analysis.

  1. Calculate and interpret key statistical measures such as mean, median, mode, variance, and standard deviation using Python.
  2. Utilize Python libraries (like Pandas and NumPy) to generate and analyze descriptive statistics for real-world datasets.

Objective 4.2.2 – Recognize the importance of test datasets in model evaluation.

  1. Understand the role of test datasets in validating the performance of machine learning models.
  2. Demonstrate knowledge of proper test dataset selection and usage to ensure unbiased and accurate model evaluation.

Objective 4.2.3 – Analyze and evaluate supervised learning algorithms and model accuracy.

  1. Analyze various supervised learning algorithms to understand their specific characteristics and applications.
  2. Evaluate the concepts of overfitting and underfitting within these models, including a detailed explanation of the bias-variance tradeoff.
  3. Assess the intrinsic tendencies of linear and logistic regression in relation to this tradeoff, and apply this understanding to prevent model accuracy issues.

Block 5: Data Communication and Visualization

5 objectives & 25 sub-objectives covered by the block → 5 exam items

Data Visualization Techniques

Objective 5.1.1 – Demonstrate essential proficiency in data visualization with Matplotlib and Seaborn.

  1. Utilize Matplotlib and Seaborn to create various types of plots, including Boxplots, Histograms, Scatterplots, Lineplots, and Correlation heatmaps.
  2. Interpret the data and findings represented in these visualizations to gain deeper insights and communicate results effectively.
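
The sketch below produces three of the plot types listed above from a synthetic dataset; the variable names and figure layout are illustrative choices.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    rng = np.random.default_rng(3)
    df = pd.DataFrame({
        "hours": rng.uniform(0, 10, 80),
        "group": rng.choice(["A", "B"], 80),
    })
    df["score"] = 50 + 4 * df["hours"] + rng.normal(0, 6, 80)

    fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
    sns.histplot(data=df, x="score", ax=axes[0])                # distribution
    sns.boxplot(data=df, x="group", y="score", ax=axes[1])      # group comparison
    sns.heatmap(df[["hours", "score"]].corr(), annot=True,
                cmap="coolwarm", ax=axes[2])                    # correlation heatmap
    plt.tight_layout()
    plt.show()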

Objective 5.1.2 – Assess the pros and cons of different data representations.

  1. Evaluate the suitability of various chart types for different types of data and analysis objectives.
  2. Critically analyze the effectiveness of chosen visualizations in conveying the intended message or insight.

Objective 5.1.3 – Label, annotate, and refine data visualizations for clarity and insight.

  1. Incorporate labels, titles, and annotations in visualizations to clarify and emphasize key insights.
  2. Utilize visual exploration to generate hypotheses and test insights from datasets.
  3. Practice making data-driven decisions based on the interpretation of visualized data.
  4. Customize plot colors to improve the readability of a scatterplot.
  5. Label axes and add titles to improve data readability.
  6. Manipulate legend properties, such as position, font size, and background color, to improve the aesthetics and readability of a plot.
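
One possible Matplotlib sketch covering labels, titles, annotation, and legend customization follows; the study-time data points and styling choices are invented.

    import matplotlib.pyplot as plt

    hours = [1, 2, 3, 4, 5, 6, 7, 8]
    score = [52, 55, 61, 64, 70, 74, 79, 83]

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(hours, score, color="seagreen", label="Observed students")

    # Labels and a title make the chart self-explanatory.
    ax.set_xlabel("Hours studied")
    ax.set_ylabel("Exam score")
    ax.set_title("Study time vs. exam performance")

    # An annotation draws the eye to a key insight.
    ax.annotate("Best result", xy=(8, 83), xytext=(5.5, 82),
                arrowprops=dict(arrowstyle="->"))

    # Legend position, font size, and background are all adjustable.
    ax.legend(loc="lower right", fontsize=9, facecolor="whitesmoke")
    plt.tight_layout()
    plt.show()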

Effective Communication of Data Insights

Objective 5.2.1 – Tailor communication to different audience needs, and combine visualizations and text for clear data presentation.

  1. Analyze the audience to understand their background, interests, and knowledge level.
  2. Adapt communication style and content to meet the specific needs and expectations of diverse audiences.
  3. Create presentations and reports that effectively convey data insights to both technical and non-technical stakeholders.
  4. Integrate visualizations seamlessly into presentations and reports, aligning them with the narrative.
  5. Use concise and informative text to complement visualizations, providing context and key takeaways.
  6. Ensure visual and textual elements work harmoniously to enhance data clarity and understanding.
  7. Avoid slide clutter and optimize slide content to maintain focus on key messages.
  8. Craft a compelling data narrative that tells a story with data, highlighting insights and actionable takeaways.
  9. Select an appropriate and consistent color palette for visualizations, ensuring clarity and accessibility.

Objective 5.2.2 – Summarize key findings and support claims with evidence and reasoning.

  1. Understand the process of identifying and extracting key findings from data analysis.
  2. Apply techniques to condense complex information into concise and meaningful summaries.
  3. Prioritize and emphasize the most relevant insights based on context.
  4. Explain the importance of backing assertions and conclusions with data-driven evidence and reasoning.
  5. Articulate the basis for claims and recommendations, demonstrating transparency in decision-making.
  6. Demonstrate proficiency in clearly presenting evidence to support claims and recommendations.


MQC Profile

A Minimally Qualified Candidate (MQC) for the PCAD™ – Certified Associate Data Analyst with Python certification is expected to demonstrate the essential skills and knowledge required to support junior-level data analysis tasks in professional settings.

The candidate should understand how to acquire, clean, prepare, analyze, model, and communicate data using Python, SQL, and widely used data tools and libraries. They must be able to connect to data sources, including databases, spreadsheets, APIs, and HTML web pages, and collect relevant data using tools and libraries such as requests and BeautifulSoup.

The MQC is proficient in writing and organizing Python scripts that utilize variables, functions, control flow, and data structures like lists, dictionaries, and sets. They apply best practices in Python programming, including documentation, error handling, and modular design, and are able to manage packages using pip.

They can use libraries such as pandas, numpy, and statistics to clean, reshape, and analyze structured datasets, as well as calculate descriptive statistics, correlations, and basic aggregations. They should also be able to execute SQL queries to retrieve and manipulate data and use sqlite3 to connect Python scripts to databases. They understand how to use parameterized queries to ensure data integrity and avoid SQL injection.

The candidate can perform basic statistical modeling, including linear and logistic regression, and apply inferential techniques such as bootstrapping. They should recognize the importance of model validation, test data splitting, and the risks of overfitting.

Finally, the MQC is able to create clear, insightful, and audience-appropriate visualizations using Matplotlib and Seaborn, structure effective data stories, and present findings in both written and verbal formats, using design and communication best practices.

This profile represents a blend of technical proficiency, analytical thinking, and communication skills crucial for navigating the complexities of data-driven environments.


Block 1: Data Acquisition and Pre-Processing

Weight: 29.2% of total exam (14 items)

The MQC understands what data is, how it is structured, and how it is transformed into usable information for analysis. They can describe different data types – including structured, semi-structured, and unstructured – and explain how these formats influence storage, processing, and analytical techniques. The MQC is able to explain and compare various data collection methods such as surveys, interviews, APIs, and web scraping using tools like BeautifulSoup, and understands how these methods are used in research, business, and analytics contexts.

They can identify appropriate storage options for different data types, including CSV, JSON, Excel, databases, data lakes, and warehouses, and explain the role of cloud-based storage in modern data ecosystems. They recognize how poor data collection or storage practices can lead to quality issues and errors later in the analytics process.

The MQC is capable of integrating data from multiple sources and resolving inconsistencies during aggregation. They understand the implications of format alignment, type mismatches, and schema discrepancies. They can apply basic data cleaning techniques, including identifying and correcting missing, duplicate, or invalid values. They understand the importance of encoding categorical data, scaling numerical values, and formatting date-time values for consistency.

They can apply basic validation methods (such as type checks, range checks, and cross-reference logic) to ensure data quality and integrity. They are also able to prepare data for analysis by sorting, filtering, reshaping (wide vs. long), and splitting data into training and testing sets, particularly in preparation for modeling tasks. The MQC also understands ethical and legal responsibilities when working with personal data, including anonymization, consent, and compliance with frameworks such as GDPR and HIPAA.


Block 2: Programming and Database Skills

Weight: 33.3% of total exam (16 items)

The MQC is proficient in Python and uses the language to support data processing tasks. They can define and manipulate variables, use core data types (integers, floats, strings, booleans), and work with fundamental data structures such as lists, dictionaries, tuples, and sets. They write clear and reusable code using functions with parameters and return values, and they apply control flow constructs such as conditional statements and loops to process and analyze data efficiently.

They demonstrate awareness of clean coding practices, including proper indentation, naming conventions, and documentation using PEP 8 and PEP 257 standards. They are familiar with Python’s standard library modules – such as csv, os, math, statistics, datetime, and collections – and know how to install and manage third-party packages using pip.

The MQC is comfortable using object-oriented programming (OOP) concepts to structure and encapsulate data. They can define classes, create objects, and organize attributes and methods in a way that supports reusability and clarity in data workflows.

In addition to Python, the MQC is expected to use SQL to query and manipulate structured data. They can retrieve and filter data using SELECT, WHERE, and various types of JOIN clauses, and they are capable of aggregating and grouping data using GROUP BY, HAVING, and ORDER BY. They understand how to perform CRUD operations and can write SQL statements to insert, update, and delete data.

They can connect Python scripts to relational databases using libraries such as sqlite3, and they understand how to execute parameterized queries to protect against SQL injection and ensure data integrity. The MQC also understands how to convert data types appropriately between SQL and Python when extracting or inserting data.


Block 3: Statistical Analysis

Weight: 8.3% of total exam (4 items)

The MQC has a solid grasp of foundational statistical concepts and can apply descriptive statistics to summarize datasets. They understand and can calculate measures of central tendency (mean, median, mode) and dispersion (standard deviation, variance), and they can describe basic distribution types, such as normal and uniform distributions.

They can assess relationships between variables using correlation analysis, particularly Pearson’s R, and can identify outliers both visually and statistically. The MQC understands that visualizations like histograms, boxplots, and scatterplots support exploratory understanding of data distributions and trends.

In terms of inferential statistics, the MQC is familiar with bootstrapping as a method for estimating sampling distributions, especially when theoretical distributions are unknown. They understand the difference between discrete and continuous data and recognize when bootstrapping is appropriate for evaluating reliability.

They can explain and apply linear and logistic regression, understand the assumptions behind these models, and interpret their outputs, including coefficients and model fit statistics. The MQC is also aware of common limitations, such as overfitting, and can discuss the importance of model validation.


Block 4: Data Analysis and Modeling

Weight: 18.8% of total exam (9 items)

The MQC is proficient in using Pandas and NumPy for data analysis. They can clean and organize data using functions like dropna(), fillna(), sort_values(), and replace(), and they can reshape and restructure data using methods such as pivot(), melt(), groupby(), and merge(). They understand the difference between DataFrame and Series objects and can use .loc and .iloc to access and modify data precisely.

Using NumPy, the MQC performs numerical operations, including element-wise calculations, aggregations, and array broadcasting. They understand the performance advantages of NumPy arrays over native Python lists for large datasets.

They can compute descriptive summaries and perform basic feature engineering tasks such as bucketing, scaling, and encoding in preparation for modeling. The MQC is capable of analyzing datasets with grouped summaries and conditional filters, using combinations of Pandas and NumPy functionality to extract insights.

They understand the structure and purpose of supervised learning workflows and can apply train/test splits to evaluate models. They are able to fit and interpret basic linear and logistic regression models and understand key modeling concepts such as accuracy, overfitting, underfitting, and the bias-variance tradeoff.


Block 5: Data Communication and Visualization

Weight: 10.4% of total exam (5 items)

The MQC is able to interpret and create effective visual representations of data using Matplotlib and Seaborn. They understand how to choose appropriate chart types based on data type and purpose, and can generate bar charts, histograms, scatterplots, boxplots, line charts, and correlation heatmaps to highlight trends and patterns.

They know how to enhance chart readability by adding labels, titles, legends, gridlines, and appropriate color schemes. They can annotate graphs to draw attention to key insights and customize chart aesthetics to improve clarity.

The MQC understands how to communicate findings to different audiences by combining visuals with concise and informative summaries. They can adapt their messaging to suit both technical stakeholders (e.g., data teams) and non-technical audiences (e.g., managers, clients).

They are able to structure a data narrative that connects analysis to business or research questions, ensuring that conclusions are supported by data and that recommendations are evidence-based. They demonstrate awareness of design principles for slide decks and written reports, and avoid clutter, ambiguity, and misleading visuals.

Passing Requirement

To pass the PCAD exam, a candidate must achieve a cumulative average score of at least 75% across all exam blocks.