PCED™ – Certified Entry-Level Data Analyst with Python: EXAM SYLLABUS



PCED-30-01 Exam Syllabus


Status: ACTIVE


The exam consists of five sections:

Section 1 → 8 items, Weight: 20%
Section 2 → 10 items, Weight: 25%
Section 3 → 8 items, Weight: 20%
Section 4 → 8 items, Weight: 20%
Section 5 → 6 items, Weight: 15%





Section 1: Objectives covered by the block (8 exam items)

Data Collection

  • Objective 1a.1 – Assess candidates’ understanding of data collection methods, appropriate usage, and limitations
  • Objective 1a.2 – Test candidates’ ability to select and implement appropriate data collection techniques based on project requirements
  • Objective 1a.3 – Evaluate candidates’ knowledge of sourcing data from various databases and APIs
  • Objective 1a.4 – Evaluate candidates’ knowledge of using pandas for basic data loading, manipulation, and summarization
  • Objective 1a.5 – Test candidates’ proficiency in reshaping DataFrames to meet analysis goals
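
As a study aid for Objectives 1a.4 and 1a.5, the following is a minimal sketch of loading, summarizing, and reshaping data with pandas. The CSV content, column names, and variable names are hypothetical and are not part of the syllabus; an in-memory string stands in for a real file, database, or API response.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file or API response.
raw_csv = io.StringIO(
    "store,jan,feb\n"
    "north,100,120\n"
    "south,80,95\n"
)

# Load the data into a DataFrame (pd.read_csv also accepts a file path).
sales = pd.read_csv(raw_csv)

# Summarize the numeric columns (count, mean, std, min, quartiles, max).
summary = sales[["jan", "feb"]].describe()

# Reshape from wide to long: one row per (store, month) pair.
long_sales = sales.melt(id_vars="store", var_name="month", value_name="units")

# Reshape from long back to wide to confirm the round trip.
wide_sales = long_sales.pivot(index="store", columns="month", values="units")

print(summary)
print(long_sales)
print(wide_sales)
```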

Data Cleaning

  • Objective 1b.1 – Assess candidates’ ability to identify what constitutes bad data
  • Objective 1b.2 – Evaluate candidates’ ability to identify and rectify common data quality issues such as missing data, inconsistent data, or anomalies
  • Objective 1b.3 – Test candidates’ proficiency using programming languages (like Python) to clean and pre-process data
  • Objective 1b.4 – Assess the candidates’ ability to implement strategies for handling outliers and missing data
  • Objective 1b.5 – Evaluate candidates’ understanding and application of data cleaning using Python libraries
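
One possible way to practice Objectives 1b.2 and 1b.4 is sketched below: missing values are imputed with the median, and outliers are flagged with the 1.5 x IQR rule. Both choices are assumptions made for illustration; the syllabus does not prescribe a specific strategy, and the sensor readings are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with one missing value and one obvious anomaly.
readings = pd.DataFrame({
    "sensor": ["a", "b", "c", "d", "e", "f"],
    "temp_c": [21.5, 22.5, np.nan, 21.8, 250.0, 22.1],
})

# Identify missing data: count missing values per column.
missing_counts = readings.isna().sum()

# Handle missing data: impute with the median, which is robust to the anomaly.
median_temp = readings["temp_c"].median()
cleaned = readings.assign(temp_c=readings["temp_c"].fillna(median_temp))

# Handle outliers: flag values outside 1.5 * IQR of the quartiles.
q1 = cleaned["temp_c"].quantile(0.25)
q3 = cleaned["temp_c"].quantile(0.75)
iqr = q3 - q1
is_outlier = (cleaned["temp_c"] < q1 - 1.5 * iqr) | (cleaned["temp_c"] > q3 + 1.5 * iqr)
without_outliers = cleaned[~is_outlier]

print(missing_counts)
print(without_outliers)
```

Whether to drop, cap, or keep a flagged value depends on the project; dropping is shown here only because it is the shortest option.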

Data Validation (text, numeric, audio, video, & social data)

  • Objective 1c.1 – Evaluate candidates’ understanding of data validation techniques, including data type validation, range validation, and cross-reference validation
  • Objective 1c.2 – Test candidates’ ability to validate data from various formats (text, numeric, audio, video, & social data)
  • Objective 1c.3 – Evaluate candidates’ ability to ensure the reliability and accuracy of the collected data
  • Objective 1c.4 – Assess candidates’ ability to implement robust data validation procedures to ensure the reliability and accuracy of the collected data
  • Objective 1c.5 – Test candidates’ knowledge of consolidating and validating data from multiple datasets and formats
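
The three validation techniques named in Objective 1c.1 (data type, range, and cross-reference validation) can be sketched with pandas as follows. The tables, column names, and the rule that quantities must be positive are hypothetical assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical order records and a reference table of known customers.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 11, 99, 12],
    "quantity": ["2", "5", "-1", "three"],
})
customers = pd.DataFrame({"customer_id": [10, 11, 12]})

# Data type validation: values that cannot be parsed as numbers become NaN.
quantity = pd.to_numeric(orders["quantity"], errors="coerce")
bad_type = quantity.isna()

# Range validation: assume quantities must be strictly positive.
bad_range = quantity <= 0

# Cross-reference validation: every order must point to a known customer.
bad_reference = ~orders["customer_id"].isin(customers["customer_id"])

# Keep a per-row report and the subset of rows that pass every check.
report = orders.assign(
    bad_type=bad_type, bad_range=bad_range, bad_reference=bad_reference
)
valid_orders = report[~(bad_type | bad_range | bad_reference)]

print(report)
print(valid_orders["order_id"].tolist())
```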


Section 2: Objectives covered by the block (10 exam items)

Python

  • Objective 2a.1 – Assess candidates’ understanding of the Python Data Science Ecosystem
  • Objective 2a.2 – Evaluate candidates’ understanding of Python syntax and data structures.
  • Objective 2a.3 – Test candidates’ ability to use Python libraries such as pandas for data loading, manipulation, and summarization
  • Objective 2a.4 – Assess candidates’ familiarity with key Python libraries for data analysis, such as pandas, NumPy, and scikit-learn.

Data Scripting

  • Objective 2b.1 – Test candidates’ proficiency in writing scripts for automated data manipulation tasks.
  • Objective 2b.2 – Assess candidates’ understanding of best practices in scripting for maintainability and efficiency.
  • Objective 2b.3 – Evaluate the ability of candidates to debug and optimize scripts.
  • Objective 2b.4 – Test candidates’ proficiency in preparing a dataset for modeling.

Extract-Transform-Load

  • Objective 2c.1 – Assess candidates’ understanding of ETL concepts and the roles of each stage (Extract, Transform, Load).
  • Objective 2c.2 – Evaluate candidates’ ability to design and implement ETL processes, including extracting data from various sources, transforming it to fit operational needs, and loading it into the end target.
  • Objective 2c.3 – Evaluate candidates’ proficiency in reshaping DataFrames to meet analysis goals.
  • Objective 2c.4 – Test candidates’ ability to troubleshoot and resolve common issues encountered during ETL processes.
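
The three ETL stages described in Objectives 2c.1 and 2c.2 can be illustrated with a minimal sketch that uses only the Python standard library. The source data, the `payments` table, and the `transform` helper are hypothetical; a real pipeline would read from files, databases, or APIs and would likely need more error handling.

```python
import csv
import io
import sqlite3

# Extract: read raw rows from a source (an in-memory CSV stands in for a file).
source = io.StringIO("name,amount\n alice ,10.5\nBOB,7\ncarol,\n")
rows = list(csv.DictReader(source))


def transform(row):
    """Normalize the name and parse the amount; return None for unusable rows."""
    name = row["name"].strip().title()
    amount = row["amount"].strip()
    if not amount:
        return None  # skip rows with a missing amount
    return (name, float(amount))


# Transform: clean each row and drop the ones that could not be repaired.
clean_rows = [r for r in map(transform, rows) if r is not None]

# Load: write the cleaned rows into the target (an in-memory SQLite database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (name TEXT, amount REAL)")
conn.executemany("INSERT INTO payments VALUES (?, ?)", clean_rows)
conn.commit()
loaded = conn.execute("SELECT name, amount FROM payments ORDER BY name").fetchall()
conn.close()

print(loaded)
```

Keeping each stage separate makes it easier to troubleshoot: a failure can be traced to the extraction, the transformation rule, or the load step.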


Section 3: Objectives covered by the block (8 exam items)

Statistics

  • Objective 3a.1 – Evaluate candidates’ understanding of basic statistical concepts, such as mean, median, mode, standard deviation, variance, correlation, and statistical significance.
  • Objective 3a.2 – Evaluate candidates’ understanding and application of basic statistical operations using pandas, such as data aggregation to calculate simple statistics.
  • Objective 3a.3 – Test candidates’ ability to select and correctly apply appropriate statistical tests (t-tests, chi-square tests, ANOVA, etc.) given different types of data and research questions.
  • Objective 3a.4 – Assess candidates’ proficiency in implementing and interpreting supervised learning models, a fundamental statistical technique for predictive analysis.
  • Objective 3a.5 – Test candidates’ ability to build and evaluate linear regression models, one of the most commonly used statistical methods for predictive analysis.
  • Objective 3a.6 – Evaluate candidates’ capability to build and evaluate logistic regression models, a fundamental statistical method for classification tasks.
  • Objective 3a.7 – Assess candidates’ understanding of the underlying statistical concepts and assumptions involved in preparing a dataset for modeling.
  • Objective 3a.8 – Test candidates’ ability to interpret the results of statistical analyses, including understanding p-values, confidence intervals, and effect sizes and making valid inferences from these results.
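
The basic statistics listed in the first objective of this block (mean, median, mode, variance, standard deviation, and correlation) can be computed with the standard library alone, as in the sketch below. The data and the `pearson_r` helper are hypothetical; the helper is written out by hand only to show the formula, and pandas or NumPy offer equivalent functions.

```python
import math
import statistics

# Hypothetical study hours and quiz scores for six students.
hours = [1, 2, 3, 4, 5, 6]
scores = [2, 4, 4, 5, 7, 8]

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # sample standard deviation


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


r = pearson_r(hours, scores)
print(mean, median, mode, variance, std_dev, r)
```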


Section 4: Objectives covered by the block (8 exam items)

Analyzing Data

  • Objective 4a.1 – Assess candidates’ ability to use pandas for basic data loading, manipulation, and summarization, essential elements in data analysis.
  • Objective 4a.2 – Test candidates’ ability to perform data reshaping tasks, such as subsetting and sorting data frames, to meet analysis goals.
  • Objective 4a.3 – Evaluate candidates’ ability to apply statistical operations such as aggregation for data analysis.
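
The subsetting, sorting, and aggregation tasks named in Objectives 4a.2 and 4a.3 might look like the following in pandas. The employee table and the salary threshold are hypothetical values chosen for the example.

```python
import pandas as pd

# Hypothetical employee table.
employees = pd.DataFrame({
    "dept": ["ops", "ops", "sales", "sales", "sales"],
    "name": ["Ana", "Ben", "Cy", "Di", "Ed"],
    "salary": [50, 70, 40, 60, 80],
})

# Subset: keep rows meeting a condition and only the columns of interest.
well_paid = employees.loc[employees["salary"] >= 60, ["name", "salary"]]

# Sort: order the subset from highest to lowest salary.
ranked = well_paid.sort_values("salary", ascending=False)

# Aggregate: compute several statistics per department.
by_dept = employees.groupby("dept")["salary"].agg(["mean", "max", "count"])

print(ranked)
print(by_dept)
```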

Building Data Models

  • Objective 4b.1 – Assess candidates’ understanding and application of supervised learning models, fundamental techniques in building data models.
  • Objective 4b.2 – Test candidates’ skills in preparing a dataset for modeling.
  • Objective 4b.3 – Evaluate the candidate’s ability to understand and diagnose issues related to overfitting, underfitting, and model complexity in linear and logistic regression models.
  • Objective 4b.4 – Test the candidate’s handling of categorical and continuous features in the model-building process, including one-hot encoding, feature scaling, and normalization techniques.
  • Objective 4b.5 – Assess the candidate’s capability to use cross-validation methods to evaluate the model’s performance.
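
Objectives 4b.4 and 4b.5 mention one-hot encoding, feature scaling, and cross-validation. The sketch below combines them in a scikit-learn pipeline, assuming scikit-learn is installed. The dataset is synthetic and deterministic, the column names are hypothetical, and logistic regression with 5-fold cross-validation is just one reasonable configuration, not a recommended one.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic dataset: one categorical and one continuous feature.
n = 60
features = pd.DataFrame({
    "plan": np.tile(["basic", "plus", "pro"], n // 3),
    "monthly_usage": np.linspace(10, 90, n),
})
# Hypothetical binary target: heavy users are labeled 1.
target = (features["monthly_usage"] > 50).astype(int)

# One-hot encode the categorical column and standardize the continuous one.
preprocess = ColumnTransformer([
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["plan"]),
    ("continuous", StandardScaler(), ["monthly_usage"]),
])

# Putting preprocessing inside the pipeline means it is refit on each
# training fold, so no information leaks from the held-out fold.
model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression()),
])

# 5-fold cross-validation; each entry is the accuracy on one held-out fold.
scores = cross_val_score(model, features, target, cv=5)
print(scores, scores.mean())
```

A large gap between training accuracy and the cross-validated scores is one common sign of overfitting; uniformly low scores on both suggest underfitting.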


Section 5: Objectives covered by the block (6 exam items)

Building Data Reports

  • Objective 5a.1 – Assess the candidate’s ability to synthesize complex datasets into digestible information.
  • Objective 5a.2 – Evaluate the candidate’s understanding and application of appropriate data aggregation techniques.
  • Objective 5a.3 – Test the candidate’s proficiency in organizing and presenting data in a structured manner.

Building Dashboards

  • Objective 5b.1 – Examine the candidate’s capability to design and build interactive dashboards.
  • Objective 5b.2 – Assess the candidate’s ability to select relevant metrics and insights for dashboard display.
  • Objective 5b.3 – Evaluate the candidate’s ability to use dashboard-building tools.

Visualizing Data

  • Objective 5c.1 – Test the candidate’s skills using visualization libraries such as Matplotlib.
  • Objective 5c.2 – Evaluate the candidate’s ability to choose and create appropriate visualizations, including basic figures, standard graphs, and formatted/stylized figures.
  • Objective 5c.3 – Assess the candidate’s proficiency in visualizing high-dimensional data using techniques such as PCA or t-SNE.
  • Objective 5c.4 – Evaluate the candidate’s ability to effectively communicate insights gained from visualizations to both technical and non-technical audiences.
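
For Objectives 5c.1 and 5c.2, a basic formatted figure in Matplotlib could be sketched as follows. The revenue figures and labels are hypothetical, and the non-interactive Agg backend is selected only so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no display is needed
import matplotlib.pyplot as plt

# Hypothetical monthly revenue, in thousands.
months = ["Jan", "Feb", "Mar", "Apr"]
revenue = [12, 15, 11, 18]

fig, ax = plt.subplots(figsize=(6, 4))

# A standard line graph with markers, a label, and an explicit color.
ax.plot(months, revenue, marker="o", color="tab:blue", label="Revenue")

# Titles, axis labels, a light grid, and a legend make the figure
# readable for non-technical audiences.
ax.set_title("Monthly revenue")
ax.set_xlabel("Month")
ax.set_ylabel("Revenue (thousands)")
ax.grid(True, linestyle="--", alpha=0.5)
ax.legend()
fig.tight_layout()

# fig.savefig("revenue.png") would write the figure to a file.
```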

Last updated: September 20, 2023
Aligned with Exam PCED-30-01