The Big Data Developer Program

This non-credit career training program is designed to introduce the technically inclined student to the technologies and methodologies requested by hiring companies and used by real-world data engineers. The program is fast-paced and covers a breadth of technologies, including Python programming, Hadoop, and cloud-based services on Amazon Web Services (AWS). Students will also be introduced to methods such as data wrangling, munging, ingestion, and modeling for analytics.

By the end of the program, the successful student will be prepared for an entry-level position as a data engineer, Python programmer, or business intelligence developer.

Career Outlook

According to an article by DuBois, "DICE's recent 2020 Tech Job Report named Data Engineer the fastest-growing job role of 2019, growing by 50% in that year. The report also found it takes an average of 46 days to fill data engineering roles and predicted that the time to hire Data Engineers may increase in 2020 'as more companies compete to find the talent they need to handle their sprawling data infrastructure.'

DICE also noted that Amazon, Accenture, and Capital One (all companies with deep pockets to pay high salaries) are hiring Data Engineers at high rates." (DuBois)

Program Prerequisites:

26-Week Course Outline

  • A brief history of Python
  • Python package management and repositories
  • Installing the Anaconda distribution of Python
  • Installing PyCharm
  • Language structure and syntax
  • Data Structures in Python
  • Variables
    • Strings
    • Lists
    • Sets
    • Tuples
    • Dictionaries
    • Other iterables
  • Conditional logic and loops
    • For loops
    • If/Else and If/Elif/Else statements
  • Comprehensions
  • Error handling in Python
  • Explain what the cloud is
  • The major vendors in the cloud space
  • Regions, Availability Zones (AZ) and VPCs
  • How to create an AWS account
  • How to monitor AWS account billing
  • Understand various aspects of AWS technology
  • IAM and security using roles and key pairs
  • How AWS infrastructure is organized
  • How to launch an EC2 instance
  • Set up databases in RDS and in Lightsail
  • Store data and other file objects in an S3 bucket
  • How to use AWS Glue and Athena
  • How to use SageMaker
  • How to spin up an EMR cluster
    • From console
    • From Boto3 using Python
  • A database vs. a data warehouse vs. a data lake
  • Data storage concepts
    • Row-oriented
    • Columnar
  • File formats
    • JSON
    • CSV
    • Parquet
    • Avro
    • ORC
    • Hive
  • AWS S3 Buckets and File Compression
    • GZIP (.gz or .gzip)
    • SNAPPY (.snappy)
    • ZLIB (.zlib)
    • LZO (.lzo)
    • BZIP2 (.bzip or .bz2)
  • Preparing an S3 Bucket for Data Lake Storage
  • AWS Lake Formation
  • Amazon Redshift
  • Why Model Data?
  • Types of Data Models
  • Entity Relationships
  • An introduction to the relational data model
  • Normalization and normal forms
    • 1NF through 3NF
  • OLTP vs OLAP
    • The star schema for data warehousing
  • Notebooks
    • Popular notebooks
  • Jupyter
    • Classic vs Lab
    • Data Access
    • Local vs. Server
    • Clusters
    • JupyterHub
  • Jupyter Notebooks with AWS SageMaker
  • Jupyter Notebooks with AWS EMR (Elastic MapReduce)
  • Data Visualization
    • Tableau
    • Amazon QuickSight
  • Capstone: students will use the technologies and methods they learned throughout the program to build a complete data and analytics platform, and will present their platforms to the class with a slide deck presentation and a live demo.
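As a small taste of the Python fundamentals listed in the outline above (data structures, comprehensions, and error handling), here is a minimal, illustrative sketch. The function name and the sample data are hypothetical teaching examples, not actual course materials.

```python
# Illustrative example only: cleaning a list of raw string readings
# using a function, error handling (try/except), and a comprehension.
raw_readings = ["12.5", "7.0", "n/a", "3.2", ""]  # hypothetical raw data

def parse_reading(value):
    """Convert a raw string to a float, returning None for bad input."""
    try:
        return float(value)
    except ValueError:
        return None

# A comprehension that keeps only the values that parsed cleanly.
readings = [r for r in (parse_reading(v) for v in raw_readings) if r is not None]
print(readings)  # [12.5, 7.0, 3.2]
```

Exercises of this shape (ingesting messy input, handling errors gracefully, and filtering with comprehensions) mirror the data-wrangling work a data engineer does daily.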

Part-time. Flexible. Affordable.

Online Class Format

Students meet online once a week. In addition to the weekly class meeting, students meet once a week for a remote, 30-minute one-on-one with a mentor.

Students start each academic week on Sunday and are required to watch all videos and begin working on assignments and projects before their weekly online meeting so that they arrive prepared.

For more information, please contact: