NaLette Brodnax

Python & Statistics Bootcamp - Summer 2018


Probability & Visualization

Topics

  • working with series and data frames
  • probability axioms, random variables
  • distribution functions
  • visualizing data and relationships

Materials


Datasets


Activity Instructions

  1. Download the dataset (above) that you would like to use for your project
  2. Confirm that the dataset is saved in the same directory as your notebook (in the bootcamp folder)
  3. Review the pandas documentation for .isnull(), .notnull(), and .fillna()
  4. Create a data frame object from your dataset
  5. Create a subset with three quantitative variables
  6. Answer the following in your notebook: Are your variables missing any observations? If so, how many?
  7. Compute the following descriptive statistics for each variable: mean, variance, and standard deviation
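
Steps 4 through 7 might look something like the sketch below. The small data frame and its column names (height, weight, age, name) are made up for illustration; in your notebook you would instead load your own file, for example with pd.read_csv(). Filling missing values with the column mean is only one possible use of .fillna(), not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for your own dataset, e.g. pd.read_csv("your_file.csv")
df = pd.DataFrame({
    "height": [150.0, 160.0, np.nan, 170.0, 180.0],
    "weight": [50.0, 60.0, 70.0, np.nan, np.nan],
    "age": [20.0, 30.0, 40.0, 50.0, 60.0],
    "name": ["a", "b", "c", "d", "e"],
})

# Step 5: subset with three quantitative variables
subset = df[["height", "weight", "age"]]

# Step 6: count missing and non-missing observations per variable
missing_counts = subset.isnull().sum()
present_counts = subset.notnull().sum()
print(missing_counts)

# One option for handling gaps: replace them with each column's mean
filled = subset.fillna(subset.mean())

# Step 7: mean, variance, and standard deviation for each variable
# (pandas skips missing values and uses the sample formulas, ddof=1, by default)
stats = pd.DataFrame({
    "mean": subset.mean(),
    "variance": subset.var(),
    "std": subset.std(),
})
print(stats)
```

Because .var() and .std() divide by n - 1 by default, pass ddof=0 if you want the population versions instead.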

Bootcamp Project Part I

For the final project, you will create a Jupyter notebook and facilitate a short discussion with the group. The notebook should have the following sections:

  • Background
  • Data
  • Analysis
  • Conclusion

For today, you will complete the Background and Data sections. These can be brief or written in outline form.

Background

  • What is the topic for your project?
  • What do you hope to learn?
  • Who or what is the source of your dataset?
  • Broadly, what kind of information is included in the dataset?

Data

  • Describe whether you are interested in a particular variable, or in a relationship between variables
  • Choose two or three variables of interest
  • Complete any data manipulation necessary so that you have a data frame with variables as columns and observations as rows
  • Produce the following:
    • A table of summary statistics, including number of observations, min, max, mean, and standard deviation
    • A frequency distribution plot (see plt.hist()) for each variable
    • A scatterplot for two variables (see plt.scatter())
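
As a rough sketch of the three deliverables above, the example below builds a summary table, one histogram per variable, and a scatterplot. The data are randomly generated and the column names x and y are placeholders, so substitute your own data frame and variables. Using .agg() for the table is one approach among several; .describe() returns similar information.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: replace with the data frame built from your dataset
rng = np.random.default_rng(0)
x = rng.normal(loc=50, scale=10, size=200)
df = pd.DataFrame({"x": x, "y": 2 * x + rng.normal(loc=0, scale=5, size=200)})

# Summary table: one row per variable, one column per statistic
summary = df.agg(["count", "min", "max", "mean", "std"]).T
print(summary)

# Frequency distribution plot for each variable
for col in df.columns:
    plt.figure()
    plt.hist(df[col], bins=20)
    plt.title("Distribution of " + col)
    plt.xlabel(col)
    plt.ylabel("Frequency")

# Scatterplot for two variables
plt.figure()
plt.scatter(df["x"], df["y"])
plt.title("y versus x")
plt.xlabel("x")
plt.ylabel("y")
```

In a Jupyter notebook the figures normally appear below the cell once it finishes running; the bin count of 20 is an arbitrary starting point and worth adjusting for your data.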

Resources

Python

Tools