Data Wrangling
Topics
- version control with git
- data structures, indexing
- working with arrays
- array attributes, methods, and functions
- mathematical operations
- summary statistics
Materials
Version Control with Git
Before the workshop
- Check whether you already have Git.
  Windows: From the command prompt, run
    git --version
  Mac: From the Terminal, run
    git --version
  If Git is already installed, you will get a response with the version number, e.g., git version 2.15.1. If not, proceed with the following steps.
- Install Git by following these instructions for your operating system.
- From these instructions, complete the sections First-Time Git Setup and Your Identity.
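As a rough sketch of the check above for a macOS or Linux shell (the Windows command prompt uses different syntax), the snippet below reports whether Git is on the PATH. The variable name git_status is only illustrative. The commented-out commands correspond to the Your Identity step and use placeholder values, so replace them with your own name and email before uncommenting.

```shell
# Check whether git is available on the PATH and record the result.
if command -v git >/dev/null 2>&1; then
    git_status="installed: $(git --version)"
else
    git_status="not installed"
fi
echo "$git_status"

# Identity setup (placeholder values; replace before uncommenting).
# git config --global user.name "Your Name"
# git config --global user.email "you@example.com"

# List the settings Git currently sees (may print nothing on a fresh install).
# git config --list
```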
Basic workflow
- Create a repository with a README on GitHub
- On your computer, use the command line to clone the repository
    git clone https://github.com/<your_github_username>/<your_repository_name>.git
- Create a new file or make changes to an existing file
- Add the names of any files or directories that you want to ignore to your .gitignore file (creating the file if necessary)
- When you want to record a snapshot, add the file to the staging area
    git add <filename>
- Check the status of the repository to make sure the file is properly staged
    git status
- Commit your changes and include a description of the changes
    git commit -m "<short description of changes>"
- When you want to update your remote repository, push the changes to GitHub (use main instead of master if that is your repository's default branch)
    git push origin master
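The workflow above can be tried end to end without a GitHub account. The following is a minimal sketch, assuming a macOS or Linux shell with Git installed. It uses git init in a temporary directory as a stand-in for git clone, and the directory name demo-repo, the file names, and the identity values are all hypothetical. The push step is left out because this scratch repository has no remote.

```shell
# Work in a throwaway directory so no existing project is touched.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Stand-in for `git clone`: create an empty local repository.
git init -q demo-repo
cd demo-repo || exit 1

# Repository-local identity (placeholder values) so the commit
# succeeds even if the global identity has not been configured.
git config user.name "Workshop Participant"
git config user.email "participant@example.com"

# Create a file to track, a scratch file to ignore, and a .gitignore
# whose single pattern excludes every file ending in .tmp.
echo "# Demo" > README.md
echo "temporary notes" > scratch.tmp
echo "*.tmp" > .gitignore

# Stage the files, confirm they are staged, then commit.
git add README.md .gitignore
git status --short
git commit -q -m "Add README and .gitignore"

# Show the recorded snapshot.
git log --oneline
```

Because scratch.tmp matches the pattern in .gitignore, it is never listed as untracked by git status and is not included in the commit.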
Resources
Python
- Python Software Foundation
- PEP 8 Python Style Guide
- Python Data Science Handbook
- NumPy Documentation
Tools