Let’s install Anaconda
Pull up this page in your browser:
It will automatically detect your OS and point you to the download package for your platform. I recommend that you use the graphical installer, since it includes Anaconda Navigator, which aggregates the tools and docs you need so you can work more efficiently.
You will get a message telling you how much space will be taken up:
It might take around 15-20 minutes to completely install depending on your system.
Follow the prompts and once you are done installing, you can launch Anaconda Navigator which will look like this:
Get the Anaconda Cheat Sheet
Your new development setup includes some environment variables that would normally be a pain to set up individually. With the platform version of Anaconda, everything is set up and properly configured for you by the installer.
Anaconda includes many of the libraries we will be using in this course. When everything is correctly installed, you should be able to open a Python command line and try out some imports. The ‘resources’ tab in this set includes some commands you can run to verify your installation.
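If you prefer a scripted sanity check, something like this works (a sketch; the exact library list in your install may differ):

```python
# Try importing the core data-science libraries that ship with
# Anaconda and report the version of each one.
import importlib

for name in ["numpy", "pandas"]:
    module = importlib.import_module(name)
    print(f"{name} {module.__version__} OK")
```

If any import fails, you will see a ModuleNotFoundError naming the missing library.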
After installing everything, you need to run on the command line in terminal:
This will reset your environment to accept the changes that the Anaconda installation added.
To test that things are running right – in terminal, execute these lines:
$ python
Python 3.7.1 (default, Dec 14 2018, 13:28:58)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pandas as pd
>>> print("hello world")
hello world
>>> quit()
$
If you see the ‘>>>’ prompt return after each import, then you know that Anaconda has successfully installed the libraries for data science.
The Anaconda install is pretty reliable and you will be able to ‘set it and forget it’ for the most part.
There will be the occasional package library that it doesn’t have, but you can add one with ‘conda install <package-name>’.
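Before reaching for conda install, you can check from Python whether a package is already importable in your environment. A minimal sketch using only the standard library (the package names here are just examples):

```python
# Check whether a package is importable in the current environment
# before installing it with `conda install <package-name>`.
import importlib.util

def is_installed(package_name):
    """Return True if the top-level package can be imported here."""
    return importlib.util.find_spec(package_name) is not None

print(is_installed("numpy"))           # shipped with a default Anaconda install
print(is_installed("not_a_real_pkg"))  # a name that should not exist
```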
These two articles:
provide a clear case for why you may need virtual environments when using Python for data science projects.
Although anaconda/root is a fully formed environment with everything present, you may want to create smaller, more focused project environments that can be shared and do not drag in the entire stack of packages every time you just want to do something specific.
Check out the root environment in the Anaconda Navigator. You will see what it includes.
To create a new virtual environment, at the command line in a terminal:
$ conda create -n my_env
$ source activate my_env
(my_env) $ conda install numpy
(my_env) $ conda install pandas
(my_env) $ python
Python 3.7 - etc
>>> import numpy as np
>>> import pandas as pd
>>> np.linspace(2.0, 3.0, num=5)
array([2.  , 2.25, 2.5 , 2.75, 3.  ])
A visit to Python.org will give you an idea of the scope and depth of the language. There are many sites and blogs that explain in depth what the myriad methods and functions of Python entail.
Virtual Help with StackOverflow
Google searches typically return a handful of top-10 results pointing to stackoverflow.com, the virtual Q&A site for programming. Rest assured that if you can think of a programming question or have a problem with syntax, someone else has already asked a similar, if not exactly the same, question.
Here is a search I just did in Google:
Here is the stackoverflow.com post :
I often punch error messages straight into a Google search and frequently land right on a stackoverflow Q&A. Get comfortable with it. It will be your ladder out of debugging rabbit holes.
The video takes us on a tour of available e-learning resources, some free and others pretty affordable.
We are going to learn about:
Python is a language that needs a week or two of self-learning before you start to understand what it’s all about. You will find it easy to pick up, and you can extend your knowledge as you experiment with it. There are many open-source libraries, well documented using conventions the Python and data science communities are familiar with. This is what makes it so engaging and one of the easiest programming languages to learn.
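To get a feel for why Python reads so easily, here is a tiny, self-contained example (the numbers are made up, purely illustrative):

```python
# Compute quick statistics on a list -- no boilerplate required.
prices = [21.5, 22.0, 21.8, 23.1, 22.7]

average = sum(prices) / len(prices)
above_average = [p for p in prices if p > average]

print(f"average: {average:.2f}")
print(f"above average: {above_average}")
```

Even without knowing Python, you can probably read what each line does, which is exactly the point.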
Many IDEs (integrated development environments), some free, provide inline code auto-completion, which is invaluable to the new convert to Python. You will find this very useful when learning Python and data science.
Here are some free IDEs that can be used for Python development:
Atom (Is Atom Dead?)
Pycharm Community Edition (Compare Features)
In the next section, Jupyter Notebooks, you will see the same kind of auto-complete for Python syntax that you would get with paid IDEs like PyCharm Professional.
Once you have completed some basics in Python through self-paced, online courses, come back and jump into the lab we set up to exercise your newly acquired programming skills. The exercises are oriented around the kinds of things you will be doing with Amateur Data Science, so you can get comfortable with more challenging projects.
By now you have boot-camped on Python to get to where you want to start exploring data. That is why Jupyter notebooks are so powerful and relevant to the pursuit of data analytics and visualization. Many data scientists, both pro and amateur, are using Jupyter notebooks as a communications vehicle to learn, share, and explore new algorithms and datasets. Jupyter Notebooks are like an IDE in that you can enter code into cells, run each cell individually, and debug as you go. You can also visualize analytics right inline, which is very powerful for exploratory data analysis.
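As a taste of the kind of exploratory cell you might run in a notebook (a sketch with made-up data, not course material):

```python
# A typical exploratory cell: build a small DataFrame and summarize it.
import pandas as pd

df = pd.DataFrame({
    "day": ["Mon", "Tue", "Wed", "Thu", "Fri"],
    "close": [170.2, 171.0, 169.8, 172.5, 173.1],
})

print(df.describe())  # summary statistics for the numeric column
# In a notebook you could follow with df.plot(x="day", y="close")
# and see the chart rendered inline, right under the cell.
```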
Once your code is solved in the notebook, you can copy the relevant portions over to an IDE for Python and go hard-core with it. This allows you to build apps from your experiments in Jupyter and deploy them to the cloud or your websites, knowing you worked out the algorithms interactively. Very few modifications may be required to run your proven code as executable Python, except perhaps for how the visualization part is instantiated.
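Moving cells into a script usually means wrapping them in functions and adding an entry point. A minimal sketch (the function name and data here are hypothetical):

```python
# A notebook cell refactored into a reusable, testable function.
def summarize(values):
    """What was once a scratch cell, now callable from any script."""
    return {"count": len(values), "mean": sum(values) / len(values)}

# The entry point replaces "just run the cell" when the code
# becomes a standalone Python program.
if __name__ == "__main__":
    result = summarize([2.0, 2.25, 2.5, 2.75, 3.0])
    print(result)
```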
You run Jupyter on your system and it presents in a browser window. There is also an online service, Nbviewer, where you can share notebooks (rendered, but non-executable) from your git repos. Colleagues can then download these and run them locally. Even more exciting is JupyterLab, which is like an IDE that you run locally. Read this article from Towards Data Science that has some video lectures about it. I use Github ‘gists’ to cut and paste readable J-Notebooks into the course posts.
Once you have Anaconda Navigator installed, you can simply launch J-Notebooks, which will open in your user home directory. From there you can navigate to any folders where you have notebooks, with the extension ‘.ipynb’.
The navigator allows you to launch Jupyter Notebooks and start coding and exploring interactively in a cellular motif.
No matter which way you eventually prefer to use notebooks, you have choices. I still run them locally on my system and back them up to Github and my network-attached backup device. When you want to share or publish, you can use Nbviewer or Github gists, the latter of which can be embedded into sites or blogs quite easily.
Github is a cloud-based source code control service that has become an integral part of modern software development. What you get is a stable, robust, and sharable repository for your project code that can be cloned down to the system of your choice. In fact, ‘forking’ is where someone else clones your repo for their own use. If they choose, their improvements can be merged back into your repo as a new branch that you or others may adopt. We will use Github repos to keep our code and notebooks intact for you to use.
First thing is to get git installed on your system. This is easy and is done like this:
Download git for your system here:
Then run this command from your terminal to check the version:
$ git --version
Now, go create an account on Github.com and initialize your first repo.
As you saw in the video, it is pretty easy to create a free Github.com account. Then we created a new repository, or ‘repo’ for short: the well-understood concept of a set of files for one’s project. It could be anything from a set of Jupyter notebooks and data to Python code with .csv files, a readme.md (markdown) file, and the like.
In the video, we:
- Created a new repo on our computer
- Initialized it as a git local repo
- Added files to the git database
- Committed those to the database
- Set the remote to our Github.com/repo URL
- Did a push of the local to the remote repo as origin master
Next, we brought the repo over to a cloud-based AWS/EC2 Linux server:
- On the Linux directory, did a git clone (github.com repo link)
- It created the hierarchy that exists on our computer and the github.com repo
- Now we can run the programs as we did on the local computer
Both programs pull the data and print a sample using a call like:
df = web.DataReader(ticker, 'iex', start, end)
print("\n\tA Sample of Apple Stock Prices During 2018\n")
print(df.head(10))
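If you can’t reach the IEX feed (it now requires an API key), you can dry-run the same last two lines against a small hand-made DataFrame that mimics the shape DataReader returns. The prices below are invented, not real Apple quotes:

```python
import pandas as pd

# Stand-in for web.DataReader(ticker, 'iex', start, end):
# a DataFrame indexed by business-day dates with a 'close' column.
dates = pd.date_range("2018-01-02", periods=10, freq="B")
df = pd.DataFrame({"close": [172.3, 174.6, 173.4, 175.0, 174.3,
                             174.3, 175.3, 177.0, 176.2, 177.1]},
                  index=dates)

print("\n\tA Sample of Apple Stock Prices During 2018\n")
print(df.head(10))
```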
Section 1: Anaconda Install
Download Anaconda for your platform (Win, OSX, Linux)
After installation, open a terminal and run these commands to verify the install went through.
conda --version
conda --help
Section 3: Github Setup
Download git for your system here:
read the Setup Git tutorial
read the Create a Repo Tutorial
Section 4: Jupyter Notebooks
Launch Jupyter Notebooks right from Anaconda Navigator.
Section 5: Code Something
Here are the notes for section 5: