We Are Going On This Journey Together
Learn by doing | Share Insights | Dig Deep.
Data science is spreading through business, the public sector, journalism and society at a rapid rate. There are not enough formally trained data scientists available to meet the needs of these groups. The onus is on engineers, hobbyists and tinkerers to learn by doing, share insights and dig deep. It behooves us to make use of the widely available open-source tools, datasets and algorithms to achieve our goals.
"Without data you’re just another person with an opinion"
W. Edwards Deming
Learning data science on our own requires that we break it down into six topics:
It won't be too hard to learn a new programming language like Python. We are going to share our experiences, tips, tricks and tools and get code flying in a pretty short time. I did it and I know you can too. Anaconda is where we want to start, since it bundles so many of the packages used in data science.
Data is everywhere. The problem is that much of the data we want to analyze starts out unstructured. It sometimes lives in weird places, and there are many tools and techniques to 'scrape' it or pull it via APIs. Data wrangling was born to address this, and it is a must-have skill for doing anything in data science, even if just for fun.
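To make the idea of wrangling a little more concrete, here is a minimal sketch using only the Python standard library. The payload, the field names (`city`, `temp_f`, `reported`) and the helper names `wrangle` and `to_csv` are all made up for illustration; a real API will hand back something different, so treat this as one possible shape rather than a recipe.

```python
import csv
import io
import json

# Hypothetical payload, shaped like something a web API might return.
# Note the stray whitespace, inconsistent casing, numbers stored as
# strings and a missing reading: typical "unstructured" mess.
RAW = """
[
  {"city": " Austin ", "temp_f": "91", "reported": "2019-07-01"},
  {"city": "boston", "temp_f": null, "reported": "2019-07-01"},
  {"city": "Denver", "temp_f": "84.5", "reported": "2019-07-01"}
]
"""


def wrangle(raw_json):
    """Tidy the records: clean names, convert numbers, drop missing readings."""
    rows = []
    for record in json.loads(raw_json):
        if record.get("temp_f") is None:
            continue  # skip rows with no temperature reading
        rows.append({
            "city": record["city"].strip().title(),
            "temp_f": float(record["temp_f"]),
            "reported": record["reported"],
        })
    return rows


def to_csv(rows):
    """Write the cleaned rows out as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["city", "temp_f", "reported"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(wrangle(RAW)))
```

Once the data is in clean rows like this, it is ready to hand to whatever analysis or charting tool we pick up next.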
After years of open-source development, many library packages have been let loose in the wild for anyone to use, free of charge. These libraries are built to plug into languages like Python and the frameworks around them. The ecosystem for Python is huge, so we need to get comfortable working with packages, virtual environments and package managers like pip and conda.
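As a small, hedged illustration of getting comfortable with packages and environments, the sketch below asks Python which environment it is running in and which versions of a few packages are installed. It relies only on the standard library (`importlib.metadata`, available in Python 3.8 and later). The helper names and the list of packages checked are our own choices, and the virtual-environment check only recognizes `venv`-style environments, so a conda environment may report `False`.

```python
import sys
from importlib import metadata


def in_virtual_env():
    """True when running inside a venv-style virtual environment.

    Assumption: conda environments are not detected by this check.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("virtual environment:", in_virtual_env())
    # Package names chosen purely as examples.
    for name in ("numpy", "pandas", "matplotlib"):
        print(name, installed_version(name) or "not installed")
```

Running this before and after activating an environment is a quick way to see how the set of available packages changes from one environment to the next.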
Displaying the complex charts and graphs that we will build using Python data science tools takes some practice. It can be hard, but we can start out simple and build on that. Great libraries like Plot.ly, Dash and Bokeh, as well as the granddaddy, Matplotlib, are meant to be used regularly. There are a ton of open-source examples and projects we can draw from to experiment and learn.
Data science (or Python) would still be in a very nascent form without collaborative sites like GitHub, Stack Overflow and Kaggle, and developer chat such as Slack, Gitter, Discord and others. Now even Quora and Medium are becoming destinations for the avid pro and amateur data science hacker alike. Social networking is one of the developer community's greatest superpowers, and we will definitely take good advantage of it here. Lest we forget, there are a number of great instructors out there who are worth checking out.
Well, what can we say? The cloud as a conceptual construct is kind of an overlay to the internet. The difference is that it has well-understood structures, methods and tools that are essential to what we need to do here with data science. Thanks to a few thousand DevOps visionaries in the early 2000s and a handful of companies like Amazon, Google and Microsoft, we now have a robust cloud ecosystem of vendors, tools and a lot of open source to make use of. It is a ton of well-made stuff, and a lot of it is free or close to it.
After we build up some Python, analytics and visualization muscle, we will slow-walk into machine learning and see what all the fuss is about. This is where it all comes together. Prepare to stretch hard and stand on the shoulders of the giants who've come before.