Ref: Coursera - Johns Hopkins Courses.
Link to Coursera - Johns Hopkins Data Science Course: Link
ROUGH NOTES (!)
Updated: 10/4/26
We will learn to ask the right questions, manipulate data sets, and create visualizations to communicate results.
\[{ \boxed{\textbf{The Data Scientist's Toolbox}} }\]
[What is Data Science?]
Data Science is the use of data to answer questions.
Lots of data is currently available and being generated.
There are a few qualities that characterize big data:
- Volume: More and more data is becoming available.
- Velocity: Data is being generated at an astonishing rate.
- Variety: The data we analyze comes in many forms.
Eg: YouTube data.
Data Science involves Substantive Expertise, Hacking Skills, and Math Knowledge.
First, we need enough expertise in the area we are asking about to formulate our questions and to know what sorts of data are appropriate for answering them.
Suppose we have our question and appropriate data. The data often needs significant cleaning and formatting, which takes computer science/hacking skills.
Finally, once our data is clean, we need to analyze it. This often takes math and statistics knowledge.
In this course, we will spend a bit of time on each of these three areas.
Eg: One great example of data science in action is from 2009, when researchers at Google analysed 50 million commonly searched terms over a five-year period and compared them against CDC data on flu outbreaks. Their goal was to see whether certain searches coincided with outbreaks of the flu. One benefit of data science with big data is that it can identify correlations; in this case, the researchers identified 45 search terms that correlated strongly with the CDC flu outbreak data. With these terms, they have been able to predict flu outbreaks based solely on common Google searches! Without such a massive amount of data, these 45 terms could not have been predicted beforehand.
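A toy sketch of the idea behind the flu example: score each search term by how strongly its weekly volume correlates with weekly flu counts, then rank the terms. This is not the method the Google researchers used. All numbers and term names below are made up for illustration, and `pearson` and `rank_terms` are hypothetical helper names.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up weekly flu case counts (NOT real CDC data).
flu_cases = [10, 12, 30, 55, 80, 60, 25, 11]

# Made-up weekly search volumes for hypothetical terms (NOT real Google data).
search_volume = {
    "flu symptoms": [100, 110, 290, 520, 790, 610, 240, 120],
    "cough medicine": [200, 210, 300, 420, 560, 450, 280, 205],
    "movie times": [500, 480, 510, 495, 505, 490, 500, 515],
}


def rank_terms(volumes, cases):
    """Return (term, correlation) pairs, strongest positive correlation first."""
    scores = {term: pearson(series, cases) for term, series in volumes.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_terms(search_volume, flu_cases)
for term, r in ranked:
    print(f"{term:15s} r = {r:+.2f}")
```

In this toy data the two flu-related terms track the case counts closely, while the unrelated term does not. A strong correlation only says that two series move together; it does not by itself show that one causes the other.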
[What is Data?]