Most used terms in Data Science!

From Newbie to Newbie
Aug 1, 2020


(Hi, this is my first article in English, so sorry for my mistakes.)

When we start studying data science, we run into many unfamiliar terms and can easily feel lost.

This article tries to make those terms a little less abstract.

I created a “word cloud” with some of the most used words in data science and I want to present what they mean:

Let’s do it!

1- Data set:

A data set is a collection of data about something, most often organized in tables or spreadsheets.


2- Data mining:

It is the exploration of data in order to extract value from it. It is one of the most important steps in data science, because it relies on many techniques that determine the success of the work.


3- Data-driven:

Being data-driven means no longer just following intuition or life experience, but following scientific methods and basing your business decisions on data.


4- Data lake:

It is a repository that stores all of a company’s data, most of the time in raw form.


5- Outliers:

Outliers are values that differ markedly from the rest of the data and can distort the picture when we look at the data as a whole. They may or may not be errors, and there are many techniques to identify and treat them. For example, in a sample of wages, an individual whose salary is 10 times higher than that of the other individuals in the sample is an ‘outlier’.
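As a small illustration, here is one common way to flag outliers, the interquartile range (IQR) rule. This is only a sketch: the wages are made up, the function name find_outliers_iqr is hypothetical, and the 1.5 multiplier is a widely used convention, not the only possible choice.

```python
import statistics

def find_outliers_iqr(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Made-up monthly wages: the last person earns about 10 times more than the rest.
wages = [2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 25000]
outliers = find_outliers_iqr(wages)
print(outliers)  # [25000]
```

Whether a flagged value is an error or a real but unusual case still has to be decided by looking at the data.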


6- API (Application Programming Interface):

An API connects applications, acting like a bridge between them, so that they can share information such as user data or make use of each other’s functionality. For example, a website that needs you to create a new account may give you the option to create it using your data from Gmail or Facebook.


7- IDE (Integrated Development Environment):

It is a computer program that brings together tools such as a code editor, a debugger, and code completion to make programming easier and faster.


8- Python:

I don’t know other programming languages, but I wanted to start with Python because it is one of the languages most in demand at companies today and, according to experts, one of the easier ones to learn. It also has many libraries for working with data.


9- Pandas:

“import pandas as pd”: pandas is one of the most important libraries for data analysis. It offers tools to manipulate and analyze data quickly and in many different ways.
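To give a feeling for what that looks like, here is a minimal sketch with a tiny, made-up table (the names, cities, and wages are invented for illustration). It builds a DataFrame, filters rows, and computes an average per group.

```python
import pandas as pd

# A tiny, made-up data set: each row is a person, each column an attribute.
df = pd.DataFrame({
    "name": ["Ana", "Bruno", "Carla", "Diego"],
    "city": ["Recife", "Recife", "Natal", "Natal"],
    "wage": [2000, 2400, 2600, 3000],
})

# Keep only the rows that match a condition.
high_wages = df[df["wage"] > 2500]

# Group the rows by city and compute the average wage in each one.
mean_by_city = df.groupby("city")["wage"].mean()

print(high_wages)
print(mean_by_city)
```

The same DataFrame could also be loaded from a spreadsheet or CSV file instead of being typed by hand.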


10- NumPy:

There is no question that NumPy is a powerful library. It offers math functions for working with arrays and multidimensional matrices.
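A short sketch of what working with arrays looks like. The numbers are arbitrary and only meant to show element-wise math and a small matrix operation.

```python
import numpy as np

# A one-dimensional array: math is applied to every element at once.
a = np.array([1.0, 2.0, 3.0, 4.0])
doubled = a * 2
average = a.mean()

# A two-dimensional array (a matrix) multiplied by a vector.
m = np.array([[1, 2], [3, 4]])
v = np.array([1, 1])
product = m @ v

print(doubled)   # [2. 4. 6. 8.]
print(average)   # 2.5
print(product)   # [3 7]
```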


11- Machine Learning:

For me, it is the most important term in data science, because it covers the tools that enable automation and pattern recognition and make the work of data scientists easier.


12- Overfitting:

Overfitting is a term used in machine learning for a model that fits its training data too closely. In this case, the model does very well on the data it was trained on, but it is poor at predicting new values.
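The idea can be sketched with NumPy by fitting two polynomials to the same noisy points. Everything here is an assumption made for illustration: the “true” relationship is a straight line, the noise level is arbitrary, and the degrees 1 and 9 were picked to make the contrast visible. The degree-9 curve passes through all 10 training points, so its training error is practically zero, yet that says little about how it behaves on points it has not seen.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def true_fn(x):
    # Assumed underlying relationship: a simple straight line.
    return 2.0 * x + 1.0

# Ten noisy training points.
x_train = np.linspace(0.0, 1.0, 10)
y_train = true_fn(x_train) + rng.normal(0.0, 0.2, size=x_train.size)

# New points the model has never seen, lying between the training points.
x_test = (x_train[:-1] + x_train[1:]) / 2
y_test = true_fn(x_test) + rng.normal(0.0, 0.2, size=x_test.size)

def mse(model, x, y):
    """Mean squared error of the model's predictions."""
    return float(np.mean((model(x) - y) ** 2))

results = {}
for degree in (1, 9):
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    train_err, test_err = results[degree]
    print(f"degree {degree}: train MSE = {train_err:.4f}, test MSE = {test_err:.4f}")
```

Comparing the two printed lines shows the pattern to look for: a gap between a very low training error and a higher error on new data is a typical sign of overfitting.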


13- Deploy:

To deploy is to put your project, website, or software into production, to ‘put it on the air’.


and finally…

14- Cloud:

In this pandemic, it became evident how important cloud computing is, since it lets you use services such as storage with nothing more than an internet connection. In short, it offers many computing services that only require an internet connection.

Feel free to send suggestions. Thanks!


I’ll try to teach more simply what I learn about data and with data.