What is Data Science?
We hear more and more about Data Science: it is the buzzword in companies, on the web and in schools.
What is this discipline?
Data Science is nothing more than a multidisciplinary field whose goal is to use (digital) data to solve real-life problems or to deliver value in the form of a “data product”.
Data science is the extraction of knowledge from data sets. It uses techniques and concepts drawn from the broad areas of mathematics, statistics and information technology, including signal processing, probability models, machine learning, statistical learning, computer programming, data engineering, pattern recognition and learning, visualization, predictive analytics, uncertainty modeling, data warehousing, data compression and high-performance computing.
What is the difference between Data Science, Big Data and Data Mining?
The difference between Data Science and Big Data is immediate. Big Data is the discipline of processing and exploiting very large amounts of data, while Data Science places no constraint on the amount of data. We can therefore use Big Data techniques in Data Science when the amount of data to be processed becomes very large.
The difference between Data Mining and Data Science, on the other hand, is a little less obvious, to the point that some people confuse the two. The difference is that Data Mining is a part of Data Science. Data Mining covers only the mining (exploration) of data, whereas Data Science is broader since it also takes into account data acquisition, for example.
This definition may seem vague, but that is because the discipline is broad and itself draws on several other disciplines.
Fields involved in Data Science
It’s important to understand that the end goal of data science is to solve a problem in a specific area. It is therefore essential to have a very good knowledge of the application domain before embarking on the development of a model.
It should also be noted that the areas listed below are not an exhaustive list of the disciplines involved in data science. Indeed, since the end justifies the means, we can do data science in various ways as long as we stay within the context presented above.
Data science involves the following disciplines:
The field of application: This is the sector (the environment) in which one wishes to produce a data product or solve a problem. This could be, for example, the stock market, if we want to build a predictive model for traders based on past stock prices.
Mathematics (Statistics, Probability, Linear Algebra, Analysis,…): Mathematics is heavily involved in data science. Indeed, problems are very often translated into mathematical models before being solved.
Computing: Computing is the basis of data science in the sense that models are implemented with code and/or computer tools. Since data is digital, its acquisition, storage and processing are all done using computers.
Machine learning: Machine learning techniques are increasingly used in data science.
Algorithmics: Mastery of this science is essential since all models are in the form of algorithms. It is important to understand concepts such as complexity.
Common sense 😉: This is by far what you need most when faced with a complex problem.
Obviously, being a data scientist does not mean being an expert in all of these areas (although the more knowledge you have in these areas, the better). Indeed, a data science project is very often complex and consists of several stages. We can therefore find in a team people with different profiles, each in charge of a specific step.
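To make the list of disciplines above a little more concrete, here is a minimal sketch in Python that touches the stock market example, a simple mathematical model (a moving average) and the notion of complexity. The price values and the function names are invented for illustration; this is one possible way to write it, not a reference implementation.

```python
from math import isclose


def _check_window(prices, window):
    """Reject window sizes for which a moving average is not defined."""
    if window < 1 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")


def moving_average_naive(prices, window):
    """Re-sum every window from scratch: roughly len(prices) * window additions."""
    _check_window(prices, window)
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]


def moving_average_running(prices, window):
    """Slide a running sum along the series: roughly len(prices) additions."""
    _check_window(prices, window)
    running = sum(prices[:window])
    averages = [running / window]
    for i in range(window, len(prices)):
        # Add the price entering the window, remove the one leaving it.
        running += prices[i] - prices[i - window]
        averages.append(running / window)
    return averages


# Hypothetical closing prices, invented for illustration only.
past_prices = [10.0, 11.0, 12.0, 11.0, 13.0, 14.0]

naive = moving_average_naive(past_prices, 3)
fast = moving_average_running(past_prices, 3)

# Both versions compute the same values; only the amount of work differs.
same_result = all(isclose(a, b) for a, b in zip(naive, fast))
print(naive, same_result)
```

Both functions return the same averages, but the second one does far less work when the series and the window are large, which is the kind of question algorithmics helps answer.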
Stages of a Data Science project
Long live CRISP-DM (standard used for a data science project)!
A data science project can be seen as a succession of the following steps:
Understanding the business problem:
This step consists of rephrasing the problem to make it as clear as possible. At the end of this step, we should know more or less which path to take throughout the project.
Raw data acquisition:
This step is about getting the data to work on. This data should have been identified during the first step.
Data preparation:
The data obtained in the previous step is raw and unstructured. The purpose of this step is to clean it and structure it according to our needs.
Modeling:
This step is about setting up our model, our algorithm, our solution to the original problem.
Evaluation:
This step involves testing the effectiveness of our model, then either moving on to the next step if the results are satisfactory or reconsidering the model from the previous step (or even going back to step 2, raw data acquisition).
Conclusion:
This is the last phase, the conclusion that gives the solution to our problem.
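The steps above, from raw data acquisition to evaluation, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the raw records, the choice of a straight-line model and the acceptance threshold `MAX_ERROR` are all hypothetical, and a real project would choose its model and its evaluation criteria according to the business problem.

```python
# Raw data acquisition: hypothetical (day, price) records, invented for
# illustration. Some entries are deliberately dirty.
raw_rows = [
    ("1", " 10.0"), ("2", "12.1"), ("3", ""), ("4", "15.9"),
    ("5", "n/a"), ("6", "20.2"), ("7", "22.0"), ("8", "23.8"),
]


def prepare(rows):
    """Cleaning and structuring: keep only rows that parse as two numbers."""
    clean = []
    for day, price in rows:
        try:
            clean.append((float(day), float(price)))
        except ValueError:
            continue  # drop missing or malformed values
    return clean


def fit_line(points):
    """Model: ordinary least squares for price = slope * day + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(model, points):
    """Evaluation: average gap between predicted and observed prices."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in points) / len(points)


data = prepare(raw_rows)
train, held_out = data[:4], data[4:]  # fit on early days, test on later ones

model = fit_line(train)
error = mean_absolute_error(model, held_out)

MAX_ERROR = 1.0  # hypothetical acceptance threshold
model_is_acceptable = error <= MAX_ERROR
print(model, error, model_is_acceptable)
```

If `model_is_acceptable` were false, the project would loop back to the model, or even to data acquisition, exactly as described in the evaluation step.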