Skills Needed for a DS/ML learner
Data Collection
Raw data is usually obtained through standard collection methods such as surveys, web scraping, and interviews or questionnaires. A Data Scientist or Machine Learning Engineer should know these techniques.
Data Storage
Collected data can be stored as CSV, JSON, spreadsheets, databases, or various other formats. We must know how to work with these storage formats.
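As a small illustration of working with two of these formats, here is a minimal sketch that round-trips the same records through CSV and JSON using only Python's standard library. The records and field names are made up for the example.

```python
import csv
import io
import json

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"name": "Asha", "age": 31},
    {"name": "Ben", "age": 27},
]

# Write the records as CSV text into an in-memory buffer, then read them back.
csv_buffer = io.StringIO(newline="")
writer = csv.DictWriter(csv_buffer, fieldnames=["name", "age"])
writer.writeheader()
writer.writerows(records)

csv_buffer.seek(0)
rows_from_csv = list(csv.DictReader(csv_buffer))
# CSV stores every value as text, so numeric columns must be converted back.
for row in rows_from_csv:
    row["age"] = int(row["age"])

# JSON keeps basic types (numbers, strings, lists) intact on a round trip.
json_text = json.dumps(records)
rows_from_json = json.loads(json_text)
```

The point to notice is the type handling: CSV needs an explicit conversion step after reading, while JSON preserves numbers on its own.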
Data Cleaning
Data cleaning means detecting and correcting inaccurate data points in a given dataset. It mostly involves manipulating the data: identifying incomplete parts, then modifying or deleting them. The pandas library is commonly used for this.
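The three operations mentioned above (identifying, modifying, deleting) might look like this in pandas. This is only a sketch on a made-up table. Filling gaps with the median is one possible choice among many, not a rule.

```python
import pandas as pd

# Hypothetical raw data with a missing value, an impossible value,
# and a duplicated row.
raw = pd.DataFrame(
    {
        "name": ["Asha", "Ben", "Ben", "Chen", "Dia"],
        "age": [31, 27, 27, None, -4],
    }
)

# Identify incomplete parts: count missing values per column.
missing_per_column = raw.isna().sum()

# Delete: drop exact duplicate rows.
cleaned = raw.drop_duplicates()

# Modify: treat impossible (negative) ages as missing,
# then fill the gaps with the median of the remaining ages.
cleaned = cleaned.assign(age=cleaned["age"].where(cleaned["age"] >= 0))
cleaned = cleaned.fillna({"age": cleaned["age"].median()})
cleaned = cleaned.reset_index(drop=True)
```

Whether to fill, drop, or flag a bad value depends on the dataset and the problem, so the steps here should be read as an example of the mechanics only.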
Data Visualization
To understand the data well, one needs to picture what it looks like, and data visualization makes that possible. It is a central part of exploratory data analysis. We can use the matplotlib and seaborn libraries.
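A minimal matplotlib sketch of two common exploratory plots follows. The data is invented, and the "Agg" backend is an assumption made so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical data: hours studied and exam scores.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 64, 70, 74, 79, 85]

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))

# Histogram: how a single variable is distributed.
ax_hist.hist(scores, bins=4)
ax_hist.set_xlabel("score")
ax_hist.set_ylabel("count")

# Scatter plot: how two variables relate to each other.
ax_scatter.scatter(hours, scores)
ax_scatter.set_xlabel("hours studied")
ax_scatter.set_ylabel("score")

fig.tight_layout()
```

In an interactive session, plt.show() would display the figure, and fig.savefig with a file name of your choice would write it to disk.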
Data Transformation
Data transformation is the process of changing the format, structure, or values of data. It may involve data smoothing, data aggregation, normalization, generalization, and attribute construction.
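To make three of these concrete, here is a plain-Python sketch of min-max normalization, moving-average smoothing, and aggregation by key. The function names are hypothetical, and these are only one simple variant of each technique.

```python
from collections import defaultdict


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # All values are equal, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def moving_average(values, window):
    """Smooth a sequence by averaging each run of `window` consecutive values."""
    return [
        sum(values[i : i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def total_by_key(pairs):
    """Aggregate (key, amount) pairs into one total per key."""
    totals = defaultdict(float)
    for key, amount in pairs:
        totals[key] += amount
    return dict(totals)


normalized = min_max_normalize([10, 20, 30, 50])
smoothed = moving_average([1, 2, 3, 4, 5], window=3)
totals = total_by_key([("north", 5), ("south", 2), ("north", 3)])
```

With the sample inputs, normalized is [0.0, 0.25, 0.5, 1.0], smoothed is [2.0, 3.0, 4.0], and totals maps "north" to 8.0 and "south" to 2.0.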
Python Programming
Python is commonly used to work with large, complex data sets. Its syntax is clean, and it is well suited to data analysis and machine learning projects.
Data Modelling
Data modelling in this field means fitting the data to the algorithms that work best for the problem statement. In practice, it means training a machine learning algorithm to predict a target from the features and then using the trained model to serve a business need.
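As a toy illustration of "training" and "predicting", the sketch below fits a straight line to one feature with ordinary least squares. This algorithm is chosen only because it is short; real projects would compare several algorithms, usually through a library. The function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new feature value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data that lies exactly on y = 2x + 1.
features = [1, 2, 3, 4]
targets = [3, 5, 7, 9]

model = fit_line(features, targets)
prediction = predict(model, 5)
```

Here the fitted model recovers a slope of 2 and an intercept of 1, so the prediction for an input of 5 is 11.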
Model Evaluation
After the model is built, its performance is measured by evaluating it. Common techniques include root mean squared error for regression, and accuracy and the confusion matrix for classification.
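The three measures named above can each be written in a few lines of plain Python. This is a sketch with hypothetical function names and made-up labels. Libraries such as scikit-learn provide their own versions.

```python
import math
from collections import Counter


def rmse(actual, predicted):
    """Root mean squared error between two equal-length numeric sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def accuracy(actual, predicted):
    """Fraction of predictions that exactly match the actual label."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


# Regression example: errors are 1, 0, and -2.
regression_error = rmse([3, 5, 7], [2, 5, 9])

# Classification example with two labels.
true_labels = ["spam", "ham", "spam", "ham"]
predicted_labels = ["spam", "spam", "spam", "ham"]
share_correct = accuracy(true_labels, predicted_labels)
confusion = confusion_counts(true_labels, predicted_labels)
```

In the classification example, three of four predictions are correct, and the confusion counts show where the single mistake lies: one "ham" message was predicted as "spam".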
Model Deployment
Model deployment means integrating the model into an existing production environment, so that it can take input from a user and return an output.
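The "input in, output out" idea can be sketched as a single request handler. Everything here is an assumption made for illustration: the model is a hard-coded line, and the JSON field names "x", "prediction", and "error" are invented. A real deployment would wrap a function like this in a web framework or a serving platform.

```python
import json

# Hypothetical trained model: the slope and intercept of a fitted line.
MODEL = {"slope": 2.0, "intercept": 1.0}


def handle_request(body):
    """Parse a JSON request body, run the model, and return a JSON response."""
    try:
        payload = json.loads(body)
        x = float(payload["x"])
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, a missing field, or a non-numeric value.
        return json.dumps({"error": "expected a JSON object with a numeric 'x'"})
    prediction = MODEL["slope"] * x + MODEL["intercept"]
    return json.dumps({"prediction": prediction})


good_response = handle_request('{"x": 3}')
bad_response = handle_request("not json")
```

Validating the input before calling the model matters in production, because users will send data the model was never trained to handle.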
Model Maintenance
The primary motive behind model maintenance is to identify what led to an erroneous output for a specific input and how it can be rectified.
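One simple way to start finding such inputs is to log each prediction alongside the value that was later observed, then pull out the cases with the largest misses. The sketch below assumes a hypothetical log format and an arbitrary tolerance.

```python
def find_bad_predictions(log, tolerance):
    """Return logged records whose prediction misses the actual value
    by more than `tolerance`, worst miss first."""
    misses = [
        record
        for record in log
        if abs(record["predicted"] - record["actual"]) > tolerance
    ]
    return sorted(
        misses,
        key=lambda record: abs(record["predicted"] - record["actual"]),
        reverse=True,
    )


# Hypothetical production log: model input, model output, and observed outcome.
prediction_log = [
    {"input": 1.0, "predicted": 3.0, "actual": 3.1},
    {"input": 9.0, "predicted": 19.0, "actual": 12.0},
    {"input": 4.0, "predicted": 9.0, "actual": 10.5},
]

to_review = find_bad_predictions(prediction_log, tolerance=1.0)
```

The flagged records (inputs 9.0 and 4.0 here) are candidates for review. Deciding why they failed and whether to retrain the model is still a manual judgment.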