Skills Needed for a DS/ML Learner

  1. Data Collection

    Raw data is usually obtained through standard collection methods such as surveys, web scraping, and questioning people directly. Data scientists and machine learning engineers should be familiar with these techniques.
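
    As a rough illustration of web scraping, the sketch below pulls link targets out of an HTML page using only Python's standard library. The HTML snippet, the `LinkCollector` class, and the example.com URLs are made up for this example; a real scraper would first download the page and should respect the site's terms of use.

```python
from html.parser import HTMLParser

# Hypothetical HTML standing in for a page that was already downloaded.
SAMPLE_HTML = """
<html><body>
  <a href="https://example.com/a">First</a>
  <a href="https://example.com/b">Second</a>
  <p>No link here</p>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


collector = LinkCollector()
collector.feed(SAMPLE_HTML)
links = collector.links
print(links)
```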

  2. Data Storage

    Collected data can be stored in CSV files, JSON files, spreadsheets, databases, or various other formats. We must know how to work with these storage formats.
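
    To make the difference between formats concrete, here is a minimal sketch that writes the same records to CSV and to JSON and reads them back, using only the standard library. The records are invented sample data, and an in-memory buffer stands in for a real file.

```python
import csv
import io
import json

# Invented sample records for illustration.
records = [
    {"name": "Asha", "age": 31},
    {"name": "Ben", "age": 27},
]

# Write the records as CSV into an in-memory text buffer.
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=["name", "age"])
writer.writeheader()
writer.writerows(records)

# Read the CSV back. CSV stores no type information,
# so every value comes back as a string.
csv_buffer.seek(0)
from_csv = list(csv.DictReader(csv_buffer))

# JSON keeps basic types (numbers, strings, lists, objects) intact.
json_text = json.dumps(records)
from_json = json.loads(json_text)

print(from_csv)
print(from_json)
```

    Note that the ages come back from CSV as strings, while JSON preserves them as integers. Type handling like this is one of the practical differences to keep in mind when choosing a format.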

  3. Data Cleaning

    Data cleaning means detecting and correcting inaccurate data points in a given dataset. It mostly involves manipulating the data: identifying incomplete parts, then modifying or deleting them. The pandas library is commonly used for this purpose.
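
    A small pandas sketch of these steps might look as follows. The DataFrame contents are invented, and the choices made here (dropping rows without a name, filling missing ages with the median) are only one possible cleaning policy, not a general rule.

```python
import pandas as pd

# Invented sample data with one duplicate row and two missing values.
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Ben", "Chen", None],
    "age": [31, 27, 27, None, 45],
})

# Identify incomplete parts: count missing values in each column.
missing_per_column = df.isna().sum()

# Delete: remove exact duplicate rows and rows that have no name.
cleaned = df.drop_duplicates().dropna(subset=["name"]).copy()

# Modify: fill the remaining missing ages with the median age.
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].median())

print(missing_per_column)
print(cleaned)
```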

  4. Data Visualization

    To understand the data well, one needs to picture what it looks like, and data visualization makes that possible. Visualization is a core part of exploratory data analysis (EDA). We can use the matplotlib and seaborn libraries for this.
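
    As one possible starting point, the sketch below draws a histogram with matplotlib. The ages are invented, the number of bins is an arbitrary choice, and the non-interactive "Agg" backend is selected only so the example runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen; no display window is needed
import matplotlib.pyplot as plt

# Invented sample data.
ages = [22, 25, 27, 31, 31, 35, 38, 41, 45, 52]

fig, ax = plt.subplots()
# hist returns the count in each bin, the bin edges, and the drawn bars.
counts, bin_edges, _ = ax.hist(ages, bins=5)
ax.set_xlabel("Age")
ax.set_ylabel("Number of people")
ax.set_title("Age distribution")

# Save the figure as PNG into memory; pass a file path instead to write a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```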

  5. Data Transformation

    Data transformation is the process of changing the format, structure, or values of data. It may involve smoothing, aggregation, normalization, generalization, and attribute construction.
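
    Two of these operations, normalization and aggregation, can be sketched in plain Python. The sales figures and the helper name `min_max_normalize` are invented for illustration, and min-max scaling is only one of several normalization schemes.

```python
from collections import defaultdict

# Invented sample data: (region, sale amount) pairs.
sales = [("north", 120.0), ("south", 80.0), ("north", 200.0), ("south", 100.0)]


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Normalization: bring the amounts onto a common 0-to-1 scale.
amounts = [amount for _, amount in sales]
normalized = min_max_normalize(amounts)

# Aggregation: total sales per region.
totals = defaultdict(float)
for region, amount in sales:
    totals[region] += amount
totals = dict(totals)

print(normalized)
print(totals)
```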

  6. Python Programming

    Python is commonly used to process large, complex data sets. Its clean, readable syntax also makes it well suited to data analysis and machine learning projects.

  7. Data Modelling

    In this field, data modelling means fitting the data to the algorithms that work best for the problem statement. In practice, it means training a machine learning algorithm to predict a target from the features and then using it to meet a business need.
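
    To show what "training" and "predicting" mean at the smallest scale, here is a sketch that fits a straight line to data by ordinary least squares. Linear regression is chosen only as an example algorithm; the data and the function names `fit_line` and `predict` are invented for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new feature value."""
    slope, intercept = model
    return slope * x + intercept


# Invented training data: hours studied (feature) and exam score (target).
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]

model = fit_line(hours, scores)
predicted_score = predict(model, 5.0)
print(model, predicted_score)
```

    In practice a library such as scikit-learn would usually handle the fitting, but the overall shape stays the same: learn parameters from features and targets, then call the model on new inputs.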

  8. Model Evaluation

    After the model is built, its performance is measured by evaluating it. This can be done with metrics such as root mean squared error (for regression), or accuracy and the confusion matrix (for classification).
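
    The three metrics named above can be written out directly, which makes their definitions explicit. This is a plain-Python sketch with invented labels and predictions; the confusion matrix layout used here (rows are true labels, columns are predicted labels) is a common convention, stated as an assumption.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared difference."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, labels):
    """Count matrix where rows are true labels and columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Regression example (invented values).
reg_error = rmse([3.0, 5.0], [2.0, 7.0])

# Classification example (invented labels).
true_labels = [1, 0, 1, 1, 0]
pred_labels = [1, 0, 0, 1, 1]
acc = accuracy(true_labels, pred_labels)
cm = confusion_matrix(true_labels, pred_labels, labels=[0, 1])

print(reg_error, acc, cm)
```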

  9. Model Deployment

    Model deployment means integrating the model into an existing production environment, so that the model can take input from a user and return an output.
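
    The input-to-output contract can be sketched as a single request handler. Everything here is hypothetical: the `MODEL` parameters, the `hours` field, and the `handle_request` function are stand-ins, and a real deployment would put a function like this behind a web framework or a serving platform.

```python
import json

# Hypothetical stand-in for a trained model: a fitted slope and intercept.
MODEL = {"slope": 2.0, "intercept": 50.0}


def handle_request(request_body: str) -> str:
    """Take a JSON request from a user and return a JSON response."""
    try:
        payload = json.loads(request_body)
        hours = float(payload["hours"])
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, a missing field, or a non-numeric value.
        return json.dumps({"error": 'expected JSON like {"hours": 3}'})
    prediction = MODEL["slope"] * hours + MODEL["intercept"]
    return json.dumps({"prediction": prediction})


response = handle_request('{"hours": 3}')
bad_response = handle_request("not json")
print(response)
print(bad_response)
```

    Validating the input before calling the model matters in production, because user input cannot be trusted to have the shape the model was trained on.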

  10. Model Maintenance

    The primary motive behind model maintenance is to identify what led to an erroneous output for a specific input and how it can be rectified.
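
    One simple way to start that investigation is to keep a log of predictions alongside the values later observed, and flag the cases where they disagree badly. The log entries, the `find_erroneous_cases` helper, and the tolerance of 5.0 are all invented for this sketch.

```python
def find_erroneous_cases(records, tolerance):
    """Return logged cases whose prediction misses the actual value by more than tolerance."""
    return [r for r in records if abs(r["predicted"] - r["actual"]) > tolerance]


# Invented log of inputs, model outputs, and the values observed afterwards.
prediction_log = [
    {"input": 1.0, "predicted": 52.0, "actual": 52.5},
    {"input": 9.0, "predicted": 68.0, "actual": 90.0},
    {"input": 2.0, "predicted": 54.0, "actual": 53.0},
]

flagged = find_erroneous_cases(prediction_log, tolerance=5.0)
print(flagged)
```

    Here only the case with input 9.0 is flagged. An input far outside the range seen during training is a typical reason for such an error, and retraining on more representative data is one way to rectify it.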