Data science, in simple terms, can be defined as an interdisciplinary field that makes use of processes, systems, scientific methods, and algorithms to extract insights from structured and unstructured data.

Data science is applied across many sectors: healthcare recommendations, disease prediction, automated digital advertising, real-time optimization of shipping and logistics routes, fraud detection, and more.

Data science roles are among the highest paid in the industry, with an average salary of about $116,100 per year. There are also plenty of job opportunities, as skilled professionals remain in short supply in this field.

We have listed the most commonly asked Data Science interview questions and answers below. Make sure you go through the full article so that you do not miss any of them.


**Data Science Interview Questions and Answers**

**1. Can you explain the difference between long and wide format data?**

**Wide Format**: Each subject’s repeated responses appear in a single row, with each response in a separate column.

**Long Format**: Each row represents one time point per subject, so a subject’s repeated responses span multiple rows.

**2. Explain the Boltzmann Machine?**

A Boltzmann Machine implements a simple learning algorithm that discovers features representing complex regularities in the training data. We use Boltzmann Machines to optimize the weights and quantities for a given problem. They are useful for solving two different kinds of computational problem: search problems and learning problems.

**3. Explain Gradient Descent?**

Let us first know what a Gradient is:

**Gradient**: It is used to measure the changes in all the weights that are related to the change in the error. You can also imagine gradient as the slope of a function.

Gradient Descent can be pictured as climbing down to the bottom of a valley rather than up a hill: it is a minimization algorithm that repeatedly steps in the direction of the negative gradient to minimize a given function.
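As a minimal sketch (the quadratic function, learning rate, and starting point here are illustrative assumptions), gradient descent repeatedly steps against the slope until it settles at the bottom of the "valley":

```
# minimize f(x) = (x - 3)^2, whose gradient is f'(x) = 2 * (x - 3)
def gradient_descent(start, learning_rate=0.1, steps=100):
    x = start
    for _ in range(steps):
        grad = 2 * (x - 3)            # slope of the function at the current x
        x = x - learning_rate * grad  # step downhill, against the gradient
    return x

print(gradient_descent(start=0.0))  # converges close to the minimum at x = 3
```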

**4. What is an Auto-Encoder?**

An Auto-Encoder aims to transform its inputs into outputs with the minimum possible error, meaning that the output should be as close to the input as possible. A couple of layers are added between the input and the output, each smaller than the input layer. The network receives unlabelled input, encodes it into this smaller representation, and then reconstructs the input from it.

**5. Explain the Different Layers of a CNN?**

A CNN has four layers, namely:

- Convolution Layer: It performs convolution operations, creating several small picture windows that go over the data.
- ReLU Layer: This layer brings non-linearity to the network and converts all negative pixel values to zero. The output obtained is a rectified feature map.
- Pooling Layer: It reduces the dimensionality of the feature map.
- Fully Connected Layer: It recognizes and classifies the objects in the given image.

**6. Can you name a few Machine Learning libraries for various purposes?**

A few machine learning libraries for various purposes are listed below:

- TensorFlow
- NumPy
- SciPy
- Pandas
- Matplotlib
- Keras
- SciKit-Learn
- PyTorch
- Scrapy
- BeautifulSoup

**7. Explain Artificial Neural Networks?**

Artificial Neural Networks are a specific set of algorithms that have revolutionized machine learning. These networks are inspired by biological neural networks. Neural networks adapt to changing inputs, so the network generates the best possible result without needing to redesign the output criteria.

**8. Name the different Deep Learning Frameworks?**

- Chainer
- Keras
- Caffe
- PyTorch
- TensorFlow
- Microsoft Cognitive Toolkit

**9. Explain Multi-layer Perceptron (MLP)?**

MLP (Multi-layer Perceptron) is a class of ANN (Artificial Neural Network). It mainly comprises an input layer, one or more hidden layers, and an output layer. Each node, except the input nodes, uses a non-linear activation function.

MLP uses a supervised learning technique known as backpropagation for training. It is distinguished from a linear perceptron by its multiple layers and non-linear activation functions, which allow it to separate data that is not linearly separable.

**10. Explain the differences Between Epoch, Batch, and Iteration in Deep Learning?**

**Epoch**: It represents one full pass over the entire dataset.

**Batch**: The data set is divided into several batches when we cannot pass the whole data set to the neural network in one go.

**Iterations**: The number of batches the algorithm has seen; one iteration is one update step on a single batch. For example, a dataset of 10,000 samples with a batch size of 500 gives 20 iterations per epoch.


**11. Explain reinforcement learning?**

Reinforcement learning is an area of Machine Learning. It is mainly about taking suitable actions to maximize reward in a specific situation. It is employed by various software and machines to determine the best possible behavior or path to take in a given situation.

Some of the main points in Reinforcement learning are listed below:

- Input: An initial state from which the model starts.
- Output: There are many possible outputs, as there are a variety of solutions to a given problem.
- Training: The training is mainly based on the input; the model returns a state, and the user decides to reward or punish the model based on its output.
- The model continues to learn.
- The best solution is then decided based on the maximum rewards.

**12. What are Vanishing Gradients?**

Vanishing gradients usually occur while training deep neural networks with a gradient-based optimization method. Due to the nature of the backpropagation algorithm, the gradients of the loss with respect to the weights of the early layers can become extremely small as they are propagated backwards through many layers, so those weights effectively stop updating.

**13. Explain Recurrent Neural Networks(RNNs)?**

A recurrent neural network is a neural network specialized for processing a sequence of data x(1), . . . , x(τ), with a time step index t ranging from 1 to τ. For tasks that involve sequential inputs, like speech and language, it is usually better to use RNNs.

RNNs are also called recurrent because they perform the same specified task for every element of the sequence, with the output that is dependent on the previous computations.

**14. Explain the variants of Back Propagation?**

- Stochastic Gradient Descent: A single training example is used to calculate the gradient and update the parameters.
- Batch Gradient Descent: The gradient is calculated over the entire dataset, and the parameters are updated once per pass.
- Mini-batch Gradient Descent: One of the most widely used optimization algorithms. It is a variant of Stochastic Gradient Descent in which a mini-batch of samples, instead of a single training example, is used for each update.
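To make the distinction concrete, here is a small sketch (the toy data and batch size are assumptions) of how a dataset is carved into mini-batches; a batch size of 1 corresponds to Stochastic Gradient Descent, and a batch size equal to the dataset size corresponds to Batch Gradient Descent:

```
data = list(range(10))  # toy training set of 10 examples
batch_size = 3

# carve the dataset into consecutive mini-batches
mini_batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

# one gradient update would be computed per mini-batch
print(mini_batches)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```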

**15. Explain Linear Regression?**

Linear regression uses the least squares method. The idea is to draw a line through the plotted data points, positioned so that it minimizes the sum of squared distances to all of the data points. These vertical distances are known as "residuals" or "errors."
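A brief illustration with NumPy (the toy points are an assumption): a degree-1 polynomial fit is exactly the least-squares line described above.

```
import numpy as np

# toy data scattered around the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.0, 8.9])

# degree-1 polynomial fit = ordinary least-squares line
slope, intercept = np.polyfit(x, y, 1)

# residuals: vertical distances from each point to the fitted line
residuals = y - (slope * x + intercept)

print(round(slope, 2), round(intercept, 2))  # close to 2 and 1
```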

**16. Explain pruning in the Decision Tree?**

Pruning is a data compression technique in machine learning and search algorithms that reduces the size of a decision tree by removing sections of the tree that are non-critical or redundant for classifying instances.

Pruning helps in reducing the complexity of the final classifier and thereby improves predictive accuracy by the reduction of overfitting.

**17. Name the different kernels in SVM?**

We have four types of kernels in SVM, namely,

- Linear Kernel
- Polynomial kernel
- Radial basis kernel
- Sigmoid kernel

**18. Can you tell us the drawbacks of the Linear Model?**

A few of the drawbacks of the linear model are:

- It assumes linearity between the independent and dependent variables.
- It cannot be used directly for count outcomes or binary outcomes.
- It is prone to overfitting problems that it cannot solve on its own.

**19. Can you explain the Decision Tree algorithm in detail?**

The Decision Tree algorithm belongs to the family of supervised learning algorithms. Unlike many other supervised learning algorithms, it can be used to solve both classification and regression problems.

In Decision Trees, to predict a class label for a record, we start from the root of the tree. We compare the value of the root attribute with the record’s attribute. Based on the comparison, we follow the branch corresponding to that value and jump to the next node.

The main goal of using a Decision Tree is to create a training model that can be used to predict the value or class of the target variable by learning the simple decision rules inferred from training data(prior data).

**20. What are Entropy and Information gain in the Decision Tree algorithm?**

Entropy: A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values. The ID3 algorithm uses entropy to calculate the homogeneity of a sample. If the sample is completely homogeneous, the entropy is zero; if the sample is equally divided, it has an entropy of one.

Information gain: The information gain is mainly based on the decrease in the entropy after the dataset is split on an attribute. Constructing a decision tree is about finding an attribute that returns the highest information gain.

Gain(T, X) = Entropy(T) – Entropy(T,X)
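Both quantities can be sketched in a few lines (the toy labels below are an assumption):

```
import math

def entropy(labels):
    # H = -sum(p * log2(p)) over each class proportion p
    total = len(labels)
    probs = [labels.count(c) / total for c in set(labels)]
    return sum(-p * math.log2(p) for p in probs)

# homogeneous sample -> entropy 0; evenly split sample -> entropy 1
print(entropy(['yes'] * 6))               # 0.0
print(entropy(['yes'] * 3 + ['no'] * 3))  # 1.0

def information_gain(parent, subsets):
    # Gain(T, X) = Entropy(T) - weighted average entropy of the child subsets
    total = len(parent)
    weighted = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(parent) - weighted

parent = ['yes'] * 3 + ['no'] * 3
print(information_gain(parent, [['yes'] * 3, ['no'] * 3]))  # 1.0, a perfect split
```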


### 21. What is Collaborative filtering?

Collaborative filtering can be defined as the process of filtering for information or patterns by using techniques involving collaboration among multiple agents, data sources, viewpoints, etc.

Applications of collaborative filtering basically involve very large data sets.

Collaborative filtering methods have been applied to various kinds of data, including sensing and monitoring data, like mineral exploration, environmental sensing over large areas, or multiple sensors.

### 22. What are Recommender Systems? Explain?

A recommender system, also known as a recommendation system, is a subclass of information filtering system that predicts the rating or preference a user would give to an item.

Recommender systems are most widely used in movies, research articles, social tags, news, music, products, etc.

Recommender systems are also popular for specific topics like restaurants and online dating.

### 23. What is Selection Bias?

Selection bias is the bias introduced by selecting individuals, groups, or data for analysis in a way that does not achieve proper randomization, so the sample obtained is not representative of the population intended to be analyzed. It is also referred to as the selection effect. It is a distortion of statistical analysis resulting from the method of collecting samples. If selection bias is not taken into account, some conclusions of the study may not be accurate.

The types of selection bias are:

- Sampling bias: A systematic error due to a non-random sample of a population, causing some members of the population to be less likely to be included than others, resulting in a biased sample.
- Time interval: A trial may be terminated early at an extreme value (often for ethical reasons), but the extreme value is likely to be reached by the variable with the largest variance, even if all variables have a similar mean.
- Data: When specific subsets of data are chosen to support a conclusion, or bad data are rejected on arbitrary grounds instead of according to previously stated or generally agreed criteria.
- Attrition: A kind of selection bias caused by the loss of participants over the course of a study.

### 24. Can you write a function that takes in two sorted lists and outputs a sorted list that is their union?

```
# Python3 code to combine two sorted lists using sorted()

# initializing two sorted lists
test_list1 = [1, 2, 9, 11, 12]
test_list2 = [3, 4, 7, 8, 10]

# printing the original lists
print("The original list 1 is as follows : " + str(test_list1))
print("The original list 2 is as follows : " + str(test_list2))

# using sorted() to combine the two sorted lists
res = sorted(test_list1 + test_list2)

# printing the result
print("The combined sorted list is as follows : " + str(res))
```

```
Output:
The original list 1 is as follows : [1, 2, 9, 11, 12]
The original list 2 is as follows : [3, 4, 7, 8, 10]
The combined sorted list is as follows : [1, 2, 3, 4, 7, 8, 9, 10, 11, 12]
```

### 25. What is dimensionality reduction?

The number of given input variables or features for a dataset is known as dimensionality. Dimensionality reduction is a technique or process that reduces the number of input variables in a particular dataset.
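As a minimal sketch of one popular dimensionality reduction technique, Principal Component Analysis (the toy data and the choice of two components are assumptions), implemented here with NumPy's SVD:

```
import numpy as np

# toy dataset: 5 samples, 3 features
X = np.array([[2.0, 0.0, 1.0],
              [4.0, 1.0, 3.0],
              [6.0, 2.0, 5.0],
              [8.0, 3.0, 7.0],
              [10.0, 4.0, 9.0]])

# center the data, then project onto the top principal components
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_reduced = X_centered @ Vt[:2].T  # keep 2 of the 3 input dimensions

print(X_reduced.shape)  # (5, 2)
```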

### 26. What is a confusion matrix?

It is defined as a performance measurement for a machine learning classification problem where the output can be two or more classes. It is basically a table with four different combinations of the predicted and the actual values.

It is mainly useful for measuring Recall, Accuracy, Precision, Specificity, and, most importantly, the AUC-ROC curve.

True Positive: Records where the actual values are true and the predicted values are also true.

False Negative: It denotes all of those records where the given actual values are true, but the given predicted values are false.

False Positive: Here, the given actual values are false, but the given predicted values are true.

True Negative: Here, the given actual values are false, and the given predicted values are also false.

### 27. Can you explain TF/IDF vectorization?

TF-IDF stands for Term Frequency–Inverse Document Frequency. It is a very popular algorithm for transforming text into a meaningful numerical representation, which can then be fed to a machine learning algorithm for prediction.
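A toy sketch of the computation (the corpus is an assumption, and libraries such as scikit-learn apply slightly different smoothing and normalization):

```
import math

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]
tokenized = [d.split() for d in docs]

def tf_idf(term, doc_tokens, corpus):
    tf = doc_tokens.count(term) / len(doc_tokens)   # term frequency in this document
    doc_freq = sum(1 for d in corpus if term in d)  # how many documents contain the term
    idf = math.log(len(corpus) / doc_freq)          # inverse document frequency
    return tf * idf

# "the" appears in many documents, so its weight is low;
# "cat" is more distinctive, so it scores higher
print(tf_idf("the", tokenized[0], tokenized))
print(tf_idf("cat", tokenized[0], tokenized))
```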

### 28. Can you write a function that, when called with a confusion matrix for a binary classification model, returns a dictionary with its precision and recall?

```
def calculate_precision_and_recall(matrix):
    # matrix layout: [[TP, FP], [FN, TN]]
    true_positive = matrix[0][0]
    false_positive = matrix[0][1]
    false_negative = matrix[1][0]
    return {
        'precision': true_positive / (true_positive + false_positive),
        'recall': true_positive / (true_positive + false_negative)
    }
```

### 29. Can you write the code to calculate the accuracy of a binary classification algorithm using its confusion matrix?

```
def accuracy_score(matrix):
    # matrix layout: [[TP, FP], [FN, TN]]
    true_positives = matrix[0][0]
    true_negatives = matrix[1][1]
    total_observations = sum(matrix[0]) + sum(matrix[1])
    return (true_positives + true_negatives) / total_observations
```
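For a concrete check (the counts below are an illustrative assumption), the same quantities can be computed directly from a matrix laid out as [[TP, FP], [FN, TN]]:

```
# sample binary confusion matrix, laid out as [[TP, FP], [FN, TN]]
matrix = [[40, 10],
          [5, 45]]

tp, fp = matrix[0]
fn, tn = matrix[1]

precision = tp / (tp + fp)                  # 40 / 50 = 0.8
recall = tp / (tp + fn)                     # 40 / 45, roughly 0.889
accuracy = (tp + tn) / (tp + fp + fn + tn)  # 85 / 100 = 0.85

print(precision, round(recall, 3), accuracy)
```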

### 30. Can you explain stacking in Data Science?

Model stacking is an efficient ensemble method in which the predictions generated by different machine learning algorithms are used as inputs to a second-layer learning algorithm. This second-layer algorithm is trained to optimally combine the model predictions to form a new set of predictions.


### 31. Can you explain content-based filtering in recommender systems?

Content-based filtering uses item features to recommend other items similar to what the user likes, based on their previous actions or explicit feedback.

### 32. Explain how to handle missing data in data science?

When dealing with the missing data, data scientists make use of two primary methods in order to solve the error.

The imputation method develops a reasonable guess for the missing data. It is mostly used when the percentage of missing data is low. If the proportion of missing data is very high, however, the imputed results lack the natural variation needed to produce an effective model.

The next option is to remove the data. When dealing with data that is missing at random, the corresponding data can be deleted to reduce bias. Removing data is not the best option if there are not enough observations left to yield a reliable analysis. In certain situations, observation of particular events or factors may be required.

### 33. Explain the differences between an error and a residual error?

An error is the difference between the observed values and the true values.

A residual is the difference between the observed values and the values predicted by the model.

The error is a theoretical concept that is never observed, whereas the residual is a real-world value that is calculated each time a regression is done.

### 34. Can you explain the SVM algorithm in detail?

Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression challenges, though it is most commonly used in classification problems. In the SVM algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. We then perform classification by finding the hyperplane that best separates the two classes.

### 35. What is Precision?

Precision in data science is the number of true positives divided by the number of true positives plus the number of false positives.

### 36. What is Deep Learning?

Deep learning is defined as a subset of machine learning in which the data goes through various non-linear transformations to obtain a specified output. Deep here refers to multiple steps in this case. The output obtained in one step is the input for another step, and this is done continuously to get the specified final output.

Deep learning is also called deep neural network (DNN) learning because it is implemented with multi-layered artificial neural networks.

### 37. What is the benefit of dimensionality reduction?

The benefits of dimensionality reduction are listed below:

- It is used to reduce the required time and storage space.
- The removal of multicollinearity by dimensionality reduction improves the interpretation of parameters of the machine learning model.
- It has made it easy to visualize the data whenever it is reduced to a very low dimension such as 2D or 3D.
- It removes noise, thus providing a simpler explanation.
- It mitigates the “curse of dimensionality.”

### 38. What is the ROC curve?

A receiver operating characteristic curve, known as the ROC curve, is a graphical plot that demonstrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. The method was originally developed for operators of military radar receivers, which is why it is named the ROC curve.

### 39. What is a normal distribution?

The normal distribution is a core concept in statistics and a backbone of data science. When we perform exploratory data analysis, we first explore the data and then aim to find its probability distribution, and the most commonly used probability distribution is the normal distribution.

The normal distribution takes the form of a bell-shaped curve in which the mean is equal to the median.


### 40. Explain k-fold cross-validation?

K-fold cross-validation is a way to improve on the holdout method. It ensures that the model's score does not depend on how we picked the train and test sets. The data set is divided into k subsets, and the holdout method is repeated k times, with a different subset held out each time. It is used to evaluate machine learning models on a limited data sample.
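The splitting logic can be sketched as follows (10 samples and k = 5 are assumptions; library helpers such as scikit-learn's KFold also handle shuffling and uneven splits):

```
samples = list(range(10))  # indices of a toy dataset
k = 5
fold_size = len(samples) // k

folds = []
for i in range(k):
    # each subset serves once as the held-out test set
    test_fold = samples[i * fold_size:(i + 1) * fold_size]
    train_fold = samples[:i * fold_size] + samples[(i + 1) * fold_size:]
    folds.append((train_fold, test_fold))

# every sample appears in exactly one test fold across the k rounds
print([test for _, test in folds])  # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
```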

### 41. Can you explain why we have to use the summary function?

Summary functions are used to produce a summary of all records that are found in the dataset or the sub summary values for the records in various groups. Formulas can contain multiple summary functions. Compared to other functions, Summary functions calculate more slowly because they generate values for a range of records.

### 42. Why do we use p-value?

A p-value is defined as a measure of the probability that an observed difference could have occurred just by random chance. P-value is used as an alternative to or in addition to pre-selected confidence levels for hypothesis testing.

### 43. Can you explain the kernel function in SVM?

SVM algorithms make use of a set of mathematical functions that are known as the kernel. The function of the kernel is to take the data as input and transform the data into the required form—for example, linear, nonlinear, radial basis function (RBF), polynomial, and sigmoid.

### 44. Explain the skills that are important to become a certified Data Scientist?

The skills that a certified data scientist should possess are listed below:

- Fundamentals of Data Science
- Good command of Statistics: statistics is often called the grammar of data science.
- Sound Programming knowledge: it provides a way to communicate with the machine language.
- Data Manipulation and Analysis
- Data Visualization: one has to be familiar with plots like Histogram, pie charts, Bar charts and then move to the advanced charts like waterfall charts, thermometer charts, etc
- Machine Learning: It is used to build predictive models, and it is one of the core skills that a data scientist should possess.
- Deep Learning
- Big Data: Due to the large amount of data generated by the internet, we adopt Big Data technologies so that this data is stored properly and efficiently and can be used whenever needed.
- Software Engineering
- Model Deployment: It is one of the most under-rated steps in the machine learning lifecycle
- Communication Skills
- Storytelling Skills: One of the most important skills a data scientist can acquire.
- Structured Thinking: A Data Scientist should always look at the problems from different perspectives.
- Curiosity: One should have the curiosity to learn more and discover new things.

### 45. What is the full form of LSTM? Explain its function?

The full form of LSTM is Long Short-Term Memory. LSTM is an artificial recurrent neural network (RNN) architecture used in the field of deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections.

LSTM can process not only single data points, like images, but also entire sequences of data, like speech or video.

For example, LSTM is applicable to tasks like unsegmented, connected handwriting recognition and anomaly detection in network traffic or IDSs (intrusion detection systems).

### 46. What is the term variance in Data Science?

Variance in data science is a numerical value that shows how widely the individual figures in a data set are spread around the mean; it is the average of the squared differences of each value from the mean.
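A quick worked example (the data values are an assumption):

```
data = [4, 8, 6, 5, 3, 2, 8, 9, 2, 5]

mean = sum(data) / len(data)
# population variance: average squared deviation from the mean
variance = sum((x - mean) ** 2 for x in data) / len(data)

print(mean, variance)  # mean 5.2, variance 5.76
```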

### 47. What is the cost function in Data science?

The cost function in Data science is a function that is used to measure the performance of the Machine Learning model for any given data. The Cost Function quantifies the error between the predicted values and the expected values and finally presents it in a single real number.
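For example, the widely used mean squared error cost (the sample predictions and targets are an assumption):

```
# mean squared error: a common cost function for regression models,
# condensing all prediction errors into a single real number
def mse(predicted, expected):
    n = len(predicted)
    return sum((p - e) ** 2 for p, e in zip(predicted, expected)) / n

print(mse([2.5, 0.0, 2.0], [3.0, -0.5, 2.0]))  # (0.25 + 0.25 + 0) / 3
```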

### 48. Can you explain the term Logistic Regression?

Logistic regression in data science is a classification algorithm used to assign observations to a discrete set of classes. A few examples of classification problems are: classifying online transactions as fraud or not fraud, tumors as malignant or benign, and emails as spam or not spam. Logistic regression transforms its output using the logistic sigmoid function in order to return a probability value.
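A minimal sketch of the sigmoid step (the weight, bias, and input are illustrative assumptions; real models learn the weights from data):

```
import math

def sigmoid(z):
    # squashes any real number into the (0, 1) range, read as a probability
    return 1 / (1 + math.exp(-z))

# a toy linear score w*x + b passed through the sigmoid
w, b = 2.0, -1.0
x = 1.5
probability = sigmoid(w * x + b)
predicted_class = 1 if probability >= 0.5 else 0

print(round(probability, 3), predicted_class)
```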

### 49. Explain the term Random forest model?

Random forest is a supervised learning algorithm. The "forest" it builds is an ensemble of decision trees, usually trained with the bagging method. The general idea behind bagging is that a combination of learning models improves the overall result.

### 50. Explain the bias-variance trade-off in Data Science?

- Bias is defined as the simplifying assumptions that are made by the model to make the target function easy to approximate.
- Variance is the amount by which the estimate of the target function would change given different training data.
- The trade-off is defined as the tension between the error that is introduced by the bias and the variance.


### 51. Can you explain Univariate analysis?

Univariate analysis can be defined as the most basic form of the statistical data analysis technique. When the data or information contains only one variable and does not deal with the cause or effect of the relationship, then we make use of the Univariate analysis technique.

For example, in a survey, the researcher may be looking to count the number of adults and kids. In this example, the data reflect the number (a single variable) and its quantity, as shown in the below table.

The objective of Univariate analysis is simply to describe the data and find patterns within it. This is done by examining the mean, median, mode, dispersion, variance, range, standard deviation, etc.

The Univariate analysis is conducted in several ways, which are mostly descriptive in nature.

- Frequency Distribution Tables
- Histograms
- Frequency Polygons
- Pie Charts
- Bar Charts

### 52. Can you explain Bivariate analysis?

Bivariate analysis is a bit more analytical than Univariate analysis. When the data set consists of two variables and the researcher aims to make comparisons between them, Bivariate analysis can be used.

For example, in a survey, the researcher may be looking to analyze the ratio of students who scored above 95% in relation to their gender. In this case, there are two variables: gender, X (independent variable), and result, Y (dependent variable). The bivariate analysis then measures the correlation between the two variables, as shown in the table below.

### 53. Can you explain Multivariate analysis?

Multivariate analysis can be defined as a more complex form of the statistical analysis technique, and it is mostly used when there are multiple variables in the data set.

### 54. Can you name the commonly used multivariate analysis techniques?

The most commonly used multivariate analysis techniques are listed below:

- Factor Analysis
- Cluster Analysis
- Variance Analysis
- Discriminant Analysis
- Multidimensional Scaling
- Principal Component Analysis
- Redundancy Analysis

### 55. Explain Regression analysis?

Regression analysis is mainly used for estimating the relationships between variables. It includes techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables.

It helps us to understand how the value of the dependent variable is changed when any one of the independent variables is changed.

It is mainly used for advanced data modeling purposes such as prediction and forecasting.

A few of the regression techniques used are listed below:

- Linear regression
- Simple regression
- Polynomial regression
- General linear model
- Discrete choice
- Binomial regression
- Binary regression
- Logistic regression

### 56. How is Data modeling different from Database design? Explain?

The Data Model is defined as a set of abstraction mechanisms used to represent the part of the reality to build a database. For example, in the Entity-Relationship Data Model, we can represent the reality with Entities and the Relationships between them; in the Object-Oriented Data Model we can represent the reality through Objects and the related mechanisms of Aggregation Class and Inheritance; in the Relational Data Model, the reality is represented through tables with the help of keys, foreign keys and other types of constraints, etc.

The Database Model is the name of the model of reality, built with a specific Data Model, which means it is related to a particular schema in a certain Database Management System that represents a specific reality. For example, in a Database Model for a school, you have the entities Students, Faculty, with several other associations among them, and each of them contains a certain set of attributes.

### 57. Can you explain how Data Science and Machine Learning related to each other?

Data science is a field that aims to make use of a scientific approach to extract the meaning and insights from the given data. In simple terms, Data science is defined as a combination of information technology, business management, and modeling.

Machine learning refers to a group of techniques that are used by data scientists, which allow computers to learn from the data. These techniques are designed to produce results in such a way that they perform well without explicit programming rules.

### 58. Can you tell us the full form of GAN? Explain GAN?

GAN stands for Generative Adversarial Network, an exciting recent innovation in machine learning. GANs are generative models that create new data instances resembling the training data.

For example, GANs create images that will look like photographs of human faces, even though the faces do not belong to any person in reality.

### 59. What is the term Ensemble learning in Machine Learning?

Ensemble methods can be defined as machine learning techniques that are used to combine several base models to produce one optimal predictive model.

### 60. Explain the term Activation function?

In the neural networks, the activation function is used for transforming the summed weighted input from the given node into the activation of the node or output for that input. Here, the rectified linear activation function helps to overcome the vanishing gradient problem, thus allowing the models to perform better.

Types of activation functions are listed below:

Step Function: It is the simplest kind of activation function.

Here, we consider a threshold value, and if the value of the net input, say y, is greater than the threshold, then we activate the neuron.

Mathematically it is represented as:

f(x) = 1, if x>=0

f(x) = 0, if x<0

Sigmoid Function: It is defined as f(x) = 1 / (1 + e^(-x))

ReLU: It is defined as f(x) = max(0, x)

Leaky ReLU: It is defined as

f(x) = ax, x<0

f(x) = x, otherwise
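The four functions above can be sketched directly (the leaky slope a = 0.01 is a common default, assumed here):

```
import math

# illustrative implementations of the activation functions described above
def step(x):
    return 1 if x >= 0 else 0

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def relu(x):
    return max(0, x)

def leaky_relu(x, a=0.01):
    return a * x if x < 0 else x

print(step(-2), step(3))        # 0 1
print(sigmoid(0))               # 0.5
print(relu(-5), relu(5))        # 0 5
print(leaky_relu(-5), leaky_relu(5))
```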


### 61. Explain the term Batch normalization in Data Science?

The idea is that instead of normalizing only the inputs to the network, we normalize the inputs to layers within the network. It is called batch normalization because, during training, we normalize each layer's inputs using the mean and variance of the values in the current mini-batch.
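A sketch of the normalization step with NumPy (the toy mini-batch is an assumption; a real batch-norm layer also learns a scale gamma and a shift beta, omitted here):

```
import numpy as np

# toy mini-batch: 4 samples, 3 features
batch = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0],
                  [3.0, 6.0, 9.0],
                  [4.0, 8.0, 12.0]])

# normalize each feature using the mini-batch mean and variance
mean = batch.mean(axis=0)
var = batch.var(axis=0)
eps = 1e-5  # avoids division by zero
normalized = (batch - mean) / np.sqrt(var + eps)

print(normalized.mean(axis=0).round(6))  # each feature now has mean ~0
```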

### 62. Explain about Autoencoders?

An autoencoder is an unsupervised artificial neural network that learns how to efficiently compress and encode data, and then learns how to reconstruct the data from the compressed encoded representation back to a representation as close to the original input as possible.

They are used for either dimensionality reduction or as a generative model, which means that they can generate new data from the given input data.

### 63. Name the different kinds of Ensemble learning?

The different kinds of Ensemble learning are given below:

- Bayes optimal classifier
- Bootstrap aggregating (bagging)
- Boosting
- Bayesian model averaging
- Bayesian model combination
- Bucket of models
- Stacking

### 64. Can you explain the role of data cleaning in data analysis?

Data cleaning can be defined as the process of preparing the data for analysis by modifying or removing data that is incorrect, irrelevant, incomplete, duplicated, or improperly formatted. Such data is not helpful for analysis because it hinders the process and produces inaccurate or misleading results.

### 65. Explain the term hyperparameters?

In machine learning, a hyperparameter can be defined as a parameter whose value controls the learning process. In contrast, the values of other parameters are derived through training.

### 66. Explain the different steps in LSTM?

The different steps to build and use an LSTM model (here, in Keras) are listed below:

- Define Network: Neural networks in Keras are defined as a sequence of layers; the container for these layers is the Sequential class. The first step is to create an instance of the Sequential class; then create the layers and add them in the order in which they should be connected.
- Compile Network: Compilation is an efficiency step. It transforms the simple sequence of layers we defined into a highly efficient series of matrix transforms, in a format that executes on your GPU or CPU, depending on the Keras configuration.
- Fit Network: Once we compile the network, it can be fit, which means adapting the weights to a training dataset.
- Evaluate Network: Once the network is trained, it has to be evaluated. It can be evaluated on the training data, but this does not provide a useful indication of its performance as a predictive model; it should be evaluated on data it did not see during training.
- Make Predictions: When we are satisfied with the performance of the fitted model, we can use it to make predictions on new data. This is done simply by calling the predict() function.

### 67. Can you make a comparison between the validation set and the test set?

A validation set is used to select the proper parameters (such as hyperparameters) of the system; it is carved out of the training set.

The test set is held out and used only to assess the accuracy of the final system.

### 68. Could you please draw a comparison between overfitting and underfitting?

Overfitting is related to a model that models the training data too well. Overfitting usually happens when the model learns the detail and noise in the training data to a certain extent that has a negative impact on the performance of the model on new data.

Meaning that the noise or the random fluctuations present in the training data are picked up, and they are learned as concepts by the model.

The problem here is that these concepts don’t apply to the new data, and they have a negative impact on the model’s ability to generalize.

Underfitting refers to a model that neither models the training data well nor generalizes to new data. An underfit machine learning model is not a suitable model, as it will have poor performance even on the training data.

### 69. Can you please explain the various steps involved in an analytic project?

There are seven fundamental steps to complete a Data Analytics project, and they are listed below:

- Understand the Business
- Get Your Data
- Explore and Clean Your Data
- Enrich Your Dataset
- Build Helpful Visualizations
- Get Predictive
- Iterate, Iterate, Iterate

### 70. Can you explain Eigenvectors and Eigenvalues?

Eigenvalues and eigenvectors are fundamental concepts in linear algebra and are used frequently in scientific computing.

An eigenvector of a matrix is a non-zero vector whose direction is unchanged when the matrix is applied to it; eigenvectors are conventionally scaled to unit length (magnitude 1).

An eigenvalue is the coefficient by which the corresponding eigenvector is stretched or shrunk, i.e., A·v = λ·v.
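
A short sketch with NumPy makes the defining relation A·v = λ·v concrete (the 2×2 matrix here is an arbitrary illustrative example):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])

# Columns of `eigenvectors` are the (unit-length) eigenvectors of A.
eigenvalues, eigenvectors = np.linalg.eig(A)

for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    lam = eigenvalues[i]
    # The matrix only stretches the eigenvector by the eigenvalue.
    assert np.allclose(A @ v, lam * v)
    # NumPy normalizes eigenvectors to magnitude 1.
    assert np.isclose(np.linalg.norm(v), 1.0)

print(sorted(eigenvalues))  # [2.0, 3.0]
```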


### 71. Explain the goal of A/B Testing?

A/B Testing is statistical hypothesis testing for a randomized experiment with two variants, A and B. The main goal of A/B testing is to maximize the likelihood of an outcome of interest by identifying which changes (for example, to a webpage) improve it. A/B testing can be employed to test everything from sales emails to website copy and search ads.

### 72. Explain the terms cluster sampling and systematic sampling?

Systematic sampling selects a random starting point in the given population and then takes sample units at regular fixed intervals through the population, with the interval depending on its size.

Cluster sampling divides the population into clusters and then randomly selects some of those clusters; either every unit in the selected clusters is sampled, or a simple random sample is taken within each selected cluster.

We have two types of cluster sampling:

- one-stage cluster sampling
- two-stage cluster sampling.
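
The two schemes can be sketched with the standard library alone. The population of 100 units, the interval `k`, and the number of chosen clusters are all illustrative assumptions:

```python
import random

population = list(range(100))  # hypothetical population of 100 units

# Systematic sampling: random start, then every k-th unit.
k = 10                                   # sampling interval (assumed)
start = random.randrange(k)
systematic_sample = population[start::k]

# One-stage cluster sampling: split into clusters, randomly pick whole clusters.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
chosen = random.sample(clusters, 3)      # keep every unit of 3 random clusters
cluster_sample = [unit for c in chosen for unit in c]

print(len(systematic_sample), len(cluster_sample))  # 10 30
```

In the two-stage variant, the last step would instead draw a simple random sample from each chosen cluster rather than keeping every unit.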

### 73. What are tensors?

Tensors are a type of data structure used in linear algebra; like vectors and matrices, tensors support arithmetic operations.

They are a generalization of scalars, vectors, and matrices, and they are represented as n-dimensional arrays.
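
In NumPy this generalization is direct: scalars, vectors, matrices, and higher-order tensors are all n-dimensional arrays, and the rank (order) is simply the number of axes. A quick sketch:

```python
import numpy as np

scalar = np.array(5.0)                  # rank 0
vector = np.array([1.0, 2.0, 3.0])      # rank 1
matrix = np.ones((2, 3))                # rank 2
tensor3 = np.zeros((2, 3, 4))           # rank 3

for t in (scalar, vector, matrix, tensor3):
    print(t.ndim, t.shape)

# Arithmetic works elementwise on tensors, just as with vectors and matrices.
print((tensor3 + 1.0).sum())  # 24.0
```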

### 74. Explain outlier values and how do you treat them?

Outlier values are data points in statistics that do not appear to belong to the population from which the rest of the sample was drawn. An outlier is an abnormal observation that lies far from the other values in the set.

To deal with outliers, you can follow these steps:

- You have to set up a filter in your testing tool
- Remove or change outliers during post-test analysis
- Change the value of outliers
- Consider the underlying distribution
- Consider the value of mild outliers

### 75. Name the vital components of GAN?

The vital components of GAN are listed below:

- Generator
- Discriminator

### 76. Can you explain the difference between Batch and Stochastic Gradient Descent?

| Batch Gradient Descent | Stochastic Gradient Descent |
|---|---|
| Each update processes the complete dataset, so a single step is computationally heavy. | Each update processes a single sample, so a single step is cheap. |
| It updates the weights slowly (once per pass over the data). | It updates the weights more frequently (once per sample). |
| It computes the gradient using the complete available dataset. | It computes the gradient using only a single sample. |
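
The contrast can be sketched on a toy one-parameter linear model, y = 3·x, fitted by minimizing squared error. The data, learning rates, and epoch counts below are illustrative assumptions; both variants should approach the true slope 3.0:

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.uniform(-1, 1, size=200)
y = 3.0 * x

def batch_gd(w=0.0, lr=0.5, epochs=50):
    # One update per epoch: the gradient is averaged over the ENTIRE dataset.
    for _ in range(epochs):
        grad = np.mean(2 * (w * x - y) * x)
        w -= lr * grad
    return w

def stochastic_gd(w=0.0, lr=0.1, epochs=5):
    # One update per SAMPLE: many cheap, noisy steps.
    for _ in range(epochs):
        for i in rng.permutation(len(x)):
            grad = 2 * (w * x[i] - y[i]) * x[i]
            w -= lr * grad
    return w

print(round(batch_gd(), 3), round(stochastic_gd(), 3))
```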

### 77. Python or R Which among them would you prefer for text analytics?

Python, because its Pandas library provides easy-to-use data structures along with high-performance data analysis tools.

### 78. What is the Computational Graph?

A computational graph is a way of representing a mathematical function in the language of graph theory. Nodes correspond to input values or to operations that combine them, and intermediate values flow along the edges of the graph between nodes.

### 79. Explain the terms Interpolation and Extrapolation?

Extrapolation is defined as an estimation of the value based on extending the known sequence of values or the facts beyond the area that is certainly known.

Interpolation is an estimation of the value within the two known values in the sequence of values.

### 80. Can you explain what P-value signifies about the statistical data?

- If P-value > 0.05, it denotes weak evidence against the null hypothesis, which means you cannot reject the null hypothesis.
- If P-value < 0.05, it denotes strong evidence against the null hypothesis, so you can reject the null hypothesis.
- If P-value = 0.05, it is a marginal value, indicating that it could go either way.
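
This conventional decision rule can be written as a small helper function. It is a sketch of the rule of thumb above (the function name and the default significance level `alpha = 0.05` are assumptions, not a standard API):

```python
def interpret_p_value(p, alpha=0.05):
    # Conventional three-way reading of a p-value against alpha.
    if p < alpha:
        return "strong evidence against the null hypothesis: reject it"
    if p > alpha:
        return "weak evidence against the null hypothesis: fail to reject it"
    return "marginal: on the boundary, could go either way"

print(interpret_p_value(0.01))
print(interpret_p_value(0.30))
```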


### 81. Can you explain the box cox transformation in regression models?

The main purpose of Box-Cox transformations in regression is not to make the variables in the regression follow the normal distribution, but rather to make the effects of the variables additive.

A Box-Cox transformation is a statistical technique that transforms a non-normal dependent variable into a normal shape. Many statistical techniques assume normality, so if the given data are not normal, applying a Box-Cox transformation lets you run a broader range of tests.

### 82. Can you tell us the advantages and disadvantages of using regularization methods like Ridge Regression?

Advantages of using Ridge Regression are:

- You can avoid overfitting the model.
- They do not require unbiased estimators.
- They add enough bias to make the estimates reasonably reliable approximations to true population values.
- They still perform well in cases of large multivariate data with the number of predictors which is greater than the number of observations.

Disadvantages of Ridge regression are:

- It includes all the predictors in the final model.
- They are not able to perform feature selection.
- They shrink the coefficients towards zero.
- They trade the variance for bias.

### 83. How to assess a good logistic model?

- You can make use of the Classification Matrix to look at the true negatives and false positives.
- Concordance helps to identify the ability of the logistic model to differentiate between the event happening and the event not happening.
- Lift helps us to assess the logistic model by comparing it with some random selection.

### 84. Explain multicollinearity and how you can overcome it?

Multicollinearity happens when the independent variables in a regression model are correlated. Here, the correlation becomes a problem because the independent variables should be independent.

The below mentioned are the fixes to multicollinearity:

- The severity of the problems increases with the degree of multicollinearity. Therefore, make sure you have only moderate multicollinearity so that you may not need to resolve it.
- Multicollinearity affects only the specific independent variables that are interrelated. Thus, if multicollinearity is not present for independent variables that you are particularly interested in, then there is no need to resolve it.
- Multicollinearity affects the coefficients and the p-values, but it does not influence the predictions or the precision of the predictions. If your main goal is to make predictions and you do not need to understand the role of each independent variable, you may not need to reduce even severe multicollinearity.

### 85. Can you differentiate between func and func()?

| func | func() |
|---|---|
| Refers to the function object itself, without executing it. | Calls (invokes) the function and evaluates to its return value. |
| Can be assigned to a variable or passed as an argument to another function. | Executes the function body with the supplied arguments. |
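
A short sketch makes the distinction concrete (the names `greet` and `alias` are illustrative):

```python
def greet():
    return "hello"

alias = greet          # no call: `alias` now refers to the same function object
result = greet()       # the parentheses invoke the function

print(alias is greet)  # True  - both names point at one object
print(result)          # hello - the value the call returned
print(alias())         # hello - the alias is just as callable
```

Passing `func` (the object) rather than `func()` (its result) is exactly what higher-order functions like `map` and `sorted(key=...)` rely on.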

### 86. What do you understand by the term pickling in Python?

Pickling is defined as the process where the Python object hierarchy is being converted into a byte stream, and unpickling is defined as the inverse operation, where a byte stream is converted back into an object hierarchy.
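
A round trip with the standard-library `pickle` module shows both directions (the dictionary here is a made-up stand-in for some object hierarchy):

```python
import pickle

model_state = {"weights": [0.1, 0.2], "epochs": 10}   # hypothetical object

data = pickle.dumps(model_state)       # pickling: object -> byte stream
print(type(data))                      # <class 'bytes'>

restored = pickle.loads(data)          # unpickling: byte stream -> object
print(restored == model_state)         # True
```

`pickle.dump`/`pickle.load` do the same thing against a file object instead of an in-memory byte string.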

### 87. Name the different ranking algorithms?

Learning to Rank (LTR) is defined as a class of techniques that usually apply supervised machine learning (ML) to solve ranking problems.

The different ranking algorithms are listed below:

RankNet: The cost function for RankNet aims to minimize the number of inversions in the ranking. An inversion means an incorrect order among a pair of results, i.e., when a lower-rated result is ranked above a higher-rated result in the ranked list. RankNet optimizes its cost function using Stochastic Gradient Descent.

LambdaRank: Here, you do not need the costs. You need only the gradients (λ) of the cost with respect to the model score. We think of these gradients as little arrows that are attached to each document in the ranked list, thus indicating the direction we would like those documents to move.

LambdaMart: It is a combination of LambdaRank and MART, i.e., Multiple Additive Regression Trees. Where the MART uses gradient boosted decision trees for the prediction tasks, LambdaMART makes use of the gradient boosted decision trees using a cost function that is derived from LambdaRank for solving the ranking task. On the basis of experimental datasets, LambdaMART has shown better performance than the LambdaRank and the original RankNet.

### 88. Can you differentiate between a box plot and a histogram?

Histograms and box plots are the graphical representations for the frequency of numeric data values.

Their main purpose is to describe the data or information and explore the central tendency and variability before using advanced statistical analysis techniques.

Histograms are bar charts that show the frequency of a numerical variable's values, and they are used to approximate the probability distribution of the variable. They allow us to quickly understand the shape of the distribution, the variation, and potential outliers.

Boxplots are used to communicate different aspects of the distribution of data.

### 89. What is cross-validation?

Cross-validation is defined as a technique used for assessing how the statistical analysis generalizes to the independent data set. It is a technique used for evaluating machine learning models by training several models on the given subsets of the available input information and evaluating them on the basis of a complementary subset of the data.
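
The idea can be sketched in plain NumPy with k-fold cross-validation, using a deliberately trivial "model" (predicting the training mean). The value of `k`, the synthetic dataset, and the scoring by mean squared error are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
y = rng.normal(loc=10.0, scale=2.0, size=100)   # synthetic target values

def kfold_scores(y, k=5):
    indices = rng.permutation(len(y))
    folds = np.array_split(indices, k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        prediction = y[train_idx].mean()                 # "train" on k-1 folds
        mse = np.mean((y[test_idx] - prediction) ** 2)   # score on held-out fold
        scores.append(mse)
    return scores

scores = kfold_scores(y)
print(len(scores), round(float(np.mean(scores)), 2))
```

Each observation is used for evaluation exactly once, and the average of the k scores estimates how the model would generalize to independent data.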

### 90. How to define or select metrics?

The metrics depend on various factors like:

- Is it a regression or classification task?
- What is your business objective?
- What would be the distribution of the target variable?


### 90. Explain the term NLP?

NLP stands for Natural Language Processing. It is a subfield of linguistics, artificial intelligence, and computer science that is concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data.

### 91. Explain the advantages of Dimensionality Reduction?

The advantages of dimensionality reduction are listed below:

- It reduces the computation time.
- It takes care of multicollinearity, which improves the model's performance.
- It helps to remove redundant features.
- It speeds up similar downstream computations.

### 92. What is a kernel?

The kernel is usually referred to in the context of the kernel trick, a method that lets a linear classifier solve a non-linear problem. The kernel implicitly maps the data into a higher-dimensional feature space where linearly inseparable data become linearly separable.
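
A minimal sketch of the underlying idea: 1-D points labeled by their distance from the origin cannot be split by a single threshold on x, but the feature map φ(x) = x² (the mapping behind a simple polynomial kernel) makes one threshold suffice. The data and threshold are illustrative:

```python
import numpy as np

x = np.array([-3.0, -2.0, -0.5, 0.5, 2.0, 3.0])
labels = (np.abs(x) > 1.0).astype(int)    # class 1 = "far from origin"

phi = x ** 2                              # transformed feature
threshold = 1.0                           # a LINEAR boundary in feature space
predictions = (phi > threshold).astype(int)

print(np.array_equal(predictions, labels))  # True
```

Real kernel methods (e.g. kernelized SVMs) never compute φ explicitly; they only evaluate inner products in the feature space, which is what makes the trick cheap.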

### 93. Explain the term boosting?

In machine learning, boosting is an ensemble meta-algorithm used primarily to reduce bias, and also variance, in supervised learning. It belongs to a family of machine learning algorithms that convert weak learners into stronger ones.

### 94. Can you describe Markov chains?

A Markov chain is defined as a stochastic model that describes a sequence of possible events where the probability of each event depends mainly on the state that is attained in the previous event.

### 95. Define “Central Limit Theorem”?

The central limit theorem states that if we have a population with mean μ and standard deviation σ and take sufficiently large random samples from the population with replacement, then the distribution of the sample means will be approximately normally distributed.
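
A quick simulation illustrates the theorem: sample means drawn from a decidedly non-normal Uniform(0, 1) population cluster around the population mean μ = 0.5 with spread σ/√n. The sample size n = 100 and the number of repetitions are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(7)
sigma = (1 / 12) ** 0.5          # std of Uniform(0, 1)
n = 100                          # sample size

# 10,000 repeated samples of size n, reduced to their means.
sample_means = rng.uniform(0, 1, size=(10_000, n)).mean(axis=1)

print(round(float(sample_means.mean()), 2))   # close to mu = 0.5
print(round(float(sample_means.std()), 3))    # close to sigma / sqrt(n)
```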

### 96. Explain the term statistical power?

Statistical power refers to the power of a hypothesis test: the probability that the test correctly rejects the null hypothesis when the alternative hypothesis is true. In other words, it is the probability of a true positive result.

### 97. Can you name the three types of biases that can occur during sampling?

- Selection bias
- Undercoverage bias
- Survivorship bias

### 98. What is bias?

In Data Science, bias is defined as a deviation from expectation in the given data. In simple terms, bias refers to an error in the data. But, the error often goes unnoticed.

### 99. Can you explain ‘Naive’ in a Naive Bayes algorithm?

The Naive Bayes algorithm is based on Bayes' Theorem, which specifies the probability of an event based on prior knowledge of conditions that might be related to that event. It is called "naive" because it assumes that the features are conditionally independent of one another given the class, an assumption that rarely holds exactly but keeps the model simple and fast.
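
The "naive" independence assumption means the joint likelihood of the features is just the product of per-feature likelihoods. A tiny spam-filter sketch (all probabilities below are made-up illustrative numbers):

```python
priors = {"spam": 0.4, "ham": 0.6}
# P(word | class) for two words, assumed independent given the class:
likelihood = {
    "spam": {"free": 0.8, "meeting": 0.1},
    "ham":  {"free": 0.2, "meeting": 0.7},
}

def unnormalized_posterior(words, cls):
    p = priors[cls]
    for w in words:
        p *= likelihood[cls][w]   # naive step: multiply independent factors
    return p

scores = {cls: unnormalized_posterior(["free", "meeting"], cls) for cls in priors}
best = max(scores, key=scores.get)
print(best)   # ham  (0.6 * 0.2 * 0.7 = 0.084 beats 0.4 * 0.8 * 0.1 = 0.032)
```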

### 100. What is Back Propagation?

Back-propagation is the essence of neural network training. It is the method that tunes the weights of a neural network based on the error rate obtained in the previous epoch (iteration). Proper tuning of the weights reduces error rates and makes the model more reliable by improving its generalization.
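
The mechanics can be sketched on the smallest possible "network": a single linear neuron ŷ = w·x trained on one example with squared-error loss. The gradient is propagated from the loss back to the weight via the chain rule, and the weight is nudged against it (the data point and learning rate are illustrative):

```python
x, y = 2.0, 8.0          # one training example (the true weight is 4)
w, lr = 0.0, 0.05

for epoch in range(100):
    y_hat = w * x                 # forward pass
    d_loss = 2 * (y_hat - y)      # d(loss)/d(y_hat) for squared error
    grad_w = d_loss * x           # chain rule: back-propagate to the weight
    w -= lr * grad_w              # gradient descent update

print(round(w, 3))  # converges to ~4.0
```

In a multi-layer network the same chain-rule step is applied layer by layer, from the output back toward the input.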

## Frequently Asked Questions

### Name the commonly used supervised learning algorithms?

Commonly used supervised learning algorithms include decision trees, logistic regression, and support vector machines.

## Conclusion

Good luck with your Data Science interview, and we hope our Data Science Interview Questions and Answers were of some help to you. You can also check out our **Call Center Interview Questions and Answers**, which might help you.