Data Science Multiple Choice Questions (MCQs) and Answers

Practice Data Science with this curated collection of multiple choice questions. The questions range from basic to advanced, cover the core Data Science concepts, and are suited to placement and interview preparation.

Q61 A pie chart in Matplotlib displays incorrect proportions. What could be the issue?

A

Wrong data labels

B

Missing data

C

Incorrect sum of values

D

Invalid chart type
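
To see how the sum of the values drives wedge sizes, here is a minimal sketch in plain Python. The helper name `pie_fractions` and the sample values are hypothetical; the assumption is that each wedge is drawn in proportion to its value divided by the total, which is how Matplotlib's `pie` behaves when it normalizes its input.

```python
def pie_fractions(values):
    """Return the share of the whole pie that each value would occupy."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    return [v / total for v in values]


# An unintended extra value (here 100) changes the total and therefore
# shrinks every other wedge.
fractions = pie_fractions([30, 50, 20, 100])
print(fractions)
```

Checking the total of the input values before plotting is a quick way to spot a stray or double-counted entry.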

Q62 A scatter plot shows overlapping points, making it hard to interpret. What can improve its readability?

A

Increase marker size

B

Add jitter

C

Use smaller axes

D

Change chart type
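
Jitter can be sketched without any plotting library. The helper `add_jitter`, its `scale` parameter, and the sample data below are hypothetical; the idea is only to add a small random offset to each coordinate so that identical points no longer sit on top of each other.

```python
import random


def add_jitter(values, scale=0.1, seed=0):
    """Return a copy of values with uniform noise in [-scale, scale] added."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    return [v + rng.uniform(-scale, scale) for v in values]


x = [1, 1, 1, 2, 2, 3]
x_jittered = add_jitter(x, scale=0.1)
print(x_jittered)
```

The jittered values can then be passed to a scatter plot in place of the originals; the scale should stay small relative to the spacing of the data so the plot is not misleading.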

Q63 A line chart is difficult to interpret due to too many data points. What is the best approach to simplify it?

A

Aggregate data

B

Remove the chart

C

Use larger axes

D

Switch to bar chart
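
One way to aggregate a dense series before plotting is to average it in fixed-size buckets. The function name `aggregate_by_bucket` and the sample series are hypothetical, and other aggregations (sums, medians, resampling by date) would work equally well.

```python
from statistics import mean


def aggregate_by_bucket(values, bucket_size):
    """Average consecutive groups of bucket_size values."""
    return [
        mean(values[i:i + bucket_size])
        for i in range(0, len(values), bucket_size)
    ]


daily = list(range(1, 31))           # 30 raw data points
bucketed = aggregate_by_bucket(daily, 10)
print(bucketed)                      # [5.5, 15.5, 25.5]
```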

Q64 What is the primary objective of machine learning?

A

To clean data

B

To make predictions based on data

C

To create databases

D

To improve hardware

Q65 Which of the following is a supervised learning algorithm?

A

K-Means

B

Decision Trees

C

DBSCAN

D

Principal Component Analysis

Q66 What is overfitting in machine learning?

A

Model performs poorly on training data

B

Model performs well on training data but poorly on new data

C

Model is too simple

D

Model has no bias

Q67 What is the purpose of a loss function in machine learning?

A

To evaluate model predictions

B

To split datasets

C

To improve visualization

D

To standardize data

Q68 Why is it important to split data into training and testing datasets?

A

To increase dataset size

B

To evaluate model performance on unseen data

C

To clean data

D

To preprocess features

Q69 Which Python library provides the train_test_split function?

A

NumPy

B

Pandas

C

scikit-learn

D

Matplotlib
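
A short sketch of a train/test split with `train_test_split` from `sklearn.model_selection`. The toy data, the 80/20 ratio, and the `random_state` value are illustrative assumptions.

```python
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with one feature each and a binary label.
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]

# Hold out 20% of the samples so the model can be checked on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # 8 2
```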

Q70 How do you train a linear regression model using scikit-learn?

A

model.fit(X, y)

B

model.train(X, y)

C

model.learn(X, y)

D

model.predict(X, y)
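
For reference, a minimal sketch of training and using `LinearRegression` from `sklearn.linear_model`. The data below is made up and follows y = 2x + 1 exactly, so the learned coefficients are easy to check.

```python
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]   # one feature per sample
y = [3, 5, 7, 9]           # generated as y = 2x + 1

model = LinearRegression()
model.fit(X, y)            # estimators in scikit-learn are trained with fit

prediction = model.predict([[5]])   # predict takes features only
print(model.coef_[0], model.intercept_, prediction[0])
```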

Q71 Which scikit-learn function is used to calculate the accuracy of a classification model?

A

classification_report

B

accuracy_score

C

score

D

confusion_matrix
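
As a small illustration, `accuracy_score` from `sklearn.metrics` returns the fraction of predictions that match the true labels. The labels below are invented for the example.

```python
from sklearn.metrics import accuracy_score

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]   # one of the five predictions is wrong

acc = accuracy_score(y_true, y_pred)
print(acc)  # 0.8
```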

Q72 A model's predictions have high bias. What could be the likely issue?

A

Overfitting

B

Underfitting

C

Feature scaling

D

Incorrect testing data

Q73 A classification model achieves 99% accuracy on the training set but only 60% on the test set. What is the issue?

A

Overfitting

B

Underfitting

C

Data imbalance

D

Feature scaling
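
A gap between training and test accuracy can be reproduced on synthetic data. This is only a sketch: the dataset from `make_classification`, the amount of label noise (`flip_y=0.3`), and the unrestricted `DecisionTreeClassifier` are assumptions chosen to make the gap visible.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data: only 2 of the 20 features carry signal.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=2, n_redundant=0,
    flip_y=0.3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# A tree with no depth limit can memorize the training set, noise included.
deep_tree = DecisionTreeClassifier(random_state=0)
deep_tree.fit(X_train, y_train)

train_acc = deep_tree.score(X_train, y_train)
test_acc = deep_tree.score(X_test, y_test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Limiting the tree (for example with `max_depth`) or gathering more data are common ways to narrow such a gap.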

Q74 After training a regression model, the residuals show a clear pattern. What does this imply?

A

Model is accurate

B

Model assumptions are violated

C

Feature scaling is wrong

D

Data is balanced
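
To illustrate what a patterned residual plot looks like, the sketch below fits a straight line to data that is actually quadratic. The data is made up; the point is that the residuals are positive at both ends and negative in the middle, a systematic shape rather than random scatter.

```python
from sklearn.linear_model import LinearRegression

X = [[x] for x in range(-3, 4)]     # x = -3 ... 3
y = [x[0] ** 2 for x in X]          # true relationship is quadratic

line = LinearRegression().fit(X, y)
residuals = [yi - pi for yi, pi in zip(y, line.predict(X))]
print([round(r, 2) for r in residuals])
```

Adding a squared term as a feature would be one way to address the pattern in this particular example.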

Q75 What is the key difference between supervised and unsupervised learning?

A

Supervised uses labeled data, unsupervised does not

B

Both use labeled data

C

Both use unlabeled data

D

Unsupervised requires labels

Q76 Which of the following is an example of a supervised learning algorithm?

A

K-Means

B

Linear Regression

C

Hierarchical Clustering

D

PCA

Q77 Which task is best suited for unsupervised learning?

A

Predicting house prices

B

Identifying customer segments

C

Spam classification

D

Stock price prediction

Q78 What metric is commonly used to evaluate a regression model in supervised learning?

A

Accuracy

B

Mean Squared Error (MSE)

C

Precision

D

Silhouette score
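
As a quick worked example, mean squared error averages the squared differences between true and predicted values. The numbers are invented; `mean_squared_error` comes from `sklearn.metrics`.

```python
from sklearn.metrics import mean_squared_error

y_true = [3, 5, 7]
y_pred = [2, 5, 9]   # errors of 1, 0, and -2

# (1**2 + 0**2 + 2**2) / 3 = 5/3
mse = mean_squared_error(y_true, y_pred)
print(mse)
```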

Q79 Why is clustering considered an unsupervised learning technique?

A

It requires labeled data

B

It uses supervised models

C

It finds patterns in unlabeled data

D

It predicts outcomes

Q80 Which Python library provides the KMeans class for clustering?

A

NumPy

B

Pandas

C

scikit-learn

D

Matplotlib

Q81 How do you fit a decision tree classifier in scikit-learn?

A

model.train(X, y)

B

model.fit(X, y)

C

model.learn(X, y)

D

model.split(X, y)
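
A minimal sketch of fitting `DecisionTreeClassifier` from `sklearn.tree`, using a made-up one-feature dataset in which small values belong to class 0 and large values to class 1.

```python
from sklearn.tree import DecisionTreeClassifier

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, y)                       # same fit(X, y) call as other estimators

pred = tree.predict([[0.5], [2.5]])
print(list(pred))  # [0, 1]
```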

Q82 Which function in scikit-learn is used to calculate the silhouette score for a clustering model?

A

silhouette_score()

B

cluster_score()

C

clustering_score()

D

silhouette_metric()

Q83 How do you specify the number of clusters in the KMeans algorithm using scikit-learn?

A

KMeans(n_clusters=n)

B

KMeans(clusters=n)

C

KMeans(n=n)

D

KMeans(n_cluster=n)
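
The two clustering questions above can be tied together in one sketch: `KMeans` takes the number of clusters through `n_clusters`, and `silhouette_score` from `sklearn.metrics` rates how well separated the resulting clusters are. The six points, `n_init=10`, and `random_state=0` are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Two clearly separated groups of points.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Silhouette ranges from -1 to 1; values near 1 indicate tight,
# well-separated clusters.
score = silhouette_score(X, labels)
print(list(labels), round(score, 3))
```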

Q84 A supervised model performs poorly on unseen data. What is the likely issue?

A

Data leakage

B

Underfitting

C

Incorrect loss function

D

Missing labels

Q85 A clustering model produces inconsistent results. What could be the likely cause?

A

Wrong feature scaling

B

Labeled data

C

High accuracy

D

Balanced dataset

Q86 After applying KMeans, one cluster has very few data points. What should you consider next?

A

Increase cluster count

B

Decrease cluster count

C

Visualize clusters

D

Change the algorithm

Q87 What is the primary goal of feature engineering in machine learning?

A

Improve model interpretability

B

Reduce dataset size

C

Enhance model performance

D

Avoid overfitting

Q88 Which technique is commonly used to handle categorical data in feature engineering?

A

Normalization

B

One-hot encoding

C

PCA

D

Standardization
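
One-hot encoding can be sketched in a few lines of plain Python. The helper `one_hot` and the color values are hypothetical; in practice a library routine such as `pandas.get_dummies` or scikit-learn's `OneHotEncoder` would typically be used instead.

```python
def one_hot(values):
    """Return (categories, rows), with one 0/1 column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


categories, encoded = one_hot(["red", "green", "red", "blue"])
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```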

Q89 Why is feature scaling important in machine learning?

A

Reduces model size

B

Improves convergence during training

C

Handles missing values

D

Reduces overfitting
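
As an illustration of feature scaling, the sketch below standardizes two features that live on very different scales using `StandardScaler` from `sklearn.preprocessing`. The data is made up; after scaling, each column has mean 0 and unit variance.

```python
from sklearn.preprocessing import StandardScaler

# The second feature is 100 times larger than the first.
X = [[1, 100], [2, 200], [3, 300]]

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)   # returns a NumPy array

print(X_scaled.mean(axis=0))  # approximately [0, 0]
print(X_scaled.std(axis=0))   # approximately [1, 1]
```

When a train/test split is used, the scaler is usually fitted on the training data only and then applied to the test data with `transform`.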

Q90 What is feature selection?

A

Adding new features

B

Choosing the most relevant features

C

Removing outliers

D

Scaling data
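
One possible form of feature selection is univariate scoring with `SelectKBest` from `sklearn.feature_selection`. The tiny dataset is invented: the first feature separates the two classes, while the second has the same mean in both classes and carries no signal. The choice of `f_classif` and `k=1` is an assumption for the example.

```python
from sklearn.feature_selection import SelectKBest, f_classif

X = [
    [0.10, 5], [0.20, 3], [0.90, 4],
    [1.00, 5], [0.15, 4], [0.95, 3],
]
y = [0, 0, 1, 1, 0, 1]

# Keep only the single feature with the highest ANOVA F-score.
selector = SelectKBest(score_func=f_classif, k=1)
X_selected = selector.fit_transform(X, y)

kept = list(selector.get_support())
print(kept)  # [True, False]
```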
