Data Science MCQ Practice Problems

Question 1

What is the primary goal of Exploratory Data Analysis?

Accepted Answer

Summarize data characteristics

Answer

Predict outcomes

Answer

Visualize predictions

Answer

Build models

Question 2

Which of the following is a common technique used during EDA?

Accepted Answer

Descriptive statistics

Answer

Clustering

Answer

PCA

Answer

Feature selection

Question 3

What is the significance of identifying skewness in data during EDA?

Accepted Answer

It affects data distribution assumptions

Answer

It helps in feature scaling

Answer

It determines model type

Answer

It improves visualization
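
As a minimal sketch of the skewness check in Question 3, the snippet below measures skewness with pandas on a made-up sample. The `incomes` values are hypothetical and chosen only to show a long right tail; a positive skew pulls the mean above the median, which is why normality assumptions can fail.

```python
import pandas as pd

# Hypothetical right-skewed sample: most values are small, one is very large.
incomes = pd.Series([20, 22, 25, 27, 30, 32, 35, 250])

skewness = incomes.skew()        # sample skewness; > 0 means a longer right tail
mean_value = incomes.mean()
median_value = incomes.median()

print(f"skewness={skewness:.2f}, mean={mean_value:.3f}, median={median_value:.1f}")
```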

Question 4

Which visualization is best suited for analyzing the relationship between two numerical variables?

Accepted Answer

Scatter plot

Answer

Histogram

Answer

Boxplot

Answer

Bar chart

Question 5

Why is it critical to detect multicollinearity during EDA?

Accepted Answer

It helps verify that predictors are not strongly correlated with each other

Answer

It improves model accuracy

Answer

It removes missing values

Answer

It selects important features

Question 6

Which Python library is used for creating basic visualizations such as line and bar charts?

Accepted Answer

Matplotlib

Answer

NumPy

Answer

Pandas

Answer

Seaborn

Question 7

How do you compute the correlation matrix for a DataFrame in Python?

Accepted Answer

df.corr()

Answer

df.describe()

Answer

df.cov()

Answer

df.plot()
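
To illustrate the accepted answer to Question 7, here is a small sketch of `df.corr()` on a hypothetical DataFrame. The column names and values are invented for illustration; by default pandas computes pairwise Pearson correlations.

```python
import pandas as pd

# Hypothetical study data: "score" is exactly linear in "hours".
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 54, 56, 58, 60],
    "errors": [9, 7, 6, 4, 1],   # falls as hours rise
})

corr_matrix = df.corr()          # pairwise Pearson correlation by default
print(corr_matrix.round(2))
```

The result is a square, symmetric DataFrame with one row and one column per numeric column and 1.0 on the diagonal.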

Question 8

Which visualization technique is useful for identifying clusters in data during EDA?

Accepted Answer

Pairplot

Answer

Scatter plot

Answer

Heatmap

Answer

Boxplot

Question 9

If a dataset contains missing values in a column, what is the simplest way to visualize its impact?

Accepted Answer

Use a heatmap

Answer

Use a scatter plot

Answer

Drop the column

Answer

Fill missing values
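
A missing-value heatmap, as in Question 9, is usually drawn from a boolean mask of the data. The sketch below builds that mask with pandas; the DataFrame contents are hypothetical, and the plotting step is left out so the example depends only on pandas.

```python
import pandas as pd

# Hypothetical records with gaps in two columns.
df = pd.DataFrame({
    "age": [25, None, 31, 40],
    "city": ["Pune", "Delhi", None, None],
    "salary": [50000, 62000, 58000, 71000],
})

missing_mask = df.isna()              # True where a value is missing
missing_share = missing_mask.mean()   # fraction of missing values per column

print(missing_share)
```

Assuming seaborn is installed, passing `missing_mask` to its `heatmap` function is one common way to turn this mask into the picture the question refers to.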

Question 10

A dataset shows a perfect correlation of +1 between two variables. What is the likely issue?

Accepted Answer

Multicollinearity

Answer

Outliers

Answer

No issue

Answer

Wrong visualization
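
One way to spot the situation in Question 10 is to scan the correlation matrix for column pairs whose absolute correlation is close to 1. This is only a sketch: the data, the column names, and the `threshold` value are assumptions chosen for illustration.

```python
import itertools
import pandas as pd

# Hypothetical data: temp_f is an exact linear function of temp_c.
df = pd.DataFrame({
    "temp_c": [0, 10, 20, 30, 40],
    "temp_f": [32, 50, 68, 86, 104],
    "humidity": [30, 80, 45, 60, 20],
})

corr = df.corr().abs()
threshold = 0.999   # assumed cutoff for "effectively perfect" correlation

# Check each unordered pair of columns once.
redundant_pairs = [
    (a, b)
    for a, b in itertools.combinations(df.columns, 2)
    if corr.loc[a, b] > threshold
]

print(redundant_pairs)
```

Pairs reported this way carry the same information, so keeping both as predictors in a linear model is a typical source of multicollinearity.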

Question 11

During EDA, an outlier is identified in a boxplot. What is the best course of action?

Accepted Answer

Investigate the outlier

Answer

Remove the outlier

Answer

Keep the outlier

Answer

Ignore the outlier
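
Before deciding what to do with an outlier, it helps to list exactly which points were flagged. The sketch below applies the 1.5 × IQR rule that boxplots conventionally use for their whiskers; the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical measurements with one suspicious value.
values = pd.Series([10, 12, 11, 13, 12, 11, 95])

q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences are candidates for investigation, not automatic removal.
flagged = values[(values < lower_fence) | (values > upper_fence)]

print(flagged.tolist())
```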

Question 12

What is the primary purpose of hypothesis testing in statistics?

Accepted Answer

To test assumptions

Answer

To clean data

Answer

To visualize trends

Answer

To encode features

Question 13

Which statistical measure represents the spread of data values around the mean?

Accepted Answer

Variance

Answer

Mean

Answer

Median

Answer

Skewness
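
As a worked example of variance, and of the standard deviation that is its square root, the sketch below uses Python's standard `statistics` module on a small made-up sample.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample

mean = statistics.mean(data)                  # 5
pop_variance = statistics.pvariance(data)     # average squared deviation from the mean: 4
pop_std = statistics.pstdev(data)             # square root of the population variance: 2.0
sample_variance = statistics.variance(data)   # divides by n - 1 instead of n

print(mean, pop_variance, pop_std, round(sample_variance, 3))
```

Note that pandas' `Series.var()` and `Series.std()` use the sample form (dividing by n - 1) by default, so they would report the larger of the two variances here.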

Question 14

When is the p-value considered statistically significant in hypothesis testing?

Accepted Answer

When p < 0.05 (the conventional 5% significance level)

Answer

When p > 0.05

Answer

When p = 0.1

Answer

When p > 1

Question 15

What does the standard deviation indicate in a dataset?

Accepted Answer

The variability

Answer

The central tendency

Answer

The skewness

Answer

The correlation

Question 16

What type of statistical analysis helps identify relationships between variables?

Accepted Answer

Correlation analysis

Answer

Variance analysis

Answer

Skewness analysis

Answer

Descriptive statistics

Question 17

What assumption is made about data in a parametric statistical test?

Accepted Answer

Data follows a specific distribution, typically the normal distribution

Answer

Data is categorical

Answer

Data has no missing values

Answer

Data is continuous

Question 18

Which Python library provides the ttest_ind function for hypothesis testing?

Accepted Answer

SciPy

Answer

Pandas

Answer

NumPy

Answer

Matplotlib
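
The following sketch shows `ttest_ind` from `scipy.stats` comparing two independent groups. The group values and the 0.05 threshold are assumptions for illustration; by default the function assumes equal variances in both groups.

```python
from scipy import stats

# Hypothetical measurements from two independent groups.
group_a = [5.1, 4.9, 5.0, 5.2, 5.1]
group_b = [6.0, 6.2, 5.9, 6.1, 6.3]

result = stats.ttest_ind(group_a, group_b)

alpha = 0.05                          # conventional significance level
reject_null = result.pvalue < alpha   # True means the group means differ significantly

print(f"t={result.statistic:.2f}, p={result.pvalue:.4g}, reject_null={reject_null}")
```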

Question 19

How can you calculate the mean of a column in a Pandas DataFrame?

Accepted Answer

df.column.mean()

Answer

df.mean(column)

Answer

mean(df.column)

Answer

df.column.calc_mean()
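
In the accepted answer to Question 19, `column` stands for the actual column name. A short sketch with hypothetical column names shows both the attribute form and the bracket form.

```python
import pandas as pd

# Hypothetical data; "exam id" contains a space on purpose.
df = pd.DataFrame({"score": [80, 90, 100], "exam id": [1, 2, 3]})

mean_attr = df.score.mean()         # attribute access
mean_bracket = df["score"].mean()   # bracket access, same result
mean_spaced = df["exam id"].mean()  # bracket access is required for this name

print(mean_attr, mean_bracket, mean_spaced)
```

Attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute, so the bracket form is the safer habit.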

Question 20

A dataset has a column with right-skewed numerical data. What is a common approach to make its distribution more symmetric?

Accepted Answer

Use log transformation

Answer

Drop outliers

Answer

Encode values

Answer

Use boxplot
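
A log transformation can be sketched with NumPy as below. The `amounts` values are hypothetical, and `np.log1p` (the log of 1 + x) is used on the assumption that the data are non-negative and may contain zeros.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed column with one very large value.
amounts = pd.Series([1, 2, 3, 5, 8, 13, 200])

log_amounts = np.log1p(amounts)   # compresses large values much more than small ones

raw_skew = amounts.skew()
log_skew = log_amounts.skew()

print(f"skew before={raw_skew:.2f}, after={log_skew:.2f}")
```

The transformation reduces the skew here but does not remove it, so it is worth checking the distribution again afterwards.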

Question 21

A statistical test on a dataset returns a p-value of 0.01. What does this imply?

Accepted Answer

Strong evidence against the null hypothesis

Answer

No evidence against the null hypothesis

Answer

Data is normally distributed

Answer

Data has no variance

Question 22

After standardizing data, the z-scores of a column are very high. What could be the issue?

Accepted Answer

Outliers

Answer

Incorrect scaling

Answer

Data is normalized

Answer

No issue
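
To see how outliers show up as very high z-scores, the sketch below standardizes a hypothetical column and flags values more than 3 standard deviations from the mean. The cutoff of 3 is a common convention, not a fixed rule.

```python
import pandas as pd

# Hypothetical column: twenty ordinary readings and one extreme one.
values = pd.Series([9, 11] * 10 + [100])

# Standardize: subtract the mean, divide by the (sample) standard deviation.
z_scores = (values - values.mean()) / values.std()

extreme = values[z_scores.abs() > 3]

print(extreme.tolist())
```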

Question 23

What is the primary purpose of data visualization?

Accepted Answer

To represent data visually

Answer

To analyze data

Answer

To predict outcomes

Answer

To encode data

Question 24

Which visualization is best suited for showing data distribution?

Accepted Answer

Histogram

Answer

Line chart

Answer

Scatter plot

Answer

Pie chart

Question 25

Which chart is most effective for comparing parts of a whole?

Accepted Answer

Pie chart

Answer

Scatter plot

Answer

Boxplot

Answer

Line chart

Question 26

What does a boxplot help identify in a dataset?

Accepted Answer

Outliers

Answer

Correlations

Answer

Clusters

Answer

Trends

Question 27

Which of the following is a common mistake in data visualization?

Accepted Answer

Overloading charts with data

Answer

Using appropriate scales

Answer

Choosing the right chart type

Answer

Labeling axes

Question 28

Which Matplotlib function is used to create a simple line chart?

Accepted Answer

plt.plot()

Answer

plt.scatter()

Answer

plt.line()

Answer

plt.bar()

Question 29

How do you create a bar chart in Matplotlib?

Accepted Answer

plt.bar(x, y)

Answer

plt.plot(x, y)

Answer

plt.hist(x)

Answer

plt.scatter(x, y)
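
The two Matplotlib calls from Questions 28 and 29 can be tried side by side. This is a minimal sketch with made-up data; the `Agg` backend and the in-memory buffer are assumptions so that it runs without a display or writing files.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Hypothetical data: visits per day.
x = ["Mon", "Tue", "Wed"]
y = [3, 7, 5]

plt.figure()
line_artists = plt.plot(x, y)    # line chart: points joined by a line
plt.title("Line chart")

plt.figure()
bar_container = plt.bar(x, y)    # bar chart: one bar per category
plt.title("Bar chart")

# Render the current (bar) figure to an in-memory PNG instead of showing it.
buffer = io.BytesIO()
plt.savefig(buffer, format="png")
plt.close("all")
```

In an interactive session, `plt.show()` would replace the `savefig` call.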

Question 30

Which Python library allows for creating highly interactive visualizations with minimal coding?

Accepted Answer

Plotly

Answer

Seaborn

Answer

Matplotlib

Answer

Pandas
