Q31
Q31 What is the primary goal of Exploratory Data Analysis (EDA)?
Predict outcomes
Summarize data characteristics
Visualize predictions
Build models
Q32
Q32 Which of the following is a common technique used during EDA?
Clustering
PCA
Descriptive statistics
Feature selection
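Note: a minimal sketch of descriptive statistics in pandas. The DataFrame and its column names are made up for illustration and are not part of the quiz.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [23, 35, 31, 52, 46],
    "income": [32000, 54000, 48000, 91000, 76000],
})

# describe() reports count, mean, std, min, quartiles and max
# for every numeric column.
summary = df.describe()
print(summary)
```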
Q33
Q33 What is the significance of identifying skewness in data during EDA?
It helps in feature scaling
It determines model type
It affects data distribution assumptions
It improves visualization
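Note: a small sketch of how skewness might be checked with pandas. The values are invented; one large value pulls the mean well above the median, which is the usual sign of a right-skewed column.

```python
import pandas as pd

# Hypothetical values with a single large observation on the right.
values = pd.Series([1, 2, 2, 3, 3, 3, 4, 50])

skew = values.skew()          # sample skewness; positive means a long right tail
mean, median = values.mean(), values.median()

print(f"skew={skew:.2f}, mean={mean}, median={median}")
```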
Q34
Q34 Which visualization is best suited for analyzing the relationship between two numerical variables?
Histogram
Boxplot
Scatter plot
Bar chart
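Note: a minimal scatter plot sketch with Matplotlib. The data and axis labels are hypothetical, and the "Agg" backend is selected only so the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless environment; drop this line for interactive use
import matplotlib.pyplot as plt

# Hypothetical paired numerical observations.
hours_studied = [1, 2, 3, 4, 5, 6]
exam_score = [52, 55, 61, 64, 70, 75]

fig, ax = plt.subplots()
points = ax.scatter(hours_studied, exam_score)
ax.set_xlabel("Hours studied")
ax.set_ylabel("Exam score")
ax.set_title("Relationship between two numerical variables")
```

In an interactive session, plt.show() would display the figure.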
Q35
Q35 Why is it critical to detect multicollinearity during EDA?
It improves model accuracy
It helps verify that predictors are not strongly correlated with each other
It removes missing values
It selects important features
Q36
Q36 Which Python library is used for creating basic visualizations such as line and bar charts?
NumPy
Pandas
Matplotlib
Seaborn
Q37
Q37 How do you compute the correlation matrix for a DataFrame in Python?
df.corr()
df.describe()
df.cov()
df.plot()
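Note: a short sketch of a correlation matrix on a toy DataFrame. The columns x, y and z are invented; y is an exact multiple of x, so that pair shows a Pearson correlation of 1.

```python
import pandas as pd

# Hypothetical data: y = 2 * x, while z tends to fall as x rises.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],
    "z": [5, 3, 4, 1, 2],
})

# Pairwise Pearson correlation between numeric columns (the default method).
corr = df.corr()
print(corr.round(2))
```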
Q38
Q38 Which visualization technique is useful for identifying clusters in data during EDA?
Scatter plot
Heatmap
Boxplot
Pairplot
Q39
Q39 If a dataset contains missing values in a column, what is the simplest way to visualize its impact?
Use a scatter plot
Use a heatmap
Drop the column
Fill missing values
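Note: one possible way to draw a missing-value heatmap using only pandas and Matplotlib. The DataFrame is hypothetical; each dark cell in the image marks a missing entry, and the per-column share of missing values is printed alongside.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless environment
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data with gaps in columns "a" and "b".
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [1.0, 2.0, np.nan, 4.0],
    "c": [1.0, 2.0, 3.0, 4.0],
})

mask = df.isna()              # True where a value is missing
missing_share = mask.mean()   # fraction of missing values per column
print(missing_share)

fig, ax = plt.subplots()
ax.imshow(mask.to_numpy().astype(int), aspect="auto", cmap="gray_r",
          interpolation="nearest")
ax.set_xticks(range(len(df.columns)))
ax.set_xticklabels(df.columns)
ax.set_ylabel("Row index")
```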
Q40
Q40 A dataset shows a perfect correlation of +1 between two variables. What is the likely issue?
Multicollinearity
Outliers
No issue
Wrong visualization
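Note: a sketch of how near-duplicate predictors might be flagged from the correlation matrix. The data, the column names and the 0.95 threshold are all assumptions chosen for illustration; height_in is just height_cm in different units, so the two carry the same information.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the same height recorded in two units.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "weight_kg": [55.0, 72.0, 61.0, 90.0, 70.0],
})
df["height_in"] = df["height_cm"] / 2.54

corr = df.corr().abs()
threshold = 0.95  # assumption: cut-off for "too correlated"

# Keep only the upper triangle so each pair is reported once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = [(row, col) for row in upper.index for col in upper.columns
         if upper.loc[row, col] > threshold]
print(pairs)
```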
Q41
Q41 During EDA, an outlier is identified in a boxplot. What is the best course of action?
Remove the outlier
Keep the outlier
Investigate the outlier
Ignore the outlier
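Note: a boxplot marks points that fall more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies that same rule to invented values so the flagged rows can be inspected before deciding what to do with them.

```python
import pandas as pd

# Hypothetical measurements with one suspicious value.
values = pd.Series([10, 12, 11, 13, 12, 11, 95])

q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Rows outside the whiskers are candidates for a closer look.
outliers = values[(values < lower) | (values > upper)]
print(outliers)
```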
Q42
Q42 What is the primary purpose of hypothesis testing in statistics?
To clean data
To test assumptions
To visualize trends
To encode features
Q43
Q43 Which statistical measure represents the spread of data values around the mean?
Variance
Mean
Median
Skewness
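Note: a short sketch of variance and standard deviation with NumPy on invented numbers. NumPy divides by n by default (population variance), while pandas divides by n - 1 by default (sample variance), so the ddof argument is shown explicitly.

```python
import numpy as np

# Hypothetical data with mean 5.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

variance = data.var()                # population variance (ddof=0)
sample_variance = data.var(ddof=1)   # sample variance (divides by n - 1)
std = data.std()                     # square root of the population variance

print(variance, sample_variance, std)
```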
Q44
Q44 At the conventional 0.05 significance level, when is a p-value considered statistically significant?
When p > 0.05
When p < 0.05
When p = 0.1
When p > 1
Q45
Q45 What does the standard deviation indicate in a dataset?
The central tendency
The variability
The skewness
The correlation
Q46
Q46 What type of statistical analysis helps identify relationships between variables?
Correlation analysis
Variance analysis
Skewness analysis
Descriptive statistics
Q47
Q47 What assumption is commonly made about data in a parametric statistical test?
Data is categorical
Data follows a normal distribution
Data has no missing values
Data is continuous
Q48
Q48 Which Python library provides the ttest_ind function for hypothesis testing?
Pandas
NumPy
SciPy
Matplotlib
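Note: a minimal sketch of an independent two-sample t-test. The two groups are invented, and the 0.05 significance level is the conventional choice rather than a fixed rule.

```python
from scipy.stats import ttest_ind

# Hypothetical measurements from two independent groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5]
group_b = [6.9, 7.1, 6.5, 7.4, 6.8, 7.0]

result = ttest_ind(group_a, group_b)

alpha = 0.05  # assumption: conventional significance level
reject_null = result.pvalue < alpha
print(f"t={result.statistic:.2f}, p={result.pvalue:.4f}, reject H0: {reject_null}")
```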
Q49
Q49 How can you calculate the mean of a column in a Pandas DataFrame?
df.column.mean()
df.mean(column)
mean(df.column)
df.column.calc_mean()
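Note: a small sketch of taking a column mean in pandas. The column name price is hypothetical. Attribute access works only when the column name is a valid Python identifier, so bracket access is often the safer habit.

```python
import pandas as pd

# Hypothetical single-column DataFrame.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0]})

mean_by_attribute = df.price.mean()     # attribute-style access
mean_by_bracket = df["price"].mean()    # bracket-style access, same result

print(mean_by_attribute, mean_by_bracket)
```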
Q50
Q50 A dataset has a column with right-skewed, positive numerical data. What is a common approach to reduce the skew?
Use log transformation
Drop outliers
Encode values
Use boxplot
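Note: a sketch of a log transformation on invented, strictly positive values that grow by a factor of ten at each step. On the log scale they become evenly spaced, so the skew drops to roughly zero. For columns that contain zeros, np.log1p is a common alternative.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed, strictly positive values.
values = pd.Series([1.0, 10.0, 100.0, 1000.0, 10000.0])

logged = np.log(values)   # natural log; requires values > 0

skew_before = values.skew()
skew_after = logged.skew()
print(f"skew before={skew_before:.2f}, after={skew_after:.2f}")
```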
Q51
Q51 A statistical test on a dataset returns a p-value of 0.01. What does this imply?
Strong evidence against the null hypothesis
No evidence against the null hypothesis
Data is normally distributed
Data has no variance
Q52
Q52 After standardizing data, the z-scores of a column are very high. What could be the issue?
Incorrect scaling
Outliers
Data is normalized
No issue
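Note: a sketch of standardizing a column and flagging large z-scores. The values are invented, and the cut-off of 2 is an assumption; 3 is also widely used on larger samples.

```python
import numpy as np

# Hypothetical column with one extreme value.
values = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 95.0])

# Standardize: subtract the mean, divide by the (population) standard deviation.
z = (values - values.mean()) / values.std()

threshold = 2  # assumption: cut-off for an unusually large |z|
flagged = values[np.abs(z) > threshold]
print(flagged)
```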
Q53
Q53 What is the primary purpose of data visualization?
To analyze data
To predict outcomes
To represent data visually
To encode data
Q54
Q54 Which visualization is best suited for showing data distribution?
Line chart
Scatter plot
Histogram
Pie chart
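Note: a minimal histogram sketch with Matplotlib. The values and the choice of five bins are illustrative; the "Agg" backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless environment
import matplotlib.pyplot as plt

# Hypothetical observations clustered around 3.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5]

fig, ax = plt.subplots()
# hist() returns the bin counts, the bin edges and the drawn bars.
counts, edges, _ = ax.hist(values, bins=5)
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
print(counts)
```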
Q55
Q55 Which chart is most effective for comparing parts of a whole?
Scatter plot
Pie chart
Boxplot
Line chart
Q56
Q56 What does a boxplot help identify in a dataset?
Outliers
Correlations
Clusters
Trends
Q57
Q57 Which of the following is a common mistake in data visualization?
Using appropriate scales
Choosing the right chart type
Overloading charts with data
Labeling axes
Q58
Q58 Which Matplotlib function is used to create a simple line chart?
plt.scatter()
plt.line()
plt.plot()
plt.bar()
Q59
Q59 How do you create a bar chart in Matplotlib?
plt.bar(x, y)
plt.plot(x, y)
plt.hist(x)
plt.scatter(x, y)
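Note: a sketch that draws a line chart and a bar chart side by side with the pyplot interface. The months and sales figures are invented, and the "Agg" backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless environment
import matplotlib.pyplot as plt

# Hypothetical monthly figures.
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 150, 110]

fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

plt.sca(ax_line)                             # make the left axes current
lines = plt.plot(months, sales, marker="o")  # line chart
plt.title("Line chart")

plt.sca(ax_bar)                              # make the right axes current
bars = plt.bar(months, sales)                # bar chart
plt.title("Bar chart")
```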
Q60
Q60 Which Python library allows for creating highly interactive visualizations with minimal coding?
Seaborn
Matplotlib
Plotly
Pandas