Q1 What is Data Science primarily focused on?
Data storage
Data visualization
Insight extraction
App development
Q2 Which of the following is a key aspect of data science?
Building dashboards
Cleaning and analyzing data
Developing web pages
Writing blogs
Q3 What type of data does Data Science primarily handle?
Only structured
Only unstructured
Both structured and unstructured
None of the above
Q4 Which of these domains does Data Science NOT directly involve?
Machine learning
Database optimization
Statistics
Data visualization
Q5 What is a key challenge faced in Data Science projects?
Lack of storage
Model overfitting
Manual calculations
System downtime
Q6 What role does domain expertise play in Data Science?
It is optional
It provides data storage solutions
It helps understand data context
It prevents coding errors
Q7 Which of the following is a critical component of a Data Science pipeline?
Web hosting
Feature selection
Presentation design
Software installation
Q8 In Python, which library is commonly used for numerical computations in Data Science?
NumPy
Matplotlib
Flask
Pandas
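As a minimal illustration of the numerical work NumPy is used for, the sketch below computes summary statistics and z-scores on an array with vectorized operations. The array values are made up for illustration.

```python
import numpy as np

# Illustrative values only
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = values.mean()               # arithmetic mean
std = values.std()                 # population standard deviation (ddof=0 by default)
z_scores = (values - mean) / std   # vectorized over the whole array, no Python loop

print(mean, std)
```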
Q9 A Data Scientist receives a dataset with duplicate entries. What is the simplest way to handle this in Pandas?
drop_duplicates()
remove_duplicates()
dropna()
fillna()
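A minimal sketch of removing duplicate rows with Pandas, assuming a small made-up DataFrame in which one row appears twice. By default `drop_duplicates()` compares entire rows and keeps the first occurrence; `subset` restricts the comparison to chosen columns.

```python
import pandas as pd

# Hypothetical dataset: the row (2, "Delhi") appears twice
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "city": ["Pune", "Delhi", "Delhi", "Mumbai"],
})

# Keep the first occurrence of each fully identical row
deduped = df.drop_duplicates()

# Judge duplicates on a subset of columns only
deduped_by_id = df.drop_duplicates(subset=["id"], keep="first")

print(len(df), len(deduped))
```

Note that the original index labels are preserved (here 0, 1, 3); chain `.reset_index(drop=True)` if a contiguous index is needed.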
Q10 What is the first step in the Data Science Life Cycle?
Model Building
Data Cleaning
Problem Definition
Evaluation
Q11 Which phase in the Data Science Life Cycle involves cleaning and preparing data for analysis?
Model Evaluation
Data Cleaning
Data Analysis
Visualization
Q12 Which step in the Data Science Life Cycle involves determining if the model meets project objectives?
Data Collection
Model Deployment
Evaluation
Visualization
Q13 What happens during the Data Collection phase of the Data Science Life Cycle?
Data is stored in a database
Data is gathered from multiple sources
Data is split into training and test sets
Data is discarded
Q14 Which step in the Data Science Life Cycle involves feature engineering and transformation?
Problem Definition
Data Cleaning
Data Preparation
Evaluation
Q15 Why is the deployment phase critical in the Data Science Life Cycle?
It ensures the model is trained
It makes the model accessible for users
It removes irrelevant data
It generates reports
Q16 What is a major challenge during the evaluation phase of the Data Science Life Cycle?
Selecting the right metric
Collecting data
Training models
Understanding business goals
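One reason metric choice is hard: on imbalanced data a metric can look excellent while the model is useless. The sketch below uses made-up labels (95 negatives, 5 positives) and a trivial "model" that always predicts the majority class, then compares accuracy with F1 from scikit-learn.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical imbalanced labels: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the majority class
y_pred = [0] * 100

acc = accuracy_score(y_true, y_pred)            # high, because most labels are 0
f1 = f1_score(y_true, y_pred, zero_division=0)  # zero, because no positive is ever predicted
print(acc, f1)
```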
Q17 In Python, which library is commonly used for splitting datasets during the Data Preparation phase?
scikit-learn
NumPy
Pandas
Matplotlib
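A minimal sketch of a train/test split with scikit-learn's `train_test_split`, assuming toy arrays. `stratify=y` keeps the class proportions similar in both parts, and `random_state` makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # toy data: 10 samples, 2 features
y = np.array([0, 1] * 5)          # toy binary labels, 5 of each class

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```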
Q18 A Data Scientist’s model performs poorly in production compared to testing. What could be the most likely cause?
Overfitting
Clean data
Balanced dataset
Simple model
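A common symptom of overfitting is a large gap between training accuracy and accuracy on held-out data, which foreshadows poor production performance. The sketch below, assuming synthetic data from `make_classification` with some randomly assigned labels, fits an unconstrained decision tree that memorizes the training set.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y=0.2 assigns random labels to about 20% of samples (label noise)
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# With no depth limit, the tree can fit every training point, including the noisy ones
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train={train_acc:.2f} test={test_acc:.2f}")  # a large gap suggests overfitting
```

Constraining the model (for example `max_depth`) or evaluating with cross-validation are typical ways to narrow the gap.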
Q19 What is the primary goal of data cleaning in Data Science?
To remove duplicates
To visualize data
To identify and fix data quality issues
To split data
Q20 Why is handling missing values important during data preprocessing?
It ensures model interpretability
It improves model accuracy
It increases data storage
It simplifies code
Q21 Which technique can be used to handle outliers in numerical data?
Removing them
Normalizing data
Imputation
All of the above
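A minimal sketch of several ways to treat numerical outliers, assuming the common IQR rule (points beyond 1.5 × IQR from the quartiles are flagged; the 1.5 multiplier is a convention, not a requirement) on a made-up Pandas Series.

```python
import pandas as pd

s = pd.Series([10, 12, 11, 13, 12, 95])  # toy data; 95 is an obvious outlier

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (s < lower) | (s > upper)

removed = s[~is_outlier]                                # option 1: drop the outliers
imputed = s.mask(is_outlier, s[~is_outlier].median())   # option 2: replace with the inliers' median
clipped = s.clip(lower, upper)                          # option 3: cap values at the IQR fences
```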
Q22 What is the effect of standardization in data preprocessing?
It removes duplicates
It ensures data values are centered around zero
It improves data cleaning
It removes missing values
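Standardization rescales each feature to z = (x − mean) / std, so every column ends up with mean 0 and standard deviation 1. A minimal sketch with scikit-learn's `StandardScaler`, assuming toy data with two features on very different scales:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data: two features on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)  # z = (x - mean) / std, computed per column

print(X_std.mean(axis=0))  # approximately [0, 0]
print(X_std.std(axis=0))   # approximately [1, 1]
```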
Q23 Which preprocessing step ensures categorical variables are suitable for numerical models?
Scaling
One-hot encoding
Outlier detection
Normalization
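One-hot encoding turns a categorical column into one binary indicator column per category. A minimal sketch using `pd.get_dummies` on a made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "green", "red", "blue"],  # categorical feature
    "price": [10, 12, 9, 15],                  # numeric feature
})

# One 0/1 column per category; non-encoded columns come first and pass through unchanged
encoded = pd.get_dummies(df, columns=["color"], dtype=int)
print(encoded)
```

In a scikit-learn workflow, `sklearn.preprocessing.OneHotEncoder` serves the same purpose and remembers the categories seen during fitting.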
Q24 When dealing with a dataset containing multiple irrelevant features, which method is most effective?
Data cleaning
Feature selection
One-hot encoding
Standardization
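Feature selection keeps the features that carry signal and discards irrelevant ones. The sketch below uses scikit-learn's `SelectKBest` with a univariate ANOVA F-test (`f_classif`), one of several possible selection methods, on synthetic data with one informative column and three pure-noise columns.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
y = np.array([0] * 50 + [1] * 50)
informative = y + rng.normal(0, 0.3, size=100)  # strongly related to the label
noise = rng.normal(0, 1, size=(100, 3))         # three irrelevant features
X = np.column_stack([informative, noise])

selector = SelectKBest(score_func=f_classif, k=1).fit(X, y)
X_reduced = selector.transform(X)               # keeps only the selected column
print(selector.get_support())
```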
Q25 In Python, which Pandas method removes rows with missing values?
drop_duplicates()
dropna()
fillna()
replace()
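A minimal sketch of `dropna()` on a made-up DataFrame. By default it drops any row containing at least one missing value; `subset` limits the check to specific columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "income": [50_000, 62_000, np.nan, 58_000],
})

complete_rows = df.dropna()             # drop rows with a NaN in any column
age_known = df.dropna(subset=["age"])   # drop rows only where "age" is missing
```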
Q26 How do you replace missing values in each column of a Pandas DataFrame with that column's mean?
df.fillna(df.mean())
df.mean().replace()
df.replace_mean()
df.fill(df.mean())
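A minimal sketch of mean imputation, assuming a small all-numeric DataFrame. `df.mean()` returns one mean per column, and `fillna` aligns that Series with the DataFrame's columns by name. Passing `numeric_only=True` avoids an error in recent pandas versions when the DataFrame also contains text columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [20.0, np.nan, 40.0], "score": [1.0, 3.0, np.nan]})

# Fill every numeric column with its own mean
filled = df.fillna(df.mean(numeric_only=True))

# Single-column variant
df["age"] = df["age"].fillna(df["age"].mean())
```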
Q27 Which Python library is best suited for outlier detection using clustering techniques?
scikit-learn
NumPy
Pandas
Matplotlib
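As one clustering-based approach (chosen here for illustration; the question does not name an algorithm), scikit-learn's `DBSCAN` labels points that belong to no dense region as noise, with label -1. The sketch assumes toy 2-D data and hand-picked `eps` and `min_samples` values, which would need tuning on real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data: two tight groups plus one isolated point
X = np.array([[1.0, 1.0], [1.1, 1.0], [0.9, 1.1],
              [5.0, 5.0], [5.1, 5.1], [4.9, 5.0],
              [20.0, 20.0]])

labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)
outliers = X[labels == -1]  # DBSCAN marks noise points with label -1
print(labels)
```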
Q28 A dataset has duplicate rows causing issues in analysis. Which Pandas method would you use to fix this?
drop_duplicates()
dropna()
fillna()
groupby()
Q29 A column contains both numerical and non-numerical values. How should you preprocess it for numerical analysis?
Drop the column
Impute missing values
Use encoding techniques
Normalize data
Q30 After standardizing a dataset, a model performs poorly. What could be a possible issue?
Data leakage
Overfitting
Outliers
Incorrect scaling
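A frequent source of data leakage with standardization is fitting the scaler on the full dataset before splitting, so the scaling statistics include the test rows. A minimal sketch of the safer pattern, assuming synthetic data and a logistic regression model: wrapping the scaler in a scikit-learn pipeline means it is fitted on the training data only and then reused for the test data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# The pipeline fits the scaler on X_train only, then applies those statistics to X_test
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
test_acc = model.score(X_test, y_test)

# make_pipeline names each step after its lowercased class name
scaler = model.named_steps["standardscaler"]
train_means_match = np.allclose(scaler.mean_, X_train.mean(axis=0))
print(test_acc, train_means_match)
```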