Q121
Q121 Which of the following is a disadvantage of Jupyter Notebooks?
Lack of interactivity
No real-time collaboration
Limited coding features
Requires high computational resources
Q122
Q122 Which Python library is primarily used for numerical computations?
NumPy
Pandas
Matplotlib
Seaborn
Q123
Q123 How do you load a CSV file into a Pandas DataFrame?
pd.load_csv()
pd.read_csv()
pd.import_csv()
pd.csv()
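As a minimal sketch of `pd.read_csv()`, the snippet below parses a small in-memory CSV. The column names and values are made up for illustration, and `io.StringIO` stands in for a file path.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice you would pass a file path instead.
csv_text = "name,score\nAda,90\nLinus,85\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (rows, columns)
```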
Q124
Q124 Which keyboard shortcut in Jupyter Notebook inserts a new cell below the current one?
Shift + Enter
Ctrl + Enter
Alt + Enter
Esc + B
Q125
Q125 How do you install a new Python library from within a Jupyter Notebook cell?
pip.install(library)
!install library
install(library)
!pip install library
Q126
Q126 A Pandas DataFrame throws an error: "KeyError: column not found." What could be the issue?
Column name mismatch
Empty DataFrame
Incorrect library
Non-numeric data
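One common way a column name mismatch arises is stray whitespace in a header. The sketch below assumes a hypothetical `price` column whose header has a trailing space, shows the resulting `KeyError`, and then normalizes the names.

```python
import pandas as pd

# Hypothetical frame whose header carries a stray trailing space.
df = pd.DataFrame({"price ": [10, 20]})

try:
    df["price"]
    lookup_failed = False
except KeyError:
    lookup_failed = True  # "price" does not match "price "

# Stripping whitespace from the header names resolves the mismatch.
df.columns = df.columns.str.strip()
total = int(df["price"].sum())
```

Printing `df.columns.tolist()` before the lookup is usually the quickest way to spot this kind of mismatch.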
Q127
Q127 While using Jupyter Notebook, the kernel frequently crashes during computation. What could be the cause?
Unsupported library
Insufficient memory
Incorrect syntax
No internet connection
Q128
Q128 What is the primary challenge addressed by distributed computing?
Storage optimization
Real-time collaboration
Processing large-scale data
Building small-scale applications
Q129
Q129 Which of the following is an example of a distributed computing framework?
Hadoop
Tableau
MySQL
Excel
Q130
Q130 What is the role of the JobTracker in Hadoop's (MapReduce v1) architecture?
Managing storage
Assigning and monitoring tasks
Optimizing visualization
Analyzing datasets
Q131
Q131 Why is fault tolerance important in distributed computing?
It reduces redundancy
It ensures high availability
It speeds up computation
It optimizes resource usage
Q132
Q132 How do you initialize a Spark session in PySpark?
spark = SparkSession.start()
spark = SparkSession.builder.getOrCreate()
spark = Spark.start()
spark = SparkContext.start()
Q133
Q133 Which PySpark method is used to read a CSV file into a DataFrame?
read.csv()
spark.read.csv()
pd.read_csv()
load_csv()
Q134
Q134 How do you write a PySpark DataFrame to a Parquet file?
df.write.csv()
df.write.json()
df.write.parquet()
df.write.format("csv")
Q135
Q135 A Hadoop job fails midway due to a node failure. What ensures task completion in such cases?
Data replication
Parallel computing
Data visualization
Fault detection
Q136
Q136 A PySpark job runs slower than expected. What could be a possible issue?
Incorrect function syntax
Resource underutilization
Balanced partitions
Optimized transformations
Q137
Q137 Why is data privacy important in Data Science?
To increase storage
To protect user rights
To speed up processing
To improve data formats
Q138
Q138 Which of the following is a common ethical concern in AI systems?
Transparency
Data visualization
Efficient computation
Hardware optimization
Q139
Q139 What is data bias in Data Science?
Errors due to missing values
Unrepresentative data causing unfair outcomes
Overfitting
Incomplete preprocessing
Q140
Q140 Which Python library helps ensure secure handling of sensitive data during analysis?
NumPy
PyCryptodome (maintained successor to PyCrypto)
Matplotlib
Pandas
Q141
Q141 How do you anonymize sensitive columns in a Pandas DataFrame?
df.anonymize()
hashlib.hash(df)
df['column'].apply(lambda x: hashlib.sha256(str(x).encode()).hexdigest())
df.remove('column')
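The hashing approach can be sketched as follows. `hashlib.sha256` expects bytes and returns a hash object, so each value is encoded first and the hex digest is stored. The `email` column and the helper name `sha256_hex` are illustrative assumptions, not part of the quiz.

```python
import hashlib

import pandas as pd


def sha256_hex(value):
    """Return the SHA-256 hex digest of a value's string form."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


# Hypothetical data with one sensitive column.
df = pd.DataFrame({"email": ["a@example.com", "b@example.com"]})

# Replace each raw value with its digest.
df["email"] = df["email"].apply(sha256_hex)
```

Plain hashing is closer to pseudonymization than to full anonymization, since predictable values can be recovered by hashing guesses; a secret salt or dropping the column may be more appropriate depending on the data.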
Q142
Q142 A dataset contains personally identifiable information (PII). What is the recommended practice before analysis?
Encrypt the dataset
Share the data
Ignore PII
Remove or anonymize PII
Q143
Q143 An AI model shows biased outcomes in predictions. What could be the issue?
Data preprocessing error
Biased training data
Correct loss function
Adequate testing
Q144
Q144 What is the primary benefit of case studies in Data Science?
They improve storage efficiency
They provide real-world problem-solving examples
They optimize algorithms
They test software
Q145
Q145 In a predictive modeling case study, which metric is most relevant for evaluating prediction accuracy?
Silhouette Score
Mean Absolute Error (MAE)
Execution Time
Data Redundancy
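For reference, Mean Absolute Error is the average of the absolute differences between true and predicted values. A minimal pure-Python sketch, with made-up numbers:

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Absolute errors are 1, 0, and 2, so the mean is 1.0.
mae = mean_absolute_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
```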
Q146
Q146 Which challenge is commonly highlighted in Data Science case studies involving healthcare?
Lack of computational resources
Data privacy and security
Limited statistical methods
Excessive labeled data
Q147
Q147 Which Python library is commonly used in case studies for creating visualizations to summarize results?
Seaborn
NumPy
PyTorch
Scikit-learn
Q148
Q148 How do you save a trained machine learning model in Python for later use?
pickle.dump(model, file)
save_model(model)
model.save('file')
file.save(model)
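A minimal round trip with `pickle` might look like the sketch below. The dictionary is only a stand-in for a trained model object, and the file name is arbitrary.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model; any picklable object works the same way.
model = {"coef": [0.5, -1.2], "intercept": 0.1}

path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Serialize the object to disk (binary mode is required).
with open(path, "wb") as file:
    pickle.dump(model, file)

# Later, load it back.
with open(path, "rb") as file:
    restored = pickle.load(file)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.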
Q149
Q149 During a case study analysis, a DataFrame contains missing values. What is the simplest method to handle this?
Drop rows with missing values
Save the DataFrame
Optimize DataFrame size
Export the DataFrame
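Dropping incomplete rows is a one-liner with `DataFrame.dropna()`. The data below is invented for illustration.

```python
import pandas as pd

# Hypothetical frame with a missing value in two of the three rows.
df = pd.DataFrame({"age": [25, None, 40], "city": ["Oslo", "Lima", None]})

# dropna() returns a new frame without rows that contain any missing value.
cleaned = df.dropna()
```

This is simple but discards data; when many rows are affected, imputation may be a better fit.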
Q150
Q150 A Data Science case study involves imbalanced classes in a classification dataset. What preprocessing step can address this?
Normalization
Data augmentation (e.g., oversampling the minority class)
PCA
Dimensionality reduction
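One simple form of augmentation for imbalanced classes is random oversampling, which duplicates minority-class rows until the classes are the same size. The sketch below is a pure-Python illustration under that assumption; the function name, row layout, and labels are hypothetical.

```python
import random


def oversample_minority(rows, label_key, seed=0):
    """Duplicate smaller-class rows at random until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap (no-op for the largest class).
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


# Six "neg" rows and one "pos" row.
rows = [{"x": i, "label": "neg"} for i in range(6)] + [{"x": 9, "label": "pos"}]
balanced = oversample_minority(rows, "label")

counts = {}
for row in balanced:
    counts[row["label"]] = counts.get(row["label"], 0) + 1
```

Oversampling should be applied to the training split only, so duplicated rows do not leak into evaluation data.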