Q91 When should feature extraction be used instead of feature selection?
When raw features are sufficient
When features need transformation
When data is balanced
When model accuracy is high
Q92 Which scikit-learn function is used to normalize data?
normalize()
standardize()
scale()
transform()
Q93 How do you perform one-hot encoding in Pandas?
pd.one_hot()
pd.get_dummies()
pd.categorical()
pd.encoding()
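
As an illustration of one-hot encoding in Pandas, here is a minimal sketch using `pd.get_dummies`. The DataFrame, its `color` column, and the values are hypothetical toy data, not part of the quiz.

```python
import pandas as pd

# Hypothetical toy data; the column name and values are illustrative only.
df = pd.DataFrame({"color": ["red", "green", "red", "blue"]})

# get_dummies replaces the categorical column with one indicator column
# per distinct category, named "<column>_<category>".
encoded = pd.get_dummies(df, columns=["color"])

print(sorted(encoded.columns))
```

Each row of `encoded` has exactly one active indicator, marking the category that row originally held.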
Q94 Which scikit-learn class is used for dimensionality reduction?
PCA()
StandardScaler()
KMeans()
OneHotEncoder()
Q95 A dataset has highly correlated features. How should you handle this issue?
Normalize features
Drop one of the correlated features
Encode features
Use PCA
Q96 A numerical feature has a skewed distribution. What transformation can address this?
Log transformation
Drop the feature
One-hot encoding
Normalize values
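
To make the effect of a log transformation on a skewed feature concrete, here is a small sketch using only the standard library. The sample values and the `spread_ratio` helper are assumptions chosen for illustration; the max-to-median ratio is only a crude indicator of right skew, not a formal skewness statistic.

```python
import math

# Hypothetical right-skewed values: most are small, a few are very large.
values = [1, 2, 2, 3, 5, 8, 40, 250, 1000]

# log1p computes log(1 + x), so it is also defined when x is 0.
transformed = [math.log1p(v) for v in values]


def spread_ratio(xs):
    """Ratio of the maximum to the median, a crude indicator of right skew."""
    ordered = sorted(xs)
    median = ordered[len(ordered) // 2]
    return ordered[-1] / median


print(spread_ratio(values), spread_ratio(transformed))
```

The transformation compresses the long right tail while preserving the ordering of the values.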
Q97 A dataset has missing values for important features. What is the best approach to address this?
Remove the rows
Impute values
Drop the feature
Ignore the missing data
Q98 What is a key characteristic of time series data?
Random observations
Data without timestamps
Sequential observations over time
Categorical data
Q99 Which of the following is commonly used to detect seasonality in time series data?
Histogram
Autocorrelation
Scatter plot
PCA
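
The idea of using autocorrelation to spot seasonality can be sketched in plain Python. The series below is a hypothetical one that repeats every four steps, and `autocorrelation` is an illustrative helper implementing the usual sample autocorrelation. In practice a library routine would normally be used instead.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


# Hypothetical series with a seasonal period of 4 observations.
pattern = [10, 20, 30, 20]
series = pattern * 6

# Autocorrelation at lags 1 through 8; a peak suggests a seasonal period.
acf = {lag: autocorrelation(series, lag) for lag in range(1, 9)}
best_lag = max(acf, key=acf.get)
print(best_lag)
```

For this toy series the strongest positive autocorrelation falls at the lag equal to the seasonal period.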
Q100 Why is stationarity important in time series analysis?
It ensures data completeness
It stabilizes variance
It allows for accurate forecasting
It reduces data size
Q101 What is the purpose of differencing in time series preprocessing?
To detect seasonality
To remove trend and make data stationary
To visualize data
To encode features
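
First-order differencing can be sketched in a few lines. The `difference` helper and the linear-trend series are hypothetical, chosen so the effect is easy to see.

```python
def difference(series, lag=1):
    """Return the differenced series: series[i] - series[i - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Hypothetical series with a steady upward trend of 3 units per step.
trend = [5 + 3 * t for t in range(8)]

# After differencing, the trend is gone and only the constant step remains.
diffed = difference(trend)
print(diffed)
```

Differencing shortens the series by `lag` observations, which is worth keeping in mind when aligning it with other features.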
Q102 Which metric is commonly used to evaluate the accuracy of a time series model?
Precision
Mean Absolute Error (MAE)
Silhouette Score
Log Loss
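
Mean Absolute Error is the average of the absolute differences between actual and predicted values. A minimal sketch, with hypothetical actual and forecast values:

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical observations and forecasts for four time steps.
actual = [100, 120, 130, 150]
forecast = [110, 115, 125, 160]

mae = mean_absolute_error(actual, forecast)
print(mae)  # 7.5
```

MAE is expressed in the same units as the series, which makes it straightforward to interpret.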
Q103 Which Python library provides the seasonal_decompose function for analyzing time series components?
Pandas
NumPy
statsmodels
Matplotlib
Q104 Which call plots a time series using the Pandas plotting method?
plt.plot(time_series)
time_series.plot()
pd.plot(time_series)
plot(time_series)
Q105 Which method is used in statsmodels to fit an ARIMA model for time series forecasting?
fit_arima()
arima_fit()
ARIMA().fit()
forecast_arima()
Q106 A time series dataset shows an upward trend. What preprocessing step is necessary before modeling?
One-hot encoding
Differencing
Scaling
Normalizing
Q107 A time series forecast consistently underestimates values during high seasons. What could be the issue?
Incorrect seasonality handling
Overfitting
Underfitting
Missing timestamps
Q108 What is the main goal of Natural Language Processing?
Analyzing numerical data
Understanding and processing human language
Creating images
Performing clustering
Q109 Which of the following tasks is NOT part of Natural Language Processing?
Sentiment analysis
Speech recognition
Image classification
Text summarization
Q110 What is tokenization in NLP?
Dividing text into words or subwords
Encoding numerical data
Creating embeddings
Reducing noise in data
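
Word-level tokenization can be approximated with a regular expression. This is a simplified sketch, not the behavior of any particular library tokenizer; the `tokenize` helper and the sample sentence are hypothetical, and subword tokenizers work differently.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into runs of letters, digits, and apostrophes."""
    return re.findall(r"[A-Za-z0-9']+", text.lower())


tokens = tokenize("The movie wasn't good, but the cast was great!")
print(tokens)
```

Punctuation is dropped and contractions such as "wasn't" stay as single tokens under this particular pattern; other tokenizers make different choices.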
Q111 What is the purpose of stopword removal in text preprocessing?
To normalize text
To reduce dimensionality
To remove common but insignificant words
To correct spelling
Q112 What is a bag-of-words representation in NLP?
A numerical representation of text
A method to remove stopwords
A type of neural network
A clustering algorithm
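
A bag-of-words representation maps each document to a vector of word counts over a shared vocabulary, ignoring word order. A minimal sketch with two hypothetical documents and whitespace tokenization:

```python
from collections import Counter

# Hypothetical toy corpus.
docs = ["the cat sat", "the cat sat on the mat"]

# Shared vocabulary: every distinct word across the corpus, in sorted order.
vocabulary = sorted({word for doc in docs for word in doc.split()})


def to_vector(doc):
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocabulary]


vectors = [to_vector(doc) for doc in docs]
print(vocabulary)
print(vectors)
```

Each position in a vector corresponds to one vocabulary word, so all documents end up with vectors of the same length.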
Q113 Which library provides the word_tokenize function for tokenization in Python?
NumPy
NLTK
Pandas
Scikit-learn
Q114 How do you create a term frequency-inverse document frequency (TF-IDF) matrix in scikit-learn?
TfidfVectorizer().fit_transform()
CountVectorizer().fit_transform()
TfidfTransformer().fit()
transform_TF()
Q115 Which Python library is best known for training and loading Word2Vec word embeddings?
NLTK
Gensim
Pandas
SpaCy
Q116 A text classification model performs poorly due to high-dimensional feature space. What preprocessing step can help?
Normalization
Dimensionality reduction
Feature extraction
Stopword removal
Q117 A sentiment analysis model misclassifies reviews with negations (e.g., "not good"). What could address this?
Using n-grams
Stopword removal
Bag-of-words
TF-IDF
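
To see why n-grams help with negation, consider how "not good" is represented. In the sketch below, the `ngrams` helper and the sample sentence are hypothetical; with unigrams the words "not" and "good" are separate features, while bigrams keep them together.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "this movie is not good".split()

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)

print(unigrams)
print(bigrams)
```

With bigram features a classifier can learn a weight for "not good" that differs from the weight it learns for "good" alone.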
Q118 Which tool is primarily used for creating interactive and shareable notebooks for data analysis?
RStudio
Jupyter Notebook
PyCharm
Tableau
Q119 Which library in Python is most commonly used for data manipulation and analysis?
Matplotlib
Pandas
SciPy
NumPy
Q120 What is the main use of R in Data Science?
Data visualization and statistical analysis
Deep learning
Web development
API creation