
Q91 A machine translation system generates grammatically incorrect sentences. What is the likely issue?
Lack of linguistic features
Insufficient training
Poor tokenization
Low vocabulary coverage
Q92 What is the primary purpose of text classification in NLP?
To generate embeddings
To classify text into categories
To tokenize sentences
To remove stopwords
Q93 Which method is commonly used for vectorizing text in traditional NLP pipelines?
Bag-of-Words
Transformers
Word2Vec
RNNs
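For illustration, a Bag-of-Words representation can be built with scikit-learn's CountVectorizer; this is a minimal sketch with a made-up two-document corpus.

from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "the dog chased the cat"]   # made-up corpus
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)            # sparse document-term count matrix
print(vectorizer.get_feature_names_out())     # learned vocabulary
print(X.toarray())                            # raw term counts per document
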
Q94 What is the main limitation of TF-IDF vectorization?
Ignores word frequency
Ignores word context
Overfits data
Requires embeddings
Q95 Which algorithm is commonly used for binary text classification tasks?
K-Means
Naive Bayes
Apriori
Decision Trees
Q96 Which vectorization method captures both word order and context in text classification?
TF-IDF
Bag-of-Words
Word Embeddings
Transformer-based embeddings
Q97 Which library in Python provides tools for creating TF-IDF vectors?
nltk
scikit-learn
spaCy
TextBlob
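A minimal sketch of creating TF-IDF vectors with scikit-learn's TfidfVectorizer (the corpus is illustrative):

from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["natural language processing is fun",
          "text classification relies on good features"]   # made-up corpus
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(corpus)   # sparse TF-IDF matrix, one row per document
print(X.shape)
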
Q98 How can you preprocess text for vectorization using nltk?
Tokenize and lowercase
Skip tokenization
Generate embeddings
Apply TF-IDF directly
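A short nltk preprocessing sketch, tokenizing and lowercasing before vectorization (assumes the 'punkt' tokenizer data is available; newer nltk releases may also require 'punkt_tab'):

import nltk
from nltk.tokenize import word_tokenize

nltk.download("punkt")                                      # one-time tokenizer data download
text = "NLP Preprocessing Makes Vectorization Easier."      # made-up sentence
tokens = [token.lower() for token in word_tokenize(text)]   # tokenize, then lowercase
print(tokens)
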
Q99 How do you implement a classification pipeline using scikit-learn?
Build and train models separately
Use Pipeline to combine steps
Skip preprocessing
Train without vectorization
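A minimal scikit-learn Pipeline sketch that chains the vectorization and classification steps (training texts and labels are made up):

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["great movie", "terrible plot", "loved it", "waste of time"]   # made-up data
labels = [1, 0, 1, 0]

clf = Pipeline([
    ("tfidf", TfidfVectorizer()),   # vectorization step
    ("nb", MultinomialNB()),        # classification step
])
clf.fit(texts, labels)
print(clf.predict(["really great film"]))
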
Q100 A classifier performs poorly due to irrelevant features in vectorization. What should you do?
Increase vocabulary size
Apply stopword removal
Reduce dataset size
Skip preprocessing
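One common fix is to drop stopwords at vectorization time; a sketch using scikit-learn's built-in English stopword list:

from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer(stop_words="english")   # drop common English stopwords
X = tfidf.fit_transform(["this is a very simple example of stopword removal"])
print(tfidf.get_feature_names_out())            # only content-bearing terms remain
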
Q101 A model overfits during text classification. What can you adjust?
Reduce embedding size
Apply regularization
Skip vectorization
Use smaller datasets
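Regularization can be applied in the classifier itself; for instance, in scikit-learn's LogisticRegression a smaller C means stronger L2 regularization (the tiny corpus and values here are illustrative, not a recommendation):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["good product", "bad service", "excellent quality", "awful experience"]   # made-up data
labels = [1, 0, 1, 0]

X = TfidfVectorizer().fit_transform(texts)
model = LogisticRegression(C=0.1, penalty="l2", max_iter=1000)   # smaller C = stronger regularization
model.fit(X, labels)
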
Q102 A classifier struggles to differentiate between similar classes. What approach can improve this?
Use embeddings with context
Reduce feature set
Use simpler models
Increase batch size
Q103 What is the primary purpose of sequence-to-sequence models in NLP?
Classification
Sequence prediction
Tokenization
Entity recognition
Q104 Which component of a sequence-to-sequence model generates the output sequence?
Decoder
Encoder
Embedding
Attention
Q105 How does the attention mechanism improve sequence-to-sequence models?
Reduces training time
Focuses on relevant parts of the input
Ignores long inputs
Speeds up decoding
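As a rough illustration of the idea, scaled dot-product attention weights every input position by its relevance to the query; a NumPy sketch with made-up dimensions:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # relevance of each input position to the query
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ V                                 # weighted sum of the value vectors

Q = np.random.rand(1, 4)   # one query vector
K = np.random.rand(5, 4)   # five input positions
V = np.random.rand(5, 8)
print(scaled_dot_product_attention(Q, K, V).shape)     # (1, 8)
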
Q106 Which type of sequence-to-sequence model architecture is most effective for long sequences?
RNN-based
CNN-based
Transformer-based
Naive Bayes
Q107 What is the role of positional encoding in transformer-based sequence-to-sequence models?
Adds semantic meaning
Represents token relationships
Preserves word order
Tokenizes text
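A NumPy sketch of the sinusoidal positional encoding from the original Transformer paper, which injects word-order information that self-attention alone does not preserve (dimensions are illustrative):

import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                 # (1, d_model)
    angles = positions / np.power(10000, (2 * (dims // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])              # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])              # odd dimensions use cosine
    return pe

print(sinusoidal_positional_encoding(seq_len=10, d_model=16).shape)   # (10, 16)
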
Q108 Which library provides pre-trained sequence-to-sequence models like BART and T5?
nltk
Hugging Face
TextBlob
spaCy
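For example, a pre-trained T5 checkpoint can be loaded through the Hugging Face transformers library; a minimal sketch using the public 't5-small' model:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: The house is small.", return_tensors="pt")
outputs = model.generate(**inputs, max_length=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
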
Q109 How do you fine-tune a pre-trained sequence-to-sequence model using transformers?
Load a pre-trained model
Train with a custom tokenizer
Use a labeled sequence dataset
All of the above
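A heavily abbreviated fine-tuning sketch with the transformers Seq2SeqTrainer; 'train_dataset' is assumed to be a tokenized, labeled sequence dataset you supply, and argument names can vary slightly across library versions:

from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          Seq2SeqTrainingArguments, Seq2SeqTrainer)

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

args = Seq2SeqTrainingArguments(output_dir="t5-finetuned", num_train_epochs=3)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,   # assumed: tokenized input/target pairs prepared beforehand
    tokenizer=tokenizer,
)
trainer.train()
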
Q110 Which parameter in transformers controls the length of output sequences during generation?
max_length
min_length
output_size
length_penalty
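A sketch of bounding generation length with max_length (and optionally min_length); the model name and dummy input are illustrative:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer("summarize: " + "A long article body goes here. " * 20, return_tensors="pt")
summary_ids = model.generate(**inputs, max_length=60, min_length=10)   # bound the output length
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
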
Q111 A sequence-to-sequence model generates incomplete outputs. What could improve this?
Increase max_length
Use smaller datasets
Reduce attention heads
Skip fine-tuning
Q112 A sequence-to-sequence model produces irrelevant output for longer inputs. What should you adjust?
Use attention mechanisms
Increase training epochs
Reduce vocabulary size
Ignore longer sequences
Q113 What is the main advantage of transformer models over RNNs?
Parallel processing
Handles fixed-length inputs
Simpler architecture
Lower computational cost
Q114 What is the role of self-attention in transformer models?
Preserves word order
Focuses on relevant words
Simplifies embeddings
Improves tokenization
Q115 Which component of a transformer model ensures information flow across layers?
Feed-forward layers
Normalization layers
Positional encoding
Residual connections
Q116 How does BERT differ from traditional transformer models?
It uses bi-directional context
It processes data sequentially
It ignores masked tokens
It requires no pre-training
Q117 Which library in Python provides pre-trained BERT models?
nltk
Hugging Face
TextBlob
spaCy
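A minimal sketch of loading a pre-trained BERT checkpoint and its tokenizer from Hugging Face:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("BERT reads context in both directions.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)   # (batch, sequence_length, hidden_size)
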
Q118 How do you fine-tune BERT for a text classification task?
Train from scratch
Use AutoModelForSequenceClassification
Apply Bag-of-Words
Use RNNs
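A heavily abbreviated fine-tuning sketch with AutoModelForSequenceClassification; 'train_dataset' is assumed to be a tokenized, labeled classification dataset you supply:

from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

args = TrainingArguments(output_dir="bert-finetuned", num_train_epochs=3)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)   # assumed dataset
trainer.train()
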
Q119 A BERT model performs poorly on domain-specific tasks. What should you do?
Use a smaller model
Train with more epochs
Fine-tune on domain-specific data
Reduce vocabulary size
Q120 A transformer model fails to generate coherent long texts. What should you adjust?
Add positional encoding
Reduce context length
Train on short sentences
Use static embeddings