
Q31 Text preprocessing slows down significantly when applied to large datasets. What is a potential fix?
Use faster tokenization methods
Use smaller datasets
Skip normalization
Disable stemming
Q32 What is the primary purpose of POS tagging in NLP?
To identify stopwords
To label each word with its grammatical role
To tokenize text
To generate embeddings
Q33 Which of the following is a common POS tagging technique?
Rule-based
Bag-of-Words
Transformer-based
Embedding-based
Q34 Which Python library provides the pos_tag function for tagging words?
spaCy
nltk
TextBlob
pandas
Q35 What is the main challenge of POS tagging for ambiguous words like "can"?
Lack of training data
Ambiguity in context
Complex tokenization
Non-standard text
Q36 How does POS tagging assist in Named Entity Recognition (NER)?
It identifies word context
It detects sentence structure
It assigns roles to entities
It identifies grammatical errors
Q37 Which POS tagging method uses hidden states to model word sequences?
Rule-based
Hidden Markov Model
Bag-of-Words
Embedding-based
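
A minimal sketch of the HMM approach from Q37, using nltk's HiddenMarkovModelTrainer on a slice of the tagged Treebank sample (assumes the treebank corpus has been downloaded; hidden states are the POS tags, observations are the words):

```python
import nltk
from nltk.corpus import treebank
from nltk.tag import hmm

nltk.download("treebank")

# Supervised HMM training: hidden states are POS tags, observed symbols are words.
train_sents = treebank.tagged_sents()[:3000]
tagger = hmm.HiddenMarkovModelTrainer().train_supervised(train_sents)

print(tagger.tag(["The", "dog", "can", "run"]))
```
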
Q38 How do you perform POS tagging using spaCy in Python?
nlp.pos(text)
nlp(text).pos_
nlp(text)
nlp.pos_tags(text)
Q39 Which attribute of spaCy tokens can be used to get the POS tag?
text
lemma_
pos_
tag_
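
To make Q38 and Q39 concrete, a short sketch using spaCy: run text through the loaded pipeline with nlp(text), then read each token's pos_ (coarse) or tag_ (fine-grained) attribute. Assumes the en_core_web_sm model is installed:

```python
import spacy

# Assumes the small English model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("She can open the can")
for token in doc:
    # pos_ is the coarse universal POS tag; tag_ is the fine-grained tag.
    print(token.text, token.pos_, token.tag_)
```
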
Q40 How do you display the detailed POS tags of a sentence using nltk?
pos_tag(sentence)
pos_tag(word_tokenize(sentence))
tag(sentence)
tokenize(sentence)
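
And the nltk equivalent from Q40 (also the answer to Q34): pos_tag expects a list of tokens, so the sentence goes through word_tokenize first. Exact resource names for the downloads can vary slightly across nltk versions:

```python
import nltk
from nltk import pos_tag, word_tokenize

# One-time downloads; names may differ slightly by nltk version.
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

sentence = "The striped bats are hanging on their feet"
print(pos_tag(word_tokenize(sentence)))
# e.g. [('The', 'DT'), ('striped', 'JJ'), ('bats', 'NNS'), ...]
```
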
Q41 A POS tagging model incorrectly tags all nouns as verbs. What could be a likely issue?
Incorrect tokenization
Insufficient training data
Incorrect tagging logic
Normalization errors
Q42 A POS tagging system struggles with unseen words in a test dataset. What should you use?
Rule-based methods
Pre-trained embeddings
Bag-of-Words
Word frequency analysis
Q43 A POS tagging pipeline fails to distinguish between “book” as a noun and a verb. What should you improve?
Tagging rules
Context modeling
Tokenization
Dataset size
Q44 What is the primary goal of Named Entity Recognition (NER)?
Identify grammatical errors
Classify entities into predefined categories
Generate embeddings
Tokenize text
Q45 Which of the following is a commonly recognized entity type in NER?
Noun
Location
Verb
Adjective
Q46 How does context affect the performance of NER models?
Context doesn’t affect performance
Improves recognition of ambiguous entities
Reduces performance
No impact
Q47 Which algorithm is commonly used for NER tasks?
Decision Tree
K-Means
Conditional Random Fields (CRF)
Linear Regression
Q48 What is the role of a gazetteer in NER?
Provides training data
Generates embeddings
Lists predefined entities
Tokenizes text
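
As a concrete example of gazetteer-style lookup (Q48), spaCy's EntityRuler matches entities against a predefined pattern list; the patterns below are illustrative:

```python
import spacy

nlp = spacy.blank("en")

# EntityRuler behaves like a gazetteer: a predefined list of entity patterns.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "GPE", "pattern": "San Francisco"},
    {"label": "ORG", "pattern": "Acme Corp"},
])

doc = nlp("Acme Corp opened an office in San Francisco.")
print([(ent.text, ent.label_) for ent in doc.ents])
```
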
Q49 Which neural network architecture is commonly paired with CRF for NER?
RNN
CNN
LSTM
Transformer
Q50 Which Python library provides a pretrained NER model?
nltk
spaCy
TextBlob
pandas
Q51 How do you extract named entities using spaCy?
doc.entities
doc.ents
doc.tokens
doc.entity_types
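
A minimal sketch for Q50 and Q51: load a pretrained spaCy pipeline and read doc.ents (assumes en_core_web_sm is installed):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

# doc.ents holds the recognized entity spans with their labels.
for ent in doc.ents:
    print(ent.text, ent.label_)
```
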
Q52 How do you train a custom NER model using spaCy?
Update the pipeline
Modify stopwords
Train a new word2vec model
Manually tag data
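
For Q52, a hedged sketch of updating the pipeline in spaCy 3: wrap annotated text in Example objects and pass them to nlp.update. The training sentence and entity offsets here are illustrative:

```python
import random
import spacy
from spacy.training import Example

nlp = spacy.load("en_core_web_sm")

# Illustrative annotation: character offsets of each entity plus its label.
TRAIN_DATA = [
    ("Acme Corp shipped the order", {"entities": [(0, 9, "ORG")]}),
]

optimizer = nlp.resume_training()
for _ in range(10):
    random.shuffle(TRAIN_DATA)
    for text, annotations in TRAIN_DATA:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        nlp.update([example], sgd=optimizer)
```
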
Q53 An NER model incorrectly tags all city names as organizations. What is a likely issue?
Poor tokenization
Ambiguous training data
Incorrect embeddings
Low batch size
Q54 An NER model fails to recognize new entities in a specific domain. What should you do?
Use a gazetteer
Ignore domain data
Train on unrelated datasets
Reduce model size
Q55 An NER model struggles to generalize across different datasets. What technique can help?
Train a larger model
Use domain adaptation
Skip embedding layers
Reduce training data
Q56 What is the main purpose of word embeddings in NLP?
To tokenize text
To capture semantic meaning of words
To remove stopwords
To perform lemmatization
Q57 Which method does GloVe use to learn word embeddings?
Probabilistic models
Matrix factorization
Recurrent networks
Transformers
Q58 What is the difference between Word2Vec and GloVe?
Word2Vec is count-based, GloVe is predictive
Word2Vec uses local context, GloVe uses global context
Word2Vec uses global statistics
GloVe ignores word frequency
Q59 Which of the following training modes is available in Word2Vec?
Skip-gram
Bag-of-Words
LSTM
Transformer
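
To ground Q58 and Q59: gensim's Word2Vec learns from local context windows, and sg=1 selects skip-gram (sg=0 gives CBOW). The toy corpus here is illustrative:

```python
from gensim.models import Word2Vec

# Tiny illustrative corpus: each sentence is a list of tokens.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

# sg=1 -> skip-gram; sg=0 -> CBOW. window sets the local context size.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(model.wv["cat"].shape)         # (50,)
print(model.wv.most_similar("cat"))  # nearest neighbours in embedding space
```
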
Q60 Why are pre-trained embeddings like GloVe preferred over training from scratch?
They are less accurate
They reduce training time
They ignore rare words
They work with any language
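
Finally, a sketch of Q60's point: loading pretrained GloVe vectors through gensim.downloader skips training entirely (glove-wiki-gigaword-100 is one of gensim's hosted datasets; the first call downloads and caches it):

```python
import gensim.downloader as api

# First call downloads and caches the pretrained vectors (~130 MB).
glove = api.load("glove-wiki-gigaword-100")

# Semantic similarity with zero training time.
print(glove.most_similar("king", topn=3))
print(glove.similarity("cat", "dog"))
```
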