Q31 Which of the following is a function of the Hadoop NameNode?
Store file blocks
Manage the file system metadata
Perform data analytics
Store the data
Q32 How does Hadoop ensure fault tolerance?
By compressing data
By using a single master node
By replicating data across nodes
By using cloud storage
Q33 What role does the Hadoop Secondary NameNode play?
Acts as a backup for the NameNode
Performs checkpoints of the NameNode metadata
Processes data in real time
Distributes data across nodes
Q34 Which of the following components is responsible for running a Hadoop job?
NameNode
DataNode
YARN
HDFS
Q35 How does MapReduce handle large data sets efficiently in Hadoop?
By performing real-time analytics
By splitting tasks across multiple nodes
By using in-memory storage
By compressing data
Q36 Which command is used to list all Hadoop jobs running on the cluster?
hadoop jobs -list
yarn application -list
hdfs dfsadmin -list
hadoop job -status
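For quick reference, on a YARN-based cluster the applications (including MapReduce jobs) are listed through the YARN CLI; the older hadoop job interface is deprecated in current releases:

    # List all applications known to the ResourceManager
    yarn application -list

    # Restrict the listing to applications that are currently running
    yarn application -list -appStates RUNNING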
Q37 Which Hadoop command is used to stop the HDFS NameNode?
hdfs namenode -stop
hadoop namenode -stop
hadoop dfsadmin -stop
stop-dfs.sh
Q38 Which command is used to format the Hadoop NameNode?
hdfs namenode -format
hdfs namenode -init
hadoop namenode -reformat
hdfs format
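For reference, a minimal sketch of the NameNode housekeeping commands touched on by Q37 and Q38, assuming a standard installation with the Hadoop scripts on the PATH:

    # Format a new NameNode; this erases existing HDFS metadata,
    # so it is normally run only once, when a cluster is first set up
    hdfs namenode -format

    # Stop the HDFS daemons (NameNode, DataNodes, Secondary NameNode)
    stop-dfs.sh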
Q39 A Hadoop job is failing due to a "disk full" error on one DataNode. What is the most likely solution?
Replicate data to the NameNode
Add additional storage to the DataNode
Stop the job
Reformat the NameNode
Q40 A MapReduce job is running slower than expected. Which of the following is a likely cause?
Insufficient disk space
Network latency between nodes
Improperly configured memory
High replication factor
Q41 A Hadoop job continues to fail after several retries due to a corrupt block on a DataNode. What should be the next step?
Increase memory allocation
Re-run the job
Delete the corrupted block
Restart the NameNode
Q42 What is the primary function of the Map phase in MapReduce?
Sort data
Filter data
Map data to key-value pairs
Aggregate data
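As a concrete illustration of the Map phase, here is a minimal word-count Mapper in the Hadoop Java API; it turns each input line into (word, 1) key-value pairs. The class name TokenMapper is illustrative:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split the line into tokens and emit a (token, 1) pair for each
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }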
Q43 In MapReduce, what is the function of the Reduce phase?
To combine intermediate data
To map data to key-value pairs
To sort data
To store data
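Continuing the sketch above, a minimal Reducer combines the intermediate (word, 1) pairs into a per-word total; SumReducer is likewise an illustrative name:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Combine all intermediate counts for this key into one total
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }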
Q44 Which of the following is a key characteristic of the MapReduce framework?
Real-time processing
Distributed processing
Sequential processing
In-memory processing
Q45 How does MapReduce ensure data locality in distributed computing?
By using cloud storage
By moving data to the computation
By performing operations in memory
By moving computation to the data
Q46 What is the purpose of partitioning in the MapReduce framework?
To increase replication
To group key-value pairs for processing
To reduce data size
To increase memory allocation
Q47 How does the Shuffle and Sort phase contribute to the MapReduce process?
It compresses data
It arranges data in a specific order
It distributes data evenly across reducers
It aggregates data
Q48 Which command is used to submit a MapReduce job in Hadoop?
hadoop job -submit
hadoop jar
hadoop dfs -submit
hadoop run-job
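For reference, a packaged MapReduce job is typically submitted with hadoop jar; the jar name, main class, and HDFS paths below are illustrative:

    # Run the WordCount driver class from wordcount.jar against HDFS paths
    hadoop jar wordcount.jar WordCount /input /output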
Q49 Which class in the Hadoop MapReduce framework defines the logic of the Reduce function?
Reducer
Mapper
Partitioner
Combiner
Q50 Which of the following is used to control how the key-value pairs are partitioned in MapReduce?
Combiner
Reducer
Mapper
Partitioner
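To make the Partitioner's role concrete, here is a minimal custom Partitioner in the Hadoop Java API; the hash-based scheme shown mirrors the default HashPartitioner, and the class name is illustrative:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class WordPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            // Decide which reducer receives this key: hash it, clear the
            // sign bit, and take the remainder modulo the reducer count
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

A poorly chosen getPartition that sends many keys to one reducer produces exactly the skew scenario described in Q52 below.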
Q51 A MapReduce job is stuck in the Shuffle and Sort phase. What is the likely cause?
Insufficient memory
Data corruption
Network issues
Incorrect key-value pairs
Q52 A MapReduce job is taking longer than expected because of skew in the data distribution. What is the most likely cause of the skew?
Too many mappers
Improper key partitioning
High replication factor
Large input data size
Q53 A MapReduce job fails repeatedly at the Reduce phase. What could be the most likely reason?
Incorrect key-value pairs
Disk failure
Network congestion
Reducer memory overload
Q54 What is Apache Spark primarily used for?
Real-time processing
Batch processing
Data storage
File compression
Q55 What is the primary purpose of Apache Hive?
Distributed file storage
Data querying
Real-time analytics
Data visualization
Q56 In which language are Apache Pig scripts written?
Java
Python
Pig Latin
Scala
Q57 How does Apache Pig handle large datasets in a distributed environment?
By using a SQL-like language
By using in-memory storage
By parallel processing across nodes
By replicating data
Q58 What role does Spark's Resilient Distributed Dataset (RDD) play?
It stores data temporarily
It enables in-memory computation
It writes data to disk
It compresses data
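A minimal sketch of RDD usage through Spark's Java API, assuming a local-mode context for demonstration. Transformations such as map are lazy and recorded in the RDD's lineage, which is also what lets Spark recompute lost partitions after a failure:

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class RddSketch {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("rdd-sketch").setMaster("local[*]");
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                // parallelize() builds an RDD; map() is a lazy transformation
                JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4));
                JavaRDD<Integer> squares = numbers.map(n -> n * n);
                squares.cache(); // keep computed partitions in memory for reuse
                System.out.println(squares.collect()); // prints [1, 4, 9, 16]
            }
        }
    }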
Q59 How does Apache Spark achieve fault tolerance during distributed processing?
By replicating data
By using RDDs
By checkpointing data
By splitting tasks
Q60 Which command is used to launch the Apache Spark shell?
spark-shell
spark-run
spark-submit
start-spark
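For reference, assuming the Spark binaries are on the PATH:

    # Launch the interactive Spark shell (a Scala REPL with a prebuilt SparkContext);
    # spark-submit, by contrast, runs packaged applications non-interactively
    spark-shell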