Q61
Q61 Which HiveQL command is used to create a new table in Hive?
CREATE DATABASE
INSERT INTO
CREATE TABLE
ADD TABLE
Q62
Q62 Which of the following is used to submit a Spark job to a cluster?
spark-submit
spark-run
spark-apply
spark-launch
Q63
Q63 Which command is used to execute a Pig script?
pig -run
pig -execute
pig -script
pig
Q64
Q64 A Spark job is failing because it cannot allocate enough memory. What is the most likely cause?
Small dataset
Insufficient memory allocation
Corrupt data
Network issues
Q65
Q65 A Hive query is taking longer than expected to run. What is a possible cause?
Small dataset
Improper indexing
Too many joins
Incorrect partitioning
Q66
Q66 A Pig job is failing due to data skew. What is the most likely reason for the failure?
Improper data partitioning
Network latency
Corrupt data
Small dataset
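As an illustration of the data skew described in Q66, here is a minimal Python sketch, assuming a simple modulo partitioner over integer keys. The function name, the key values, and the partition count are hypothetical and are not taken from Pig itself.

```python
from collections import Counter

def partition_counts(keys, num_partitions):
    """Count how many records land in each partition under modulo partitioning."""
    counts = Counter()
    for key in keys:
        # Assumption: the partition is simply key modulo the partition count.
        counts[key % num_partitions] += 1
    return [counts[p] for p in range(num_partitions)]

# Hypothetical workload: 1000 records, 900 of which share key 0.
skewed_keys = [0] * 900 + list(range(1, 101))
# Hypothetical workload: 1000 records with evenly spread keys.
balanced_keys = list(range(1000))

skewed = partition_counts(skewed_keys, 4)
balanced = partition_counts(balanced_keys, 4)
print("skewed:  ", skewed)
print("balanced:", balanced)
```

With the skewed keys, one partition receives most of the records, so the task that processes it does far more work than the others.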
Q67
Q67 What is the primary goal of data mining in Big Data?
Data compression
Pattern discovery
Data encryption
Data storage
Q68
Q68 Which machine learning algorithm is best suited for classifying data into distinct categories?
K-means clustering
Linear regression
Decision tree
K-nearest neighbors
Q69
Q69 What is the purpose of feature selection in machine learning?
To reduce the number of input variables
To increase the number of training samples
To split the dataset
To enhance data visualization
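One simple form of the feature selection asked about in Q69 is to drop columns that barely vary. The sketch below uses only the Python standard library; the helper name, the column names, and the threshold are assumptions made for illustration, not a specific library's API.

```python
from statistics import pvariance

def select_by_variance(columns, threshold):
    """Keep only the columns whose population variance exceeds the threshold."""
    return {
        name: values
        for name, values in columns.items()
        if pvariance(values) > threshold
    }

# Hypothetical dataset stored column by column.
columns = {
    "constant_flag": [1, 1, 1, 1],       # carries no information
    "height_cm": [150, 160, 170, 180],   # varies across samples
}

selected = select_by_variance(columns, threshold=0.0)
print(list(selected))
```

The constant column is removed, which reduces the number of input variables without losing any signal.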
Q70
Q70 How does unsupervised learning differ from supervised learning?
It uses labeled data
It is used for classification
It does not require labeled data
It does not handle large datasets
Q71
Q71 Which machine learning model is best suited for identifying non-linear relationships in Big Data?
Logistic regression
K-means clustering
Neural networks
Linear regression
Q72
Q72 Which Python library is commonly used for machine learning tasks in Big Data?
Pandas
NumPy
Scikit-learn
Matplotlib
Q73
Q73 In Python, which function is used to split a dataset into training and testing sets?
train_test_split
split_data
random_split
train_test_data
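scikit-learn provides train_test_split in sklearn.model_selection. The sketch below is not that implementation; it is a standard-library illustration of the idea (shuffle, then cut), and the helper name and its parameters are hypothetical.

```python
import random

def simple_train_test_split(rows, test_size=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]

train, test = simple_train_test_split(range(20), test_size=0.25, seed=42)
print(len(train), "training rows,", len(test), "test rows")
```

Every row ends up in exactly one of the two lists, so the test set stays unseen during training.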
Q74
Q74 Which type of neural network is commonly used for image recognition tasks?
Convolutional neural network
Recurrent neural network
Feedforward neural network
Generative adversarial network
Q75
Q75 A machine learning model is overfitting. What is a possible solution?
Use a larger dataset
Reduce the number of features
Increase model complexity
Reduce the training data
Q76
Q76 A clustering algorithm is producing poor results in Big Data analysis. What is a likely cause?
Too few clusters
Too many training samples
Incorrect feature scaling
Lack of training data
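To show why feature scaling matters for distance-based clustering (Q76), here is a small sketch, assuming Euclidean distance and min-max scaling. The records and helper names are invented for illustration.

```python
import math

def min_max_scale(column):
    """Rescale a list of numbers linearly onto the range [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

def distance(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def nearest_to_first(points):
    """Return 1 or 2: the index of the point closest to points[0]."""
    return 1 if distance(points[0], points[1]) < distance(points[0], points[2]) else 2

# Hypothetical records: age in years and income in dollars.
ages = [25, 60, 26]
incomes = [50000, 50500, 51000]

raw = list(zip(ages, incomes))
scaled = list(zip(min_max_scale(ages), min_max_scale(incomes)))

nearest_raw = nearest_to_first(raw)
nearest_scaled = nearest_to_first(scaled)
print("nearest neighbour of record 0, raw:   ", nearest_raw)
print("nearest neighbour of record 0, scaled:", nearest_scaled)
```

On the raw values the income column dominates the distance and the 35-year age gap is almost ignored. After scaling, both features contribute on the same range and the nearest neighbour changes.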
Q77
Q77 A neural network model is taking too long to train. What is the most likely cause?
Small dataset
Too many hidden layers
Low learning rate
Too many output nodes
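The effect of the learning rate on training time (one of the options in Q77) can be sketched with plain gradient descent on f(x) = x**2. This is a toy stand-in for a neural network, and the function name, starting point, and tolerance are assumptions.

```python
def steps_to_converge(learning_rate, start=10.0, tol=1e-3, max_steps=100_000):
    """Count gradient descent steps on f(x) = x**2 until |x| drops below tol."""
    x = start
    for step in range(max_steps):
        if abs(x) < tol:
            return step
        x -= learning_rate * 2 * x  # the gradient of x**2 is 2x
    return max_steps

fast = steps_to_converge(learning_rate=0.1)
slow = steps_to_converge(learning_rate=0.001)
print("steps with learning rate 0.1:  ", fast)
print("steps with learning rate 0.001:", slow)
```

The smaller learning rate needs roughly a hundred times more steps to reach the same tolerance, which is why a low learning rate can make training slow.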
Q78
Q78 What is the primary advantage of using NoSQL databases over relational databases?
ACID compliance
Scalability
Data normalization
Primary key usage
Q79
Q79 Which type of data structure does MongoDB use to store records?
Tables
Documents
Key-value pairs
Nodes
Q80
Q80 How does Cassandra ensure high availability in a distributed environment?
By using a master-slave architecture
By replicating data across nodes
By using in-memory storage
By compressing data
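Replication across nodes (Q80) can be pictured as placing each partition on several neighbouring nodes of a ring. The sketch below is a deliberately simplified model, not Cassandra's actual partitioner or replication strategy; the node names and the helper are hypothetical.

```python
def replica_nodes(partition_index, nodes, replication_factor):
    """Pick the owning node plus the next nodes clockwise on the ring."""
    return [
        nodes[(partition_index + i) % len(nodes)]
        for i in range(replication_factor)
    ]

# Hypothetical four-node cluster arranged as a ring.
nodes = ["node-a", "node-b", "node-c", "node-d"]

replicas = replica_nodes(2, nodes, replication_factor=3)

# Simulate the owning node going down: the other replicas can still serve reads.
surviving = [n for n in replicas if n != "node-c"]
print("replicas: ", replicas)
print("surviving:", surviving)
```

Because every partition lives on more than one node, the loss of a single node does not make the data unavailable.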
Q81
Q81 What is a key feature of NoSQL databases like Cassandra and MongoDB?
Strong consistency
Vertical scalability
Horizontal scalability
Limited data types
Q82
Q82 How does MongoDB handle relationships between data?
By using joins
By embedding documents
By using keys and indexes
By using a relational model
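The difference between embedding a related document and referencing it (Q82) can be sketched with plain Python dictionaries standing in for MongoDB documents. All field names and values here are made up for illustration.

```python
# Hypothetical order with the customer embedded as a sub-document.
embedded_order = {
    "_id": 1,
    "item": "keyboard",
    "customer": {"name": "Ada", "city": "London"},
}

# Alternative: store only a reference and resolve it with a second lookup.
customers = {101: {"name": "Ada", "city": "London"}}
referenced_order = {"_id": 1, "item": "keyboard", "customer_id": 101}

# Embedded: one read returns the related data together with the order.
embedded_city = embedded_order["customer"]["city"]

# Referenced: the related data needs an extra lookup by id.
referenced_city = customers[referenced_order["customer_id"]]["city"]
print(embedded_city, referenced_city)
```

Embedding keeps related data in one document so it can be read in a single operation, at the cost of duplicating the customer if many orders share it.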
Q83
Q83 Which MongoDB command is used to insert a document into a collection?
insertOne
addDocument
insertInto
saveDocument
Q84
Q84 Which command in Cassandra is used to create a new keyspace?
CREATE TABLE
CREATE DATABASE
CREATE KEYSPACE
CREATE COLUMNFAMILY
Q85
Q85 How do you query a specific field in a MongoDB document using the find() method?
db.collection.find({field: value})
db.collection.get({field: value})
db.collection.query({field: value})
db.collection.lookup({field: value})
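The equality filter passed to find() in Q85 can be approximated in plain Python. This sketch only covers exact matches on top-level fields; real MongoDB matching has more rules (for example array fields and query operators), and the helper and sample documents are hypothetical.

```python
def find(documents, query):
    """Return the documents whose fields equal every key/value pair in query."""
    return [
        doc for doc in documents
        if all(doc.get(field) == value for field, value in query.items())
    ]

# Hypothetical collection contents.
users = [
    {"name": "Ada", "age": 30},
    {"name": "Linus", "age": 25},
    {"name": "Grace", "age": 30},
]

matches = find(users, {"age": 30})
no_matches = find(users, {"age": "30"})  # string "30" does not equal integer 30
print([u["name"] for u in matches])
print(no_matches)
```

The second call shows how a filter value of the wrong type returns nothing even though matching documents exist.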
Q86
Q86 A MongoDB query is returning no results, but the collection contains documents. What is the most likely issue?
Incorrect query syntax
Corrupt database
Lack of indexes
Database size limit exceeded
Q87
Q87 A Cassandra node is consistently failing. What is the most likely cause?
Insufficient disk space
Lack of data replication
Too many concurrent queries
Improper partitioning
Q88
Q88 Queries against a MongoDB collection are running slowly. What is the likely cause?
Lack of indexes
Too much data replication
Incorrect data types
Small dataset
Q89
Q89 What is the primary goal of data analytics in Big Data?
Data visualization
Data encryption
Insight discovery
Data storage
Q90
Q90 Which of the following is a common tool used for data analytics in Big Data?
Apache Spark
MongoDB
HDFS
NoSQL