Q61
Q61 Which command in HBase is used to scan all records from a specific table?
scan 'table_name'
select * from 'table_name'
get 'table_name', 'row'
list 'table_name'
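For reference, a minimal HBase shell session illustrating the scan command (the table name `users` is hypothetical):

```
scan 'users'
# While testing, cap the output, since a bare scan reads the whole table:
scan 'users', {LIMIT => 10}
```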
Q62
Q62 How do you create a new table in Hive?
CREATE TABLE table_name (columns)
NEW TABLE table_name (columns)
CREATE HIVE table_name (columns)
INITIALIZE TABLE table_name (columns)
Q63
Q63 What is the primary command to view the status of a job in Oozie?
oozie job -info job_id
oozie -status job_id
oozie list job_id
oozie -jobinfo job_id
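A sketch of the corresponding CLI call, assuming an Oozie server on the default port and a hypothetical workflow job ID:

```
oozie job -oozie http://localhost:11000/oozie -info 0000001-240101000000000-oozie-W
```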
Q64
Q64 What functionality does the sqoop merge command provide?
Merging two Hadoop clusters
Merging results from different queries
Merging two datasets in HDFS
Merging updates from an RDBMS into an existing Hadoop dataset
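A hedged sketch of a typical `sqoop merge` invocation; the HDFS paths, jar, class, and key column are all hypothetical. The tool flattens a newer incremental-import dataset onto an older one, with `--merge-key` deciding which record wins:

```
sqoop merge \
  --new-data /data/customers_new \
  --onto /data/customers_old \
  --target-dir /data/customers_merged \
  --jar-file customers.jar \
  --class-name Customers \
  --merge-key customer_id
```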
Q65
Q65 What should you verify first if a Sqoop import fails?
The database connection settings
The format of the imported data
The version of Sqoop
The cluster status
Q66
Q66 If a Hive query runs significantly slower than expected, what should be checked first?
The structure of the tables and indexes
The configuration of the Hive server
The data size being processed
The network connectivity between Hive and HDFS
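One practical first step when diagnosing a slow query is to inspect its plan; a full scan of a large table often shows up here. A minimal sketch, assuming a hypothetical `sales` table:

```sql
EXPLAIN
SELECT customer_id, SUM(amount)
FROM sales
WHERE sale_date = '2024-01-01'
GROUP BY customer_id;
```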
Q67
Q67 What is Hive mainly used for in the Hadoop ecosystem?
Data warehousing
Real-time processing
Data encryption
Stream processing
Q68
Q68 How does Hive handle data storage?
It uses its own file system
It utilizes HDFS
It relies on external databases
It stores data in a proprietary format
Q69
Q69 What type of data models does Hive support?
Only structured data
Structured and unstructured data
Only unstructured data
Structured, unstructured, and semi-structured data
Q70
Q70 Which Hive component is responsible for converting SQL queries into MapReduce jobs?
Hive Editor
Hive Compiler
Hive Driver
Hive Metastore
Q71
Q71 How does partitioning in Hive improve query performance?
By decreasing the size of data scans
By increasing data redundancy
By simplifying data complexities
By reducing network traffic
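A short HiveQL sketch of why this works, using a hypothetical table partitioned by date; a query that filters on the partition column reads only that partition's files rather than scanning the whole table:

```sql
CREATE TABLE sales (customer_id STRING, amount DOUBLE)
PARTITIONED BY (sale_date STRING);

-- Only the sale_date='2024-01-01' partition is scanned:
SELECT SUM(amount) FROM sales WHERE sale_date = '2024-01-01';
```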
Q72
Q72 What is the correct HiveQL command to list all tables in the database?
SHOW TABLES
LIST TABLES
DISPLAY TABLES
VIEW TABLES
Q73
Q73 How do you add a new column to an existing Hive table?
ALTER TABLE table_name ADD COLUMNS (new_column type)
UPDATE TABLE table_name SET new_column type
ADD COLUMN TO table_name (new_column type)
MODIFY TABLE table_name ADD (new_column type)
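A concrete form of the correct statement, with a hypothetical table and column; existing rows return NULL for the newly added column:

```sql
ALTER TABLE employees ADD COLUMNS (hire_date STRING);
```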
Q74
Q74 In Hive, which command would you use to change the data type of a column in a table?
ALTER TABLE table_name CHANGE COLUMN old_column new_column new_type
ALTER TABLE table_name MODIFY COLUMN old_column new_type
CHANGE TABLE table_name COLUMN old_column TO new_type
RETYPE TABLE table_name COLUMN old_column new_type
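A concrete form of the correct statement, using a hypothetical table. Note that `CHANGE COLUMN` requires both the old and new column names even when only the type changes (repeat the old name to keep it):

```sql
ALTER TABLE employees CHANGE COLUMN salary salary_usd DECIMAL(10,2);
```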
Q75
Q75 How can you optimize a Hive query to limit the number of MapReduce jobs it generates?
Use multi-table inserts whenever possible
Reduce the number of output columns
Use fewer WHERE clauses
Increase the amount of memory allocated
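A sketch of a multi-table insert over a hypothetical `sales` table: one scan of the source feeds both inserts, instead of two separate queries each launching its own jobs:

```sql
FROM sales s
INSERT OVERWRITE TABLE daily_totals
  SELECT s.sale_date, SUM(s.amount) GROUP BY s.sale_date
INSERT OVERWRITE TABLE customer_totals
  SELECT s.customer_id, SUM(s.amount) GROUP BY s.customer_id;
```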
Q76
Q76 What is a common fix if a Hive query returns incorrect results?
Reboot the Hive server
Re-index the data
Check and correct the query logic
Increase the JVM memory for Hive
Q77
Q77 What should you check if a Hive job is running longer than expected without errors?
The complexity of the query
The configuration parameters for resource allocation
The data volume being processed
The network connectivity
Q78
Q78 What is Pig primarily used for in the Hadoop ecosystem?
Data transformations
Real-time analytics
Data encryption
Stream processing
Q79
Q79 What makes Pig different from traditional SQL in processing data?
Pig is a procedural data-flow language: it processes data step by step and allows multiple outputs from a single script.
Pig only allows batch processing.
Pig supports fewer data types.
Pig requires explicit data loading.
Q80
Q80 In Pig, what is the difference between 'STORE' and 'DUMP'?
'STORE' writes the output to the filesystem, while 'DUMP' displays the output on the screen.
'STORE' and 'DUMP' both write data to the filesystem but in different formats.
'DUMP' writes data in compressed format, while 'STORE' does not compress data.
Both commands are used for debugging only.
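A minimal Pig Latin sketch of the distinction, with a hypothetical input file and output directory:

```pig
data = LOAD 'input.txt' AS (line:chararray);

DUMP data;                     -- prints tuples to the console (debugging)
STORE data INTO 'output_dir';  -- writes tuples to the filesystem
```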
Q81
Q81 How does Pig handle schema-less data?
By inferring the schema at runtime.
By converting all inputs to strings.
By requiring manual schema definition before processing.
By rejecting schema-less data.
Q82
Q82 How can Pig scripts be optimized to handle large datasets more efficiently?
By increasing memory allocation for each task.
By using parallel processing directives.
By minimizing data read operations.
By rewriting scripts in Java.
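A sketch of the `PARALLEL` directive, which sets the number of reduce tasks for an operator; the file, schema, and reducer count below are hypothetical:

```pig
sales = LOAD 'sales.txt' USING PigStorage(',')
        AS (customer_id:chararray, amount:double);

-- Run the grouping with 10 reduce tasks instead of the default
grouped = GROUP sales BY customer_id PALLEL 10;
```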
Q83
Q83 What Pig command is used to load data from a file?
data = LOAD 'data.txt' AS (line);
IMPORT 'data.txt';
OPEN 'data.txt';
READ 'data.txt';
Q84
Q84 How do you group data by a specific column in Pig?
GROUP data BY column;
COLLECT data BY column;
AGGREGATE data BY column;
CLUSTER data BY column;
Q85
Q85 What Pig function aggregates data to find the total?
SUM(data.column);
TOTAL(data.column);
AGGREGATE(data.column, 'total');
ADD(data.column);
Q86
Q86 How do you filter rows in Pig that match a specific condition?
FILTER data BY condition;
SELECT data WHERE condition;
EXTRACT data IF condition;
FIND data MATCHING condition;
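The LOAD, FILTER, GROUP, and SUM operations above can be combined into one short pipeline; a sketch over a hypothetical comma-separated sales file:

```pig
sales = LOAD 'sales.txt' USING PigStorage(',')
        AS (customer_id:chararray, amount:double);

-- Keep only rows matching a condition
big_sales = FILTER sales BY amount > 100.0;

-- Group by a column, then aggregate each group's bag with SUM
by_customer = GROUP big_sales BY customer_id;
totals = FOREACH by_customer GENERATE group, SUM(big_sales.amount);

DUMP totals;
```

Note that SUM operates on the bag produced by GROUP, which is why the grouping step must come first.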
Q87
Q87 What is the first thing you should check if a Pig script fails due to an out-of-memory error?
The data sizes being processed.
The number of reducers.
The script's syntax.
The JVM settings.
Q88
Q88 If a Pig script is unexpectedly slow, what should be checked first to improve performance?
The script's logical plan.
The amount of data being processed.
The network latency.
The disk I/O operations.
Q89
Q89 What is the primary storage model used by HBase?
Row-oriented
Column-oriented
Graph-based
Key-value pairs
Q90
Q90 How does HBase handle scalability?
Through horizontal scaling by adding more nodes
Through vertical scaling by adding more hardware to existing nodes
By increasing the block size in HDFS
By partitioning data into more manageable pieces