
Hadoop Multiple Choice Questions (MCQs)

1 .
Topic:  Introduction
Question:
Who created Hadoop?
A
Doug Cutting
B
JetBrains
C
Graydon Hoare
D
Robert Griesemer
Answer: A
2 .
Topic:  Introduction
Question:
What is a data flow language and execution environment used for exploring very large datasets in Hadoop?
A
HBase
B
Hive
C
HDFS
D
Pig
Answer: D
3 .
Topic:  Introduction
Question:
Which tool is used to efficiently move data between relational databases and HDFS?
A
Hive
B
Pig
C
Sqoop
D
HBase
Answer: C
4 .
Topic:  Introduction
Question:
Which among the following is a distributed data warehouse in Hadoop?
A
Hive
B
Pig
C
Sqoop
D
HBase
Answer: A
5 .
Topic:  MapReduce
Question:
Which among the following is correct? S1: MapReduce is a programming model for data processing S2: Hadoop can run MapReduce programs written in various languages S3: MapReduce programs are inherently parallel
A
S1 and S2
B
S2 and S3
C
S1 and S3
D
S1, S2 and S3
Answer: D
6 .
Topic:  MapReduce
Question:
The Mapper class is a
A
generic type
B
abstract type
C
static type
D
final
Answer: A
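For illustration, a minimal sketch of a Mapper subclass (the class name WordMapper and the tokenizing logic are illustrative). The four type parameters are what make Mapper a generic type; note also the basic types Text and IntWritable from org.apache.hadoop.io, which the next question refers to.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> is generic: here the input is a
    // byte offset plus a line of text, and the output is a (word, 1) pair.
    public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                context.write(new Text(token), ONE); // emit (word, 1)
            }
        }
    }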
7 .
Topic:  MapReduce
Question:
Which package provides the basic types of Hadoop?
A
org.apache.hadoop.io
B
org.apache.hadoop.util
C
org.apache.hadoop.type
D
org.apache.hadoop.lang
Answer: A
8 .
Topic:  MapReduce
Question:
Which among the following performs job control in Hadoop?
A
Mapper class
B
Reducer class
C
Task class
D
Job class
Answer: D
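A minimal driver sketch showing the Job class controlling configuration, submission, and monitoring of a job; it reuses the illustrative WordMapper above together with Hadoop's stock IntSumReducer.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            // The Job object controls the whole job, from configuration to completion.
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCountDriver.class);
            job.setMapperClass(WordMapper.class);     // the sketch above
            job.setReducerClass(IntSumReducer.class); // sums the 1s per word
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }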
9 .
Topic:  MapReduce
Question:
Hadoop runs the jobs by dividing them into
A
maps
B
tasks
C
individual files
D
None of these
Answer: B
Explanation: Hadoop runs the job by dividing it into tasks, of which there are two types: map tasks and reduce tasks
10 .
Topic:  MapReduce
Question:
Which are the two nodes that control the job execution process of Hadoop?
A
Job Tracker and Task Tracker
B
Map Tracker and Reduce Tracker
C
Map Tracker and Job Tracker
D
Map Tracker and Task Tracker
Answer: A
Explanation: There are two types of nodes that control the job execution process: a Job Tracker and a number of Task Trackers. The Job Tracker coordinates all the jobs run on the system by scheduling tasks to run on Task Trackers
11 .
Topic:  MapReduce
Question:
Which among the following schedules tasks to be run?
A
Job Tracker
B
Task Tracker
C
Job Scheduler
D
Task Controller
Answer: A
Explanation: There are two types of nodes that control the job execution process: a Job Tracker and a number of Task Trackers. The Job Tracker coordinates all the jobs run on the system by scheduling tasks to run on Task Trackers
12 .
Topic:  MapReduce
Question:
What are the fixed-size pieces into which Hadoop divides the input of a MapReduce job called?
A
records
B
splits
C
tasks
D
maps
Answer: B
Explanation: Hadoop divides the input to a MapReduce job into fixed-size pieces called input splits, or just splits. Hadoop creates one map task for each split, which runs the user-defined map function for each record in the split.
13 .
Topic:  MapReduce
Question:
Where is the output of map tasks written?
A
local disk
B
HDFS
C
File System
D
secondary storage
Answer: A
14 .
Topic:  MapReduce
Question:
Which among the following is responsible for processing one or more chunks of data and producing the output results?
A
Map task
B
Job task
C
Mapper class
D
Reduce task
Answer: A
15 .
Topic:  MapReduce
Question:
Which acts as an interface between Hadoop and the program you have written?
A
Hadoop Cluster
B
Hadoop Streams
C
Hadoop Sequencing
D
Hadoop Streaming
Answer: D
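A sketch of a Streaming invocation; the jar location varies by distribution, and any executables that read standard input and write standard output can serve as mapper and reducer.

    hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
        -input /user/in \
        -output /user/out \
        -mapper /bin/cat \
        -reducer /usr/bin/wc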
16 .
Topic:  MapReduce
Question:
What are Hadoop Pipes?
A
Java interface to Hadoop MapReduce
B
C++ interface to Hadoop MapReduce
C
Ruby interface to Hadoop MapReduce
D
Python interface to Hadoop MapReduce
Answer: B
17 .
Topic:  HDFS
Question:
What does Hadoop Common Package contain?
A
war files
B
msi files
C
jar files
D
exe files
Answer: C
Explanation: The Hadoop Common package contains the Java Archive (JAR) files and scripts needed to start Hadoop.
18 .
Topic:  HDFS
Question:
Which among the following is the master node?
A
Name Node
B
Data Node
C
Job Node
D
Task Node
Answer: A
Explanation: The master node tracks files, manages the file system, and holds the metadata of all of the stored data.
19 .
Topic:  HDFS
Question:
Which among the following is the slave node?
A
Name Node
B
Data Node
C
Job Node
D
Task Node
Answer: B
Explanation: Data Nodes are slave daemons that store the actual data in HDFS and serve read and write requests from clients.
20 .
Topic:  HDFS
Question:
Which acts as a checkpoint node in HDFS?
A
Name Node
B
Data Node
C
Secondary Name Node
D
Secondary Data Node
Answer: C
Explanation: It is the helper node for the Name Node; its only job is to take checkpoints of the file system metadata present in the Name Node.
21 .
Topic:  HDFS
Question:
Which among the following holds the location of data?
A
Name Node
B
Data Node
C
Job Tracker
D
Task Tracker
Answer: A
Explanation: The Name Node is the master node, which keeps track of files, the file system, and the name and location of the data, among other details.
22 .
Topic:  MapReduce
Question:
What is the process of applying the code received from the Job Tracker to the file called?
A
Naming
B
Tracker
C
Mapper
D
Reducer
Answer: C
Explanation: The Task Tracker is the slave node for the Job Tracker and takes tasks from it. The Task Tracker applies the code received from the Job Tracker to the file; the process of applying that code to the file is known as the Mapper.
23 .
Topic:  MapReduce
Question:
In which mode should Hadoop run in order to run pipes job?
A
distributed mode
B
centralized mode
C
pseudo distributed mode
D
parallel mode
Answer: C
Explanation: Pseudo-distributed mode is a mode where all the daemons run on the local machine. Pipes doesn't run in standalone mode, since it relies on Hadoop's distributed cache mechanism.
24 .
Topic:  HDFS
Question:
Which among the following manages the filesystem namespace?
A
Name Node
B
Data Node
C
Secondary Node
D
Primary Node
Answer: A
Explanation: The namenode is the master node which manages the filesystem namespace. It maintains the filesystem tree and the metadata for all the files and directories in the tree
25 .
Topic:  HDFS
Question:
Which among the following act as the workhorses of the filesystem?
A
Name Node
B
Data Node
C
Secondary Node
D
Primary Node
Answer: B
Explanation: Data nodes are slave nodes that store and retrieve blocks when they are told to (by clients or the namenode), and they report back to the namenode periodically with lists of the blocks they are storing.
26 .
Topic:  HDFS
Question:
Which of the following are correct? S1: Namespace volumes are independent of each other S2: Namespace volumes are managed by namenodes
A
S1 only
B
S2 only
C
Both S1 and S2
D
Neither S1 nor S2
Answer: C
Explanation: Each namenode manages a namespace volume, which is made up of the metadata for the namespace, and a block pool containing all the blocks for the files in the namespace. Namespace volumes are independent of each other, which means namenodes do not communicate with one another; furthermore, the failure of one namenode does not affect the availability of the namespaces managed by other namenodes.
27 .
Topic:  HDFS
Question:
Which of the following architectural changes are needed to attain high availability in HDFS?
A
Clients must be configured to handle namenode failover
B
Datanodes must send block reports to both namenodes since the block mappings are stored in a namenode’s memory, and not on disk
C
namenodes must use highly-available shared storage to share the edit log
D
All of the above
Answer: D
28 .
Topic:  HDFS
Question:
Which controller in HDFS manages the transition from the active namenode to the standby?
A
failover controller
B
recovery controller
C
failsafe controller
D
fencing controller
Answer: A
29 .
Topic:  HDFS
Question:
Which among the following is not a fencing mechanism employed by the system in HDFS?
A
killing the namenode’s process
B
disabling namenode's network port via a remote management command
C
revoking namenode's access to the shared storage directory
D
None of the above
Answer: D
30 .
Topic:  HDFS
Question:
What is the value of the property dfs.replication set in pseudo-distributed mode?
A
0
B
1
C
null
D
yes
Answer: B
31 .
Topic:  HDFS
Question:
What is the minimum amount of data that a disk can read or write in HDFS?
A
block size
B
byte size
C
heap
D
None
Answer: A
32 .
Topic:  HDFS
Question:
What is the default size of HDFS blocks?
A
32 MB
B
64 MB
C
128 MB
D
256 MB
Answer: C
33 .
Topic:  HDFS
Question:
Which HDFS command checks the file system and lists the blocks?
A
hfsck
B
fcsk
C
fblock
D
fsck
Answer: D
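For example (the path /user/data is illustrative):

    hdfs fsck /user/data -files -blocks -locations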
34 .
Topic:  HDFS
Question:
What is the administrative grouping used to manage cache permissions and resource usage called?
A
Cache pools
B
block pool
C
Namenodes
D
HDFS Cluster
Answer: A
35 .
Topic:  HDFS
Question:
Which object encapsulates a client or server's configuration?
A
File Object
B
Configuration object
C
Path Object
D
Stream Object
Answer: B
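A minimal sketch of the Configuration object in use; the property names shown are standard, but overriding them in code like this is purely illustrative.

    import org.apache.hadoop.conf.Configuration;

    public class ConfDemo {
        public static void main(String[] args) {
            Configuration conf = new Configuration();     // loads core-default.xml, core-site.xml
            conf.set("dfs.replication", "1");             // override a property in code
            System.out.println(conf.get("fs.defaultFS")); // read a property
        }
    }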
36 .
Topic:  HDFS
Question:
Which interface permits seeking to a position in the file and provides a query method for the current offset from the start of the file?
A
DataStream
B
Seekable
C
PositionedReadable
D
Progressable
Answer: B
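A sketch using FSDataInputStream, which implements Seekable; the file path is supplied by the caller, and the offset 16 is arbitrary.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SeekDemo {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            try (FSDataInputStream in = fs.open(new Path(args[0]))) {
                in.seek(16);                     // Seekable: jump to a byte offset
                System.out.println(in.getPos()); // Seekable: query the current offset
            }
        }
    }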
37 .
Topic:  HDFS
Question:
Which method is used to list the contents of a directory?
A
listFiles
B
listContents
C
listStatus
D
listPaths
Answer: C
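A sketch of listStatus(), which returns an array of FileStatus objects (the subject of a later question):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListDemo {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            for (FileStatus status : fs.listStatus(new Path(args[0]))) {
                System.out.println(status.getPath()); // one entry per file or directory
            }
        }
    }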
38 .
Topic:  HDFS
Question:
What is the operation that uses wildcard characters to match multiple files with a single expression called?
A
globbing
B
pattern matching
C
regex
D
regexfilter
Answer: A
39 .
Topic:  HDFS
Question:
What does the globStatus() method return?
A
an array of FileStatus objects
B
an array of ListStatus objects
C
an array of PathStatus objects
D
an array of FilterStatus objects
Answer: A
40 .
Topic:  HDFS
Question:
What does the glob question mark (?) match?
A
zero or more characters
B
one or more characters
C
a single character
D
metacharacter
Answer: C
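A sketch tying the last three questions together; the log-file layout is assumed for illustration.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class GlobDemo {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // "??" matches exactly two characters; "*" matches zero or more.
            FileStatus[] matches = fs.globStatus(new Path("/logs/2023-??-*"));
            for (FileStatus status : matches) {
                System.out.println(status.getPath());
            }
        }
    }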
41 .
Topic:  HDFS
Question:
Which method on FileSystem is used to permanently remove files or directories?
A
remove()
B
rm()
C
del()
D
delete()
Answer: D
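A sketch of delete(); the second argument must be true to remove a non-empty directory recursively.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DeleteDemo {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            boolean deleted = fs.delete(new Path(args[0]), true); // true = recursive
            System.out.println("deleted: " + deleted);
        }
    }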
42 .
Topic:  HDFS
Question:
Which streams the packets to the first datanode in the pipeline?
A
DataStreamer
B
FileStreamer
C
InputStreamer
D
PathStreamer
Answer: A
43 .
Topic:  HDFS
Question:
Which queue is responsible for asking the namenode to allocate new blocks by picking a list of suitable datanodes to store the replicas?
A
ack queue
B
data queue
C
path queue
D
stream queue
Answer: B
44 .
Topic:  HDFS
Question:
Which command is used to copy files or directories in parallel, within or between Hadoop clusters?
A
distcp
B
hcp
C
copy
D
cp
Answer: A
45 .
Topic:  HDFS
Question:
Which flag is used with distcp to delete any files or directories from the destination?
A
-remove
B
-rm
C
-del
D
-delete
Answer: D
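For example (host names are illustrative; -delete must be combined with -update or -overwrite):

    hadoop distcp -update -delete hdfs://nn1:8020/src hdfs://nn2:8020/dst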
46 .
Topic:  YARN
Question:
Which among the following is Hadoop's cluster resource management system?
A
GLOB
B
YARN
C
ARM
D
SPARK
Answer: B
47 .
Topic:  YARN
Question:
Which of the following processing frameworks interacts with YARN directly?
A
Pig
B
Hive
C
Crunch
D
None of these
Answer: D
48 .
Topic:  YARN
Question:
Which of the following processing frameworks run on MapReduce?
A
Pig
B
Hive
C
Crunch
D
All of the above
Answer: D
49 .
Topic:  YARN
Question:
Which among the following are the core services of YARN?
A
resource manager and node manager
B
namenode and datanode
C
data manager and resource manager
D
data manager and application manager
Answer: A
50 .
Topic:  YARN
Question:
Which constraints can be used to request a container on a specific node or rack, or anywhere on the cluster in YARN?
A
Container constraints
B
Space constraints
C
Locality constraints
D
Resource constraints
Answer: C
51 .
Topic:  YARN
Question:
Which among the following can be used to model YARN applications?
A
one application per user job
B
run one application per workflow
C
long-running application that is shared by different users
D
All of the above
Answer: D
52 .
Topic:  YARN
Question:
Which follows the one-application-per-user-job model?
A
MapReduce
B
Spark
C
Apache Slider
D
Samza
Answer: A
53 .
Topic:  YARN
Question:
Which runs one application per user session?
A
MapReduce
B
Spark
C
Apache Slider
D
None of the above
Answer: B
54 .
Topic:  YARN
Question:
Which among the following has a long-running application master for launching other applications on the cluster?
A
MapReduce
B
Spark
C
Apache Slider
D
None of the above
Answer: C
55 .
Topic:  YARN
Question:
Which among the following can be used for stream processing?
A
Spark
B
Samza
C
Storm
D
All of the above
Answer: D
56 .
Topic:  YARN
Question:
Which provides a simple programming model for developing distributed applications on YARN?
A
Apache Slider
B
Apache Twill
C
Spark
D
Tez
Answer: B
57 .
Topic:  YARN
Question:
Which among the following statements are true with respect to Apache Twill? S1: Twill supports real-time logging S2: Allows the usage of a Java Runnable interface
A
S1 only
B
S2 only
C
Both S1 and S2
D
Neither S1 nor S2
Answer: C
58 .
Topic:  YARN
Question:
Which daemons control the job execution process in MapReduce 1?
A
jobtracker
B
tasktrackers
C
Both jobtracker and tasktrackers
D
Name node and data node
Answer: C
59 .
Topic:  YARN
Question:
Which among the following coordinates all the jobs run on the system by scheduling tasks in MapReduce 1?
A
jobtracker
B
tasktrackers
C
data node
D
Name node
Answer: A
60 .
Topic:  YARN
Question:
Which of the following keeps a record of the overall progress of each job in MapReduce 1?
A
jobtracker
B
tasktrackers
C
data node
D
Name node
Answer: A
61 .
Topic:  YARN
Question:
Which among the following run tasks and send progress reports in MapReduce 1?
A
jobtracker
B
tasktrackers
C
data node
D
Name node
Answer: B
62 .
Topic:  YARN
Question:
Which of the following are tasks of the jobtracker in MapReduce 1?
A
job scheduling
B
task progress monitoring
C
task bookkeeping
D
All of the above
Answer: D
63 .
Topic:  YARN
Question:
Which is responsible for storing job history in MapReduce 1?
A
jobtracker
B
tasktrackers
C
data node
D
Name node
Answer: A
64 .
Topic:  YARN
Question:
In YARN, the responsibility of the jobtracker is handled by
A
Resource manager
B
application master
C
timeline server
D
All of the above
Answer: D
65 .
Topic:  YARN
Question:
In YARN, the responsibility of the tasktracker is handled by
A
Resource manager
B
application master
C
timeline server
D
Node manager
Answer: D
66 .
Topic:  YARN
Question:
Which stores the application history in YARN?
A
Resource manager
B
application master
C
timeline server
D
Node manager
Answer: C
67 .
Topic:  YARN
Question:
Which among the following are the features of YARN?
A
Scalability
B
Multitenancy
C
Availability
D
All of the above
Answer: D
68 .
Topic:  YARN
Question:
Which among the following schedulers is available in YARN?
A
FIFO
B
Shortest Job First
C
Round Robin
D
Shortest Remaining Time
Answer: A
69 .
Topic:  YARN
Question:
Which of the following schedulers are available in YARN?
A
FIFO
B
Capacity
C
Fair Scheduler
D
All of the above
Answer: D
70 .
Topic:  YARN
Question:
Which among the following schedulers attempts to allocate resources so that all running applications get the same share of resources in YARN?
A
FIFO
B
Capacity
C
Fair Scheduler
D
Round Robin
Answer: C
71 .
Topic:  YARN
Question:
Which among the following schedulers provides queue elasticity in YARN?
A
FIFO
B
Capacity
C
Fair Scheduler
D
Round Robin
Answer: B
72 .
Topic:  YARN
Question:
Which among the following schedulers in YARN is used by default?
A
FIFO
B
Capacity
C
Fair Scheduler
D
Round Robin
Answer: B
73 .
Topic:  YARN
Question:
In which XML file is the default scheduler configuration changed?
A
yarn-site.xml
B
config.xml
C
scheduler.xml
D
yarn-scheduler.xml
Answer: A
74 .
Topic:  YARN
Question:
Which of the following queue scheduling policies are supported by the Fair Scheduler in YARN?
A
FIFO
B
Dominant Resource Fairness
C
preemption
D
All of the above
Answer: D
75 .
Topic:  YARN
Question:
Which holds the list of rules for queue placement in Fair Scheduling?
A
queuePlacementPolicy
B
rulePlacementPolicy
C
scheduleQueuePolicy
D
schedulingPolicy
Answer: A
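A sketch of a queuePlacementPolicy element in the Fair Scheduler allocation file; the particular rules and queue name are illustrative.

    <allocations>
      <queuePlacementPolicy>
        <rule name="specified" create="false" />
        <rule name="primaryGroup" create="false" />
        <rule name="default" queue="dev" />
      </queuePlacementPolicy>
    </allocations>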
76 .
Topic:  YARN
Question:
Which setting is used to enable preemption globally?
A
yarn.scheduler.fair.preemption = true
B
yarn.scheduler.preemption = true
C
yarn.scheduler.global.preemption = true
D
yarn.scheduler.enable.preemption = true
Answer: A
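In yarn-site.xml this is set as:

    <property>
      <name>yarn.scheduler.fair.preemption</name>
      <value>true</value>
    </property>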
77 .
Topic:  YARN
Question:
Which among the following supports delay scheduling?
A
FIFO
B
Capacity Scheduler
C
Fair Scheduler
D
Both Capacity and Fair Scheduler
Answer: D
78 .
Topic:  YARN
Question:
What is the default rate of heartbeat requests sent by the node manager?
A
one per millisecond
B
one per second
C
one per minute
D
one per nanosecond
Answer: B
79 .
Topic:  Hadoop I/O
Question:
Which error detection code is used in HDFS?
A
CRC-32
B
CRC-32C
C
SHA
D
SHA-1
Answer: B
80 .
Topic:  Hadoop I/O
Question:
The storage overhead of CRC-32C is
A
less than 1%
B
less than 5%
C
less than 10%
D
less than 2.5%
Answer: A
81 .
Topic:  Hadoop I/O
Question:
Which is responsible for verifying the data and its checksum?
A
Datanode
B
application manager
C
resource manager
D
Namenode
Answer: A
82 .
Topic:  YARN
Question:
Heartbeat signals are sent from
A
Jobtracker to Tasktracker
B
Tasktracker to Jobtracker
C
Jobtracker to namenode
D
Tasktracker to namenode
Answer: B
83 .
Topic:  Basics
Question:
The data locality feature in Hadoop means
A
Store the same data across multiple nodes.
B
Relocate the data from one node to another.
C
Co-locate the data with the computing nodes.
D
Distribute the data across multiple nodes.
Answer: C
84 .
Topic:  Basics
Question:
In order to apply a combiner, what is one property that has to be satisfied by the values emitted from the mapper?
A
Combiner can be applied always to any data
B
Output of the mapper and output of the combiner have to be the same key-value pair type, and they can be heterogeneous
C
Output of the mapper and output of the combiner have to be the same key-value pair type.
D
Only if the values satisfy associative and commutative property it can be done.
Answer: C
85 .
Topic:  Basics
Question:
HDFS block size is larger than the size of disk blocks so that
A
Only HDFS files can be stored in the disk used.
B
The seek time is maximized
C
Transfer of a large file made of multiple disk blocks is not possible.
D
A single file larger than the disk size can be stored across many disks in the cluster.
Answer: D
86 .
Topic:  Basics
Question:
When a client communicates with the HDFS file system, it needs to communicate with
A
only the namenode
B
only the data node
C
both the namenode and datanode
D
None of these
Answer: C
87 .
Topic:  Basics
Question:
When running on a pseudo distributed mode the replication factor is set to
A
2
B
1
C
0
D
3
Answer: B
88 .
Topic:  Basics
Question:
Which of the following properties is configured in mapred-site.xml?
A
Replication factor
B
Directory names to store hdfs files.
C
Host and port where MapReduce task runs.
D
Java Environment variables.
Answer: C
89 .
Topic:  Basics
Question:
The role of a Journal node is to
A
Report the location of the blocks in a data node
B
Report the edit log information of the blocks in the data node.
C
Report the Schedules when the jobs are going to run
D
Report the activity of various components handled by resource manager
Answer: B
90 .
Topic:  Basics
Question:
The command "hadoop fs -test -z URI" gives the result 0 if
A
the path is a directory
B
the path is a file
C
the path is not empty
D
the file is zero length
Answer: D
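For example (the path is illustrative; $? holds the exit status of the last command):

    hadoop fs -test -z /user/data/part-00000
    echo $?    # prints 0 if the file is zero length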
91 .
Topic:  Basics
Question:
The command used to copy a directory from one node to another in HDFS is
A
rcp
B
dcp
C
drcp
D
distcp
Answer: D
92 .
Topic:  Basics
Question:
In an HDFS system with block size 64MB we store a file which is less than 64MB. Which of the following is true?
A
The file will consume 64MB
B
The file will consume more than 64MB
C
The file will consume less than 64MB
D
Cannot be predicted.
Answer: C
93 .
Topic:  Basics
Question:
What mechanism does Hadoop use to make the namenode resilient to failure?
A
Take backup of filesystem metadata to a local disk and a remote NFS mount.
B
Store the filesystem metadata in cloud.
C
Use a machine with at least 12 CPUs
D
Using expensive and reliable hardware.
Answer: A
94 .
Topic:  Basics
Question:
Which of these is not a scheduler option available with YARN?
A
Optimal Scheduler
B
FIFO scheduler
C
Capacity scheduler
D
Fair scheduler
Answer: A
95 .
Topic:  Basics
Question:
The hdfs command put is used to
A
Copy files from local file system to HDFS.
B
Copy files or directories from local file system to HDFS.
C
Copy files from HDFS to the local filesystem.
D
Copy files or directories from HDFS to local filesystem.
Answer: B
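For example (paths are illustrative):

    hadoop fs -put localdir /user/hadoop/    # copies a local file or directory into HDFS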
96 .
Topic:  Basics
Question:
The input split used in MapReduce indicates
A
The average size of the data blocks used as input for the program
B
The location details of where the first whole record in a block begins and the last whole record in the block ends.
C
Splitting the input data to a MapReduce program into a size already configured in the mapred-site.xml
D
None of these
Answer: B
97 .
Topic:  Basics
Question:
All the files in a directory in HDFS can be merged together using
A
getmerge
B
putmerge
C
remerge
D
mergeall
Answer: A
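For example (paths are illustrative):

    hadoop fs -getmerge /user/hadoop/output merged.txt    # concatenates the directory's files into one local file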
98 .
Topic:  Basics
Question:
Job tracker runs on
A
Namenode
B
Datanode
C
Secondary namenode
D
Secondary datanode
Answer: A
99 .
Topic:  Basics
Question:
Which one of the following is not true regarding Hadoop?
A
It is a distributed framework.
B
The main algorithm used in it is MapReduce
C
It runs on commodity hardware
D
All are true
Answer: D