
Hadoop Multiple Choice Questions (MCQs)

1 .
Topic:  Introduction
Question:
Who created Hadoop?
A
Doug Cutting
B
JetBrains
C
Graydon Hoare
D
Robert Griesemer
Answer: A
2 .
Topic:  Introduction
Question:
What is a data flow language and execution environment used for exploring very large datasets in Hadoop?
A
HBase
B
Hive
C
HDFS
D
Pig
Answer: D
3 .
Topic:  Introduction
Question:
Which tool is used to efficiently move data between relational databases and HDFS?
A
Hive
B
Pig
C
Sqoop
D
HBase
Answer: C
4 .
Topic:  Introduction
Question:
Which among the following is a distributed data warehouse in Hadoop?
A
Hive
B
Pig
C
Sqoop
D
HBase
Answer: A
5 .
Topic:  MapReduce
Question:
Which among the following is correct? S1: MapReduce is a programming model for data processing S2: Hadoop can run MapReduce programs written in various languages S3: MapReduce programs are inherently parallel
A
S1 and S2
B
S2 and S3
C
S1 and S3
D
S1, S2 and S3
Answer: D
6 .
Topic:  MapReduce
Question:
The Mapper class is a
A
generic type
B
abstract type
C
static type
D
final
Answer: A
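For illustration, a minimal sketch of a Mapper subclass (the class name WordMapper and the tokenizing logic are illustrative). The four type parameters are what make Mapper a generic type; note also the basic types Text and IntWritable from org.apache.hadoop.io, which the next question refers to.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> is generic: here the input is a
    // byte offset plus a line of text, and the output is a (word, 1) pair.
    public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                context.write(new Text(token), ONE); // emit (word, 1)
            }
        }
    }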
7 .
Topic:  MapReduce
Question:
Which package provides the basic types of Hadoop?
A
org.apache.hadoop.io
B
org.apache.hadoop.util
C
org.apache.hadoop.type
D
org.apache.hadoop.lang
Answer: A
8 .
Topic:  MapReduce
Question:
Which among the following performs job control in Hadoop?
A
Mapper class
B
Reducer class
C
Task class
D
Job class
Answer: D
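A minimal driver sketch showing the Job class controlling configuration, submission, and monitoring of a job; it reuses the illustrative WordMapper above together with Hadoop's stock IntSumReducer.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            // The Job object controls the whole job, from configuration to completion.
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCountDriver.class);
            job.setMapperClass(WordMapper.class);     // the sketch above
            job.setReducerClass(IntSumReducer.class); // sums the 1s per word
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }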
9 .
Topic:  MapReduce
Question:
Hadoop runs the jobs by dividing them into
A
maps
B
tasks
C
individual files
D
None of these
Answer: B
Explanation: Hadoop runs the job by dividing it into tasks, of which there are two types: map tasks and reduce tasks
10 .
Topic:  MapReduce
Question:
Which are the two nodes that control the job execution process of Hadoop?
A
Job Tracker and Task Tracker
B
Map Tracker and Reduce Tracker
C
Map Tracker and Job Tracker
D
Map Tracker and Task Tracker
Answer: A
Explanation: There are two types of nodes that control the job execution process: a Job Tracker and a number of Task Trackers. The Job Tracker coordinates all the jobs run on the system by scheduling tasks to run on Task Trackers
11 .
Topic:  MapReduce
Question:
Which among the following schedules tasks to be run?
A
Job Tracker
B
Task Tracker
C
Job Scheduler
D
Task Controller
Answer: A
Explanation: There are two types of nodes that control the job execution process: a Job Tracker and a number of Task Trackers. The Job Tracker coordinates all the jobs run on the system by scheduling tasks to run on Task Trackers
12 .
Topic:  MapReduce
Question:
What are the fixed-size pieces into which Hadoop divides the input of a MapReduce job called?
A
records
B
splits
C
tasks
D
maps
Answer: B
Explanation: Hadoop divides the input to a MapReduce job into fixed-size pieces called input splits, or just splits. Hadoop creates one map task for each split, which runs the user-defined map function for each record in the split.
13 .
Topic:  MapReduce
Question:
Where is the output of map tasks written?
A
local disk
B
HDFS
C
File System
D
secondary storage
Answer: A
14 .
Topic:  MapReduce
Question:
Which among the following is responsible for processing one or more chunks of data and producing the output results?
A
Map task
B
Job task
C
Mapper class
D
Reduce task
Answer: A
15 .
Topic:  MapReduce
Question:
Which acts as an interface between Hadoop and the program you have written?
A
Hadoop Cluster
B
Hadoop Streams
C
Hadoop Sequencing
D
Hadoop Streaming
Answer: D
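A sketch of a Streaming invocation; the jar location varies by distribution, and any executables that read standard input and write standard output can serve as mapper and reducer.

    hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
        -input /user/in \
        -output /user/out \
        -mapper /bin/cat \
        -reducer /usr/bin/wc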
16 .
Topic:  MapReduce
Question:
What are Hadoop Pipes?
A
Java interface to Hadoop MapReduce
B
C++ interface to Hadoop MapReduce
C
Ruby interface to Hadoop MapReduce
D
Python interface to Hadoop MapReduce
Answer: B
17 .
Topic:  HDFS
Question:
What does Hadoop Common Package contain?
A
war files
B
msi files
C
jar files
D
exe files
Answer: C
Explanation: The Hadoop Common package contains the Java Archive (JAR) files and scripts needed to start Hadoop.
18 .
Topic:  HDFS
Question:
Which among the following is the master node?
A
Name Node
B
Data Node
C
Job Node
D
Task Node
Answer: A
Explanation: The master node tracks files, manages the file system, and holds the metadata of all of the stored data.
19 .
Topic:  HDFS
Question:
Which among the following is the slave node?
A
Name Node
B
Data Node
C
Job Node
D
Task Node
Answer: B
Explanation: Data Nodes are slave daemons that store the actual data in HDFS and serve read and write requests from clients.
20 .
Topic:  HDFS
Question:
Which acts as a checkpoint node in HDFS?
A
Name Node
B
Data Node
C
Secondary Name Node
D
Secondary Data Node
Answer: C
Explanation: It is the helper node for the Name Node; its only job is to take checkpoints of the file system metadata present in the Name Node.
21 .
Topic:  HDFS
Question:
Which among the following holds the location of data?
A
Name Node
B
Data Node
C
Job Tracker
D
Task Tracker
Answer: A
Explanation: The Name Node is the master node, which keeps track of files, the file system, and the name and location of the data, among other details.
22 .
Topic:  MapReduce
Question:
What is the process of applying the code received from the Job Tracker to the file called?
A
Naming
B
Tracker
C
Mapper
D
Reducer
Answer: C
Explanation: The Task Tracker is the slave node for the Job Tracker and takes tasks from it. The Task Tracker applies the code received from the Job Tracker to the file; the process of applying that code to the file is known as the Mapper.
23 .
Topic:  MapReduce
Question:
In which mode should Hadoop run in order to run pipes job?
A
distributed mode
B
centralized mode
C
pseudo distributed mode
D
parallel mode
Answer: C
Explanation: Pseudo-distributed mode is a mode where all the daemons run on the local machine. Pipes doesn't run in standalone mode, since it relies on Hadoop's distributed cache mechanism.
24 .
Topic:  HDFS
Question:
Which among the following manages the filesystem namespace?
A
Name Node
B
Data Node
C
Secondary Node
D
Primary Node
Answer: A
Explanation: The namenode is the master node which manages the filesystem namespace. It maintains the filesystem tree and the metadata for all the files and directories in the tree
25 .
Topic:  HDFS
Question:
Which among the following act as the workhorses of the filesystem?
A
Name Node
B
Data Node
C
Secondary Node
D
Primary Node
Answer: B
Explanation: Data nodes are slave nodes that store and retrieve blocks when they are told to (by clients or the namenode), and they report back to the namenode periodically with lists of the blocks they are storing.
26 .
Topic:  HDFS
Question:
Which of the following are correct? S1: Namespace volumes are independent of each other S2: Namespace volumes are managed by namenodes
A
S1 only
B
S2 only
C
Both S1 and S2
D
Neither S1 nor S2
Answer: C
Explanation: Each namenode manages a namespace volume, which is made up of the metadata for the namespace, and a block pool containing all the blocks for the files in the namespace. Namespace volumes are independent of each other, which means namenodes do not communicate with one another; furthermore, the failure of one namenode does not affect the availability of the namespaces managed by other namenodes.
27 .
Topic:  HDFS
Question:
Which of the following architectural changes are needed to attain high availability in HDFS?
A
Clients must be configured to handle namenode failover
B
Datanodes must send block reports to both namenodes since the block mappings are stored in a namenode’s memory, and not on disk
C
namenodes must use highly-available shared storage to share the edit log
D
All of the above
Answer: D
28 .
Topic:  HDFS
Question:
Which controller in HDFS manages the transition from the active namenode to the standby?
A
failover controller
B
recovery controller
C
failsafe controller
D
fencing controller
Answer: A
29 .
Topic:  HDFS
Question:
Which among the following is not a fencing mechanism employed by the system in HDFS?
A
killing the namenode’s process
B
disabling namenode's network port via a remote management command
C
revoking namenode's access to the shared storage directory
D
None of the above
Answer: D
30 .
Topic:  HDFS
Question:
What is the value of the property dfs.replication set in pseudo-distributed mode?
A
0
B
1
C
null
D
yes
Answer: B
31 .
Topic:  HDFS
Question:
What is the minimum amount of data that a disk can read or write in HDFS?
A
block size
B
byte size
C
heap
D
None
Answer: A
32 .
Topic:  HDFS
Question:
What is the default size of HDFS blocks?
A
32 MB
B
64 MB
C
128 MB
D
256 MB
Answer: C
33 .
Topic:  HDFS
Question:
Which HDFS command checks the file system and lists the blocks?
A
hfsck
B
fcsk
C
fblock
D
fsck
Answer: D
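For example (the path /user/data is illustrative):

    hdfs fsck /user/data -files -blocks -locations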
34 .
Topic:  HDFS
Question:
What is the administrative grouping used to manage cache permissions and resource usage called?
A
Cache pools
B
block pool
C
Namenodes
D
HDFS Cluster
Answer: A
35 .
Topic:  HDFS
Question:
Which object encapsulates a client or server's configuration?
A
File Object
B
Configuration object
C
Path Object
D
Stream Object
Answer: B
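A minimal sketch of the Configuration object in use; the property names shown are standard, but overriding them in code like this is purely illustrative.

    import org.apache.hadoop.conf.Configuration;

    public class ConfDemo {
        public static void main(String[] args) {
            Configuration conf = new Configuration();     // loads core-default.xml, core-site.xml
            conf.set("dfs.replication", "1");             // override a property in code
            System.out.println(conf.get("fs.defaultFS")); // read a property
        }
    }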
36 .
Topic:  HDFS
Question:
Which interface permits seeking to a position in the file and provides a query method for the current offset from the start of the file?
A
DataStream
B
Seekable
C
PositionedReadable
D
Progressable
Answer: B
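A sketch using FSDataInputStream, which implements Seekable; the file path is supplied by the caller, and the offset 16 is arbitrary.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SeekDemo {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            try (FSDataInputStream in = fs.open(new Path(args[0]))) {
                in.seek(16);                     // Seekable: jump to a byte offset
                System.out.println(in.getPos()); // Seekable: query the current offset
            }
        }
    }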
37 .
Topic:  HDFS
Question:
Which method is used to list the contents of a directory?
A
listFiles
B
listContents
C
listStatus
D
listPaths
Answer: C
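A sketch of listStatus(), which returns an array of FileStatus objects (the subject of a later question):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListDemo {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            for (FileStatus status : fs.listStatus(new Path(args[0]))) {
                System.out.println(status.getPath()); // one entry per file or directory
            }
        }
    }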
38 .
Topic:  HDFS
Question:
What is the operation that uses wildcard characters to match multiple files with a single expression called?
A
globbing
B
pattern matching
C
regex
D
regexfilter
Answer: A
39 .
Topic:  HDFS
Question:
What does the globStatus() method return?
A
an array of FileStatus objects
B
an array of ListStatus objects
C
an array of PathStatus objects
D
an array of FilterStatus objects
Answer: A
40 .
Topic:  HDFS
Question:
What does the glob question mark (?) match?
A
zero or more characters
B
one or more characters
C
a single character
D
metacharacter
Answer: C
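A sketch tying the last three questions together; the log-file layout is assumed for illustration.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class GlobDemo {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // "??" matches exactly two characters; "*" matches zero or more.
            FileStatus[] matches = fs.globStatus(new Path("/logs/2023-??-*"));
            for (FileStatus status : matches) {
                System.out.println(status.getPath());
            }
        }
    }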
41 .
Topic:  HDFS
Question:
Which method on FileSystem is used to permanently remove files or directories?
A
remove()
B
rm()
C
del()
D
delete()
Answer: D
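A sketch of delete(); the second argument must be true to remove a non-empty directory recursively.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DeleteDemo {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            boolean deleted = fs.delete(new Path(args[0]), true); // true = recursive
            System.out.println("deleted: " + deleted);
        }
    }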
42 .
Topic:  HDFS
Question:
Which streams the packets to the first datanode in the pipeline?
A
DataStreamer
B
FileStreamer
C
InputStreamer
D
PathStreamer
Answer: A
43 .
Topic:  HDFS
Question:
Which queue is responsible for asking the namenode to allocate new blocks by picking a list of suitable datanodes to store the replicas?
A
ack queue
B
data queue
C
path queue
D
stream queue
Answer: B
44 .
Topic:  HDFS
Question:
Which command is used to copy files or directories in parallel, within or between Hadoop clusters?
A
distcp
B
hcp
C
copy
D
cp
Answer: A
45 .
Topic:  HDFS
Question:
Which flag is used with distcp to delete any files or directories from the destination?
A
-remove
B
-rm
C
-del
D
-delete
Answer: D
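For example (host names are illustrative; -delete must be combined with -update or -overwrite):

    hadoop distcp -update -delete hdfs://nn1:8020/src hdfs://nn2:8020/dst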
46 .
Topic:  YARN
Question:
Which among the following is Hadoop's cluster resource management system?
A
GLOB
B
YARN
C
ARM
D
SPARK
Answer: B
47 .
Topic:  YARN
Question:
Which of the following processing frameworks interacts with YARN directly?
A
Pig
B
Hive
C
Crunch
D
None of these
Answer: D
48 .
Topic:  YARN
Question:
Which of the following processing frameworks run on MapReduce?
A
Pig
B
Hive
C
Crunch
D
All of the above
Answer: D
49 .
Topic:  YARN
Question:
Which among the following are the core services of YARN?
A
resource manager and node manager
B
namenode and datanode
C
data manager and resource manager
D
data manager and application manager
Answer: A
50 .
Topic:  YARN
Question:
Which constraints can be used to request a container on a specific node or rack, or anywhere on the cluster in YARN?
A
Container constraints
B
Space constraints
C
Locality constraints
D
Resource constraints
Answer: C
51 .
Topic:  YARN
Question:
Which among the following can be used to model YARN applications?
A
one application per user job
B
run one application per workflow
C
long-running application that is shared by different users
D
All of the above
Answer: D
52 .
Topic:  YARN
Question:
Which follows the one-application-per-user-job model?
A
MapReduce
B
Spark
C
Apache Slider
D
Samza
Answer: A
53 .
Topic:  YARN
Question:
Which runs one application per user session?
A
MapReduce
B
Spark
C
Apache Slider
D
None of the above
Answer: B
54 .
Topic:  YARN
Question:
Which among the following has a long-running application master for launching other applications on the cluster?
A
MapReduce
B
Spark
C
Apache Slider
D
None of the above
Answer: C
55 .
Topic:  YARN
Question:
Which among the following can be used for stream processing?
A
Spark
B
Samza
C
Storm
D
All of the above
Answer: D
56 .
Topic:  YARN
Question:
Which provides a simple programming model for developing distributed applications on YARN?
A
Apache Slider
B
Apache Twill
C
Spark
D
Tez
Answer: B
57 .
Topic:  YARN
Question:
Which among the following statements are true with respect to Apache Twill? S1: Twill supports real-time logging S2: Allows the usage of a Java Runnable interface
A
S1 only
B
S2 only
C
Both S1 and S2
D
Neither S1 nor S2
Answer: C
58 .
Topic:  YARN
Question:
Which daemons control the job execution process in MapReduce 1?
A
jobtracker
B
tasktrackers
C
Both jobtracker and tasktrackers
D
Name node and data node
Answer: C
59 .
Topic:  YARN
Question:
Which among the following coordinates all the jobs run on the system by scheduling tasks in MapReduce 1?
A
jobtracker
B
tasktrackers
C
data node
D
Name node
Answer: A
60 .
Topic:  YARN
Question:
Which of the following keeps a record of the overall progress of each job in MapReduce 1?
A
jobtracker
B
tasktrackers
C
data node
D
Name node
Answer: A
61 .
Topic:  YARN
Question:
Which among the following run tasks and send progress reports in MapReduce 1?
A
jobtracker
B
tasktrackers
C
data node
D
Name node
Answer: B
62 .
Topic:  YARN
Question:
Which of the following are tasks of the jobtracker in MapReduce 1?
A
job scheduling
B
task progress monitoring
C
task bookkeeping
D
All of the above
Answer: D
63 .
Topic:  YARN
Question:
Which is responsible for storing job history in MapReduce 1?
A
jobtracker
B
tasktrackers
C
data node
D
Name node
Answer: A
64 .
Topic:  YARN
Question:
In YARN, the responsibility of the jobtracker is handled by
A
Resource manager
B
application master
C
timeline server
D
All of the above
Answer: D
65 .
Topic:  YARN
Question:
In YARN, the responsibility of the tasktracker is handled by
A
Resource manager
B
application master
C
timeline server
D
Node manager
Answer: D
66 .
Topic:  YARN
Question:
Which stores the application history in YARN?
A
Resource manager
B
application master
C
timeline server
D
Node manager
Answer: C
67 .
Topic:  YARN
Question:
Which among the following are the features of YARN?
A
Scalability
B
Multitenancy
C
Availability
D
All of the above
Answer: D
68 .
Topic:  YARN
Question:
Which among the following schedulers is available in YARN?
A
FIFO
B
Shortest Job First
C
Round Robin
D
Shortest Remaining Time
Answer: A
69 .
Topic:  YARN
Question:
Which of the following schedulers are available in YARN?
A
FIFO
B
Capacity
C
Fair Scheduler
D
All of the above
Answer: D
70 .
Topic:  YARN
Question:
Which among the following schedulers attempts to allocate resources so that all running applications get the same share of resources in YARN?
A
FIFO
B
Capacity
C
Fair Scheduler
D
Round Robin
Answer: C
71 .
Topic:  YARN
Question:
Which among the following schedulers provides queue elasticity in YARN?
A
FIFO
B
Capacity
C
Fair Scheduler
D
Round Robin
Answer: B
72 .
Topic:  YARN
Question:
Which among the following schedulers in YARN is used by default?
A
FIFO
B
Capacity
C
Fair Scheduler
D
Round Robin
Answer: B
73 .
Topic:  YARN
Question:
In which XML file is the default scheduler configuration changed?
A
yarn-site.xml
B
config.xml
C
scheduler.xml
D
yarn-scheduler.xml
Answer: A
74 .
Topic:  YARN
Question:
Which of the following queue scheduling policies are supported by the Fair Scheduler in YARN?
A
FIFO
B
Dominant Resource Fairness
C
preemption
D
All of the above
Answer: D
75 .
Topic:  YARN
Question:
Which holds the list of rules for queue placement in Fair Scheduling?
A
queuePlacementPolicy
B
rulePlacementPolicy
C
scheduleQueuePolicy
D
schedulingPolicy
Answer: A
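A sketch of a queuePlacementPolicy element in the Fair Scheduler allocation file; the particular rules and queue name are illustrative.

    <allocations>
      <queuePlacementPolicy>
        <rule name="specified" create="false" />
        <rule name="primaryGroup" create="false" />
        <rule name="default" queue="dev" />
      </queuePlacementPolicy>
    </allocations>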
76 .
Topic:  YARN
Question:
Which setting is used to enable preemption globally?
A
yarn.scheduler.fair.preemption = true
B
yarn.scheduler.preemption = true
C
yarn.scheduler.global.preemption = true
D
yarn.scheduler.enable.preemption = true
Answer: A
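In yarn-site.xml this is set as:

    <property>
      <name>yarn.scheduler.fair.preemption</name>
      <value>true</value>
    </property>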
77 .
Topic:  YARN
Question:
Which among the following supports delay scheduling?
A
FIFO
B
Capacity Scheduler
C
Fair Scheduler
D
Both Capacity and Fair Scheduler
Answer: D
78 .
Topic:  YARN
Question:
What is the default rate of heartbeat requests sent by the node manager?
A
one per millisecond
B
one per second
C
one per minute
D
one per nanosecond
Answer: B
79 .
Topic:  Hadoop I/O
Question:
Which error detection code is used in HDFS?
A
CRC-32
B
CRC-32C
C
SHA
D
SHA-1
Answer: B
80 .
Topic:  Hadoop I/O
Question:
The storage overhead of CRC-32C is
A
less than 1%
B
less than 5%
C
less than 10%
D
less than 2.5%
Answer: A
81 .
Topic:  Hadoop I/O
Question:
Which is responsible for verifying the data and its checksum?
A
Datanode
B
application manager
C
resource manager
D
Namenode
Answer: A
82 .
Topic:  YARN
Question:
Heartbeat signals are sent from
A
Jobtracker to Tasktracker
B
Tasktracker to Jobtracker
C
Jobtracker to namenode
D
Tasktracker to namenode
Answer: B
83 .
Topic:  Basics
Question:
The data locality feature in Hadoop means
A
Store the same data across multiple nodes.
B
Relocate the data from one node to another.
C
Co-locate the data with the computing nodes.
D
Distribute the data across multiple nodes.
Answer: C
84 .
Topic:  Basics
Question:
In order to apply a combiner, what is one property that has to be satisfied by the values emitted from the mapper?
A
Combiner can be applied always to any data
B
Output of the mapper and output of the combiner have to be the same key-value pair type, and they can be heterogeneous
C
Output of the mapper and output of the combiner have to be the same key-value pair type.
D
Only if the values satisfy associative and commutative property it can be done.
Answer: C
85 .
Topic:  Basics
Question:
HDFS block size is larger than the size of disk blocks so that
A
Only HDFS files can be stored in the disk used.
B
The seek time is maximized
C
Transfer of a large file made of multiple disk blocks is not possible.
D
A single file larger than the disk size can be stored across many disks in the cluster.
Answer: D
86 .
Topic:  Basics
Question:
When a client communicates with the HDFS file system, it needs to communicate with
A
only the namenode
B
only the data node
C
both the namenode and datanode
D
None of these
Answer: C
87 .
Topic:  Basics
Question:
When running on a pseudo distributed mode the replication factor is set to
A
2
B
1
C
0
D
3
Answer: B
88 .
Topic:  Basics
Question:
Which of the following properties is configured in mapred-site.xml?
A
Replication factor
B
Directory names to store hdfs files.
C
Host and port where MapReduce task runs.
D
Java Environment variables.
Answer: C
89 .
Topic:  Basics
Question:
The role of a Journal node is to
A
Report the location of the blocks in a data node
B
Report the edit log information of the blocks in the data node.
C
Report the Schedules when the jobs are going to run
D
Report the activity of various components handled by resource manager
Answer: B
90 .
Topic:  Basics
Question:
The command "hadoop fs -test -z URI" gives the result 0 if
A
the path is a directory
B
the path is a file
C
the path is not empty
D
the file is zero length
Answer: D
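For example (the path is illustrative; $? holds the exit status of the last command):

    hadoop fs -test -z /user/data/part-00000
    echo $?    # prints 0 if the file is zero length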
91 .
Topic:  Basics
Question:
The command used to copy a directory from one node to another in HDFS is
A
rcp
B
dcp
C
drcp
D
distcp
Answer: D
92 .
Topic:  Basics
Question:
In an HDFS system with block size 64MB we store a file which is less than 64MB. Which of the following is true?
A
The file will consume 64MB
B
The file will consume more than 64MB
C
The file will consume less than 64MB
D
Cannot be predicted.
Answer: C
93 .
Topic:  Basics
Question:
What mechanism does Hadoop use to make the namenode resilient to failure?
A
Take backup of filesystem metadata to a local disk and a remote NFS mount.
B
Store the filesystem metadata in cloud.
C
Use a machine with at least 12 CPUs
D
Using expensive and reliable hardware.
Answer: A
94 .
Topic:  Basics
Question:
Which of these is not a scheduler option available with YARN?
A
Optimal Scheduler
B
FIFO scheduler
C
Capacity scheduler
D
Fair scheduler
Answer: A
95 .
Topic:  Basics
Question:
The hdfs command put is used to
A
Copy files from local file system to HDFS.
B
Copy files or directories from local file system to HDFS.
C
Copy files from HDFS to the local filesystem.
D
Copy files or directories from HDFS to local filesystem.
Answer: B
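For example (paths are illustrative):

    hadoop fs -put localdir /user/hadoop/    # copies a local file or directory into HDFS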
96 .
Topic:  Basics
Question:
The input split used in MapReduce indicates
A
The average size of the data blocks used as input for the program
B
The location details of where the first whole record in a block begins and the last whole record in the block ends.
C
Splitting the input data to a MapReduce program into a size already configured in the mapred-site.xml
D
None of these
Answer: B
97 .
Topic:  Basics
Question:
All the files in a directory in HDFS can be merged together using
A
getmerge
B
putmerge
C
remerge
D
mergeall
Answer: A
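For example (paths are illustrative):

    hadoop fs -getmerge /user/hadoop/output merged.txt    # concatenates the directory's files into one local file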
98 .
Topic:  Basics
Question:
Job tracker runs on
A
Namenode
B
Datanode
C
Secondary namenode
D
Secondary datanode
Answer: A
99 .
Topic:  Basics
Question:
Which one of the following is not true regarding Hadoop?
A
It is a distributed framework.
B
The main algorithm used in it is MapReduce
C
It runs on commodity hardware
D
All are true
Answer: D