Hadoop Interview Questions
Questions Views

What is the driver program in Spark?

217

What is spark-submit?

235

How do I clear my Spark cache?

228

What is a partition in Spark?

282

What is vectorization in Spark?

268

What is off-heap memory in Spark?

228

What is a tuple in Spark?

244

Is Spark an ETL tool?

235

How is an RDD distributed?

271

What are the common transformations in Apache Spark?

230

What is the difference between a Dataset and a DataFrame in Spark?

265

What is the distributed cache in Spark?

247

What is the Catalyst framework in Spark?

257

How is a DAG created in Spark?

237

What does Spark do during speculative execution?

270


Unanswered Questions (Hadoop)

What is InputFormat in Hadoop?

555


Explain how Hive serializes and deserializes data.

548


Why is the output of map tasks stored (spilled) on local disk and not in HDFS?

505


What is an SSTable, and how is it different from other tables?

84


What is the default replication factor?

791


Explain how mapreduce works.

452


Illustrate a simple example of the working of MapReduce.

476


What does high availability of a NameNode mean? How is it accomplished?

297


What is the fundamental difference between a MapReduce InputSplit and an HDFS block?

456


Explain the different channel types in Flume.

82


What is lazy evaluation in Spark?

270


What is the best way to copy files between HDFS clusters?

65


What is the difference between an external table and a managed table?

715


What is JMX?

169


How do you skip header rows in a Hive table?

614