Big Data Interview Questions

What is the Catalyst query optimizer in Apache Spark?

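Catalyst represents expressions and query plans as trees and optimizes them by repeatedly applying rewrite rules; constant folding is one such rule. The sketch below shows the idea in plain Python, not Catalyst's Scala API. The `Lit`, `Col`, and `Add` node classes and the `constant_fold` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical expression-tree node types (not Spark's classes).
@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Col, Add]


def constant_fold(e: Expr) -> Expr:
    """Rewrite bottom-up: replace Add(Lit, Lit) with a single Lit."""
    if isinstance(e, Add):
        left, right = constant_fold(e.left), constant_fold(e.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(left.value + right.value)
        return Add(left, right)
    return e  # literals and column references are left unchanged


# x + (1 + 2)  is rewritten to  x + 3
print(constant_fold(Add(Col("x"), Add(Lit(1), Lit(2)))))
# Add(left=Col(name='x'), right=Lit(value=3))
```

A real optimizer runs many such rules in batches until the tree stops changing; this sketch applies one rule once.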

What are the various types of shared variables in Apache Spark?


What are the common mistakes developers make when using Apache Spark?


What is the Spark driver used for, and where does it run on the cluster?


What is speculative execution in Spark?

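When `spark.speculation` is enabled, Spark can launch a backup copy of a slow-running task on another executor and keep whichever copy finishes first. The sketch below is a simplified version of the straggler check, assuming it is driven by `spark.speculation.quantile` and `spark.speculation.multiplier` (0.75 and 1.5 are the long-standing defaults; check the configuration page for your version). Once enough tasks have finished, any running task slower than the multiplier times the median finished duration is flagged. Spark's real scheduler adds further conditions, such as a minimum time threshold, and the function name `speculatable_tasks` is hypothetical.

```python
from statistics import median


def speculatable_tasks(finished_durations, running_elapsed, total_tasks,
                       quantile=0.75, multiplier=1.5):
    """Return ids of running tasks that would get a speculative copy.

    finished_durations: run times (seconds) of tasks that already succeeded.
    running_elapsed: mapping of task id -> seconds the task has been running.
    """
    # Wait until at least `quantile` of the stage's tasks have finished.
    if not finished_durations or len(finished_durations) < quantile * total_tasks:
        return []
    # A running task is a straggler if it has run longer than multiplier x median.
    threshold = multiplier * median(finished_durations)
    return sorted(t for t, elapsed in running_elapsed.items() if elapsed > threshold)


done = [10, 11, 12, 10, 11, 12]   # 6 of 8 tasks finished, median 11 s
running = {"t6": 14, "t7": 40}    # threshold is 1.5 * 11 = 16.5 s
print(speculatable_tasks(done, running, total_tasks=8))  # ['t7']
```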

Explain the write-ahead log (journaling) in Spark.


Explain the values() operation in Apache Spark.

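On a pair RDD (an RDD of key/value tuples), `values()` is a transformation that returns a new RDD containing only the value of each pair. Keys are dropped, duplicate values and ordering are kept, and nothing is computed until an action such as `collect()` runs. The plain-Python function below is a hypothetical stand-in, not PySpark, that shows the same element-level result on a list.

```python
def values(pairs):
    """Plain-Python analogue of a pair RDD's values(): keep each second element."""
    return [v for _, v in pairs]


pairs = [("a", 1), ("b", 2), ("a", 3)]
print(values(pairs))  # [1, 2, 3]
```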

What is the level of parallelism, and why is it needed in Spark Streaming?


Define SparkSession in Apache Spark. Why is it needed?


Describe the different transformations on DStreams in Apache Spark Streaming.


In HADOOP_PID_DIR, what does PID stand for?


What are the network requirements for Hadoop?


What does hadoop-env.sh do?


What are the three modes in which Hadoop can run?


Where is the hadoop-env.sh file located?



Unanswered Questions: Big Data

What are clusters in Cassandra?



Explain Avro schemas.



What is a transformation in Spark?



What is the use of RecordReader in Hadoop?

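A RecordReader turns the raw bytes of an input split into the key/value records a mapper receives. For text input, Hadoop's `LineRecordReader` emits the byte offset of each line as the key and the line contents as the value. The hypothetical `line_records` function below mimics that output for a single in-memory buffer. It ignores split boundaries, which the real reader handles by skipping a partial first line and reading past the split end to finish the last line.

```python
def line_records(data: bytes):
    """Yield (byte_offset, line) pairs, mimicking LineRecordReader's output."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        # The value excludes the line terminator; the offset counts it.
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)


for key, value in line_records(b"hello\nbig data\n"):
    print(key, value)
# 0 hello
# 6 big data
```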


What is Apache Hadoop? Why is Hadoop essential for every Big Data application?



How does the NameNode handle DataNode failures in Hadoop?

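Each DataNode sends periodic heartbeats to the NameNode. If none arrive for long enough, the NameNode marks the DataNode dead and schedules re-replication of its blocks from the remaining replicas. The timeout is commonly given as `2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval`; the arithmetic below assumes the usual defaults of 300,000 ms and 3 s, and the function name is hypothetical.

```python
def dead_node_timeout_seconds(recheck_interval_ms=300_000, heartbeat_interval_s=3):
    """Seconds after the last heartbeat before the NameNode marks a DataNode dead."""
    return 2 * recheck_interval_ms / 1000 + 10 * heartbeat_interval_s


print(dead_node_timeout_seconds())  # 630.0 (10 minutes 30 seconds)
```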


What is rack awareness?

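Rack awareness means the NameNode knows which rack each DataNode sits on, so it can spread replicas across racks. With the default placement policy and a replication factor of 3, one replica goes to the writer's node and the other two to different nodes on a second rack. Hadoop can learn the topology from an admin-supplied script named by `net.topology.script.file.name`; the script receives host names or IPs as arguments and prints a rack path for each. A minimal Python version, with a hypothetical hard-coded mapping:

```python
import sys

# Hypothetical host-to-rack mapping; a real cluster would load its own inventory.
RACKS = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"


def resolve(hosts):
    """Map each host or IP to its rack path, falling back to the default rack."""
    return [RACKS.get(h, DEFAULT_RACK) for h in hosts]


if __name__ == "__main__":
    # Hadoop passes hosts as arguments and reads the rack paths from stdout.
    print(" ".join(resolve(sys.argv[1:])))
```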


How is Flume-NG different from Flume 0.9?



Define the purpose of the partition function in the MapReduce framework.

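The partition function decides which reducer receives each intermediate key, so all values for a given key reach the same reducer. Hadoop's default `HashPartitioner` computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`; masking the sign bit keeps negative hash codes in range. The sketch applies that formula to a hash code. `java_string_hash` reproduces Java's `String.hashCode()` only as an illustrative key hash (Hadoop's `Text` keys compute their own byte-based hash, so real partition numbers can differ).

```python
def java_string_hash(s: str) -> int:
    """Java's String.hashCode() with 32-bit signed overflow (BMP characters only)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def get_partition(key_hash: int, num_reduce_tasks: int) -> int:
    """HashPartitioner's formula: clear the sign bit, then take the remainder."""
    return (key_hash & 0x7FFFFFFF) % num_reduce_tasks


print(get_partition(java_string_hash("hello"), 4))  # 2
print(get_partition(-1, 4))                          # 3
```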


Can we broadcast an RDD?



Explain the transformation and action operations on Apache Spark RDDs.

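Transformations such as `map` and `filter` return a new RDD and are evaluated lazily; actions such as `collect` and `count` trigger the actual computation and return a result to the driver. The toy `LazyDataset` class below is hypothetical and only models that split: transformations record a step in the lineage, and actions replay the recorded steps over the source data.

```python
class LazyDataset:
    """Hypothetical toy dataset: transformations are recorded, actions run them."""

    def __init__(self, source, ops=()):
        self._source = source
        self._ops = tuple(ops)

    # Transformations: return a new dataset; nothing is computed yet.
    def map(self, fn):
        return LazyDataset(self._source, self._ops + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._ops + (("filter", pred),))

    # Actions: replay the recorded steps and produce a concrete result.
    def collect(self):
        items = iter(self._source)
        for kind, fn in self._ops:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    def count(self):
        return len(self.collect())


squares = LazyDataset(range(10)).map(lambda x: x * x)  # nothing computed yet
evens = squares.filter(lambda x: x % 2 == 0)           # still nothing computed
print(evens.collect())  # [0, 4, 16, 36, 64]
```

Like Spark without caching, each action in this toy recomputes the whole chain from the source.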


How can you use the Consumer API?



Define data integrity. How does HDFS ensure the integrity of the data blocks stored in HDFS?

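HDFS checksums every `dfs.bytes-per-checksum` bytes of data (512 by default) when a block is written, stores the checksums alongside the block, and verifies them when the data is read. A client that detects a mismatch reports the bad replica to the NameNode, which re-replicates the block from a healthy copy; DataNodes also scan their blocks in the background. The sketch below shows per-chunk checksumming and verification. HDFS defaults to CRC32C, whereas `zlib.crc32` is plain CRC32 used here as a stand-in, and both function names are hypothetical.

```python
import zlib

BYTES_PER_CHECKSUM = 512  # mirrors the dfs.bytes-per-checksum default


def chunk_checksums(data: bytes, chunk=BYTES_PER_CHECKSUM):
    """Checksum every fixed-size chunk, as happens when a block is written."""
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def corrupt_chunks(data: bytes, expected, chunk=BYTES_PER_CHECKSUM):
    """Return indexes of chunks whose checksum no longer matches (checked on read)."""
    actual = chunk_checksums(data, chunk)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


data = bytes(range(256)) * 4            # 1,024 bytes, i.e. two 512-byte chunks
stored = chunk_checksums(data)
damaged = bytearray(data)
damaged[600] ^= 0xFF                    # flip bits inside the second chunk
print(corrupt_chunks(bytes(damaged), stored))  # [1]
```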


What is the meaning of speculative execution in Hadoop? Why is it important?



What is SparkSession in Apache Spark? Why is it needed?
