Big Data Interview Questions
Questions Views

What is the Catalyst query optimizer in Apache Spark?

199

What are the various types of shared variables in Apache Spark?

185

What are the common mistakes developers make when using Apache Spark?

203

What is the role of the Spark driver, and where does it run on the cluster?

213

What is speculative execution in Spark?

235

Explain the write-ahead log (journaling) in Spark.

188

Explain the values() operation in Apache Spark.

272

Define the level of parallelism and explain why it is needed in Spark Streaming.

234

What is SparkSession in Apache Spark, and why is it needed?

198

Describe the different transformations on a DStream in Apache Spark Streaming.

204

In HADOOP_PID_DIR, what does PID stand for?

244

What are the network requirements for Hadoop?

256

What does hadoop-env.sh do?

248

What are the three modes in which Hadoop can run?

253

Where is the hadoop-env.sh file located?

238


Unanswered Questions { Big Data }

What is the core of a job in the MapReduce framework?

582


What are the various programming languages supported by Spark?

237


Can HBase run without Hadoop?

455


Why should we use the ‘distinct’ keyword in Pig scripts?

309


What type of tool is Sqoop in the Hadoop ecosystem?

5


What does RDD mean?

204


Why did Spark come into existence?

204


Is it possible to provide multiple inputs to Hadoop? If yes, explain.

406


What is a heartbeat in HDFS? Explain.

371


When should you choose an internal (managed) table and when an external table in Hive?

418


How many map tasks are there in a particular job?

258


Give any two features of Flume.

73


What happens after creating a table in Hive?

426


Explain some of the basic commands used for the Apache Ambari server.

47


How do I configure Hadoop high availability (HA) for Impala?

45