Why do we use HDFS for applications with large data sets and not when there are a lot of small files?
Explain Spark Streaming.
Explain bucketing in Hive.
What is the role of Consumer API?
What are the advantages and disadvantages of archiving partitions in Hive?
When to use secondary indexes?
What are clusters in Cassandra?
Why can't aggregation be done in the Mapper?
Is Spark good for machine learning?
Explain what speculative execution is.
Please explain the sparse vector in Spark.
Is it possible to join multiple fields in Pig scripts?
Why does HDFS store data on commodity hardware despite the higher chance of failures?
Explain the common input formats in Hadoop.
What are sink processors?
Explain the JsonLoader and JsonStorage functions in Pig.