Scenario-10

Problem Scenario 10: You have been given a database named retail_db with the following details. It consists of 6 tables, and its data model is shown in the image.

user=retail_dba 

password=cloudera 

database=retail_db

JDBC URL = jdbc:mysql://quickstart:3306/retail_db
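
Before attempting the tasks, it can help to confirm that the connection details above work. The following is a minimal sketch, assuming Sqoop is installed on the machine where you run it; it lists the tables in retail_db, which should show the 6 tables mentioned above. The variable names are illustrative only.

```shell
#!/bin/sh
# Connection details copied from the scenario.
DB_URL="jdbc:mysql://quickstart:3306/retail_db"
DB_USER="retail_dba"
DB_PASS="cloudera"

# Run the check only where Sqoop is available; otherwise print a notice.
if command -v sqoop >/dev/null 2>&1; then
  # list-tables prints the name of every table in the target database.
  sqoop list-tables \
    --connect "$DB_URL" \
    --username "$DB_USER" \
    --password "$DB_PASS"
else
  echo "sqoop not found; skipping connection check"
fi
```

Passing the password on the command line is acceptable on a practice VM; on a shared machine, Sqoop's -P option, which prompts for the password, is the safer choice.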

1. Import the entire database in a file format that is well suited to analytical applications on Hadoop, i.e. one that groups your data in columns, and make sure the data can be queried using Impala.

Also, to save space, compress the data with the Snappy codec while importing.

2. In Impala, write a query that produces the 5 most popular product categories, and save the results in HadoopExam/best_categories.csv in HDFS.

3. In Impala, write a query that produces the top 10 revenue-generating products, and save the results in HadoopExam/best_products.csv in HDFS.
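
One possible way to approach the three tasks is sketched below. It is not the official answer and rests on several assumptions that the scenario does not state:

- Parquet is chosen as the columnar format.
- The tables are registered in the Hive metastore with --hive-import under /user/hive/warehouse.
- The Impala daemon is reachable on the host quickstart.
- The table and column names (order_items, products, categories, order_item_quantity, order_item_subtotal and so on) follow the usual retail_db sample schema. Check them against the data model image before use.
- "Popular" is taken to mean the number of order lines per category; summing order_item_quantity would be an equally defensible reading.

```shell
#!/bin/sh
# Connection details copied from the scenario.
DB_URL="jdbc:mysql://quickstart:3306/retail_db"
DB_USER="retail_dba"
DB_PASS="cloudera"

# Assumptions, not given in the scenario: warehouse path and Impala host.
WAREHOUSE_DIR="/user/hive/warehouse"
IMPALA_HOST="quickstart"
# Relative HDFS path from the task; it resolves under the user's HDFS home.
HDFS_DIR="HadoopExam"

# Task 2: five categories with the most order lines (column names assumed).
CATEGORIES_SQL="SELECT c.category_name, COUNT(oi.order_item_quantity) AS item_count
FROM order_items oi
JOIN products p ON oi.order_item_product_id = p.product_id
JOIN categories c ON p.product_category_id = c.category_id
GROUP BY c.category_name
ORDER BY item_count DESC
LIMIT 5"

# Task 3: ten products with the highest summed subtotal (column names assumed).
PRODUCTS_SQL="SELECT p.product_id, p.product_name, SUM(oi.order_item_subtotal) AS revenue
FROM order_items oi
JOIN products p ON oi.order_item_product_id = p.product_id
GROUP BY p.product_id, p.product_name
ORDER BY revenue DESC
LIMIT 10"

# Run a query in Impala, write comma-delimited rows to a local file,
# then copy that file into HDFS. $1 = SQL text, $2 = CSV file name.
export_query() {
  impala-shell -i "$IMPALA_HOST" -B --output_delimiter=',' -q "$1" -o "$2" &&
  hdfs dfs -mkdir -p "$HDFS_DIR" &&
  hdfs dfs -put -f "$2" "$HDFS_DIR/$2"
}

# Execute only where the Hadoop tools exist; otherwise print a notice.
if command -v sqoop >/dev/null 2>&1 && command -v impala-shell >/dev/null 2>&1; then
  # Task 1: import every table as Snappy-compressed Parquet, one mapper each.
  sqoop import-all-tables \
    --connect "$DB_URL" \
    --username "$DB_USER" \
    --password "$DB_PASS" \
    --as-parquetfile \
    --compress \
    --compression-codec snappy \
    --warehouse-dir "$WAREHOUSE_DIR" \
    --hive-import \
    -m 1

  # Make the newly created tables visible to Impala.
  impala-shell -i "$IMPALA_HOST" -q "INVALIDATE METADATA"

  export_query "$CATEGORIES_SQL" "best_categories.csv"
  export_query "$PRODUCTS_SQL" "best_products.csv"
else
  echo "sqoop or impala-shell not found; nothing was executed"
fi
```

The revenue query counts every order line regardless of order status. If cancelled or fraudulent orders should be excluded, a join to the orders table with a filter on its status column could be added, again assuming the data model contains such a column.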