Analysing Apache Pig, Apache Hive, and MySQL Query Performance on Large Datasets
Hadoop is a framework that allows the distributed processing of large data sets across clusters of computers using simple programming models. With its capability to process large data sets, Hadoop can help save computing resources. MySQL Cluster is a popular database used by companies, governments, and many other institutions. The problem with MySQL Cluster is that as the data grow larger, processing time increases and additional resources may be needed. With Hadoop and the tools built on top of it, processing time can be reduced while maintaining data consistency. The purpose of this research is to find at what point Hadoop and its tools can outperform MySQL Cluster's processing time while preserving data consistency. Another aspect that makes Hadoop suitable for big data is that adding an extra Hadoop node can further reduce processing time. The research is conducted by creating dummy datasets with large numbers of rows and running query statements on each dataset. The resulting query times determine when Hadoop overtakes MySQL Cluster.
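The benchmarking approach described above (build a dummy dataset with many rows, run the same query statement, record elapsed time) can be sketched as follows. This is a minimal illustration only: `sqlite3` stands in for MySQL Cluster and Hive, and the table schema, row count, and query are assumptions, not the thesis's actual datasets or measurements.

```python
import sqlite3
import time

def build_dummy_dataset(conn, n_rows):
    # Create a dummy table and fill it with synthetic rows,
    # mimicking the "dummy datasets with large numbers of rows" step.
    conn.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount REAL)")
    rows = [(i, "region_%d" % (i % 10), float(i % 100)) for i in range(n_rows)]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

def time_query(conn, sql):
    # Run one query statement and measure its wall-clock time,
    # the metric the thesis compares across systems.
    start = time.perf_counter()
    result = conn.execute(sql).fetchall()
    return result, time.perf_counter() - start

conn = sqlite3.connect(":memory:")  # sqlite3 is a stand-in, an assumption
build_dummy_dataset(conn, 100_000)
result, elapsed = time_query(
    conn, "SELECT region, SUM(amount) FROM sales GROUP BY region")
print(len(result), elapsed)
```

In the actual study, the same query would be issued to MySQL Cluster and to Hive or Pig over Hadoop, and the elapsed times plotted against dataset size to find the crossover point.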