Tag: Apache Hadoop

What Are the Differences Between Apache Hadoop and Apache Spark?

What is Big Data? How large must a data set be before it counts as Big Data? The term is relative. For example, 50 terabytes may be Big Data for a startup but not for companies like Google and Facebook, because they have the infrastructure to store and process data at that scale. Apache Hadoop and Apache Spark are both Big Data analytics frameworks; they provide some of the most popular tools used to carry out common Big Data-related tasks.

Apache Hadoop Introduction

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
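The best-known of these simple programming models is MapReduce. The following is a minimal sketch of the idea in plain Python, running in a single process rather than on a cluster. It does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is everywhere"]))
```

On a real cluster the map and reduce functions would run on many machines in parallel, each working on the data stored locally; here everything runs in one process only to show the shape of the model.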

Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, so it delivers a highly available service on top of a cluster of computers, each of which may be prone to failures.
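To illustrate what handling failures at the application layer can look like, here is a small sketch in Python that retries a task on another worker when one fails. This is not Hadoop's actual scheduler or API; `run_with_retries`, `flaky_worker`, and `healthy_worker` are hypothetical names, and the workers are ordinary functions standing in for cluster nodes.

```python
def run_with_retries(task, workers, max_attempts=3):
    """Try the task on successive workers until one of them succeeds."""
    errors = []
    for attempt, worker in enumerate(workers[:max_attempts], start=1):
        try:
            return worker(task)
        except RuntimeError as exc:
            # Record the failure and move on to the next worker.
            errors.append(f"attempt {attempt}: {exc}")
    raise RuntimeError("task failed on all workers: " + "; ".join(errors))


def flaky_worker(task):
    """Stand-in for a node that is down (assumption for the example)."""
    raise RuntimeError("node unavailable")


def healthy_worker(task):
    """Stand-in for a working node; the 'task' is just a list to sum."""
    return sum(task)


if __name__ == "__main__":
    print(run_with_retries([1, 2, 3], [flaky_worker, healthy_worker]))
```

The point of the sketch is that the caller still gets a result even though one worker failed, without depending on any single machine being reliable.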

Set Up and Configure a Cluster-Node Hadoop Installation

This guide describes how to set up and configure a cluster-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).