Big Data ~ Hadoop
Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. It usually includes data sets whose size is beyond the ability of commonly used software tools to capture, curate, manage, and process within a tolerable elapsed time. It may be as important to business, and to society, as the Internet has become.
Course Details:
Real-time workshop: conducted by an experienced Project Manager from an MNC
Duration: 2 Months
Project: One Month
Class: Monday to Friday
Batch Timing: 2 hrs per day
Eligibility: B.E/B.Tech, M.C.A
Arcus Infotech organizes interview slots with leading MNCs for trained freshers. A dedicated placement cell provides complete placement support to students who successfully complete the course. Arcus Infotech candidates are currently working at IBM, Accenture, Capgemini, Wipro, CTS, ACS, Dell, Perot, and others.

Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters of commodity hardware. Essentially, it accomplishes two tasks: massive data storage and faster processing. For effective scheduling of work, every Hadoop-compatible file system should provide location awareness. Hadoop applications can use this information to run work on the node where the data is and, failing that, on the same rack/switch, which reduces backbone traffic.
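As a rough illustration of the location-aware scheduling described above, here is a minimal Python sketch of the preference order: first the node that holds the data, then a node on the same rack, and only then any other free node. This is not Hadoop's actual scheduler; the names `Node` and `pick_node` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical worker node identified by its name and rack."""
    name: str
    rack: str


def pick_node(replica_nodes, free_nodes):
    """Return (node, locality) for the best free node, or (None, None).

    replica_nodes: nodes that hold a copy of the data block.
    free_nodes: nodes that currently have capacity to run a task.
    """
    replica_names = {n.name for n in replica_nodes}
    replica_racks = {n.rack for n in replica_nodes}

    # 1. Prefer a free node that already stores the data.
    for node in free_nodes:
        if node.name in replica_names:
            return node, "node-local"

    # 2. Otherwise prefer a free node on the same rack as a replica.
    for node in free_nodes:
        if node.rack in replica_racks:
            return node, "rack-local"

    # 3. Fall back to any free node; the data crosses the backbone.
    if free_nodes:
        return free_nodes[0], "off-rack"
    return None, None


if __name__ == "__main__":
    replicas = [Node("n1", "rackA"), Node("n4", "rackB")]
    free = [Node("n2", "rackA"), Node("n9", "rackC")]
    node, locality = pick_node(replicas, free)
    print(node.name, locality)
```

In this sketch a task is moved to the data whenever possible, so only the off-rack case copies a block across the cluster backbone.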


The Motivation for Hadoop

  • Problems with traditional large-scale systems
  • Requirements for a new approach

Hadoop: Basic Concepts

  • An Overview of Hadoop

  • The Hadoop Distributed File System

  • Hands-On Exercise

  • How MapReduce Works

  • Hands-On Exercise

  • Anatomy of a Hadoop Cluster

  • Other Hadoop Ecosystem Components

Big Data Hadoop ~ Course Content
Writing a MapReduce Program

  • The MapReduce Flow

  • Examining a Sample MapReduce Program

  • Basic MapReduce API Concepts

  • The Driver Code

  • The Mapper

  • The Reducer

  • Hadoop’s Streaming API

  • Using Eclipse for Rapid Development

  • Hands-On Exercise

  • The New MapReduce API

Delving Deeper Into The Hadoop API

  • More about ToolRunner

  • Testing with MRUnit

  • Reducing Intermediate Data With Combiners

  • The configure and close methods for Map/Reduce Setup and Teardown

  • Writing Partitioners for Better Load Balancing

  • Hands-On Exercise

  • Directly Accessing HDFS

  • Using the Distributed Cache

  • Hands-On Exercise

Common MapReduce Algorithms

  • Sorting and Searching

  • Indexing

  • Machine Learning With Mahout

  • Term Frequency – Inverse Document Frequency

  • Word Co-Occurrence

  • Hands-On Exercise

Using HBase

  • What is HBase?

  • HBase Architecture

  • HBase API

  • Managing large data sets with HBase

  • Using HBase in Hadoop applications

  • Hands-On Exercise

Using Hive and Pig

  • Hive Basics

  • Pig Basics

  • Hands-On Exercise

Practical Development Tips and Techniques

  • Debugging MapReduce Code

  • Using LocalJobRunner Mode For Easier Debugging

  • Retrieving Job Information with Counters

  • Logging

  • Splittable File Formats

  • Determining the Optimal Number of Reducers

  • Map-Only MapReduce Jobs

  • Hands-On Exercise
More Advanced MapReduce Programming

  • Custom Writables and WritableComparables

  • Saving Binary Data using SequenceFiles and Avro Files

  • Creating InputFormats and OutputFormats

  • Hands-On Exercise
Joining Data Sets in MapReduce

  • Map-Side Joins

  • The Secondary Sort

  • Reduce-Side Joins