Big Data ~ Hadoop
Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. It usually includes data sets whose size is beyond the ability of commonly used software tools to capture, curate, manage, and process within a tolerable elapsed time. Big data may be as important to business and society as the Internet has become.
Course Details:
Real-time workshop conducted by a Project Manager from an MNC with real-time project experience
Duration: 2 months
Project: 1 month
Classes: Monday to Friday
Batch timing: 2 hours per day
Eligibility: B.E./B.Tech, M.C.A., M.Sc., M.E./M.Tech
Placement
Arcus Infotech organizes interview slots with leading MNCs for trained freshers. A dedicated placement cell provides complete placement support to students who successfully complete the course. Arcus Infotech candidates are currently working at IBM, Accenture, Capgemini, Wipro, CTS, ACS, DELL, PEROT, etc.
Hadoop

Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters of commodity hardware. Essentially, it accomplishes two tasks: massive data storage and faster processing. For effective scheduling of work, every Hadoop-compatible file system should provide location awareness, that is, the name of the rack (more precisely, the network switch) where each worker node sits. Hadoop applications can use this information to run work on the node where the data is and, failing that, on the same rack/switch, reducing backbone traffic.
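
The location-aware preference described above can be illustrated with a small Python model. This is a toy sketch, not Hadoop's scheduler: the node and rack names and the `pick_node` helper are hypothetical, and a real cluster learns rack membership from its rack-awareness configuration rather than from a hard-coded dictionary.

```python
# Toy model of locality-aware task placement (not Hadoop's actual code):
# prefer a free node holding a replica of the block (node-local), then a
# free node on the same rack as a replica (rack-local), then any free node
# (off-rack, so the data crosses the backbone).
# All node and rack names are hypothetical.

RACK_OF = {
    "node1": "rackA", "node2": "rackA",
    "node3": "rackB", "node4": "rackB",
    "node5": "rackC",
}


def pick_node(replica_nodes, free_nodes, rack_of=RACK_OF):
    """Return (node, locality) for one task that reads a single block."""
    free = sorted(free_nodes)  # deterministic order for the example
    # 1. Node-local: run where a replica of the data already lives.
    for node in free:
        if node in replica_nodes:
            return node, "node-local"
    # 2. Rack-local: same rack/switch as at least one replica.
    replica_racks = {rack_of[n] for n in replica_nodes}
    for node in free:
        if rack_of[node] in replica_racks:
            return node, "rack-local"
    # 3. Off-rack: any free node, at the cost of backbone traffic.
    if free:
        return free[0], "off-rack"
    return None, "no free node"


if __name__ == "__main__":
    replicas = {"node1", "node3"}
    print(pick_node(replicas, {"node1", "node4"}))  # ('node1', 'node-local')
    print(pick_node(replicas, {"node2", "node5"}))  # ('node2', 'rack-local')
    print(pick_node(replicas, {"node5"}))           # ('node5', 'off-rack')
```

The ordering of the three checks is the whole idea: moving computation to the data is cheaper than moving data to the computation.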

Introduction

The Motivation for Hadoop


  • Problems with traditional large-scale systems
  • Requirements for a new approach

Hadoop: Basic Concepts


  • An Overview of Hadoop

  • The Hadoop Distributed File System

  • Hands-On Exercise

  • How MapReduce Works

  • Hands-On Exercise

  • Anatomy of a Hadoop Cluster

  • Other Hadoop Ecosystem Components
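
To show how MapReduce works end to end, the following single-process Python sketch runs the three phases of a word-count job: map, shuffle/sort, and reduce. It is an illustration under simplifying assumptions (one machine, in-memory lists, hypothetical function names); it mirrors the data flow only, not HDFS storage, task distribution, or fault tolerance.

```python
# In-memory simulation of the MapReduce data flow for word count:
# map -> shuffle/sort (group values by key) -> reduce.
from collections import defaultdict


def map_phase(records):
    """Mapper: emit (word, 1) for every word of every input line."""
    for _offset, line in records:
        for word in line.lower().split():
            yield word, 1


def shuffle_and_sort(pairs):
    """Group values by key and return keys in sorted order, as the
    framework does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reducer: sum the counts collected for each word."""
    for word, counts in grouped:
        yield word, sum(counts)


def run_job(lines):
    # Hadoop's default text input gives the mapper (byte offset, line);
    # a line index stands in for the byte offset here.
    records = enumerate(lines)
    return list(reduce_phase(shuffle_and_sort(map_phase(records))))


if __name__ == "__main__":
    for word, count in run_job(["the quick brown fox", "the lazy dog", "The fox"]):
        print(word, count)
```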



Big Data Hadoop ~ Course Content
Writing a MapReduce Program

  • The MapReduce Flow

  • Examining a Sample MapReduce Program

  • Basic MapReduce API Concepts

  • The Driver Code

  • The Mapper

  • The Reducer

  • Hadoop’s Streaming API

  • Using Eclipse for Rapid Development

  • Hands-on exercise

  • The New MapReduce API
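
Hadoop's Streaming API lets the Mapper and Reducer be any executables that read lines from stdin and write tab-separated key/value lines to stdout. The sketch below is a minimal word-count script for that model, assuming a hypothetical file name `wc.py` and Streaming's default tab separator; the Java Driver, Mapper, and Reducer classes covered in this module are not shown.

```python
#!/usr/bin/env python3
"""Word count for Hadoop Streaming: one script, two roles (map, reduce).

Streaming feeds input lines on stdin and reads 'key<TAB>value' lines from
stdout. Between the phases the framework sorts map output by key, so the
reducer sees all lines for a word consecutively and must detect key
changes itself.
"""
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' for each word."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts of consecutive lines that share the same key."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _word, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__" and len(sys.argv) == 2:
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

It can be tried locally with `cat input.txt | python3 wc.py map | sort | python3 wc.py reduce`, where `sort` plays the role of the shuffle. On a cluster the same script would be handed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options; the jar's location depends on the installation.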

Delving Deeper Into The Hadoop API


  • More about ToolRunner

  • Testing with MRUnit

  • Reducing Intermediate Data With Combiners

  • The configure and close methods for Map/Reduce Setup and Teardown

  • Writing Partitioners for Better Load Balancing

  • Hands-On Exercise

  • Directly Accessing HDFS

  • Using the Distributed Cache

  • Hands-On Exercise
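
Combiners and partitioners both act on map output before it reaches the reducers. The Python sketch below shows the idea under stated assumptions: `combine` pre-sums each mapper's pairs locally, `hash_partition` mimics Hadoop's default `HashPartitioner` formula `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks` with `zlib.crc32` standing in for Java's `hashCode()`, and `range_partition` is a hypothetical custom partitioner for balancing by key ranges.

```python
import zlib
from collections import Counter


def map_words(line):
    """Mapper output for one line: (word, 1) per word."""
    return [(w, 1) for w in line.lower().split()]


def combine(pairs):
    """Combiner: local sum per key before the shuffle. Valid here because
    addition is associative and commutative."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def hash_partition(key, num_reducers):
    """Default-style partitioning: the same key always goes to the same
    reducer. crc32 is used because Python's str hash() is randomized per
    process; crc32 already returns a non-negative int in Python 3."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def range_partition(key, boundaries):
    """Hypothetical custom partitioner routing keys by alphabetical range,
    e.g. boundaries ["h", "p"]: reducer 0 gets keys < "h", reducer 1 gets
    keys < "p", reducer 2 gets the rest."""
    for i, upper in enumerate(boundaries):
        if key < upper:
            return i
    return len(boundaries)


if __name__ == "__main__":
    mapper_output = map_words("to be or not to be")
    combined = combine(mapper_output)
    print(len(mapper_output), "records before combining,", len(combined), "after")
    for key, value in combined:
        print(key, value, "-> reducer", range_partition(key, ["h", "p"]))
```

A range partitioner like this only balances load if the boundaries reflect the real key distribution, which is why such boundaries are usually chosen from a sample of the data.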

Common MapReduce Algorithms


  • Sorting and Searching

  • Indexing

  • Machine Learning With Mahout

  • Term Frequency – Inverse Document Frequency


  • Word Co-Occurrence

  • Hands-On Exercise
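
As an illustration of two of the algorithms listed above, this Python sketch computes TF-IDF scores and word co-occurrence counts in a single process. It assumes the common variant tf * log(N / df) with raw term counts (many weightings exist) and the "pairs" formulation of co-occurrence with a fixed neighbour window. In Hadoop each of these counts would typically come from mappers emitting keys and reducers summing them, often across several chained jobs.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs maps doc_id -> text. Returns {(word, doc_id): score} with
    tf = raw count of the word in the document and idf = log(N / df)."""
    n_docs = len(docs)
    tf = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())  # each document adds 1 per distinct word
    return {
        (word, doc_id): count * math.log(n_docs / df[word])
        for doc_id, counts in tf.items()
        for word, count in counts.items()
    }


def co_occurrence_pairs(text, window=1):
    """'Pairs' approach: count every neighbour within `window` positions.
    A mapper would emit ((word, neighbour), 1) and a reducer would sum the
    ones; Counter does both steps here."""
    words = text.lower().split()
    counts = Counter()
    for i, word in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                counts[(word, words[j])] += 1
    return counts


if __name__ == "__main__":
    docs = {"d1": "hadoop stores data", "d2": "hadoop processes data fast"}
    for key, score in sorted(tf_idf(docs).items()):
        print(key, round(score, 3))
    print(co_occurrence_pairs("big data big cluster"))
```

Words that appear in every document (here "hadoop" and "data") get an idf of log(1) = 0, which is exactly the down-weighting of common terms that TF-IDF is meant to provide.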

Using HBase


  • What is HBase?

  • HBase Architecture

  • HBase API

  • Managing large data sets with HBase

  • Using HBase in Hadoop applications

  • Hands-on exercise
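
HBase's data model can be pictured as a sorted map: row key, then column ("family:qualifier"), then timestamped versions of the cell value. The class below is a toy in-memory model of that picture written for illustration only; it is NOT the HBase client API, and the names `ToyTable`, `put`, `get`, and `scan` are hypothetical stand-ins.

```python
# Toy in-memory model of HBase's data model (not the HBase API):
# row key -> "family:qualifier" -> {timestamp: value}.
# Rows are kept sorted by row key, reads return the newest version, and
# column families must be declared up front, as in an HBase table schema.
import bisect


class ToyTable:
    def __init__(self, families):
        self.families = set(families)
        self.rows = {}
        self.keys = []  # sorted row keys, which makes range scans cheap

    def put(self, row, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        if row not in self.rows:
            bisect.insort(self.keys, row)
            self.rows[row] = {}
        self.rows[row].setdefault(column, {})[ts] = value

    def get(self, row, column):
        """Newest version of one cell, or None if the cell is absent."""
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start, stop):
        """Row keys with start <= key < stop, in sorted order."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, stop)
        return self.keys[lo:hi]


if __name__ == "__main__":
    users = ToyTable(["info"])
    users.put("user#002", "info:name", "Ravi", ts=1)
    users.put("user#001", "info:name", "Asha", ts=1)
    users.put("user#001", "info:name", "Asha K", ts=2)
    print(users.get("user#001", "info:name"))  # Asha K
    print(users.scan("user#000", "user#002"))  # ['user#001']
```

Because rows are ordered by key, row-key design (for example the zero-padded `user#001` keys above) determines which rows a range scan touches.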

Using Hive and Pig


  • Hive Basics

  • Pig Basics

  • Hands-on exercise

Practical Development Tips and Techniques

  • Debugging MapReduce Code

  • Using LocalJobRunner Mode For Easier Debugging

  • Retrieving Job Information with Counters

  • Logging

  • Splittable File Formats

  • Determining the Optimal Number of Reducers

  • Map-Only MapReduce Jobs

  • Hands-On Exercise
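
Two of the tips above can be sketched briefly in Python. The first function models a map-only job (no shuffle or reduce phase) that filters input and tallies good and malformed records, the way a job would use named counters; the counter names and the "user,bytes" record format are hypothetical. The second applies the rule of thumb from the Apache Hadoop MapReduce tutorial for the number of reducers: 0.95 or 1.75 multiplied by (nodes × maximum containers per node).

```python
import math
from collections import Counter


def parse_and_filter(lines, counters):
    """Map-only 'job': keep well-formed 'user,bytes' lines. Output goes
    straight to the result with no shuffle or reduce phase; bad records
    are counted instead of failing the job. Counter names are hypothetical."""
    for line in lines:
        parts = line.split(",")
        if len(parts) != 2 or not parts[1].strip().isdigit():
            counters["MALFORMED_RECORDS"] += 1
            continue
        counters["GOOD_RECORDS"] += 1
        yield parts[0].strip(), int(parts[1])


def suggested_reducers(nodes, containers_per_node, factor=0.95):
    """factor 0.95: all reduces can launch at once as maps finish.
    factor 1.75: faster nodes run a second wave, improving load balance."""
    return max(1, math.floor(factor * nodes * containers_per_node))


if __name__ == "__main__":
    counters = Counter()
    rows = list(parse_and_filter(["alice,10", "oops", "bob,x", "carol,5"], counters))
    print(rows)  # [('alice', 10), ('carol', 5)]
    print(dict(counters))
    print(suggested_reducers(10, 8, factor=1.75))  # 140
```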

More Advanced MapReduce Programming

  • Custom Writables and WritableComparables

  • Saving Binary Data using SequenceFiles and Avro Files

  • Creating InputFormats and OutputFormats

  • Hands-On Exercise
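
A custom WritableComparable in Java defines how a key is serialized (`write`), deserialized (`readFields`), and ordered (`compareTo`). The Python class below loosely mirrors those three roles for a hypothetical composite key of year and station ID; the byte layout imitates `DataOutput.writeInt` (4-byte big-endian) followed by a 2-byte length-prefixed string in the style of `writeUTF` (same bytes for plain ASCII), and is an illustrative assumption rather than Hadoop code.

```python
import struct
from dataclasses import dataclass
from functools import total_ordering


@total_ordering
@dataclass(frozen=True)
class YearStationKey:
    """Hypothetical composite key mirroring write/readFields/compareTo."""
    year: int
    station: str

    def write(self) -> bytes:
        # 4-byte big-endian int, then 2-byte length + UTF-8 bytes.
        data = self.station.encode("utf-8")
        return struct.pack(">i", self.year) + struct.pack(">H", len(data)) + data

    @classmethod
    def read_fields(cls, buf: bytes, offset: int = 0):
        """Decode one key at `offset`; return (key, offset after the key)."""
        (year,) = struct.unpack_from(">i", buf, offset)
        (length,) = struct.unpack_from(">H", buf, offset + 4)
        start = offset + 6
        station = buf[start:start + length].decode("utf-8")
        return cls(year, station), start + length

    def __lt__(self, other):
        # compareTo(): order by year first, then by station.
        return (self.year, self.station) < (other.year, other.station)


if __name__ == "__main__":
    buf = YearStationKey(2015, "CHN01").write() + YearStationKey(1999, "BLR07").write()
    first, pos = YearStationKey.read_fields(buf)
    second, _ = YearStationKey.read_fields(buf, pos)
    print(sorted([first, second]))
```

Because records are written back to back with no delimiter, `read_fields` must return where the next record starts; fixed-size or length-prefixed fields are what make that possible.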

Joining Data Sets in MapReduce

  • Map-Side Joins

  • The Secondary Sort

  • Reduce-Side Joins
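
The two join strategies above, and the role the secondary sort plays in a reduce-side join, can be simulated in a single Python process. In this sketch (hypothetical customer/order data and function names) the reduce-side join sorts on a composite key (join key, source tag) but groups on the join key alone, which is what a composite key plus a custom partitioner and grouping comparator achieve in Hadoop; the map-side join assumes the smaller data set fits in memory in every mapper, for instance after being shipped via the distributed cache.

```python
from itertools import groupby

CUSTOMER, ORDER = 0, 1  # tag values drive the secondary sort


def reduce_side_join(customers, orders):
    """Inner join of (cid, name) with (cid, amount) records."""
    tagged = [((cid, CUSTOMER), name) for cid, name in customers]
    tagged += [((cid, ORDER), amount) for cid, amount in orders]
    # Secondary sort: order by (join key, tag) so each customer record
    # precedes its orders; the sort is stable for equal composite keys.
    tagged.sort(key=lambda kv: kv[0])
    joined = []
    # Group on the join key only, like a grouping comparator would.
    for cid, group in groupby(tagged, key=lambda kv: kv[0][0]):
        name = None
        for (_cid, tag), value in group:
            if tag == CUSTOMER:
                name = value  # guaranteed to arrive first
            elif name is not None:  # orders with no customer are dropped
                joined.append((cid, name, value))
    return joined


def map_side_join(customers, orders):
    """Replicated join: the small side is an in-memory lookup table."""
    lookup = dict(customers)
    return [(cid, lookup[cid], amount) for cid, amount in orders if cid in lookup]


if __name__ == "__main__":
    customers = [(2, "Bala"), (1, "Anu")]
    orders = [(1, 250), (2, 100), (1, 75), (3, 40)]
    print(reduce_side_join(customers, orders))
    print(map_side_join(customers, orders))
```

The secondary sort is what lets the reducer stream through the orders without buffering them: it only ever holds one customer record at a time.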