Unlimited Storage with Hadoop…! using LVM on AWS Cloud(EBS)

Digambar Nandrekar
5 min read · Jan 18, 2023

Apache Hadoop Elasticity with LVM using AWS EBS

Elasticity:

Elasticity is defined as “the degree to which a system is able to adapt to workload changes by provisioning and de-provisioning resources in an autonomic manner, such that at each point in time the available resources match the current demand as closely as possible.”

So… what is Apache Hadoop?

The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.

And what about AWS?

Amazon Web Services (AWS) is a subsidiary of Amazon providing on-demand cloud computing platforms and APIs to individuals, companies, and governments, on a metered pay-as-you-go basis. These cloud computing web services offer a variety of basic abstract technical infrastructure and distributed computing building blocks and tools.


LVM (Logical Volume Management):


LVM is a tool for logical volume management which includes allocating disks, striping, mirroring, and resizing logical volumes.

With LVM, a hard drive or set of hard drives is allocated to one or more physical volumes. LVM physical volumes can be placed on other block devices which might span two or more disks.

The physical volumes are combined into volume groups, from which logical volumes are carved out; the exception is the /boot partition, which cannot be on a logical volume because the boot loader cannot read it.

AWS EC2 (Elastic Compute Cloud):


Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers. Amazon EC2’s simple web service interface allows you to obtain and configure capacity with minimal friction. It provides you with complete control of your computing resources and lets you run on Amazon’s proven computing environment.

AWS EBS (Elastic Block Store):


Amazon Elastic Block Store (EBS) is an easy-to-use, high-performance block storage service designed for use with Amazon Elastic Compute Cloud (EC2) for both throughput- and transaction-intensive workloads at any scale. A broad range of workloads, such as relational and non-relational databases, enterprise applications, containerized applications, big data analytics engines, file systems, and media workflows, are widely deployed on Amazon EBS.
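As a minimal sketch (the size, IDs, and device name below are placeholders, not values from this setup), an extra EBS volume can be created and attached to a DataNode instance with the AWS CLI before it is handed over to LVM:

# Create a 10 GiB gp2 volume in the same Availability Zone as the DataNode instance
aws ec2 create-volume --size 10 --volume-type gp2 --availability-zone ap-south-1a

# Attach it to the DataNode; inside the instance it typically shows up as /dev/xvdf
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 --device /dev/xvdf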

Hadoop Architecture:


Hadoop Setup Deployed on AWS:

ASG for Hadoop DataNode

AWS Auto Scaling Groups:

An Auto Scaling group contains a collection of Amazon EC2 instances that are treated as a logical grouping for the purposes of automatic scaling and management. An Auto Scaling group also enables you to use Amazon EC2 Auto Scaling features such as health check replacements and scaling policies. Both maintaining the number of instances in an Auto Scaling group and automatic scaling are the core functionality of the Amazon EC2 Auto Scaling service.
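For illustration only, assuming a pre-built launch template for the DataNode AMI (the group, template, and subnet names below are placeholders), such a group could be created with the AWS CLI:

aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name hadoop-datanode-asg \
    --launch-template LaunchTemplateName=hadoop-datanode-template,Version=1 \
    --min-size 1 --max-size 3 --desired-capacity 3 \
    --vpc-zone-identifier subnet-0123456789abcdef0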

Hadoop Cluster:


‣ 1 Master / Hadoop NameNode

‣ 1 Hadoop Client

‣ 3 Slaves / Hadoop DataNodes

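A minimal bring-up sketch for this layout, assuming Hadoop 3.x daemon syntax and that core-site.xml/hdfs-site.xml are already configured on every node:

# On the Master / NameNode (one-time metadata format, then start the daemon)
hdfs namenode -format
hdfs --daemon start namenode

# On each of the three Slaves / DataNodes
hdfs --daemon start datanode

# From the Client: confirm all DataNodes have registered and see the configured capacity
hdfs dfsadmin -report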

Hadoop DataNode Elasticity using LVM:


Creating PV (Physical Volume):

Physical Volume Display
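A minimal sketch, assuming the attached EBS disk appears as /dev/xvdf (device names vary by instance type):

# Initialize the EBS disk as an LVM physical volume
pvcreate /dev/xvdf

# Verify the PV and its size
pvdisplay /dev/xvdf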

Creating Volume Group (VG):

Volume Group Display
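Continuing the same sketch (the VG name hadoop_vg is a placeholder):

# Create a volume group from the PV
vgcreate hadoop_vg /dev/xvdf

# Inspect the total and free space (physical extents) in the VG
vgdisplay hadoop_vg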

Creating Logical Volume (LV) and mounting it on the Hadoop DataNode directory:

Logical Volume Display & mount
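A sketch of carving out an LV, formatting it, and mounting it on the DataNode storage directory (the LV name, 5 GiB size, and /dn mount point are assumptions; the mount point must be the directory configured as dfs.datanode.data.dir in hdfs-site.xml so the DataNode advertises this capacity):

# Create a 5 GiB logical volume inside the VG
lvcreate --size 5G --name hadoop_lv hadoop_vg

# Format it and mount it on the DataNode's data directory
mkfs.ext4 /dev/hadoop_vg/hadoop_lv
mkdir -p /dn
mount /dev/hadoop_vg/hadoop_lv /dn

# Confirm the LV and the capacity the DataNode will now report
lvdisplay /dev/hadoop_vg/hadoop_lv
df -h /dn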

LV (Logical Volume) Extend:

Extending LV Partition Size

An increased flow of data onto the DataNode can be accommodated by extending the LV size, which grows the storage the DataNode contributes to the cluster.
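A minimal sketch, reusing the placeholder names above; with an ext4 filesystem this can be done online, without unmounting or stopping the DataNode:

# Grow the LV by 2 GiB from the VG's free space
lvextend --size +2G /dev/hadoop_vg/hadoop_lv

# Grow the ext4 filesystem to fill the enlarged LV
resize2fs /dev/hadoop_vg/hadoop_lv

# The DataNode's reported capacity increases accordingly
df -h /dn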

LV (Logical Volume) Reduce:

Reducing the partition size of LV

Freeing up unused LV space helps utilize storage efficiently, since the reclaimed space is returned to the VG and can be dedicated to other LVs.
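A sketch of shrinking the same ext4-backed LV (shrinking must be done offline, the data must already fit within the smaller size, and a backup is strongly advised):

# Unmount and check the filesystem before shrinking
umount /dn
e2fsck -f /dev/hadoop_vg/hadoop_lv

# Shrink the filesystem first, then the LV, to the new target size (3 GiB here)
resize2fs /dev/hadoop_vg/hadoop_lv 3G
lvreduce --size 3G /dev/hadoop_vg/hadoop_lv

# Remount and verify
mount /dev/hadoop_vg/hadoop_lv /dn
df -h /dn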

VG (Volume Group) Exhausted:

Free Space in VG Exhausted

If the VG runs out of free space, allocating more storage to LVs won’t be possible.

In that case, we can create more PVs and extend the size of the VG with the help of vgextend.
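A minimal sketch, assuming a second EBS volume has been attached and shows up as /dev/xvdg:

# Initialize the new disk as another PV
pvcreate /dev/xvdg

# Add it to the existing volume group
vgextend hadoop_vg /dev/xvdg

# The VG now has free extents available for further lvextend operations
vgdisplay hadoop_vg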
