
Managing OpenStack Swift Cluster Capacity


Introduction


In recent years, the IT landscape has seen dramatic change and development, with platforms such as DevOps, cloud services, containers, and Kubernetes. With all the data around us, it is very difficult to store, retrieve, and process it, and proprietary options bring many issues with tracking and monitoring, managing time, and scaling applications. OpenStack Swift addresses this by combining all of these operations in one platform, working on a Platform-as-a-Service (PaaS) model. It is an open-source architecture that helps in deploying applications across virtual, physical, private, public, and hybrid cloud infrastructure.


Let us learn more about OpenStack Swift in the subsequent sections.



Swift Overview


Swift is the OpenStack object store project, which enables organizations to store and retrieve huge amounts of data safely, cheaply, and efficiently in the cloud. It maintains concurrency, availability, and robustness across the entire data store. It can store any type of file, such as analytics data, images, videos, audio, web content, or any other unstructured data.



To gain in-depth knowledge and hands-on practical experience, explore the OpenStack Training course.

 


Server Process Layers


Object storage in Swift is done using a distributed storage system called a Swift cluster: a collection of nodes/machines running one or more of Swift's server processes. The four Swift process layers are as follows:


Proxy Layer


All requests and responses pass through the proxy layer, as it is the only layer that communicates with external clients. A minimum of two proxy servers is deployed for redundancy, so if one proxy server fails, the others can take over. The proxy layer is implemented using a shared-nothing architecture, and it can be scaled in and out based on the workload.


[Related Article: OpenStack Network]

 

Account Layer


The account layer handles the metadata of individual accounts and of the containers within each account. The account server stores this information in SQLite databases.


Container Layer


The container layer handles container metadata and the list of objects within each container. The object list only records which container an object belongs to, not the object's location. Like account information, container information is stored in SQLite databases.
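

From a client's point of view, these two layers answer listing requests: a GET on an account URL returns the account's container listing, and a GET on a container URL returns that container's object listing. Below is a minimal sketch using curl; the token, endpoint, and container name are placeholder values, not taken from this article.

List the containers in an account (account layer):
# curl -H "X-Auth-Token: AUTH_tk123" http://swift.example.com:8080/v1/AUTH_demo?format=json

List the objects in a container (container layer):
# curl -H "X-Auth-Token: AUTH_tk123" http://swift.example.com:8080/v1/AUTH_demo/photos?format=json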



[Related Article:  OpenStack Authentication]


Object Layer


This layer is responsible for the actual storage of objects. Objects are stored as binary files on the drive, in the path of the associated partition, with a timestamp. The timestamp is important, as it allows the user to store different versions of the same object. The object's metadata is stored in the extended attributes (xattrs) of the file, so the data and metadata are copied and stored together as a single unit.
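

As a quick illustration of this layout, an object's metadata can be inspected directly on a storage node. This is a minimal sketch that assumes the getfattr utility (from the attr package) is installed; the object path shown is hypothetical. Each object is a timestamped .data file whose extended attributes carry the serialized metadata:

# getfattr -d -e hex \
  /srv/node/sda/objects/1024/a1b/d41d8cd98f00b204e9800998ecf8427e/1404123456.78901.data

The dump includes a user.swift.metadata attribute, which is how the data and metadata travel together as a single unit.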


Check out the OpenStack Tutorial


How Does OpenStack Swift Work?


OpenStack Swift offers a platform to manage containers across various operating environments, allowing you to build, deploy, and scale applications in and out while reducing processing time. It supports redundancy, so lost data can be recovered from another copy stored in a different geographical location. The sections below elaborate on the working principles of OpenStack Swift.


Data is stored in the form of binary objects on the underlying file system. Each stored object carries its metadata along with the data as part of its extended attributes. OpenStack Swift has a proxy server and storage nodes. The proxy server implements a REST API for the transmission of read and write requests between the storage nodes and clients.


[Related Article: OpenStack Dashboard to launch instances]


Clients can retrieve and store data using commands such as GET and PUT over the HTTP protocol. The proxy server locates the data using the metadata information. Swift supports both replication and erasure coding of data across the nodes. An extra copy of every object is stored in a unique location, arranged by region, zone, server, and drive. If an object is lost because of a hardware or server failure, the content is replicated from another location to a new location in the cluster.
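

A minimal sketch of these requests with curl is shown below; the endpoint, token, container, and object names are placeholders, not values from this article. Swift object URLs have the form /v1/<account>/<container>/<object>.

Store a local file as an object (PUT):
# curl -i -X PUT -H "X-Auth-Token: AUTH_tk123" -T holiday.jpg \
  http://swift.example.com:8080/v1/AUTH_demo/photos/holiday.jpg

Retrieve the same object (GET):
# curl -H "X-Auth-Token: AUTH_tk123" -o holiday-copy.jpg \
  http://swift.example.com:8080/v1/AUTH_demo/photos/holiday.jpg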


To map object storage data to server partitions, Swift uses a ring. Each service (account, container, and object) has a separate ring. All rings are constructed and managed by the ring builder, which assigns partitions to storage devices and stores the configuration information used by the storage nodes.
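

As an illustration, rings are built with the swift-ring-builder tool. The minimal sketch below (the IP address, port, device name, and weight are hypothetical) creates an object ring with 2^10 partitions, 3 replicas, and a minimum of 1 hour between moves of any partition, adds one device in region 1, zone 1, and rebalances to produce the ring file the nodes consume. The account and container rings are built the same way from account.builder and container.builder.

# swift-ring-builder object.builder create 10 3 1
# swift-ring-builder object.builder add r1z1-192.168.1.10:6000/sdb 100
# swift-ring-builder object.builder rebalance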


OpenStack Swift uses eventual consistency (an update to an object propagates to all nodes in the cluster over time, so a read may briefly return an older version), which provides high availability and high scalability. The proxy servers provide access to the most recently available data, even if some server nodes are unavailable in the cluster.


The diagram below shows all the services offered by SwiftStack and how they interact with one another.


OpenStack Swift Installation


After installing the latest version of OpenStack, follow these steps to install OpenStack Swift:

  • Enable the repository that provides the new packages, so that Swift can be installed on each node, using the commands below:
# apt-get install ubuntu-cloud-keyring
# echo "deb http://ubuntu-cloud.archive.canonical.com/ubuntu" \
  "trusty-updates/juno main" \
  > /etc/apt/sources.list.d/cloudarchive-juno.list
  • Update the OS using the command:
# apt-get update && apt-get dist-upgrade
  • Install all the prerequisite software and services on all Swift nodes using the command:
# apt-get install swift rsync memcached python-netifaces \
  python-xattr python-memcache
  • Create a swift folder under /etc and grant access to the users using the following commands:
# mkdir -p /etc/swift/
# chown -R swift:swift /etc/swift
  • Download the sample configuration file from GitHub and save it as /etc/swift/swift.conf using the following command:
# curl -o /etc/swift/swift.conf \
  https://raw.githubusercontent.com/openstack/swift/stable/juno/etc/swift.conf-sample
  • Add a variable, swift_hash_path_prefix, in the swift-hash section of the file /etc/swift/swift.conf. Create a unique hash string with
# python -c "from uuid import uuid4; print uuid4()"
and assign it to the variable, as shown:
[swift-hash]
# random unique strings that can never change
swift_hash_path_prefix = bd08f643f5663c4ec607
  • Add another variable, swift_hash_path_suffix, in the swift-hash section using the same method as in the previous step. These strings are used for the mappings in the ring, and the swift.conf file must be identical on all the nodes in the cluster.
swift_hash_path_suffix = f423bf7ab663888fe832
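
Because the file must match everywhere, a quick way to verify this (the hostnames below are hypothetical) is to compare its checksum across the nodes:

# for host in node1 node2 node3; do ssh $host md5sum /etc/swift/swift.conf; done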

[Related Article: Monitoring MySQL with Hyperic]

Managing Capacity


Data is ever growing, and so is the capacity needed to store it. It is important to manage data and storage to make sure there is no leakage of data. The following sections explain in detail how Swift manages data and capacity.


How SwiftStack Distributes Data


Whenever data comes in, Swift tries to store it in a location that is as unique as possible. Swift is a software-controlled distributed system that keeps multiple copies of each object on different nodes. A group of nodes that is kept apart from other nodes is called a zone. A Swift ring is responsible for mapping Swift services to objects, and it stores every replica in a separate zone. A SwiftStack cluster uses a hashing ring to randomly distribute objects across all the nodes in the cluster according to their size.
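

Where the hashing ring places a particular object can be checked with the swift-get-nodes tool, which hashes the account/container/object path and prints the primary nodes and the handoff locations (the account, container, and object names below are placeholders):

# swift-get-nodes /etc/swift/object.ring.gz AUTH_demo photos holiday.jpg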


  • Adding Capacity: Adding new capacity means that a new server or a new rack has been added. When new capacity is added to a SwiftStack cluster, data is redistributed evenly across the cluster's resources. To ensure there is no impact on cluster performance, it is important to manage the replication traffic this redistribution generates; large clusters are not affected much, but small clusters need care. The SwiftStack Controller handles the process of adding new capacity (see the ring-builder sketch after this list).
  • Removing Capacity: Capacity can be removed when a device has failed, is causing issues, or is being replaced with a larger device to grow the cluster. Removing capacity is as easy as adding it: choose Remove, wait until the device becomes empty, and then remove it physically.
  • Full Drives: Drives become full when needed capacity has not been added while the data keeps growing. A SwiftStack cluster distributes data evenly across its drives, so one full drive implies that all the drives in the cluster are full, which is the signal that new capacity should be added.
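
Under the hood, adding and removing capacity come down to ring changes like the minimal sketch below (the IP address, port, device name, and weights are hypothetical; in a SwiftStack deployment the Controller drives these steps for you).

Add a new device and redistribute partitions to it:
# swift-ring-builder object.builder add r1z2-192.168.1.20:6000/sdc 4000
# swift-ring-builder object.builder rebalance

Drain a device by setting its weight to zero, wait for replication to empty it, then remove it:
# swift-ring-builder object.builder set_weight r1z2-192.168.1.20:6000/sdc 0
# swift-ring-builder object.builder rebalance
# swift-ring-builder object.builder remove r1z2-192.168.1.20:6000/sdc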

[Related Article: Creating a sandbox Network server for Neutron with VirtualBox]


SwiftStack Cluster Hardware Requirements


When selecting hardware for a SwiftStack cluster, it is important to consider the configuration requirements for balancing the I/O performance, cost, and capacity of the workload. The following sections will help you select suitable hardware for a SwiftStack deployment.


Networked Nodes Architecture


A SwiftStack deployment has many nodes running proxy and storage services. These nodes are connected to each other and to the SwiftStack Controller. The hardware requirements fall into three categories: networking, nodes, and, when an organization needs the SwiftStack Controller on-premises, the controller itself.

 

[Related Article: Configuring Ubuntu Cloud]


Networking


A SwiftStack deployment is embedded in the existing network topology, but the organization should discuss with the network engineering team the hardware requirements and configurations that must be added to it (switches, routers, firewalls, etc.).


A SwiftStack deployment uses several different network segments:

  • Outward-facing network: used for API access and for running the authentication and proxy services.
  • Cluster-facing network: used for communication between the proxy servers and the storage nodes.
  • Replication network: an optional internal network that can be created for the replication traffic caused by data redistribution.
  • Network route to a SwiftStack Controller: for on-premises SwiftStack Controller deployments, this routes IP traffic to the controller.
  • Hardware network management: used to manage hardware through interfaces such as iLO and IPMI.


Load Balancing


The load-balancing tier evenly distributes the requests that come in through the Swift API among the nodes in the proxy tier.


No Load Balancer: For a single-node cluster, there is no need for a load balancer. In that case, provide the Outward-Facing IP address in the Cluster API Address field and select No Load Balancer.


External Load Balancer: When there is more than one node in a cluster, a commercial external load balancer can be used. The proxy servers are configured with a healthcheck URL that the external load balancer polls, for example http://12.23.45.5/healthcheck, to check whether each proxy server is working.
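

Checking a proxy node by hand looks like the following; a healthy proxy answers HTTP/1.1 200 OK with the body OK, and any other response tells the load balancer to take that proxy out of rotation:

# curl -i http://12.23.45.5/healthcheck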


Frequently Asked OpenStack Interview Questions & Answers

 

Node Configuration


Node Provisioning: The following steps are used to provision a node:


Step 1: Navigate to the Nodes tab on the Manage Cluster page.


Step 2: Click Install a New Node to read the instructions on how to install new nodes in an empty cluster.


Step 3: After installing a node, allocate it to a particular cluster: click Unprovisioned Nodes, select the zone and region, then click Ingest Now.


Step 4: When a user adds the node URL and claims the node but does not add it to a cluster, it is shown as an unprovisioned node. To provision an unprovisioned node, click Set up in Swift Nodes; to change a provisioned node, click Manage.


[Related Article: Managing Swift cluster capacity]


Manage Node Network Configuration 


It is safest to use two network interfaces: one for intra-cluster communication and the other for communication with the load balancer or clients. If a node has three interfaces, it is easy to use a separate network for replication.


To use three network interfaces, under Edit Network Interfaces make sure the Outward-facing interface, Cluster-facing interface, and Data-replication interface are provided and correct, then click Reassign Interfaces.

  • Zones: Regions and zones are used to organize the nodes in a cluster. Regions separate objects into different geographical areas so that copies of an object are kept as far apart from each other as possible. Within regions, nodes are further classified into zones. Zones are sets of nodes grouped by a single point of failure; in other words, they are collections of nodes that represent tangible, real-world failure domains, which helps Swift increase data availability.
  • Services: The SwiftStack node software runs proxy, account, container, and object services. Proxy services handle incoming API requests and the responses to them, and manage timestamps, load balancing, encryption, failures, SSL termination, and authentication. Account, container, and object services are responsible for storing, retrieving, updating, and replicating accounts, containers, and objects, and for auditing their data integrity.
  • General Hardware Information:
    • Chassis: standard 1U-4U
    • CPU: AMD or Intel Xeon E5-2600
    • RAM: depends on the individual node
    • Storage disks: 7200-RPM enterprise-grade SATA or SAS drives
    • OS disks: 100 GB minimum
    • SSDs: depend on the individual node; used for the account and container services
    • Network cards: single-port or dual-port 10 GbE adapters for the outward-facing and cluster-facing networks; single-port 1 GbE adapters for out-of-band hardware management and in-band cluster management
    • RAID controller cards: RAID cards should be avoided wherever possible; if one is used, set it to "JBOD" or "pass-through" mode

[Related Article: Openstack Block Storage]


Example Server Models


The following table provides common sample models deployed among SwiftStack customers today:

[Table: Example Server Models]


Specific Hardware Sizing


On-Premises Controller

  • CPU: virtual or physical, 4+ cores
  • Memory: virtual - 8 GB; physical - 32 GB
  • HDDs: 64 GB minimum, 2x boot drives
  • SSDs: 4 GB per node in the cluster
  • Network: 1 x 1 GbE

Proxy/Account/Container/Object (PACO) Nodes

  • CPU: 2.1 GHz+, 1 core per 1,000 client connections (with SSL), 1 core per 3 HDDs
  • Memory:
    - fewer than 30 drives: 128 GB
    - 30 to 60 drives: 256 GB
    - more than 60 drives: 384 GB
  • HDDs: 2 x 240 GB SSDs (mirrored boot drives)
  • SSDs: 3.9 KB of disk space per unique object
  • Network: 2 x 10 GbE

Proxy Nodes

  • CPU: 2.1 GHz+, 1 core per 1,000 client connections (with SSL), 1 core per 3 HDDs
  • Memory: 32GB
  • HDDs: 1 x boot drive
  • Network: 2 x 10 GbE

[Related Article: How to progress with OpenStack?]


Proxy/Account/Container (PAC) Nodes

  • CPU: 2.1 GHz+, 1 core per 1,000 client connections (with SSL), 1 core per 3 HDDs
  • Memory: 32 GB
  • HDDs: 1 x boot drive
  • SSDs: 3.9 KB of disk space per unique object
  • Network: 2 x 10 GbE

Account/Container/Object (ACO) Nodes

  • CPU: 2.1 GHz+, 1 core per 1,000 client connections (with SSL), 1 core per 3 HDDs
  • Memory: 32 GB
  • HDDs: 1 x boot drive
  • SSDs: 3.9 KB of disk space per unique object
  • Network: 2 x 10 GbE

[Related Article: Managing Swift cluster capacity]


Object (O) Nodes

  • CPU: 2.1 GHz+, 1 core per 3 HDDs
  • Memory:
    - fewer than 30 drives: 128 GB
    - 30 to 60 drives: 256 GB
    - more than 60 drives: 384 GB
  • HDDs: 2 x boot drives
  • Network: 2 x 10 GbE

[Related Article: Openstack Block Storage]


Features of OpenStack Swift


The following are some of the common features of OpenStack Swift: 


  • Leverages commodity hardware
  • No central database
  • Unlimited storage
  • Built-in 3x replication and data redundancy
  • Account/container/object structure
  • Multi-dimensional scalability
  • Easy addition of capacity
  • No RAID
  • Drive auditing
  • Direct object access
  • Expiring objects (see the example after this list)
  • Real-time visibility into client requests
  • Supports the S3 API
  • Container and account restrictions
  • Built-in management abilities
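
As an example of the expiring-objects feature, an object can be scheduled for automatic deletion by setting the X-Delete-After header (a number of seconds from now) or X-Delete-At (an absolute Unix timestamp). The sketch below uses placeholder endpoint, token, container, and object names:

# curl -i -X POST -H "X-Auth-Token: AUTH_tk123" -H "X-Delete-After: 86400" \
  http://swift.example.com:8080/v1/AUTH_demo/logs/access.log

After 86,400 seconds (one day), Swift removes the object automatically.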

 

Conclusion 


OpenStack Swift, also known as OpenStack Object Storage, is open-source software for effectively managing and storing large amounts of data at low cost over the long term. Data is stored as objects on the nodes of a cluster, and the replication facility offered by SwiftStack helps in the event of failure and data loss. Swift was developed by Rackspace Hosting, Inc., based on its Cloud Files technology. Common uses of OpenStack Swift include storage backup and archiving of unstructured data such as static web content, video and audio files, virtual machine images, and documents.





About The Author

Ravindra Savaram is a Content Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.

