In recent years, the IT landscape has changed dramatically, driven by technologies and practices such as DevOps, cloud services, containers, and Kubernetes. With the volume of data around us, storing, retrieving, and processing it has become very difficult. Proprietary options often make it hard to track and monitor data, manage time, and scale applications. OpenStack Swift addresses the storage side of this problem. It is an open-source object storage system, part of the OpenStack cloud platform, that can be deployed on private, public, and hybrid cloud infrastructure.
Let us learn more about OpenStack Swift in the following sections.
Swift is the OpenStack object storage project. It lets organizations store and retrieve huge amounts of data safely, cheaply, and efficiently in the cloud. It is designed for durability, availability, and concurrency across the entire data store. It can store any type of file, such as analytics data, images, videos, audio, web content, or any other unstructured data.
Swift stores objects in a distributed storage system called a Swift cluster. A cluster is a collection of nodes (machines) that run one or more Swift server processes. These processes are organized into four layers:
Proxy layer: All requests and responses pass through the proxy layer, because it is the only layer that communicates with external clients. At least two proxy servers are deployed for redundancy, so that if one fails the others can take over. The proxy layer uses a shared-nothing architecture and can be scaled in and out based on the workload.
Account layer: The account layer handles account metadata and the list of containers within each account. The account server stores this information in SQLite databases.
Container layer: The container layer handles container metadata and the list of objects within each container. This listing records only which objects belong to a container, not where the objects are physically stored. The container server stores this information in SQLite databases.
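The account and container layers answer listing requests. The sketch below is minimal and assumes you already have a storage URL and an auth token from an authentication service such as Keystone; both are placeholders here, and the function names are hypothetical. A GET on an account URL lists that account's containers. A GET on a container URL lists that container's objects. The `format=json` query parameter asks for a JSON listing instead of plain text.

```shell
#!/usr/bin/env bash
# Sketch: list containers and objects through the proxy's REST API.
# STORAGE_URL and TOKEN are placeholders obtained from your auth system.

# GET on the account URL returns the container listing (account layer).
list_containers() {  # usage: list_containers STORAGE_URL TOKEN
  curl -fsS -H "X-Auth-Token: $2" "$1?format=json"
}

# GET on a container URL returns the object listing (container layer).
list_objects() {     # usage: list_objects STORAGE_URL TOKEN CONTAINER
  curl -fsS -H "X-Auth-Token: $2" "$1/$3?format=json"
}
```

For example, `list_objects "$STORAGE_URL" "$TOKEN" photos` prints the object listing of a hypothetical `photos` container.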
Object layer: This layer is responsible for the actual storage of the objects. Objects are stored as binary files on the drive, under a path that includes the object's partition and a timestamp. The timestamp is important because it lets the object server distinguish between versions of the same object: the last write wins, so the most recent version is served. The object's metadata is stored in the file's extended attributes (xattrs). The data and metadata are copied as a single unit and stored together.
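The next sketch shows where an object's data file ends up on disk. It assumes the layout `<devices>/<device>/objects/<partition>/<suffix>/<hash>/<timestamp>.data`, where `<suffix>` is the last three hex characters of the object's hash. It also assumes the common `/srv/node` mount point; storage policies other than the default use directories such as `objects-1`. The device name, hash, and timestamp below are made up, and `object_data_path` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: build the on-disk path of an object's .data file.
# Assumes the /srv/node mount point and the default "objects" directory.

object_data_path() {  # usage: object_data_path DEVICE PARTITION HASH TIMESTAMP
  local device=$1 partition=$2 hash=$3 timestamp=$4
  local suffix=${hash: -3}   # last 3 characters of the 32-character hash
  echo "/srv/node/$device/objects/$partition/$suffix/$hash/$timestamp.data"
}

# Made-up example values.
object_data_path sdb1 1017 fe4f6eb0b8e2f8a0d6e0c3d5f3b9a1c4 1700000000.00000
# prints /srv/node/sdb1/objects/1017/1c4/fe4f6eb0b8e2f8a0d6e0c3d5f3b9a1c4/1700000000.00000.data
```

In the example, partition 1017 matches the top 10 bits of the made-up hash, which is how a ring with partition power 10 would derive it. On a real node, the object server keeps the metadata in extended attributes named `user.swift.metadata`, which can be inspected with tools such as `getfattr`.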
OpenStack Swift offers a platform for managing containers across various operating environments. It can be scaled in and out as needed and can reduce processing time. It stores data redundantly, so lost data can be recovered from copies kept in other locations, including different geographic regions. The following section explains how OpenStack Swift works.
Swift stores data as binary objects on the underlying file system, and each object's metadata is stored with its data as extended attributes. An OpenStack Swift cluster consists of proxy servers and storage nodes. The proxy server implements a REST API that passes read and write requests between clients and the storage nodes.
Clients store and retrieve data with HTTP requests such as PUT and GET. The proxy server uses the cluster's placement information (the ring) to locate the data. Swift protects data across the nodes with replication or erasure coding. Copies of every object are stored in locations that are as unique as possible, distinguished by region, zone, server, and drive. If an object is lost because of a hardware or server failure, its content is replicated from another location to a new location in the cluster.
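The PUT and GET requests described above can be sketched with curl against the proxy's REST API. This is a minimal illustration. `STORAGE_URL`, `TOKEN`, and the function names are hypothetical placeholders; in a real deployment, the token comes from the cluster's authentication service.

```shell
#!/usr/bin/env bash
# Sketch: store and retrieve one object through the Swift proxy.
# Object names containing special characters would need URL encoding.

upload_object() {    # usage: upload_object STORAGE_URL TOKEN CONTAINER LOCAL_FILE [OBJECT_NAME]
  local url=$1 token=$2 container=$3 file=$4 name=${5:-$(basename "$4")}
  # PUT creates or overwrites the object; -T sends the file as the request body.
  curl -fsS -X PUT -H "X-Auth-Token: $token" -T "$file" "$url/$container/$name"
}

download_object() {  # usage: download_object STORAGE_URL TOKEN CONTAINER OBJECT OUTPUT_FILE
  local url=$1 token=$2 container=$3 name=$4 out=$5
  # GET returns the object's data; -o writes it to a local file.
  curl -fsS -H "X-Auth-Token: $token" -o "$out" "$url/$container/$name"
}
```

Because `-f` makes curl exit with a non-zero status on HTTP errors, a failed upload or download can be detected in scripts.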
To map stored data to partitions on the servers, Swift uses a data structure called a ring. Each service (account, container, and object) has its own ring. The rings are constructed and managed by the Ring Builder, which assigns partitions to storage devices. The resulting ring files must be present on every node in the cluster.
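The next sketch shows how an object name becomes a partition number. It is modelled on how Swift's ring computes a partition. First, take the MD5 hash of `prefix + "/account/container/object" + suffix`. Then read the first 4 bytes as an unsigned big-endian number and keep only the top `part_power` bits. The prefix and suffix are the sample `[swift-hash]` values from the installation steps. The partition power of 10 and the account, container, and object names are assumptions. The sketch assumes GNU `md5sum` and handles only full object paths.

```shell
#!/usr/bin/env bash
# Sketch: map /account/container/object to a ring partition.

HASH_PATH_PREFIX="bd08f643f5663c4ec607"   # sample values from swift.conf
HASH_PATH_SUFFIX="f423bf7ab663888fe832"

# MD5 of prefix + "/account/container/object" + suffix, as 32 hex characters.
swift_hash() {  # usage: swift_hash ACCOUNT CONTAINER OBJECT
  printf '%s' "${HASH_PATH_PREFIX}/$1/$2/$3${HASH_PATH_SUFFIX}" | md5sum | cut -c1-32
}

# Keep the top PART_POWER bits of the first 4 bytes of the hash.
swift_partition() {  # usage: swift_partition PART_POWER ACCOUNT CONTAINER OBJECT
  local part_power=$1 top32
  shift
  top32=$(swift_hash "$@" | cut -c1-8)
  echo $(( 0x$top32 >> (32 - part_power) ))
}

swift_partition 10 AUTH_test photos cat.jpg   # a partition between 0 and 1023
```

Because the hash does not depend on which devices exist, adding drives changes only which devices own each partition, not which partition an object maps to.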
OpenStack Swift uses eventual consistency. After a write, replicas on different nodes may briefly hold different versions of an object, but background replication brings them back into agreement over time. This trade-off provides high availability and high scalability: the proxy servers can serve the most recent data available even when some storage nodes in the cluster are unreachable.
The diagram below shows all the services offered by SwiftStack and how they interact with one another.
After installing OpenStack, follow these steps to install OpenStack Swift. The commands target the Juno release on Ubuntu 14.04 (trusty) and are run as root:
# apt-get install ubuntu-cloud-keyring
# echo "deb http://ubuntu-cloud.archive.canonical.com/ubuntu trusty-updates/juno main" > /etc/apt/sources.list.d/cloudarchive-juno.list
# apt-get update && apt-get dist-upgrade
# apt-get install swift rsync memcached python-netifaces python-xattr python-memcache
# mkdir -p /etc/swift/
# chown -R swift:swift /etc/swift
# curl -o /etc/swift/swift.conf https://raw.githubusercontent.com/openstack/swift/stable/juno/etc/swift.conf-sample
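Next, set unique values in the [swift-hash] section of /etc/swift/swift.conf. These are random strings that must never change once the cluster is in use:
[swift-hash]
# random unique strings that can never change
swift_hash_path_prefix = bd08f643f5663c4ec607
swift_hash_path_suffix = f423bf7ab663888fe832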
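The prefix and suffix must be unique to each cluster, so it is safer to generate them randomly than to copy the examples. The sketch below assumes GNU sed. It also assumes swift.conf already contains uncommented `swift_hash_path_prefix` and `swift_hash_path_suffix` lines for it to replace. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write random values into the [swift-hash] settings of swift.conf.
# Run this once, before the cluster stores any data.

random_hex() {
  # 10 random bytes -> 20 lowercase hex characters
  head -c 10 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

set_swift_hash() {  # usage: set_swift_hash [PATH_TO_SWIFT_CONF]
  local conf=${1:-/etc/swift/swift.conf}
  # Replace only uncommented setting lines; each $(random_hex) call gives a new value.
  sed -i \
    -e "s/^swift_hash_path_prefix *=.*/swift_hash_path_prefix = $(random_hex)/" \
    -e "s/^swift_hash_path_suffix *=.*/swift_hash_path_suffix = $(random_hex)/" \
    "$conf"
}
```

Every node in the cluster needs the same swift.conf, so generate the values once and copy the resulting file to the other nodes rather than running the script on each node.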
Data keeps growing, and so does the capacity needed to store it. Data and storage must be managed carefully to make sure that no data is lost. The following sections explain how Swift manages data and capacity.
Whenever data comes in, Swift tries to store each copy in a location that is as unique as possible. Swift is a software-controlled distributed system that keeps multiple copies of each object on different nodes. A group of nodes that is isolated from other nodes is called a zone. The Swift ring maps objects to the nodes that store them and places each replica in a separate zone where possible. A SwiftStack cluster uses a hashing ring to distribute objects randomly across all the drives in the cluster, in proportion to each drive's weight, which usually reflects its capacity.
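As a worked example of weight-proportional placement, take a ring with partition power 10. It has 2^10 = 1024 partitions, so with 3 replicas there are 3072 partition replicas to place. The sketch below computes each drive's target share from its weight. The partition power, replica count, and drive weights are assumptions chosen for illustration. The real ring builder also spreads replicas across zones and assigns rounding remainders, so actual counts can differ from these targets.

```shell
#!/usr/bin/env bash
# Sketch: target number of partition replicas per drive, proportional to weight.

target_parts() {  # usage: target_parts PART_POWER REPLICAS WEIGHT TOTAL_WEIGHT
  local total=$(( (1 << $1) * $2 ))   # partitions x replicas
  echo $(( total * $3 / $4 ))         # integer share (remainders ignored)
}

weights="4000 4000 4000 4000 8000 8000"   # e.g. four 4 TB and two 8 TB drives
sum=0
for w in $weights; do sum=$(( sum + w )); done
for w in $weights; do
  echo "weight $w -> $(target_parts 10 3 "$w" "$sum") partition replicas"
done
```

Each 4000-weight drive gets a target of 384 partition replicas, and each 8000-weight drive gets 768. A drive with twice the weight is therefore expected to hold roughly twice as much data.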
When selecting hardware for a SwiftStack cluster, consider the configuration needed to balance I/O performance, cost, and capacity for the workload. The following sections help users select suitable hardware for a SwiftStack deployment.
A SwiftStack deployment has many nodes that provide proxy and storage services. These nodes are connected to each other and to the SwiftStack Controller. Hardware requirements fall into three categories: networking, nodes, and the SwiftStack Controller itself, which applies only when the organization runs the controller on-premises.
The SwiftStack deployment is embedded in the existing network topology, so the organization should discuss with its network engineering team which hardware and configuration changes are needed (switches, routers, firewalls, and so on).
A SwiftStack deployment uses several network segments:
- Outward-facing network: for API access and for running authentication and proxy services.
- Cluster-facing network: for communication between storage nodes and proxy servers.
- Replication network: an optional internal network that can be created to carry replication traffic, such as the traffic generated when data is redistributed.
- Network route to a SwiftStack Controller: for an on-premises SwiftStack Controller deployment, routes IP traffic to the controller.
- Hardware management network: for managing hardware through interfaces such as iLO and IPMI.
The load balancing tier distributes requests that arrive through the Swift API evenly among the nodes in the proxy tier.
No Load Balancer: A single-node cluster does not need a load balancer. In that case, enter the outward-facing IP address in the Cluster API Address field and select No Load Balancer.
External Load Balancer: When a cluster has more than one node, a commercial external load balancer can be used. The proxy servers are configured with a health check URL, which the external load balancer requests at the path /healthcheck (for example, http://220.127.116.11/healthcheck) to check whether each proxy server is working.
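The check an external load balancer performs can be approximated from the command line. This sketch assumes the proxies have Swift's healthcheck middleware enabled, which answers `GET /healthcheck` with HTTP 200 and the body `OK`. The function names and proxy addresses are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: report which proxy servers pass the /healthcheck probe.

proxy_healthy() {  # usage: proxy_healthy HOST[:PORT]
  local body
  # -f treats HTTP errors (such as 503) as failures; --max-time bounds slow proxies.
  body=$(curl -fsS --max-time 2 "http://$1/healthcheck" 2>/dev/null) || return 1
  [ "$body" = "OK" ]
}

check_proxies() {  # usage: check_proxies HOST[:PORT] ...
  for proxy in "$@"; do
    if proxy_healthy "$proxy"; then echo "$proxy up"; else echo "$proxy down"; fi
  done
}
```

For example, `check_proxies 10.0.0.11 10.0.0.12` prints one up/down line per proxy. A load balancer applies the same rule continuously and stops routing requests to proxies that fail.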
Node Provisioning: Follow these steps to provision a node:
Step 1: Navigate to the Nodes tab on the Manage Cluster page
Step 2: Click Install a New Node to read the instructions on how to install new nodes in an empty cluster
Step 3: After installing a node, allocate it to a particular cluster: click Unprovisioned Nodes, select the zone and region, and then click Ingest Now
Step 4: A node that has been claimed through its node URL but not yet added to a cluster is shown as an Unprovisioned Node. To provision it, click Set up under Swift Nodes. To change a provisioned node, click Manage.
It is best to use at least two network interfaces: one for intra-cluster communication and one for communication with the load balancer or clients. If a node has three interfaces, a separate network can easily be used for replication.
To use three network interfaces, open Edit Network Interfaces and make sure the Outward-facing, Cluster-facing, and Data-replication interfaces are set correctly. Then click Reassign Interfaces.
Example Server Models
The following table lists sample server models commonly deployed by SwiftStack customers:
Proxy/Account/Container/Object (PACO) Nodes
- 30 to 60 drives: 256 GB
- more than 60 drives: 384 GB
Proxy/Account/Container (PAC) Nodes
Account/Container/Object (ACO) Nodes
Object (O) Nodes
- 30 to 60 drives: 256 GB
- more than 60 drives: 384 GB
The following are some of the common features of OpenStack Swift:
OpenStack Swift, also called OpenStack Object Storage, is open-source software for storing and managing large amounts of data cheaply over the long term. Data is stored as objects on the nodes of a cluster. Swift's replication protects against hardware failures and data loss. Swift was originally developed by Rackspace Hosting, Inc. as the technology behind its Cloud Files service. Common uses of OpenStack Swift include storage backup and archiving unstructured data such as static web content, video and audio files, virtual machine images, and documents.