Azure Batch is a cloud-based computing platform for running large-scale parallel workloads. With Azure Batch, you can overcome the compute-capacity limits of on-premises resources, as well as the costly infrastructure required to operate large workloads. This guide to Azure Batch provides an overview of its workflows, resources, compute pools, and more.
Batch processing refers to the execution of a series of jobs without manual intervention. Programs run against a set of inputs known as a batch, and the processing steps performed on that input are known as jobs.
Azure Batch is a managed service for running high-performance parallel computing jobs in the cloud. Compute-intensive work runs on a collection of virtual machines, and the Azure compute resources that execute the jobs are defined programmatically. Jobs can run on demand or be scheduled to run at a specific time.
A Batch account is a uniquely identified entity within the Batch service that all processing is associated with. A Batch account is generally linked to an Azure Storage account.
Batch accounts can be created in the Azure portal or programmatically through the Batch libraries.
Batch accounts in the same subscription can be placed in different regions to distribute workloads, and multiple workloads can run in a single Batch account.
The Azure Batch REST APIs are accessed via HTTP requests, so any service inside or outside Azure can call them.
The Batch account is the unit of authentication for the Batch service. The URL of a Batch account takes the form:
https://{account-name}.{region-id}.batch.azure.com
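With the .NET Batch library, this URL is combined with the account name and key to authenticate a client. A minimal sketch (the account URL, name, and key below are placeholders for your own values):
// Minimal sketch: open a BatchClient with shared key credentials.
using Microsoft.Azure.Batch;
using Microsoft.Azure.Batch.Auth;

BatchSharedKeyCredentials credentials = new BatchSharedKeyCredentials(
    "https://mybatchaccount.westus.batch.azure.com", // Batch account URL (placeholder)
    "mybatchaccount",                                // Batch account name (placeholder)
    "<batch-account-key>");                          // Batch account key (placeholder)

// BatchClient is the entry point for all pool, job, and task operations.
using (BatchClient batchClient = BatchClient.Open(credentials))
{
    // Pool, job, and task operations go here.
}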
Batch applications are what the tasks run to process the input files. An application can be a binary or a script along with its supporting dependencies, and it contains one or more application packages plus the configuration needed to run the tasks. Applications can be installed on compute nodes and can be updated or deleted.
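For example, once an application package has been uploaded to the Batch account, a pool can reference it so that the Batch service deploys it to every node that joins the pool. The package ID and version here are hypothetical:
// Hypothetical example: deploy a previously uploaded application package
// to every node that joins the pool ("TaskApplication" 1.0 is assumed).
pool.ApplicationPackageReferences = new List<ApplicationPackageReference>
{
    new ApplicationPackageReference
    {
        ApplicationId = "TaskApplication", // assumed package ID
        Version = "1.0"                    // assumed package version
    }
};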
A compute pool is a collection of compute nodes on which the application runs. You can create a pool manually, or the Batch service can create it automatically when you specify the work to be done. A Batch account can only access the pools it has created, and it can have more than one pool.
A pool is built on top of the Azure compute infrastructure; it supports scaling of resources and provides health monitoring.
Every compute node has a unique IP address associated with the pool. When a node leaves the pool, its IP address is released for future use.
To create a compute pool, we must specify the following attributes (these map directly to the CreatePool parameters in the sample below): a unique pool ID, the target number of dedicated compute nodes, the virtual machine size, and the operating system configuration of the nodes.
Task virtual machines are the compute nodes that run the tasks. They are virtual machines built on top of Azure compute resources; both Windows and Linux are supported.
A task runs on a node and is the unit of computation for a job. Tasks are queued and assigned to nodes for execution as nodes become available.
A job is a collection of tasks. It manages how computation is performed by its tasks on the nodes of a compute pool.
Here we will use the .NET Batch library and Visual Studio to create a sample Batch workload.
Step 1. Create containers in Azure Blob Storage.
Step 2. Upload task application files and input files to containers.
Step 3. Create a Batch pool.
3a. The pool StartTask downloads the task binary files (TaskApplication) to nodes as they join the pool.
Step 4. Create a Batch job.
Step 5. Add tasks to the job.
5a. The tasks are scheduled to execute on nodes.
5b. Each task downloads its input data from Azure Storage, then begins execution.
Step 6. Monitor tasks.
6a. As tasks are completed, they upload their output data to Azure Storage.
Step 7. Download task output from Storage.
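These steps map onto a driver method along the following lines. This is a condensed outline, not the verbatim sample: batchClient, blobClient, applicationFiles, inputFiles, outputContainerSasUrl, and outputDirectory come from the setup code shown below, and CreateJobAsync and MonitorTasksAsync are sketched later in this guide.
// Condensed outline of the seven steps above; identifiers are illustrative.
private static async Task MainAsync()
{
    const string poolId = "DotNetTutorialPool"; // illustrative IDs
    const string jobId = "DotNetTutorialJob";

    // Steps 1-2: create the Storage containers and upload application/input files.
    // Step 3: create the pool; its StartTask stages the task application binaries.
    await CreatePoolIfNotExistAsync(batchClient, poolId, applicationFiles);

    // Step 4: create the job that targets the pool.
    await CreateJobAsync(batchClient, jobId, poolId);

    // Step 5: add one task per input file.
    List<CloudTask> tasks = await AddTasksAsync(
        batchClient, jobId, inputFiles, outputContainerSasUrl);

    // Step 6: wait for all tasks to reach the Completed state.
    await MonitorTasksAsync(batchClient, jobId, TimeSpan.FromMinutes(30));

    // Step 7: download the task output from the output container.
    await DownloadBlobsFromContainerAsync(blobClient, outputContainerName, outputDirectory);
}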
In Program.cs, define the Batch and Storage account credentials:
// Batch account credentials
private const string BatchAccountName = "";
private const string BatchAccountKey = "";
private const string BatchAccountUrl = "";
// Storage account credentials
private const string StorageAccountName = "";
private const string StorageAccountKey = "";
Using the Azure Storage client library for .NET, connect to the storage account and create the containers used to upload the files:
// Construct the Storage account connection string
string storageConnectionString = String.Format(
"DefaultEndpointsProtocol=https;AccountName={0};AccountKey={1}",
StorageAccountName,
StorageAccountKey);
// Retrieve the storage account
CloudStorageAccount storageAccount =
CloudStorageAccount.Parse(storageConnectionString);
// Create the blob client, for use in obtaining references to
// blob storage containers
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
// Use the blob client to create the containers in Azure Storage if they don't
// yet exist
const string appContainerName = "application";
const string inputContainerName = "input";
const string outputContainerName = "output";
await CreateContainerIfNotExistAsync(blobClient, appContainerName);
await CreateContainerIfNotExistAsync(blobClient, inputContainerName);
await CreateContainerIfNotExistAsync(blobClient, outputContainerName);
private static async Task CreateContainerIfNotExistAsync(
CloudBlobClient blobClient,
string containerName)
{
CloudBlobContainer container =
blobClient.GetContainerReference(containerName);
if (await container.CreateIfNotExistsAsync())
{
Console.WriteLine("Container [{0}] created.", containerName);
}
else
{
Console.WriteLine("Container [{0}] exists, skipping creation.",
containerName);
}
}
// Paths to the executable and its dependencies that will be executed by the tasks
List<string> applicationFilePaths = new List<string>
{
// The DotNetTutorial project includes a project reference to TaskApplication,
// allowing us to determine the path of the task application binary dynamically
typeof(TaskApplication.Program).Assembly.Location,
"Microsoft.WindowsAzure.Storage.dll"
};
// The collection of data files that are to be processed by the tasks
List<string> inputFilePaths = new List<string>
{
@"..\..\taskdata1.txt",
@"..\..\taskdata2.txt",
@"..\..\taskdata3.txt"
};
// Upload the application and its dependencies to Azure Storage. This is the
// application that will process the data files, and will be executed by each
// of the tasks on the compute nodes.
List<ResourceFile> applicationFiles = await UploadFilesToContainerAsync(
blobClient,
appContainerName,
applicationFilePaths);
// Upload the data files. This is the data that will be processed by each of
// the tasks that are executed on the compute nodes within the pool.
List<ResourceFile> inputFiles = await UploadFilesToContainerAsync(
blobClient,
inputContainerName,
inputFilePaths);
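UploadFilesToContainerAsync is referenced above but its body is not shown. A minimal sketch follows: each file is uploaded as a block blob, a read-only SAS is generated so the compute nodes can download it, and the result is wrapped in a ResourceFile. It assumes a recent release of the classic Microsoft.WindowsAzure.Storage library and an older Batch SDK that exposes the ResourceFile constructor (newer releases use ResourceFile.FromUrl instead); the two-hour SAS lifetime is illustrative.
private static async Task<List<ResourceFile>> UploadFilesToContainerAsync(
    CloudBlobClient blobClient,
    string containerName,
    List<string> filePaths)
{
    List<ResourceFile> resourceFiles = new List<ResourceFile>();
    CloudBlobContainer container = blobClient.GetContainerReference(containerName);

    foreach (string filePath in filePaths)
    {
        // Upload the file as a block blob named after the local file.
        string blobName = Path.GetFileName(filePath);
        CloudBlockBlob blobData = container.GetBlockBlobReference(blobName);
        await blobData.UploadFromFileAsync(filePath);

        // Generate a read-only SAS so compute nodes can download the blob.
        SharedAccessBlobPolicy sasConstraints = new SharedAccessBlobPolicy
        {
            SharedAccessExpiryTime = DateTime.UtcNow.AddHours(2), // illustrative lifetime
            Permissions = SharedAccessBlobPermissions.Read
        };
        string sasToken = blobData.GetSharedAccessSignature(sasConstraints);
        string blobSasUri = String.Format("{0}{1}", blobData.Uri, sasToken);

        // Older Batch SDKs: ResourceFile(blobSource, filePath).
        resourceFiles.Add(new ResourceFile(blobSasUri, blobName));
    }
    return resourceFiles;
}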
private static async Task CreatePoolIfNotExistAsync(BatchClient batchClient, string poolId, IList<ResourceFile> resourceFiles)
{
CloudPool pool = null;
try
{
Console.WriteLine("Creating pool [{0}]...", poolId);
// Create the unbound pool. Until we call CloudPool.Commit() or CommitAsync(), no pool is actually created in the
// Batch service. This CloudPool instance is therefore considered "unbound," and we can modify its properties.
pool = batchClient.PoolOperations.CreatePool(
poolId: poolId,
targetDedicatedComputeNodes: 3, // 3 compute nodes
virtualMachineSize: "small", // single-vCPU, 1.75 GB memory, 225 GB disk
cloudServiceConfiguration: new CloudServiceConfiguration(osFamily: "4")); // Windows Server 2012 R2
// Create and assign the StartTask that will be executed when compute nodes join the pool.
// In this case, we copy the StartTask's resource files (that will be automatically downloaded
// to the node by the StartTask) into the shared directory that all tasks will have access to.
pool.StartTask = new StartTask
{
// Specify a command line for the StartTask that copies the task application files to the
// node's shared directory. Every compute node in a Batch pool is configured with a number
// of pre-defined environment variables that can be referenced by commands or applications
// run by tasks.
// Since a successful execution of robocopy can return a non-zero exit code (e.g. 1 when one or
// more files were successfully copied) we need to manually exit with a 0 for Batch to recognize
// StartTask execution success.
CommandLine = "cmd /c (robocopy %AZ_BATCH_TASK_WORKING_DIR% %AZ_BATCH_NODE_SHARED_DIR%) ^& IF %ERRORLEVEL% LEQ 1 exit 0",
ResourceFiles = resourceFiles,
WaitForSuccess = true
};
await pool.CommitAsync();
}
catch (BatchException be)
{
// Swallow the specific error code PoolExists since that is expected if the pool already exists
if (be.RequestInformation?.BatchError != null && be.RequestInformation.BatchError.Code == BatchErrorCodeStrings.PoolExists)
{
Console.WriteLine("The pool {0} already existed when we tried to create it", poolId);
}
else
{
throw; // Any other exception is unexpected
}
}
}
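Step 4 creates the job that the tasks will be added to. The sample's job-creation logic is not shown above; a minimal sketch, binding the job to the pool created earlier:
private static async Task CreateJobAsync(BatchClient batchClient, string jobId, string poolId)
{
    Console.WriteLine("Creating job [{0}]...", jobId);

    // An unbound CloudJob; nothing exists in the Batch service until CommitAsync().
    CloudJob job = batchClient.JobOperations.CreateJob();
    job.Id = jobId;

    // Bind the job to the pool whose nodes will execute its tasks.
    job.PoolInformation = new PoolInformation { PoolId = poolId };

    await job.CommitAsync();
}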
private static async Task<List<CloudTask>> AddTasksAsync(
BatchClient batchClient,
string jobId,
List<ResourceFile> inputFiles,
string outputContainerSasUrl)
{
Console.WriteLine("Adding {0} tasks to job [{1}]...", inputFiles.Count, jobId);
// Create a collection to hold the tasks that we'll be adding to the job
List<CloudTask> tasks = new List<CloudTask>();
// Create each of the tasks. Because we copied the task application to the
// node's shared directory with the pool's StartTask, we can access it via
// the shared directory on the node that the task runs on.
foreach (ResourceFile inputFile in inputFiles)
{
string taskId = "topNtask" + inputFiles.IndexOf(inputFile);
string taskCommandLine = String.Format(
"cmd /c %AZ_BATCH_NODE_SHARED_DIR%\\TaskApplication.exe {0} 3 \"{1}\"",
inputFile.FilePath,
outputContainerSasUrl);
CloudTask task = new CloudTask(taskId, taskCommandLine);
task.ResourceFiles = new List<ResourceFile> { inputFile };
tasks.Add(task);
}
// Add the tasks as a collection, as opposed to issuing a separate AddTask call
// for each. Bulk task submission helps to ensure efficient underlying API calls
// to the Batch service.
await batchClient.JobOperations.AddTaskAsync(jobId, tasks);
return tasks;
}
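Step 6 waits for the tasks to finish; that monitoring code is likewise not shown above. A minimal sketch using the Batch library's TaskStateMonitor, with the timeout left to the caller:
private static async Task MonitorTasksAsync(BatchClient batchClient, string jobId, TimeSpan timeout)
{
    Console.WriteLine("Monitoring all tasks for 'Completed' state, timeout in {0}...", timeout);

    // Obtain the collection of tasks currently managed by the job.
    IEnumerable<CloudTask> tasks = batchClient.JobOperations.ListTasks(jobId);

    // Block until all tasks reach the Completed state, or the timeout elapses.
    TaskStateMonitor monitor = batchClient.Utilities.CreateTaskStateMonitor();
    await monitor.WhenAll(tasks, TaskState.Completed, timeout);

    Console.WriteLine("All tasks reached state Completed.");
}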
private static async Task DownloadBlobsFromContainerAsync(
CloudBlobClient blobClient,
string containerName,
string directoryPath)
{
Console.WriteLine("Downloading all files from container [{0}]...", containerName);
// Retrieve a reference to a previously created container
CloudBlobContainer container = blobClient.GetContainerReference(containerName);
// Get a flat listing of all the block blobs in the specified container
foreach (IListBlobItem item in container.ListBlobs(
prefix: null,
useFlatBlobListing: true))
{
// Retrieve reference to the current blob
CloudBlob blob = (CloudBlob)item;
// Save blob contents to a file in the specified folder
string localOutputFile = Path.Combine(directoryPath, blob.Name);
await blob.DownloadToFileAsync(localOutputFile, FileMode.Create);
}
Console.WriteLine("All files downloaded to {0}", directoryPath);
}
Finally, you can clean up by deleting the job and the compute pool, along with the Storage containers, once the output has been downloaded.
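A minimal cleanup sketch, using the same client objects and IDs as above (the full sample prompts for confirmation first, which is omitted here):
// Delete the job, then the pool.
await batchClient.JobOperations.DeleteJobAsync(jobId);
await batchClient.PoolOperations.DeletePoolAsync(poolId);

// Optionally remove the Storage containers created at the start.
await blobClient.GetContainerReference(appContainerName).DeleteIfExistsAsync();
await blobClient.GetContainerReference(inputContainerName).DeleteIfExistsAsync();
await blobClient.GetContainerReference(outputContainerName).DeleteIfExistsAsync();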
Operating Azure Batch does require programming and platform expertise.