Microsoft Azure Batch

Azure Batch Computing Monitoring

This source has been deprecated

observIQ is in the process of transitioning a subset of BindPlane's monitoring capabilities to the observIQ OpenTelemetry Collector. As a result, this Source is no longer publicly available in BindPlane. If you need access to this Source, please reach out to our support via chat or via [email protected].

Please refer to the Microsoft Azure Sources topic for additional information on how to configure the Least Privileged User (LPU), and for general Azure data collection setup details.

Least Privileged User

Steps:

  1. Using the Azure CLI, find the Subscription ID and Tenant ID in your account list.
  2. Create a custom RBAC role using the JSON provided below. Replace [Subscription ID] with your Subscription ID and save the file as azure.json.
  3. Create an Active Directory Service Principal and assign the custom RBAC role to it.

Creating custom roles using the Azure CLI:

https://docs.microsoft.com/en-us/azure/role-based-access-control/custom-roles

Assigning roles using the Azure portal:

https://docs.microsoft.com/en-us/azure/role-based-access-control/role-assignments-portal

{
  "Name": "LPU Batch",
  "Description": "LPU for Batch",
  "Actions": [
    "Microsoft.Batch/batchAccounts/*/read",
    "Microsoft.Insights/metrics/*/read",
    "Microsoft.Authorization/*/read"
  ],
  "AssignableScopes": [
    "/subscriptions/[Subscription ID]"
  ]
}
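
Step 2 can be scripted. The following is a minimal sketch and not part of BindPlane: it assumes the JSON above has been saved unmodified as a template, and the helper names `fill_subscription` and `cli_commands`, as well as the service principal name `bindplane-batch-lpu`, are hypothetical. The `az` commands are only assembled as argument lists, not executed; check them against the Azure CLI documentation before running them.

```python
import json
import uuid

# Placeholder used in the role JSON above.
PLACEHOLDER = "[Subscription ID]"


def fill_subscription(template_text: str, subscription_id: str) -> dict:
    """Return the role definition with the placeholder scope filled in."""
    # Subscription IDs are GUIDs; uuid.UUID raises ValueError if the
    # string cannot be parsed as one.
    uuid.UUID(subscription_id)
    role = json.loads(template_text)
    role["AssignableScopes"] = [
        scope.replace(PLACEHOLDER, subscription_id)
        for scope in role["AssignableScopes"]
    ]
    return role


def cli_commands(subscription_id: str, role_name: str,
                 sp_name: str = "bindplane-batch-lpu") -> list:
    """List the Azure CLI invocations for steps 1 to 3 (not executed here)."""
    scope = f"/subscriptions/{subscription_id}"
    return [
        # Step 1: shows Subscription ID and Tenant ID for each account.
        ["az", "account", "list", "--output", "table"],
        # Step 2: creates the custom role from the saved file.
        ["az", "role", "definition", "create",
         "--role-definition", "@azure.json"],
        # Step 3: creates a service principal and assigns the role to it.
        ["az", "ad", "sp", "create-for-rbac", "--name", sp_name,
         "--role", role_name, "--scopes", scope],
    ]
```

To produce azure.json, pass the template text and your Subscription ID to `fill_subscription` and write the result with `json.dumps(role, indent=2)`. The output of the service principal step is where the Client ID and Client Secret used in the Connection Parameters come from.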

Connection Parameters

| Name | Required? | Description |
| --- | --- | --- |
| Subscription ID | Required | GUID Subscription ID |
| Tenant ID | Required | GUID Tenant ID (also known as Directory ID) |
| Client ID | Required | GUID Client ID (also known as Application ID) |
| Client Secret | Required | The Secret (also known as Key) corresponding to the Client ID. |
| Maximum HTTP Retry Time (seconds) | | The maximum amount of time in seconds to retry each API request when the API is throttling. |
| HTTP Request Timeout (seconds) | | The maximum amount of time in seconds before a single HTTP request will fail. |
| Monitor Metric Collection Level | | Selects which monitor metrics should be collected. |
| Filter By Resource Group Type | | Selects whether to use a whitelist or blacklist when filtering by Resource Groups. |
| Filter By Resource Group Whitelist | | A comma-separated list of resource groups to explicitly allow. A '*' character is used to represent 'all', and a blank string is used for 'none'. |
| Filter By Resource Group Blacklist | | A comma-separated list of resource groups to filter out. A '*' character is used to represent 'all', and a blank string is used for 'none'. |
| Filter By Tags Group Type | | Selects whether to use a whitelist or blacklist when filtering by Tags. |
| Filter By Tags Group Whitelist | | A comma-separated list of tags to explicitly allow. Tags must follow the format <key:value>. Instead of a specific tag, or tag value, a '*' character is used to represent 'all'. A blank entry is treated as 'none'. |
| Filter By Tags Group Blacklist | | A comma-separated list of tags to filter out. Tags must follow the format <key:value>. Instead of a specific tag, or tag value, a '*' character is used to represent 'all'. A blank entry is treated as 'none'. |
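
The filter parameters above can be read as a small matching rule. The sketch below is one possible interpretation, not BindPlane's implementation: the function names are hypothetical, resource group names are compared case-insensitively as an assumption, and the angle brackets in <key:value> are treated as notation rather than literal characters.

```python
def parse_list(raw: str) -> list:
    """Split a comma-separated parameter value, dropping blank entries."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def group_allowed(name: str, mode: str, raw_list: str) -> bool:
    """Decide whether a resource group passes a whitelist or blacklist."""
    entries = [entry.lower() for entry in parse_list(raw_list)]
    # '*' stands for 'all'; an empty list stands for 'none'.
    matched = "*" in entries or name.lower() in entries
    return matched if mode == "whitelist" else not matched


def tag_matches(entry: str, key: str, value: str) -> bool:
    """Match one 'key:value' entry, where either side may be '*'."""
    want_key, _, want_value = entry.partition(":")
    return want_key in ("*", key) and want_value in ("*", value)


def tags_allowed(tags: dict, mode: str, raw_list: str) -> bool:
    """Decide whether a resource's tags pass a whitelist or blacklist."""
    entries = parse_list(raw_list)
    # A bare '*' matches every resource, including untagged ones.
    matched = "*" in entries or any(
        tag_matches(entry, key, value)
        for entry in entries
        for key, value in tags.items()
    )
    return matched if mode == "whitelist" else not matched
```

Under this reading, a blank whitelist allows nothing and a blank blacklist filters nothing, which matches the 'none' wording in the table.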

Metrics

Account

| Name | Description |
| --- | --- |
| Active Job And Job Schedule Quota | The active job and job schedule quota for this batch account |
| Auto Storage | The properties and status of any auto-storage account associated with the Batch account |
| Creating Node Count | Number of nodes being created |
| Dedicated Core Count | Total number of dedicated cores in the batch account |
| Dedicated Core Quota | The dedicated core quota for this batch account |
| Dedicated Node Count | Total number of dedicated nodes in the batch account |
| Endpoint | The account endpoint used to interact with the Batch service |
| ID | The ID of the batch account |
| Idle Node Count | Number of idle nodes |
| Last Key Sync | The time at which the auto-storage key was last synced |
| Leaving Pool Node Count | Number of nodes leaving the Pool |
| Location | The location of the batch account |
| Low-Priority Core Count | Total number of low-priority cores in the batch account |
| Low Priority Core Quota | The low-priority core quota for this batch account |
| Low-Priority Node Count | Total number of low-priority nodes in the batch account |
| Name | The name of the batch account |
| Offline Node Count | Number of offline nodes |
| Pool Allocation Mode | The allocation mode for creating pools in the batch account |
| Pool Create Events | Total number of pools that have been created |
| Pool Delete Complete Events | Total number of pool deletes that have completed |
| Pool Delete Start Events | Total number of pool deletes that have started |
| Pool Quota | The pool quota for this batch account |
| Pool Resize Complete Events | Total number of pool resizes that have completed |
| Pool Resize Start Events | Total number of pool resizes that have started |
| Preempted Node Count | Number of preempted nodes |
| Provisioning State | The provisioned state of the batch account |
| Re-imaging Node Count | Number of reimaging nodes |
| Rebooting Node Count | Number of rebooting nodes |
| Resource Group | The Resource Group of the Azure resource. |
| Running Node Count | Number of running nodes |
| Start Task Failed Node Count | Number of nodes where the Start Task has failed |
| Starting Node Count | Number of nodes starting |
| Storage Account ID | The storage account id of the auto-storage account associated with the batch account |
| Task Complete Events | Total number of tasks that have completed |
| Task Fail Events | Total number of tasks that have completed in a failed state |
| Task Start Events | Total number of tasks that have started |
| Type | The type of the batch account |
| Unusable Node Count | Number of unusable nodes |
| Waiting For Start Task Node Count | Number of nodes waiting for the Start Task to complete |

API Usage

| Name | Description |
| --- | --- |
| Average Pages | The average number of pages needed for a paged resource type. |
| Average Request Retries | The average number of retry requests per unique request made. |
| Average Retry Attempts | The average number of retry requests made per unique request that was retried. |
| Average Retry Wait (Milliseconds) | The average amount of time retried requests spent waiting. |
| Client ID | The client ID used to make API calls. |
| Failed Requests | The total number of requests that returned a failure response. |
| Maximum Pages | The largest number of pages needed for a paged resource type. |
| Maximum Retries | The highest number of retries made for a single request. |
| Maximum Retry Wait (Milliseconds) | The longest time a retried request spent waiting. |
| Minimum Pages | The smallest number of pages needed for a paged resource type. |
| Minimum Retry Wait (Milliseconds) | The shortest time a retried request spent waiting. |
| Other Status Responses | The total number of successful requests that responded with some other accepted status. |
| Request Timeouts | The total number of requests that timed out waiting for a response. |
| Requests Retried | The number of unique requests that were retried. |
| Retry Status Responses | The total number of successful requests that responded with the status TOO MANY REQUESTS (429). |
| Retry Timeouts | The total number of requests that needed to be retried, but the request retry time exceeded the maximum retry time. |
| Status OK Responses | The total number of successful requests that responded with the status OK (200). |
| Subscription ID | The subscription ID used to make API calls. |
| Successful Requests | The total number of requests that returned a successful response. |
| Tenant ID | The tenant ID used to make API calls. |
| Total Monitor Requests | The total number of requests made to get monitor metrics. |
| Total Paged Requests | The total number of resource types that required paging. |
| Total Requests | The total number of requests made during collection. |
| Total Retries | The total number of retry requests that were made. |
| Unique Monitor Requests | The number of unique requests made to get monitor metrics. |
| Unique Requests | The number of requests made with unique endpoints. |
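
The two retry averages are easy to confuse, so a worked example may help. This sketch assumes, based only on the descriptions above, that both averages are derived from Total Retries and differ in their denominator; the function name `retry_averages` is hypothetical and the source does not confirm the exact formulas.

```python
def retry_averages(total_retries: int, unique_requests: int,
                   requests_retried: int) -> tuple:
    """Return (Average Request Retries, Average Retry Attempts).

    Assumed definitions:
      Average Request Retries = Total Retries / Unique Requests
      Average Retry Attempts  = Total Retries / Requests Retried
    """
    # Guard against division by zero when nothing was requested or retried.
    per_request = total_retries / unique_requests if unique_requests else 0.0
    per_retried = total_retries / requests_retried if requests_retried else 0.0
    return per_request, per_retried


# 100 unique requests, 4 of which were retried, 12 retries in total.
per_request, per_retried = retry_averages(12, 100, 4)
print(per_request)  # 0.12
print(per_retried)  # 3.0
```

A low Average Request Retries combined with a high Average Retry Attempts would suggest that throttling is concentrated on a few endpoints rather than spread across the whole collection.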

Application

| Name | Description |
| --- | --- |
| Allow Updates | A value indicating whether packages within the application may be overwritten using the same version string |
| Default Version | The package to use if a client requests the application but does not specify a version |
| Display Name | The display name for the application |
| ID | Resource ID of the application |
| Parent ID | The id of the parent resource. |

Pool

| Name | Description |
| --- | --- |
| Allocation State | Whether the pool is resizing |
| Allocation State Transition Time | The time at which the pool entered its current allocation state |
| Application Licenses | The list of application licenses the Batch service will make available on each compute node in the pool |
| Creation Time | The creation time of the pool |
| Current Dedicated Nodes | The number of dedicated compute nodes currently in the pool |
| Current Low Priority Nodes | The number of low priority compute nodes currently in the pool |
| Entity Tag | The ETag of the resource, used for concurrency statements |
| ID | Resource ID of the pool |
| Inter Node Communication | Whether the pool permits direct communication between nodes. This imposes restrictions on which nodes can be assigned to the pool. Enabling this value can reduce the chance of the requested number of nodes being allocated in the pool. If not specified, this value defaults to 'Disabled' |
| Last Modified | The last time at which the pool-level data, such as the targetDedicatedNodes or autoScaleSettings, changed. It does not factor in node-level changes such as a compute node changing state |
| Maximum Tasks Per Node | The maximum number of tasks that can run concurrently on a single compute node in the pool |
| Name | Resource name of the pool |
| Provisioning State | The current state of the pool |
| Provisioning State Transition Time | The time at which the pool entered its current state |
| Task Scheduling Policy | How tasks are distributed across compute nodes in a pool |
| Type | Microsoft Azure resource type |
| VM Size | The size of virtual machines in the pool. All VMs in a pool are the same size |