Google Cloud Dataproc

🚧
Not Included in the BindPlane with Google Cloud Monitoring offering
None of the Google Cloud Platform sources listed in this documentation are included with the BindPlane with Google Cloud Monitoring offering.

For more information on how to use the LPU below, and on other Google Cloud data collection setup, see Google Cloud Platform Sources.

Least Privileged User

A user role with at least the following permissions is required:

📘
Deploying a Least Privileged User
To learn how to deploy a role with these permissions to a GCP Organization or a GCP Project, refer to this documentation:
Deploy an Individual LPU role to a GCP Project, or GCP Organization

- cloudnotifications.activities.list
- monitoring.alertPolicies.get
- monitoring.alertPolicies.list
- monitoring.dashboards.get
- monitoring.dashboards.list
- monitoring.groups.get
- monitoring.groups.list
- monitoring.metricDescriptors.get
- monitoring.metricDescriptors.list
- monitoring.monitoredResourceDescriptors.get
- monitoring.monitoredResourceDescriptors.list
- monitoring.notificationChannelDescriptors.get
- monitoring.notificationChannelDescriptors.list
- monitoring.notificationChannels.get
- monitoring.notificationChannels.list
- monitoring.publicWidgets.get
- monitoring.publicWidgets.list
- monitoring.timeSeries.list
- monitoring.uptimeCheckConfigs.get
- monitoring.uptimeCheckConfigs.list
- resourcemanager.projects.get
- resourcemanager.projects.list
- stackdriver.projects.get
- compute.machineTypes.get
- compute.regions.get
- compute.regions.list
- compute.zones.get
- dataproc.clusters.get
- dataproc.clusters.list
- dataproc.jobs.get
- dataproc.jobs.list
- dataproc.operations.get
- dataproc.operations.list
- dataproc.workflowTemplates.get
- dataproc.workflowTemplates.list
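
The permission list above can be collected into a custom role definition. The following is a minimal sketch, not the deployment procedure referenced in the callout. It assumes the role will be created from a definition file (for example with `gcloud iam roles create --file`, which accepts JSON or YAML); the role title, description, and the `build_role_definition` helper are hypothetical names chosen for illustration. Verify that each permission is supported in custom roles for your project or organization before deploying.

```python
import json

# Permissions copied from the Least Privileged User list above.
LPU_PERMISSIONS = """
cloudnotifications.activities.list
monitoring.alertPolicies.get
monitoring.alertPolicies.list
monitoring.dashboards.get
monitoring.dashboards.list
monitoring.groups.get
monitoring.groups.list
monitoring.metricDescriptors.get
monitoring.metricDescriptors.list
monitoring.monitoredResourceDescriptors.get
monitoring.monitoredResourceDescriptors.list
monitoring.notificationChannelDescriptors.get
monitoring.notificationChannelDescriptors.list
monitoring.notificationChannels.get
monitoring.notificationChannels.list
monitoring.publicWidgets.get
monitoring.publicWidgets.list
monitoring.timeSeries.list
monitoring.uptimeCheckConfigs.get
monitoring.uptimeCheckConfigs.list
resourcemanager.projects.get
resourcemanager.projects.list
stackdriver.projects.get
compute.machineTypes.get
compute.regions.get
compute.regions.list
compute.zones.get
dataproc.clusters.get
dataproc.clusters.list
dataproc.jobs.get
dataproc.jobs.list
dataproc.operations.get
dataproc.operations.list
dataproc.workflowTemplates.get
dataproc.workflowTemplates.list
""".split()


def build_role_definition(title, permissions):
    """Return a role definition dict with de-duplicated, sorted permissions.

    The field names (title, description, stage, includedPermissions) follow
    the IAM custom role file layout; the values here are placeholders.
    """
    return {
        "title": title,
        "description": "Read-only access for Dataproc monitoring (example).",
        "stage": "GA",
        "includedPermissions": sorted(set(permissions)),
    }


# Hypothetical role title, used only for this example.
role = build_role_definition("BindPlane Dataproc LPU", LPU_PERMISSIONS)
print(json.dumps(role, indent=2))
```

The printed JSON can be saved to a file and passed to whichever role deployment tooling you use.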

Connection Parameters

Name	Required?	Description
Private Key JSON	Required	The contents of the private key JSON file created when setting up a service account.
Metric Collection		Controls which metrics get requested from GCP's Stackdriver API.
Projects	Required	A comma-separated whitelist of project IDs. If the wildcard "*" is used, resources will be collected from all available projects.
Regions	Required	A comma-separated whitelist of regions. At least one region must be specified.
Connection Timeout		The number of seconds to allow for connecting to the target.
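
The rules in the table above can be checked before a source is configured. The following is a rough sketch under stated assumptions: the function and argument names are hypothetical and are not part of BindPlane, and the check for `client_email` and `private_key` assumes the usual layout of a service account key file.

```python
import json


def parse_list(value):
    """Split a comma-separated string into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def validate_connection(private_key_json, projects, regions, timeout_seconds=None):
    """Return a list of problems found; an empty list means the input looks usable."""
    errors = []

    # Private Key JSON: required, must parse as a JSON object.
    try:
        key = json.loads(private_key_json)
    except (TypeError, ValueError):
        key = None
        errors.append("Private Key JSON is missing or is not valid JSON")
    if key is not None:
        if not isinstance(key, dict):
            errors.append("Private Key JSON must be a JSON object")
        else:
            # Assumption: service account key files carry these fields.
            for field in ("client_email", "private_key"):
                if field not in key:
                    errors.append(f"Private Key JSON has no '{field}' field")

    # Projects: required; "*" means all available projects.
    if not parse_list(projects):
        errors.append("Projects is required (use '*' for all available projects)")

    # Regions: required; at least one region must be specified.
    if not parse_list(regions):
        errors.append("Regions must list at least one region")

    # Connection Timeout: optional; when given, assumed to be a positive number of seconds.
    if timeout_seconds is not None and timeout_seconds <= 0:
        errors.append("Connection Timeout must be a positive number of seconds")

    return errors


# Example with placeholder values only.
example_key = json.dumps({"client_email": "sa@example.invalid", "private_key": "placeholder"})
print(validate_connection(example_key, "*", "us-central1, europe-west1", 30))
print(validate_connection("not json", "", "", 0))
```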

Metrics

Cloud Dataproc Cluster

Name	Description
Cluster Name	The name of the cluster.
Cluster Uuid	The unique ID of the cluster.
Configuration Bucket	A Cloud Storage staging bucket used for sharing generated SSH keys and config. If you do not specify a staging bucket, Cloud Dataproc will determine an appropriate Cloud Storage location (US, ASIA, or EU) for your cluster's staging bucket according to the Google Compute Engine zone where your cluster is deployed, and then it will create and manage this project-level, per-location bucket for you.
Failed Jobs	Indicates the number of jobs that have failed on a cluster.
Failed Operations	Indicates the number of operations that have failed on a cluster.
GCE Cluster Configuration Internal Ip Only	If true, all instances in the cluster will only have internal IP addresses. By default, clusters are not restricted to internal IP addresses, and will have ephemeral external IP addresses assigned to each instance. This internalIpOnly restriction can only be enabled for subnetwork enabled networks, and all off-cluster dependencies must be configured to be accessible without external IP addresses.
GCE Cluster Configuration Metadata	The Compute Engine metadata entries to add to all instances (see Project and instance metadata).
GCE Cluster Configuration Network Uri	The Compute Engine network to be used for machine communications. Cannot be specified with subnetworkUri. If neither networkUri nor subnetworkUri is specified, the "default" network of the project is used, if it exists. Cannot be a "Custom Subnet Network" (see Using Subnetworks for more information).
GCE Cluster Configuration Service Account	The service account of the instances. Defaults to the default Compute Engine service account.
GCE Cluster Configuration Service Account Scopes	The URIs of service account scopes to be included in Compute Engine instances.
GCE Cluster Configuration Subnetwork Uri	The Compute Engine subnetwork to be used for machine communications. Cannot be specified with networkUri.
GCE Cluster Configuration Tags	The Compute Engine tags to add to all instances (see Tagging instances).
GCE Cluster Configuration Zone Uri	The zone where the Compute Engine cluster will be located. On a create request, it is required in the "global" region. If omitted in a non-global Cloud Dataproc region, the service will pick a zone in the corresponding Compute Engine region. On a get request, zone will always be present.
HDFS Capacity (Gigabytes)	Indicates the capacity, in GB, of the HDFS system running on the cluster.
HDFS DataNodes	Indicates the number of HDFS DataNodes that are running inside a cluster.
HDFS Storage Utilization (%)	The ratio of HDFS storage currently used.
Labels	The labels to associate with this cluster. Label keys must contain 1 to 63 characters, and must conform to RFC 1035. Label values may be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035. No more than 32 labels can be associated with a cluster.
Master Configuration Accelerators Accelerator Count	The number of the accelerator cards of this type exposed to this instance.
Master Configuration Accelerators Accelerator Type Uri	Full URL, partial URI, or short name of the accelerator type resource to expose to this instance.
Master Configuration Disk Configuration Boot Disk Size (Gigabytes)	Size of the boot disk (default is 500GB).
Master Configuration Disk Configuration Boot Disk Type	Type of the boot disk (default is "pd-standard"). Valid values: "pd-ssd" (Persistent Disk Solid State Drive) or "pd-standard" (Persistent Disk Hard Disk Drive).
Master Configuration Disk Configuration Local SSD Count	Number of attached SSDs, from 0 to 4 (default is 0). If SSDs are not attached, the boot disk is used to store runtime logs and HDFS data. If one or more SSDs are attached, this runtime bulk data is spread across them, and the boot disk contains only basic config and installed binaries.
Master Configuration Image Uri	The Compute Engine image resource used for cluster instances. It can be specified or may be inferred from SoftwareConfig.image_version.
Master Configuration Instance Count	The number of VM instances in the instance group. For master instance groups, must be set to 1.
Master Configuration Instance Names	Output only. The list of instance names. Cloud Dataproc derives the names from clusterName, numInstances, and the instance group.
Master Configuration Is Preemptible	Specifies that this instance group contains preemptible instances.
Master Configuration Machine Type Uri	The Compute Engine machine type used for cluster instances.
Master Configuration Managed Group Configuration Instance Group Manager Name	Output only. The name of the Instance Group Manager for this group.
Master Configuration Managed Group Configuration Instance Template Name	Output only. The name of the Instance Template used for the Managed Instance Group.
Metrics Hdfs Metrics	The HDFS metrics.
Metrics Yarn Metrics	The YARN metrics.
Project Id	The identifier of the project the cluster belongs to.
Region	The region of the cluster these metrics belong to.
Running Jobs	Indicates the number of jobs that are running on a cluster.
Running Operations	Indicates the number of operations that are running on a cluster.
Secondary Worker Configuration Accelerators Accelerator Count	The number of the accelerator cards of this type exposed to this instance.
Secondary Worker Configuration Accelerators Accelerator Type Uri	Full URL, partial URI, or short name of the accelerator type resource to expose to this instance.
Secondary Worker Configuration Disk Configuration Boot Disk Size (Gigabytes)	Size of the boot disk (default is 500GB).
Secondary Worker Configuration Disk Configuration Boot Disk Type	Type of the boot disk (default is "pd-standard"). Valid values: "pd-ssd" (Persistent Disk Solid State Drive) or "pd-standard" (Persistent Disk Hard Disk Drive).
Secondary Worker Configuration Disk Configuration Local SSD Count	Number of attached SSDs, from 0 to 4 (default is 0). If SSDs are not attached, the boot disk is used to store runtime logs and HDFS data. If one or more SSDs are attached, this runtime bulk data is spread across them, and the boot disk contains only basic config and installed binaries.
Secondary Worker Configuration Image Uri	The Compute Engine image resource used for cluster instances. It can be specified or may be inferred from SoftwareConfig.image_version.
Secondary Worker Configuration Instance Count	The number of VM instances in the instance group. For master instance groups, must be set to 1.
Secondary Worker Configuration Instance Names	Output only. The list of instance names. Cloud Dataproc derives the names from clusterName, numInstances, and the instance group.
Secondary Worker Configuration Is Preemptible	Specifies that this instance group contains preemptible instances.
Secondary Worker Configuration Machine Type Uri	The Compute Engine machine type used for cluster instances.
Secondary Worker Configuration Managed Group Configuration Instance Group Manager Name	Output only. The name of the Instance Group Manager for this group.
Secondary Worker Configuration Managed Group Configuration Instance Template Name	Output only. The name of the Instance Template used for the Managed Instance Group.
Software Configuration Image Version	The version of software inside the cluster. It must be one of the supported Cloud Dataproc Versions, such as "1.2" (including a subminor version, such as "1.2.29"), or the "preview" version. If unspecified, it defaults to the latest version.
Software Configuration Properties	The properties to set on daemon config files.
Status Detail	Output only. Optional details of cluster's state.
Status State	Output only. The cluster's state.
Status State Start Time	Output only. Time when this state was entered.
Status Substate	Output only. Additional state information that includes status reported by the agent.
Submitted Jobs	Indicates the number of jobs that have been submitted to a cluster.
Submitted Operations	Indicates the number of operations that have been submitted to a cluster.
Unhealthy HDFS Blocks By Status	Indicates the number of unhealthy blocks inside the cluster.
Worker Configuration Accelerators Accelerator Count	The number of the accelerator cards of this type exposed to this instance.
Worker Configuration Accelerators Accelerator Type Uri	Full URL, partial URI, or short name of the accelerator type resource to expose to this instance.
Worker Configuration Disk Configuration Boot Disk Size (Gigabytes)	Size of the boot disk (default is 500GB).
Worker Configuration Disk Configuration Boot Disk Type	Type of the boot disk (default is "pd-standard"). Valid values: "pd-ssd" (Persistent Disk Solid State Drive) or "pd-standard" (Persistent Disk Hard Disk Drive).
Worker Configuration Disk Configuration Local SSD Count	Number of attached SSDs, from 0 to 4 (default is 0). If SSDs are not attached, the boot disk is used to store runtime logs and HDFS data. If one or more SSDs are attached, this runtime bulk data is spread across them, and the boot disk contains only basic config and installed binaries.
Worker Configuration Image Uri	The Compute Engine image resource used for cluster instances. It can be specified or may be inferred from SoftwareConfig.image_version.
Worker Configuration Instance Count	The number of VM instances in the instance group. For master instance groups, must be set to 1.
Worker Configuration Instance Names	Output only. The list of instance names. Cloud Dataproc derives the names from clusterName, numInstances, and the instance group.
Worker Configuration Is Preemptible	Specifies that this instance group contains preemptible instances.
Worker Configuration Machine Type Uri	The Compute Engine machine type used for cluster instances.
Worker Configuration Managed Group Configuration Instance Group Manager Name	Output only. The name of the Instance Group Manager for this group.
Worker Configuration Managed Group Configuration Instance Template Name	Output only. The name of the Instance Template used for the Managed Instance Group.
YARN Active Applications	Indicates the number of active YARN applications.
YARN Allocated Memory (%)	The ratio of YARN memory allocated.
YARN Containers	Indicates the number of YARN containers.
YARN Memory Size (Gigabytes)	Indicates the YARN memory size in GB.
YARN NodeManagers	Indicates the number of YARN NodeManagers running inside a cluster.
YARN Pending Memory Size (Gigabytes)	The current memory request that is pending to be fulfilled by the scheduler.
YARN Virtual CPU (Cores)	The amount of virtual CPU in YARN.

Job

Name	Description
Driver Control Files URI	Output only. If present, the location of miscellaneous control files which may be used as part of job setup and handling. If not present, control files may be placed in the same location as driver_output_uri.
Driver Output Resource URI	Output only. A URI pointing to the location of the stdout of the job's driver program.
Labels	The labels to associate with this job. Label keys must contain 1 to 63 characters, and must conform to RFC 1035. Label values may be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035. No more than 32 labels can be associated with a job.
Placement Cluster Name	The name of the cluster where the job will be submitted.
Placement Cluster UUID	Output only. A cluster UUID generated by the Cloud Dataproc service when the job is submitted.
Project ID	The project ID in which this resource was created.
Reference Job ID	The job ID, which must be unique within the project. The job ID is generated by the server upon job submission or provided by the user as a means to perform retries without creating duplicate jobs. The ID must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or hyphens (-). The maximum length is 100 characters.
Region	The region in which this resource is located.
Scheduling Maximum Failure Rate (per Hour)	Maximum number of times per hour a driver may be restarted as a result of the driver terminating with a non-zero code before the job is reported as failed. A job may be reported as thrashing if the driver exits with a non-zero code 4 times within a 10-minute window. Maximum value is 10.
Status Details	Output only. Optional job state details, such as an error description if the state is ERROR.
Status State	Output only. A state message specifying the overall job state.
Status State Start Time	Output only. The time when this state was entered.
Status Substate	Output only. Additional state information, which includes status reported by the agent.
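
The Reference Job ID constraints in the table above (letters, numbers, underscores, or hyphens, with a maximum length of 100 characters) can be expressed as a regular expression. This is an illustrative sketch only: the helper name is hypothetical, and the minimum length of one character is an assumption, since the description states only a maximum.

```python
import re

# Allowed characters and maximum length are taken from the Reference Job ID row.
# Assumption: an empty ID is not valid, so the minimum length is 1.
JOB_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,100}")


def is_valid_job_id(job_id):
    """Return True when job_id uses only allowed characters and is 1 to 100 characters long."""
    return JOB_ID_PATTERN.fullmatch(job_id) is not None


# Placeholder IDs for illustration.
for candidate in ("daily-etl_2024", "bad id", "x" * 101):
    print(repr(candidate[:20]), is_valid_job_id(candidate))
```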

YARN Application

Name	Description
Project ID	The project ID in which this resource was created.
Reference Job ID	The identifier of the parent job.
YARN Applications Name	The application name.
YARN Applications Progress (%)	The numerical progress of the application, from 1 to 100.
YARN Applications State	The application state.
YARN Applications Tracking URL	The HTTP URL of the ApplicationMaster, HistoryServer, or TimelineServer that provides application-specific information. The URL uses the internal hostname, and requires a proxy server for resolution and, possibly, access.
