flasharray: fall back to array capacity when pod has no quota #13050
genegr wants to merge 2 commits into apache:main from
Conversation
```java
FlashArrayList<Map<String, Object>> list = GET("/arrays?space=true",
        new TypeReference<FlashArrayList<Map<String, Object>>>() {
        });
if (list != null && list.getItems() != null && !list.getItems().isEmpty()) {
```

Suggested change:

```java
if (list != null && CollectionUtils.isNotEmpty(list.getItems())) {
```
@blueorangutan package

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Codecov Report: ❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## 4.22 #13050 +/- ##
============================================
- Coverage 17.68% 17.68% -0.01%
Complexity 15793 15793
============================================
Files 5922 5922
Lines 533096 533112 +16
Branches 65209 65214 +5
============================================
Hits 94275 94275
- Misses 428181 428197 +16
Partials 10640 10640
Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17563
Pull request overview
Fixes FlashArray primary storage stats reporting so pools registered against pods without an explicit quota report a non-zero total capacity (falling back to array physical capacity) instead of 0/0, restoring allocator eligibility.
Changes:
- Update `getManagedStorageStats()` to use the pod quota when present, otherwise fall back to array total capacity.
- Simplify `usedBytes` to use the pod footprint (defaulting to `0` if missing) and avoid NPE-prone arithmetic.
- Add a `getArrayTotalCapacity()` helper to query `/arrays?space=true` and extract `capacity`.
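The fallback described in the change list can be sketched as follows. This is a minimal, illustrative rendering of the logic, not the plugin's actual code; the method and parameter names (`stats`, `podQuota`, `podFootprint`, `arrayTotalCapacity`) are hypothetical stand-ins.

```java
// Hypothetical sketch of the quota-or-array-capacity fallback.
// Names are illustrative, not the plugin's actual API.
public final class ManagedStorageStatsSketch {

    /** Returns {capacityBytes, usedBytes}, or null when no capacity is obtainable. */
    public static long[] stats(Long podQuota, Long podFootprint, Long arrayTotalCapacity) {
        Long capacityBytes = podQuota;
        if (capacityBytes == null || capacityBytes == 0L) {
            // Pod has no explicit quota: fall back to the array's physical capacity.
            capacityBytes = arrayTotalCapacity;
        }
        if (capacityBytes == null) {
            return null; // no quota and array unreachable: nothing to report
        }
        // Footprint defaults to 0 for an empty pod instead of forcing a null return.
        long usedBytes = (podFootprint != null) ? podFootprint : 0L;
        return new long[] { capacityBytes, usedBytes };
    }
}
```

The key behavioural difference from the old code is the last branch: an empty pod now reports `used = 0` against a real capacity ceiling rather than returning `null`.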
```java
        }
    }
} catch (Exception e) {
    logger.warn("Could not retrieve array total capacity", e);
```
The warning log in `getArrayTotalCapacity()` doesn't include any identifying context (e.g., array URL / configured pod). When multiple FlashArray pools are configured, this makes it hard to trace which backend is failing. Consider including `url` (and possibly `pod`) in the message, and avoid logging a full stack trace at warn level if this is expected to be retried frequently.
Suggested change:

```java
logger.warn("Could not retrieve array total capacity for configured pod [{}]: {}", this.pod, e.getMessage());
logger.debug("Error retrieving array total capacity for configured pod [{}]", this.pod, e);
```
```java
if (capacityBytes == null || capacityBytes == 0) {
    // Pod has no explicit quota set; report the array total physical
    // capacity so the CloudStack allocator has a real ceiling to plan
    // against rather than bailing out with a zero-capacity pool.
    capacityBytes = getArrayTotalCapacity();
}
```
Calling `GET("/arrays?space=true")` inside `getManagedStorageStats()` can add a second REST request on every storage-stats refresh whenever a pod has no quota. Since `FlashArrayAdapterFactory` constructs a new adapter per call, this likely won't be amortized/cached and could create avoidable API load. Consider memoizing the array capacity (e.g., a static cache keyed by URL with a TTL) or persisting the discovered capacity into CloudStack pool details so subsequent stats calls don't need to re-query `/arrays` each time.
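The memoization suggested above could look roughly like this. A purely illustrative sketch: the class, method names, and the 5-minute TTL are all assumptions, and the cache deliberately tolerates duplicate fetches under races rather than locking.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical TTL-bounded cache keyed by array URL, shared across the
// short-lived adapter instances created per call. Not the plugin's code.
public final class ArrayCapacityCache {
    private static final long TTL_MILLIS = 5 * 60 * 1000L; // refresh at most every 5 minutes
    // url -> {capacityBytes, fetchedAtMillis}
    private static final Map<String, long[]> CACHE = new ConcurrentHashMap<>();

    public static long get(String url, Supplier<Long> fetchFromArray) {
        long now = System.currentTimeMillis();
        long[] entry = CACHE.get(url);
        if (entry == null || now - entry[1] > TTL_MILLIS) {
            // Check-then-act is not atomic: concurrent callers may both fetch,
            // which is acceptable here since the fetch is idempotent.
            Long fresh = fetchFromArray.get(); // one REST call per URL per TTL window
            entry = new long[] { fresh != null ? fresh : 0L, now };
            CACHE.put(url, entry);
        }
        return entry[0];
    }
}
```

The alternative mentioned in the comment, persisting the value into CloudStack pool details, would survive management-server restarts at the cost of an extra DB write.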
@genegr can you check/address the Copilot comments if relevant.
ee38a87 to 0e8387f
@genegr this one is also a simple bug fix. Could you rebase it over branch
Pushed a new revision addressing the three review comments:

Branch was also rebased onto current
FlashArrayAdapter.getManagedStorageStats() returns null whenever the
backing pod has no volumes (footprint == 0) and never reports anything
other than the pod quota otherwise. A freshly-registered pool that sits
on a pod without an explicit quota therefore shows
disksizetotal=0, disksizeused=0
and the ClusterScopeStoragePoolAllocator refuses to allocate any volume
against it (zero-capacity pool is skipped). The plugin is unusable
until a pod quota is set manually on the array - which is not documented
anywhere and not discoverable from the CloudStack side.
Fix: fall back to the array's total physical capacity (retrieved via
GET /arrays?space=true) when the pod has no quota, or when the quota
is zero. The used value falls back to the pod footprint, defaulting to
0 when absent. Only return null when no capacity value is obtainable at
all, which now only happens if the array itself is unreachable.
The math for usedBytes was also simplified: the previous form
pod.getQuotaLimit() - (pod.getQuotaLimit() - pod.getFootprint())
is just pod.getFootprint() with an extra NPE risk when getQuotaLimit()
is null.
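The `usedBytes` simplification in the commit message can be verified with a small worked example. `Pod` here is a hypothetical stand-in with the two getters, not the plugin's class; the point is only that the old expression reduces algebraically to the footprint while adding an unboxing NPE.

```java
// Illustration of the usedBytes simplification. "Pod" is a stand-in type.
public final class UsedBytesDemo {
    public record Pod(Long quotaLimit, Long footprint) {
        Long getQuotaLimit() { return quotaLimit; }
        Long getFootprint() { return footprint; }
    }

    static long oldUsedBytes(Pod pod) {
        // quota - (quota - footprint) == footprint, but auto-unboxing in the
        // arithmetic throws NullPointerException when getQuotaLimit() is null
        return pod.getQuotaLimit() - (pod.getQuotaLimit() - pod.getFootprint());
    }

    static long newUsedBytes(Pod pod) {
        // footprint alone, defaulting to 0 when absent: no quota dependency
        return pod.getFootprint() != null ? pod.getFootprint() : 0L;
    }
}
```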
Description
This PR fixes `FlashArrayAdapter.getManagedStorageStats()` so that a FlashArray primary pool registered against a pod without an explicit quota reports real capacity to the CloudStack allocator instead of `0 / 0`.

Problem
`getManagedStorageStats()` returns `null` whenever the pod footprint is `0`, and otherwise derives capacity purely from the pod quota. A freshly-registered pool on a pod without a quota therefore surfaces as:
`ClusterScopeStoragePoolAllocator` treats a zero-capacity pool as ineligible and skips it. The user's first attempt to deploy onto the pool fails with `Unable to find suitable primary storage`, and there is no documented way to fix it from the CloudStack side: the operator has to discover that pod quotas drive the reported capacity and set one on the array by hand.

The secondary bug is the `usedBytes` math: `pod.getQuotaLimit() - (pod.getQuotaLimit() - pod.getFootprint())` is just `pod.getFootprint()`, but with an extra NullPointerException surface when `getQuotaLimit()` returns `null`.

Fix
- Use `pod.getQuotaLimit()` when present and non-zero; otherwise fall back to the array's total physical capacity via `GET /arrays?space=true` (a new `getArrayTotalCapacity()` helper).
- Return `null` only when neither value is obtainable (i.e. the array REST API is unreachable), not when the pod is simply empty.
- Report `pod.getFootprint()` as `usedBytes`, defaulting to `0` when the field is absent. Drops the NPE-prone math.

Expected behaviour:
`cmk list storagepools name=<pool>` shows a non-zero `disksizetotal` (the pod quota if set, or the array total otherwise); subsequent deploys route to the pool normally.

Actual behaviour (before this PR):
`disksizetotal=0`, the allocator skips the pool, deploys fail with `Unable to find suitable primary storage`, and the operator has to manually set a pod quota on the array.

Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
A freshly-registered pool is unusable for allocation until the operator discovers the undocumented pod-quota requirement and sets it by hand on the array. Not a BLOCKER (the pool still builds up; create-from-API with an explicit pool id still works), but any default-offering-based deploy hits it.
Screenshots (if appropriate):
N/A — backend adapter change, no UI surface.
How Has This Been Tested?
Tested against Purity 6.7 on a two-node KVM cluster with a FlashArray pool registered against a pod that has no quota set.
- `cmk create storagepool ... provider=FlashArray url=https://<user>:<pass>@<fa>:443/api?pod=<name>&api_skiptlsvalidation=true`.
- `cmk list storagepools name=<pool>`: verified `disksizetotal` == array total physical capacity (matches `GET /arrays?space=true` → `capacity` = 29.5 TB on the test array).
- `cmk deploy virtualmachine ...` using an offering tagged to the pool: succeeds.
- `cmk create volume` + `cmk attach volume`: succeeds; `disksizeused` increments by the provisioned size.
`mvn -pl plugins/storage/volume/flasharray --also-make -am -DskipTests -Dcheckstyle.skip=false --batch-mode package` passes with checkstyle enabled.

How did you try to break this feature and the system with this change?
- `getArrayTotalCapacity()` catches the exception, logs a warning, and returns `null`; the outer method then returns `null` and the allocator treats the pool as temporarily unavailable, the same behaviour as any other transient storage-pool-stats failure. No crash, no NPE.
- `pod.getQuotaLimit()` is null → falls through to the array-total path → pool still reports real capacity.
- `usedBytes` defaults to `0`. Verified with `SELECT disksizeused FROM storage_pool WHERE name=...` before any volume has been provisioned.
- `cap instanceof Number` guards the cast; falls through to `null` without throwing.
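The `instanceof Number` guard from the last point can be sketched like this. The method name and the flat map shape are illustrative assumptions; the actual response parsing in the adapter may differ.

```java
import java.util.Map;

// Hypothetical sketch of the guarded numeric extraction: JSON numbers may
// deserialize as Integer, Long, or Double depending on magnitude.
public final class CapacityGuard {
    public static Long extractCapacity(Map<String, Object> item) {
        Object cap = item.get("capacity");
        if (cap instanceof Number n) { // Integer, Long, Double all pass
            return n.longValue();
        }
        return null; // absent or non-numeric: no ClassCastException, caller falls back
    }
}
```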