flasharray: fall back to array capacity when pod has no quota #13050
genegr wants to merge 2 commits into apache:main from
Conversation
```java
FlashArrayList<Map<String, Object>> list = GET("/arrays?space=true",
        new TypeReference<FlashArrayList<Map<String, Object>>>() {
        });
if (list != null && list.getItems() != null && !list.getItems().isEmpty()) {
```

Suggested change:

```java
if (list != null && CollectionUtils.isNotEmpty(list.getItems())) {
```
@blueorangutan package

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Codecov Report: ❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## 4.22 #13050 +/- ##
============================================
- Coverage 17.68% 17.68% -0.01%
Complexity 15793 15793
============================================
Files 5922 5922
Lines 533096 533112 +16
Branches 65209 65214 +5
============================================
Hits 94275 94275
- Misses 428181 428197 +16
Partials 10640 10640
Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17563
Pull request overview
Fixes FlashArray primary storage stats reporting so pools registered against pods without an explicit quota report a non-zero total capacity (falling back to array physical capacity) instead of 0/0, restoring allocator eligibility.
Changes:
- Update `getManagedStorageStats()` to use the pod quota when present, otherwise fall back to array total capacity.
- Simplify `usedBytes` to use the pod footprint (defaulting to `0` if missing) and avoid NPE-prone arithmetic.
- Add a `getArrayTotalCapacity()` helper to query `/arrays?space=true` and extract `capacity`.
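The fallback described in the change list can be sketched as follows. This is a minimal, illustrative rendering of the logic, not the plugin's actual code; the method and parameter names (`stats`, `podQuota`, `podFootprint`, `arrayTotalCapacity`) are hypothetical stand-ins.

```java
// Hypothetical sketch of the quota-or-array-capacity fallback.
// Names are illustrative, not the plugin's actual API.
public final class ManagedStorageStatsSketch {

    /** Returns {capacityBytes, usedBytes}, or null when no capacity is obtainable. */
    public static long[] stats(Long podQuota, Long podFootprint, Long arrayTotalCapacity) {
        Long capacityBytes = podQuota;
        if (capacityBytes == null || capacityBytes == 0L) {
            // Pod has no explicit quota: fall back to the array's physical capacity.
            capacityBytes = arrayTotalCapacity;
        }
        if (capacityBytes == null) {
            return null; // no quota and array unreachable: nothing to report
        }
        // Footprint defaults to 0 for an empty pod instead of forcing a null return.
        long usedBytes = (podFootprint != null) ? podFootprint : 0L;
        return new long[] { capacityBytes, usedBytes };
    }
}
```

The key behavioural difference from the old code is the last branch: an empty pod now reports `used = 0` against a real capacity ceiling rather than returning `null`.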
```java
        }
    }
} catch (Exception e) {
    logger.warn("Could not retrieve array total capacity", e);
```
The warning log in `getArrayTotalCapacity()` doesn't include any identifying context (e.g., array URL / configured pod). When multiple FlashArray pools are configured, this makes it hard to trace which backend is failing. Consider including `url` (and possibly `pod`) in the message, and avoid logging a full stack trace at warn level if this is expected to be retried frequently.
Suggested change:

```java
logger.warn("Could not retrieve array total capacity for configured pod [{}]: {}", this.pod, e.getMessage());
logger.debug("Error retrieving array total capacity for configured pod [{}]", this.pod, e);
```
```java
if (capacityBytes == null || capacityBytes == 0) {
    // Pod has no explicit quota set; report the array total physical
    // capacity so the CloudStack allocator has a real ceiling to plan
    // against rather than bailing out with a zero-capacity pool.
    capacityBytes = getArrayTotalCapacity();
}
```
Calling `GET("/arrays?space=true")` inside `getManagedStorageStats()` can add a second REST request on every storage-stats refresh whenever a pod has no quota. Since `FlashArrayAdapterFactory` constructs a new adapter per call, this likely won't be amortized/cached and could create avoidable API load. Consider memoizing the array capacity (e.g., a static cache keyed by URL with a TTL) or persisting the discovered capacity into CloudStack pool details so subsequent stats calls don't need to re-query `/arrays` each time.
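The memoization suggested above could look roughly like this. A purely illustrative sketch: the class, method names, and the 5-minute TTL are all assumptions, and the cache deliberately tolerates duplicate fetches under races rather than locking.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical TTL-bounded cache keyed by array URL, shared across the
// short-lived adapter instances created per call. Not the plugin's code.
public final class ArrayCapacityCache {
    private static final long TTL_MILLIS = 5 * 60 * 1000L; // refresh at most every 5 minutes
    // url -> {capacityBytes, fetchedAtMillis}
    private static final Map<String, long[]> CACHE = new ConcurrentHashMap<>();

    public static long get(String url, Supplier<Long> fetchFromArray) {
        long now = System.currentTimeMillis();
        long[] entry = CACHE.get(url);
        if (entry == null || now - entry[1] > TTL_MILLIS) {
            // Check-then-act is not atomic: concurrent callers may both fetch,
            // which is acceptable here since the fetch is idempotent.
            Long fresh = fetchFromArray.get(); // one REST call per URL per TTL window
            entry = new long[] { fresh != null ? fresh : 0L, now };
            CACHE.put(url, entry);
        }
        return entry[0];
    }
}
```

The alternative mentioned in the comment, persisting the value into CloudStack pool details, would survive management-server restarts at the cost of an extra DB write.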
@genegr can you check/address the Copilot comments if relevant.
ee38a87 to 0e8387f
@genegr this one is also a simple bug fix. Could you rebase it over branch
Pushed a new revision addressing the three review comments:

Branch was also rebased onto current
FlashArrayAdapter.getManagedStorageStats() returns null whenever the
backing pod has no volumes (footprint == 0) and never reports anything
other than the pod quota otherwise. A freshly-registered pool that sits
on a pod without an explicit quota therefore shows
disksizetotal=0, disksizeused=0
and the ClusterScopeStoragePoolAllocator refuses to allocate any volume
against it (zero-capacity pool is skipped). The plugin is unusable
until a pod quota is set manually on the array - which is not documented
anywhere and not discoverable from the CloudStack side.
Fix: fall back to the array's total physical capacity (retrieved via
GET /arrays?space=true) when the pod has no quota, or when the quota
is zero. The used value falls back to the pod footprint, defaulting to
0 when absent. Only return null when no capacity value is obtainable at
all, which now only happens if the array itself is unreachable.
The math for usedBytes was also simplified: the previous form
pod.getQuotaLimit() - (pod.getQuotaLimit() - pod.getFootprint())
is just pod.getFootprint() with an extra NPE risk when getQuotaLimit()
is null.
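The `usedBytes` simplification in the commit message can be verified with a small worked example. `Pod` here is a hypothetical stand-in with the two getters, not the plugin's class; the point is only that the old expression reduces algebraically to the footprint while adding an unboxing NPE.

```java
// Illustration of the usedBytes simplification. "Pod" is a stand-in type.
public final class UsedBytesDemo {
    public record Pod(Long quotaLimit, Long footprint) {
        Long getQuotaLimit() { return quotaLimit; }
        Long getFootprint() { return footprint; }
    }

    static long oldUsedBytes(Pod pod) {
        // quota - (quota - footprint) == footprint, but auto-unboxing in the
        // arithmetic throws NullPointerException when getQuotaLimit() is null
        return pod.getQuotaLimit() - (pod.getQuotaLimit() - pod.getFootprint());
    }

    static long newUsedBytes(Pod pod) {
        // footprint alone, defaulting to 0 when absent: no quota dependency
        return pod.getFootprint() != null ? pod.getFootprint() : 0L;
    }
}
```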
Description
This PR fixes `FlashArrayAdapter.getManagedStorageStats()` so that a FlashArray primary pool registered against a pod without an explicit quota reports real capacity to the CloudStack allocator instead of `0 / 0`.

Problem
`getManagedStorageStats()` returns `null` whenever the pod footprint is `0`, and otherwise derives capacity purely from the pod quota. A freshly-registered pool on a pod without a quota therefore surfaces as:
`ClusterScopeStoragePoolAllocator` treats a zero-capacity pool as ineligible and skips it. The user's first attempt to deploy onto the pool fails with `Unable to find suitable primary storage`, and there is no documented way to fix it from the CloudStack side: the operator has to discover that pod quotas drive the reported capacity and set one on the array by hand.

The secondary bug is the `usedBytes` math: `pod.getQuotaLimit() - (pod.getQuotaLimit() - pod.getFootprint())` is just `pod.getFootprint()`, but with an extra NullPointerException surface when `getQuotaLimit()` returns `null`.

Fix
- Use `pod.getQuotaLimit()` when present and non-zero; otherwise fall back to the array's total physical capacity via `GET /arrays?space=true` (a new `getArrayTotalCapacity()` helper).
- Return `null` only when neither value is obtainable (i.e. the array REST API is unreachable), not when the pod is simply empty.
- Report `pod.getFootprint()` as `usedBytes`, defaulting to `0` when the field is absent. Drops the NPE-prone math.

Expected behaviour:
`cmk list storagepools name=<pool>` shows a non-zero `disksizetotal` (the pod quota if set, or the array total otherwise); subsequent deploys route to the pool normally.

Actual behaviour (before this PR):
`disksizetotal=0`, the allocator skips the pool, deploys fail with `Unable to find suitable primary storage`, and the operator has to manually set a pod quota on the array.

Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
A freshly-registered pool is unusable for allocation until the operator discovers the undocumented pod-quota requirement and sets it by hand on the array. Not a BLOCKER (the pool still builds up; create-from-API with an explicit pool id still works), but any default-offering-based deploy hits it.
Screenshots (if appropriate):
N/A — backend adapter change, no UI surface.
How Has This Been Tested?
Tested against Purity 6.7 on a two-node KVM cluster with a FlashArray pool registered against a pod that has no quota set.
- `cmk create storagepool ... provider=FlashArray url=https://<user>:<pass>@<fa>:443/api?pod=<name>&api_skiptlsvalidation=true`.
- `cmk list storagepools name=<pool>`: verified `disksizetotal` == array total physical capacity (matches `GET /arrays?space=true` → `capacity` = 29.5 TB on the test array).
- `cmk deploy virtualmachine ...` using an offering tagged to the pool: succeeds.
- `cmk create volume` + `cmk attach volume`: succeeds; `disksizeused` increments by the provisioned size.
`mvn -pl plugins/storage/volume/flasharray --also-make -am -DskipTests -Dcheckstyle.skip=false --batch-mode package` passes with checkstyle enabled.

How did you try to break this feature and the system with this change?
- `getArrayTotalCapacity()` catches the exception, logs a warning, and returns `null`; the outer method then returns `null` and the allocator treats the pool as temporarily unavailable, the same behaviour as any other transient storage-pool-stats failure. No crash, no NPE.
- `pod.getQuotaLimit()` is null → falls through to the array-total path → pool still reports real capacity.
- `usedBytes` defaults to `0`. Verified with `SELECT disksizeused FROM storage_pool WHERE name=...` before any volume has been provisioned.
- `cap instanceof Number` guards the cast; falls through to `null` without throwing.
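The `instanceof Number` guard from the last point can be sketched like this. The method name and the flat map shape are illustrative assumptions; the actual response parsing in the adapter may differ.

```java
import java.util.Map;

// Hypothetical sketch of the guarded numeric extraction: JSON numbers may
// deserialize as Integer, Long, or Double depending on magnitude.
public final class CapacityGuard {
    public static Long extractCapacity(Map<String, Object> item) {
        Object cap = item.get("capacity");
        if (cap instanceof Number n) { // Integer, Long, Double all pass
            return n.longValue();
        }
        return null; // absent or non-numeric: no ClassCastException, caller falls back
    }
}
```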