Move from omp to kokkos loops in DirectSolver by AbdelhadiKara · Pull Request #229 · SciCompMod/GMGPolar

AbdelhadiKara · 2026-04-24T11:44:02Z

Merge Request - GuideLine Checklist

Guideline to check code before resolve WIP and approval, respectively.
As many checkboxes as possible should be ticked.

Checks by code author:

Always to be checked:

There is at least one issue associated with the pull request.
New code adheres with the coding guidelines
No large data files have been added to the repository. Maximum size for files should be of the order of KB not MB. In particular avoid adding of pdf, word, or other files that cannot be change-tracked correctly by git.

If functions were changed or functionality was added:

Tests for new functionality has been added
A local test was succesful

If new functionality was added:

There is appropriate documentation of your work. (use doxygen style comments)

If new third party software is used:

Did you pay attention to its license? Please remember to add it to the wiki after successful merging.

If new mathematical methods or epidemiological terms are used:

Are new methods referenced? Did you provide further documentation?

Checks by code reviewer(s):

Is the code clean of development artifacts e.g., unnecessary comments, prints, ...
The ticket goals for each associated issue are reached or problems are clearly addressed (i.e., a new issue was introduced).
There are appropriate unit tests and they pass.
The git history is clean and linearized for the merge request. All reviewers should squash commits and write a simple and meaningful commit message.
Coverage report for new code is acceptable.
No large data files have been added to the repository. Maximum size for files should be of the order of KB not MB. In particular avoid adding of pdf, word, or other files that cannot be change-tracked correctly by git.

EmilyBourne · 2026-04-29T12:18:58Z

 {
-    matrix.row_nz_index(row, offset) = col;
-    matrix.row_nz_entry(row, offset) += val;
+    Kokkos::atomic_store(&matrix.row_nz_index(row, offset), col);


Are you sure atomic_store is needed? The old value of matrix.row_nz_index(row, offset) is unused. Isn't this because there is not overlap between threads?

Not sure it is necessary, i mean the previous implementation does not lead to compilation issues.

Atomics are never used to fix compilation issues

I will remove it

@AbdelhadiKara @EmilyBourne

Quick follow up:
A-Give distribution can either be implemented with or without Atomics. Currently I use the strategy without.
Here I guarantee with the parallelization that nodes that are handled at the same time have a horizontal and vertical distance of >=2. Thus no race conditions can occur if they start modifying their adjacent 4 neighbor values.

However this approach while good for CPU execution could be slower on GPU. Here you may want to consider Atomic operations and all nodes can then be processed simultaneously.
Note that I never tested this approach.

I think it may be even best to keep the version without atomic since a += b; is not atomic (load, add, store) and enforcing it atomic can be very costly.

TLDR: What I would recommend: Keep current parallelism as it is and don't use atomic.

EmilyBourne · 2026-04-29T12:31:21Z

+void DirectSolver_CSR_LU_Give<LevelCacheType>::nodeBuildSolverMatrixGive(int i_r, int i_theta, const PolarGrid grid,
                                                                         const bool DirBC_Interior,
-                                                                         SparseMatrixCSR<double>& solver_matrix,
+                                                                         const SparseMatrixCSR<double>& solver_matrix,


Why is this extra const needed?

nodeBuildSolverMatrixGive is called from buildSolverMatrixCircleSection which is called from the kokkos loop. following the compiler complain , it appeared necessary to fix compilation issues.

I wonder if this explains why your tests are failing. If Kokkos insists on this being const then it does not expect it to be modified so it might create a copy per thread whose values are not copied back

As noted by https://kokkos.org/kokkos-core-wiki/API/core/macros-special/host_device_macros.html#kokkos-class-lambda KOKKOS_CLASS_LAMBDA captures by const value (copy) not by reference. This is why you can only call const methods and can't modify any class members.

If you want to modify a class member I think it would be safer/cleaner to use static methods and KOKKOS_LAMBDA instead of KOKKOS_CLASS_LAMBDA

The issue with converting this member function to a static inline one is that we will no longer have access to the getStencil() function, which utilizes private class member data.

But I think that this would be the correct way to do it.

Thus I would be in favor of making the Smother/matrixStencil.h functions all static inlined to make it possible to make the buildMatrixNode() static inlined.

For this we just need to move the Stencil = {1,2,3,...} definitions from private member to the static inlined helper function and perhaps add some more arguments where grid_ and DirBC_ here and there to not directly access member variables.

Or you can try to make it work with the current member function setup.

EmilyBourne · 2026-04-29T13:32:47Z

+                int circle_idx = stride * circle_task;
+                int i_r        = grid.numberSmootherCircles() - circle_task - 1;


Suggested change

int circle_idx = stride * circle_task;

int i_r = grid.numberSmootherCircles() - circle_task - 1;

int circle_idx = stride * circle_task;

int i_r = grid.numberSmootherCircles() - circle_idx - 1;

EmilyBourne · 2026-04-29T13:33:12Z

-    assign(values_, T(0));
-    assign(column_indices_, 0);
+    Kokkos::deep_copy(values_, T(0));
+    Kokkos::deep_copy(column_indices_, 0);


Why this change?

assign stands for std vector, here we have views. you suggest to revert this one ?

assign stands for std vector

Are you sure?

You don't seem to have changed the type of these objects in this PR so why would the copy need to change?

As I remember Julian preferred to keep the assign function to avoid significantly changing the code practices habitually used in GMGPolar. I thought there was an overload of assign containing a call to deep_copy

In LinearAlgebra/Vector/vector_operations.cpp I have implemented a
assign(Vector, value) method which you can use.
It is easier on the eyes and you could replace the current for loop in that implementation with your deep copy if you like to.

But if you guys are not a big fan of my assign() function, then use your deep copy here.

EmilyBourne · 2026-04-29T13:34:09Z

+/*
 template <typename T>
 const int& SparseMatrixCSR<T>::row_nz_index(int row, int nz_index) const
 {
    assert(row >= 0 && row < rows_);
    assert(nz_index >= 0 && nz_index < row_nz_size(row));
    return column_indices_(row_start_indices_(row) + nz_index);
-}
+}*/

 template <typename T>
-int& SparseMatrixCSR<T>::row_nz_index(int row, int nz_index)
+KOKKOS_FUNCTION int& SparseMatrixCSR<T>::row_nz_index(int row, int nz_index) const
 {
    assert(row >= 0 && row < rows_);
    assert(nz_index >= 0 && nz_index < row_nz_size(row));
    return column_indices_(row_start_indices_(row) + nz_index);
 }


This change is very strange. Why do you no longer need the setter?

i thought it was 2 getters, a standard one and the const one. The reason of this change was the need of non constant access when calling these methods inside updateMatrixElement. Keeping the two implementations leads to compilation issues.

If it is really 2 getters then I agree with this change. It could go in it's own PR if you want to be 100% sure

I don't understand why you would need non-const access?
Const access is more restrictive so it should be sufficient

Double checking the code though this is a setter not a getter as it returns an int&. A setter should not be marked const as using the setter usually leads to the returned integer being modified. This means the class has been modified which means it is not const. It is possible to trick the compiler into thinking a method is const when it is not but this is always a bad idea.
Knowing that an object is not modified allows the compiler to reorder operations for performance or to make copies of an unmodified object without worrying that changes to the copy won't appear in the original. Telling the compiler that something is const when it isn't will therefore lead to bugs that are very tricky to diagnose

These bugs will be hard to diagnose as they will not prevent compilation or cause crashes. They will just give wrong results. They may also be heisenbugs that disappear when prints are added or when debug mode is used

I realize that is really unsafe to do that it wil be reverted. Then I have to fix the issue of modifying the data inside the static function updateMatrixElement

A standard setter would not be sufficient since we need to increment the same entry with different values multiple times in the A-Give implementation.
Thus I return a reference which I then can modify.

You could add a setter-add function which instead of assigning, adds a value. (if you want to avoid the int& return value trick)

Sorry, after discussion with @tpadioleau it seems that I am wrong about this. Anything that can be used from GPU will be const as it must be captured by copy (to copy to GPU). However neither getters nor setters change the data stored in the SparseMatrixCSR class (i.e. the pointer still points at the same memory block).

This means that if a matrix should be modifiable from a parallel loop then the associated setter must be const.
This is a bit annoying when defining a getter and a setter as these cannot be differentiated by return type alone.

We probably need to keep the setter only, but everywhere in the code where we are currently using SparseMatrixCSR<double> const we should change to use SparseMatrixCSR<double const>. This way the constness is enforced in the pointer rather than a-posteriori in the getter.

This could/should be done in its own PR

EmilyBourne · 2026-04-29T13:34:33Z

-const T& SparseMatrixCSR<T>::row_nz_entry(int row, int nz_index) const
-{
-    assert(row >= 0 && row < rows_);
-    assert(nz_index >= 0 && nz_index < row_nz_size(row));
-    return values_(row_start_indices_(row) + nz_index);
-}
-
-template <typename T>
-T& SparseMatrixCSR<T>::row_nz_entry(int row, int nz_index)
+KOKKOS_FUNCTION T& SparseMatrixCSR<T>::row_nz_entry(int row, int nz_index) const


As above. Why do you only keep the getter and remove the setter?

I think that both function here are needed.
The above one is a getter which I think gets used in my CSR Solver. The other one below is a setter which gets used in DirectSolver, Smoother during CSR matrix construction. We use a reference since we perform multiple += operations on the same entry.

Note that the CSR matrix should probably live on the same memory space as Vector while the COO matrix should be staying on the CPU as it is only used for the Mumps solver.

julianlitz · 2026-04-29T20:02:49Z

You certainly started with the most difficult example to port: Building a matrix using the Give method.
But I can assure you that if we solve this problem and find a good solution, the rest will be a piece of cake.

attempt to change the first loop

1d7f8b4

AbdelhadiKara changed the title ~~attempt to change the first loop~~ Move from omp to kokkos loops in DirectSolver Apr 24, 2026

AbdelhadiKara added 6 commits April 28, 2026 11:01

make methods const

0d65a25

copy pass PolarGrid

767007c

use atomic add operation, does not fix execution issue

725bc32

pass const reference, use atomic operations

5b41fb0

use deep copy instead of assign, use kokkos function for some methods

fba67b3

pass solver matrix as const reference

e12b139

EmilyBourne reviewed Apr 29, 2026

View reviewed changes

AbdelhadiKara added 2 commits April 29, 2026 16:58

remove use of atomic_store, fix index error

788a40d

revert changes, remove kokkos function use

178d80b

AbdelhadiKara added 4 commits April 30, 2026 10:53

move methods to static inline functions

8ef3c03

remove Stecil relaed methods

1759adf

extract stencils from class

978963c

remove commented arguments

dfb57b5

		int circle_idx = stride * circle_task;
		int i_r = grid.numberSmootherCircles() - circle_task - 1;

Conversation

AbdelhadiKara commented Apr 24, 2026

Merge Request - GuideLine Checklist

Checks by code author:

Checks by code reviewer(s):

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

julianlitz Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

julianlitz Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

julianlitz Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

AbdelhadiKara Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

julianlitz Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

julianlitz Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

julianlitz commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

julianlitz Apr 29, 2026 •

edited

Loading

julianlitz Apr 29, 2026 •

edited

Loading

julianlitz Apr 29, 2026 •

edited

Loading

AbdelhadiKara Apr 29, 2026 •

edited

Loading

julianlitz Apr 29, 2026 •

edited

Loading

julianlitz Apr 29, 2026 •

edited

Loading

julianlitz commented Apr 29, 2026 •

edited

Loading