Skip to content

Move from omp to kokkos loops in DirectSolver#229

Draft
AbdelhadiKara wants to merge 13 commits intomainfrom
akara_directsolver_loops
Draft

Move from omp to kokkos loops in DirectSolver#229
AbdelhadiKara wants to merge 13 commits intomainfrom
akara_directsolver_loops

Conversation

@AbdelhadiKara
Copy link
Copy Markdown
Collaborator

Merge Request - GuideLine Checklist

Guideline to check code before resolve WIP and approval, respectively.
As many checkboxes as possible should be ticked.

Checks by code author:

Always to be checked:

  • There is at least one issue associated with the pull request.
  • New code adheres with the coding guidelines
  • No large data files have been added to the repository. Maximum size for files should be of the order of KB not MB. In particular avoid adding of pdf, word, or other files that cannot be change-tracked correctly by git.

If functions were changed or functionality was added:

  • Tests for new functionality has been added
  • A local test was succesful

If new functionality was added:

  • There is appropriate documentation of your work. (use doxygen style comments)

If new third party software is used:

  • Did you pay attention to its license? Please remember to add it to the wiki after successful merging.

If new mathematical methods or epidemiological terms are used:

  • Are new methods referenced? Did you provide further documentation?

Checks by code reviewer(s):

  • Is the code clean of development artifacts e.g., unnecessary comments, prints, ...
  • The ticket goals for each associated issue are reached or problems are clearly addressed (i.e., a new issue was introduced).
  • There are appropriate unit tests and they pass.
  • The git history is clean and linearized for the merge request. All reviewers should squash commits and write a simple and meaningful commit message.
  • Coverage report for new code is acceptable.
  • No large data files have been added to the repository. Maximum size for files should be of the order of KB not MB. In particular avoid adding of pdf, word, or other files that cannot be change-tracked correctly by git.

@AbdelhadiKara AbdelhadiKara changed the title attempt to change the first loop Move from omp to kokkos loops in DirectSolver Apr 24, 2026
{
matrix.row_nz_index(row, offset) = col;
matrix.row_nz_entry(row, offset) += val;
Kokkos::atomic_store(&matrix.row_nz_index(row, offset), col);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure atomic_store is needed? The old value of matrix.row_nz_index(row, offset) is unused. Isn't this because there is not overlap between threads?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it is necessary, i mean the previous implementation does not lead to compilation issues.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Atomics are never used to fix compilation issues

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove it

Copy link
Copy Markdown
Collaborator

@julianlitz julianlitz Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AbdelhadiKara @EmilyBourne

Quick follow up:
A-Give distribution can either be implemented with or without Atomics. Currently I use the strategy without.
Here I guarantee with the parallelization that nodes that are handled at the same time have a horizontal and vertical distance of >=2. Thus no race conditions can occur if they start modifying their adjacent 4 neighbor values.

However this approach while good for CPU execution could be slower on GPU. Here you may want to consider Atomic operations and all nodes can then be processed simultaneously.
Note that I never tested this approach.

I think it may be even best to keep the version without atomic since a += b; is not atomic (load, add, store) and enforcing it atomic can be very costly.

TLDR: What I would recommend: Keep current parallelism as it is and don't use atomic.

void DirectSolver_CSR_LU_Give<LevelCacheType>::nodeBuildSolverMatrixGive(int i_r, int i_theta, const PolarGrid grid,
const bool DirBC_Interior,
SparseMatrixCSR<double>& solver_matrix,
const SparseMatrixCSR<double>& solver_matrix,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this extra const needed?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nodeBuildSolverMatrixGive is called from buildSolverMatrixCircleSection which is called from the kokkos loop. following the compiler complain , it appeared necessary to fix compilation issues.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this explains why your tests are failing. If Kokkos insists on this being const then it does not expect it to be modified so it might create a copy per thread whose values are not copied back

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted by https://kokkos.org/kokkos-core-wiki/API/core/macros-special/host_device_macros.html#kokkos-class-lambda KOKKOS_CLASS_LAMBDA captures by const value (copy) not by reference. This is why you can only call const methods and can't modify any class members.

If you want to modify a class member I think it would be safer/cleaner to use static methods and KOKKOS_LAMBDA instead of KOKKOS_CLASS_LAMBDA

Copy link
Copy Markdown
Collaborator

@julianlitz julianlitz Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with converting this member function to a static inline one is that we will no longer have access to the getStencil() function, which utilizes private class member data.

But I think that this would be the correct way to do it.

Thus I would be in favor of making the Smother/matrixStencil.h functions all static inlined to make it possible to make the buildMatrixNode() static inlined.

For this we just need to move the Stencil = {1,2,3,...} definitions from private member to the static inlined helper function and perhaps add some more arguments where grid_ and DirBC_ here and there to not directly access member variables.

Or you can try to make it work with the current member function setup.

Comment on lines +822 to +823
int circle_idx = stride * circle_task;
int i_r = grid.numberSmootherCircles() - circle_task - 1;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int circle_idx = stride * circle_task;
int i_r = grid.numberSmootherCircles() - circle_task - 1;
int circle_idx = stride * circle_task;
int i_r = grid.numberSmootherCircles() - circle_idx - 1;

Comment on lines +199 to +201
assign(values_, T(0));
assign(column_indices_, 0);
Kokkos::deep_copy(values_, T(0));
Kokkos::deep_copy(column_indices_, 0);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assign stands for std vector, here we have views. you suggest to revert this one ?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assign stands for std vector

Are you sure?

You don't seem to have changed the type of these objects in this PR so why would the copy need to change?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I remember Julian preferred to keep the assign function to avoid significantly changing the code practices habitually used in GMGPolar. I thought there was an overload of assign containing a call to deep_copy

Copy link
Copy Markdown
Collaborator

@julianlitz julianlitz Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In LinearAlgebra/Vector/vector_operations.cpp I have implemented a
assign(Vector, value) method which you can use.
It is easier on the eyes and you could replace the current for loop in that implementation with your deep copy if you like to.

But if you guys are not a big fan of my assign() function, then use your deep copy here.

Comment on lines 282 to 297
/*
template <typename T>
const int& SparseMatrixCSR<T>::row_nz_index(int row, int nz_index) const
{
assert(row >= 0 && row < rows_);
assert(nz_index >= 0 && nz_index < row_nz_size(row));
return column_indices_(row_start_indices_(row) + nz_index);
}
}*/

template <typename T>
int& SparseMatrixCSR<T>::row_nz_index(int row, int nz_index)
KOKKOS_FUNCTION int& SparseMatrixCSR<T>::row_nz_index(int row, int nz_index) const
{
assert(row >= 0 && row < rows_);
assert(nz_index >= 0 && nz_index < row_nz_size(row));
return column_indices_(row_start_indices_(row) + nz_index);
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is very strange. Why do you no longer need the setter?

Copy link
Copy Markdown
Collaborator Author

@AbdelhadiKara AbdelhadiKara Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i thought it was 2 getters, a standard one and the const one. The reason of this change was the need of non constant access when calling these methods inside updateMatrixElement. Keeping the two implementations leads to compilation issues.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is really 2 getters then I agree with this change. It could go in it's own PR if you want to be 100% sure

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why you would need non-const access?
Const access is more restrictive so it should be sufficient

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double checking the code though this is a setter not a getter as it returns an int&. A setter should not be marked const as using the setter usually leads to the returned integer being modified. This means the class has been modified which means it is not const. It is possible to trick the compiler into thinking a method is const when it is not but this is always a bad idea.
Knowing that an object is not modified allows the compiler to reorder operations for performance or to make copies of an unmodified object without worrying that changes to the copy won't appear in the original. Telling the compiler that something is const when it isn't will therefore lead to bugs that are very tricky to diagnose

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These bugs will be hard to diagnose as they will not prevent compilation or cause crashes. They will just give wrong results. They may also be heisenbugs that disappear when prints are added or when debug mode is used

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize that is really unsafe to do that it wil be reverted. Then I have to fix the issue of modifying the data inside the static function updateMatrixElement

Copy link
Copy Markdown
Collaborator

@julianlitz julianlitz Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A standard setter would not be sufficient since we need to increment the same entry with different values multiple times in the A-Give implementation.
Thus I return a reference which I then can modify.

You could add a setter-add function which instead of assigning, adds a value. (if you want to avoid the int& return value trick)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, after discussion with @tpadioleau it seems that I am wrong about this. Anything that can be used from GPU will be const as it must be captured by copy (to copy to GPU). However neither getters nor setters change the data stored in the SparseMatrixCSR class (i.e. the pointer still points at the same memory block).

This means that if a matrix should be modifiable from a parallel loop then the associated setter must be const.
This is a bit annoying when defining a getter and a setter as these cannot be differentiated by return type alone.

We probably need to keep the setter only, but everywhere in the code where we are currently using SparseMatrixCSR<double> const we should change to use SparseMatrixCSR<double const>. This way the constness is enforced in the pointer rather than a-posteriori in the getter.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could/should be done in its own PR

Comment on lines +299 to +300
const T& SparseMatrixCSR<T>::row_nz_entry(int row, int nz_index) const
{
assert(row >= 0 && row < rows_);
assert(nz_index >= 0 && nz_index < row_nz_size(row));
return values_(row_start_indices_(row) + nz_index);
}

template <typename T>
T& SparseMatrixCSR<T>::row_nz_entry(int row, int nz_index)
KOKKOS_FUNCTION T& SparseMatrixCSR<T>::row_nz_entry(int row, int nz_index) const
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above. Why do you only keep the getter and remove the setter?

Copy link
Copy Markdown
Collaborator

@julianlitz julianlitz Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that both function here are needed.
The above one is a getter which I think gets used in my CSR Solver. The other one below is a setter which gets used in DirectSolver, Smoother during CSR matrix construction. We use a reference since we perform multiple += operations on the same entry.

Note that the CSR matrix should probably live on the same memory space as Vector while the COO matrix should be staying on the CPU as it is only used for the Mumps solver.

@julianlitz
Copy link
Copy Markdown
Collaborator

julianlitz commented Apr 29, 2026

You certainly started with the most difficult example to port: Building a matrix using the Give method.
But I can assure you that if we solve this problem and find a good solution, the rest will be a piece of cake.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants