Move from omp to kokkos loops in DirectSolver#229
Move from omp to kokkos loops in DirectSolver#229AbdelhadiKara wants to merge 13 commits intomainfrom
Conversation
| { | ||
| matrix.row_nz_index(row, offset) = col; | ||
| matrix.row_nz_entry(row, offset) += val; | ||
| Kokkos::atomic_store(&matrix.row_nz_index(row, offset), col); |
There was a problem hiding this comment.
Are you sure atomic_store is needed? The old value of matrix.row_nz_index(row, offset) is unused. Isn't this because there is not overlap between threads?
There was a problem hiding this comment.
Not sure it is necessary, i mean the previous implementation does not lead to compilation issues.
There was a problem hiding this comment.
Atomics are never used to fix compilation issues
There was a problem hiding this comment.
I will remove it
There was a problem hiding this comment.
Quick follow up:
A-Give distribution can either be implemented with or without Atomics. Currently I use the strategy without.
Here I guarantee with the parallelization that nodes that are handled at the same time have a horizontal and vertical distance of >=2. Thus no race conditions can occur if they start modifying their adjacent 4 neighbor values.
However this approach while good for CPU execution could be slower on GPU. Here you may want to consider Atomic operations and all nodes can then be processed simultaneously.
Note that I never tested this approach.
I think it may be even best to keep the version without atomic since a += b; is not atomic (load, add, store) and enforcing it atomic can be very costly.
TLDR: What I would recommend: Keep current parallelism as it is and don't use atomic.
| void DirectSolver_CSR_LU_Give<LevelCacheType>::nodeBuildSolverMatrixGive(int i_r, int i_theta, const PolarGrid grid, | ||
| const bool DirBC_Interior, | ||
| SparseMatrixCSR<double>& solver_matrix, | ||
| const SparseMatrixCSR<double>& solver_matrix, |
There was a problem hiding this comment.
Why is this extra const needed?
There was a problem hiding this comment.
nodeBuildSolverMatrixGive is called from buildSolverMatrixCircleSection which is called from the kokkos loop. following the compiler complain , it appeared necessary to fix compilation issues.
There was a problem hiding this comment.
I wonder if this explains why your tests are failing. If Kokkos insists on this being const then it does not expect it to be modified so it might create a copy per thread whose values are not copied back
There was a problem hiding this comment.
As noted by https://kokkos.org/kokkos-core-wiki/API/core/macros-special/host_device_macros.html#kokkos-class-lambda KOKKOS_CLASS_LAMBDA captures by const value (copy) not by reference. This is why you can only call const methods and can't modify any class members.
If you want to modify a class member I think it would be safer/cleaner to use static methods and KOKKOS_LAMBDA instead of KOKKOS_CLASS_LAMBDA
There was a problem hiding this comment.
The issue with converting this member function to a static inline one is that we will no longer have access to the getStencil() function, which utilizes private class member data.
But I think that this would be the correct way to do it.
Thus I would be in favor of making the Smother/matrixStencil.h functions all static inlined to make it possible to make the buildMatrixNode() static inlined.
For this we just need to move the Stencil = {1,2,3,...} definitions from private member to the static inlined helper function and perhaps add some more arguments where grid_ and DirBC_ here and there to not directly access member variables.
Or you can try to make it work with the current member function setup.
| int circle_idx = stride * circle_task; | ||
| int i_r = grid.numberSmootherCircles() - circle_task - 1; |
There was a problem hiding this comment.
| int circle_idx = stride * circle_task; | |
| int i_r = grid.numberSmootherCircles() - circle_task - 1; | |
| int circle_idx = stride * circle_task; | |
| int i_r = grid.numberSmootherCircles() - circle_idx - 1; |
| assign(values_, T(0)); | ||
| assign(column_indices_, 0); | ||
| Kokkos::deep_copy(values_, T(0)); | ||
| Kokkos::deep_copy(column_indices_, 0); |
There was a problem hiding this comment.
assign stands for std vector, here we have views. you suggest to revert this one ?
There was a problem hiding this comment.
assign stands for std vector
Are you sure?
You don't seem to have changed the type of these objects in this PR so why would the copy need to change?
There was a problem hiding this comment.
As I remember Julian preferred to keep the assign function to avoid significantly changing the code practices habitually used in GMGPolar. I thought there was an overload of assign containing a call to deep_copy
There was a problem hiding this comment.
In LinearAlgebra/Vector/vector_operations.cpp I have implemented a
assign(Vector, value) method which you can use.
It is easier on the eyes and you could replace the current for loop in that implementation with your deep copy if you like to.
But if you guys are not a big fan of my assign() function, then use your deep copy here.
| /* | ||
| template <typename T> | ||
| const int& SparseMatrixCSR<T>::row_nz_index(int row, int nz_index) const | ||
| { | ||
| assert(row >= 0 && row < rows_); | ||
| assert(nz_index >= 0 && nz_index < row_nz_size(row)); | ||
| return column_indices_(row_start_indices_(row) + nz_index); | ||
| } | ||
| }*/ | ||
|
|
||
| template <typename T> | ||
| int& SparseMatrixCSR<T>::row_nz_index(int row, int nz_index) | ||
| KOKKOS_FUNCTION int& SparseMatrixCSR<T>::row_nz_index(int row, int nz_index) const | ||
| { | ||
| assert(row >= 0 && row < rows_); | ||
| assert(nz_index >= 0 && nz_index < row_nz_size(row)); | ||
| return column_indices_(row_start_indices_(row) + nz_index); | ||
| } |
There was a problem hiding this comment.
This change is very strange. Why do you no longer need the setter?
There was a problem hiding this comment.
i thought it was 2 getters, a standard one and the const one. The reason of this change was the need of non constant access when calling these methods inside updateMatrixElement. Keeping the two implementations leads to compilation issues.
There was a problem hiding this comment.
If it is really 2 getters then I agree with this change. It could go in it's own PR if you want to be 100% sure
There was a problem hiding this comment.
I don't understand why you would need non-const access?
Const access is more restrictive so it should be sufficient
There was a problem hiding this comment.
Double checking the code though this is a setter not a getter as it returns an int&. A setter should not be marked const as using the setter usually leads to the returned integer being modified. This means the class has been modified which means it is not const. It is possible to trick the compiler into thinking a method is const when it is not but this is always a bad idea.
Knowing that an object is not modified allows the compiler to reorder operations for performance or to make copies of an unmodified object without worrying that changes to the copy won't appear in the original. Telling the compiler that something is const when it isn't will therefore lead to bugs that are very tricky to diagnose
There was a problem hiding this comment.
These bugs will be hard to diagnose as they will not prevent compilation or cause crashes. They will just give wrong results. They may also be heisenbugs that disappear when prints are added or when debug mode is used
There was a problem hiding this comment.
I realize that is really unsafe to do that it wil be reverted. Then I have to fix the issue of modifying the data inside the static function updateMatrixElement
There was a problem hiding this comment.
A standard setter would not be sufficient since we need to increment the same entry with different values multiple times in the A-Give implementation.
Thus I return a reference which I then can modify.
You could add a setter-add function which instead of assigning, adds a value. (if you want to avoid the int& return value trick)
There was a problem hiding this comment.
Sorry, after discussion with @tpadioleau it seems that I am wrong about this. Anything that can be used from GPU will be const as it must be captured by copy (to copy to GPU). However neither getters nor setters change the data stored in the SparseMatrixCSR class (i.e. the pointer still points at the same memory block).
This means that if a matrix should be modifiable from a parallel loop then the associated setter must be const.
This is a bit annoying when defining a getter and a setter as these cannot be differentiated by return type alone.
We probably need to keep the setter only, but everywhere in the code where we are currently using SparseMatrixCSR<double> const we should change to use SparseMatrixCSR<double const>. This way the constness is enforced in the pointer rather than a-posteriori in the getter.
There was a problem hiding this comment.
This could/should be done in its own PR
| const T& SparseMatrixCSR<T>::row_nz_entry(int row, int nz_index) const | ||
| { | ||
| assert(row >= 0 && row < rows_); | ||
| assert(nz_index >= 0 && nz_index < row_nz_size(row)); | ||
| return values_(row_start_indices_(row) + nz_index); | ||
| } | ||
|
|
||
| template <typename T> | ||
| T& SparseMatrixCSR<T>::row_nz_entry(int row, int nz_index) | ||
| KOKKOS_FUNCTION T& SparseMatrixCSR<T>::row_nz_entry(int row, int nz_index) const |
There was a problem hiding this comment.
As above. Why do you only keep the getter and remove the setter?
There was a problem hiding this comment.
I think that both function here are needed.
The above one is a getter which I think gets used in my CSR Solver. The other one below is a setter which gets used in DirectSolver, Smoother during CSR matrix construction. We use a reference since we perform multiple += operations on the same entry.
Note that the CSR matrix should probably live on the same memory space as Vector while the COO matrix should be staying on the CPU as it is only used for the Mumps solver.
|
You certainly started with the most difficult example to port: Building a matrix using the Give method. |
Merge Request - GuideLine Checklist
Guideline to check code before resolve WIP and approval, respectively.
As many checkboxes as possible should be ticked.
Checks by code author:
Always to be checked:
If functions were changed or functionality was added:
If new functionality was added:
If new third party software is used:
If new mathematical methods or epidemiological terms are used:
Checks by code reviewer(s):