A Matrix
is a struct that holds information about a matrix that has been allocated inside R. The data is:
long rows;
long cols;
RData data;
double * ptr;
where rows
and cols
are what you’d expect, data
is the reference to the data allocated by R, and ptr
is a pointer to the underlying data array. Allocation of a new matrix is done by R. ptr
allows access and setting of elements to be done without involving R, which in many cases is the bottleneck when using R for simulations or other situations that require working with individual elements. It also allows speed gains for reasons that may be surprising. Consider this code in R:
m[1:3, 2:4] <- m2
The underlying data structure might change when you assign to a Matrix
, due to R’s delayed copying. Imagine doing that in the middle of a loop that is executed millions of times. The equivalent code in D looks like this:
m[1..3, 2..4] = m2;
The dimensions of m2
can be confirmed to match m[1..3, 2..4]
and the corresponding elements of m2
can be copied into the right place in m
directly by using ptr
. There is never a need to reallocate m
and copy its elements for this operation.
auto m = Matrix(4, 6);
Allocates a new (r x c) Matrix in R. It has a unique name that’s stored in data
. The pointer ptr
points to the underlying data array of length rc. Note that ptr
changes over time (even frequently, depending on what you’re doing). It’s rarely a good idea to store ptr
anywhere else.
Allocates a new Matrix with the same dimensions as m
. Makes a copy of the data in m
.
Allocates a new Matrix with the same dimensions as sm
. Makes a copy of the data in sm
.
Copies the elements of v
into a newly allocated Matrix with dimensions (v.length x 1).
Copies the elements of v
into a newly allocated Matrix with dimensions (r x c). Fills by column, not row.
Allocates a new Matrix of the same dimension and copies the elements into it.
Should rarely be used in user code. Creates a new Matrix and copies the data into it. If the Robj inside rd
is a Matrix, the dimensions will be the same as that Matrix. If the Robj is a Vector, it will be a Matrix with one column.
Creates a new Matrix and copies the output of evaluating code
into it. If code
evaluates to a Matrix, the dimensions will be the same as that Matrix. If the Robj is a Vector, it will be a Matrix with one column.
auto m1 = Matrix(10, 25);
auto m2 = Matrix(m1); // R creates a copy of m1
auto m1ref = m1.reference(); // Creates a reference to avoid copying
auto m3 = m1ref[0..2, 0..2]; // The right side does not create a new matrix
auto v = Vector([1.1, 2.2, 3.3, 4.4]); // Use assignment to avoid the creation of a vector
auto m4 = Matrix(v); // Matrix with one column and four rows
auto m5 = Matrix(v, 2, 2); // Matrix with two columns and two rows
auto m6 = m5.dup;
Multidimensional slicing allows the use of standard matrix notation. Do keep in mind that D indexing starts at 0, while R indexing starts at 1.
m[1,4]
m[1..3, 4]
m[1..3, 4..8]
m[3, 0..$] // $ has the usual meaning; takes all elements of row 3
For convenience and readability, there is a special struct type that denotes all elements of a row or column:
m[3, _all] // All elements of the third row
m[_all, 3] // All elements of the third column
m[_all, _all] // The entire matrix
You can pull out a block with non-consecutive rows and/or columns similar to the way it’s done in R:
m[3, [1, 4, 9]] // Column 1, 4, and 9 of row 3
The R equivalent is m[4, c(2, 5, 10)]
.
Excessive allocation of new Matrix structs will quickly degrade performance. You are assured of not allocating a new matrix if you accessing a single element like this:
double x = m[4,7];
In fact, that code works directly with the pointer to the underlying data array, so R is not involved in the operation at all. The above is equivalent to something like this:
double x = m.ptr[39];
As long as you assign to elements directly with D, there will never be a new allocation, and you will get C-levels of performance. This code:
x[0, 1..4] = [1.1, 2.2, 3.3];
is the equivalent of something like
x.ptr[9..12] = [1.1, 2.2, 3.3];
R is not involved and there is a guarantee that there will not be a new allocation. You can use a Submatrix to create a reference to a Matrix. Operations on a Submatrix do not call into R and do not allocate a new Matrix. A Submatrix is created by calling reference
or sub
:
// These are equivalent
Submatrix sm = m.reference();
auto sm = m.sub;
This library is designed to facilitate quickly writing code that is correct and reasonably performant by default. Computationally intensive programs that make heavy use of matrix operations may require additional effort to get reasonable performance. Here are a few tips to keep in mind.
First, avoid allocation where possible. Reuse allocated Matrix and Vector memory as much as possible. There’s nothing specific to this library about this recommendation; this is the most basic optimization for any computationally-intensive program written in any language. You might be tempted to go further and do your own array allocations. That probably won’t be a good use of your time unless you have a faster memory allocator than the one used by R, which seems unlikely.
Second, most of the time the default operations for matrix multiplication, solution of equations, and so on will be fast. That’s because the underlying calculations are done by C and Fortran libraries such as BLAS. You won’t gain much by rewriting your code to use BLAS for matrix multiplication since that’s already being done. Where that strategy will work is when you allocate Matrix structs at the start of a loop and then reuse them. You can write your own calls to BLAS’s dgemm
to do the matrix multiplication. Beyond this type of optimization, though, you’re most likely wasting your time.
Third, anything that goes through R and returns more than a single element might allocate a new Matrix. As an example, this slicing operation returns a new (2 x 2) matrix:
m[0..2, 0..2];
Suppose you’re doing this operation:
m1 = m[0..2, 0..2];
The right side allocates a new Matrix, the elements of that block of m
will be copied into that new Matrix, and then those elements will be copied into m1
. Since the right side will never be used again, the new Matrix will be destroyed. There is no easy solution to this. In this example
auto m2 = m[0..2, 0..2];
you want the right side to return a new Matrix. You can be explicit that you don’t want to allocate a new Matrix by using a Submatrix, which is a reference to the underlying Matrix.
auto sm = m.reference;
// Alternative syntax: auto sm = m.sub;
// Submatrix, avoids a copy
m2[0..2, 0..2] = sm[0..2, 0..2];
// New Matrix; calls the Matrix constructor
Matrix m3 = sm[0..2, 0..2];
In general, if you want the best performance, you should be creating Submatrix structs all over and working with them. Why this is not the default behavior is explained in the next section.
One could make the argument that any time you take a slice, it should return a reference to the corresponding parts of the matrix, not a new matrix with those elements copied into it. That’s consistent with D’s array slicing, where v[1..4]
is a reference to that part of the array, and if you want a copy, you have to use dup
. Indeed, that was the initial design, but this fails spectacularly unless you’re really careful.
Consider how slices being references can go wrong in vanilla D code:
import std;
void main() {
auto z = [1.1, 2, 3];
writeln(z.ptr);
auto z2 = z[];
// Same as z.ptr
writeln(z2.ptr);
z ~= 4;
// Now they're different due to a reallocation
writeln(z.ptr);
writeln(z2.ptr);
}
If you’re not careful, z2
might not be pointing to what you think it’s pointing to, and you might end up with a disastrous outcome. The good news is that you’re probably not going to run into many problems writing vanilla D code with slices (at least I don’t).
The same problem exists when slicing returns a reference to the Matrix, but on a bigger scale. It’s quite common to do things like take a row or a column of a Matrix in numerical code, and operations such as modifying the elements of a Matrix will generally lead to a reallocation, much more so than with D’s built-in arrays. Holding a copy of the pointer simply does not work because the probability of it becoming invalid is so high. It’s more complicated than writing C.
I considered an alternative solution. Rather than storing a pointer to the underlying data array, I can store the name of the variable in the Submatrix. That adds considerable overhead. On every access, you have to request the Robj that goes with the name, and then you have to get the pointer to the underlying data array. It would be an understatement to say this is inefficient.
Something that might work is to have the Submatrix hold a pointer to the Matrix. Then on each access, grab the pointer to the data array. While I won’t rule out doing this in the future, since the syntax would be convenient and it would be consistent with other slicing in D, I’m hesitant to add a second pointer. Someone wanting speed, which is really the only reason to put up with the inconvenience of a reference type, is unlikely to want an extra level of indirection.
For better or worse, the current design requires you to explicitly specify that you want a reference. That’s the clearest for the reader of the code and delivers the best performance. Almost certainly something shorter than reference
will be used. You can limit the use of references to only those cases where they’re crucial for performance, and you can limit the set of opportunities to mess things up.
These work as expected, so there’s not much elaboration needed.
Returns a newly allocated Vector with the elements of row ii copied into it.
Returns a newly allocated Vector with the elements of column ii copied into it.
Returns a newly allocated Vector with the elements of the last row copied into it.
Returns a newly allocated Vector with the elements of the last column copied into it.
Returns a newly allocated Matrix holding the product of x
and y
. Note that this is matrix multiplication, not element-by-element multiplication. The equivalent of R’s %*%
operator.
Converts v
to a Matrix with one column, then does matrix multiplication.
Converts v
to a Matrix with one column, then does matrix multiplication.
Element-by-element multiplication of x and y. Explicit naming is used to avoid confusion.
Returns a newly allocated matrix holding the sum of x
and y
.
Returns a newly allocated matrix holding x - y
.
Returns a newly allocated matrix holding the element-by-element division x / y
.
Returns a newly allocated matrix holding ax
.
Returns a newly allocated matrix holding the result of adding a
to every element of x
.
Returns a newly allocated matrix holding the result of subtracting a
from every element of x
.
Returns a newly allocated matrix holding the result of dividing every element of x
by a
.
Returns a newly allocated matrix holding ax
.
Returns a newly allocated matrix holding the result of adding a
to every element of x
.
Returns a newly allocated matrix holding the result of subtracting every element of x
from a
.
Returns a newly allocated matrix holding the result of dividing a by every element of x
.
Returns a newly allocated Matrix holding the transpose of x
.
Following R, returns a newly allocated Matrix holding the inverse of x
. See the R documentation.
Solution of a system of equations. Solves aX=b
for X
. See the R documentation.
Solution of a system of equations. Solves aX=b
for X
. See the R documentation.
Same as solve(x)
.
Returns a newly allocated Vector holding the elements of the diagonal of x
. x
is required to be square.
Returns an (ii x ii) identity matrix.
Returns an (ii x ii) identity matrix.
Add v as a new column to m. A new Matrix is allocated. m is not affected.
Add v as a new row to m. A new Matrix is allocated. m is not affected.
Add arr as a new column to m. A new Matrix is allocated. m is not affected.
Add arr as a new row to m. A new Matrix is allocated. m is not affected.
A struct holding the results of a call to svd
. See the details here.
auto s = SVD(m);
Vector d = s.d; // See the R documentation for the interpretation of these members
Matrix u = s.u;
Matrix v = s.v;
A struct holding the results of a call to eigen
. See the details here.
auto e = Eigen(m);
Vector values = e.values; // See the R documentation for the interpretation of these members
Matrix vectors = m.vectors;
Creates a reference to one column of a matrix, for convenience. You can get and set individual elements. Is a range, so you can use it with foreach.
Creates a reference to one row of a matrix, for convenience. You can get and set individual elements. Is a range, so you can use it with foreach.
Used to fill all the elements of one column of a matrix safely. See the discussion in Fill for Vector.
Used to fill all the elements of one column of a matrix safely. See the discussion in Fill for Vector.
Used to fill all the elements of the diagonal of a matrix safely. See the discussion in Fill for Vector.
Used to fill all the elements above the diagonal of a matrix safely, filling by column. See the discussion in Fill for Vector.
Used to fill all the elements below the diagonal of a matrix safely, filling by column. See the discussion in Fill for Vector.
Used to fill all the elements of a matrix safely, filling by column. See the discussion in Fill for Vector.
Used to fill all the elements of a block of a matrix safely, filling by column. See the discussion in Fill for Vector.
Holds a matrix of int values.
Holds a matrix of bool values.