Array Types Design

Requirements

We need to be able to represent roughly what Fortran allows, as it is know that the Fortran design allows compilers to deliver excellent performance and they are versatile enough and there is a lot of experience out there using them from 1990 onward. However we can go beyond Fortran and try to simplify / abstract some of the concepts, but we need to do at least what Fortran does.
We should follow Python's canonical approach where possible, using Python's typing and be consistent with the following documents:

Survey of Fortran arrays

These examples show all that Fortran allows, and we will try to come up the most natural Python equivalent in the next section.

Explicit-shape arrays

These arrays do not need an array descriptor, their lower and upper bound is always passed in as arguments (or implicitly 1 for lower bound), and they are always contiguous, so we know all information about these.

Note that the compiler can still pass these using array descriptor, but it doesn't have to, it can also pass them as just a pointer, because no other information is needed.

Default lower bound

subroutinef(n, r) integer, intent(in) :: n real(dp), intent(out) :: r(n) integer:: i do i =1, n r(i) =1.0_dp/ i**2enddoendsubroutinesubroutineg(m, n, A) integer, intent(in) :: m, n real(dp), intent(in) :: A(m, n) ... endsubroutine

These are a shortcut to:

subroutinef(n, r) integer, intent(in) :: n real(dp), intent(out) :: r(1:n) integer:: i do i =1, n r(i) =1.0_dp/ i**2enddoendsubroutinesubroutineg(m, n, A) integer, intent(in) :: m, n real(dp), intent(in) :: A(1:m, 1:n) ... endsubroutine

Custom lower bounds

subroutineprint_eigenvalues(kappa_min, kappa_max, lam) integer, intent(in) :: kappa_min, kappa_max real(dp), intent(in) :: lam(kappa_min:kappa_max) integer:: kappa do kappa = kappa_min, ubound(lam, 1) print*, kappa, lam(kappa) end doendsubroutine

Assumed-shape arrays

These arrays do need an array descriptor, their lower and upper bound as well as strides are passed in at runtime as part of the array (descriptor).

Default lower bounds

subroutinef(r) real(dp), intent(out) :: r(:) integer:: n, i n =size(r) do i =1, n r(i) =1.0_dp/ i**2enddoendsubroutinesubroutineg(A) real(dp), intent(in) :: A(:, :) ... endsubroutine

These are a shortcut to:

subroutinef(r) real(dp), intent(out) :: r(1:) integer:: n, i n =size(r) do i =1, n r(i) =1.0_dp/ i**2enddoendsubroutinesubroutineg(A) real(dp), intent(in) :: A(1:, 1:) ... endsubroutine

Custom lower bounds

subroutineprint_eigenvalues(kappa_min, lam) integer, intent(in) :: kappa_min real(dp), intent(in) :: lam(kappa_min:) integer:: kappa do kappa = kappa_min, ubound(lam, 1) print*, kappa, lam(kappa) end doendsubroutine

Generalization / Abstraction of Assumed-shape arrays

The assumed-shape arrays are a subset of the following generalization:

subroutinef(r) integer, dim :: n real(dp), intent(out) :: r(n) integer:: n, i n =size(r) do i =1, n r(i) =1.0_dp/ i**2enddoendsubroutinesubroutineg(A) integer, dim :: m, n real(dp), intent(in) :: A(m, n) ... endsubroutinesubroutineprint_eigenvalues(kappa_min, lam) integer, dim :: n integer, intent(in) :: kappa_min real(dp), intent(in) :: lam(kappa_min:n) integer:: kappa do kappa = kappa_min, ubound(lam, 1) print*, kappa, lam(kappa) end doendsubroutine

The dim :: n variable means "infer n at runtime from the actual size of the array that gets passed in". The assumed-shape array A(:,:,:) becomes a syntactic sugar to dim :: l, m, n; A(l,m,n), all three dimensions are different. However, one can declare them to be the same as follows:

subroutineg(A) integer, dim :: l, m, n real(dp), intent(in) :: A(l, m, n) ... endsubroutine

Furthermore, one can use the dim parameter in an expression such as:

functionf(A) result(r) integer, dim :: n real(dp), intent(in) :: A(n, n) real(dp), r(n**2) ... endsubroutine

At the ASR level, there should be an explicit expression for how to compute n at runtime, so the above case is a syntactic sugar for:

functionf(A) result(r) integer, dim :: n =size(A,1) real(dp), intent(in) :: A(n, n) real(dp), r(n**2) ... endsubroutine

One can do more complicated examples, such as:

functionf(A) result(r) integer, dim :: n = (size(A,1)-1)/2real(dp), intent(in) :: A(2*n+1, n**2) real(dp), r(n**2) ... endsubroutine

The compiler would check (if bounds checking is enabled) at runtime that the actual size of the array agrees with the computed n. For example A(5, 4) (n=2) and A(7, 9) (n=3) will pass, but A(4, 1) will fail, because n=(size(A,1)-1)/2 = (4-1)/2 = 3/2 = 1, but A(2*n+1, n**2) = A(3, 1) which is different to A(4, 1), so the array size is incompatible with the specification. One could use any runtime expression, including a user defined function:

functionf(A) result(r) integer, dim :: n = get_dimension_parameter(A) real(dp), intent(in) :: A(2*n+1, n**2) real(dp), r(n**2) ... endsubroutinereal(dp) pure function get_dimension_parameter(A) result(n) real(dp), intent(in) :: A(:,:) n = (size(A,1)-1)/2 end function

You can use n inside the function just like any other variable.

Note that the dimension parameter n can be any number that is inferred from the shape(A), and the actual dimensions are constructed using it. Few more examples:

functionf(A) result(r) integer, dim :: n =size(A,1) real(dp), intent(in) :: A(1:n, 1:n) real(dp), r(n**2) ... endsubroutinefunctiong(A) result(r) integer, dim :: n =size(A,1) real(dp), intent(in) :: A(0:n-1, 0:n-1) real(dp), r(n**2) ... endsubroutinefunctionh(n0, A) result(r) integer, dim :: n =size(A,1) integer, intent(in) :: n0 real(dp), intent(in) :: A(n0:n+n0-1, n0:n+n0-1) real(dp), r(n**2) ... endsubroutinefunctionh2(n0, A) result(r) integer, dim :: nupper =size(A,1)+n0-1integer, intent(in) :: n0 real(dp), intent(in) :: A(n0:nupper, n0:nupper) real(dp), r(n**2) ... endsubroutine

When an array is passed to the subroutine, the lower/upper bounds are remapped: the new lower bound is determined from the declaration, and then the upper bound is computed so that the array has the same size (and shape); from the new upper bound, the integer, dim parameter is computed (if present) and the array declaration is "bounds checked" to ensure the passed in array size is consistent with the declaration by checking each dimension of the passed in array with the computed runtime dimension in the declaration.

Local variables:

integer, dim :: n real(dp), allocatable :: H(n,n) n =10 allocate(H(n,n)) ! ok deallocate(H) allocate(H(10,10)) ! ok deallocate(H) allocate(H(5,5)) ! error: the dimensions is not equal to (n,n) deallocate(H) allocate(H(n,n+1)) ! error: the dimensions are not equal to (n,n) deallocate(H) n =5 allocate(H(5,5)) ! ok

At the point of calling "allocate", the dim variable n must be the size of the array being allocated. The easiest to assure that is to use the variable n itself exactly as in the declaration. Also, every time we read n, the Debug time check will ensure that n == size(H,1) and n == size(H,2). So we can assign to it, but we need to call allocate before we can use n.

However this has issues if we want to use another variable to allocate H. We would need to set n explicitly beforehand. That seems unfortunate, but presumably that would only be in a minority of cases.

The alternative design is that dim :: n cannot be used as a local variable. However that is unfortunate inside subroutines where it would be natural. And for local variables it would require a duplication of n, one for declaration, another one for the allocate statement. It seems it would make things compact if we allow n to be used as a local variable.

ASR Design

At the ASR level, it seems we can thus always define lower and upper bound of an array as an expression. That expression can contain pure (user or intrinsic) function calls, arguments of the function, as well as the internal integer, dim variables. The integer, dim variable in the local symbol will always have an initializer expression, to know how to compute it at runtime (this initializer can only contain size and shape intrinsic, not lower/upper_bound, as those are not yet determined). The frontend can infer this initializer in many common simpler cases (but perhaps not all), and the user can always specify it explicitly.

Rosetta Stone

We will "desugar" the Fortran examples, always specifying the lower bound explicitly, and using the "extended" more abstract/general "integer, dim" approach for assumed-shape arrays.

Examples from PEP 646

In this section we will do all examples from the PEP

1

fromtypingimportNewTypeHeight=NewType('Height', int) Width=NewType('Width', int) x: Array[float, Height, Width] =Array()

Corresponds to:

integer, dim :: height, width real, allocatable :: x(height, width) x = array() ! It will allocate `x` to some shape, and `height`, `weight` will contain the dimensions

2

Python:

fromtypingimportLiteralasLx: Array[float, L[480], L[640]] =Array()

Fortran:

real:: x(480, 640) x = array()

3

Python:

defpointwise_multiply( x: Array[*Shape], y: Array[*Shape] ) ->Array[*Shape]: ... x: Array[Height] y: Array[Width] z: Array[Height, Width] pointwise_multiply(x, x) # Validpointwise_multiply(x, y) # Errorpointwise_multiply(x, z) # Error

4

Python:

Shape=TypeVarTuple('Shape') Batch=NewType('Batch', int) Channels=NewType('Channels', int) defadd_batch_axis(x: Array[*Shape]) ->Array[Batch, *Shape]: ... defdel_batch_axis(x: Array[Batch, *Shape]) ->Array[*Shape]: ... defadd_batch_channels( x: Array[*Shape] ) ->Array[Batch, *Shape, Channels]: ... a: Array[Height, Width] b=add_batch_axis(a) # Inferred type is Array[Batch, Height, Width]c=del_batch_axis(b) # Array[Height, Width]d=add_batch_channels(a) # Array[Batch, Height, Width, Channels]

5

Python:

K=TypeVar('K') N=TypeVar('N') defmatrix_vector_multiply(x: Array[K, N], v: Array[N]) ->Array[K]: ... a: Array[Literal[64], Literal[32]] b: Array[Literal[32]] matrix_vector_multiply(a, b) # Result is Array[Literal[64]]

Fortran:

functionmatrix_vector_multiply(x, v) result(y) integer, dim :: K =size(x,1), N =size(x,2) ! Or equivalently `N =size(v)` or `N =size(v,1)` real, intent(in) :: x(K,N), v(N) real:: y(K) ... endfunction

6

Python:

N=TypeVar("N") defrepeat_each_element(x: Array[N]) ->Array[Mul[2, N]]: ...

Fortran:

functionrepeat_each_element(x) result(r) integer, dim :: N =size(x) real, intent(in) :: x(N) real:: r(2*N) ... endfunction

Fortran examples

F1

Fortran:

subroutinef(n, r) integer, intent(in) :: n real(dp), intent(out) :: r(n) integer:: i do i =1, n r(i) =1.0_dp/ i**2enddoendsubroutine

Python:

fromltypesimporti32, f64, IntentOutfromtypingimportTypeVar, Annotated, Arrayn=TypeVar('n') # `n` will be equal to the argument name `n`deff(n: i32, r: Annotated[Array[f64, n], IntentOut]): i: i32foriinrange(n): r[i] =1/i**2

F2

Fortran:

functionf(n) result(r) integer, intent(in) :: n real(dp) :: r(n) integer:: i do i =1, n r(i) =1.0_dp/ i**2enddoendsubroutinereal(dp) :: x(5) x = f(5)

Python:

fromltypesimporti32, f64, IntentOutfromtypingimportTypeVar, Annotated, Array, LiteralasLn=TypeVar('n') # `n` will be equal to the argument name `n`deff(n: i32) ->Array[f64, n]: i: i32r: Array[f64, n] =empty(n) foriinrange(n): r[i] =1/i**2returnrx: Array[f64, L(5)] x=f(5)

F3

Fortran:

functionf(n) result(r) integer, intent(in) :: n real(dp), allocatable :: r(:) integer:: i allocate(r(n)) do i =1, n r(i) =1.0_dp/ i**2enddoendsubroutinereal(dp) :: x(5) x = f(5) real(dp), allocatable :: y(:) y = f(5) deallocate(y) allocate(y(10)) y =1

Python:

fromltypesimporti32, f64, IntentOut, Allocatable, deallocatefromtypingimportTypeVar, Annotated, Array, LiteralasLm=TypeVar('m') deff(n: i32) ->Annotated[Array[f64, m], Allocatable]: i: i32r: Annotated[Array[f64, n], Allocatable] =empty(n) foriinrange(n): r[i] =1/i**2returnrx: Array[f64, L(5)] x=f(5) N=TypeVar('N') y: Annotated[Array[f64, N], Allocatable] y=f(5) delyy=empty(10)

F4

Note: here the variables n, k obtain their values when A is assigned out of the function. Then variable m is assigned in the next call, and variable k is checked that it is consistent with A and B.

Fortran:

integer, dim :: n, k, m real, allocatable :: A(n, k), B(k, m), C(n, m) call loadtxt("A.txt", A) call loadtxt("B.txt", B) C = matmul(A, B)

Python:

n=TypeVar('n') k=TypeVar('k') m=TypeVar('m') A: Annotated[Array[f32, n, k], Allocatable] B: Annotated[Array[f32, k, m], Allocatable] C: Annotated[Array[f32, n, m], Allocatable] A=np.loadtxt("A.txt") B=np.loadtxt("B.txt") C=np.matmul(A,B)

F5

Returning arrays whose size is the same as an input array.

Standard Fortran:

functionsqrt2(x) result(y) real, intent(in) :: x(:) real:: y(size(x)) y =sqrt(x) endfunction

Our extension:

functionsqrt2(x) result(y) integer, dim :: n =size(x) real, intent(in) :: x(n) real:: y(n) y =sqrt(x) endfunction

Python:

fromltypesimportf32fromtypingimportTypeVar, Arrayfromnumpyimportsize, sqrtn=TypeVar('n') defsqrt2(x: Array[f32, n]) ->Array[f32, n]: returnsqrt(x)

The other option is to return an allocatable array, but that approach is usually not recommended, because it might be slower:

Standard Fortran:

functionsqrt2(x) result(y) real, intent(in) :: x(:) real, allocatable :: y(:) y =sqrt(x) endfunction

Our extension:

functionsqrt2(x) result(y) integer, dim :: n =size(x), m real, intent(in) :: x(n) real, allocatable :: y(m) y =sqrt(x) endfunction

Python:

fromltypesimportf32fromtypingimportTypeVar, Array, Annotated, Allocatablefromnumpyimportsize, sqrtn=TypeVar('n') m=TypeVar('m') defsqrt2(x: Array[f32, n]) ->Annotated[Array[f32, m], Allocatable]: returnsqrt(x)

FAQ

Q: How to design Python syntax / typing for arguments and local arrays (allocatable attribute)
A: Annotated[Array[f64, n], Allocated], where n = TypeVar("n") can be the same name as an argument, in which case it will be reused

Q: How to design ASR in the most natural and abstract way
A: https://gitlab.com/lfortran/lfortran/-/issues/666

Q: Should ASR allow custom lower bounds?
A: For now it should, but we can later remove it if needed (https://gitlab.com/lfortran/lfortran/-/issues/666)

Array Types Design

Requirements

Survey of Fortran arrays

Explicit-shape arrays

Default lower bound

Custom lower bounds

Assumed-shape arrays

Default lower bounds

Custom lower bounds

Generalization / Abstraction of Assumed-shape arrays

ASR Design

Rosetta Stone

Examples from PEP 646

1

2

3

4

5

6

Fortran examples

F1

F2

F3

F4

F5

FAQ

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally