With their constantly increasing peak performance and memory capacity, modern supercomputers offer new perspectives on numerical studies of open many-body quantum systems. These systems are often modeled by using Markovian quantum master equations describing the evolution of the system density operators. In this paper, we address master equations of the Lindblad form, which are a popular theoretical tool in quantum optics, cavity quantum electrodynamics, and optomechanics. By using the generalized Gell–Mann matrices as a basis, any Lindblad equation can be transformed into a system of ordinary differential equations with real coefficients. Recently, we presented an implementation of this transformation with computational complexity scaling as O(N^5 log N) for dense Lindbladians and O(N^3 log N) for sparse ones. However, infeasible memory costs remain a serious obstacle on the way to large models. Here, we present a parallel cluster-based implementation of the algorithm and demonstrate that it allows us to integrate a sparse Lindbladian model of dimension N = 2000 and a dense random Lindbladian model of dimension N = 200 by using 25 nodes with 64 GB RAM per node.

The tensors F and Z are filled in such a way that each of their two-dimensional plane sections contains from O(N) (“sparse” section) to O(N^2) (“dense” section) elements. In our prior work, we noted that there exist O(N) “dense” sections containing O(N^2) elements and O(N^2) “sparse” sections containing O(N) elements. Therefore, for every element r_{sn}, the number of nonzero coefficients z_{jln}, f_{kls}, z_{kln}, and f_{jls} varies from O(N) to O(N^2), which results in a maximal complexity of every r_{sn} calculation equal to O(N^4). Hence, taking the product of the number of elements and the time complexity of calculating each element, we can estimate the overall time complexity as follows. For “dense” s-indices and “dense” n-indices, the total time complexity should be equal to O(N^6); however, due to the specific structure of the F and Z tensors, it is only O(N^5) operations. If one of the indices s and n is sparse and the other is dense, the time complexity can also be estimated as O(N^5), thanks to the structure of F and Z. If both indices are sparse, we need O(N^5) operations. During this step, the matrix R is stored as a set of red-black trees (each row is stored as a separate tree); therefore, adding each calculated coefficient to the tree requires O(log N) operations, which leads to a total time complexity of the step equal to O(N^5 log N).
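As an illustration of the storage layout described above, the following C++ sketch keeps each row of R as an ordered map, which common standard-library implementations realize as a red-black tree, so accumulating one calculated coefficient costs O(log N) tree operations. The type and function names are ours and the coefficient computation itself is elided; this is a minimal sketch of the data structure, not the paper's implementation.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// One row of R: column index -> accumulated real coefficient.
// std::map is a red-black tree in libstdc++/libc++, so insertion
// and lookup are O(log(row size)).
using SparseRow = std::map<std::size_t, double>;

struct SparseMatrixR {
    std::vector<SparseRow> rows;  // one tree per row of R

    // dim is the dimension of the transformed real ODE system
    // (N^2 - 1 for the generalized Gell-Mann basis of an N-level system).
    explicit SparseMatrixR(std::size_t dim) : rows(dim) {}

    // Add one computed contribution to r_{sn}; the tree node is
    // created on first access and updated in place afterwards.
    void accumulate(std::size_t s, std::size_t n, double value) {
        rows[s][n] += value;
    }
};
```

Since each contribution triggers one O(log N) tree operation and the dense case produces O(N^5) contributions, this storage scheme reproduces the O(N^5 log N) bound quoted above.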
During the next step, we integrate the linear real-valued ODE system (10) over time. While the Data Preparation step is very memory-consuming, this step is time-consuming. Scalable parallelization of this step is a challenging problem because of multiple data dependencies. Fortunately, it does not take a huge amount of memory and therefore can be run on a smaller number of computational nodes than the Data Preparation step. If the matrices Q and R are sparse, we employ the graph partitioning library ParMetis to minimize further MPI communications. We then employ the fourth-order Runge–Kutta method, one step of which takes O(N^4) time for dense matrices H and L and O(N^3) time for sparse matrices. The method requires O(N^2) additional memory for storing intermediate results. Computations are parallelized via MPI on K cluster nodes as follows. The matrices Q and R are split into K groups of rows (panels) so that each portion of data stores an approximately equal number of non-zero values. Then, each MPI process performs the steps of the Runge–Kutta method for its corresponding panels.
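The row split can be sketched as a greedy scan over per-row non-zero counts, choosing panel boundaries so that every panel carries roughly an equal share of non-zeros. The function below is a hypothetical variant of such a scheme; the paper's exact balancing procedure is not specified here, and for sparse matrices it additionally relies on ParMetis, which this sketch does not attempt.

```cpp
#include <cstddef>
#include <vector>

// Given per-row non-zero counts of Q and R (summed per row), return
// panel boundaries b_0 = 0 < b_1 < ... < b_K = n such that each panel
// [b_k, b_{k+1}) carries roughly total_nnz / num_panels non-zeros.
// Assumes num_panels >= 1; trailing panels may come out empty if the
// non-zeros are exhausted early.
std::vector<std::size_t> split_by_nnz(const std::vector<std::size_t>& row_nnz,
                                      std::size_t num_panels) {
    std::size_t total = 0;
    for (std::size_t c : row_nnz) total += c;

    const std::size_t target = (total + num_panels - 1) / num_panels;
    std::vector<std::size_t> bounds{0};
    std::size_t acc = 0;

    for (std::size_t r = 0; r < row_nnz.size(); ++r) {
        acc += row_nnz[r];
        if (acc >= target && bounds.size() < num_panels) {
            bounds.push_back(r + 1);  // close the current panel after row r
            acc = 0;
        }
    }
    while (bounds.size() < num_panels) bounds.push_back(row_nnz.size());
    bounds.push_back(row_nnz.size());
    return bounds;
}
```

Rank k then owns rows [bounds[k], bounds[k+1]) of Q and R for the duration of the integration.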
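Assuming the right-hand side of system (10) reduces to a matrix–vector product dv/dt = A·v — a deliberate simplification of the paper's actual formulation in terms of Q and R — one distributed Runge–Kutta step can look like the sketch below: each rank multiplies its panel of rows of A by the full vector, and MPI_Allgatherv reassembles the global vector between stages. All names are ours, and the counts/displs arrays are assumed to be precomputed from the panel boundaries.

```cpp
#include <mpi.h>
#include <cstddef>
#include <vector>

struct Panel {
    int row_begin, row_end;   // this rank's rows [row_begin, row_end)
    std::vector<double> a;    // dense panel, (row_end - row_begin) x dim, row-major
    int dim;                  // global system dimension
};

// y = A_panel * v, where v is the full global vector.
static void panel_matvec(const Panel& p, const std::vector<double>& v,
                         std::vector<double>& y) {
    const int rows = p.row_end - p.row_begin;
    y.assign(rows, 0.0);
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < p.dim; ++j)
            y[i] += p.a[static_cast<std::size_t>(i) * p.dim + j] * v[j];
}

// Gather the per-rank slices into the full vector on every rank.
static void allgather(const std::vector<double>& local, std::vector<double>& global,
                      const std::vector<int>& counts, const std::vector<int>& displs) {
    MPI_Allgatherv(local.data(), static_cast<int>(local.size()), MPI_DOUBLE,
                   global.data(), counts.data(), displs.data(), MPI_DOUBLE,
                   MPI_COMM_WORLD);
}

// One classical RK4 step of size h, updating the locally owned slice v_local.
// Only O(dim) extra storage per stage is needed, matching the O(N^2)
// additional-memory estimate above.
void rk4_step(const Panel& p, std::vector<double>& v_local, double h,
              const std::vector<int>& counts, const std::vector<int>& displs) {
    std::vector<double> v(p.dim), k1, k2, k3, k4, tmp(v_local.size());

    allgather(v_local, v, counts, displs);
    panel_matvec(p, v, k1);                                   // k1 = A v

    for (std::size_t i = 0; i < tmp.size(); ++i) tmp[i] = v_local[i] + 0.5 * h * k1[i];
    allgather(tmp, v, counts, displs);
    panel_matvec(p, v, k2);                                   // k2 = A (v + h/2 k1)

    for (std::size_t i = 0; i < tmp.size(); ++i) tmp[i] = v_local[i] + 0.5 * h * k2[i];
    allgather(tmp, v, counts, displs);
    panel_matvec(p, v, k3);                                   // k3 = A (v + h/2 k2)

    for (std::size_t i = 0; i < tmp.size(); ++i) tmp[i] = v_local[i] + h * k3[i];
    allgather(tmp, v, counts, displs);
    panel_matvec(p, v, k4);                                   // k4 = A (v + h k3)

    for (std::size_t i = 0; i < v_local.size(); ++i)
        v_local[i] += h / 6.0 * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]);
}
```

The four MPI_Allgatherv calls per step are the "further MPI communications" that a good row partition (and ParMetis, in the sparse case) is meant to keep cheap.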