Large-Scale Simulation: Models, Algorithms, and Applications
Figure 2 shows that after modifying rvolumeconnect, variations in LFPs are on the order of machine precision. We note that repeatability is only guaranteed for a fixed number of CPU cores and a fixed partitioning of the neural network.
Figure 2. Comparison of LFPs between two simulations with identical inputs after modifying the rvolumeconnect function.
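One common reason that results can agree only to machine precision across different partitionings is that floating-point addition is not associative, so the order in which per-core partial sums are combined changes the last bits of the result. Whether this is the mechanism at work in PGENESIS is an assumption; the sketch below only illustrates the general effect. The function names and the three-value data set are hypothetical.

```python
def naive_sum(values):
    """Add values left to right, as a simple accumulation loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def partitioned_sum(partitions):
    """Sum each partition (one per hypothetical core), then add the partial sums."""
    partials = [naive_sum(p) for p in partitions]
    return naive_sum(partials)


# The same three contributions, split across cores in two different ways.
one_core = partitioned_sum([[0.1, 0.2, 0.3]])
two_cores = partitioned_sum([[0.1], [0.2, 0.3]])

# Re-running with the SAME partitioning reproduces the result bit for bit.
two_cores_again = partitioned_sum([[0.1], [0.2, 0.3]])

print(one_core == two_cores)        # partitionings differ in the last bits
print(two_cores == two_cores_again)  # fixed partitioning is repeatable
print(abs(one_core - two_cores))     # difference is at machine-precision level
```

An explicit loop is used instead of the built-in sum() because recent Python versions apply compensated summation to floats, which would hide the effect.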
This is demonstrated by the blue circles in Figure 3, which show the memory usage per core with increasing network size while maintaining 61 neurons per core and 2, connections per neuron. Despite a fixed number of neurons and connections per core, we observed a substantial increase in the memory usage per core. This became a significant hindrance to scalability when the memory usage per core surpassed the available memory per core, which is 3.
Beyond this network size, we were forced to reduce the number of cores per node so that each core could use a larger portion of the GB of shared memory. This significantly increased the computational cost. For example, the , neuron simulation in Figure 3 was partitioned onto 4, cores, but had a memory requirement of almost 22 GB per core, limiting the number of active cores per node to 5.
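The arithmetic behind this cost increase can be sketched as follows. The node memory (128 GB), cores per node (36), and requested core count (1,000) are hypothetical values chosen for illustration, not figures from the simulations; only the 22 GB per-core requirement comes from the text.

```python
import math


def effective_cores(cores_needed, mem_per_core_gb, node_mem_gb, cores_per_node):
    """Return (usable cores per node, nodes required, cores effectively reserved).

    A node is assumed to be allocated whole, so idle cores still count
    toward the reserved total.
    """
    usable = min(cores_per_node, int(node_mem_gb // mem_per_core_gb))
    if usable < 1:
        raise ValueError("a single core needs more memory than one node provides")
    nodes = math.ceil(cores_needed / usable)
    return usable, nodes, nodes * cores_per_node


# Hypothetical machine: 36 cores and 128 GB per node; 22 GB needed per core.
usable, nodes, reserved = effective_cores(
    cores_needed=1000, mem_per_core_gb=22, node_mem_gb=128, cores_per_node=36
)
print(usable, nodes, reserved)
```

Under these assumptions only 5 of the 36 cores on each node can be active, so the reserved core count is roughly seven times the number of cores doing work.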
With a limit of 5 cores per node, the 4, cores had to be divided amongst nodes, effectively requiring 29, cores. This discrepancy would only grow as the neural network increased in size, requiring more memory per core. A linear least-squares fit to the data is indicated by the solid blue line in Figure 3. Assuming that the linear growth in memory usage continues beyond , neurons, it is projected that by 1.
Figure 3. The symbols represent measurements from simulations. The solid lines represent linear fits to the data.
Dashed and dotted lines are the available memory on the Thunder system.
An initial attempt to fix the various memory leaks proved to be tremendously labor intensive. The Boehm-Demers-Weiser garbage collector (BDW-GC) replaces standard memory allocation calls and automatically recycles memory when it is found to be inaccessible. Without the garbage collector, a simulation of , neurons was computationally intractable because the large memory requirement limited the active CPU cores per node to two.
With the BDW-GC, the memory usage is reduced by two orders of magnitude, enabling full use of each node. A linear least-squares fit to the data is indicated by the solid red line in Figure 3. The plot suggests that the memory usage still grows linearly with increasing neural network size; however, the rate of growth has decreased dramatically. Furthermore, the limiting system size at which the memory requirements surpass GB would increase to over million neurons.
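The projection of a limiting system size from a linear fit can be sketched as below. The data points and the 100 GB memory limit are invented for illustration and are not the measurements plotted in Figure 3; the function names are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def limiting_size(slope, intercept, memory_limit):
    """Network size at which the fitted line reaches memory_limit, or None."""
    if slope <= 0:
        return None  # memory does not grow, so no limit is reached
    return (memory_limit - intercept) / slope


# Made-up measurements: network size (neurons) vs. memory per core (GB).
neurons = [100_000, 200_000, 400_000, 800_000]
memory_gb = [2.5, 4.5, 8.5, 16.5]

slope, intercept = linear_fit(neurons, memory_gb)
print(limiting_size(slope, intercept, memory_limit=100.0))
```

Such a projection is only as reliable as the assumption that the linear trend continues beyond the measured range.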
The added overhead with any garbage collector is always a concern. To assess the added cost of the BDW-GC, we compared the wall clock time for setup and to integrate 1 simulated second. The results in Figure 4 reveal that both setup time and simulation time are slower for large networks with the BDW-GC. The wall clock time for setup of the , neuron simulation increased by a factor of 4 with the BDW-GC. Whether this is an acceptable increase in time will depend on the user and simulation parameters.
For example, if simulating large networks for fractions of a second, the setup time will dominate and the cost of the BDW-GC will be significant.
However, this effect will be much smaller if simulating a large network for tens of seconds. We view the added cost of the BDW-GC to be an acceptable tradeoff given the savings achieved by utilizing all of the cores on the node and by enabling simulations that could not otherwise be performed, even if utilizing a single core per node.
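A simple cost model makes this tradeoff concrete: total wall clock time is the setup time plus the per-simulated-second cost multiplied by the simulated duration. The timings below are hypothetical; only the factor-of-4 increase in setup time echoes the text, and the 10% slowdown in integration is an assumption.

```python
def total_wall_time(setup_s, per_sim_second_s, simulated_seconds):
    """Total wall clock time for one run under a setup-plus-integration model."""
    return setup_s + per_sim_second_s * simulated_seconds


def gc_slowdown(simulated_seconds, base, with_gc):
    """Ratio of wall time with the garbage collector to wall time without it.

    base and with_gc are (setup_s, per_sim_second_s) tuples.
    """
    t_base = total_wall_time(base[0], base[1], simulated_seconds)
    t_gc = total_wall_time(with_gc[0], with_gc[1], simulated_seconds)
    return t_gc / t_base


# Hypothetical timings in seconds: (setup, cost per simulated second).
base = (100.0, 1000.0)
with_gc = (400.0, 1100.0)  # setup 4x slower, integration 10% slower (assumed)

for duration in (0.1, 1.0, 30.0):
    print(duration, round(gc_slowdown(duration, base, with_gc), 3))
```

With these numbers the slowdown is dominated by setup for a 0.1 s run and approaches the per-second ratio for a 30 s run.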
Figure 4. A systematic study would be required to determine the exact parameter space in which this error occurs. However, we did not observe this error when simulating up to 1.
The integer overflow is located in the buffer manager code, which contains variables of type short that are responsible for tracking remote messages. By changing the variable types to bit integers, the overflow issue was eliminated for all simulation parameters used in this work. The baseline model parameters were 2, synaptic connections per neuron and 10 Hz Poisson distributed spike train input. In the subsequent sections, we varied the connection density and spiking rate to explore the effects of these parameter changes on the performance of the modified version of PGENESIS.
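The failure mode of a narrow counter can be emulated in Python, whose integers do not overflow on their own. The sketch assumes a 16-bit short, which is typical on mainstream platforms, and two's-complement wraparound; in C, signed overflow is formally undefined behavior, so wraparound is only the commonly observed outcome. The 64-bit width used for comparison is an assumption chosen for illustration, and the function names are hypothetical rather than taken from the PGENESIS buffer manager.

```python
def wrap_signed(value, bits):
    """Interpret value as a two's-complement signed integer of the given width."""
    modulus = 1 << bits
    half = modulus >> 1
    return (value + half) % modulus - half


def count_messages(n_messages, bits):
    """Increment a fixed-width signed counter once per tracked message."""
    counter = 0
    for _ in range(n_messages):
        counter = wrap_signed(counter + 1, bits)
    return counter


# One message more than a 16-bit signed counter can represent.
n = 2 ** 15
print(count_messages(n, bits=16))  # counter has wrapped to a negative value
print(count_messages(n, bits=64))  # wider counter still holds the true count
```

A negative message count of this kind would corrupt any buffer indexing or bookkeeping that depends on it, which is consistent with an error that appears only beyond a certain network size.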
Wall clock times to integrate 1 simulation second, and corresponding weak scaling efficiencies, are presented in Figures 5A,B, respectively, for multiple network decompositions with varying number of neurons per core. For each partitioning, the weak scaling efficiency drops significantly between and 1, cores. However, even in this regime there are clear benefits to further parallelization.
Figure 5. (A) Wall clock time for simulations of neural networks with 2, synapses per neuron.
(B) Weak scaling efficiency, defined in Equation 1, for the timings in (A).
For this particular model and range of network sizes, we observe an approximate O N scaling of the minimum wall time with respect to the partitioning of neurons per core, where N is the total number of neurons in the system. This scaling is illustrated by the black dashed line in Figure 5A. While constant wall clock time would indicate ideal scalability, this is unrealistic for large neural networks given the non-local nature of synaptic connections.
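Equation 1 is not reproduced in this section, so the sketch below assumes the common definition of weak scaling efficiency: the wall clock time at the smallest core count divided by the wall clock time at a larger core count, with the work per core held fixed. The timings are hypothetical and the function name is illustrative.

```python
def weak_scaling_efficiency(timings):
    """Map core count to efficiency relative to the smallest core count.

    timings: dict of {cores: wall clock seconds}, measured with a fixed
    number of neurons per core so that problem size grows with cores.
    An efficiency of 1.0 means the wall time did not increase.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    return {cores: base_time / t for cores, t in sorted(timings.items())}


# Hypothetical wall clock times (seconds) for one simulated second.
timings = {16: 50.0, 64: 62.5, 256: 100.0}
for cores, eff in weak_scaling_efficiency(timings).items():
    print(cores, round(eff, 2))
```

Under this definition, constant wall clock time across core counts corresponds to an efficiency of 1.0, the ideal case described above.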
Yet the benefits of parallelization are clear when one considers that the best possible scaling when increasing the system size on a fixed number of cores is O(N), indicated by the black dotted line in Figure 5A. At that scaling rate, simulations of millions to tens of millions of neurons would be computationally intractable.
To investigate the effect of connection density on the time to simulate 1 s of neural activity, we increased the number of synaptic connections per neuron from 2, to 20, We emphasize that, despite the higher connection density, the spiking rate remained fixed due to the negligible synaptic weights as described in section Methodology.
Due to the higher connection density, the minimum network size was increased to 3, neurons. A comparison of wall clock times is shown in Figure 6A and reveals that simulation times increased significantly with a higher connection density. This increase is not surprising given the added cost associated with routing each spike to more local and remote neurons at the higher connection density.
Figure 6. (A) Wall clock time for simulations of neural networks with 2, and 20, synapses per neuron.
(B) Weak scaling efficiency, defined in Equation 1, for the timings in (A).
Figure 6B shows that the weak scaling efficiency improves with increasing connectivity. It is reasonable to expect that a higher connection density would lead to higher communication cost, reducing parallel efficiency. However, even at 2, connections per neuron, it is likely that neurons will need to communicate with neurons on most remote CPU cores.
Whether that spike must be communicated to one neuron or 10 neurons on the remote core has a limited effect on the communication costs.
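This saturation argument can be checked with a back-of-the-envelope estimate. The sketch assumes that each synaptic target is placed on a core uniformly and independently at random, which is an idealization and not a description of the connectivity used in the simulations; the core count and connection counts are illustrative.

```python
def fraction_of_cores_reached(connections_per_neuron, n_cores):
    """Expected fraction of cores hosting at least one target of a neuron.

    With uniformly random placement, a given core receives none of the
    K connections with probability (1 - 1/P)**K.
    """
    return 1.0 - (1.0 - 1.0 / n_cores) ** connections_per_neuron


n_cores = 1000  # illustrative core count
for k in (10, 1_000, 10_000):
    frac = fraction_of_cores_reached(k, n_cores)
    print(k, round(frac, 4), round(frac * n_cores))  # fraction, cores messaged
```

In this idealized model, a tenfold increase from thousands to tens of thousands of connections raises the number of cores that must receive each spike by well under a factor of two, whereas in the dilute limit the number of destination cores grows almost in proportion to the connection count.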
It is likely that in the dilute limit of connection density, a notable effect on the parallel efficiency would be observed. Yet in the biologically realistic regime of thousands to tens of thousands of synaptic connections per neuron, we observe a significant effect on the wall clock time, but not on parallel efficiency. We increased the spiking rate from The resulting wall clock times to integrate 1 simulated second are shown in Figure 7. These results are surprising, as we expected that both the frequency of communication and the amount of communicated data would increase, resulting in longer wall clock times for simulations.
A more thorough investigation of the communication algorithms employed in PGENESIS and of the networking configuration of the message passing interface (MPI) implementation is required to explain why the wall clock time is sensitive to connection density yet relatively insensitive to spiking rate.
Figure 7. Effect of spiking rate on the wall clock time to integrate 1 simulation second.
The solid lines correspond to the original 10 Hz Poisson distributed spike train inputs which cause an average spiking rate of The dashed lines correspond to Hz Poisson distributed spike train inputs which cause an average spiking rate of 47 Hz.
As illustrated in Figure 4, the setup time is insignificant for small neural networks, but can dominate the computational cost when scaling to large networks.