# A Spiking Neural P system simulator based on CUDA

### Francis George Cabarle<sup>1</sup>, **Henry Adorna**<sup>1</sup>, Miguel Martínez-del-Amor<sup>2</sup>

<sup>1</sup>Algorithms & Complexity Lab, Dept. of Computer Science, University of the Philippines Diliman *fccabarle@up.edu.ph, ha@dcs.upd.edu.ph*

<sup>2</sup>Research Group on Natural Computing, University of Seville, Spain *mdelamor@us.es*

25.August.2011

## What did we do?

### Our Share

- Implementation of SN P systems without delay on the GPU using (Py)CUDA, capitalizing on the fact that an SN P system without delay can be represented by a matrix, and that its operation is implementable by matrix operations.
- As expected, the implementation based on CUDA performs better than the one implemented on the CPU.



### Introduction

Main Goal Early Suggestions SN P System and its Simulator Our Suggestion Simulation Results Conclusion

Motivation Simulators My first CMC paper . . .

### Membrane Computing



Symbolic representation: $[_1 [_2 ]_2 [_3 ]_3 [_4 [_5 ]_5 [_6 [_8 ]_8 [_9 ]_9 ]_6 [_7 ]_7 ]_4 ]_1$


### Implementation ... anyone?







(ref: digstuffs.com)




### Early efforts ... (via P Page) since 2002 ....

- January 2002: Transition P Systems Simulator, by Angel Baranda, Natural Computing Group of Madrid, Spain.
- May 2002: A Membrane Systems Simulator, by Gabriel Ciobanu and Dorin Paraschiv.
- September 2003: SubLP-Studio v0.1, by Alexandros Georgiou, University of Sheffield, UK.
- October 2004: SimCM, by M. Isabel Nepomuceno Chamorro, Natural Computing Group, University of Sevilla, Spain.
- November 2004: PSim, by Group for Models of Natural Computing (MNC), University of Verona, Italy.
- September 2005: Simulation Software for Membrane Approximation Algorithm, by T. Nishida, Toyama Prefectural University, Japan.
- March 2006: Two simulators (Vibrio Fischeri and Dynamical Probabilistic P systems), by P. Cazzaniga, D. Pescini, Università di Milano-Bicocca, Milan, Italy.
- July 2006: Cyto-Sim, a biological compartment simulator, Microsoft Research - University of Trento Centre for Computational and Systems Biology, Trento, Italy.
- August 2006: Simulators for conformon P systems, by Pierluigi Frisco.
- November 2006: Simulators for biological processes available at the University of Sheffield, UK.
- April 2007: Spiking Neural P Systems Simulator, by M.A. Gutierrez Naranjo and D. Ramirez Martinez, University of Sevilla, Spain.
- March 2009: MetaPlab, a virtual laboratory for modeling biological systems by MP systems, University of Verona, Italy.


### Primary motivation of this work ....



### Sevilla, (2009) Xiangxiang Zeng (SN P System), Miguel Martinez-del-Amor (GPGPU with CUDA), myself

Xiangxiang Zeng, Henry Adorna, Miguel A. Martínez-del-Amor, Linqiang Pan, Mario J. Perez-Jimenez. Matrix Representation of Spiking Neural P Systems. Int. Conf. on Membrane Computing 2010.



CMC12-2011, UPEC, 23-26 August 2011


**Motivating Questions** 

### Question

- How do we implement the parallelism in P Systems, in particular SN P Systems?
- Is there hardware capable of implementing the parallelism of P Systems?



Previous works Our initial share . . .

### Since 2004

### G. Ciobanu, G. Wenyuan:

P Systems Running on a Cluster of Computers. Lecture Notes in Computer Science, 2933, 123-139, 2004.

### Van Nguyen, David Kearney, Gianpaolo Gioiosa:

- Balancing Performance, Flexibility, and Scalability in a Parallel Computing Platform for Membrane Computing Applications. Workshop on Membrane Computing 2007: 385-413
- An Implementation of Membrane Computing Using Reconfigurable Hardware. Computing and Informatics 27(3+): 551-569 (2008)
- An Algorithm for Non-deterministic Object Distribution in P Systems and Its Implementation in Hardware. Workshop on Membrane Computing 2008: 325-354
- A Region-Oriented Hardware Implementation for Membrane Computing Applications. Workshop on Membrane Computing 2009: 385-409

### Jose M. Cecilia, Jose M. García, Gines D. Guerrero, Miguel A. Martinez-del-Amor, Ignacio Perez-Hurtado, Mario J. Perez-Jimenez:



Simulating a P system based efficient solution to SAT by using GPUs. J. Log. Algebr. Program. 79(6): 317-325 (2010)






### SN P Systems without delay via GPU

Matrix representation:

- Configuration vector ($C_k$): $C_0 = (2, 1, 1)$
- Spiking vectors ($S_k$): $(1, 0, 1, 1, 0)$, $(0, 1, 1, 1, 0)$
- Spiking transition matrix ($M_{\Pi}$)

Next configuration is calculated by: $C_{k+1} = C_k + S_k M_{\Pi}$

Matrix operations are very optimized on the GPU.

A GPU based simulator for SNP systems.



F. Cabarle, H. Adorna, M.A. Martínez-del-Amor. Simulating Spiking Neural P System without Delay using GPU. 9th BWMC.

SN P system Simulator

### SN P System

Spiking Neural P (SNP) systems: directed graph inspired by **neurons** connected by axons w/ synapses [Ionescu et al. 2006]



(ref:heatonresearch.com)




## SN P Systems

An SNP system is a construct of the form:

 $\Pi = (O, \sigma_1, \ldots, \sigma_m, syn, in, out),$ 

1. $O = \{a\}$ is the alphabet, consisting of a single object, the spike $a$.

2. $\sigma_1, \ldots, \sigma_m$ are $m$ neurons of the form $\sigma_i = (n_i, R_i)$, $1 \le i \le m$, where:

   - a) $n_i \ge 0$ is the initial number of spikes $a$ in neuron $\sigma_i$;
   - b) $R_i$ is a finite set of rules of two forms:
     - (b-1) Spiking rules $E/a^c \to a$, where $E$ is a regular expression over $a$ and $c \ge 1$.
     - (b-2) Forgetting rules $a^s \to \lambda$, $s \ge 1$, such that $a^s \notin L(E)$ for each rule $E/a^c \to a$ of type (b-1) in $R_i$.

3. $syn \subseteq \{(i,j) \mid 1 \le i, j \le m, i \ne j\}$ is the set of synapses between neurons.

4. $in, out \in \{1, 2, \ldots, m\}$ indicate the input and output neurons.


### From Seville and India Group

- (April 2007) Spiking Neural P Systems Simulator, by M.A. Gutierrez Naranjo and D. Ramirez Martinez, University of Sevilla, Spain.
- (2011, CMC 12) Simulation of Spiking Neural P Systems Using Pnet Lab, by V.P. Metta, K. Krithivasan, D. Garg, India
- (2011, CMC 12) P Lingua based Simulator for Spiking Neural P Systems, L.F. Macias-Ramos, et al., University of Sevilla, Spain.




### **SN P Matrix Representation**

Spiking transition matrix  $M_{SNP}$  is a matrix comprised of  $a_{ij}$  elements where  $a_{ij}$  is given as

### Definition

$$a_{ij} = \begin{cases} -c, & \text{rule } r_i \text{ is in } \sigma_j \text{ and is applied consuming } c \text{ spikes;} \\ p, & \text{rule } r_i \text{ is in } \sigma_s \text{ } (s \neq j \text{ and } (s, j) \in syn) \\ & \text{and is applied producing } p \text{ spikes in total;} \\ 0, & \text{rule } r_i \text{ is in } \sigma_s \text{ } (s \neq j \text{ and } (s, j) \notin syn). \end{cases}$$




### An SN P Matrix

- Configuration vector $C_k$: $C_0 = \langle 2, 1, 1 \rangle$.
- Spiking vector $S_k$: $S_0 = \langle 1, 0, 1, 1, 0 \rangle$, $S'_0 = \langle 0, 1, 1, 1, 0 \rangle$.
- Spiking transition matrix:

$$M_{\Pi_1} = \begin{pmatrix} -1 & 1 & 1 \\ -2 & 1 & 1 \\ 1 & -1 & 1 \\ 0 & 0 & -1 \\ 0 & 0 & -2 \end{pmatrix}$$
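The definition of $a_{ij}$ can be read as a construction procedure. The following is a minimal sketch in plain Python, not the simulator's code. The rule list (owner neuron, spikes consumed, spikes produced) and the synapse set are assumptions, chosen only so that the result reproduces the matrix $M_{\Pi_1}$ above; the function name `build_transition_matrix` is hypothetical.

```python
def build_transition_matrix(rules, synapses, num_neurons):
    """Build the spiking transition matrix, one row per rule.

    rules: list of (owner, consumed, produced), neurons numbered from 1.
    synapses: set of (source, target) pairs.
    """
    matrix = []
    for owner, consumed, produced in rules:
        row = []
        for j in range(1, num_neurons + 1):
            if j == owner:
                row.append(-consumed)      # rule is in sigma_j: consumes c spikes
            elif (owner, j) in synapses:
                row.append(produced)       # synapse (s, j): sigma_j receives p spikes
            else:
                row.append(0)              # no synapse from the owner to sigma_j
        matrix.append(row)
    return matrix


# Assumed description of Pi_1 (hypothetical, inferred from the matrix):
# rules r1, r2 in neuron 1, r3 in neuron 2, r4 and r5 in neuron 3.
rules_pi1 = [(1, 1, 1), (1, 2, 1), (2, 1, 1), (3, 1, 1), (3, 2, 0)]
synapses_pi1 = {(1, 2), (1, 3), (2, 1), (2, 3)}

M_PI1 = build_transition_matrix(rules_pi1, synapses_pi1, num_neurons=3)
for row in M_PI1:
    print(row)
```

Each row depends only on one rule and the synapse set, so the rows can be built independently of each other.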


### Next Configuration representation

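- Next configuration: $C_{k+1} = C_k + S_k \cdot M_{\Pi}$.
- $S_0'' = \langle 1, 1, 1, 1, 0 \rangle$ is an *invalid* $S_k$: it selects both rules of neuron $\sigma_1$ (rows 1 and 2 of $M_{\Pi_1}$), but only one rule per neuron is used in each step.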
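As a worked illustration of the formula, here is a small CPU-only sketch in plain Python. The matrix is $M_{\Pi_1}$ from the previous slide; the `RULE_OWNER` list is an assumption read off the negative entry of each matrix row, and the function names are hypothetical. The check below covers only the one-rule-per-neuron condition, not rule applicability.

```python
# Spiking transition matrix of Pi_1 (5 rules x 3 neurons).
M_PI1 = [
    [-1, 1, 1],
    [-2, 1, 1],
    [1, -1, 1],
    [0, 0, -1],
    [0, 0, -2],
]

# Assumption: neuron (0-based) that owns each rule, i.e. the column
# holding the negative entry of the corresponding row.
RULE_OWNER = [0, 0, 1, 2, 2]


def uses_one_rule_per_neuron(spiking_vector, rule_owner=RULE_OWNER):
    """Return True if no neuron has more than one selected rule."""
    used = set()
    for selected, owner in zip(spiking_vector, rule_owner):
        if selected:
            if owner in used:
                return False
            used.add(owner)
    return True


def next_configuration(config, spiking_vector, matrix=M_PI1):
    """Compute C_{k+1} = C_k + S_k * M."""
    return [
        c + sum(s * row[j] for s, row in zip(spiking_vector, matrix))
        for j, c in enumerate(config)
    ]


C0 = [2, 1, 1]
print(next_configuration(C0, [1, 0, 1, 1, 0]))
print(next_configuration(C0, [0, 1, 1, 1, 0]))
print(uses_one_rule_per_neuron([1, 1, 1, 1, 0]))
```

With the vectors above, $S_0$ leads to $C_1 = \langle 2, 1, 2 \rangle$ and $S'_0$ leads to $C_1 = \langle 1, 1, 2 \rangle$, while $S_0''$ is rejected.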



GPU computing: CUDA SN P Systems in GPU Algorithm Simulation Flow Diagram

### Our Suggestion: GPU Computing

## GPGPU: techniques for using the GPU as a massively parallel co-processor.



### Host: the CPU vs Device: the GPU


### CPU vs. GPU



Figure: General CPU vs. general GPU architecture [NVIDIA corp. 2011].


Why GPU?


### GPUs are the leading exemplars of modern high throughput-oriented architectures [Garland et al, 2010].

- GPUs have been successfully used to speed up many parallel applications.
- Modern GPUs are not limited to graphics processing, as the first graphics cards were; they can now be used for general purpose computations [Harris, 2005], and they are now multi-core and data-parallel processors [Kirk and Hwu, 2010].



Why GPU?


### With GPGPU (General Purpose computation on the GPU), a programmer can achieve, with a single GPU, a throughput similar to that of a CPU-based cluster [NVIDIA, 2010; Harris, 2010]

- The main advantages of using GPUs are their low cost, low maintenance and low power consumption relative to conventional parallel clusters and setups, while providing comparable or improved computational power.
- Moreover, parallel computing concepts such as hardware abstraction, scaling, and so on are handled efficiently by current GPUs.




## Parallel Computing w/ GPUs

- Compute Unified Device Architecture (CUDA) by NVIDIA in 2006
- Arch + programming model, extends ANSI C





Figure: Graphics card w/ NVIDIA CUDA enabled GPU.




### **CUDA** Processing Flow






### **Remarks on the Computation**

- The code to be executed in a GPU is written in CUDA C (CUDA extended ANSI C programming language).
- Execution units (threads) in CUDA are organized into thread blocks, and the blocks are in turn organized into a grid of (thread) blocks. Each grid belongs to a single device (a single GPU).
- Each device has multiple cores, each capable of running its own *block of threads*.




### CUDA model

CUDA uses the single program, multiple data (SPMD) parallel paradigm [Kirk, Hwu 2010].






### **Remarks on the Computation**

- A function known as a *kernel function* is one that is called from the host but executed in the device.
- GPUs with the same architecture as the one used in this work have a maximum of 512 threads per block.
- The maximum size of each dimension of a thread block is (512 x 512 x 64), for the x, y, and z dimensions of a block respectively.
- Lastly, the maximum size of each dimension of a grid of thread blocks is (65535 x 65535 x 1) for the grid's x, y, and z dimensions.
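These limits determine how a vector of a given length can be spread over threads and blocks. The sketch below is plain Python (no GPU needed) and only illustrates the arithmetic, assuming a one-dimensional layout with one thread per vector element. The function name `launch_config` and the returned tuple shapes are assumptions; the tuples follow the `(x, y, z)` block and `(x, y)` grid layout that PyCUDA kernel calls take as `block=` and `grid=`.

```python
MAX_THREADS_PER_BLOCK = 512   # limit quoted above for this GPU architecture
MAX_GRID_DIM_X = 65535        # limit quoted above for the grid's x dimension


def launch_config(num_elements, threads_per_block=MAX_THREADS_PER_BLOCK):
    """Return (block, grid) for a 1D launch with one thread per element."""
    if num_elements <= 0:
        raise ValueError("num_elements must be positive")
    if not 1 <= threads_per_block <= MAX_THREADS_PER_BLOCK:
        raise ValueError("threads_per_block exceeds the per-block limit")
    # Ceiling division: enough blocks to cover every element.
    blocks = (num_elements + threads_per_block - 1) // threads_per_block
    if blocks > MAX_GRID_DIM_X:
        raise ValueError("too many blocks for a 1D grid; a 2D grid is needed")
    return (threads_per_block, 1, 1), (blocks, 1)


block, grid = launch_config(1000)
print(block, grid)
```

Because the last block may be only partly used, a kernel launched this way would normally guard its work with a bounds check on the global thread index.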




### Remark: Matrix & parallel hardware

- Matrix operations are highly optimized on parallel hardware, including GPUs [Fatahalian et al. 2004].
- GPU simulations of SNP systems using their matrix representations seem more *natural*.




## **GPU Simulation Consideration**

- Input files: file versions of C<sub>k</sub>, S<sub>k</sub>, M<sub>SNP</sub>, and a file r (with the list of rules R<sub>i</sub>.)
- String manipulation: an OOPL such as Python is suited. PyCUDA, a Python programming interface to CUDA, was chosen in order to fully utilize the speedup of CUDA while minimizing development time.<sup>1</sup>
- Computations involving linear algebra: C programming language (which NVIDIA extended for their purposes as CUDA C) is suited.

<sup>1</sup> PyCUDA was developed by mathematician Andreas Klöckner for more efficient parallel computing on CUDA using Python: safer in terms of memory handling and object cleanup (among others), and faster in terms of development time (via abstractions etc.).

In actuality, only the kernel functions are written in C, and those functions are embedded within the Python code.




## Simulation notes

- SNP systems w/o delays,  $M_{\Pi}$  in row-major order.
- Non-determinism: produce all possible and valid S<sub>k</sub>'s from given C<sub>k</sub>'s.
- Use *PyCUDA* for handling of characters + reg exp, *C* for integral computations.
- PyCUDA is a Python wrapper for CUDA, used in HPC [Klöckner 2009].
- $C_k$ ,  $S_k$ ,  $M_{\Pi}$  as mutable PyCUDA *lists*, not strings.
- 2 stopping criteria:
  - Zero C<sub>k</sub>
  - Repetition of C<sub>k</sub>'s
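The non-determinism step (produce all valid $S_k$'s from a given $C_k$) can be sketched on the host side with Python's `re` and `itertools` modules. This is an illustration under stated assumptions, not the simulator's code: the rule tuples for $\Pi_1$ (owner neuron, regular expression $E$, spikes consumed) are inferred from the matrix and the spiking vectors shown earlier, and the function names are hypothetical.

```python
import re
from itertools import product

# Assumed rules of Pi_1: (owner neuron, 0-based; regular expression E over
# 'a'; spikes consumed). The last entry plays the role of a forgetting rule.
RULES_PI1 = [
    (0, "aa", 1),
    (0, "aa", 2),
    (1, "a", 1),
    (2, "a", 1),
    (2, "aa", 2),
]


def applicable_rules(config, rules):
    """Per neuron, list the indices of rules whose E matches its spikes."""
    per_neuron = [[] for _ in config]
    for idx, (owner, expr, consumed) in enumerate(rules):
        spikes = "a" * config[owner]
        if config[owner] >= consumed and re.fullmatch(expr, spikes):
            per_neuron[owner].append(idx)
    return per_neuron


def valid_spiking_vectors(config, rules):
    """Enumerate spiking vectors: one applicable rule per neuron that can fire."""
    choices = [c for c in applicable_rules(config, rules) if c]
    if not choices:
        return []                      # no rule applicable: nothing to fire
    vectors = []
    for selection in product(*choices):
        vector = [0] * len(rules)
        for idx in selection:
            vector[idx] = 1
        vectors.append(vector)
    return vectors


print(valid_spiking_vectors([2, 1, 1], RULES_PI1))
```

For $C_0 = \langle 2, 1, 1 \rangle$ this yields exactly the two spiking vectors $S_0$ and $S'_0$ given earlier, since only neuron $\sigma_1$ has a choice between two rules.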


## Simulation notes

1. Host/CPU side (Python/C) → Read inputs (C<sub>0</sub>/C<sub>k</sub>'s, S<sub>k</sub>'s, M<sub>Π</sub>), write and calculate next S<sub>k</sub>'s, C<sub>k+1</sub>'s. (*decision making*)
2. Device/GPU side (CUDA) → Matrix addition + multiplication (*outsourced parallel work*), i.e. perform $(C_{k+1} = C_k + S_k \cdot M_{\Pi})$ in parallel.




## Simulation Algorithm

Overview:

**Inputs**: $C_0$ ($C_k$'s afterwards), $M_{\Pi}$, $r$ (rule file)

**Outputs**: All valid+possible $S_k$'s, $C_k$'s.

- I. Load inputs (Host).
- II. Compute all possible+valid $S_k$'s using $C_k$'s (Host).
- III. Per $S_k$, compute next $C_k$ (**Device**).
- IV. Repeat I, II and III until at least one stopping criterion is satisfied (Host/Device).
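The overall loop can be sketched end to end in plain Python. Note the substitution: step III, which the simulator runs on the device, is computed here on the CPU so that the sketch runs without a GPU. The rules of $\Pi_1$ are assumptions inferred from its matrix, the breadth-first exploration order is a choice made for this sketch, and all function names are hypothetical.

```python
import re
from collections import deque
from itertools import product

# Assumed rules of Pi_1: (owner neuron, regular expression E, spikes consumed).
RULES = [(0, "aa", 1), (0, "aa", 2), (1, "a", 1), (2, "a", 1), (2, "aa", 2)]
MATRIX = [[-1, 1, 1], [-2, 1, 1], [1, -1, 1], [0, 0, -1], [0, 0, -2]]


def spiking_vectors(config):
    """Step II (host): all valid spiking vectors for one configuration."""
    per_neuron = [[] for _ in config]
    for idx, (owner, expr, consumed) in enumerate(RULES):
        if config[owner] >= consumed and re.fullmatch(expr, "a" * config[owner]):
            per_neuron[owner].append(idx)
    choices = [c for c in per_neuron if c]
    if not choices:
        return []
    return [
        [1 if i in selection else 0 for i in range(len(RULES))]
        for selection in product(*choices)
    ]


def next_config(config, vector):
    """Step III (device in the simulator, CPU here): C + S * M."""
    return tuple(
        c + sum(s * row[j] for s, row in zip(vector, MATRIX))
        for j, c in enumerate(config)
    )


def simulate(initial):
    """Explore configurations until every branch hits a stopping criterion."""
    start = tuple(initial)
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        config = queue.popleft()
        if not any(config):            # stopping criterion: zero configuration
            continue
        for vector in spiking_vectors(config):
            successor = next_config(config, vector)
            if successor not in seen:  # stopping criterion: repeated C_k
                seen.add(successor)
                order.append(successor)
                queue.append(successor)
    return order


reached = simulate([2, 1, 1])
print(len(reached), reached)
```

Under these assumptions the exploration from $C_0 = \langle 2, 1, 1 \rangle$ terminates after reaching 8 distinct configurations, every branch ending either in a repeated configuration or in one where no rule is applicable.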




### Simulation Flow Diagram



Figure: Diagram showing the simulation flow, with the host and device


Simulation runs

### Machine model specification

- *snpgpu-sim3* simulated Π<sub>1</sub> and Π<sub>2</sub> on an Apple iMac running Mac OS X 10.5.8, with an Intel Core2Duo CPU at 2.66GHz and a 6MB L2 cache.
- The GPU of the iMac is an NVIDIA GeForce 9400 graphics card at 1.15 GHz, with 256 MB of video RAM (around 266x10<sup>6</sup> bytes), 16 cores, running CUDA version 3.1.




### Simulation results for $\Pi_1$





Simulation results for  $\Pi_1$  using different  $C_k$  values.

Speedup is $1.4\times$.




### Simulation results for $\Pi_2$



Simulation results for  $\Pi_2$  using different  $C_k$  values.

Speedup is $6.8\times$.



SNP system $\Pi_2$ from [Ionescu et al. 2006].

14 rules and 9 neurons, 14  $\times$  9 matrix


### Note:

Max number of neurons allowable in current setup:

$$|C_k| = \frac{266 \times 10^6 \text{ bytes}}{16 \text{ bytes} + 4 \text{ bytes} \cdot |R|},$$

where

- Simulation requirements: *GbMem* = 4 · *sizeof*(*C<sub>k</sub>*) + *sizeof*(*M*<sub>Π</sub>), using the standard *C* language *sizeof*() function (the *int* type is 4 bytes).
- Max global memory (*GbMem*) of the GPU used: $266 \times 10^6$ bytes.
- $M_{\Pi}$ is $|R| \times |C_k|$ (total rules by total neurons).

With 4-byte integers, *sizeof*(*C<sub>k</sub>*) is $4|C_k|$ bytes and *sizeof*(*M*<sub>Π</sub>) is $4|R||C_k|$ bytes, so *GbMem* $= |C_k| \cdot (16 + 4|R|)$ bytes.

The simulation limit is a function of $|R|$ and $|C_k|$.
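To make the bound concrete, the formula can be evaluated for the two rule counts mentioned in these slides (5 rules for $\Pi_1$, 14 for $\Pi_2$). This is a small plain-Python sketch; the function name `max_neurons` is hypothetical and the result is rounded down to a whole number of neurons, which is an assumption of this sketch.

```python
GLOBAL_MEMORY_BYTES = 266 * 10**6   # global memory figure quoted above
INT_BYTES = 4                       # size of the C int type


def max_neurons(num_rules, global_memory=GLOBAL_MEMORY_BYTES):
    """Largest |C_k| such that 4*sizeof(C_k) + sizeof(M) fits in memory."""
    # Per neuron: 4 copies of one int entry of C_k, plus one column of M
    # (one int per rule).
    bytes_per_neuron = 4 * INT_BYTES + INT_BYTES * num_rules
    return global_memory // bytes_per_neuron


for num_rules in (5, 14):
    print(num_rules, max_neurons(num_rules))
```

The bound shrinks as the number of rules grows, because each additional rule adds one more integer per neuron to $M_{\Pi}$.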

## What do we get?

In this paper:

- We presented *snpgpu-sim3*, which can now simulate SNP systems with regular expressions (rules of the form (b-1)).
- The speedup of *snpgpu-sim3* over *snpcpu-sim* reached 1.4 times for Π<sub>1</sub> and 6.8 times for Π<sub>2</sub>.
- These results show that SNP system simulation can benefit from the parallel architecture of GPUs, and that increasing the parameters (of the GPU) offers even larger speedups.
- This benefit in speedup is coupled with the fact that CUDA-enabled graphics cards are readily available.
- These cards offer boosts in general purpose computations as co-processors of commonly used CPUs, at a fraction of the power consumption of CPU clusters.



### What do we do next?

- Improve parallelism . . .
- SNP system variants can be simulated by extending the current GPU simulator.
- Build a generic P system parser, based on P-Lingua formatting, for the GPU-based SNP system simulator.



## Thank You for Your Attention







1. Introduction
   - Motivation
   - Simulators
   - My first CMC paper ...
2. Main Goal
   - Motivating Questions
3. Early Suggestions
   - Previous works
   - Our initial share ...
4. SN P System and its Simulator
   - SN P system
   - Simulator
5. Our Suggestion
   - GPU computing: CUDA
   - SN P Systems in GPU
   - Algorithm
   - Simulation Flow Diagram




