# ToUCH Modules Implementation at CENG IZTECH

Işıl Öz, Computer Engineering Department, Izmir Institute of Technology (IZTECH), Turkey

23.07.2021




#### Undergraduate Curriculum

| DEPARTMENT | COURSE CODE | COURSE NAME            | HOURS | PREREQUISITES | CREDITS     | ECTS |
|------------|-------------|------------------------|-------|---------------|-------------|------|
| CENG       | 311         | Computer Architecture  | (3+2) | CENG 214      | 4           | 8    |
| CENG       |             | Information Management | (3+0) |               | 3           | 7    |
| CENG       | 323         | Project Management     | (3+0) |               | 3           | 8    |
|            |             | Technical Elective I   | (3+0) |               | 3           | 5    |
|            |             | Non Technical Elective | (3+0) |               | 3           | 3    |
|            |             |                        |       |               | Total ECTS: | 31   |

Total ECTS: 261


#### **Course Overview**

- Textbook: *Computer Organization and Design: The Hardware/Software Interface* by Patterson and Hennessy, 5th Edition (MIPS).
- Topics: CPU performance, MIPS assembly, arithmetic operations, processor design, pipelining, memory/cache, cache performance
- Assignments: MIPS assignment on the SPIM simulator, CPU design project in Verilog on ModelSim, cache performance assignment with the Intel Pin cache simulator tool
- No parallelism concepts are covered

| WEEK | LECTURE           | LAB                              | ASSIGNMENTS / EXAMS   |
|------|-------------------|----------------------------------|-----------------------|
| 1    | Introduction      | C                                |                       |
| 2    | Performance       | C                                | HW1 (Performance) out |
| 3    | MIPS              | MIPS/SPIM                        |                       |
| 4    | MIPS              | MIPS examples                    | HW1 deadline          |
| 5    | MIPS              | MIPS examples                    | HW2 (MIPS) out        |
| 6    | Arithmetic        | Arithmetic Questions             |                       |
| 7    | Arithmetic        | Recitation (Midterm Preparation) | HW2 deadline          |
| 8    | Midterm           | NO LAB                           |                       |
| 9    | Processor         | Processor/Verilog                |                       |
| 10   | Processor         | Processor/ModelSim               | HW3 (Processor) out   |
| 11   | Pipelining        | Pipelining Questions             |                       |
| 12   | Memory            | Cache Simulator                  | HW3 deadline          |
| 13   | Cache             | Cache                            |                       |
| 14   | Cache Performance |                                  | HW4 (Cache) out       |
|      |                   |                                  | HW4 deadline          |
|      |                   |                                  | Final                 |

### **ToUCH Modules**

- [C1] Introduction to ARM
- [D1] Introduction to CUDA Programming
- [C2] GPU Memory Hierarchy

#### [C1] Introduction to ARM

- "Introduction to ARM" lecture
  - ARM vs. MIPS
- "Intro to ARM Thumb" lecture
  - Discuss tradeoffs
- "Intro to ARM NEON" lecture
  - Introduce SIMD
  - Coprocessor/heterogeneous architecture
- No labs or related assignments are planned, but the code samples will be shown
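To make the SIMD idea from the NEON lecture concrete, a small C sample along the following lines could be shown in class. This is only a sketch and not taken from the ToUCH material: the function name `mul_f32` is hypothetical, and the NEON path is compiled only when the compiler defines `__ARM_NEON`; on other targets the plain scalar loop does all the work.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>   /* NEON intrinsics, available only on ARM targets */
#endif

/* Element-wise product c[i] = a[i] * b[i] for n floats.
 * mul_f32 is a hypothetical name used for illustration. */
void mul_f32(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;
    assert(a && b && c);

#if defined(__ARM_NEON)
    /* SIMD path: one instruction multiplies four float lanes at once. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);   /* load 4 floats from a */
        float32x4_t vb = vld1q_f32(b + i);   /* load 4 floats from b */
        vst1q_f32(c + i, vmulq_f32(va, vb)); /* multiply and store   */
    }
#endif

    /* Scalar path: handles the leftover elements on ARM,
     * and every element on targets without NEON. */
    for (; i < n; ++i)
        c[i] = a[i] * b[i];
}
```

Placing the vector loop next to the scalar loop lets students see that SIMD changes how many elements each instruction processes, not what is computed.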

#### [D1] Introduction to CUDA Programming

- Scheduled after CPU design and pipelining
- Introduction to parallel architectures (multicores, then GPUs)
- Some material (mostly figures) from my CUDA course for the architectural concepts
- CUDA lecture from ToUCH
  - Show and run the vectorAdd code from the lecture; compare its performance with the pthread/OpenMP versions
- Talk about why deep learning is so successful :)
  - Modern ML is just high performance computing!
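For the CPU side of the vectorAdd comparison, a baseline could look like the sketch below. It is not the ToUCH lecture code: the function names `vector_add_seq` and `vector_add_omp` are hypothetical, and the OpenMP version assumes the code is built with OpenMP enabled (for example `-fopenmp` with GCC or Clang). Without that flag the pragma is ignored and the loop runs sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential baseline: c[i] = a[i] + b[i]. (Hypothetical name.) */
void vector_add_seq(const float *a, const float *b, float *c, int n)
{
    assert(a && b && c && n >= 0);
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

/* OpenMP version: the loop iterations are independent, so they can be
 * divided among the threads of a multicore CPU. (Hypothetical name.) */
void vector_add_omp(const float *a, const float *b, float *c, int n)
{
    assert(a && b && c && n >= 0);
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}
```

In the CUDA version each loop iteration becomes one GPU thread, so the same index expression `c[i] = a[i] + b[i]` appears in all three variants, which makes the performance comparison easy to explain.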

#### [C2] GPU Memory Hierarchy

- "Improved performance through GPU shared memory" lecture
- Tiled matrix multiplication code in C is already used to show cache performance
- Tiled matrix multiplication CUDA code utilizing shared memory
- Show and run both codes in the lab session and compare their performance
- Show the benefit of directly managing the faster memory
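The C side of this comparison can be sketched as follows. This is an illustrative version and not the actual course code: the names `matmul_naive` and `matmul_tiled` and the tile size `TILE = 32` are assumptions. Matrices are square, `n x n`, stored in row-major order.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 32  /* assumed tile edge; tune so the working tiles fit in cache */

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* Reference version: C = A * B with the classic triple loop. */
void matmul_naive(const double *A, const double *B, double *C, size_t n)
{
    assert(A && B && C);
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Tiled version: same result, but the loops walk TILE x TILE blocks so
 * that each block of A, B and C is reused while it is still in cache. */
void matmul_tiled(const double *A, const double *B, double *C, size_t n)
{
    assert(A && B && C);
    for (size_t i = 0; i < n * n; ++i)
        C[i] = 0.0;

    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE)
                /* Multiply one block of A by one block of B. */
                for (size_t i = ii; i < min_sz(ii + TILE, n); ++i)
                    for (size_t k = kk; k < min_sz(kk + TILE, n); ++k) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < min_sz(jj + TILE, n); ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

On the CPU the cache hardware decides which tile stays close to the core. In the CUDA version the programmer copies each tile into `__shared__` memory explicitly, which is the contrast the lab is meant to highlight.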

| WEEK | LECTURE                           | LAB                              | ASSIGNMENTS / EXAMS   |
|------|-----------------------------------|----------------------------------|-----------------------|
| 1    | Introduction                      | C                                |                       |
| 2    | Performance                       | C                                | HW1 (Performance) out |
| 3    | MIPS                              | MIPS/SPIM                        |                       |
| 4    | MIPS                              | MIPS examples                    | HW1 deadline          |
| 5    | MIPS                              | MIPS examples                    | HW2 (MIPS) out        |
| 6    | ARM [C1]                          | Recitation (Midterm Preparation) |                       |
| 7    | Midterm                           | NO LAB                           | HW2 deadline          |
| 8    | Processor                         | Processor/Verilog                |                       |
| 9    | Processor                         | Processor/ModelSim               | HW3 (Processor) out   |
| 10   | Pipelining                        | Pipelining Questions             |                       |
| 11   | Parallelism [D1]                  | OpenMP/CUDA                      | HW3 deadline          |
| 12   | Memory                            | Cache Simulator                  |                       |
| 13   | Cache                             | Cache                            |                       |
| 14   | Cache / GPU SM Performance [C2]   |                                  | HW4 (Cache) out       |
|      |                                   |                                  | HW4 deadline          |
|      |                                   |                                  | Final                 |

THANK YOU!