Multimedia Signal Processing on Processors with Many Cores

                             Dr. Yen-Kuang Chen (陳彥光)

Principal Engineer at Microprocessor Technology Lab., Intel

 

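Date: Friday, December 28, 2007
Venue: International Conference Hall, 8F, General Building II, National Tsing Hua University
Organizers: IC Design Technology Center, National Tsing Hua University
     Ministry of Economic Affairs Technology Development Program for Academia, "Research and Development of Advanced High-Performance, Low-Power Dual-Processor System Technology"
Co-organizers: Department of Computer Science, National Tsing Hua University; Electronics and Information Research Center, National Chiao Tung University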

 

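General Chair: Prof. 李政崑, Department of Computer Science, National Tsing Hua University
Program Chair: Prof. 賴尚宏, Department of Computer Science, National Tsing Hua University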


                    Agenda

Time         Session      Topic                                                        Speaker             Session Chairman
8:50-9:20    Registration
9:20-9:30    Opening
9:30-10:30   1st session  Basic introduction to multi-core architecture                Dr. Yen-Kuang Chen  Prof. 賴尚宏, Dept. of Computer Science, NTHU
10:30-10:50  Coffee Break
10:50-12:20  2nd session  Theory and principles in parallelization                     Dr. Yen-Kuang Chen  Prof. 李政崑, Dept. of Computer Science, NTHU
12:20-14:00  Lunch Break
14:00-15:00  3rd session  Advanced optimization techniques in multi-core architecture  Dr. Yen-Kuang Chen  Manager 曾紹崟, ITRI STC

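Fees: 1. Registration and payment on or before December 25: students and teachers NT$500; non-profit foundations NT$1,000; others NT$1,500.
      2. Registration and payment after December 25: students and teachers NT$700; non-profit foundations NT$1,500; others NT$2,000.
(Fees include handouts, a boxed lunch, and refreshments.)
Inquiries: (03)5742300, 5742311     FAX: (03)5745594
Online registration: http://pllab.cs.nthu.edu.tw/moeapac/20071228workshop/20071228workshop.htm
     Limited to 130 participants [online registration closes Tuesday, December 25]
Payment: Please send the registration fee by postal money order or check (only checks payable on demand are accepted) so that it arrives before the registration deadline of December 25.
     [Money order payee: National Tsing Hua University; check payee: National Tsing Hua University]
Address: Attn: Ms. 曾雯姬, Integrated Circuit Center, National Tsing Hua University, No. 101, Section 2, Kuang-Fu Road, Hsinchu 30013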

 

Abstract:  

This tutorial covers algorithm design and algorithm-level optimization for future processors with many cores. To get the best performance from multimedia applications on future personal computers, we must carefully consider the interplay between microprocessors and algorithms/applications. Beyond frequency increases, the performance of personal computers has improved significantly with the introduction of multiple cores (e.g., the latest Intel Core 2 Quad processor). Moving forward, we expect the number of processing cores in a single personal computer to keep increasing (e.g., the latest Intel 80-core research prototype, which delivers tera-FLOPS performance). One of the best ways to harness the computational capability of multi-core processors is to exploit the thread-level parallelism in applications. Because computation and memory have a symbiotic relationship, achieving the highest level of computation also requires the best memory performance. Hence, we must design or choose algorithms for maximal thread-level parallelism and cache locality. These implications apply to developing algorithms and applications not only for personal computers but also for SoCs with multiple cores.
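As a rough illustration of designing for both thread-level parallelism and cache locality, the sketch below partitions a matrix-matrix multiplication into bands of rows, one per worker (SPMD style), and walks each band in small tiles so that the same block of the second matrix is reused while it is likely still cached. It is a minimal sketch and not material from the tutorial: the names `matmul_rows` and `parallel_matmul`, the tile size, and the thread count are all assumptions. In standard CPython the global interpreter lock keeps these threads from running in parallel, so the code shows the decomposition rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_rows(a, b, c, row_start, row_end, tile=32):
    """Accumulate rows [row_start, row_end) of c = a * b, one tile at a time.

    Hypothetical helper: each (kk, jj) tile of b is reused across every row
    of the band before moving on, which is the cache-locality idea.
    """
    n_inner, n_cols = len(b), len(b[0])
    for kk in range(0, n_inner, tile):
        k_end = min(kk + tile, n_inner)
        for jj in range(0, n_cols, tile):
            j_end = min(jj + tile, n_cols)
            for i in range(row_start, row_end):
                a_row, c_row = a[i], c[i]
                for k in range(kk, k_end):
                    a_ik, b_row = a_row[k], b[k]
                    for j in range(jj, j_end):
                        c_row[j] += a_ik * b_row[j]


def parallel_matmul(a, b, num_threads=4, tile=32):
    """SPMD decomposition: every worker runs matmul_rows on its own row band."""
    n_rows, n_cols = len(a), len(b[0])
    c = [[0.0] * n_cols for _ in range(n_rows)]
    # Ceiling division so that all rows are covered; at least one row per band.
    band = max(1, (n_rows + num_threads - 1) // num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [
            pool.submit(matmul_rows, a, b, c, start, min(start + band, n_rows), tile)
            for start in range(0, n_rows, band)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return c


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    b = [[7.0, 8.0], [9.0, 10.0]]
    print(parallel_matmul(a, b, num_threads=2, tile=1))
```

Because each worker writes only to its own rows of `c`, no locking is needed; the inputs `a` and `b` are read-only and can be shared by all workers.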

 

Tutorial Outline:  

1. Overview, motivation, and introduction
   a. Sequential vs. parallel processing in personal computers and SoCs
   b. Thread-level parallelism (Hyper-Threading Technology, Dual Core)
2. Symbiotic design of multi-threaded algorithms
   a. Partition the application into multiple threads using SPMD (design example: H.264 encoder)
   b. Avoid sequential dependencies (design example: Canny edge detector)
   c. Dynamically balance loads for better parallelism (design examples: MPEG-2 video decoder and articulated body tracking)
   d. Reduce overheads (design examples: two alternative graph mining algorithms, Hough transform)
   e. Take advantage of the shared cache to increase effectiveness (design examples: SVM-based face detection, image remapping, matrix-matrix multiplication)
3. Future implications and conclusions

 

Duration: 3 hours

 

Potential audience: This tutorial is intended to provide a basic overview of implementing multimedia signal processing applications on modern processors with many cores. 

 

Speaker's Biography:

Yen-Kuang Chen received his Ph.D. from Princeton University and is a Principal Engineer in the Corporate Technology Group at Intel Corporation. His research interests include developing innovative multimedia applications, studying performance bottlenecks in current computers, and designing next-generation microprocessors and platforms. In particular, he is currently analyzing emerging multimedia applications and providing input to the definition of next-generation CPUs and GPUs with many cores. He is one of the key contributors to Supplemental Streaming SIMD Extensions 3 in Intel® Core™ 2 Duo processors. He has 10+ US patents, 25+ pending patent applications, and 75+ technical publications. He is an associate editor of the Journal of VLSI Signal Processing Systems (including special issues on "System-on-a-Chip for Multimedia Systems", "Design and Programming of Signal Processors for Multimedia Communication", and "Multi-core Enabled Multimedia Applications & Architectures") and of IEEE Transactions on Circuits and Systems I. He has served as a program committee member for 20+ international conferences and workshops on multimedia, video communication, image processing, VLSI circuits and systems, parallel processing, and software optimization. He was an invited participant in the 2002 Frontiers of Engineering Symposium (National Academy of Engineering) and the 2003 German-American Frontiers of Engineering Symposium (Alexander von Humboldt Foundation). He is an IEEE Senior Member and an ACM Senior Member.

 

Related Publications:

1. "Trend and Challenge on System-on-a-Chip Designs," Y.-K. Chen and S. Y. Kung, to appear in Journal of VLSI Signal Processing Systems, 2008.

2. "High-Performance Physical Simulations on Next-Generation Architecture with Many Cores," Y.-K. Chen, J. Chhugani, C. Hughes, D. Kim, S. Kumar, V. Lee, A. Lin, A. Nguyen, E. Sifakis, and M. Smelyanskiy, Intel Technology Journal, Aug. 2007.

3. "Media Mining - Emerging Tera-Scale Computing Applications," Y. Chen, E. Li, W. Li, T. Wang, J. Li, X. Tong, P. Wang, W. Hu, Y. Zhang, and Y.-K. Chen, Intel Technology Journal, Aug. 2007.

4. "Implementation of H.264 Encoder and Decoder on Personal Computers," Y.-K. Chen, E. Q. Li, X. Zhou, and S. L. Ge, Journal of Visual Communication and Image Representation, vol. 17, no. 2, pp. 509-532, Apr. 2006.

5. "A Compiler for Exploiting Nested-Parallelism in OpenMP Programs," X. Tian, J. Hoeflinger, G. Haab, Y.-K. Chen, M. Girkar, and S. Shah, Parallel Computing Journal, vol. 31, no. 10-12, pp. 960-983, Oct. 2005.

6. "Media Applications on Hyper-Threading Technology," Y.-K. Chen, M. Holliman, E. Debes, S. Zheltov, A. Knyazev, S. Bratanov, R. Belenov, and I. Santos, Intel Technology Journal, pp. 47-57, Feb. 2002.

7. "Parallelization, Performance Analysis, and Algorithm Consideration of Hough Transform on Chip Multiprocessors," W. Li and Y.-K. Chen, in Workshop on Design, Architecture and Simulation of Chip Multi-Processors, Dec. 2007.

8. "Computer Vision on Multi-Core Processors: Articulated Body Tracking," T. Chen, D. Budnikov, C. Hughes, and Y.-K. Chen, in Int'l Conf. on Multimedia and Expo, July 2007.

9. "Adaptive Parallel Graph Mining for CMP Architectures," G. Buehrer, S. Parthasarathy, and Y.-K. Chen, in Int'l Conf. on Data Mining, pp. 97-106, Dec. 2006.

10. "Efficient Frequent Pattern Mining on Shared Memory Systems: Implications for Chip Multiprocessor Architectures," G. Buehrer, S. Parthasarathy, A. Ghoting, Y.-K. Chen, D. Kim, and A. Nguyen, in Memory Systems Performance and Correctness Workshop, Oct. 2006.

11. "Towards Efficient Multi-Level Threading of H.264 Encoder on Intel Hyper-Threading Architectures," Y.-K. Chen, X. Tian, S. Ge, and M. Girkar, in Proc. of Int'l Parallel and Distributed Processing Symp., Apr. 2004.

12. "Implementation of H.264 Encoder on General-Purpose Processors with Hyper-Threading Technology," E. Q. Li and Y.-K. Chen, in Proc. of SPIE Visual Communications and Image Processing, vol. 5308, pp. 384-395, Jan. 2004.

13. "Efficient Multithreading Implementation of H.264 Encoder on Intel Hyper-Threading Architectures," S. Ge, X. Tian, and Y.-K. Chen, in Pacific-Rim Conf. on Multimedia, Dec. 2003.

14. "Exploring the Use of Hyper-Threading Technology for Multimedia Applications with Intel OpenMP Compiler," X. Tian, Y.-K. Chen, M. Girkar, S. Ge, R. Lienhart, and S. Shah, in Int'l Parallel and Distributed Processing Symp., pp. 36-43, Apr. 2003.

15. "Exploring the Use of Hyper-Threading Technology for Multimedia Apps," X. Tian, M. Girkar, Y.-K. Chen, A. Bik, and E. Su, OSnews Magazine, Mar. 12, 2003.

16. "Evaluating Performance of Multimedia Application on Simultaneous Multi-Threading," Y.-K. Chen, E. Debes, R. Lienhart, M. Holliman, and M. Yeung, in Proc. of Int'l Conf. on Parallel and Distributed Systems, pp. 529-534, Dec. 2002.

17. "The Impact of SMT/SMP Designs on Multimedia Software Engineering - A Workload Analysis Study," Y.-K. Chen, R. Lienhart, E. Debes, M. Holliman, and M. Yeung, in Proc. of Int'l Symp. on Multimedia Software Engineering, Dec. 2002.

18. "Video Applications on Hyper-Threading Technology," Y.-K. Chen, M. Holliman, and E. Debes, in Int'l Conf. on Multimedia and Expo, vol. 2, pp. 193-196, Aug. 2002.