a techfocus media publication :: June 17, 2008 :: volume XI, no. 12

FROM THE EDITOR

In the continuing evolution from single- to multi-core, we have quietly ignored one area where a great many cores have long been successfully employed on a single chip.  NVIDIA announced their newest Tesla GPU this week, and they are aiming it directly at the general-purpose processing with graphics processing units (GPGPU) market.  Combined with their CUDA parallel programming environment, there is a lot we can learn from this that may apply to our future embedded designs or our current HPEC (High-Performance Embedded Computing) applications.  Our latest feature has the details.

Thanks for reading! If there's anything we can do to make our publications more useful to you, please let us know at: comments@embeddedtechjournal.com. If you'd rather sound off in public, please post your comments or questions in our new Journal Forums.

Kevin Morris – Editor in Chief
Techfocus Media, Inc.

EVENTS and ANNOUNCEMENTS

Only Actel® gets you close to “zero power.”
Any other claims of low power superiority are just that. According to their own data, Altera® and Xilinx® use between 10 and 1700 times the power of Actel IGLOO® FPGAs, depending on device and mode.
See the proof.


Take our new
SUPER QUICK, JUST A COUPLE OF QUESTIONS,
WON'T TAKE MUCH TIME AT ALL (WE PROMISE)

2008 Journal Reader Survey.

WEIGH IN NOW!!


NEW!! IC Journal - Do you love Embedded Tech Journal? We're happy to announce our new IC Design and Verification Journal.  It's just like Embedded Tech Journal except, you know, about ASICs and stuff. 
Subscribe today for free.

LATEST NEWS

June 17, 2008

Avnet Electronics Marketing Launches New Intel Architecture Development Tool for Embedded Systems

IAR Systems announces efficient RTOS and file system for ultra-low power MSP430 microcontrollers

Real-time 10GbE Packet Processing Expertise Powers Intelligent Solutions for Deep Packet Inspection Challenges

National Instruments Introduces Dual Gigabit Ethernet Interface for PXI Express

Freescale Joins Microsoft’s Windows Embedded Partner Program, Accelerating Promotion of Freescale’s i.MX Product Development Kit for Windows Embedded CE Based Devices

Freescale Introduces Industry’s First Multi-Standard Accelerator Supporting Emerging Wireless Technologies for 3G-LTE, WiMAX, HSPA+ and TDD-LTE Base Stations

Emerson Unveils Containerized Computing Solution Designed to Meet Growing Consumer Demand for Wireless Networks and Broadband

AMCC PowerPC 460GTx Increases Performance and Bandwidth for Networking and Telecommunication Applications

Freescale Launches Comprehensive Multimedia Development Platform for Advanced Consumer and General Embedded Devices

June 16, 2008

Murata Power Solutions expands its range of high efficiency DC/DC Converters with introduction of 2W single output NMG series

Microsoft Introduces First Embedded Operating System Optimized Specifically for Portable Navigation Device (PND) Manufacturers

MontaVista Provides First No-Cost Evaluation of Commercial Linux for Freescale QorIQ™ P4080 Multicore Processor

Virage Logic Unveils One Mega-Bit Embedded Reprogrammable Non-Volatile Memory (NVM) on Standard CMOS Process

Freescale Debuts Three New Power Architecture™ Microcontrollers for Automotive Designs

National Instruments Introduces New LabVIEW Toolkit for GPS Receiver Testing

IAR Systems Supports S08 Microcontrollers Through Strategic Collaboration with Freescale

AMCC and Marvell Team up to Demonstrate Interoperability of Industry-Leading 10GbE Metro Ethernet Solutions

PLX Broadens PCI Express Gen 2 Portfolio with Low-Lane-Count Switches

IDT Extends Family of High Performance Serial RapidIO Central Packet Switches for the Embedded Market

Freescale QorIQ™ Communications Platforms Signal a New Way Forward for Embedded Multicore Technology

June 13, 2008

Mentor Graphics Expands Nucleus Platform Solutions to Freescale i.MX31 Processor for Multimedia Applications

Acme Packet Extends Service Accountability Toolset

June 12, 2008

Avnet Offers Seminar for New Generation of Texas Instruments Ultra Low Power Devices

Freescale Joins RF4CE Consortium to Develop RF Standard for Entertainment Control

EDSA and IEEE Partner to Offer New Power Systems Engineering Continuing Education Courses

June 11, 2008

Freescale Enables Smart Grids to Help Address World’s Energy Challenges

TranSwitch Demonstrates Next-Gen Ethernet-based Solutions at NXTcomm 2008

CURRENT FEATURE ARTICLES

A Passel of Processors
NVIDIA’s Tesla T10P Blurs Some Lines
(Kevin Morris)
Shortening the Rope
LDRA Checks Cert C and MISRA C++
(Bryon Moyer)
New Toys
(Dick Selwood)
Shared Responsibility
Dynamic Analysis for Race Conditions and Deadlocks in Java
(Bryon Moyer)
Special Recognition
A Neural Network for Embedded Systems
(Bryon Moyer)

Multicore Messaging Manifested
Polycore Implements MCAPI
(Bryon Moyer)

JOURNAL WEBCASTS

CHALK TALK Power Matters. Trying to tame power consumption in your battery-powered device? Join Journal Webcasts host Amelia Dalton as she chats with Wendy Lockhart of Actel about how you can use ultra-low power programmable devices from Actel in even the most power-sensitive designs. (Actel)

CHALK TALK Creating Secure Mobile Devices With Open Kernel Labs OKL4. In this Chalk Talk, Amelia Dalton delves into the world of software security and microkernels in mobile devices with Gernot Heiser and Rob McCammon of Open Kernel Labs. (Open Kernel Labs)

CHALK TALK Low Power Design With Xilinx and Linear Technology. Join Amelia Dalton as she chats with Mark Moran of Xilinx and Afshin Odabaee of Linear Technology about low power FPGA based designs. (Xilinx)

CHALK TALK Designing Embedded Systems With Linux and low cost FPGAs. Join Amelia Dalton as she chats with industry experts about simplifying embedded systems design with Linux running on low-cost programmable system-on-chip platforms. (Xilinx)

CHALK TALK Lowest Total System Cost With Xilinx Spartan-3. Amelia Dalton chats with Mark Moran of Xilinx about reducing your overall system cost with the Xilinx Spartan-3 family of FPGAs. (Xilinx)


CHALK TALK Low Cost FPGA with Serdes Lattice ECP2M. Amelia Dalton talks with Bertrand Leigh of Lattice Semiconductor about low-cost FPGAs with multi-gigabit SerDes interface capability. (Lattice Semiconductor)


A Passel of Processors
NVIDIA’s Tesla T10P Blurs Some Lines
(Kevin Morris)

Picture this architecture – a high-speed application processor doing control, coupled to an accelerator composed of a mass of processing elements ready to power-parallelize compute-intensive components of a complex problem.  Sound familiar?  Supercomputers have taken advantage of acceleration using schemes like this for a while.  People using FPGAs for co-processors do it all the time.

Now, picture a new chip with 1.4 billion transistors, an array of 240 cores, and a processing throughput equivalent to about 1 TeraFLOPS.  Many readers of this publication would probably guess a new FPGA, right? 

With the new Tesla T10P GPU, NVIDIA is making a lot of us editors re-work our glossaries.  The T10P is a GPU that’s aimed directly at the high-performance computing community, not just accidentally clipping it with a bank shot while going after the real target market of graphics acceleration.  The T10P represents NVIDIA’s second generation of CUDA (Compute Unified Device Architecture) GPUs (with the Tesla 8 being the first).  CUDA is a C dialect with specific constructs for parallelism, and it allows direct access to the low-level hardware capabilities of the processor cores of the GPU.  Why would we want that?  To do non-graphics applications, of course. 

You see – unless your performance-critical application happens to involve a lot of shading and texture mapping, GPUs have traditionally been a locked treasure chest, not ready to share all that parallel-processing goodness with those who aren’t trying to blast billions of bits onto a screen.  Many people have always known that processing power was in there, though, and an access mechanism like CUDA is the key that lets them get in to put all those processors to work - doing a lot of interesting tasks that are most certainly NOT graphics acceleration.

This idea of using GPUs for general purpose processing is called GPGPU - General Purpose (processing with) Graphics Processing Units. (The acronym department kinda blew it with that one.)  CUDA is not the first effort along those lines.  ATI (now AMD) had a beta API called (OK, these guys are much more rock-n-roll with their acronyms) “Close to Metal” (CTM) that allowed direct access to the low-level instructions in their R580 GPUs.  CUDA is the effort that seems to be getting industry traction right now, however, and NVIDIA has beefed up the latest Tesla processor with its sights set squarely on that market.

Now, there is a world called “reconfigurable computing” that many of us FPGA folks often visit.  In the reconfigurable computing world, people have worked for years (and arguably decades) to harness the inherent hardware parallelism of FPGAs in order to create co-processors in high-performance computers (HPCs).  If, they reason, the programming model can be simplified enough, previously unattainable processing power (and, more importantly, processing power per Watt) could be obtained through the magic of FPGAs.

Of course, all those efforts have failed to pull any but the hardest-core, most performance-hungry users over the chasm into the realm of FPGA-based reconfigurable computing.  Unless you want to learn HDL or trust a bleeding-edge software-to-HDL compilation/synthesis tool, the task of getting your critical algorithm onto FPGA hardware is untenable - even for high-IQ cognoscenti like rocket scientists, cryptographers, and genetic engineers.  When those folks are afraid that your solution is “too complicated to use,” you have something of a problem.

High-performance computing has chugged along for several years with this gap of capability versus usability.  On one side, the challenge is to see how much power one can purchase from the utility company in order to operate and air-condition racks and racks of blades covered with multi-gigahertz one-to-four-core processors.  This is the “easy and expensive” solution.  On the other side, we have reconfigurable computing folks spending tiny fractions of that budget on both hardware and electrical current, but blowing the difference trying to hire VHDL and Verilog experts to code up complex biomedical, geological, and financial algorithms in hardware description languages. [more]



You're receiving this newsletter because you subscribed at our web site www.embeddedtechjournal.com.
If someone forwarded this newsletter to you and you'd like to receive your own free subscription, go to: www.embeddedtechjournal.com/update.
If you have any questions or comments, send them to comments@embeddedtechjournal.com.

All material copyright © 2003-2008 techfocus media, inc. All rights reserved.