321079-001
Executive Summary
Completing a successful PowerPC* to Intel architecture software migration requires an awareness of the architecture differences and their impact on the software. This white paper outlines the information that should be considered when planning a software product port from PowerPC to Intel architecture-based platforms. A thorough review of the architecture differences, operating system considerations, system initialization, migration tools, and software development products must be completed. The first thing to understand is that every situation is different, so the scope of work and effort required for the port will vary. This paper identifies the items that need to be considered when investigating the migration and the resources that can assist during the implementation.
Contents
Executive Summary
Introduction
   Intel Embedded Design Center
   Reasons to Migrate to Intel Architecture
Migration Considerations
   Hardware Architecture Differences
   Operating Systems
   System Initialization Firmware
   Architecture Migration Tools
   Software Development Tools
   Multi-core Solutions
Training and Design Information
   Intel Software College
   Intel Software Network
Migration Design Guide (Putting It All Together)
   Step 1 Port PowerPC* Code to Target Operating System
   Step 2 Execute Code Correctly on One Intel Architecture Core
   Step 3 Optimize the Code for Performance on One Intel Architecture Core
   Step 4 Apply Multi-core Software Design Updates
   Step 5 Optimize the Software Design for Multi-core Intel Architecture Performance
Conclusion
Introduction
Porting software to platforms of different processor architecture can be simple or require additional effort depending on the design (portability) of the original software. Software that is specifically written to run on one hardware architecture will need to be updated to support the architecture differences. For software implementations that abstract away the hardware and operating system specific information, the port could be as simple as a recompile. One of the main migration hurdles is the Endianness difference between PowerPC* (PPC) and Intel architecture. Other considerations include variations between the current and target operating systems and development tools. Completing a successful port involves assessing and understanding the current situation and requirements before the migration begins.
Migration Considerations
Architecture migration includes consideration of multiple software design areas including several hardware architectural differences, operating system, system initialization, and migration and development tools. Another architecture aspect to be considered when migrating from PPC to Intel architecture is moving from a uniprocessor serial code to a multi-core software system. This paper discusses each of these areas along with various design choices. Understanding that every migration situation is different, the migration design guide will step system designers through situational decisions and solutions, which will guide their overall migration plan.
Instruction set
PPC and Intel architecture instructions are very different. For some instructions there is no one-to-one (PPC to Intel architecture) equivalent. Refer to the Intel Software Developer Manuals and to instruction set information and tools that may assist the assembly code migration.
Alignment
PPC instructions are all 4 bytes in size and must be aligned on 4-byte boundaries. Intel architecture instructions vary in size and therefore do not require alignment.
On PPC a bool is 4 bytes. On Intel architecture, a bool is 1 byte. Make the code portable by changing the PPC boolean data to an unsigned 32-bit integer.
Vector oriented instructions
PPC uses AltiVec* instructions. Intel architecture uses Streaming SIMD Extensions (SSE). Refer to the Vector Oriented Code section for details about migrating AltiVec to SSE instructions.
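As a minimal sketch of the boolean-size point above, the fragment below stores a flag in a fixed-width 32-bit integer so that a structure shared between PPC and Intel architecture builds has the same size on both. The type name flag32_t, the device_record structure, and the make_flag helper are hypothetical names chosen for illustration; they do not come from any Intel or PPC library.

```c
#include <stdint.h>

/* Hypothetical fixed-width boolean: always 4 bytes, regardless of how
 * the compiler sizes its native bool type. */
typedef uint32_t flag32_t;

/* Hypothetical record that is written to a file or shared between systems. */
struct device_record {
    uint32_t id;
    flag32_t enabled;   /* 0 = false, nonzero = true */
};

/* Compile-time check (C11) that the layout has the expected size. */
_Static_assert(sizeof(struct device_record) == 8,
               "device_record must be 8 bytes on every target");

/* Normalizes any nonzero value to 1 so comparisons behave consistently. */
flag32_t make_flag(int value)
{
    return value ? 1u : 0u;
}
```

A fixed-width type only settles the size of the field. The byte order of the stored value still differs between the two architectures and has to be handled separately.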
Operations
Divide-by-zero
For integer divide-by-zero, PPC simply returns zero. On Intel architecture, this operation raises a divide error exception, which is fatal unless it is handled. Code should always check the denominator for zero before executing the divide operation. There is no difference in operation between PPC and Intel architecture floating point divide-by-zero.
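The denominator check recommended above could look like the following sketch. The function name safe_div and its interface are assumptions made for this example. The second check covers INT_MIN divided by -1, which overflows and also raises a divide error on Intel architecture.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: performs the divide only when it is well defined.
 * Returns true and stores the quotient in *result on success;
 * returns false and leaves *result untouched otherwise. */
bool safe_div(int numerator, int denominator, int *result)
{
    if (denominator == 0)
        return false;   /* would raise a divide error on Intel architecture */
    if (numerator == INT_MIN && denominator == -1)
        return false;   /* quotient does not fit in an int */
    *result = numerator / denominator;
    return true;
}
```

Callers decide what a failed divide means for the application, for example substituting zero to preserve the behavior the original PPC code relied on.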
Table 1.
Hardware Devices
If a PPC driver or library comes from a third party vendor, check with the vendor for equivalent Intel architecture products. If any device drivers or libraries are developed in-house, they will need to be re-written for Intel architecture. Refer to the Device Drivers section of this paper for chipset and graphics driver information.
Registers
Calling conventions
Calling conventions are specified by the application binary interface (ABI). Arguments are passed in registers for PPC. For Intel architecture, arguments are generally passed on the stack under the IA-32 ABIs, while the Intel 64 ABIs pass the first several arguments in registers. Intel architecture has fewer registers than PPC and therefore local variables may be stored on the stack as well.
Memory
Endianness describes how multi-byte data is represented by a computer system and is dictated by the CPU architecture of the system. Intel architecture uses little endian and PPC uses big endian format to store multi-byte data. The difference in endian architecture is an issue when software or data is shared between computer systems. Refer to the Endianness section of this paper for more information.
Bit fields
The order of bit fields in memory can be reversed between architectures. Refer to the Bit Fields and Bit Masks section of the Endianness white paper for more details.
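One common way to avoid depending on the compiler's bit-field ordering is to pack and unpack fields with explicit shifts and masks. The sketch below assumes a hypothetical 32-bit control word with a 4-bit mode field in bits 0-3 and an 8-bit channel field in bits 4-11; the field layout and all names are illustrative only.

```c
#include <stdint.h>

/* Hypothetical field layout inside a 32-bit control word. */
#define MODE_SHIFT     0u
#define MODE_MASK      0x0000000Fu   /* bits 0-3  */
#define CHANNEL_SHIFT  4u
#define CHANNEL_MASK   0x00000FF0u   /* bits 4-11 */

/* Extracts a field from the word using its mask and shift. */
uint32_t get_field(uint32_t word, uint32_t mask, uint32_t shift)
{
    return (word & mask) >> shift;
}

/* Returns a copy of the word with one field replaced by value. */
uint32_t set_field(uint32_t word, uint32_t mask, uint32_t shift,
                   uint32_t value)
{
    return (word & ~mask) | ((value << shift) & mask);
}
```

Because the shifts and masks operate on the integer value rather than on its memory layout, the same source produces the same field values on both architectures. The bytes of the word in memory are still ordered differently, so data exchanged between systems needs the byte-order handling described in the Endianness section.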
1. "Architectural Differences." Universal Binary Programming Guidelines. 26 Feb 2007. Apple.com. 18 Dec 2008. http://developer.apple.com/documentation/MacOSX/Conceptual/universal_binary/universal_binary_intro/chapter_1_section_1.html
Endianness
Endianness describes how multi-byte data is represented by a computer system and is dictated by the CPU architecture of the system. Unfortunately, not all computer systems are designed with the same endian architecture.
Big endian is an order in which the big end (most significant value in the sequence) is stored first, at the lowest storage address. The most significant byte is stored in the leftmost position. PPC systems use the big endian model, where the most significant byte is at the lowest address in memory.
Little endian is an order in which the little end (least significant value in the sequence) is stored first. The most significant byte is stored in the rightmost position. Intel architecture systems use the little endian model, where the least significant byte is at the lowest address in memory.
In Table 2, the 32-bit hex value 0x12345678 is stored in memory as follows for each endian architecture. The lowest memory address is represented in the leftmost position.
Table 2. Endian Order

Address    Big Endian    Little Endian
Byte 00    12            78 (LSB)
Byte 01    34            56
Byte 02    56            34
Byte 03    78            12
The difference in endian architecture is an issue when software or data is shared between computer systems, whether through files or over a network connection. If the code is not endian-neutral, it must be updated to account for the little endian architecture, because the difference in byte ordering can produce incorrect results. For complete details on software considerations related to microprocessor endian architecture, and for guidelines on developing endian-neutral code, see the Endian White Paper at: http://www.intel.com/design/intarch/papers/endian.htm.
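The byte ordering described above can be illustrated with a short sketch. It shows three things: detecting the byte order of the host at run time, swapping the bytes of a 32-bit value, and reading or writing a big endian value one byte at a time so that the same code gives the same result on PPC and Intel architecture. All function names are hypothetical and chosen for this example.

```c
#include <stdint.h>

/* Returns 1 when the least significant byte is stored at the lowest address. */
int is_little_endian(void)
{
    const uint32_t probe = 1u;
    return *(const unsigned char *)&probe == 1;
}

/* Reverses the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

/* Endian-neutral read: interprets 4 bytes as a big endian value. */
uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Endian-neutral write: stores a value as 4 big endian bytes. */
void store_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}
```

With the Table 2 value, load_be32 applied to the bytes 12 34 56 78 yields 0x12345678 on either architecture, which is why byte-wise access is often preferred over conditional swapping for file and network formats.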
Operating Systems
If the architecture migration includes a port to a new OS, check with the target OS distributor to see if there is an OS migration guide available that supports the current and target OS pair used in the migration. Porting source code to a new OS involves not only updating the OS calls, but also locating the correct version of all third-party utilities and libraries needed to build the application. Common examples are:
Source control system
Developer tools
Build utilities
Licensing, graphics, or other third-party libraries
If the situation allows, port to the OS version that will be used for the target multi-core solution. For example, if SMP will be used as the target OS solution, port to the SMP version of the target OS.
Device Drivers
If the PPC driver is developed in-house, the low level initialization will need to be updated for Intel architecture. Open source versions of the driver may help guide the changes that are required.
The UEFI Forum is responsible for two specifications:
1. The Unified Extensible Firmware Interface (UEFI) specification: Defines interfaces between the OS, add-in firmware drivers, and system firmware, where the OS and other high-level software should ONLY interact with exposed interfaces and services defined by the UEFI specification. It includes the EFI Byte Code (EBC) specification, which defines an interpretive layer for portable component drivers.
2. Platform Initialization Interface (PI) specifications: The core code and services that are required for an implementation of the Platform Initialization (PI) specifications (hereafter referred to as the PI Architecture), and interoperability standards between firmware phases and pre-OS components from different providers.
Figure 1. UEFI Block Diagram
[Figure: block diagram with the labels OS, Pre-boot Tools, UEFI Specification, PI Specification, and Hardware]
The UEFI specifications define a model for the interface between operating systems and platform firmware. The interface consists of data tables that contain platform-related information, plus boot and runtime service calls that are available to the operating system and its loader. Together, these provide a standard environment for booting an operating system and running pre-boot applications. For more details about the UEFI specifications, writing UEFI drivers, and how to use the UEFI Sample Implementation and UEFI Application Toolkit, see the UEFI web site at http://www.uefi.org/.
Intel Compilers
Intel Compilers are compatible with other tools you might use, integrate into popular development environments, and are source and binary compatible with other widely used compilers. The Intel compilers support the creation of multi-threaded applications and include features for advanced optimization, automatic processor dispatch, vectorization, auto-parallelization, multithreading, OpenMP*, data prefetching, and loop unrolling, along with highly optimized libraries. Visit the product web site at: http://www.intel.com/cd/software/products/asmo-na/eng/compilers/284132.htm.
OpenMP* is a standard for compiler-based multiprocessing features. To learn more about OpenMP and the specification visit the web site: http://www.openmp.org.
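To give a sense of what compiler-based multiprocessing looks like in source code, here is a minimal OpenMP sketch in C. The function name sum_array is hypothetical. The option that enables OpenMP differs between compilers, so consult the compiler documentation; when OpenMP is not enabled, the pragma is ignored and the loop simply runs serially with the same result.

```c
#include <stddef.h>

/* Sums an array. The loop iterations are independent of each other,
 * so the work can be divided among the available cores. */
double sum_array(const double *data, size_t n)
{
    double total = 0.0;
    long i;

    /* With OpenMP enabled, each thread accumulates a private partial
     * sum and the reduction clause combines them when the loop ends. */
    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < (long)n; ++i)
        total += data[i];

    return total;
}
```

The serial fallback is a useful property during a migration: the same source can be validated on one core first and parallelized later by changing only the build options.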
Multi-core Solutions
There are several factors that will guide the plan for the multi-core migration. Factors include the starting point (design) of the original source code, as well as migration goals and constraints. Each method has its own strengths. More operating systems are now providing Symmetric Multiprocessing (SMP), including embedded RTOSs, but SMP requires code to be architected to take advantage of multiple CPUs (parallelized). For situations where the application(s) is not well suited for parallelization, Asymmetric Multiprocessing (AMP) and Virtualization could be a more viable solution for leveraging the extra processing capabilities of multi-core hardware. Employing virtualization and partitioning in the embedded system will enable some benefit to be derived from multi-core processors independent of explicit OS support. However, the ideal situation is to have symmetric multiprocessing and asymmetric multiprocessing, including virtualization, at your disposal.
Asymmetric Multiprocessing
AMP has started to show up in product descriptions for embedded processors. The term is used to refer to a case where multiple OS images are supported on a single CPU consisting of multiple cores. The term is used to distinguish from the SMP case where there is a single OS image on the CPU.
AMP requires no application changes to leverage the benefits of multiple cores. It can leverage multiple cores by running multiple instances of the OS and application in separate partitions that are dedicated to specific cores, PCI devices, and system memory areas. AMP requires a boot loader that supports AMP (one that can partition the hardware resources and make OS/application assignments to the partitions). The OS must also meet requirements to support AMP: it must be relocatable, it must be able to restrict its memory region, and it must operate only on its assigned PCI devices.
Symmetric Multiprocessing
SMP operating systems treat all cores as equals and distribute the workload to the available cores. An SMP design is probably the more efficient way to take advantage of multi-core hardware. It can be written to scale performance automatically as the number of processing cores increases. The tradeoff for the SMP performance and scalability benefit is that writing software for parallel processing can be tricky, because the software design must decompose the problem into sub-problems that can safely execute simultaneously (threads are used to execute the concurrent processing). For guidelines on multithreading applications see Developing Multithreaded Applications: A Platform Consistent Approach. For symmetric multiprocessing using Wind River* VxWorks*, see Best Practices: Adoption of Symmetric Multiprocessing Using VxWorks and Intel Multi-core Processors White Paper.
OS Based (SMP Affinity) - Use processor affinity mechanisms to assign specific cores to specific tasks/threads. This method can improve performance on multiprocessor systems by pinning threads that share data to cores that share cache, which improves data locality in cache and thus improves cache hits. Refer to the Intel Software Network article Improved Linux* SMP Scaling: User-directed Processor Affinity for details about user-directed affinity. An example where SMP affinity improved performance on a dual processor multi-core system is the case study Intel performed on an open source intrusion detection application known as SNORT*. Read the case study Supra-linear Packet Processing Performance with Intel Multi-core Processors for more details.
Most of the popular commercial OSs have SMP products, such as Microsoft* Windows* server and client and Linux distributions. However, this isn't the norm for embedded and real-time OSs. Understand the level of SMP support provided by your OS.
RTOS vendors that provide real-time SMP support for Intel architecture are:
Green Hills Integrity*
LynuxWorks LynxOS*
QNX* Neutrino*
Wind River* VxWorks*
Virtualization
The beauty of virtualization is that it can bring together the benefits of all multi-core solutions on a single system and extend those benefits with additional features such as security, quality of service (QoS), high availability (HA), and load distribution. Virtualization provides a software management layer that increases software protection between the partitions and provides core management to optimize power efficiency. Basically, the CPU is run as multiple independent partitions, each running its own OS and application. This is a very effective strategy for applications that are constructed from multiple application components that are independent and CPU bound (i.e. not bound by contention for shared resources).
There is no need to make legacy software stack changes when using virtualization to partition multiple OSs to run within virtual machines (VM). Instead, let the Virtual Machine Manager (VMM) manage the assignment and access between the VMs and platform resources. There are several use cases for partitioning, including system consolidation, running an RTOS side-by-side with a GPOS (also referred to as OS co-location), and leveraging the additional processing power of multi-core hardware by replicating the application(s) and OSs across multiple cores.
Intel Virtualization Technology (Intel VT) provides features that make VMM development easier and enhance performance of virtualized systems enabled with the technology. Visit the Intel Product Technologies for Intel Embedded and Communications Applications web site for more information about Intel VT and other technologies: http://www.intel.com/technology/advanced_comm/index.htm.
iii. Board support packages (BSPs) for Microsoft* Windows CE* can be downloaded from these third-party vendors' sites:
   c. Adeneo Corporation*
   d. BSQUARE*
   e. Wipro Technologies*
If any device drivers or libraries are developed in-house, they will need to be rewritten for Intel architecture. If any third-party drivers or libraries are required, check with the third-party vendor (TPV) for equivalent Intel architecture products.
Development tools for Intel architecture: See the Intel Software Development Products section for information about Intel tools and visit the product web sites for information on OS support. On-chip debugging tools for Intel architecture are supported by American Arium or Macraigor Systems LLC (February 2009).
BIOS: Choose BIOS and/or UEFI firmware if the design will support multiple standard interfaces and expansion slots, or host mainstream OSs with a broad set of pre-OS features that are ready to run multiple applications.
Boot Loader: Choose a boot loader for minimal or specialized firmware stacks where requirements might include optimization for speed, size, or specific system requirements, and where the design will support minimal upgrade or expansion capabilities. QNX Fastboot Technology is available for Intel Atom™ Processors.
3. If any part of the code is written in assembly, it will need to be updated for IA instructions. Solutions:
   a. Basic assembly instructions: Manually update the basic assembly instructions using the Intel 64 and IA-32 Architectures Software Developer's Manuals.
   b. Vector Oriented Code. Solutions:
      i. Manually update vector oriented code using the AltiVec/SSE Migration Guide.
      ii. Translate the vector oriented code using the NASoftware* PowerPC*/AltiVec* to Intel/SSE conversion tools.
4. Does the software abstract the memory architecture of the processor?
   a. Yes: The code is endian-neutral. No changes are required.
   b. No: The code will need to be updated for little-endian memory architecture. Manually update the endianness differences in the code. Use the Endianness White Paper as a guide to the required changes.
5. Refer to Table 1 for any other architecture differences that may need software updates.
6. Build, test and debug the code using one Intel architecture core.
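For the vector oriented code mentioned in the steps above, a manual update usually means replacing AltiVec operations with SSE intrinsics. The sketch below is an illustrative example, not taken from the AltiVec/SSE Migration Guide: it adds two float arrays four elements at a time with the SSE intrinsic _mm_add_ps, the kind of operation AltiVec code would express with vec_add. The function name add_floats is hypothetical, and a scalar loop handles the leftover elements and any build where SSE is not available.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics */
#endif

/* Adds two float arrays element by element: dst[i] = a[i] + b[i]. */
void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

#if defined(__SSE__)
    /* Process four floats per iteration. The unaligned load and store
     * forms are used so the arrays need no special alignment. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#endif

    /* Scalar loop for the remaining elements. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Keeping a scalar path next to the vector path makes it easier to compare results during step 6, when the ported code is built, tested, and debugged on one core.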
Step 3 Optimize the Code for Performance on One Intel Architecture Core
Although the end product will run on multi-core architecture, performance tuning methodology first requires that serial code be optimized for serial performance.
1. Use the top down, closed end loop performance methodology, and when applicable use the Intel Software Development Products.
   a. Analyze the performance
      i. Use the Intel VTune Performance Analyzer to pinpoint hotspots in the code where the processing could be distributed between the available cores.
      ii. Use the Intel Thread Profiler to identify any thread imbalances.
   b. i. Use the Intel C++ Compiler and select features to implement advanced optimizations using Profile Guided Optimization (PGO), executable size, and power consumption.
      ii. Use the Intel Performance Libraries to increase performance with a variety of APIs that are highly tuned for Intel architecture. Functions include video, imaging, compression, cryptography, audio, speech recognition, and signal processing functions and codec component functions for digital media and data-processing applications.
   c. Debug the code: Use the Intel Thread Checker to identify threading bugs, such as data race and deadlock conditions.
2. The OSV should also provide a set of software development tools. Check with the OSV to understand which tools are available.
3. Use an on-chip debugging tool (JTAG) for low-level debugging at the hardware level and where a high-level debugger would otherwise interfere with timing-critical code.
Step 5 Optimize the Software Design for Multi-core Intel Architecture Performance
Whether the design is SMP or AMP, multi-core software designs require specialized software development tools. For SMP the tools help identify and implement parallelism in the code and pinpoint threading issues such as race conditions, deadlocks, and thread load imbalances. The tuning methodology is the same as for a uni-processor, except that the goal is to correctly and efficiently execute multiple processes or threads simultaneously across multiple cores. Multi-core tools help implement parallelism and help tune and debug the parallelized code.
1. Use the top down, closed end loop performance methodology, and when applicable use the Intel Software Development Products.
   a. Intel VTune Performance Analyzer: Pinpoints hotspots in the code where the processing could be distributed between the available cores.
   b. Intel C++ Compiler: Multi-core features include OpenMP and auto-parallel.
   c. Intel Performance Libraries: Increase parallelism with performance threaded APIs that are highly tuned for Intel architecture multi-core.
   d. Intel Threading Tools: Implement threads with Intel Threading Building Blocks. Debug threads with Intel Thread Checker. Identify workload imbalances and lock contention of the threads with Intel Thread Profiler.
2. The OSV should also provide a set of multi-core development tools. Check with the OSV to understand which tools are available.
3. Use an on-chip debugging tool (JTAG) for low-level debugging at the hardware level and where a high-level debugger would otherwise interfere with timing-critical code.
Conclusion
This paper provided an overview of the software considerations and guidelines for completing a successful PowerPC* to Intel architecture software migration, as well as resources that can assist during the migration software design and implementation. The paper included information about architecture differences, migration tools, system initialization, operating system considerations, Intel software development products, and available training for Intel architecture. Remember, each situation is different and the effort required for the migration depends on the amount of abstraction that is already programmed into the code. The migration could therefore be as simple as recompiling the software, or more involved, requiring extra programming for areas of software that are hardware or OS dependent. Completing a successful port involves assessing and understanding the current situation and requirements, and planning each step before the migration begins. Don't forget to visit the Embedded Design Center at http://www.intel.com/embedded/edc for the one-stop shop for embedded Intel architecture design information.
Authors
Lori M. Matassa is a Software Technical Engineer with Intel.
Acronyms
AMP      Asymmetric Multiprocessing
API      Application Programming Interface
BIOS     Basic Input Output System
CSM      Compatibility Support Module
DSP      Digital Signal Processing
EDC      Intel Embedded Design Center
EDK      EFI Developer Kit
EFI      Extensible Firmware Interface
GPOS     General Purpose Operating System
HA       High Availability
IA       Intel Architecture
IEGD     Intel Embedded Graphics Driver
ISN      Intel Software Network
JTAG     Joint Test Action Group
LSB      Least Significant Bit
OS       Operating System
PCI      Peripheral Component Interconnect
POSIX    Portable Operating System Interface
PPC      PowerPC
QoS      Quality of Service
RTOS     Real-time Operating System
SIMD     Single Instruction, Multiple Data
SMP      Symmetric Multiprocessing
SSE      Streaming SIMD Extensions
UEFI     Unified Extensible Firmware Interface
VM       Virtual Machine
VMM      Virtual Machine Manager
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, or life sustaining applications. Intel may make changes to specifications and product descriptions at any time, without notice.
This paper is for informational purposes only. THIS DOCUMENT IS PROVIDED "AS IS" WITH NO WARRANTIES WHATSOEVER, INCLUDING ANY WARRANTY OF MERCHANTABILITY, NONINFRINGEMENT, FITNESS FOR ANY PARTICULAR PURPOSE, OR ANY WARRANTY OTHERWISE ARISING OUT OF ANY PROPOSAL, SPECIFICATION OR SAMPLE. Intel disclaims all liability, including liability for infringement of any proprietary rights, relating to use of information in this specification. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted herein.
Unless otherwise agreed in writing by Intel, the Intel products are not designed for nor intended for any application in which the failure of the Intel product could create a situation where personal injury or death may occur. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked reserved or undefined. Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata that may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents that have an order number and are referenced in this document or other Intel literature may be obtained by calling 800-548-4725 or by visiting Intel's website.
Intel, the Intel logo, Intel Atom, Intel Core, Intel VTune, Intel Threading Tools, Intel C++ Compiler, Intel Thread Profiler, Xeon, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Copyright 2009, Intel Corporation. All rights reserved.