
Understanding Office Binary File Formats https://msdn.microsoft.com/en-us/library/office/gg615407(d=printer,v=...

Understanding Office Binary File Formats


Office 2010

Summary: Learn about the binary file formats that are used in current and previous Microsoft Office products, including how to use them, their basic structures, and key concepts for interacting with them
programmatically.

Last modified: June 23, 2011

Applies to: Excel 2010 | Office 2007 | Office 2010 | Office client | Open XML | PowerPoint 2010 | SharePoint Server 2010 | VBA | Word 2010

Published: February 2011

Provided by: Microsoft Corporation

Contents

What Are Binary File Formats?

What Versions of Microsoft Office Use Binary File Format Files?

Viewing Content in Microsoft Office Binary File Format–Based Files

Creating Custom Binary File Format Viewers

Editing Office Binary File Format–Based Files

Conclusion

Additional Resources

This article is the first in a series that introduces the binary file formats used by Microsoft Office products. It provides an overview of how to work with Microsoft Office binary file formats in general,
and explains some of the structural traits and key concepts that the different formats share. The other articles in the series provide more detail about the individual file formats. These articles are
designed to be used in conjunction with the Microsoft Office File Format Documents available on MSDN.

Understanding the Excel .xls Binary File Format

Understanding the Outlook MS-PST Binary File Format

Understanding the PowerPoint MS-PPT Binary File Format

Understanding the Word .doc Binary File Format

This article series deals with only the four core Microsoft Office products: Microsoft Word, Microsoft PowerPoint, Microsoft Excel, and Microsoft Outlook.

What Are Binary File Formats?


A binary file format is any file format that stores its content primarily as binary data rather than as plain text. Examples include compiled programs, images, media, and most compressed files, as well as files that
contain textual information but store it as binary data. The binary file formats used by Microsoft Office products fall into this last category. Non-binary formats include plain text (.txt), .html, .xml, and their
derivatives, as well as interpreted scripts and source code files.
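The distinction between binary and non-binary content can be illustrated with a rough heuristic. The following Python sketch is not part of any Microsoft tooling; the function name looks_binary and the 30 percent threshold are assumptions chosen for illustration only. It treats a sample as binary if it contains a NUL byte or a high proportion of control bytes, so text stored as UTF-16 is also reported as binary.

```python
def looks_binary(data: bytes, threshold: float = 0.30) -> bool:
    """Heuristically guess whether a byte sample is binary rather than plain text.

    Hypothetical helper for illustration; the threshold is an arbitrary assumption.
    """
    if not data:
        return False
    # A NUL byte almost never appears in plain ASCII or UTF-8 text.
    if b"\x00" in data:
        return True
    # Printable ASCII, common whitespace, and high bytes (possible UTF-8) count as text.
    text_bytes = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D} | set(range(0x80, 0x100))
    non_text = sum(1 for b in data if b not in text_bytes)
    return non_text / len(data) > threshold


if __name__ == "__main__":
    print(looks_binary(b"<html><body>Hello</body></html>"))
    print(looks_binary(bytes([0x01, 0x02, 0x03, 0x00, 0x10, 0x11])))
```

A check like this only samples the raw bytes; it cannot tell which binary format a file uses, which is what the format specifications are for.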

All of the file data in Microsoft Office binary file formats exists in one or more streams. Each stream contains data structures to store metadata, such as user and system information and file properties, formatting
information, text content, and media content. These data structures are expressed as groups of hexadecimal numbers that the host program interprets and presents through its user interface.
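To see what "groups of hexadecimal numbers" looks like in practice, the following Python sketch formats raw bytes the way a typical hex viewer does: an offset column, the byte values in hexadecimal, and a printable-character column. The function name hex_dump and the sample bytes are assumptions made up for this illustration and are not taken from any Office file.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Return a hex-viewer style listing: offset, hex byte values, printable characters."""
    pad = width * 3 - 1  # width of the hex column when a row is full
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        ascii_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{pad}}  {ascii_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample bytes: a short text run followed by non-text values.
    sample = b"Title\x00\x00\x01\x02\xff" + "Sales".encode("utf-16-le")
    print(hex_dump(sample))
```

To inspect a real file, the same function could be applied to the first few hundred bytes read with open(path, "rb").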

The organization of data structures within a stream varies. The most common unit of data is a record. A record typically contains metadata about the file in the form of fields and flags, including one
or more offset values that indicate the locations of other relevant records or data. Text is stored as numeric values that represent ANSI or Unicode characters. Images can be stored as pointers to external files or
embedded within the file in their own binary file formats, such as .gif, .jpeg, or .png. More active content, such as a PowerPoint slide transition, is marked with the information that is needed for interpretation,
such as the transition properties, and is then rendered by the host program.
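The idea of a record that carries fields, flags, and an offset to another record can be sketched in Python as follows. The layout used here is hypothetical and is not the layout of any actual Office format: a 2-byte record type, a 2-byte flags field (bit 0 meaning "text is Unicode"), a 4-byte offset to the next record (0 meaning "no further record"), and a 2-byte text length, all little-endian, followed by the text bytes. All names (Record, pack_record, read_record, walk_records) are assumptions, and "ANSI" is assumed to mean the Windows-1252 code page.

```python
import struct
from dataclasses import dataclass

# Hypothetical header: type, flags, next_offset, text_length (little-endian, 10 bytes).
HEADER = struct.Struct("<HHIH")
FLAG_UNICODE = 0x0001  # assumed flag bit: text is UTF-16LE instead of ANSI


@dataclass
class Record:
    record_type: int
    flags: int
    next_offset: int
    text: str


def pack_record(record_type: int, text: str, unicode_text: bool, next_offset: int) -> bytes:
    """Build one record in the hypothetical layout (used here to create sample data)."""
    raw = text.encode("utf-16-le" if unicode_text else "cp1252")
    flags = FLAG_UNICODE if unicode_text else 0
    return HEADER.pack(record_type, flags, next_offset, len(raw)) + raw


def read_record(stream: bytes, offset: int) -> Record:
    """Decode the record that starts at the given offset in the stream."""
    record_type, flags, next_offset, text_len = HEADER.unpack_from(stream, offset)
    start = offset + HEADER.size
    raw = stream[start:start + text_len]
    # The flag decides how the numeric values are mapped to characters.
    encoding = "utf-16-le" if flags & FLAG_UNICODE else "cp1252"
    return Record(record_type, flags, next_offset, raw.decode(encoding))


def walk_records(stream: bytes, offset: int = 0):
    """Follow next_offset values from record to record until one is 0."""
    seen = set()
    while True:
        if offset in seen:
            raise ValueError("offset chain loops back on itself")
        seen.add(offset)
        record = read_record(stream, offset)
        yield record
        if record.next_offset == 0:
            break
        offset = record.next_offset


if __name__ == "__main__":
    first = pack_record(1, "Hi", unicode_text=False, next_offset=12)
    second = pack_record(2, "Olá", unicode_text=True, next_offset=0)
    for rec in walk_records(first + second):
        print(rec)
```

The real formats define many record types with their own field layouts, so a practical reader would take the header and field definitions from the relevant specification rather than from this sketch.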

The file formats used by Microsoft Word, Microsoft PowerPoint, Microsoft Excel, and Microsoft Outlook are all comprehensively documented in the MSDN Library at the following location: Microsoft Office File Format
Documents. From there, you can open the full specification for each file format, either directly on the MSDN site or as a .pdf file.

Note

The recommended way to perform most programming tasks in Microsoft Office is to use the Office Primary Interop Assemblies. These are a set of .NET classes that provide a complete object model for working with
Microsoft Office. This article series deals only with advanced scenarios, such as those in which Microsoft Office is not installed.

What Versions of Microsoft Office Use Binary File Format Files?


The Microsoft Office binary file formats discussed in this article are primarily used by Microsoft Outlook, Microsoft Excel, and previous versions of Microsoft Word and Microsoft PowerPoint. Microsoft Office Word 2007
and Office PowerPoint 2007 use XML-based file formats as their default file format, and Microsoft Excel 2010 also supports a newer binary format (.xlsb). The following table lists the versions of Word, PowerPoint,
Outlook, and Excel to which each binary file format applies.

File format    Application version

MS-DOC         Microsoft Word 97
               Microsoft Word 2000
               Microsoft Word 2002
               Microsoft Office Word 2003

MS-PPT         Microsoft PowerPoint 97
               Microsoft PowerPoint 2000
               Microsoft PowerPoint 2002
               Microsoft Office PowerPoint 2003

MS-PST         Microsoft Outlook 2000
               Microsoft Outlook 2002
               Microsoft Office Outlook 2003
               Microsoft Office Outlook 2007
               Microsoft Outlook 2010

MS-XLS         Microsoft Excel 97
               Microsoft Excel 2000
               Microsoft Excel 2002
               Microsoft Office Excel 2003


© 2017 Microsoft
