Privacy, document metadata, and hidden data—these have all become hot-button issues over the past few years. In fact, the volume of information and misinformation available about hidden document content might leave you wondering if it is ever actually safe to share a file electronically.
Well, take heart. The fact is, while protecting your private information is very important, there is no reason to fear your documents. In Office 2003, you can remove virtually any potentially private hidden content from your documents. In the 2007 Microsoft Office system, it has become even easier.
In this article, I will show you how to find and remove a variety of hidden data types from the documents you create or edit in the 2007 Office system programs Word, Excel, and PowerPoint.
Understanding Document Metadata and Hidden Data
Before jumping into specific methods for accessing and removing hidden content in your 2007 Office system documents, I want to briefly address the definitions of metadata and hidden data and provide some key examples for those who might not be familiar with the topic.
What Is Document Metadata?
Is there metadata in your documents? Absolutely. Is it a privacy risk? Not necessarily.
Metadata is information about a file or its content that a software program stores in the file. Essentially every software program stores some metadata in its files, and most of it is harmless. And, though you may not see many types of metadata automatically when opening a file, most document metadata will never be a privacy risk.
For example, have you ever tried to open a file only to get a message that the file cannot be opened because it was created with a newer version of the program? Information about the program used to create the file is stored as metadata. The more capabilities a program has, the more metadata its files are likely to require.
However, some types of metadata, such as the names of authors who have worked on a document, can present a privacy risk when you share files electronically. Recent versions of Microsoft Office programs have added improved privacy protection options, like the ability to automatically remove personal information (such as author history) from a file when saved. As you will see later in this article, the 2007 Office system goes a number of steps further to help you protect the private information in your documents.
What Is Hidden Data?
Though you may hear the subject of hidden data in documents referred to generally as document metadata, most hidden content in Microsoft Office documents is not actually metadata. As mentioned above, metadata is information that the program automatically stores in the file. Hidden data, on the other hand, is content you or other document editors add to your files that may be hidden under some circumstances.
For example, comments, unresolved tracked changes, and text formatted as hidden may be visible or hidden on screen, depending upon your view or other program settings. However, anyone who opens the document electronically can access and view them.
What Types of Metadata and Hidden Content May Be Present in My 2007 Office System Documents?
In the list below, I have included my hit list of some of the most common metadata and hidden content types that can pose a privacy risk, along with tips for how to access a few of those that often get overlooked.
Common Types of Hidden Data in the 2007 Office System Programs Word, Excel, and PowerPoint
• File Properties summary information, such as author name
• Comments and ink annotations
• Header and footer content
• In Office Word 2007, header and footer content is not visible on screen when you view a document in Full Screen Reading view, Normal view, Outline view, or Web view. View documents in Print Layout or Print Preview to see this content.
Also, keep in mind that some headers and footers may not appear in any view, but may still be saved with the document. These include headers in sections of less than one page, as well as Different First Page or Different Odd and Even headers and footers in documents where those features are turned off (or sections with an insufficient number of pages). Information on how to check for this type of header and footer content is provided later in this article.
Tip: Note that watermarks are header and footer content.
• In Office Excel 2007, once you add a header or footer to a worksheet, it is visible directly on the worksheet. However, you may not be able to see it, depending upon your view or zoom level. Additionally, headers and footers on chart sheets are visible only in Print Preview or through the Page Setup dialog box.
• In Office PowerPoint 2007 and earlier versions, footers on slides as well as headers and footers on notes pages and handouts are visible on screen if you select the option to show them through the Header and Footer dialog box. However, if you turn off the option to show the header or footer, but do not delete the related header or footer content from the Header and Footer dialog box, that content may still be saved with the file. (In 2007, footers applied to individual slides that you then turn off are deleted from the document. But, footers applied globally may still be saved with the file after you turn them off.) Keep in mind that, similar to sections in Microsoft Word or sheets in Microsoft Excel, individual slides or masters can have their own header and footer content.
Hidden Data in Word and Excel
• Unresolved tracked changes
Turning off change tracking, or turning off the option to view the markup on screen, does not remove changes that have already been tracked, including deleted content. In Word, accept or reject changes to permanently remove them from your document. In Excel, you can do the same or turn off workbook sharing to remove change history.
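If you want a quick check outside of Word, here is a minimal Python sketch that counts tracked insertions and deletions still stored in a .docx file. It relies on the fact, discussed later in this article, that the 2007 file formats are ZIP packages of XML. The function name is my own invention for illustration, and the sketch assumes the main document text lives in the part named word/document.xml (the name Word conventionally uses). It looks only at the main document body, not at headers, footers, footnotes, or comments.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements such as w:ins and w:del.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_tracked_changes(docx):
    """Count tracked insertions and deletions in the main document part.

    `docx` may be a file path or a file-like object.
    Assumes the main part is word/document.xml (Word's usual name).
    """
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    return {
        "insertions": sum(1 for _ in root.iter(f"{{{W_NS}}}ins")),
        "deletions": sum(1 for _ in root.iter(f"{{{W_NS}}}del")),
    }
```

If either count is greater than zero, the file still carries unresolved tracked changes, regardless of whether markup is showing on screen.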
Hidden Data in Word and PowerPoint
• Embedded objects originating in Excel
When an object from a Microsoft Excel file, such as a portion of a worksheet, is pasted into a Word or PowerPoint file as an embedded object, the entire workbook from which the object originated is embedded as well. So, recipients of the document can open the embedded object and access all content on all sheets of the source workbook. To protect the private content in your Excel workbook, paste objects into Word or PowerPoint as pictures instead of embedded objects.
Tip: If you have an Excel object that has already been embedded in a Word document, remember that embedded objects are stored in Word as fields. Select the object and press CTRL+SHIFT+F9 to remove the field and convert the object to a static picture. Note, however, that the data can no longer be edited once the object is converted to a picture, so you might want to save a backup copy of the original workbook before converting the object to a picture.
There is an exception, however, to the field solution. When you paste an Excel 2007 chart into a Word 2007 document using the default paste method (such as CTRL+V) and then select Entire Workbook from the Paste SmartTag, the workbook is embedded, but the chart remains an active chart object rather than an embedded object, so it’s not accessible as a field that you can unlink. In those cases, access the Chart Data from the Chart Tools Design contextual tab and delete the sheets or content that you don’t want to share. Or, cut the chart and paste it back into your document as a picture.
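To see whether a Word or PowerPoint file is carrying embedded workbooks, you can look inside the file itself. The following Python sketch lists anything stored in an "embeddings" folder of the package. That folder name is the one Word and PowerPoint conventionally use, so treat it as an assumption rather than a guarantee, and note that the function name is hypothetical.

```python
import zipfile


def list_embedded_parts(office_file):
    """Return the names of package parts stored in an 'embeddings' folder.

    `office_file` may be a path or file-like object for a .docx or .pptx.
    Assumes embedded objects live under word/embeddings/ or ppt/embeddings/,
    which is the usual convention but not something this sketch verifies.
    """
    with zipfile.ZipFile(office_file) as package:
        return sorted(
            name
            for name in package.namelist()
            if "/embeddings/" in name and not name.endswith("/")
        )
```

An empty list suggests that no workbook is embedded. Any entry ending in .xlsx is a full workbook that recipients could extract and open.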
Hidden Data in Individual Programs
• In Word: text formatted with the Hidden font attribute
• In Excel: hidden rows, columns, or sheets
• In PowerPoint: content placed outside of the slide area
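Of these, hidden worksheets are simple to check for from outside the program. The Python sketch below reads the workbook part of an .xlsx file and reports any sheet marked as hidden. It assumes the workbook part is named xl/workbook.xml (Excel's usual name) and uses a function name I made up for illustration. It does not look for hidden rows or columns, which are recorded in the individual worksheet parts.

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML namespace used by the <sheet> elements in the workbook part.
S_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def find_hidden_sheets(xlsx):
    """Return (sheet name, state) pairs for sheets that are not visible.

    `xlsx` may be a file path or a file-like object.
    A sheet with no state attribute is visible and is skipped.
    """
    with zipfile.ZipFile(xlsx) as package:
        root = ET.fromstring(package.read("xl/workbook.xml"))
    return [
        (sheet.get("name"), sheet.get("state"))
        for sheet in root.iter(f"{{{S_NS}}}sheet")
        if sheet.get("state") in ("hidden", "veryHidden")
    ]
```

The "veryHidden" state marks sheets that do not appear in the Unhide dialog box, so they are especially easy to overlook.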
The Document Inspector
Here is the very good news. For recent Microsoft Office releases, Microsoft made available a free add-in program that helps to find and resolve many types of hidden content. In the 2007 Office release, that functionality has been improved upon and integrated directly into 2007 Office system programs Word, Excel, and PowerPoint as a new feature called the Document Inspector.
The Document Inspector enables you to search for several types of hidden content and, once such content is found, to select the content you want to remove. Also note that the Document Inspector is extensible, so programmers can customize what content is searched for and removed with this tool. See the dialog boxes that follow for the available built-in options in each program.
The Document Inspector options dialog box in Word 2007.
The Document Inspector options dialog box in Excel 2007.
The Document Inspector options dialog box in PowerPoint 2007.
To use the Document Inspector in any of the three programs, do the following:
1. To preserve your document content, make a copy of your file before using Document Inspector. Removing some types of hidden content (such as Custom XML Data) may disable some functionality in your document. Additionally, as noted in the Document Inspector options dialog box, some changes cannot be undone.
2. When you are ready to inspect the document, click the Microsoft Office Button, point to Prepare, and then click Inspect Document.
3. In the Document Inspector dialog box that opens, select the content types you want the tool to identify in your document, and then click Inspect.
4. After your file has been reviewed, the Document Inspector results dialog box appears, indicating which content types were found. A Remove All button appears next to each content type found. Click Remove All only for content that you want to permanently remove from the file.
5. Note that, after you have removed the selected content, you can click the Reinspect button at the bottom of the dialog box to check the document again. Doing so opens the Document Inspector options dialog box.
Making Choices about Your Content
For some types of content that the Document Inspector can find, such as headers and footers, keep in mind that such content often poses no privacy risk. Consider reviewing the content manually before choosing whether or not to remove it.
For example, in Microsoft Word, you can check for headers and footers inadvertently saved with the document that do not appear in any view. To do this, start by viewing the document in Print Layout view and then do the following:
• Insert page breaks (CTRL+ENTER) to check for headers and footers in sections that contain insufficient pages for displaying all header and footer content. You can immediately delete page breaks when you no longer need them.
• To access the options for Different First Page and Different Odd and Even headers and footers, on the Page Layout tab, click the dialog box launcher in the Page Setup group, and then click the Layout tab.
• If Different First Page or Different Odd and Even headers or footers are not currently enabled in your document but may have been at one time, turn on those options to check for retained content.
• Where Odd and Even headers and footers are in use, turn the feature off to check for retained content in the standard headers and footers.
These steps will not harm headers and footers that are currently visible in your document. Once you have removed any unwanted content, just reset these options to once again show your intended headers and footers.
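As a complement to the manual check, here is a short Python sketch that lists the text found in every header and footer part stored in a .docx file, whether or not that header or footer currently appears in any view. It assumes Word's usual part names (word/header1.xml, word/footer1.xml, and so on), and the function name is hypothetical. It collects plain text runs only, so it will not report watermark text or pictures.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by text elements (w:t).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumed naming pattern for header and footer parts in a Word package.
HEADER_FOOTER_PART = re.compile(r"word/(header|footer)\d*\.xml")


def header_footer_text(docx):
    """Map each header or footer part name to the plain text it contains.

    `docx` may be a file path or a file-like object.
    """
    results = {}
    with zipfile.ZipFile(docx) as package:
        for name in package.namelist():
            if HEADER_FOOTER_PART.fullmatch(name):
                root = ET.fromstring(package.read(name))
                results[name] = "".join(
                    t.text or "" for t in root.iter(f"{{{W_NS}}}t")
                )
    return results
```

If the result includes text you do not recognize from the visible document, use the steps above in Word to locate and remove it.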
Saving Files as PDF or XPS Documents
The 2007 Office system also offers a free, downloadable add-in that enables you to publish files in either Portable Document Format (PDF) or the new XML Paper Specification (XPS) format. These file formats do not carry over most types of hidden content from 2007 Office system files.
So, when you want to share a file that the recipient will not need to edit as a live document, saving the file in one of these formats enables recipients to view the finished document while greatly reducing the risk of sharing private information. To save a file from 2007 Office system programs Word, Excel, or PowerPoint to one of these formats, do the following:
1. Click the Microsoft Office Button, point to Save As, and then click Publish as PDF or XPS. (If you’ve not yet installed the add-in, you’ll instead see a link on the Save As options to help you get the add-in. If that’s the case, install the add-in and then return to these steps.)
2. In the Publish as PDF or XPS dialog box, select your preferred file type from the Save as type list.
3. By default, Document Properties and document markup (tracked changes and comments) are included when you publish to PDF or XPS. To publish without including this information, and to customize other settings for your PDF or XPS document (such as publishing only certain pages of your document), click Options. Note that the options differ slightly between the PDF and XPS formats. Click OK when done to return to the Publish as PDF or XPS dialog box.
4. Click Publish. Your original document is not affected when you do this.
Exposing Content Using the New XML File Formats
If you’re a more advanced Microsoft Office user who doesn’t mind getting your hands dirty, the new Office Open XML file formats provide perhaps the most exciting advance in terms of the transparency of document content.
You can literally break into your 2007 Office system documents right from your computer desktop to access all of their content, hidden data included. This is possible because the new default file formats in the 2007 Office system programs Word, Excel, and PowerPoint (such as .docx, .xlsx, and .pptx, respectively) are based on ZIP technology. Simply by changing the file extension to .zip, you can open the .zip package to expose the XML, which is broken into a set of files and folders. For example, document properties are saved in one XML file within the .zip package and, if the document contains comments, those are saved in another XML file within the same .zip package.
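You do not even need to rename the file to look inside it if you are comfortable with a little scripting. As a minimal sketch, the following Python function lists the parts in a package and reads the author-related document properties. It assumes the properties are stored in a part named docProps/core.xml, which is where the 2007 programs conventionally put them. The function name is my own.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the core properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Assumed location of the core properties part inside the package.
CORE_PART = "docProps/core.xml"


def read_author_properties(office_file):
    """Return (list of part names, dict of author-related properties).

    `office_file` may be a path or file-like object for a .docx, .xlsx,
    or .pptx file. The dict is empty if the core properties part is absent.
    """
    with zipfile.ZipFile(office_file) as package:
        parts = package.namelist()
        if CORE_PART not in parts:
            return parts, {}
        root = ET.fromstring(package.read(CORE_PART))
    props = {
        "creator": root.findtext("dc:creator", default="", namespaces=NS),
        "lastModifiedBy": root.findtext(
            "cp:lastModifiedBy", default="", namespaces=NS
        ),
    }
    return parts, props
```

The list of part names is a handy inventory in its own right. For example, a Word document that contains comments will typically include a part such as word/comments.xml.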
So, what does this mean for managing hidden data? You can actually access, edit, or remove private information without even opening the entire document. Just edit or delete the XML content that you do not want to share.
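To illustrate, here is a hedged Python sketch that writes a copy of a 2007 Office file with the author and last-modified-by names blanked out, without opening the document in its program. It is an illustration under assumptions, not a replacement for the Document Inspector. It assumes the properties live in docProps/core.xml, touches only those two fields, and uses a function name I invented. Always run it against a copy of your file.

```python
import zipfile
import xml.etree.ElementTree as ET

CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
DC = "http://purl.org/dc/elements/1.1/"
CORE_PART = "docProps/core.xml"  # assumed location of the core properties

# Keep the usual prefixes when the XML is written back out. This matters
# because attribute values such as xsi:type="dcterms:W3CDTF" refer to a
# prefix by name.
for prefix, uri in {
    "cp": CP,
    "dc": DC,
    "dcterms": "http://purl.org/dc/terms/",
    "dcmitype": "http://purl.org/dc/dcmitype/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}.items():
    ET.register_namespace(prefix, uri)


def copy_without_author_names(source, destination):
    """Copy a package, emptying dc:creator and cp:lastModifiedBy.

    `source` and `destination` may be paths or file-like objects.
    Every other part is copied unchanged, in the original order.
    """
    with zipfile.ZipFile(source) as src, zipfile.ZipFile(
        destination, "w", zipfile.ZIP_DEFLATED
    ) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == CORE_PART:
                root = ET.fromstring(data)
                for tag in (f"{{{DC}}}creator", f"{{{CP}}}lastModifiedBy"):
                    for element in root.iter(tag):
                        element.text = ""
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            dst.writestr(item.filename, data)
```

I chose to write a new file rather than change the original in place, so that the source document is never at risk. Removing an entire part (such as comments) is more involved, because other parts in the package refer to it, so that is deliberately left out of this sketch.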
For more information on working with the Office Open XML Formats, check out Office Program Manager Brian Jones’ blog – which you can access via the related links section on this blog. Also, keep an eye on this blog for upcoming posts about a webcast series I have coming up this Spring on working with the Office Open XML formats, as well as my upcoming book – due out in February (Advanced Microsoft Office Documents 2007 Edition Inside Out), which provides an extensive introduction for advanced end users on how to break into and edit the XML behind your files (find more info about the book in the My Books link on the left side of this page).
The Importance of Due Diligence
One thing you may have noticed throughout this article is that much of what is considered hidden data is content—such as tracked changes or headers and footers—that you are likely to want in your documents at some point.
However, when files are edited by multiple people, or copied and changed for a new project or new client, it is easy to miss content that should no longer be part of the document.
The best way to protect yourself from inadvertently sharing private content is simply to be aware of the types of content that might be in your file but not immediately visible and then determine for yourself the content that you feel is important to remove. Protecting your private content does not have to be time-consuming. Using the Document Inspector to check for many content types takes just a minute. Also keep in mind that removing many hidden content types can be as easy as one or two clicks—such as deleting all comments in a document or accepting all changes to permanently remove previously deleted content that was tracked with Track Changes.
Note: this article was written by me and originally published on the Microsoft Office Beta Community Site. It has been updated to reflect changes in the release version of Office 2007 and reprinted here with permission. -- Stephanie Krieger