The Whole Truth About Metadata in Office Documents
September 02, 2004

The metadata issue in Office -- particularly Word -- has become such a hot topic that the fear-building propaganda available on the topic is absolutely overflowing!

For anyone not familiar with it, document metadata just means content saved with the document that you don't see in the user interface or the printed document. Many types of metadata, even in Word, are not dangerous to your privacy at all … such as format settings, style names, etc. And here's the quick good news on those types of metadata that genuinely may pose a privacy risk: you can get rid of it all by yourself!

Some producers of add-in applications referred to as metadata scrubbers would have you believe that there is a grand conspiracy brewing within your very own documents that only they can solve for you. Even some articles from highly respected publications contain misinformation — the whole topic seems to have turned into an out-of-control, global game of telephone!

So, without further ado, here’s the straight scoop followed by some quick instructions for do-it-yourself document cleaning:

- Anyone who tells you that dangerous metadata exists in your documents that you can't remove on your own is selling something. Period. That said, depending on how ‘clean’ your docs need to be, some products out there can be valuable timesavers when used carefully.

- Almost any software application in which you can save a file stores some type of metadata. Metadata isn’t at all exclusive to Office documents.

- There is no privacy-risk form of metadata that you can’t remove quite easily from a Word, Excel, or PowerPoint 2003 document.

- Virtually every scrubber available does more than you need, and may remove content from your document that you need in order to have full functionality. If you choose to use a scrubber, always make a copy of the document first, and review the settings carefully, so you know exactly what the tool is going to do to your documents!

Okay, so how do you get rid of the most common types of privacy-risk metadata on your own? Easier and faster than you’d think! (Instructions below apply to Office 2003 and Office XP.) Here goes:

1. In Tools, Options, Save (Word or PowerPoint), confirm that there is no check in the box labeled Allow fast saves, click OK to close Tools, Options, and then save your document. This ensures that no previously deleted content will appear if your document is viewed in a text editor.

2. In Word, confirm that Track Changes is off in the document, then on the Reviewing toolbar, click Accept All Changes in Document from the Accept Changes drop-down list. This removes any previously deleted content saved with tracked changes. In Excel, delete change history by turning off workbook Sharing (Tools menu). In PowerPoint, you can delete all markup from the Reviewing toolbar as well.

3. In Tools, Options, Security (Word, Excel, and PowerPoint), check the option to Remove Personal Information .... on Save. This removes author history and author-related file summary content, and removes author initials from comments and tracked changes.

4. If you use the File, Versions feature, delete any previous versions you’ve saved through that dialog box. (You can delete all versions listed in that dialog box without affecting the currently open version of the document.)

5. Convert any embedded objects to pictures (especially Excel objects -- as an embedded Excel object embeds the entire workbook from which it originates). You can do this quickly by selecting the object and pressing Ctrl+Shift+F9. Embedded objects are saved in your document as fields, and that shortcut removes the automation from a field and leaves just its static, visible result (such as the text shown by a hyperlink or a page number field, or in this case, a picture of the embedded object).

... Those are my top picks for the potentially private stuff you’ll probably want to get out of your documents. Depending on the level of diligence you need -- there might be others ... such as removing the Routing Slip from a document (if you've created one) when you don't want a recipient to know who's seen the document; deleting hyperlinks that contain locations you don’t want to share; etc. But, most of us really just care that stuff we've deleted from our documents is actually gone. And that's all there is to it.
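If you'd like a quick spot check that deleted text really is gone, here is a minimal Python sketch of the "view it in a text editor" test mentioned in step 1. It is not part of Office or of any scrubber; the function name find_leftover_text and the file name in the usage comment are made up for illustration. It assumes the saved file stores its text uncompressed, as either 8-bit (Windows-1252) or UTF-16 characters, and it only does a case-sensitive search for phrases you supply, so a clean result is reassuring rather than proof.

```python
from pathlib import Path


def _byte_forms(phrase):
    """Return the byte sequences a phrase might be stored as in the file."""
    forms = [phrase.encode("utf-16-le")]
    try:
        forms.append(phrase.encode("cp1252"))
    except UnicodeEncodeError:
        # The phrase has characters outside Windows-1252; only UTF-16 applies.
        pass
    return forms


def find_leftover_text(path, phrases):
    """Return the phrases whose bytes still occur anywhere in the saved file.

    Assumption: text is stored uncompressed as 8-bit or UTF-16LE characters.
    """
    raw = Path(path).read_bytes()
    found = []
    for phrase in phrases:
        if not phrase:
            continue  # an empty phrase would match everything
        if any(form in raw for form in _byte_forms(phrase)):
            found.append(phrase)
    return found


# Hypothetical usage: list phrases you believe you deleted before sharing.
# leftovers = find_leftover_text("proposal.doc", ["Project Falcon", "draft pricing"])
# print(leftovers or "None of the phrases were found in the raw file.")
```

Run it against a copy of the document after you have followed the steps above. Any phrase it reports is worth tracking down in the document before you send it.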

- It's also good to keep in mind that there are several types of content that can pose a privacy risk, but aren't metadata at all... just stuff you might forget about, depending on your document layout and how you’re viewing the document:

  - Headers and footers may contain previously saved content that doesn’t appear in your document (for example, if a section of a Word document becomes shorter than a page and uses a continuous section break, existing headers or footers for that section would not be visible in the document because there’s no page for them).
  - Footnote content only appears on screen when you're working in Print Layout view.
  - Text formatted with the hidden attribute is only visible if your settings tell it to be (set that in Tools, Options, View).
  - One non-Word example of potentially hidden data is saved header/footer content for a PowerPoint slide that isn't being shown on the slide (View, Header and Footer in PowerPoint).

In all of these cases, just a little diligence in reviewing your document before you share it will do the trick and save you lots of stress.

So yes, there are several steps if you want to be sure that no information goes with a document except the content you expressly put in it. And, if you want shortcuts to getting that done, you’ve got several options:

- You can remove all privacy risk posed by document metadata in one step just by converting a document to an Acrobat PDF before sending it (lots of large companies do this!). You might have heard that the PDF technically retains some metadata, but you've no need to worry unless, for example, you care that the recipient knows how much space after the paragraph you used for formatting! Converting to a PDF removes all types of previously deleted content. It's a great way to go if you own the full version of Acrobat, and the recipient won't need to edit the document.

- The free Microsoft Hidden Data Add-In download addresses a mix of metadata issues and some hidden data issues that aren’t actually metadata (as discussed above). (Here's a fun fact: the Office guys who developed this got some of their input on what to include in the tool via requests from the intelligence industry.) It can be a useful tool, but use caution with it. Depending on the document, it can be rather time-consuming to run. Also, it doesn’t give you a ton of choices about what you want to leave or remove (I find it fairly over-eager in what it removes), so be sure to make a copy of the original for backup.

- My favorite solution is just using a few macros: you save time by quickly removing the metadata and hidden content you want to get rid of, without going overboard. In fact, MODD contains a set of macros that help you do this for Word, Excel, and PowerPoint.

Of course, regardless of the approach you take to removing metadata, be sure you know what's getting removed before you do it and what effect that will have on your document … and saving a copy of your original for backup is always a good idea!

Finally, if you prefer to have the dog-and-pony show, there are very capable scrubbers out there. Just please don’t buy into the dramatic hype ... there's no need to fear your documents!

Posted by Stephanie


Comments

And on an unrelated topic...
Very stylish site.
But my eyes are tired and I wish you'd kerned another point or two between the 'e' and the 't' in your logo. As it is, they blend into a 'd'

cheers!
