Make the most of metadata, or get rid of it

Thanks to the National Security Agency (NSA), we all care a bit more about metadata these days. The revelations about the information Verizon has been passing on to the NSA about phone calls are a great reminder that knowing things about a piece of information can be just as valuable as the information itself. The information the NSA gets from Verizon includes the telephone numbers on a call, the time and length of the call, the IMEI or other phone identifier, any calling cards used to place the call, the way the call is routed through the Verizon network and something called the trunk identifier, which shows where the call entered the cell phone network.

It doesn’t include the name or address of the callers, or the location of the call, but it doesn’t need to. The trunk identifier can trace the call to within a couple of miles.

The pattern of calls between numbers reveals the social network of people who are in contact. It also shows who is passing information and where that information is going. This kind of traffic analysis was used to identify offenders in the London riots a couple of years ago, but military intelligence has been doing it for much longer: both the Japanese invasion fleet heading to Pearl Harbor and the British force in the Normandy landings sent false radio signals to defeat traffic analysis.
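
To make the idea concrete, here is a minimal Python sketch of how a contact network falls out of call records alone. The phone numbers, the record layout and the function names are all made up for illustration; this is not how any agency or carrier actually stores or processes call data.

```python
from collections import Counter, defaultdict

# Hypothetical call records: (caller, callee, duration in seconds).
# There are no names and no call content here, only metadata.
CALLS = [
    ("555-0101", "555-0102", 120),
    ("555-0101", "555-0103", 45),
    ("555-0102", "555-0103", 300),
    ("555-0104", "555-0101", 30),
    ("555-0101", "555-0102", 60),
]


def build_contact_graph(calls):
    """Map each number to the set of numbers it has been in contact with."""
    graph = defaultdict(set)
    for caller, callee, _duration in calls:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def busiest_pairs(calls):
    """Count calls per unordered pair of numbers, most frequent first."""
    pairs = Counter(tuple(sorted((caller, callee))) for caller, callee, _ in calls)
    return pairs.most_common()


graph = build_contact_graph(CALLS)
# The number with the most distinct contacts is a likely hub in the network.
hub = max(graph, key=lambda number: len(graph[number]))
print(hub, sorted(graph[hub]))
print(busiest_pairs(CALLS)[0])
```

Even with five records, one number stands out as the hub and one pair as the most active link, which is the kind of conclusion traffic analysis draws without ever hearing a call.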

Your users and developers are all generating metadata that could be useful, or that you might want to remove from files going outside the company in case it lets someone else put the pieces together.

The metadata we see most are things like hashtags on tweets and photos, locations in photographs and likes on Facebook posts. If you use a document management system (including SharePoint), automatically generated keywords help users find relevant documents (manually adding keywords is useful too, but in general, users tend not to do this with any consistency).

You can manage the metadata by creating useful hierarchies and classifications (categories of products, materials used in products, document types like proposals, specifications and presentations) and term sets (as well as the name, date and description of a document, you might want to show contact details or which department has financial responsibility). And you can use those to organize search results.

SharePoint can find structured information like dates, stock tickers and company names in documents automatically; again, you can add your own dictionaries so it can pull out product names or company initiatives. Then you can build rules that prioritize different metadata for different user roles (so financial analysts see more documents with stock tickers in their results list than someone in the design team).
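
As a rough illustration of that idea (and not of SharePoint's own extraction engine or API), the Python sketch below pulls dates, stock tickers and dictionary terms out of text, then scores a document differently per role. The product names, the regular expressions and the role weights are all assumptions chosen for the example.

```python
import re

# Hypothetical custom dictionary, standing in for the dictionaries
# of product names or company initiatives you might add yourself.
PRODUCT_NAMES = {"Contoso Widget", "Fabrikam Cloud"}

# Deliberately narrow patterns: ISO dates and exchange-prefixed tickers only.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
TICKER_RE = re.compile(r"\b(?:NASDAQ|NYSE):\s?([A-Z]{1,5})\b")

# Hypothetical weights: how much each kind of metadata matters to each role.
ROLE_WEIGHTS = {
    "finance": {"dates": 1, "tickers": 3, "products": 1},
    "design": {"dates": 1, "tickers": 0, "products": 3},
}


def extract_metadata(text):
    """Return the structured items found in a piece of text."""
    return {
        "dates": DATE_RE.findall(text),
        "tickers": TICKER_RE.findall(text),
        "products": sorted(name for name in PRODUCT_NAMES if name in text),
    }


def score(metadata, role):
    """Weight the extracted metadata according to the user's role."""
    weights = ROLE_WEIGHTS[role]
    return sum(weights[field] * len(values) for field, values in metadata.items())


doc = (
    "On 2013-06-10 analysts reviewed NASDAQ: MSFT and NYSE: VZ "
    "ahead of the Contoso Widget launch."
)
meta = extract_metadata(doc)
print(meta)
print(score(meta, "finance"), score(meta, "design"))
```

The same document scores higher for the finance role than for the design role because it mentions two tickers, so it would sort nearer the top of an analyst's results.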

But it might also be useful to highlight documents that close colleagues or people you’ve searched for on SharePoint have looked at, something the new social features in SharePoint 2013 enable. Documents that have been read more often or updated frequently are probably more interesting than documents no one has ever looked at or gone back to edit.

You can use metadata to automatically protect important documents. Dynamic Access Control, introduced in Windows Server 2012, lets you classify files and control access based on that classification. You can use that to make sure that only people in the finance department can open (or even see) documents created by other users in finance, without anyone setting permissions by hand. Think of it as being like the Data Loss Prevention rules you can set up in Exchange, but for documents on your file server and based on metadata such as what someone’s job is. In Exchange, those rules stop anyone from mailing out documents that contain credit card numbers or Social Security numbers.
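
The logic behind rules like these can be sketched in a few lines of Python. This is only an illustration of classifying content and then deciding access from metadata; it does not use Dynamic Access Control or Exchange, and the patterns, the labels and the `can_open` rule are assumptions made for the example.

```python
import re

# Loose patterns for illustration; real rules are far more thorough.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def luhn_valid(number):
    """Check a candidate card number with the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def classify(text):
    """Label text 'restricted' if it appears to hold card or SSN data."""
    has_card = any(luhn_valid(match.group()) for match in CARD_RE.finditer(text))
    has_ssn = SSN_RE.search(text) is not None
    return "restricted" if has_card or has_ssn else "general"


def can_open(user_department, owner_department, classification):
    """Hypothetical rule: restricted files stay inside the owning department."""
    if classification == "general":
        return True
    return user_department == owner_department


label = classify("Refund to card 4111 1111 1111 1111 approved.")
print(label, can_open("design", "finance", label))
```

The card number used here is a widely published test number that passes the Luhn check, so the text is labelled restricted and a user outside finance is refused.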

Developers are interested in different metadata. The new “heads up display” in Visual Studio 2013 puts relevant metadata right inside the code editor, so you can select a function and see how many times it’s referenced elsewhere in the code. If that’s zero, you know at once that either you’ve forgotten to write a section of code or that you’ve found some dead code you can trim out.
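
You can get a feel for that reference count with Python's standard `ast` module. The sketch below is an analogy only, not how Visual Studio computes its numbers: it looks at a single module, counts plain name references and ignores attribute access such as `obj.method`. The sample source and function names are invented.

```python
import ast
from collections import Counter

# A made-up module to analyze.
SOURCE = '''
def used_helper(x):
    return x * 2

def unused_helper(x):
    return x + 1

def main():
    return used_helper(21)
'''


def reference_counts(source):
    """Count how often each top-level function name is read in the module."""
    tree = ast.parse(source)
    defined = [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]
    uses = Counter(
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    )
    return {name: uses[name] for name in defined}


counts = reference_counts(SOURCE)
for name, count in counts.items():
    print(name, count)
```

A count of zero for `unused_helper` is the dead-code signal described above; `main` also shows zero here simply because nothing in this one module calls it.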

Test-driven development can improve your code quality: you see straight away whether the function you’ve just written makes you pass or fail more tests, or that there isn’t a test for the code you just wrote. Just like the word count in a document or the error warnings in a spreadsheet, seeing potential problems while you’re writing code, rather than a week later when you’re trying to remember why you wrote it, is very powerful.

Document protection

Given how much you can do with metadata, you may also want to make sure it’s stripped off documents that leave the company. A Word document stores details like all the people who’ve edited it, how long they’ve spent working on it, and if it’s been in SharePoint, which document library it was in (unlike PDFs, Office documents put some of their SharePoint metadata right into the document file). It might even have tracked changes that have been hidden rather than accepted.
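
A .docx file is a ZIP package, so you can see some of this for yourself with Python's standard library. The sketch below reads a few properties from the `docProps/core.xml` and `docProps/app.xml` parts and makes a crude check for tracked-change markup. The sample package is a stand-in built in memory with made-up values, not a complete Word file, and the tracked-changes check is a simple heuristic rather than a full parser.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the Office Open XML property parts.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}


def read_docx_metadata(docx_file):
    """Return a few metadata fields from a .docx path or file-like object."""
    found = {}
    with zipfile.ZipFile(docx_file) as package:
        names = set(package.namelist())
        if "docProps/core.xml" in names:
            core = ET.fromstring(package.read("docProps/core.xml"))
            found["creator"] = core.findtext("dc:creator", namespaces=NS)
            found["last_modified_by"] = core.findtext("cp:lastModifiedBy", namespaces=NS)
            found["revision"] = core.findtext("cp:revision", namespaces=NS)
        if "docProps/app.xml" in names:
            app = ET.fromstring(package.read("docProps/app.xml"))
            found["total_editing_minutes"] = app.findtext("ep:TotalTime", namespaces=NS)
        if "word/document.xml" in names:
            body = package.read("word/document.xml")
            # Heuristic: insertion and deletion elements mark tracked changes.
            found["has_tracked_changes"] = b"<w:ins " in body or b"<w:del " in body
    return found


def make_sample_docx():
    """Build a tiny stand-in package in memory (hypothetical values)."""
    core = (
        '<cp:coreProperties xmlns:cp="%s" xmlns:dc="%s">'
        "<dc:creator>Alice</dc:creator>"
        "<cp:lastModifiedBy>Bob</cp:lastModifiedBy>"
        "<cp:revision>7</cp:revision>"
        "</cp:coreProperties>" % (NS["cp"], NS["dc"])
    )
    app = '<Properties xmlns="%s"><TotalTime>42</TotalTime></Properties>' % NS["ep"]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core)
        package.writestr("docProps/app.xml", app)
    buffer.seek(0)
    return buffer


print(read_docx_metadata(make_sample_docx()))
```

To try it on a real document, pass a file path instead of the sample, for example `read_docx_metadata("proposal.docx")`, where the file name is whatever document you want to check.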

Office has had the Document Inspector for several versions; you’ll find it under File, Properties (or File, Info in Office 2013). This scans the current document for a variety of metadata, tells you what it finds and lets you remove it. You can download free tools from Microsoft to do that for older versions of Office too. You can also right-click on a file in Explorer and choose Properties, Details, Remove Properties and Personal Information.

It’s all about finding balance

You need to strike a balance between security and letting people get their jobs done. You can use Group Policy to force Office to delete metadata from documents as soon as they’re saved, but that means people can’t use metadata to find files. You can also get add-ins for Exchange and Outlook that automatically strip out metadata from documents sent as attachments. If any information about how you create documents could be embarrassing for your company, that’s worthwhile.

Or you could just use Information Rights Management to stop users from mailing out confidential documents, turn on Exchange Mail Tips to warn users when they’re sending an email message with an attachment to someone outside the company, and use Group Policy to set Word to warn users when they’re sending a document that has tracked changes and comments, which should take care of the problem for most businesses.
