DMS Insights from Cognidox

OfficeToPDF: open source PDF conversion for Microsoft Office 2007 and 2010

Written by Paul Walsh | 20 May, 2011

 

This week we released another project under an open source license.

It's called OfficeToPDF and it's a free command line tool that automates server-based PDF conversion for Microsoft Office 2007 and 2010 users. It requires an installation of either Office 2007 or 2010 to work.

Now, you may just read the words "PDF" and "conversion" and think we're out of our minds releasing yet another PDF tool, even if it is free. Calm down, we're not quite losing the plot. You need to focus on the words "automates" and "server-based". Most Office to PDF converter tools are intended as single-user desktop applications. OfficeToPDF is useful (and unique, as far as we can tell) if you want to create PDF files automatically on a server and free individual users from the extra step of using the "Save as..." command on their Office files. These PDF files can then be stored and managed on a separate server. This can be useful if, for example, a department has a policy of only distributing PDF versions of documents to people outside the department.

Of course, this is one of the features of CogniDox. What we were able to do is to separate code we'd written and make it a stand-alone component. The fact that we use it daily means it's a robust offering. The fact we had to write it ourselves shows how unsuccessful we were in looking for an existing solution.

Many companies have developed PDF software that either attempts to parse the source Office document format and then render it to PDF, or uses Office to print-spool the document to a PDF printer device. When you render Office documents by parsing, you don't always get all the rendering subtleties that Office formats support. When printing, you don't always get features like hyperlinks that Office PDF output supports.

Basically, Office now has better PDF output capabilities than many competing PDF products. Office users will know that if you use Office 2007 you have to download and install a separate add-in called "Microsoft Save as PDF or XPS". In Office 2010 it comes built-in. It allows you quite simply to save a file as a PDF or XPS document. There are different options, but generally the PDF it produces is of a consistently high quality.

This is fine as long as every individual user has the time and presence of mind to save every file in both the Office format and as a PDF. Of course that won't happen. If you wanted to keep all of the documents produced by a team in a shared area (or even better, in a document management system) then you'd end up with a mixture of e.g. Word files and PDFs.

So, OfficeToPDF is run as a background process (via the command line) and can be programmed to convert every new file into a PDF. This takes away the need for an individual user to do this manually. Simple, but the only alternatives we could find were proprietary and had price tags over $10K for a site license. It's better to use Office for this with OfficeToPDF as the "wrapper" around it. We had no desire to enter the PDF market, so it's released as free open source.
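To make the "background process" idea concrete, here is a minimal Python sketch of a wrapper script that sweeps a shared folder and converts any Office file whose PDF is missing or out of date. It is an illustration only, not how CogniDox does it. The install path, the function names and the folder-sweep approach are all our assumptions for the example, as is the simple invocation form of `OfficeToPDF.exe` followed by an input file and an output file; check the project documentation for the exact arguments.

```python
import subprocess
from pathlib import Path

# Hypothetical install location; adjust for your own server.
OFFICETOPDF_EXE = r"C:\Tools\OfficeToPDF.exe"

OFFICE_SUFFIXES = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}


def pending_conversions(folder):
    """Return (source, target) pairs for Office files whose PDF is missing or stale."""
    pairs = []
    for source in sorted(Path(folder).iterdir()):
        if not source.is_file() or source.suffix.lower() not in OFFICE_SUFFIXES:
            continue
        if source.name.startswith("~$"):
            # Skip the temporary lock files Office creates while a document is open.
            continue
        target = source.with_suffix(".pdf")
        if not target.exists() or target.stat().st_mtime < source.stat().st_mtime:
            pairs.append((source, target))
    return pairs


def build_command(source, target, exe=OFFICETOPDF_EXE):
    """Build the command line (assumed form: OfficeToPDF.exe <input> <output>)."""
    return [exe, str(source), str(target)]


def convert_folder(folder, run=subprocess.run):
    """Convert every pending file and return the sources that failed."""
    failed = []
    for source, target in pending_conversions(folder):
        result = run(build_command(source, target))
        if result.returncode != 0:
            failed.append(source)
    return failed
```

A script like this could be run from a scheduled task every few minutes, calling `convert_folder` on the team's shared area and logging whatever comes back in the failed list. The `run` parameter is only there so the logic can be tried out without Office installed.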

It's worth adding that although the idea sounds fairly simple, it still required a brisk work-out with Microsoft's finest (the .NET Framework 4 and Visual Studio 2010) to make it all work to plan. We still use a lot of other PDF-related tools inside CogniDox, to do things like security and watermarking, so there is plenty more 'special sauce' still in the bottle.

When it came to deciding where the software should be hosted, we opted for the CodePlex site. This was set up by Microsoft for open source projects. It fills a similar space to SourceForge, Google Code and GitHub, but we think it's a more natural destination for anyone searching for Windows-based open source projects. When it came to choosing a license, we chose the Apache 2.0 license. This gives others permission to include OfficeToPDF in their projects without any obligations other than to retain our Cognidox Limited copyright statements. Because it's open source, you can study the code to make sure it does what you need, modify it if you want, and redistribute it afterwards.