Are there any future plans to enable content indexing within documents and PDFs? This is one of the best features of Google Drive. It would be great to have it so our users can cut down on the time they spend writing metadata.
Hi Daniel H.,
Portfolio Server 10.2 can extract text from several document formats. This PDF lists the supported formats:
Thanks for that useful document! I notice we’re running 10.1. Is this a new feature in 10.2?
Portfolio 10.1 can index and search document text from most, if not all, of the same document formats. One thing that is new in 10.2 is the ability to index text from OCR layers in PDFs.
To use this feature (in any version of Portfolio), open your catalog in the Portfolio Desktop Client, then go to Catalog -> Advanced Options. Switch to the ‘Properties’ tab, enable ‘Index Document Text’, and click the ‘OK’ button.
From then on, Portfolio will index text from any documents added to the catalog. For documents that are already in your catalog, select their thumbnails and then go to Item -> Extract Properties.
To search for document text, go to Catalog -> Find Other -> Document Text (Win) or Edit -> Find -> Document Text (Mac).
Great, got it. I thought something like this would be on by default. Is there any downside to having it on? Does it keep re-indexing when changes are made? Thanks for the awesome support.
Having this feature on will increase the file size of your catalogs; that’s about the only downside. Portfolio won’t automatically re-index a file when changes are made to it. You would have to do that manually by selecting the thumbnail and choosing Item -> Extract Properties.
I ran Extract Properties on the catalog and it crashed. I guess I have too many files.
That’s possible. Try it in smaller batches.
Is there a way for people using NetPublish to search by text within a document?
No, the NP API doesn’t expose that data. If you have SQL catalogues, you could always use ASP or PHP and try to write your own queries directly against the SQL database (I don’t know how this data is stored in the SQL catalogue schema). In fact, I don’t think the text index is available to anything except the special find in the Desktop Client.
Will this perhaps happen in the future? I don’t know how to write code. Thanks!
Extensis don’t publish roadmaps, so no one knows (or if they do, they can’t say). If you need this feature, it’s in your interest to submit a feature request directly to Extensis.
Thanks, where do I do that?
I was trying to find your account in our customer database, but it looks like you’re not in it. Can you go to: secure.extensis.com/en/support/c … upport.jsp and submit a ticket via the web form? Once you do, I will turn it into a feature request in our internal system.
I really like the ability to search across PDF metadata. I know how to do it via Desktop Client, but is there a way to do so via the Web Client interface?
Unfortunately, the Web Client doesn’t have this capability. I was able to find your account in our customer database, so I will file a feature request on your behalf. Our developers are continually adding new features to the Web Client, and requests like yours help them set priorities.