Hack 20 Index and Search Local PDF Collections on Windows
Teach Windows XP or 2000 how to search the full text of your PDFs along with your other documents. Or, use Adobe Reader to search PDFs only.

Search is essential for utilizing document archives. Search can also find things where you might not have thought to look. The problem is that Windows search doesn't know how to read PDF files by default. We present a couple of solutions.

2.7.1 Search PDF with Adobe Reader

The free Adobe Reader 6.0 provides the easiest solution. It enables you to perform searches across your entire PDF collection (Edit

Figure 2-5. Collection search results in Reader linking directly into the documents

The downside to Adobe Reader search is that it searches PDF documents only.

2.7.2 Index and Search PDF with Windows XP and 2000

It makes sense to search across all file types from a single interface. Newer versions of Windows enable you to extend their built-in search feature to include PDF documents. With Windows 2000, all you need to do is install the freely available PDF IFilter from Adobe. With Windows XP, you must also apply a couple of workarounds. In both cases, you can use the Windows Indexing Service to speed up searches.

The Windows Indexing Service is powerful but needs to be configured for best performance. The next section introduces you to the Indexing Service. We then discuss installing and troubleshooting Adobe's PDF IFilter.

2.7.3 Windows Indexing Service: Installation, Configuration, and Documentation

You don't need Indexing Service to search your computer, but it can be handy. Queries run much faster, and you can use advanced search features such as Boolean operators (e.g., AND, OR, and NOT), metadata searches (e.g., @DocTitle Contains "pdf"), and pattern matching. The downside is that the Indexing Service always runs in the background, using resources to index new or updated documents. A little configuration ensures that you get the best performance.

First off, do you have Indexing Service? If not, how do you install
it? Both questions are answered in the Windows Components
Wizard window. In Windows XP or 2000, open this wizard by selecting
Start

Figure 2-6. Adding the Indexing Service component to XP or 2000

Access Indexing Service configuration and documentation from the Computer Management window, shown in Figure 2-7. Right-click My Computer and select Manage. In the left pane, unroll Services and Applications and then Indexing Service.

Figure 2-7. The Computer Management window, where you configure the Indexing Service

Sometimes you must stop or start the Indexing Service. Right-click the Indexing Service node and select Stop or Start from the context menu.

Under the Indexing
Service node you'll find index
catalogs, such as System. Add, delete, and
configure these catalogs so that they index only the directories you
need. For details on how to do this, I highly recommend the
documentation under Help
You still can search the directories you do not index by selecting
Start

Before installing the PDF IFilter, create a special catalog for testing purposes. Put a few PDFs in its directory. Disable indexing on all other catalog directories by double-clicking these directories and selecting "Include in Index? No." This will simplify testing because indexing many documents can take a long time.
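If your PDF collection is large, a short script can pick out a handful of files for the test catalog directory. The following is only a sketch: the function name `copy_sample_pdfs`, the `limit` parameter, and any paths you pass in are illustrative assumptions, not part of the Indexing Service or the PDF IFilter.

```python
import shutil
from pathlib import Path


def copy_sample_pdfs(collection_dir, test_dir, limit=5):
    """Copy up to `limit` PDFs from a collection into a test directory.

    Hypothetical helper for illustration. Files that share a name
    overwrite one another in the test directory, and the test directory
    should live outside the collection being scanned.
    """
    collection = Path(collection_dir)
    target = Path(test_dir)
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    # Walk the collection recursively in a stable (sorted) order.
    for pdf in sorted(collection.rglob("*.pdf")):
        if len(copied) >= limit:
            break
        destination = target / pdf.name
        shutil.copy2(pdf, destination)  # copy2 also preserves timestamps
        copied.append(destination)
    return copied
```

You would call it with your own locations, for example `copy_sample_pdfs(r"C:\Documents", r"C:\PDFTest", limit=3)`, where both paths are placeholders. Then point the test catalog at the second directory.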
2.7.4 Prepare to Install PDF IFilter 5.0

On Windows XP and 2000, you have two kinds of searches: indexed and unindexed. An indexed search relies on the Indexing Service, as we have discussed. An unindexed search takes a brute-force approach, scanning all files for your queried text, as shown in Figure 2-8. In both cases, the system uses filters to handle the numerous file types. These filters use the IFilter API to interface with the system.

Figure 2-8. An unindexed search

A PDF IFilter is freely available from Adobe. Visit http://www.adobe.com/support/salesdocs/1043a.htm and download ifilter50.exe. Adobe's web page states that this PDF IFilter works only on servers. In fact, it works on XP Home Edition, too.

If you run Windows 2000, you can install the PDF IFilter and it will work for both indexed and unindexed PDF searching.

If you run Windows XP Home Edition and install the PDF IFilter (Version 5.0), you might need to disable the PDF IFilter for unindexed PDF searches. Unindexed searching of PDFs on XP Home Edition with the PDF IFilter can leave open file handles lying around, which will cause all sorts of problems.

Visit http://www.pdfhacks.com/ifilter/ and download PDFFilt_FileHandleLeakFix.reg. We will use it in our installation instructions, later in this hack. This registry hack ensures that only the Indexing Service uses the PDF IFilter. After you apply this hack, PDFs will be treated like plain-text files during unindexed searches. You can undo this registry hack with PDFFilt_FileHandleLeakFix.uninstall.reg.
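To make the difference between indexed and unindexed searches concrete, here is a toy model in Python. It is a conceptual sketch only and says nothing about how Windows implements either search: the function names and sample documents are invented for illustration, and the plain strings stand in for the text a filter would extract from each file.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words; a crude stand-in for a real word breaker."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unindexed_search(documents, word):
    """Brute force: re-read the text of every document on every query."""
    word = word.lower()
    return sorted(name for name, text in documents.items()
                  if word in tokenize(text))


def build_index(documents):
    """Read each document once and record which documents contain each word."""
    index = defaultdict(set)
    for name, text in documents.items():
        for token in tokenize(text):
            index[token].add(name)
    return index


def indexed_search(index, word):
    """Answer a query with a single lookup instead of a rescan."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical file names and extracted text, for illustration only.
documents = {
    "report.pdf": "Quarterly sales report for the archive",
    "notes.txt": "Remember to archive the old invoices",
    "manual.pdf": "Installation manual",
}

index = build_index(documents)
print(unindexed_search(documents, "archive"))
print(indexed_search(index, "archive"))
```

Both calls return the same documents. The difference is where the work happens: the brute-force version pays the full scanning cost on every query, while the indexed version pays it once, up front, and again only when documents change. That is the same trade-off described earlier for the Indexing Service, which runs in the background to keep its catalogs current.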
2.7.5 Install and Troubleshoot Adobe PDF IFilter 5.0

On XP, installing the PDF IFilter might require a couple of registry hacks. First we'll install it, then we'll troubleshoot.
To test your index, don't select Start

Figure 2-9. Testing your index with negative results

2.7.5.1 PDF IFilter doesn't work with XP Indexing Service: workaround

PDF IFilter and Indexing Service don't see eye to eye on Windows XP. If querying indexed PDFs yields empty sets, give this a try:
Your test query should now work, as shown in Figure 2-10.

Figure 2-10. PDF indexed search success
2.7.6 Using Start