#1
Maybe not the right place but seems there are several web experts here.
Can web spiders read and harvest e-mail addresses from a PDF file? Many users and folks like QRZ.com are using JPEGs, not ASCII, for listing e-mails -- this seems to work. So for PDF files, without going to a JPEG, are ASCII text addresses harvestable?

Thanks
--
CL -- I doubt, therefore I might be!
#2
Caveat Lector wrote:
> Maybe not the right place but seems there are several web experts here.
> Can web spiders read and harvest e-mail addresses from a PDF file? Many users and folks like QRZ.com are using JPEGs, not ASCII, for listing e-mails -- this seems to work. So for PDF files, without going to a JPEG, are ASCII text addresses harvestable?

Yes, in the sense that Optical Character Recognition (OCR) programs _can_ read text out of an image. In practice, it's not worth the spammers' or web spider operators' trouble -- or that's been my experience, anyway. YMMV.

--
Mike Andrews, W5EGO
Tired old sysadmin
#3
#4
Paul Rubin wrote:
> (Mike Andrews) writes:
> > > So for PDF files, without going to a JPEG, are ASCII text addresses harvestable?
> > Yes, in the sense that Optical Character Recognition (OCR) programs _can_ read text out of an image. In practice, it's not worth the spammers' or web spider operators' trouble -- or that's been my experience, anyway.
>
> PDF files contain the underlying text strings and search engines index them without OCR'ing. Whether spammers bother, I don't know.

Hi, Paul. Long time no see.

Depends on whether they're text-based PDF or image-based PDF. If I scan a page into a JPEG or TIFF and then convert that to PDF, it may not have any of the text as text, and I think it's improbable that it will.

--
Mike Andrews, W5EGO
Tired old sysadmin
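To make the text-based versus image-based distinction concrete, here is a rough Python sketch of how plain ASCII text inside a PDF could be found without any OCR. Everything in it is an illustrative assumption: `find_emails`, `fake_pdf`, the loose e-mail pattern, and the two hand-built byte strings are hypothetical stand-ins, not real documents or any particular spider's method. It only looks at raw bytes and Flate-compressed streams, so it would miss text that a real PDF splits across several strings or stores with a custom font encoding.

```python
import re
import zlib

# Loose pattern for e-mail-like ASCII strings; illustrative, not RFC-complete.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Bytes between the "stream" and "endstream" keywords of a PDF stream object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def find_emails(pdf_bytes):
    """Return the set of e-mail-like strings readable without OCR."""
    chunks = [pdf_bytes]  # catches text that is stored uncompressed
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj ignores the trailing newline before "endstream"
            chunks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            continue  # not Flate data, e.g. a scanned page stored as JPEG
    return {
        hit.decode("ascii")
        for chunk in chunks
        for hit in EMAIL_RE.findall(chunk)
    }


def fake_pdf(stream_dict, stream_bytes):
    """Wrap bytes in a bare-bones stream object (hypothetical, not a full PDF)."""
    return (b"%PDF-1.4\n1 0 obj\n" + stream_dict + b"\nstream\n"
            + stream_bytes + b"\nendstream\nendobj\n")


# Text-based page: the address sits in a compressed content stream as text.
page_text = b"BT /F1 12 Tf 72 720 Td (Contact: someone@example.com) Tj ET"
text_pdf = fake_pdf(b"<< /Filter /FlateDecode >>", zlib.compress(page_text))

# Image-based page: only picture bytes, so there is no text to match.
scan_pdf = fake_pdf(b"<< /Subtype /Image /Filter /DCTDecode >>",
                    b"\xff\xd8\xff\xe0 stand-in JPEG bytes")

print(find_emails(text_pdf))
print(find_emails(scan_pdf))
```

Under these assumptions, the text-based sample should give up its address to a simple pattern match, while the image-only sample yields nothing unless someone bothers to run OCR on it.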