May 22nd 05, 01:40 AM
Caveat Lector
 
Question on web spiders

This may not be the right place, but there seem to be several web experts here.

Can web spiders read and harvest e-mail addresses from a PDF file?

Many users, and sites like QRZ.com, are listing e-mail addresses as JPEG
images rather than ASCII text -- this seems to work.

So for PDF files that do not resort to a JPEG: are ASCII text addresses
harvestable?

Thanks



--
CL -- I doubt, therefore I might be !







May 22nd 05, 03:35 AM
Mike Andrews
 

Caveat Lector wrote:
> Maybe not the right place but seems there are several web experts here.
>
> Can web spiders read and harvest e-mail addresses from a pdf file ?
>
> Many users and folks like QRZ.com are using jpegs not ascii for listing
> e-mails -- this seems to work.
>
> So for pdf files without going to a jpeg --- are ascii text addresses
> harvestable ?


Yes, in the sense that Optical Character Recognition (OCR) programs _can_
read text out of an image. In practice, it's not worth the spammers' or
web spider operators' trouble -- or that's been my experience, anyway.

YMMV.

--
Mike Andrews, W5EGO

Tired old sysadmin
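
As a rough illustration of the distinction discussed in this thread: once a spider has plain text in hand (for example, text pulled out of a text-based PDF by some extraction tool, which is assumed here and not shown), finding addresses takes little more than a pattern match, whereas an address shown only as an image yields nothing without an OCR step first. The sketch below is a minimal Python example; the function name harvest_addresses, the simplified pattern, and the sample strings are illustrative assumptions, not code from any real harvester.

```python
import re

# Deliberately simplified pattern (an assumption for illustration);
# real e-mail address syntax is broader than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest_addresses(text):
    """Return the unique e-mail-like strings in text, in order of first appearance."""
    found = []
    for match in EMAIL_RE.findall(text):
        if match not in found:
            found.append(match)
    return found


# Hypothetical text, as it might come out of a PDF text extractor.
extracted_text = "Questions? Mail me at w5xyz@example.com today"
print(harvest_addresses(extracted_text))

# An address published only as an image contributes no matching text at all.
print(harvest_addresses("address shown only as an image"))
```

Whether a given spider bothers to fetch and extract PDFs at all is a separate question that this sketch does not address.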