#1
May 22nd 05, 01:42 AM
Caveat Lector

Question on web spiders

Maybe not the right place but seems there are several web experts here.

Can web spiders read and harvest e-mail addresses from a pdf file ?

Many users and folks like QRZ.com are using jpegs not ascii for listing
e-mails -- this seems to work.

So for pdf files without going to a jpeg --- are ascii text addresses
harvestable ?

Thanks

--
CL -- I doubt, therefore I might be !







#2
May 22nd 05, 02:53 AM
bb


Caveat Lector wrote:
Maybe not the right place but seems there are several web experts here.

Can web spiders read and harvest e-mail addresses from a pdf file ?


Ask the Adobe people.

Many users and folks like QRZ.com are using jpegs not ascii for listing
e-mails -- this seems to work.

So for pdf files without going to a jpeg --- are ascii text addresses
harvestable ?

Thanks


Loads fewer spams since switching to this throwaway web name, despite
all of the heartfelt objections of loser K4YZ.

#3
May 22nd 05, 06:51 AM
Chester



Loads fewer spams since switching to this throwaway web name, despite
all of the heartfelt objections of loser K4YZ.



chuckle, snort, guffaw

Pot, kettle.


#4
May 23rd 05, 08:42 AM
G. Doughty

I don't think so. Actually, now that I think about it, Google has the
technology to return hits on PDFs, so it must be possible. It used to be
that it couldn't be done, but OCR has come a long way and could be built
into search engines. Or maybe the PDF files that turn up carry a tag for
the metasearchers because their owners want them to be found. If I recall,
when I search the net and get a PDF hit, the words in my search are still
highlighted in the document. It probably wouldn't be difficult to do the
same thing with a jpg.

Just my 2 cents.
Greg
ki4bbl


"Caveat Lector" wrote in message
news:vzQje.1452$Xh.1367@fed1read07...
Maybe not the right place but seems there are several web experts here.

Can web spiders read and harvest e-mail addresses from a pdf file ?

Many users and folks like QRZ.com are using jpegs not ascii for listing
e-mails -- this seems to work.

So for pdf files without going to a jpeg --- are ascii text addresses
harvestable ?

Thanks

--
CL -- I doubt, therefore I might be !
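
As a rough illustration of the question asked in this thread: a PDF that holds real text (rather than a scanned image) stores it in content streams, usually Flate-compressed, so a harvester may not need OCR at all. The Python sketch below is only a minimal example under stated assumptions, not how any particular spider works. The function name `harvest_emails`, the simplified e-mail pattern, and the naive `stream ... endstream` scan are all hypothetical choices made for illustration.

```python
import re
import zlib

# Simplified e-mail pattern (an assumption; real addresses can be more exotic).
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Naive scan for PDF stream bodies; a real parser would honor /Length and /Filter.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def harvest_emails(pdf_bytes):
    """Return sorted e-mail-like strings found in raw or Flate-compressed PDF data."""
    chunks = [pdf_bytes]  # addresses may sit in the file uncompressed
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates trailing bytes such as the newline
            # that precedes "endstream".
            chunks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # not a Flate stream (image data, other filters); skip it

    found = set()
    for chunk in chunks:
        for hit in EMAIL_RE.findall(chunk):
            found.add(hit.decode("ascii"))
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical fragment shaped like a PDF object with a compressed text stream.
    content = b"BT /F1 12 Tf (Contact: someone@example.com) Tj ET"
    sample = (
        b"%PDF-1.4\n1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
        + zlib.compress(content)
        + b"\nendstream\nendobj\n"
    )
    print(harvest_emails(sample))
```

This simple approach misses text that the PDF writer splits into separately positioned fragments or encodes through a custom font mapping, and it finds nothing in a PDF that is just a scanned picture. Those cases would need a proper PDF parser or OCR, which is presumably why image-based listings cut down on harvesting.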








