RadioBanter

Caveat Lector May 22nd 05 01:42 AM

Question on web spiders
 
Maybe not the right place, but there seem to be several web experts here.

Can web spiders read and harvest e-mail addresses from a PDF file?

Many users, and sites like QRZ.com, list e-mail addresses as JPEG images rather than ASCII text, and this seems to work.

So for PDF files, short of converting to JPEG, are e-mail addresses written as ASCII text harvestable?

Thanks

--
CL -- I doubt, therefore I might be !
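Why plain text is the weak point: once a spider has the text of a page, picking out addresses takes little more than a pattern match. A minimal Python sketch, assuming a deliberately simplified regular expression (not the pattern any particular harvester uses) and a made-up address, n0call@example.com. An address drawn as pixels in a JPEG never appears in the text at all, so a scan like this finds nothing there unless OCR is run first.

```python
import re

# Simplified, illustrative pattern only: real address syntax (RFC 5322)
# allows far more forms, and real harvesters vary.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest(text: str) -> list[str]:
    """Return every address-like string found in plain text."""
    return EMAIL_RE.findall(text)


page = "Contact: n0call@example.com or write to the club secretary."
print(harvest(page))  # ['n0call@example.com']
```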

bb May 22nd 05 02:53 AM


Caveat Lector wrote:
> Maybe not the right place but seems there are several web experts here.
>
> Can web spiders read and harvest e-mail addresses from a pdf file ?

Ask the Adobe people.

> Many users and folks like QRZ.com are using jpegs not ascii for listing
> e-mails -- this seems to work.
>
> So for pdf files without going to a jpeg --- are ascii text addresses
> harvestable ?
>
> Thanks

Loads fewer spams since switching to this throwaway web name, despite
all of the heartfelt objections of loser K4YZ.


Chester May 22nd 05 06:51 AM


> Loads fewer spams since switching to this throwaway web name, despite
> all of the heartfelt objections of loser K4YZ.

chuckle, snort, guffaw

Pot, kettle.

G. Doughty May 23rd 05 08:42 AM

I don't think so. Actually, now that I think about it, Google returns PDFs as search hits, so it must be possible. It used to be that it couldn't be done, but OCR software has come a long way and could be embedded in search engines. Alternatively, the PDF files that show up might carry a tag for the metasearchers because their authors want them to be found. If I recall correctly, when I search the net and get a PDF hit, my search words are still highlighted in the document. It probably wouldn't be difficult to do the same thing with a JPEG.

Just my 2 cents.
Greg
ki4bbl
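On the OCR point: OCR is only needed for scanned, image-only PDFs. A PDF generated from text stores the characters themselves as string operands in its page content streams, usually compressed with the /FlateDecode filter, which is ordinary zlib compression. A spider only has to decompress the stream and read the strings. The sketch below assumes a hypothetical one-line content stream and a made-up address, and handles only literal strings shown with the Tj operator; real extractors also have to deal with escaped parentheses, hex strings, TJ arrays and font encodings.

```python
import re
import zlib

# Hypothetical one-line page content stream: begin text (BT), select a font,
# move the cursor, show a literal string (Tj), end text (ET).
content = b"BT /F1 12 Tf 72 712 Td (Email: n0call@example.com) Tj ET"

# Text-based PDFs commonly store content streams with /FlateDecode,
# which is zlib-format compression.
stream = zlib.compress(content)


def extract_literal_strings(raw_stream: bytes) -> list[str]:
    """Decompress a FlateDecode stream and return the (...) operands of Tj."""
    data = zlib.decompress(raw_stream)
    # Simplified: ignores escaped parentheses, <hex> strings, TJ arrays
    # and font encodings, which real text extractors must handle.
    return [m.decode("latin-1") for m in re.findall(rb"\((.*?)\)\s*Tj", data)]


text = " ".join(extract_literal_strings(stream))
print(text)  # Email: n0call@example.com
print(re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text))  # ['n0call@example.com']
```

Scanned PDFs, where each page is just an embedded image, are the case where OCR would come in, much like the JPEG approach.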

All times are GMT +1.