RadioBanter


Caveat Lector May 22nd 05 01:42 AM

Question on web spiders
 
Maybe not the right place, but there seem to be several web experts here.

Can web spiders read and harvest e-mail addresses from a PDF file?

Many users, and sites like QRZ.com, list e-mail addresses as JPEG images
rather than ASCII text -- this seems to work.

So for PDF files that keep the addresses as ASCII text instead of a JPEG
image -- are those addresses harvestable?

Thanks

--
CL -- I doubt, therefore I might be !








Brian Hill May 22nd 05 02:15 AM

Programs like Adobe's PDF software have a search capability, so I would think
a harvester could use the same technique, but opening each document and
searching through it probably wouldn't be worth the effort to develop? This
post is plain ASCII, and any address posted to Usenet can be harvested.
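
For what it's worth, here is a rough Python sketch of how little it takes to
pull addresses out of plain ASCII text. The pattern, the function name
harvest_addresses, and the example.org / example.net addresses are all made up
for illustration; I don't know what real harvesters actually run.

```python
import re

# Deliberately loose, illustrative pattern (an assumption, not a standard):
# something@host.tld with a few common punctuation characters allowed.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def harvest_addresses(text):
    """Return the unique address-like strings in text, in order of appearance."""
    seen = []
    for match in EMAIL_RE.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


# Hypothetical post body; the addresses use reserved example domains.
sample_post = (
    "Reply to op@example.org, or try\n"
    "second.op@example.net. Again: op@example.org"
)
print(harvest_addresses(sample_post))
```

An address shown only as a JPEG never appears in the text at all, which is
presumably why the image trick works against something this simple.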

B.H.




[email protected] May 27th 05 08:39 AM

Yes (well, most likely). If it can be exported to text, it can be
harvested. Take a look at Google's "view as HTML" option, for instance.
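
As a rough illustration of the "if it can be exported to text" point, here is
a Python sketch that scans raw PDF bytes for address-like strings and tries to
inflate any zlib-compressed (FlateDecode) streams along the way. The function
name addresses_in_pdf_bytes, both patterns, and the fake PDF fragment are
assumptions made up for this example. Real PDFs can split text across
operators, use custom font encodings, other filters, or encryption, so an
actual tool would more likely sit on top of a proper PDF library.

```python
import re
import zlib

# Illustrative patterns (assumptions): a loose address matcher and a naive
# matcher for the bytes between "stream" and "endstream".
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def addresses_in_pdf_bytes(data):
    """Return sorted address-like strings found in raw or inflated PDF bytes."""
    found = set(EMAIL_RE.findall(data))  # uncompressed text and metadata
    for body in STREAM_RE.findall(data):
        try:
            # decompressobj tolerates the trailing newline before "endstream"
            inflated = zlib.decompressobj().decompress(body)
        except zlib.error:
            continue  # not zlib data (another filter, or stored uncompressed)
        found.update(EMAIL_RE.findall(inflated))
    return sorted(addr.decode("ascii") for addr in found)


# Hypothetical fragment shaped like a PDF content stream, not a valid file.
content = b"BT /F1 12 Tf (Contact: op@example.org) Tj ET"
fake_pdf = (
    b"%PDF-1.4\n1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
    + zlib.compress(content)
    + b"\nendstream\nendobj\n"
)
print(addresses_in_pdf_bytes(fake_pdf))
```

The point is only that compression by itself does not hide the text; whether
anyone bothers to do this at scale is a separate question.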

Not sure whether spammers have gone this far yet, though...

Have a look at SpamAssassin if you want a good (free) spam detection
system.

Jamie
--
http://www.geniegate.com Custom web programming
(rot13) User Management Solutions


All times are GMT +1.