May 22nd 05, 02:40 AM

This may not be the right place, but there seem to be several web experts here.

Can web spiders read and harvest e-mail addresses from a PDF file?

Many users, and sites like QRZ.com, list e-mail addresses as JPEG images
rather than ASCII text -- this seems to work.

So for PDF files that don't resort to a JPEG, are ASCII text addresses
harvestable?

Thanks

--
CL -- I doubt, therefore I might be !

May 22nd 05, 04:35 AM

Caveat Lector wrote:
> This may not be the right place, but there seem to be several web experts here.
>
> Can web spiders read and harvest e-mail addresses from a PDF file?
>
> Many users, and sites like QRZ.com, list e-mail addresses as JPEG images
> rather than ASCII text -- this seems to work.
>
> So for PDF files that don't resort to a JPEG, are ASCII text addresses
> harvestable?

Yes, in the sense that Optical Character Recognition (OCR) programs _can_
read text out of an image. In practice, it's not worth the spammers' or
web spider operators' trouble -- or that's been my experience, anyway.

YMMV.

--
Mike Andrews, W5EGO

Tired old sysadmin

May 22nd 05, 04:43 AM

(Mike Andrews) writes:
>> So for PDF files that don't resort to a JPEG, are ASCII text addresses
>> harvestable?
>
> Yes, in the sense that Optical Character Recognition (OCR) programs _can_
> read text out of an image. In practice, it's not worth the spammers' or
> web spider operators' trouble -- or that's been my experience, anyway.

PDF files contain the underlying text strings and search engines index
them without OCR'ing. Whether spammers bother, I don't know.
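
To make that concrete, here is a minimal Python sketch of how little work it takes to pull addresses out of a text-based PDF. It is only an illustration under stated assumptions, not a description of how any real spider or spam tool works: the names find_emails, EMAIL_RE and STREAM_RE, the sample fragments, and the example addresses are all made up for this post. It scans the raw bytes and also tries to inflate each stream, since PDF writers commonly store page content Flate-compressed. It ignores other stream filters, encryption, custom font encodings, and addresses split across several text operators.

```python
import re
import zlib

# Deliberately simple address pattern; real address syntax is broader than this.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Everything between the "stream" and "endstream" keywords of a PDF stream object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def find_emails(pdf_bytes):
    """Return sorted unique addresses found in raw or Flate-decoded PDF data."""
    chunks = [pdf_bytes]
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            chunks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # not zlib data (another filter, or no filter); raw scan covers plain text
    found = set()
    for chunk in chunks:
        found.update(m.decode("ascii") for m in EMAIL_RE.findall(chunk))
    return sorted(found)


# Hypothetical fragments: one uncompressed text operator, one Flate-compressed stream.
plain = b"BT /F1 12 Tf (Write to op@example.org) Tj ET"
packed = (
    b"5 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
    + zlib.compress(b"BT (hidden@example.net) Tj ET")
    + b"\nendstream\nendobj\n"
)

print(find_emails(plain + b"\n" + packed))
```

No OCR is involved anywhere: the address is sitting in the file as ordinary text, at most behind a standard decompression step.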

May 23rd 05, 03:46 AM

Paul Rubin wrote:
> (Mike Andrews) writes:
>>> So for PDF files that don't resort to a JPEG, are ASCII text addresses
>>> harvestable?
>>
>> Yes, in the sense that Optical Character Recognition (OCR) programs _can_
>> read text out of an image. In practice, it's not worth the spammers' or
>> web spider operators' trouble -- or that's been my experience, anyway.
>
> PDF files contain the underlying text strings and search engines index
> them without OCR'ing. Whether spammers bother, I don't know.

Hi, Paul. Long time no see.

It depends on whether the PDF is text-based or image-based. If I scan
a page into a JPEG or TIFF and then convert that to PDF, the file most
likely won't contain any of the text as text.

--
Mike Andrews, W5EGO

Tired old sysadmin

May 23rd 05, 03:50 AM

(Mike Andrews) writes:
> It depends on whether the PDF is text-based or image-based. If I scan
> a page into a JPEG or TIFF and then convert that to PDF, the file most
> likely won't contain any of the text as text.

Oh, I see, yes that would be about the same as a TIFF, but I don't
understand why you'd bother.
