How to simulate browser?

cpm18
Posts: 35
Joined: Mon Apr 13, 2009 3:21 pm

Post by cpm18 »

I'm using Vortex to fetch a page.
http://forums.pinnaclesys.com/forums/default.aspx

When I view this page in a browser, it loads as expected. But my Vortex fetch gets redirected to an error page:
http://forums.pinnaclesys.com/error.htm ... fault.aspx

I thought the site might be blocking me, but when I tried again through a proxy I was still redirected to the error page. Is there some way to make Vortex fetch the page the same way a browser does?
jason112
Site Admin
Posts: 347
Joined: Tue Oct 26, 2004 5:35 pm

Post by jason112 »

It's probably keying off the "User-Agent" HTTP header. Check out <urlcp useragent> to change it to that of a standard browser.
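
For readers working outside Vortex, here is a rough Python sketch of the same idea: send the request with a browser-style User-Agent instead of the client's default. The helper name build_request is made up for illustration, and the User-Agent value is just an example browser-style string, not a recommendation.

```python
import urllib.request


def build_request(url, user_agent):
    """Build a GET request that carries a custom User-Agent header.

    build_request is a hypothetical helper name used only for this sketch.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Example browser-style string (an assumption; use whatever your browser sends).
BROWSER_UA = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 3.5.21022)"

req = build_request(
    "http://forums.pinnaclesys.com/forums/default.aspx", BROWSER_UA
)

# urllib stores header names capitalized, so the key is "User-agent".
print(req.get_header("User-agent"))

# To actually send the request (needs network access):
# with urllib.request.urlopen(req, timeout=10) as resp:
#     body = resp.read()
```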
cpm18
Posts: 35
Joined: Mon Apr 13, 2009 3:21 pm

Post by cpm18 »

That's what I figured, but changing the useragent has not made any difference.

Somehow the site is able to redirect my fetch to the error page but my browser goes to the correct page.
mark
Site Admin
Posts: 5514
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

This makes it work for me:
<urlcp header "Accept-Language" "en-us,en;q=0.5">


Basically, sniff what a browser is sending and set urlcp options one at a time to match that to find what the server is looking for.
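
The one-at-a-time approach can be sketched in Python roughly as follows. This is only an illustration under assumptions: find_required_headers and fake_fetch are made-up names, and fake_fetch stands in for a real HTTP fetch by pretending the server only cares about Accept-Language.

```python
from typing import Callable, Dict, List


def find_required_headers(
    fetch: Callable[[Dict[str, str]], str],
    browser_headers: Dict[str, str],
    looks_ok: Callable[[str], bool],
) -> List[str]:
    """Try each browser header on its own and report which ones fix the fetch."""
    winners = []
    for name, value in browser_headers.items():
        body = fetch({name: value})  # send only this one extra header
        if looks_ok(body):
            winners.append(name)
    return winners


def fake_fetch(headers: Dict[str, str]) -> str:
    """Stand-in for a real fetch: this pretend server returns the real page
    only when an Accept-Language header is present."""
    if "Accept-Language" in headers:
        return "<html>forum index</html>"
    return "<html>error</html>"


# Header values as a sniffer might show them for a browser (illustrative).
sniffed = {
    "User-Agent": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
    "Accept-Language": "en-us,en;q=0.5",
    "Accept-Encoding": "gzip,deflate",
}

result = find_required_headers(fake_fetch, sniffed, lambda b: "error" not in b)
print(result)  # ['Accept-Language']
```

If no single header fixes it, the server may be checking a combination, in which case the same loop can be run starting from the full browser set and removing one header at a time.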
cpm18
Posts: 35
Joined: Mon Apr 13, 2009 3:21 pm

Post by cpm18 »

Instead of starting a new thread, I figure I should throw this in with this one since it's a similar problem.

I'm trying to fetch from www.boatingforumz.com, but the site appears to have some kind of detection that gives my fetch different output than my browser gets. The site looks normal when viewed in a browser, but my fetch always ends up with a $ret of...

hello. This is the page you requestd.

I've been trying to get Vortex to simulate the fetch from my browser. I've tried different user agents, matched the Accept-Language header, switched encodings, played around with cookies, and switched between HTTP 1.0 and 1.1, all without any change.

Perhaps I just haven't found the right combination of settings, but for whatever reason this one is stubborn.
mark
Site Admin
Posts: 5514
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Don't know what user agents you've tried but this works for me:
"Mozilla/4.0 (compatible; T-H-U-N-D-E-R-S-T-O-N-E)"

The built-in default of
"Mozilla/2.0 (compatible; T-H-U-N-D-E-R-S-T-O-N-E)"
seems to cause the server to return the alternate content.
cpm18
Posts: 35
Joined: Mon Apr 13, 2009 3:21 pm

Post by cpm18 »

I wonder if there is more to it? I tried with that user agent as well as...
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 3.5.21022)

And I always get the useless data. I must be missing a setting, or have some other setting that is causing the problem. I'll have to keep trying some other things.
cpm18
Posts: 35
Joined: Mon Apr 13, 2009 3:21 pm

Post by cpm18 »

Actually, never mind. I was able to get it working with the user agent Mark listed above.
mark
Site Admin
Posts: 5514
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

What's your Texis version? I tried 6.00.1272471692 20100428 and 5.01.1260557077 20091211.

There's also a possibility, if you've been hitting them a lot, that you've been blacklisted. Try doing the fetch from a different IP or network or through a proxy to see if that changes anything.

Try a network sniffer to compare what your browser is sending vs. what Texis is sending.
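
Once you have both captures, the comparison itself is mechanical. Here is a rough Python sketch, assuming you have copied the request headers from each capture into a dict. diff_headers is a made-up helper name, and the two header sets below are illustrative rather than real captures.

```python
from typing import Dict, List, Tuple


def diff_headers(
    browser: Dict[str, str], client: Dict[str, str]
) -> Tuple[List[str], List[str], Dict[str, Tuple[str, str]]]:
    """Compare two request-header sets, ignoring case in header names.

    Returns (headers only the browser sent, headers only the client sent,
    headers both sent with different values).
    """
    b = {k.lower(): v for k, v in browser.items()}
    c = {k.lower(): v for k, v in client.items()}
    missing = sorted(set(b) - set(c))
    extra = sorted(set(c) - set(b))
    changed = {k: (b[k], c[k]) for k in sorted(set(b) & set(c)) if b[k] != c[k]}
    return missing, extra, changed


# Illustrative header sets, not real captures.
browser_sent = {
    "Host": "www.boatingforumz.com",
    "User-Agent": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
    "Accept-Language": "en-us,en;q=0.5",
}
texis_sent = {
    "Host": "www.boatingforumz.com",
    "User-Agent": "Mozilla/2.0 (compatible; T-H-U-N-D-E-R-S-T-O-N-E)",
}

missing, extra, changed = diff_headers(browser_sent, texis_sent)
print("only browser sends:", missing)
print("only client sends:", extra)
print("different values:", changed)
```

Anything in the first or third group is a candidate for what the server is checking.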
cpm18
Posts: 35
Joined: Mon Apr 13, 2009 3:21 pm

Post by cpm18 »

I don't think this is a user agent issue, but it is somewhat related to the previous questions.

I'm trying to fetch http://www.quora.com/Dell . When I view the page source in my browser, one part of the code is not consistent with what my fetches get.

With the browser, I see...
<span class="timestamp">12:30am on Friday

When I do a fetch of the page, I get...
<span class="datetime" id="__w2_kO6E1QA_datespan">Insert a dynamic date here

There is then a whole bunch of javascript later on the page which is probably used to generate the date.

So I am wondering: is Vortex able to get the same source HTML that the browser is getting, or will I have to find a way to execute the JavaScript through Vortex to get the dates that appear in the browser's HTML?