Javascript walking

robert.phillips
Posts: 12
Joined: Tue Mar 22, 2005 4:26 pm

Post by robert.phillips »

I cannot get the crawler to recognize the links in the following HTML. The profile was created with the default settings. Javascript walking appears to be enabled by default. Will any settings make this work?

I'm assuming it is not working because, when one of these links is followed, the page is regenerated with standard hyperlinks to files in the same directory, which I would expect the crawler to index. I am also not seeing multiple hits in the web server log, which leads me to believe it is not following the links.

In case it matters, this is being generated by an ASP.NET application that is using the LinkButton control.


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" >
<HTML>
<HEAD>
<title>WebForm1</title>
</HEAD>
<body>
<form name="Form1" method="post" action="WebForm1.aspx" id="Form1">
<input type="hidden" name="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" value="dDwtMjA0ODM4MjIwNzs7PohhKNmpjnpA8Ex+vrCDtuH0g9EQ" />

<script language="javascript" type="text/javascript">
<!--
function __doPostBack(eventTarget, eventArgument) {
var theform;
if (window.navigator.appName.toLowerCase().indexOf("microsoft") > -1) {
theform = document.Form1;
}
else {
theform = document.forms["Form1"];
}
theform.__EVENTTARGET.value = eventTarget.split("$").join(":");
theform.__EVENTARGUMENT.value = eventArgument;
theform.submit();
}
// -->
</script>

<a id="LinkButton1" href="javascript:__doPostBack('LinkButton1','')">Link Button 1</a>
<br>
<br>
<a id="Linkbutton2" href="javascript:__doPostBack('Linkbutton2','')">Link Button 2</a>
<br>
<br>
<a id="Linkbutton3" href="javascript:__doPostBack('Linkbutton3','')">Link Button 3</a>
</form>



</body>
</HTML>
Kai
Site Admin
Posts: 1272
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

Currently the JavaScript plugin does not report dynamic links generated by the form.submit() method, as the static link from the form action has already been reported. Even if they were reported, the links would be via the GET method, not POST.

You should at least be seeing a link to WebForm1.aspx, with no form vars, however.
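
For illustration only, here is a rough JavaScript sketch of the POST content a browser would send when "Link Button 1" is clicked, which is the request the crawler does not currently reproduce. The helper names `formEncode` and `buildPostBackBody` are hypothetical and not part of Webinator or ASP.NET; the field names and values are copied from the HTML posted above.

```javascript
// Hypothetical helper: URL-encode one form name or value
// (application/x-www-form-urlencoded writes spaces as "+").
function formEncode(text) {
  return encodeURIComponent(text).replace(/%20/g, "+");
}

// Hypothetical helper: build the POST content that submitting Form1 would
// send after the page's __doPostBack(eventTarget, eventArgument) has run.
function buildPostBackBody(hiddenFields, eventTarget, eventArgument) {
  const fields = Object.assign({}, hiddenFields, {
    // Mirrors the page's own script: "$" separators become ":".
    __EVENTTARGET: eventTarget.split("$").join(":"),
    __EVENTARGUMENT: eventArgument,
  });
  return Object.keys(fields)
    .map((name) => formEncode(name) + "=" + formEncode(fields[name]))
    .join("&");
}

// Hidden fields as they appear in the posted page.
const hiddenFields = {
  __EVENTTARGET: "",
  __EVENTARGUMENT: "",
  __VIEWSTATE: "dDwtMjA0ODM4MjIwNzs7PohhKNmpjnpA8Ex+vrCDtuH0g9EQ",
};

const body = buildPostBackBody(hiddenFields, "LinkButton1", "");
console.log("POST WebForm1.aspx");
console.log(body);
```

Each LinkButton differs only in the `__EVENTTARGET` value, so a plain GET of WebForm1.aspx with no form vars cannot tell the server which link was clicked.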
sourceone
Posts: 47
Joined: Tue Mar 29, 2005 2:10 pm

Post by sourceone »

I'm trying to crawl links within a page that was generated using ASP.NET. The links use the following javascript function to call the next page. I tried to retrieve the next page by setting the form variables and calling the Vortex function submit. This doesn't seem to work. Any ideas on how to crawl these links?

link:
<a HREF="javascript:doPostBack(5,1);"><u>Next 5</u></a>

javascript:
<script language="javascript">
<!--

function doPostBack(intIndex, intSortID)
{
document.PostForm.Index.value = intIndex;
document.PostForm.SortID.value = intSortID;
document.PostForm.submit();
}

// -->
</script>
Kai
Site Admin
Posts: 1272
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

With version 5+ of Vortex you can set the values of a form after <fetch>ing it, and then get the resulting modified form URL and content to use for <submit>ing. I.e., after fetching that page, you could call:

<!-- Set form values: -->
<urlcp domvalue "document.PostForm.Index.value" 5>
<urlcp domvalue "document.PostForm.SortID.value" 1>
<!-- Submit the modified form: -->
<urlinfo domvalue "document.PostForm.submitUrl">
<$u = $ret>
<urlinfo domvalue "document.PostForm.submitContent">
<$q = $ret>
<if "" neq $q>
<submit url=$u method=POST data=$q
content-type="application/x-www-form-urlencoded">
<else>
<fetch $u>
</if>

<urlcp domvalue "x" "y"> sets DOM value "x" to the value "y". <urlinfo domvalue "x"> retrieves the DOM value of "x". Only some DOM0 elements are supported. The DOM values .submitUrl and .submitContent, for forms, are non-standard Vortex DOM additions that return the URL and Content needed to submit the form.

(BTW this should be posted under Webinator or Texis Web Script).
sourceone
Posts: 47
Joined: Tue Mar 29, 2005 2:10 pm

Post by sourceone »

Is there any workaround for version 4.02? If not, can I get the latest version?
Kai
Site Admin
Posts: 1272
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

For version 4-, you'd pretty much have to REX and parse out all the <INPUT> tags etc. and build the query string/content yourself, from <urlinfo rawdoc>. Kinda complicated.
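
To give an idea of the logic involved, here is a rough sketch in JavaScript (not Vortex, and not a drop-in solution) of pulling the <INPUT> tags out of the raw page and building the content by hand. The function names, the sample form markup and the `list.aspx` action are all hypothetical; only the field names `Index` and `SortID` and the values 5 and 1 come from the `doPostBack(5,1)` link above. The regular expressions are deliberately naive: they do not decode HTML entities, and they ignore <SELECT>, <TEXTAREA> and checkbox/radio state.

```javascript
// Hypothetical helper: URL-encode one form name or value
// (application/x-www-form-urlencoded writes spaces as "+").
function formEncode(text) {
  return encodeURIComponent(text).replace(/%20/g, "+");
}

// Hypothetical helper: read one attribute (quoted or unquoted) from a tag.
function getAttr(tag, attr) {
  const re = new RegExp(
    "\\s" + attr + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))", "i");
  const m = re.exec(tag);
  if (m === null) return null;
  return m[1] !== undefined ? m[1] : m[2] !== undefined ? m[2] : m[3];
}

// Hypothetical helper: collect name/value pairs from every named <input> tag.
function extractInputs(rawHtml) {
  const inputs = [];
  const tagRe = /<input\b[^>]*>/gi;
  let tag;
  while ((tag = tagRe.exec(rawHtml)) !== null) {
    const name = getAttr(tag[0], "name");
    if (name !== null) {
      inputs.push({ name: name, value: getAttr(tag[0], "value") || "" });
    }
  }
  return inputs;
}

// Hypothetical helper: build the content, replacing selected values the way
// the page's doPostBack() would before calling submit().
function buildContent(inputs, overrides) {
  return inputs
    .map((field) => {
      const value = Object.prototype.hasOwnProperty.call(overrides, field.name)
        ? String(overrides[field.name])
        : field.value;
      return formEncode(field.name) + "=" + formEncode(value);
    })
    .join("&");
}

// Assumed markup: the actual PostForm was not posted in this thread.
const rawHtml =
  '<form name="PostForm" method="post" action="list.aspx">' +
  '<input type="hidden" name="Index" value="0">' +
  '<input type="hidden" name="SortID" value="0">' +
  "</form>";

const content = buildContent(extractInputs(rawHtml), { Index: 5, SortID: 1 });
console.log(content);
```

The resulting string is what would be sent as the POST content, with the form's action as the URL.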

Contact sales at Thunderstone about upgrading.