Renderer (HTML) throws Javascript errors

bigbabou · January 9, 2006, 5:02pm

back on plastic, today i´m trying to retrieve images from the flickr.com page, but this page relies heavily on dynamically generated content, so it´s not so easy. so far i have used two html (network get) nodes and two html renderers. the first gives access to a photo tag search, the second shows the page resulting from the click i make in the first renderer (so i can select a picture).

so far, so good.

both renderers tap dance on my nerves with the internet explorer script error ("do you want to continue running this script?") bla bla

any ideas why and how to avoid?? (don´t gimme the regExpr text for deleting javascript tags…)

greetinX, b³

david · January 10, 2006, 11:07am

could you post the patch or a similar one?
I don't really get what you are planning to do. If you want to get images, you could try to parse the website with XPath(xml); for that, the source code has to be well-formed XML (= XHTML). If XPath doesn't work, try running tidy(xml) on the source first.
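
For readers outside vvvv, the same idea can be sketched in Python. This is only an illustration of querying well-formed XHTML for image URLs, not what the XPath(xml) node does internally; the markup, the example.com URLs and the `image_urls` helper are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML standing in for a downloaded page.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="photo"><a href="/photos/1"><img src="http://example.com/a.jpg" alt="a"/></a></div>
    <div class="photo"><a href="/photos/2"><img src="http://example.com/b.jpg" alt="b"/></a></div>
  </body>
</html>"""

# XHTML elements live in this namespace, so the query has to name it.
NS = {"x": "http://www.w3.org/1999/xhtml"}


def image_urls(xhtml_text):
    """Return the src attribute of every <img> element in the document."""
    root = ET.fromstring(xhtml_text)  # raises ParseError if not well-formed
    return [img.get("src") for img in root.findall(".//x:img", NS) if img.get("src")]


if __name__ == "__main__":
    for url in image_urls(XHTML):
        print(url)
```

If the page is not well-formed, `ET.fromstring` raises a `ParseError`, which is the situation where a tidy step is needed first.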

David

bigbabou · January 11, 2006, 10:24am

hi david,

thanks for the hint @ xpath. i´m diving into it right now, but still haven´t received any usable results. therefore i post the (i admit: poor) patch to crawl flickr by searching for a tag.

my target is to have an automated image retrieving mechanism.

flickr utilizes javascript to show the button for “original size” of the picture. how to follow this link? (this may be off topic since 4v isn´t the 1st application that works the web but i´d like to stick to this piece of environment)

greetinX, b³

FlickrAccess.v4p (18.3 kB)

david · January 11, 2006, 1:39pm

Websites often use loose XHTML. Flickr uses unicode characters which are not allowed in XML. That is the reason why XPath(xml) cannot query the site directly. After replacing one single character and tidying up the code with tidy(xml), you are now able to query it.
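
The thread does not say which character had to be replaced. As a rough sketch of this kind of cleanup, assuming the offending characters are code points outside the XML 1.0 character range (an assumption, not something stated above), a Python version could look like this. The `strip_illegal_xml_chars` helper and the sample string are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Everything outside the XML 1.0 "Char" production:
# tab, LF, CR, U+0020..U+D7FF, U+E000..U+FFFD, U+10000..U+10FFFF.
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text):
    """Remove code points that an XML 1.0 parser rejects."""
    return _ILLEGAL_XML_CHARS.sub("", text)


# Hypothetical fragment containing a vertical tab (U+000B), which XML 1.0 forbids.
DIRTY = "<p>flickr\x0bphoto</p>"

if __name__ == "__main__":
    try:
        ET.fromstring(DIRTY)
    except ET.ParseError as err:
        print("parser rejects the raw text:", err)
    cleaned = strip_illegal_xml_chars(DIRTY)
    print(ET.fromstring(cleaned).text)
```

This only covers illegal characters. Undefined entities or unclosed tags would still need a tidy pass before the text can be queried.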

See attached patch. it should do what you need.

Greetings
David

FlickrAccessWithXPath.v4p (18.3 kB)

bigbabou · January 11, 2006, 2:02pm

long term version : oh my god! i´m impressed. didn´t expect high quality interpretation of my need in acceptable time.

short term version : thank you.

that XPath is still a bit confusing for me, but it´s worth digging, as you prove to me.

but initial question remains: javascript interpretation? is it possible for the renderer?

oschatz · January 11, 2006, 2:22pm

After replacing one single character and tidying up the code with tidy(xml)

i’d like to log that as a feature request…
tidy (xml) should either add definitions in given charset for all undefined characters - or just strip them optionally. what do you think?

david · January 11, 2006, 3:45pm

i’d like to log that as a feature request…
tidy (xml) should either add definitions in given charset for all undefined characters - or just strip them optionally. what do you think?

definitely! i am going to fix that!

but initial question remains: javascript interpretation? is it possible for the renderer?

After testing a little bit, I found out that Javascript interpretation actually works properly. BUT as soon as .js files (or other resources) are linked to a different location than the Base, the renderer won’t resolve them.
I know the renderer (HTML url) seems not to work in this release, but i guess it would do the job.
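
To make the Base issue concrete, here is a small Python sketch of how references are normally resolved against a base URL. It illustrates general URL resolution only, not the renderer's actual behaviour; the example.com and example.net URLs and the `resolve` helper are made up.

```python
from urllib.parse import urljoin


def resolve(base_url, ref):
    """Resolve a reference found in a page against the page's base URL."""
    return urljoin(base_url, ref)


# Hypothetical base URL of the page that was loaded.
BASE = "http://www.example.com/photos/tags/cat/"

if __name__ == "__main__":
    # Relative to the base directory.
    print(resolve(BASE, "js/gallery.js"))
    # Relative to the host root.
    print(resolve(BASE, "/js/global.js"))
    # Absolute reference to a different location: the base is ignored.
    print(resolve(BASE, "http://static.example.net/lib.js"))
```

A page whose scripts are referenced like the first two lines only loads correctly if the base URL is known, which the raw HTML string alone does not carry.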

David