back on plastic. today i'm trying to retrieve images from flickr.com, but the site builds much of its content dynamically, so it's not so easy. so far i'm using two html (network get) nodes and two html renderers: the first gives access to a photo tag search, the second shows the page that results from whatever i click in the first renderer (so i can select a picture).
so far so good.
but both renderers tap dance on my nerves with internet explorer script errors ("do you want to continue running this script?") and so on.
could you post the patch or a similar one?
I don't really get what you are planning to do. If you want to get images, you could try to parse the website with XPath(xml); for that, the source code has to be well-formed XML (= XHTML). If XPath doesn't work, try running tidy(xml) on the source first.
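Outside of the patch, the same idea can be sketched in a few lines of Python. This is only an illustration, assuming the page is already well-formed XHTML; the sample markup, the example.com URLs and the helper name image_urls are made up for the example and are not taken from flickr.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical, hand-written sample page; a real page would have to be fetched first.
SAMPLE_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="photo"><img src="http://example.com/a.jpg" alt="a"/></div>
    <div class="photo"><img src="http://example.com/b.jpg" alt="b"/></div>
    <p>no image here</p>
  </body>
</html>"""


def image_urls(xhtml_text):
    """Return the src attribute of every <img> element in a well-formed XHTML string."""
    root = ET.fromstring(xhtml_text)
    # ElementTree only supports a subset of XPath; ".//x:img" matches all descendant img elements.
    images = root.findall(".//x:img", {"x": XHTML_NS})
    return [img.get("src") for img in images if img.get("src")]


print(image_urls(SAMPLE_PAGE))
```

If the source is not well-formed, ET.fromstring raises a ParseError, which is the point where a tidy step would have to come first.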
thanks for the hint about xpath. i'm diving into it right now, but still haven't got any usable results. so i'm posting the (i admit: poor) patch that crawls flickr by searching for a tag.
my goal is an automated image retrieval mechanism.
Websites often use loose XHTML. Flickr uses unicode characters which are not allowed in XML. That is the reason why XPath(xml) cannot query the site directly. After replacing one single character and tidying up the code with tidy(xml), you are able to query it.
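As a rough illustration of the "replace the offending character, then query" step, here is a Python sketch. It assumes the problem is a character outside the range XML 1.0 allows; which character actually broke the flickr page is not stated here, so the vertical tab in the sample string and the helper name strip_invalid_xml_chars are only stand-ins.

```python
import re
import xml.etree.ElementTree as ET

# Everything outside the XML 1.0 "Char" production:
# most control characters, surrogates, U+FFFE and U+FFFF.
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text):
    """Remove characters that an XML 1.0 parser would reject."""
    return INVALID_XML_CHARS.sub("", text)


# Hypothetical snippet with a vertical tab (\x0b), which XML 1.0 does not allow.
raw = '<p>sun\x0bset <img src="http://example.com/c.jpg"/></p>'

cleaned = strip_invalid_xml_chars(raw)
root = ET.fromstring(cleaned)  # parsing the uncleaned string would fail here
print(root.find("img").get("src"))
```

Stripping is the bluntest option; replacing the character with a harmless one (a space, for example) would keep word boundaries intact.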
i’d like to log that as a feature request…
tidy (xml) should either add definitions in the given charset for all undefined characters, or optionally just strip them. what do you think?
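To make the two proposed behaviours concrete, here is a small Python sketch of one possible reading: treat "undefined characters" as HTML named entities such as &nbsp; that plain XML does not define, and either rewrite them as numeric character references or strip them. This is an assumption about what the request means, not a description of how tidy (xml) works; fix_entities and its strip flag are invented for the example.

```python
import re
from html.entities import name2codepoint

ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

# The five entities every XML parser knows without a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def fix_entities(text, strip=False):
    """Make HTML named entities safe for an XML parser.

    strip=False: known HTML entities become numeric references, unknown names are left alone.
    strip=True:  every entity that XML does not predefine is removed.
    """
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        if strip:
            return ""
        if name in name2codepoint:
            return "&#%d;" % name2codepoint[name]
        return match.group(0)

    return ENTITY.sub(replace, text)


print(fix_entities("a&nbsp;b &amp; c&copy;"))
print(fix_entities("a&nbsp;b &amp; c&copy;", strip=True))
```

The numeric form keeps the character and works in any charset, which is why it may be the safer default; stripping loses information but never fails.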
definitely! i am going to fix that!
I know the renderer (HTML url) doesn't seem to work in this release, but I guess it would do the job.