#clojure log - Feb 22 2008

14:14 Chouser: rhickey: fyi, I adjusted xml.clj slightly to use tagsoup instead of java's sax parser, and it's working quite nicely.

14:15 rhickey: cool, want to put it on the group?

14:19 The author of TagSoup, John Cowan, is on the Clojure group

14:23 Chouser: heh, coo.

14:23 cool

14:23 well, what do you think of adding an optional parameter to xml.clj's parse to allow specifying a parser?

14:24 I haven't tried to do that yet, but I assume it would be easy

14:24 rhickey: Is that all it takes? sure

14:24 Chouser: ok, if I get that working, I'll post it to the group.

14:24 rhickey: great

14:51 Chouser: rhickey: there. what could be easier?

14:52 rhickey: thanks
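
The optional parser argument discussed above can be sketched against present-day clojure.xml, whose parse accepts a second "startparse" function. This is a minimal sketch, not the patch that was posted to the group: startparse-jdk and parse-string are hypothetical names, the example uses the JDK's own SAX parser rather than TagSoup so that it needs no extra dependency, and it assumes the source is an InputStream.

```clojure
(require '[clojure.xml :as xml])
(import '(javax.xml.parsers SAXParserFactory)
        '(java.io ByteArrayInputStream InputStream)
        '(org.xml.sax.helpers DefaultHandler))

;; Hypothetical helper: a startparse function receives the input source and
;; the SAX handler built by clojure.xml, and drives whichever parser it likes.
;; Assumption: the source is always an InputStream.
(defn startparse-jdk [^InputStream source ^DefaultHandler content-handler]
  (let [parser (.newSAXParser (SAXParserFactory/newInstance))]
    (.parse parser source content-handler)))

;; Hypothetical convenience wrapper for parsing a literal string.
(defn parse-string [^String s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8")) startparse-jdk))

(def parsed (parse-string "<td>some <b>bold</b> text</td>"))
```

A TagSoup-backed version would presumably have the same shape, constructing TagSoup's parser inside the function instead of the JDK one.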

16:08 Chouser: huh. I think I just found a bug in xml.clj

16:09 <td>some <b>bold</b> text</td> when parsed includes neither "some" nor "text", only "bold"

16:18 rhickey: I'll look at it

16:18 Chouser: ok, thanks. I can see the problem, but I'm not sure how best to fix it.

16:19 characters can be called when *state* is :between, and usually that should be just fine.

16:20 startElement would have to handle pushing an *sb* like endElement does

16:23 rhickey: yes on the startElement

16:24 between is kind of a broken notion. I put it in to deal with the junk whitespace/newline text that I get from the SAX parser in places where no one would consider there to be interleaved text, and I didn't want to create content entries for it

16:24 Chouser: ok

16:24 rhickey: I'll have to dump ws-only character content to avoid that
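
To make the whitespace point concrete, here is a rough sketch of dropping whitespace-only text from an element's content while keeping real interleaved text. The map shape is an assumed illustration of xml.clj-style output, and drop-blank-text is a hypothetical helper, not code from xml.clj.

```clojure
(require '[clojure.string :as str])

;; Assumed shape of a mixed-content element once parsed correctly:
;; text and child elements appear side by side in :content.
(def td {:tag :td
         :attrs nil
         :content ["some " {:tag :b :attrs nil :content ["bold"]} " text"]})

;; Hypothetical helper: remove strings that contain only whitespace,
;; leaving child elements and meaningful text untouched.
(defn drop-blank-text [content]
  (remove #(and (string? %) (str/blank? %)) content))

(def cleaned
  (drop-blank-text ["\n  " {:tag :b :content ["bold"]} " text" "\n"]))
```

Text such as " text" keeps its leading space; only chunks that are entirely whitespace are discarded.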

16:31 Chouser: well, I don't mind the whitespace for now.

16:31 I've got a sufficiently patched-up version that I can proceed...

16:37 whee! Ok, so to do the equivalent of the xpath: //td[b = 'Listing #']/node()[position() = last()]

16:37 I can say: (seq-filter html flatten :td [:b "Listing #"] #(first (reverse (% :content))))

16:39 rhickey: seq-filter?

16:39 Chouser: where "flatten" is a function that means "//"

16:39 albino: rhickey: Are you the principal creator of clojure?

16:39 Chouser: Um, yeah, lousy name. All the names are lousy, but it works.

16:39 rhickey: yes

16:40 albino: rhickey: do you get paid to do it?

16:40 rhickey: no

16:41 albino: rhickey: does anyone else make core contributions or are you pretty much on your own?

16:41 rhickey: just me

16:42 albino: rhickey: very impressive, thanks for letting me take some of your time

16:42 rhickey: sure

16:43 Chouser: seq-filter is a macro that mainly applies mapcat to each expr, passing the result to the next expr.

16:44 then sprinkle in a little sugar for tag names (:td), sub-queries ([...]), and content-matching for strings ("Listing #"), and you've got most of what you need for a flexible query system for xml.clj-produced vector/maps.

16:45 and it's all lazy
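
The idea of threading mapcat through a series of steps can be sketched with plain functions. This is only an illustration under stated assumptions, not Chouser's macro: every name here (all-nodes, tag=, child-text=, last-child, chain-mapcat) is hypothetical, there is no sugar for keywords, vectors or strings, and the sample data is made up.

```clojure
;; Hypothetical stand-in for "flatten": a node plus all of its descendants.
(defn all-nodes [node]
  (tree-seq map? :content node))

;; Step: keep a node only if it is an element with the given tag.
(defn tag= [tag]
  (fn [node] (when (= tag (:tag node)) [node])))

;; Step: keep a node only if it has a child <tag> whose sole content is text.
(defn child-text= [tag text]
  (fn [node]
    (when (some #(and (= tag (:tag %)) (= [text] (:content %)))
                (:content node))
      [node])))

;; Step: the last child of a node, as a sequence of zero or one items.
(defn last-child [node]
  (take-last 1 (:content node)))

;; Apply each step with mapcat, feeding the result to the next step.
(defn chain-mapcat [nodes & steps]
  (reduce (fn [acc step] (mapcat step acc)) nodes steps))

;; Made-up sample in the vector/map shape described in the log.
(def html
  {:tag :table
   :content [{:tag :tr
              :content [{:tag :td
                         :content [{:tag :b :content ["Listing #"]} "12345"]}
                        {:tag :td
                         :content [{:tag :b :content ["Price"]} "$1"]}]}]})

(def listing
  (chain-mapcat [html]
                all-nodes
                (tag= :td)
                (child-text= :b "Listing #")
                last-child))
```

Each step returns a sequence (or nil), which is what lets mapcat splice the results together before handing them to the next step.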

16:45 rhickey: neat

16:45 Chouser: yeah, once I actually use it a bit more so as to wear down the rough edges, I hope to share it.

16:45 Got any better ideas for the name?

16:46 mapcat->

16:59 rhickey: attempted fix for xml.clj is up

17:03 Chouser: thanks!

17:05 works for me, and thanks for including my little patch. :-)
