14:14 Chouser: rhickey: fyi, I adjusted xml.clj slightly to use tagsoup instead of java's sax parser, and it's working quite nicely.
14:15 rhickey: cool, want to put it on the group?
14:19 The author of TagSoup, John Cowan, is on the Clojure group
14:23 Chouser: heh, cool.
14:23 well, what do you think of adding an optional parameter to xml.clj's parse to allow specifying a parser?
14:24 I haven't tried to do that yet, but I assume it would be easy
14:24 rhickey: Is that all it takes? sure
14:24 Chouser: ok, if I get that working, I'll post it to the group.
14:24 rhickey: great
14:51 Chouser: rhickey: there. what could be easier?
14:52 rhickey: thanks
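
A minimal sketch of the optional-parser idea discussed above, assuming the present-day clojure.xml namespace, where parse accepts a second argument: a function of the source and a SAX content handler. The name startparse-jdk is hypothetical, and this version uses the JDK's own SAX parser so that it runs without extra jars. A TagSoup variant would presumably construct the TagSoup parser at the same spot, with the TagSoup jar on the classpath.

```clojure
(require '[clojure.xml :as xml])
(import '(javax.xml.parsers SAXParserFactory)
        '(java.io ByteArrayInputStream)
        '(org.xml.sax.helpers DefaultHandler))

;; Hypothetical startparse function: it receives the source and the content
;; handler supplied by clojure.xml, and runs whichever parser we choose.
(defn startparse-jdk
  [^java.io.InputStream source ^DefaultHandler handler]
  (let [parser (.newSAXParser (SAXParserFactory/newInstance))]
    (.parse parser source handler)))

(def doc
  (xml/parse (ByteArrayInputStream.
               (.getBytes "<td>some <b>bold</b> text</td>" "UTF-8"))
             startparse-jdk))

(:tag doc) ; tag of the root element
```

Passing the parser as a function keeps parse itself unaware of which SAX implementation is in use.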
16:08 Chouser: huh. I think I just found a bug in xml.clj
16:09 <td>some <b>bold</b> text</td> when parsed includes neither "some" nor "text", only "bold"
16:18 rhickey: I'll look at it
16:18 Chouser: ok, thanks. I can see the problem, but I'm not sure how best to fix it.
16:19 characters can be called when *state* is :between, and usually that should be just fine.
16:20 startElement would have to handle pushing an *sb* like endElement does
16:23 rhickey: yes on the startElement
16:24 :between is kind of a broken notion. I put it in to deal with the junk whitespace/newline content I get from the SAX parser in places where no one would consider there to be interleaved text, and I didn't want to create content entries for it
16:24 Chouser: ok
16:24 rhickey: I'll have to dump ws-only character content to avoid that
16:31 Chouser: well, I don't mind the whitespace for now.
16:31 I've got a sufficiently patched-up version that I can proceed...
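
The mixed-content bug and the fix discussed above can be illustrated with a simplified model. This is a sketch under stated assumptions, not the xml.clj source: events are plain vectors instead of SAX callbacks, and build-tree, step, flush-text and add-child are hypothetical names. The point is that pending text is flushed on a start event as well as on an end event, and whitespace-only runs are dropped.

```clojure
(require '[clojure.string :as str])

;; Hypothetical model. Events: [:start tag], [:chars text], [:end tag].
;; State: a stack of open elements (top = last) plus a pending text buffer.

(defn- add-child
  "Appends child to the content of the element on top of the stack."
  [stack child]
  (conj (pop stack) (update (peek stack) :content conj child)))

(defn- flush-text
  "Moves buffered text into the current element, dropping whitespace-only runs."
  [{:keys [stack buf] :as state}]
  (if (str/blank? buf)
    (assoc state :buf "")
    (assoc state :stack (add-child stack buf) :buf "")))

(defn- step [state [kind v]]
  (case kind
    :chars (update state :buf str v)
    ;; Flushing here is what keeps "some " in <td>some <b>bold</b> text</td>.
    :start (-> (flush-text state)
               (update :stack conj {:tag v :content []}))
    :end   (let [{:keys [stack] :as s} (flush-text state)]
             (assoc s :stack (add-child (pop stack) (peek stack))))))

(defn build-tree
  "Folds the events into a tree and returns the single top-level element."
  [events]
  (let [root  {:tag :root :content []}
        final (reduce step {:stack [root] :buf ""} events)]
    (-> final :stack peek :content first)))

(build-tree [[:start :td] [:chars "some "]
             [:start :b] [:chars "bold"] [:end :b]
             [:chars " text"] [:end :td]])
;; => {:tag :td, :content ["some " {:tag :b, :content ["bold"]} " text"]}
```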
16:37 whee! Ok, so to do the equivalent of the xpath: //td[b = 'Listing #']/node()[position() = last()]
16:37 I can say: (seq-filter html flatten :td [:b "Listing #"] #(first (reverse (% :content))))
16:39 rhickey: seq-filter?
16:39 Chouser: where "flatten" is a function that means "//"
16:39 albino: rhickey: Are you the principal creator of clojure?
16:39 Chouser: Um, yeah, lousy name. All the names are lousy, but it works.
16:39 rhickey: yes
16:40 albino: rhickey: do you get paid to do it?
16:40 rhickey: no
16:41 albino: rhickey: does anyone else make core contributions or are you pretty much on your own?
16:41 rhickey: just me
16:42 albino: rhickey: very impressive, thanks for letting me take some of your time
16:42 rhickey: sure
16:43 Chouser: seq-filter is a macro that mainly applies mapcat to each expr, passing the result to the next expr.
16:44 then sprinkle in a little sugar for tag names (:td), sub-queries ([...]), and content-matching for strings ("Listing #"), and you've got most of what you need for a flexible query system for xml.clj-produced vector/maps.
16:45 and it's all lazy
16:45 rhickey: neat
16:45 Chouser: yeah, once I actually use it a bit more so as to wear down the rough edges, I hope to share it.
16:45 Got any better ideas for the name?
16:46 mapcat->
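
A rough sketch of the mapcat-chaining idea described above, written as a plain function, not a macro. Everything here is an assumption made for illustration and is not Chouser's code: the names query, step-fn and descendants* (standing in for "flatten"), the exact meaning given to each kind of step, and the made-up sample data.

```clojure
(declare query)

(defn descendants*
  "Lazy seq of a node and every element beneath it (the \"//\" step)."
  [node]
  (filter map? (tree-seq map? :content node)))

(defn- step-fn
  "Turns one query step into a function from a node to a seq of results."
  [step]
  (cond
    ;; :td -> child elements with that tag
    (keyword? step) (fn [node]
                      (filter #(and (map? %) (= step (:tag %))) (:content node)))
    ;; "text" -> keep the node when its content is exactly that string
    (string? step)  (fn [node] (when (= [step] (:content node)) [node]))
    ;; [...] -> sub-query used as a predicate on the node
    (vector? step)  (fn [node] (when (seq (apply query [node] step)) [node]))
    ;; any other function -> apply it, wrapping a single result in a vector
    :else           (fn [node]
                      (let [r (step node)]
                        (cond (nil? r) nil
                              (sequential? r) r
                              :else [r])))))

(defn query
  "Threads a seq of nodes through each step using mapcat."
  [nodes & steps]
  (reduce (fn [acc step] (mapcat (step-fn step) acc)) nodes steps))

;; Made-up sample data in the vector/map shape produced by an XML parser.
(def html
  {:tag :html
   :content [{:tag :table
              :content [{:tag :tr
                         :content [{:tag :td
                                    :content [{:tag :b :content ["Listing #"]} "12345"]}
                                   {:tag :td
                                    :content [{:tag :b :content ["Price"]} "n/a"]}]}]}]})

(query [html] descendants* :td [:b "Listing #"] #(last (:content %)))
;; => ("12345")
```

Because each step is applied with mapcat, results are produced on demand as the final seq is consumed.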
16:59 rhickey: attempted fix for xml.clj is up
17:03 Chouser: thanks!
17:05 works for me, and thanks for including my little patch. :-)