#clojure log - Aug 10 2008

The Joy of Clojure
Main Clojure site
Google Group
List of all logged dates

11:03 achim_p: hi there

11:03 does anybody have recommendations for FP specific algorithms/data structurs texts?

11:05 arohner: that's a good question

11:05 I've heard of a few, but I don't know if they're any good

11:10 achim_p: i saw these being mentioned:

11:10 http://www.eecs.usma.edu/webs/people/okasaki/pubs.html#cup98

11:10 http://www.iro.umontreal.ca/~lapalme/Algorithms-functional.html

11:11 arohner: i was about to link to purely functional data structures

11:14 rhickey: how much work would it be to replace all references to a datastructure with a weak reference?

11:15 i.e. (def x (weak-ref x))

11:15 rhickey: arohner: for me to change vars?

11:15 arohner: not all vars

11:15 but at runtime to modify an existing var

11:16 rhickey: what is the use case?

11:16 arohner: implementing a DB :-)

11:17 so I can serialize parts of my dataset to disk

11:17 so replace a var with a weak ref to the var, and serialize it to disk

11:17 since the ref is now weak, the GC can collect it

11:17 rhickey: arohner: you can use weak references whenever you want, not sure how Clojure would get involved

11:18 arohner: I guess I'm asking if the compiler/runtime keeps any references to objects under the covers

11:18 rhickey: lots

11:18 arohner: so my one more weak ref wouldn't do anything in the whole GC scheme

11:18 because the object won't get GC'd until all of the strong refs are gone

11:18 rhickey: not sure what you mean, you can put a weak ref in a var, that will be the only ref

11:19 arohner: oh, duh.

11:19 rhickey: but data structures share structure profusely

11:20 arohner: because of things like conj?

11:22 rhickey: just the nature of persistent data structures - object identity isn't such a strong notion, but that won't really effect a weak-ref scheme for gc, ust note that part of you data structure may still be strongly ref-ed from another that shares structure

11:23 arohner: is there any way to identify what shares structure, or a way to split the two?

11:23 rhickey: no, but I would wonder why you would need to do so

11:24 arohner: so I can have a clear understanding of when something is able to be serialized to disk and GC'd

11:26 I want this to happen automatically based on memory usage

11:26 rhickey: the data structures are immutable, you can serialize at any time

11:28 sounds like you want SoftReferences

11:29 arohner: ah, you're right

11:30 you covered the step I missed earlier, that the compiler/runtime can't hold any hidden refs to objects, or else normal GC wouldn't happen

14:27 Chouser: The clojure.set operations, by virtue of returning sets, are eager. However, if they returned seqs many could be lazy. I wonder if this would generally be of any benefit.

14:30 rhickey: Chouser: which ones?

14:32 Chouser: I haven't picked trhough all of them, but union, difference, and intersection each only need to access one of their inputs randombly -- the other input and the output could be lazy seqs.

14:34 rhickey: The idea in set is that these operations can be composed efficiently, as soon as they return seqs then they no longer are sets, and would have to be made into sets for the next operation

14:35 but the ops union etc would be interesting as seq fns too, I agree, but not instead

14:36 Chouser: sure. I still don't know if there would often be any use or benefit to the seq forms. It was just an observation.

14:36 (I'm still reading Tar Pit)

14:37 rhickey: Chouser: do you like it?

14:37 Chouser: very much -- it's stimulating a lot of thought.

14:38 arohner: what is Tar Pit?

14:38 oh, the paper that was linked on groups the other day

14:38 Chouser: yes. still looking for the link.

14:38 arohner: yeah, I'm reading it too, I just didn't remember what it was called :-)

14:38 Chouser: :-)

14:39 drewr: arohner: http://groups.google.com/group/clojure/msg/5512f4d7be55edc7

14:41 Chouser: I think the ICFP problem would be an easy fit for separating as the paper describes.

14:42 The functional business of deriving useful state from the textual input felt pretty clean to me in Clojure.

14:43 I wasn't thinking too clearly about how to store that state, so that's where my solution starts to get messy.

14:43 And finally the "logic" part for deciding what commands to send is a tiny part of my final code base -- it is also rather dumb and yet still overly complicated.

14:44 I'm fascinated by the thought of replacing the logic part with some set of rules to be given to a rule engine.

14:44 rhickey: Right, that's the last piece for Clojure

14:45 Chouser: This paper is certainly helping me see why you'd choose to work on that next. I was rather surprised when you first mentioned it.

14:45 rhickey: I went with Agents/Refs and STM for the 'world' state, vs relaTIONAL

14:46 I don't know that the authors ever delivered a system that realized their vision

14:52 Chouser: I still feel like I need an efficient mechanism for backing a collection to disk, specifically a collection in a ref.

14:53 STM makes it pretty easy to correctly snapshot an in-memory collection to disk (for example with prn), and of course you can read in such a collection from disk at startup.

14:54 but it seems a shame to write out the whole collection when you, say, add an item to a set. And the whole thing breaks down if your data is too large for RAM.

14:59 rhickey: Chouser: you have to be careful about attaching disk IO to transactions

15:01 I'd like to figure out a way for the disk version of data structures to share structure like the in-memory versions - seems like a good fit for Terracotta though

Logging service provided by n01se.net