13:55 Chouser: Here's a fun challenge (I'll try to do it myself if I get some time later)...
13:56 Implement MapReduce (and of course the standard wordcount example) for SMP (shared memory, not a cluster) in Clojure.
13:57 I'm betting the MapReduce "framework" on top of agents is no more than 10 lines of code, and the wordcount example would be another 3.
13:57 rhickey: Chouser: link?
13:58 Chouser: to MapReduce?
13:59 rhickey: I've read the MapReduce papers, wondering if there was a benchmark thingy
13:59 or implementation in another language...
13:59 Chouser: oh, no. I just made up the challenge because a friend was being impressed by this: http://
14:11 albino: Is that close to the widefinder description?
15:16 Chouser: is there some reason pmap manages its own thread pool instead of using agents?
15:17 I'm not smart enough to know *how* it would use agents, I'm just assuming that's possible... somehow...
15:47 rhickey: Chouser: pmap may have predated agents
15:47 Chouser: ah, ok.
16:25 nsinghal: I have a question
16:25 http://
16:28 Chouser: your initial example works for me, replacing (new Socket) with (throw (new Exception))
16:32 nsinghal: Thats the question that it works that way - see the example at the end. Why with socket it doesnt work?
16:34 (import '(java.net Socket))
16:34 (defn connect [host port eatex]
16:34 (try
16:34 (new Socket host port)
16:34 (catch Throwable ex
16:34 (if eatex nil (throw ex)))))
16:34 (connect "127.0.0.1" 3432 true)
16:38 Chouser: oh, sorry, I missed that that was the point of the final example. I thought it had to do with the number of arguments.
16:54 well, I'm stumped.
16:55 rhickey: probably a bug with local clearing prior to tail call - looking at it now
17:18 nsinghal: fix is up
17:19 nsinghal: thx
17:31 MarkJP: how would I do a sleep in the first binding of a let...recur loop
17:31 don't ask
17:32 I could have a flag I guess
17:42 Chouser: inside the body of the loop, or up at the top?
17:48 MarkJP: I wanted to do it at the top for now I just added a new sleep flag to the let so the recur passes in a false
17:48 the let binds it to true the first time
17:49 Chouser: (loop [_ (Thread.sleep 3) x [:a :b :c]] (prn x) (when x (recur _ (rest x))))
17:49 (loop [_ (Thread.sleep 3) x [:a :b :c]] (prn x) (when x (recur _ (rest x))))
17:49 oops, sorry for the double-post
17:50 MarkJP: cool is that what _ does?
17:50 Chouser: ...and for the fact that it doesn't work.
17:50 oh, no, it does work. (Thread.sleep 3000) is a bit more noticable, though.
17:51 MarkJP: cool thanks
17:51 Chouser: No, _ is just a symbol. You could call it "blank" or "dummy" instead.
17:52 then again, I don't see how that's better than just (do (Thread.sleep 3000) (loop ...))
22:25 so here's my attempt at map-reduce: http://
22:26 Also my first real attempt at using agents.
22:31 I wonder if ref-toggle will be useful for solving other problems.
22:31 * rhickey is still trying to understand it
22:34 Chouser: ref-toggle, or the whole ugly mess?
22:35 I don't really know how to comment code like this.
22:35 rhickey: Chouser: isn't there a partitioning between map and reduce, that makes each reduce job work on related data?
22:38 I think the agents doing mapping and interacting with reducer is too coupled
22:39 Chouser: in map-reduce3?
22:39 rhickey: and 2
22:40 number of agents should be independent of number of docs
22:40 Chouser: cool. Help me understand why. The number of agents is already independent of the number of processors -- so what should determine the number of agents?
22:41 rhickey: make a cycle of agents, distribute docs to each round robin...
22:41 await them
22:41 concat results, send to partitioner
22:41 distribute partitioned results to agent with reduce function
22:41 Chouser: won't the thread pool cycle through the agents for me? Why should I write that logic again?
22:41 rhickey: await
22:41 merge results
22:42 more agents means more independent results to merge, should be a setting
22:44 Chouser: every hash produced by a map run has to be merged, regardless of whether the agent producing it has produced other hashes already.
22:44 rhickey: would you have an agent per word?
22:45 Chouser: No, because there's no work to be done for a single word. I have one agent per parallizable map run.
22:46 The drawback of my solution is what, too much contention for the final-hash?
22:47 rhickey: I don't think you are capturing the mapreduce model
22:48 Chouser: hm, it's been a while since I've read the paper. Perhaps I should look at that again.
22:48 rhickey: partition input, divide among compute resources, partition intermediate results, divide reduction among compute resources, merge final result
22:48 Chouser: I see, and it's the partitioning at each step that I'm punting on.
22:49 rhickey: having a settable number of agents allows you to model compute resources, independent of how Clojure distributes to processors
22:50 but it's likely numagents == numprocessors is optimal for this, since even in map they are doing combining (of counts)
22:53 (assoc hash word (inc (get (hash word) 0))) more idiomatic
22:53 oops: (assoc hash word (inc (get hash word 0)))
22:54 Chouser: I don't like inc. Another keyword to learn, when (+ 1 ...) is just as succinct (usually) and more obvious.
22:54 (get hash word 0) is nice.
22:54 rhickey: the first partition is mechanical, the second logical
22:55 inc may be faster
22:55 Chouser: hm.
22:59 rhickey: second partition can be made mechanical, use rem of hash of key to index reducing agents
22:59 Chouser: well, it was a lot of fun to write. I'll have to re-read the paper to understand the purpose of the partitioning.
23:00 rhickey: you need to end up with one result per key, so you need to send all of the data for a particular key to the same reducing agent
23:01 otherwise the final merger is doing reducing
23:01 Chouser: Aaaaa...
23:02 should the passed-in reducer then not be operating on two hashes, but on two values (ints in this case)?
23:03 the reducer should just be +
23:03 rhickey: if you look at it as data flows...
23:03 docs flow into maps...
23:04 key/val maps flow out of maps into partition...
23:04 key/val pairs flow from partition to reducers
23:05 maps flow out of reducers and get concatenated as final result
23:05 Chouser: I think that Qt API I posted mislead me. The reduce function in Google's paper takes a key and a seq of values. Qt's takes two hashes.
23:08 rhickey: key and seq of values means partition is doing more work, so batch is all together, makes sense for the distributed case
23:08 the main thing is to keep the stages/steps separate
23:10 Chouser: if the seq only had 2 items, that would match your "key/val pairs flow ... to reducers" description, right?
23:11 rhickey: let's say there were 3 mapping agents
23:12 the first found "the" 1 time, the second 2 times and the third 3 times
23:12 one partitioning model determines that the "the" reducer is reducer # 2
23:13 when it get the result from mapper one, it sends ["the" 1] to reducer #2
23:13 later when it gets the results from mapper two, it sends ["the" 2] to reducer #2
23:13 etc
23:14 another model has it gather all the results from the mappers and send ["the" 1 2 3] to reducer #2
23:14 foones: Hi!
23:14 I'm trying to understand Clojure's support for concurrency
23:14 I think I've understand what agents and refs are
23:14 Chouser: Ok. And that's when I think I should have one agent for each word until I read your Socratic question from 30 minues ago.
23:15 foones: but I'm not sure as to how I can execute two pieces of code in parallel
23:16 rhickey: foones: (send agent1 ajob) (send agent2 ajob) (await agent1 agent2)
23:17 Chouser: rhickey: bedtime for me. Thanks for walking through this with me, I really appreciate it.
23:17 foones: Aha, and if in one of the jobs I need to read input (e.g. user input)
23:17 rhickey: sure
23:17 foones: it won't wait for it to end?
23:18 rhickey: foones: for blocking jobs use send-off, and don't await for it. It can put its result in a ref when they arrive
23:19 foones: Ahh, I understand
23:19 Thanks for your help
23:24 gomer-nyc: hi, does anybody have experience with returning proxies from a macro?
23:25 or rather, *a* proxy from a macro?
23:35 Chouser: gomer-nyc: you missed rhickey by 5 minutes.
23:35 And I'm going to bed. :-/ Maybe try the google discussion group?
23:36 gomer-nyc: yeah, good idea; I've searched but not much around proxies & macros that I'm looking for. maybe I'll post a q there. good night :-)