#clojure log - Apr 13 2008

0:01 drewr: Golly, if I want to get the mtime of a file, Google tells me I need to use the Tomcat FileInfo class. Is there a better way?

0:02 Surely there's something in java.io.*.

0:06 abrooks: drewr: Java boldly refuses to acknowledge that there is an underlying platform. What you're looking for may be there but I suspect not.

0:08 jonathan__: hmmm, see this --> http://www.bmsi.com/java/posix/docs/posix.File.html

0:11 but it looks like java.io.File can get you the last modified --> http://java.sun.com/j2se/1.4.2/docs/api/java/io/File.html

0:11 abrooks: posix.* is not part of the Java distribution from anyone. :(

0:19 drewr: Heh, lastModified()... ugh.

0:19 jonathan__: === mtime ?
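
The java.io.File approach mentioned above can be sketched as follows. This is a minimal illustration, not anything posted in the channel: the class name MtimeExample and the helper mtimeMillis are hypothetical, and the only library call relied on is File.lastModified(), which returns milliseconds since the epoch, or 0L when the file does not exist or cannot be read.

```java
import java.io.File;
import java.io.IOException;
import java.util.Date;

// Hypothetical demo class; only java.io.File.lastModified() comes from the JDK.
class MtimeExample {
    // Returns the file's mtime in milliseconds since the epoch.
    // File.lastModified() yields 0L if the file is missing or unreadable.
    static long mtimeMillis(File f) {
        return f.lastModified();
    }

    public static void main(String[] args) throws IOException {
        // Create a scratch file so the example has something to inspect.
        File tmp = File.createTempFile("mtime-demo", ".txt");
        tmp.deleteOnExit();
        System.out.println(tmp + " last modified: " + new Date(mtimeMillis(tmp)));
    }
}
```

Because 0L doubles as the error value, a caller that needs to tell "missing file" apart from "modified at the epoch" would have to check File.exists() separately.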

0:20 drewr: What about the other things you might need to know? inode, symlink, etc.?

0:22 abrooks: Java is its own platform. It's not a good platform for building system tools without third-party classes (JNI based).

0:23 http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4042001 (symlink support)

0:23 jonathan__: yeah, I work in "enterprise" software, and we'd typically never need stuff like that ... sadly we use C++, if only we could use Java

0:23 abrooks: There are lots of RFIs for platform support.

0:24 drewr: This philosophy never made sense to me. So many problems have been solved by operating systems that you shouldn't have to re-solve. :-)

0:24 jonathan__: RFI?

0:24 abrooks: The GNU Classpath project is extending some base classes. It would be nice if they'd support posix-y-gnu-ish interfaces.

0:25 jonathan__: RFE, sorry. Request For Enhancement.

0:29 drewr: I'm looking at Clojure for migrating some data concurrently between SQL Server and Postgres with JDBC. *That* should be well-supported.

0:29 abrooks: That would be Java's domain. :)

0:29 jonathan__: Ok, I don't know about pg, but the jtds 1.2 driver works like a champ with sql server

0:30 and the pure java Oracle thin drivers rock also

0:30 drewr: jonathan__: Awesome, thanks.

0:31 I've used the thin driver for ORA before.

0:31 It did work well.

0:31 jonathan__: I tried and tried but *strangely*, the MS driver for SQL Server completely failed to connect

0:31 * drewr researches pg options

0:31 jonathan__: </sarcasm>

0:31 drewr: Hey, of course, http://jdbc.postgresql.org/.

0:34 Wonder what the best way of approaching this would be. Have agents bite off a chunk of rows and each work independently?

0:34 jonathan__: What are you trying to do?

0:35 drewr: We've got massive amounts of data that comes off our telecom platform, which only talks SQL Server.

0:36 In order to manipulate it and report on it, we bring it over to PG.

0:36 The process for doing that is extremely slow.

0:37 I think that doing it concurrently will speed things up.

0:39 jonathan__: What's the fastest that pg will slurp in data? Can you generate a bulk insert file? Or are you using other methods?

0:39 (assuming pg supports stuff like that)

0:40 drewr: I've only tried DTS with SQL Server so far.

0:40 It's dog-slow.

0:40 Literally days to get a single dump.

0:41 That's why I'm going to write something that's more efficient, but if I do it sequentially I'm afraid I'll have the same problem.

0:41 ...doing 100 or 1000 rows at a time.

0:44 jonathan__: so you use DTS to generate data to a text file?

0:44 drewr: So my naïve idea is to have a pointer to the current row that gets updated in a Clojure transaction every time an agent grabs his dataset.

0:44 jonathan__: No, it moves it straight into PG.

0:44 s/moves/copies/
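
The shared "pointer to the current row" idea can be sketched roughly as below. Note the substitution: instead of Clojure agents and a transaction, this sketch uses a Java AtomicLong and a fixed thread pool, which gives the same "each worker atomically claims the next chunk" behaviour. The class ChunkedCopy, its methods, and the row counts are all hypothetical, and the JDBC work is replaced by a placeholder counter so the example runs without a database.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: workers claim disjoint row ranges from a shared counter.
class ChunkedCopy {
    // Shared pointer to the next unclaimed row; advanced atomically by workers.
    private final AtomicLong nextRow = new AtomicLong(0);
    private final long totalRows;
    private final int chunkSize;

    ChunkedCopy(long totalRows, int chunkSize) {
        this.totalRows = totalRows;
        this.chunkSize = chunkSize;
    }

    // Claims the next chunk as {start, endExclusive}, or null when no rows remain.
    long[] claimChunk() {
        long start = nextRow.getAndAdd(chunkSize);
        if (start >= totalRows) {
            return null;
        }
        return new long[] { start, Math.min(start + chunkSize, totalRows) };
    }

    // Runs the given number of workers until every row is claimed;
    // returns the total number of rows the workers handled.
    static long run(long totalRows, int chunkSize, int workers) throws InterruptedException {
        ChunkedCopy job = new ChunkedCopy(totalRows, chunkSize);
        AtomicLong copied = new AtomicLong(0);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                long[] chunk;
                while ((chunk = job.claimChunk()) != null) {
                    // Placeholder: a real worker would read rows [start, end)
                    // from the source and write them to the target over JDBC.
                    copied.addAndGet(chunk[1] - chunk[0]);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return copied.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("rows copied: " + run(10_500, 1000, 4));
    }
}
```

Claiming by row offset assumes the source rows can be addressed by a stable ordering (for example a monotonically increasing key). Whether parallel workers help at all depends on where the bottleneck actually is.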

0:50 jonathan__: Sounds like the overhead of using DTS/ODBC(?) may be the problem, rather than being sequential ... but obviously I could be totally wrong

0:50 heh

0:52 drewr: True, it could be. I need to profile it better to see where the bottleneck is.

0:56 jonathan__: Assuming round-tripping is the problem, I'd be looking to try and generate something that could be read by the copy command ... http://www.commandprompt.com/community/pgdocs8/sql-copy

0:59 Hopefully SQL server should be able to spit out CSV files at 10s of k rows a sec

1:00 versus 200 rows a sec which sounds like what you may be seeing

1:00 drewr: That's probably the ballpark

1:00 I don't really want to generate intermediate data, but I may have to.

1:02 jonathan__: yeah, escaping text data can be a pain etc ...

1:03 which reminds me, does emit escape data yet ... *my* version does :)
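
The escaping needed for an intermediate CSV file, as discussed for the COPY route above, might look something like this. It is a minimal sketch under stated assumptions: the class CsvEscape and its methods are hypothetical, the delimiter is assumed to be a comma and the quote character a double quote, and the null/empty handling assumes PostgreSQL's default CSV convention, where an unquoted empty field is read as NULL and a quoted empty field as an empty string.

```java
// Hypothetical helper for writing CSV lines intended for a bulk load.
class CsvEscape {
    // Escapes one field:
    //  - null becomes an empty, unquoted field (assumed to load as NULL)
    //  - an empty string becomes "" so it stays distinct from null
    //  - fields containing a comma, double quote, CR, or LF are quoted,
    //    with embedded double quotes doubled
    static String escapeField(String value) {
        if (value == null) {
            return "";
        }
        boolean needsQuotes = value.isEmpty()
                || value.indexOf(',') >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Joins escaped fields into one comma-separated line (no trailing newline).
    static String toLine(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escapeField(fields[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toLine("42", "O'Brien, Pat", "said \"hi\"", null));
    }
}
```

The exact quoting rules should be checked against the options given to the load command actually used, since the delimiter, quote character, and NULL marker are all configurable there.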

1:16 Chouser: Extremely primitive log of this channel for the past couple months: http://n01se.net/chouser/clojure-log/

1:17 Let me know if you see any data errors. The format obviously needs improvement.

1:17 drewr: Chouser: Cool, thanks.

1:17 I'm off to bed. Thanks for the brainstorming guys.

1:18 jonathan__: cool, should there be a notice that the channel is archived? or is that pretty common for irc?

1:19 Chouser: jonathan__: I dunno. To suggest that anything said here is private seems a bit of a stretch though.

1:20 it's not automatically updated yet. Hopefully I can add that tomorrow.

1:20 rhickey already mentioned he liked the idea. I guess if people have objections I can take the pages back down.

1:21 Past my bedtime. Later!

1:21 jonathan__: cheers

20:14 Chouser: http://n01se.net/chouser/clojure-log/2008-04-13.html

20:15 that's the IRC log for the last couple months.

20:15 rhickey: cool

20:15 Chouser: I think it'd be most useful if we can get Google to index it.

20:16 rhickey: are you interested in hosting it at clojure.org, or should I let google have at it on my own domain?

20:17 rhickey: clojure.org maps to sf right now

20:17 Chouser: ok, that's fine. It's just html and js files, no cgis or servlets or anything.

20:17 or n01se.net/clojure_log is fine with me too, just thought I'd ask.

20:17 rhickey: I'd have to get some automated way to upload it regularly

20:18 Chouser: yeah, rsync over ssh would be preferred (that's how I'm getting it onto n01se), but ftp or whatever is fine too.

20:20 rhickey: Let me think about it - still catching up, was away this weekend

20:25 Chouser: np

20:25 and no rush either
