
Using h2o-sparkling

Environment: CentOS 6.2

h2o-sparkling combines H2O with Spark for machine learning: it makes H2O's machine learning packages usable from within a Spark environment.

Installation:

    git clone https://github.com/0xdata/h2o-sparkling.git
    cd h2o-sparkling
    sbt assembly

Run the test:

    [cloudera@localhost h2o-sparkling]$ sbt -mem 500 "run --local"
    [info] Loading project definition from /home/cloudera/h2o-sparkling/project
    [info] Set current project to h2o-sparkling-demo (in build file:/home/cloudera/h2o-sparkling/)
    [info] Running water.sparkling.demo.SparklingDemo --local
    03:41:11.030 main INFO WATER: ----- H2O started -----
    03:41:11.046 main INFO WATER: Build git branch: (unknown)
    03:41:11.047 main INFO WATER: Build git hash: (unknown)
    03:41:11.047 main INFO WATER: Build git describe: (unknown)
    03:41:11.047 main INFO WATER: Build project version: (unknown)
    03:41:11.047 main INFO WATER: Built by: '(unknown)'
    03:41:11.047 main INFO WATER: Built on: '(unknown)'
    03:41:11.048 main INFO WATER: Java availableProcessors: 1
    03:41:11.077 main INFO WATER: Java heap totalMemory: 3.87 gb
    03:41:11.077 main INFO WATER: Java heap maxMemory: 3.87 gb
    03:41:11.078 main INFO WATER: Java version: Java 1.6.0_31 (from Sun Microsystems Inc.)
    03:41:11.078 main INFO WATER: OS version: Linux 2.6.32-220.23.1.el6.x86_64 (amd64)
    03:41:11.381 main INFO WATER: Machine physical memory: 4.83 gb
    03:41:11.393 main INFO WATER: ICE root: '/tmp/h2o-cloudera'
    03:41:11.438 main INFO WATER: Possible IP Address: eth1 (eth1), 192.168.56.101
    03:41:11.439 main INFO WATER: Possible IP Address: eth0 (eth0), 10.0.2.15
    03:41:11.439 main INFO WATER: Possible IP Address: lo (lo), 127.0.0.1
    03:41:11.669 main WARN WATER: Multiple local IPs detected:
    + /192.168.56.101 /10.0.2.15
    + Attempting to determine correct address...
    + Using /10.0.2.15
    03:41:11.929 main INFO WATER: Internal communication uses port: 54322
    + Listening for HTTP and REST traffic on http://10.0.2.15:54321/
    03:41:12.912 main INFO WATER: H2O cloud name: 'cloudera'
    03:41:12.913 main INFO WATER: (v(unknown)) 'cloudera' on /10.0.2.15:54321, discovery address /230.63.2.255:58943
    03:41:12.913 main INFO WATER: If you have trouble connecting, try SSH tunneling from your local machine (e.g., via port 55555):
    + 1. Open a terminal and run 'ssh -L 55555:localhost:54321 cloudera@10.0.2.15'
    + 2. Point your browser to http://localhost:55555
    03:41:12.954 main INFO WATER: Cloud of size 1 formed [/10.0.2.15:54321 (00:00:00.000)]
    03:41:12.954 main INFO WATER: Log dir: '/tmp/h2o-cloudera/h2ologs'
    prostate
    03:41:20.369 main INFO WATER: Running demo with following configuration: DemoConf(prostate,true,RDDExtractor@file,true)
    03:41:20.409 main INFO WATER: Demo configuration: DemoConf(prostate,true,RDDExtractor@file,true)
    03:41:21.830 main INFO WATER: Data : data/prostate.csv
    03:41:21.831 main INFO WATER: Table: prostate_table
    03:41:21.831 main INFO WATER: Query: SELECT * FROM prostate_table WHERE capsule=1
    03:41:21.831 main INFO WATER: Spark: LOCAL
    03:41:21.901 main INFO WATER: Creating LOCAL Spark context.
    03:41:34.616 main INFO WATER: RDD result has: 153 rows
    03:41:34.752 main INFO WATER: Going to write RDD into /tmp/rdd_null_6.csv
    03:41:36.099 FJ-0-1 INFO WATER: Parse result for rdd_data_6 (153 rows):
    03:41:36.136 FJ-0-1 INFO WATER: C1: numeric min(6.000000) max(378.000000)
    03:41:36.140 FJ-0-1 INFO WATER: C2: numeric min(1.000000) max(1.000000) constant
    03:41:36.146 FJ-0-1 INFO WATER: C3: numeric min(47.000000) max(79.000000)
    03:41:36.152 FJ-0-1 INFO WATER: C4: numeric min(0.000000) max(2.000000)
    03:41:36.158 FJ-0-1 INFO WATER: C5: numeric min(1.000000) max(4.000000)
    03:41:36.161 FJ-0-1 INFO WATER: C6: numeric min(1.000000) max(2.000000)
    03:41:36.165 FJ-0-1 INFO WATER: C7: numeric min(1.400000) max(139.700000)
    03:41:36.169 FJ-0-1 INFO WATER: C8: numeric min(0.000000) max(73.400000)
    03:41:36.176 FJ-0-1 INFO WATER: C9: numeric min(5.000000) max(9.000000)
    03:41:37.457 main INFO WATER: Extracted frame from Spark:
    03:41:37.474 main INFO WATER: {id,capsule,age,race,dpros,dcaps,psa,vol,gleason}, 2.8 KB
    + Chunk starts: {0,83,}
    + Rows: 153
    03:41:37.482 #ti-UDP-R INFO WATER: Orderly shutdown command from /10.0.2.15:54321
    [success] Total time: 44 s, completed Aug 4, 2014 3:41:37 AM
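To make the query step in the log above concrete, here is a minimal Python sketch of what `SELECT * FROM prostate_table WHERE capsule=1` amounts to: keep only the rows whose capsule column equals 1. The column names are taken from the "Extracted frame from Spark" log line. The three sample rows are made up for illustration and are not values from data/prostate.csv, and the function name `select_capsule` is hypothetical. The actual demo runs the query through Spark, not through this code.

```python
import csv
import io

# Column names as printed in the "Extracted frame from Spark" log line.
COLUMNS = ["id", "capsule", "age", "race", "dpros", "dcaps", "psa", "vol", "gleason"]

# Made-up sample rows for illustration only; NOT real values from data/prostate.csv.
SAMPLE_CSV = """id,capsule,age,race,dpros,dcaps,psa,vol,gleason
1,0,65,1,2,1,1.4,0.0,6
2,1,72,1,3,2,6.7,0.0,7
3,1,70,2,1,1,4.9,16.2,6
"""


def select_capsule(csv_text, capsule_value=1):
    """Return the rows (as dicts) whose capsule column equals capsule_value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if int(row["capsule"]) == capsule_value]


rows = select_capsule(SAMPLE_CSV)
print(len(rows), "rows selected")
```

Every row that survives the filter has capsule equal to 1, which is consistent with the log reporting column C2 as constant with min and max both 1.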

Running against a local cluster:

    [cloudera@localhost h2o-sparkling]$ sbt -mem 100 "run --remote"
    [info] Loading project definition from /home/cloudera/h2o-sparkling/project
    [info] Set current project to h2o-sparkling-demo (in build file:/home/cloudera/h2o-sparkling/)
    [info] Running water.sparkling.demo.SparklingDemo --remote
    03:25:42.306 main INFO WATER: ----- H2O started -----
    03:25:42.309 main INFO WATER: Build git branch: (unknown)
    03:25:42.309 main INFO WATER: Build git hash: (unknown)
    03:25:42.309 main INFO WATER: Build git describe: (unknown)
    03:25:42.309 main INFO WATER: Build project version: (unknown)
    03:25:42.309 main INFO WATER: Built by: '(unknown)'
    03:25:42.309 main INFO WATER: Built on: '(unknown)'
    03:25:42.310 main INFO WATER: Java availableProcessors: 4
    03:25:42.316 main INFO WATER: Java heap totalMemory: 3.83 gb
    03:25:42.316 main INFO WATER: Java heap maxMemory: 3.83 gb
    03:25:42.316 main INFO WATER: Java version: Java 1.6.0_31 (from Sun Microsystems Inc.)
    03:25:42.317 main INFO WATER: OS version: Linux 2.6.32-220.23.1.el6.x86_64 (amd64)
    03:25:42.383 main INFO WATER: Machine physical memory: 4.95 gb
    03:25:42.384 main INFO WATER: ICE root: '/tmp/h2o-cloudera'
    03:25:42.389 main INFO WATER: Possible IP Address: eth1 (eth1), 192.168.56.101
    03:25:42.389 main INFO WATER: Possible IP Address: eth0 (eth0), 10.0.2.15
    03:25:42.389 main INFO WATER: Possible IP Address: lo (lo), 127.0.0.1
    03:25:42.587 main WARN WATER: Multiple local IPs detected:
    + /192.168.56.101 /10.0.2.15
    + Attempting to determine correct address...
    + Using /10.0.2.15
    03:25:42.650 main INFO WATER: Internal communication uses port: 54322
    + Listening for HTTP and REST traffic on http://10.0.2.15:54321/
    03:25:43.906 main INFO WATER: H2O cloud name: 'cloudera'
    03:25:43.906 main INFO WATER: (v(unknown)) 'cloudera' on /10.0.2.15:54321, discovery address /230.63.2.255:58943
    03:25:43.907 main INFO WATER: If you have trouble connecting, try SSH tunneling from your local machine (e.g., via port 55555):
    + 1. Open a terminal and run 'ssh -L 55555:localhost:54321 cloudera@10.0.2.15'
    + 2. Point your browser to http://localhost:55555
    03:25:43.920 main INFO WATER: Cloud of size 1 formed [/10.0.2.15:54321 (00:00:00.000)]
    03:25:43.921 main INFO WATER: Log dir: '/tmp/h2o-cloudera/h2ologs'
    prostate
    03:25:46.985 main INFO WATER: Running demo with following configuration: DemoConf(prostate,false,RDDExtractor@file,true)
    03:25:46.991 main INFO WATER: Demo configuration: DemoConf(prostate,false,RDDExtractor@file,true)
    03:25:48.000 main INFO WATER: Data : data/prostate.csv
    03:25:48.000 main INFO WATER: Table: prostate_table
    03:25:48.000 main INFO WATER: Query: SELECT * FROM prostate_table WHERE capsule=1
    03:25:48.001 main INFO WATER: Spark: REMOTE
    03:25:48.024 main INFO WATER: Creating REMOTE (spark://localhost:7077) Spark context.
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 1.0:1 failed 4 times, most recent failure: TID 7 on host 192.168.56.101 failed for unknown reason
    Driver stacktrace:
    03:26:07.151 main INFO WATER: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
    03:26:07.151 main INFO WATER: at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
    03:26:07.151 main INFO WATER: at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
    03:26:07.152 main INFO WATER: at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    03:26:07.152 main INFO WATER: at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    03:26:07.152 main INFO WATER: at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
    03:26:07.152 main INFO WATER: at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
    03:26:07.152 main INFO WATER: at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
    03:26:07.153 main INFO WATER: at scala.Option.foreach(Option.scala:236)
    03:26:07.153 main INFO WATER: at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)
    03:26:07.153 main INFO WATER: at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)
    03:26:07.153 main INFO WATER: at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    03:26:07.155 main INFO WATER: at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    03:26:07.155 main INFO WATER: at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    03:26:07.156 main INFO WATER: at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    03:26:07.156 main INFO WATER: at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    03:26:07.157 main INFO WATER: at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    03:26:07.158 main INFO WATER: at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    03:26:07.158 main INFO WATER: at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    03:26:07.162 main INFO WATER: at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    03:26:07.172 #ti-UDP-R INFO WATER: Orderly shutdown command from /10.0.2.15:54321
    [success] Total time: 27 s, completed Aug 4, 2014 3:26:07 PM

The run failed, and I have not yet been able to pin down the cause.
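One possible starting point for narrowing this down (a sketch, not a diagnosis) is to pull the failing host out of the SparkException message and compare it with the address H2O chose to bind to. In the log above the two differ (192.168.56.101 versus 10.0.2.15), which may or may not be related to the failure; the Spark worker logs on the failing host would be the next place to look. The helper names `failing_host` and `h2o_bind_address` are hypothetical, and the regular expressions assume only the message formats seen in this log.

```python
import re

# Two lines copied from the remote-run log above.
LOG = """\
03:25:42.587 main WARN WATER: Multiple local IPs detected: + /192.168.56.101 /10.0.2.15 + Attempting to determine correct address... + Using /10.0.2.15
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1.0:1 failed 4 times, most recent failure: TID 7 on host 192.168.56.101 failed for unknown reason
"""


def failing_host(log_text):
    """Return the host named in the 'most recent failure' message, or None."""
    m = re.search(r"most recent failure: TID \d+ on host (\S+) failed", log_text)
    return m.group(1) if m else None


def h2o_bind_address(log_text):
    """Return the IP that H2O reports after 'Using /', or None."""
    m = re.search(r"Using /(\d+\.\d+\.\d+\.\d+)", log_text)
    return m.group(1) if m else None


host = failing_host(LOG)
bind = h2o_bind_address(LOG)
print("failing executor host:", host)
print("H2O bind address:", bind)
print("same address:", host == bind)
```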

Source: ITPUB blog (http://blog.itpub.net/16582684/viewspace-1244889/). Please credit the source when reposting; otherwise legal action may be taken.

Theme test article, for testing purposes only. Posted by 布吉卡. When reposting, please credit the source: http://www.cxybcw.com/193287.html
