Administrator@zhengxq /cygdrive/e/download/java/hadoop/hadoop-0.20.1/bin

On success, two directories and a series of files are generated under C:\temp\hadoop\dfs\name:

├─current
│      edits
│      fsimage
│      fstime
│      VERSION
└─image
       fsimage

3.2. How to verify that startup succeeded

At the Cygwin prompt, type:

./hadoop dfsadmin -report

If you get a report like the following, everything is normal:

"Datanodes available: 1 (1 total, 0 dead)"

That is, one DataNode is up and usable.
This is exactly my setup.
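Checking the report by eye works, but the same check can be scripted. The sketch below is a minimal, hedged illustration: the `report` variable stands in for real `./hadoop dfsadmin -report` output, so the parsing can be shown without a running cluster.

```shell
# Sample line standing in for real `./hadoop dfsadmin -report` output.
report="Datanodes available: 1 (1 total, 0 dead)"

# Extract the "dead" count from the "(N total, M dead)" part of the line.
dead=$(printf '%s\n' "$report" | sed -n 's/.*(\([0-9][0-9]*\) total, \([0-9][0-9]*\) dead).*/\2/p')

if [ "$dead" = "0" ]; then
  status="healthy"
else
  status="degraded"
fi
echo "$status"
```

In a real setup you would pipe the actual `dfsadmin -report` output into the same `sed` expression.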
4. Usage and Development

4.1. Common command-line usage

Command                              Description
./hadoop dfs -mkdir zhengxq          Create a directory named zhengxq
./hadoop dfs -put *.rar zhengxq      Upload all .rar files in the current directory to zhengxq
./hadoop dfs -put ../*.gz zhengxq    Upload all .gz files in the parent directory to zhengxq
./hadoop dfs -ls zhengxq             List the files under zhengxq (prints the file listing)

Note that the storage directory on disk does not actually contain a real zhengxq folder.
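The commands above are ordinary executables, so they compose in scripts; note in particular that `*.rar` is expanded by the shell before hadoop ever sees it. A hedged sketch, with `run` standing in for `./hadoop dfs` so the sequence can be shown without a cluster:

```shell
# `run` is a hypothetical stand-in for `./hadoop dfs`; it only echoes the
# command it would run, so this sketch is self-contained.
run() { echo "hadoop dfs $*"; }

run -mkdir zhengxq
for f in a.rar b.rar; do    # a sample of what the shell expands *.rar into
  run -put "$f" zhengxq
done
last=$(run -ls zhengxq)
echo "$last"
```

With the real client, replace `run` with `./hadoop dfs` and the same loop uploads each file in turn.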
The output of tree /F on my machine:

C:.
├─dfs
│  ├─data
│  │  │  in_use.lock
│  │  │  storage
│  │  │
│  │  ├─current
│  │  │      blk_-7873866343985077599
│  │  │      blk_-7873866343985077599_1001.meta
│  │  │      blk_6677526
│  │  │      blk_6677526_1002.meta
│  │  │      dncp_block_verification.log.curr
│  │  │      VERSION
│  │  │
│  │  ├─detach
│  │  └─tmp
│  ├─name
│  │  │  in_use.lock
│  │  │
│  │  ├─current
│  │  │      edits
│  │  │      fsimage
│  │  │      fstime
│  │  │      VERSION
│  │  │
│  │  └─image
│  │         fsimage
│  │
│  └─namesecondary
│     │  in_use.lock
│     │
│     ├─current
│     │      edits
│     │      fsimage
│     │      fstime
│     │      VERSION
│     │
│     ├─image
│     │      fsimage
│     │
│     └─previous.checkpoint
│            edits
│            fsimage
│            fstime
│            VERSION
│
└─mapred

4.2. Access via a browser

Open http://localhost:50070/dfshealth.jsp and you will see the DFS health page. Click "Browse the filesystem" and drill down to see the files we uploaded. Clicking "Live Nodes" shows the active DataNodes.

5. FAQ

5.1. Why won't it start?

Check the logs under HADOOP_HOME/logs.
There I found this error:

E:\download\java\hadoop\hadoop-0.20.1\logs\hadoop-Administrator-namenode-zhengxq.log:1894:
2009-11-09 09:45:17,250 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory C:\temp\hadoop\dfs\name does not exist.

Odd: I never configured the directory C:\temp\hadoop\dfs\name, yet it is required. I simply created it as requested.
Conclusion: that worked.
2009-11-09 09:49:52,328 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: NameNode is not formatted.

This error means the NameNode needs to be formatted. Type:

./hadoop namenode -format

On startup, hadoop-Administrator-jobtracker-zhengxq.log may show:

2009-11-09 10:00:46,281 FATAL org.apache.hadoop.mapred.JobTracker: java.lang.RuntimeException: Not a host:port pair: local
    at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:136)
    at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:123)
    at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:1804)
    at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1576)
    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:180)
    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:172)
    at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:3699)

See "5.2. Problems caused by the configuration files".

5.2. Problems caused by the configuration files

Following some documents found online, I once put the complete configuration below into core-site.xml and ignored mapred-site.xml, which sent me down a wrong path; refer to the correct configuration steps above instead:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://zhengxq:9000</value>
    <description>The name of the default file system. Either the literal string "local" or a host:port for DFS.</description>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>zhengxq:9001</value>
    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>c:/temp/hadoop</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>c:/temp/hadoop/name</value>
    <description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>c:/temp/hadoop/data</value>
    <description>Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
  </property>
</configuration>

5.3. JobTracker keeps failing?

2009-11-09 10:15:24,812 FATAL org.apache.hadoop.mapred.JobTracker: java.lang.IllegalArgumentException: Pathname /c:/temp/hadoop/mapred/system from hdfs://zhengxq:9000/c:/temp/hadoop/mapred/system is not a valid DFS filename.
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:158)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:252)
    at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1670)
    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:180)
    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:172)
    at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:3699)

Is this a fundamental Windows problem? I never managed to resolve it.
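Both of these JobTracker errors boil down to simple string checks on configured values: the job tracker address must look like "host:port" (so the default value "local" fails), and a DFS path component may not contain ':' (so a Windows drive letter like /c:/... is rejected). A hedged sketch of those two checks, not the real Hadoop code:

```shell
# Hypothetical re-creation of the host:port check behind
# "Not a host:port pair: local".
check_tracker() {
  case "$1" in
    *:*) echo "ok: $1" ;;
    *)   echo "Not a host:port pair: $1" ;;
  esac
}

# Hypothetical re-creation of the DFS filename check: a ':' anywhere
# in the path makes it invalid.
check_path() {
  case "$1" in
    *:*) echo "Pathname $1 is not a valid DFS filename" ;;
    *)   echo "ok: $1" ;;
  esac
}

a=$(check_tracker "zhengxq:9001")
b=$(check_tracker "local")
c=$(check_path "/c:/temp/hadoop/mapred/system")
printf '%s\n%s\n%s\n' "$a" "$b" "$c"
```

This is why setting mapred.job.tracker to a real host:port pair fixes the first error, while Windows drive-letter paths keep tripping the second.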
5.4. How to verify that configuration and startup succeeded?

At the Cygwin prompt I typed ./hadoop dfs -report and saw the following:

./hadoop dfs -report
09/11/09 10:42:37 INFO ipc.Client: Retrying connect to server: zhengxq/192.168.129.138:9000. Already tried 0 times.
09/11/09 10:42:39 INFO ipc.Client: Retrying connect to server: zhengxq/192.168.129.138:9000. Already tried 1 times.
09/11/09 10:42:41 INFO ipc.Client: Retrying connect to server: zhengxq/192.168.129.138:9000. Already tried 2 times.
09/11/09 10:42:42 INFO ipc.Client: Retrying connect to server: zhengxq/192.168.129.138:9000. Already tried 3 times.
09/11/09 10:42:44 INFO ipc.Client: Retrying connect to server: zhengxq/192.168.129.138:9000. Already tried 4 times.
09/11/09 10:42:46 INFO ipc.Client: Retrying connect to server: zhengxq/192.168.129.138:9000. Already tried 5 times.
09/11/09 10:42:48 INFO ipc.Client: Retrying connect to server: zhengxq/192.168.129.138:9000. Already tried 6 times.
09/11/09 10:42:50 INFO ipc.Client: Retrying connect to server: zhengxq/192.168.129.138:9000. Already tried 7 times.
09/11/09 10:42:52 INFO ipc.Client: Retrying connect to server: zhengxq/192.168.129.138:9000. Already tried 8 times.
09/11/09 10:42:54 INFO ipc.Client: Retrying connect to server: zhengxq/192.168.129.138:9000. Already tried 9 times.
Bad connection to FS. command aborted.

This means SSH still needs to be configured.
See "2.1. Configuring SSH".
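The pattern in the log above is easy to recognize once seen: the client retries a fixed number of times and then aborts. A hedged sketch of that client-side behaviour, with `try_connect` as a stand-in for the RPC connect to zhengxq:9000 that always fails here, like an unreachable NameNode:

```shell
# Hypothetical stand-in for the RPC connect attempt; always fails,
# mimicking a NameNode that never came up.
try_connect() { return 1; }

tries=0
while [ "$tries" -lt 10 ]; do
  if try_connect; then break; fi
  echo "Retrying connect to server. Already tried $tries times."
  tries=$((tries + 1))
done

result="connected"
if [ "$tries" -eq 10 ]; then
  result="Bad connection to FS. command aborted."
fi
echo "$result"
```

So ten "Retrying connect" lines followed by "Bad connection to FS" simply means the server side never answered, and the fix is on the server side (here, getting sshd and the daemons running), not in the client command.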
6. Common commands

Command                      Description
./start-all.sh               Start Hadoop
./stop-all.sh                Stop Hadoop
./hadoop dfsadmin -report    Show cluster status

7. References

Official QuickStart
http://hadoop.apache.org/common/docs/current/quickstart.html
Distributed parallel programming with Hadoop, Part 1
http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop1/index.html
Hadoop on Windows with Eclipse
http://ebiquity.umbc.edu/Tutorials/Hadoop/0520-20Setup20SSHD.html
Hadoop cluster configuration and usage tips
http://www.infoq.com/cn/articles/hadoop-config-tip
http://www.infoq.com/cn/articles/hadoop-intro
http://www.infoq.com/cn/articles/hadoop-process-develop
hadoop-0.20.1 deployment notes
http://sery.blog.51cto.com/10037/214271
Hadoop FAQ
http://wiki.apache.org/hadoop/FAQ