A record of the hdfs dfs command becoming unusable in a Hadoop cluster, and how it was fixed

Seven_2017 2021-6-15 119

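I. Problem description

Our Hadoop data warehouse and the traditional data warehouse currently load data into HDFS paths with hdfs dfs -put (hadoop fs -put) rather than through the Java API. Running the put command failed with the following errors: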

ERR>2021-06-11 09:34:30,000 WARN hdfs.DFSUtilClient: Namenode for hacluster remains unresolved for ID 608. Check your hdfs-site.xml file to ensure namenodes are configured properly.
ERR>2021-06-11 09:34:30,002 WARN hdfs.DFSUtilClient: Namenode for hacluster remains unresolved for ID 609. Check your hdfs-site.xml file to ensure namenodes are configured properly.
...
ERR>2021-06-11 09:38:39,284 INFO retry.RetryInvocationHandler: Invalid host name: local host is: (unknown); destination host is: "-":25000; java.net.UnknownHostException; For more details see:  http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over -:25000 after 19 failover attempts. Trying to failover after sleeping for 10276ms.
ERR>2021-06-11 09:38:49,560 WARN ha.BlackListingFailoverProxyProvider: All proxies are added to blacklist: [-:25000, -:25000] ,hence clearing blackListing
ERR>test: Invalid host name: local host is: (unknown); destination host is: "-":25000; java.net.UnknownHostException; For more details see:  http://wiki.apache.org/hadoop/UnknownHost
ERR>/*.sh: line 13: kill: (20551) - No such process

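At first glance the error output suggests three things:

1. hdfs-site.xml does not configure the corresponding namenode
2. The namenode's DNS name is unknown (it cannot be resolved)
3. The local host's IP has been put on a blacklist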

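II. Troubleshooting steps

1. Check whether /etc/hosts has the corresponding name entries (nothing wrong)
2. Check whether hdfs-site.xml configures the correct namenode (nothing wrong)
3. ping the IP of the primary NameNode (nothing wrong)
4. telnet to port 25000 on the primary NameNode's IP (nothing wrong)
5. ping/telnet the primary NameNode's domain name (the hostname that maps to that IP) -- failed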
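
For step 2, the namenode addresses a client will use can be listed straight from hdfs-site.xml. Below is a minimal Python sketch, not the commands actually run during this incident. It assumes the standard HA property names (dfs.nameservices, dfs.ha.namenodes.<nameservice>, dfs.namenode.rpc-address.<nameservice>.<id>). The nameservice hacluster, the IDs 608/609 and port 25000 come from the log above; the hostnames nn1.example.com and nn2.example.com and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def read_hadoop_conf(xml_text):
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def _split_csv(value):
    """Split a comma-separated config value, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def namenode_rpc_addresses(conf):
    """Map 'nameservice/namenode-id' to its RPC address (None if not configured)."""
    result = {}
    for ns in _split_csv(conf.get("dfs.nameservices", "")):
        for nn in _split_csv(conf.get(f"dfs.ha.namenodes.{ns}", "")):
            key = f"dfs.namenode.rpc-address.{ns}.{nn}"
            result[f"{ns}/{nn}"] = conf.get(key)
    return result


# Hypothetical sample; the hostnames are placeholders, not the real cluster's.
SAMPLE = """<configuration>
  <property><name>dfs.nameservices</name><value>hacluster</value></property>
  <property><name>dfs.ha.namenodes.hacluster</name><value>608,609</value></property>
  <property><name>dfs.namenode.rpc-address.hacluster.608</name><value>nn1.example.com:25000</value></property>
  <property><name>dfs.namenode.rpc-address.hacluster.609</name><value>nn2.example.com:25000</value></property>
</configuration>"""
```

Passing the contents of the real hdfs-site.xml to read_hadoop_conf and then to namenode_rpc_addresses gives one entry per namenode ID. An entry with the value None would point to a genuinely missing address, which was not the case here.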

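III. Solution

Pinging the primary node by IP worked but pinging it by domain name did not, which was odd... and only some of the hosts were affected.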
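
The "IP works, name fails" symptom can be checked in one go. This is a rough Python sketch, assuming a plain TCP connect is an acceptable stand-in for telnet (it does not send ICMP the way ping does). The function names and the diagnosis strings are my own, not part of any Hadoop tool.

```python
import socket


def resolve(host):
    """Return the sorted IP addresses host resolves to, or [] if resolution fails."""
    try:
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    except (socket.gaierror, UnicodeError):
        return []
    return sorted({info[4][0] for info in infos})


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def diagnose(name, ip, port):
    """Compare access by IP with resolution of the name, as in steps 3 to 5."""
    by_ip = can_connect(ip, port)
    addrs = resolve(name)
    if not by_ip:
        return "not reachable by IP: suspect network or service"
    if not addrs:
        return "reachable by IP but name does not resolve: suspect DNS"
    if ip not in addrs:
        return "name resolves to a different address: suspect stale DNS or hosts entry"
    return "name and IP both look fine"
```

Calling diagnose with the real NameNode hostname, its IP and port 25000 on an affected host should, if the situation matches this post, return the "suspect DNS" message, while an unaffected host returns the last one.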

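The cluster-configuration and blacklist messages are just normal side effects: a namenode that cannot be reached gets added to the blacklist, and the blacklist is cleared once the connection comes back, so those messages are misleading.
Fix: the root cause was a network upgrade cutover that affected some of the DNS servers.
So we disabled the affected DNS server addresses on those hosts, that is, the DNS server entries in /etc/resolv.conf.
After that, ping and telnet to the primary node's domain name both worked. Problem solved.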
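
Disabling a DNS server amounts to commenting out its nameserver line. Here is a small Python sketch of that edit, working on the file's text only. The addresses below are documentation placeholders (192.0.2.x), not the servers involved in this incident, and the function name is hypothetical.

```python
def disable_nameservers(resolv_text, bad_servers):
    """Comment out 'nameserver' lines whose address is listed in bad_servers."""
    bad = set(bad_servers)
    out = []
    for line in resolv_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver" and fields[1] in bad:
            # resolv.conf treats lines starting with '#' as comments
            out.append("# " + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Placeholder content; a real file would be read from /etc/resolv.conf.
EXAMPLE = "search example.com\nnameserver 192.0.2.10\nnameserver 192.0.2.20\n"
```

The function only returns the new text. Writing it back needs root, and it is worth keeping a backup copy of the file first, since some systems regenerate it automatically.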

Latest replies (2)
  • deTrident 2021-6-15
    Hadoop 2 or 3?
  • OP Seven_2017 2021-6-16
    deTrident posted on 2021-6-15 23:59
    Hadoop 2 or 3?

    Version 3.1.*