This section lists network-related issues that may be encountered with service discovery, and describes how to diagnose and resolve them.
A node requires that service discovery functions properly at least locally on the machine where the node is installed.
If another program has the service discovery UDP port opened exclusively, node installation will fail with a message similar to:
$ epadmin install node --nodename mynode.mycluster
[mynode.mycluster] Installing node
[mynode.mycluster] DEVELOPMENT executables
[mynode.mycluster] File shared memory
[mynode.mycluster] 4 concurrent allocation segments
[mynode.mycluster] Host name fig.local
[mynode.mycluster] Container tibco/sb
[mynode.mycluster] Starting container services
[mynode.mycluster] Loading node configuration
[mynode.mycluster] Auditing node security
Service discovery verification failed: Could not start the discovery service listener on port 54321, network error: SWSocket::initServer: Call to 'bind' failed: Address already in use [errno:98].
$
Resolution: either choose another port for service discovery using the --discoveryport option (see the section called “install command”), or find and terminate the program that is using the port.
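Before reinstalling, it can help to confirm whether the discovery port is actually taken. The following is a minimal sketch, not part of epadmin: the helper name `udp_port_in_use` is hypothetical, and it simply attempts the same kind of UDP `bind` that fails in the message above.

```python
import errno
import socket


def udp_port_in_use(port: int, host: str = "") -> bool:
    """Return True if binding a UDP socket to (host, port) fails with EADDRINUSE."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        probe.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # a different problem, for example EACCES on a privileged port
    finally:
        probe.close()
    return False


if __name__ == "__main__":
    port = 54321  # discovery port reported in the error message above
    state = "in use" if udp_port_in_use(port) else "free"
    print(f"UDP port {port} is {state}")
```

The probe only reports that the port is occupied. Identifying the program that owns it requires an operating system tool such as lsof or ss.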
If an invalid port number is specified for service discovery using the --discoveryport option (see the section called “install command”), node installation will fail with a message similar to:
$ epadmin install node --nodename mynode.mycluster --discoveryport 1
[mynode.mycluster] Installing node
[mynode.mycluster] DEVELOPMENT executables
[mynode.mycluster] File shared memory
[mynode.mycluster] 4 concurrent allocation segments
[mynode.mycluster] Host name fig.local
[mynode.mycluster] Container tibco/sb
[mynode.mycluster] Starting container services
[mynode.mycluster] Loading node configuration
[mynode.mycluster] Auditing node security
Service discovery verification failed: Could not start the discovery service listener on port 1, network error: SWSocket::initServer: Call to 'bind' failed: Permission denied [errno:13].
$
Resolution: choose another unused UDP port in the range of 1024 to 65535.
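The range check can be expressed as a small pre-flight test. This is an illustrative sketch only: the function name `discovery_port_problem` and the constants are hypothetical, and the statement about privileged ports reflects typical Unix behavior rather than anything epadmin documents.

```python
from typing import Optional

FIRST_UNPRIVILEGED_PORT = 1024  # lower bound recommended in the resolution above
MAX_PORT = 65535                # largest valid UDP port number


def discovery_port_problem(port: int) -> Optional[str]:
    """Return a description of the problem with `port`, or None if it is usable."""
    if port < 1 or port > MAX_PORT:
        return f"{port} is not a valid UDP port number (1 to {MAX_PORT})"
    if port < FIRST_UNPRIVILEGED_PORT:
        # On most Unix systems a regular user cannot bind these ports,
        # which surfaces as 'Permission denied' (errno 13).
        return f"port {port} is below {FIRST_UNPRIVILEGED_PORT} and is typically privileged"
    return None


if __name__ == "__main__":
    for candidate in (1, 54321):
        print(candidate, "->", discovery_port_problem(candidate) or "ok")
```

A port that passes this check can still be in use by another program, so the two failure cases are diagnosed separately.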
If the name of the node being installed (either the default name or the name specified using the --nodename option) is already in use by another node using the same service discovery port, node installation will fail with a message similar to:
$ epadmin install node --nodename mynode.mycluster
[mynode.mycluster] Installing node
install of node mynode.mycluster using discovery port 54321 failed: the service name is already in use by service address fig.local:35883
$
Resolution: either stop and remove the other node, or choose a different node name. See the section called “install command”.
Service discovery uses UDP broadcast packets for making discovery requests, and socket to socket UDP packets for responses. If these packets are being filtered or dropped by the operating system, or by routers in between the node and epadmin, discovery requests will not be seen by the discovery server running within the node.
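The request and response pattern can be illustrated with plain UDP sockets. This is a simplified sketch, not the actual discovery protocol: the `DISCOVER` payload, the function names, and the use of the loopback address instead of a real broadcast address are all assumptions made so the example runs on a single machine.

```python
import socket
import threading
from typing import Optional


def run_responder(sock: socket.socket, service: str) -> None:
    """Answer one request with a UDP reply sent directly to the requester's socket."""
    try:
        data, sender = sock.recvfrom(1024)
    except socket.timeout:
        return  # no request arrived, as when packets are filtered
    if data == b"DISCOVER":
        sock.sendto(service.encode(), sender)


def discover(target: str, port: int, timeout: float = 1.0) -> Optional[str]:
    """Send one request to target:port; return the reply, or None on timeout."""
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Needed before sending to a broadcast address; harmless for unicast.
    client.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    client.settimeout(timeout)
    try:
        client.sendto(b"DISCOVER", (target, port))
        reply, _ = client.recvfrom(1024)
        return reply.decode()
    except socket.timeout:
        return None  # comparable to "did not find any results"
    finally:
        client.close()


def demo() -> Optional[str]:
    """Run a responder and a client on the loopback interface."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.settimeout(2.0)
    server.bind(("127.0.0.1", 0))  # port chosen by the operating system
    port = server.getsockname()[1]
    worker = threading.Thread(
        target=run_responder, args=(server, "mynode.mycluster"), daemon=True
    )
    worker.start()
    try:
        return discover("127.0.0.1", port)
    finally:
        worker.join(timeout=3)
        server.close()


if __name__ == "__main__":
    print(demo())
```

If either the request or the reply is dropped along the way, `discover` returns `None` after the timeout, which is the situation the following failure message describes.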
When UDP packets are being filtered or dropped on the machine where the node is installed, epadmin install node will fail with a message similar to:
$ epadmin install node --nodename mynode.mycluster
[mynode.mycluster] Installing node
[mynode.mycluster] DEVELOPMENT executables
[mynode.mycluster] File shared memory
[mynode.mycluster] 4 concurrent allocation segments
[mynode.mycluster] Host name fig.local
[mynode.mycluster] Container tibco/sb
[mynode.mycluster] Starting container services
[mynode.mycluster] Loading node configuration
[mynode.mycluster] Auditing node security
Service discovery verification failed: Service discovery did not find any results
$
Resolution: ensure that UDP packets are not being filtered on the port being used by the discovery service.
During node installation, local service discovery verification is performed. A verification failure causes the installation to fail (see the section called “Node installation failures”).
Service discovery verification may also be done as a stand-alone epadmin command, with or without any nodes installed, using the epadmin verify services command. See the section called “verify command”.
By default, the epadmin verify services command runs both a discovery server and a discovery client locally. The client makes a request, and verifies that it receives the expected response.
$ epadmin verify services
Service discovery is functioning properly locally.
$
The verification server may be run independently of the client. Run the server in one terminal:
$ epadmin verify services --mode server
Service discovery server started. Interrupt to exit.
Note: The server doesn't return until interrupted.
In another terminal run a verification client:
$ epadmin verify services --mode client
Service discovery is functioning properly locally.
$
Note: The verification client may be run multiple times using the same verification server.
The verification server and client may also be run on different machines. Start the server on one machine:
$ hostname
fig.local
$ epadmin verify services --mode server
Service discovery server started. Interrupt to exit.
Run the client on another machine:
$ hostname
fuyu.local
$ epadmin verify services --mode client
Service discovery is functioning properly locally.
$
The --discoveryport, --discoveryhosts, and --discoverytimeout global options are honored by the epadmin verify services command. See the section called “Global parameters”.
The --debug global option also affects the verify services command and is shown in the next section.
The --debug global option enables the output of debug tracing for the service discovery verification server and client.
Start the verification server with the --debug global option:
$ epadmin --debug verify services --mode server
2018-05-23 15:02:19.159215|DSV|INFO |5214|discovery.cpp(288)|SWDiscovery::Discovery for service x.y.zz.y, type test-type, address test-address, started on port 54321
Service discovery server started. Interrupt to exit.
The trace indicates that the server has successfully started listening on the default discovery port, and contains the x.y.zz.y service.
Run the verification client with the --debug global option:
$ epadmin --debug verify services --mode client
2018-05-23 15:07:41.095600|DSV|DEBUG|6225|client.cpp(351)|Client sending: PDU:A5:2:DiscoverServicesRequest:3:10.240.6.255/6225/3/0:x.y.zz.y:test-type:: on 10.240.6.255:54321
2018-05-23 15:07:41.095656|DSV|DEBUG|6225|client.cpp(351)|Client sending: PDU:A5:2:DiscoverServicesRequest:3:255.255.255.255/6225/4/0:x.y.zz.y:test-type:: on 255.255.255.255:54321
2018-05-23 15:07:41.095818|DSV|DEBUG|6225|client.cpp(674)|Client getResults matched response: PDU:A5:2:DiscoverServicesResponse:4:10.240.6.255/6225/3/0:x.y.zz.y:test-type:test-address: from 10.240.6.56:46180
Service discovery is functioning properly locally.
The trace shows the client sending two service discovery broadcast requests, looking for the service name (x.y.zz.y) and the service type (test-type). The first request is sent on the broadcast address for the current host name (in this case 10.240.6.255), and a second request goes out on the limited broadcast address (255.255.255.255).
2018-05-23 15:07:41.095600|DSV|DEBUG|6225|client.cpp(351)|Client sending: PDU:A5:2:DiscoverServicesRequest:3:10.240.6.255/6225/3/0:x.y.zz.y:test-type:: on 10.240.6.255:54321
2018-05-23 15:07:41.095656|DSV|DEBUG|6225|client.cpp(351)|Client sending: PDU:A5:2:DiscoverServicesRequest:3:255.255.255.255/6225/4/0:x.y.zz.y:test-type:: on 255.255.255.255:54321
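The relationship between a host address and its subnet broadcast address can be checked with Python's standard `ipaddress` module. This sketch assumes a /24 network prefix, which is consistent with the addresses in the trace (host 10.240.6.56, broadcast 10.240.6.255) but is not stated by the documentation. The function name `subnet_broadcast` is hypothetical.

```python
import ipaddress


def subnet_broadcast(address: str, prefix_len: int) -> str:
    """Return the broadcast address of the subnet that contains `address`."""
    interface = ipaddress.ip_interface(f"{address}/{prefix_len}")
    return str(interface.network.broadcast_address)


if __name__ == "__main__":
    # Host address taken from the trace; the /24 prefix is an assumption.
    print(subnet_broadcast("10.240.6.56", 24))
```

If the host's real prefix length differs, the computed broadcast address differs as well, so a mismatch between the computed value and the address in the trace can point to an unexpected interface or host name resolution.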
The client trace then shows it having received a response from the verification server:
2018-05-23 15:07:41.095818|DSV|DEBUG|6225|client.cpp(674)|Client getResults matched response: PDU:A5:2:DiscoverServicesResponse:4:10.240.6.255/6225/3/0:x.y.zz.y:test-type:test-address: from 10.240.6.56:46180
The verification server terminal shows the server receiving the two requests and sending responses to each of them:
2018-05-23 15:07:41.095708|DSV|DEBUG|6186|discovery.cpp(412)|Discovery test-address received: PDU:A5:2:DiscoverServicesRequest:3:10.240.6.255/6225/3/0:x.y.zz.y:test-type:: from 10.240.6.56:38912
2018-05-23 15:07:41.095730|DSV|DEBUG|6186|util.cpp(365)|Discovery sending response to 10.240.6.56:38912 : PDU:A5:2:DiscoverServicesResponse:4:10.240.6.255/6225/3/0:x.y.zz.y:test-type:test-address:
2018-05-23 15:07:41.095793|DSV|DEBUG|6186|discovery.cpp(412)|Discovery test-address received: PDU:A5:2:DiscoverServicesRequest:3:255.255.255.255/6225/4/0:x.y.zz.y:test-type:: from 10.240.6.56:47951
2018-05-23 15:07:41.095805|DSV|DEBUG|6186|util.cpp(365)|Discovery sending response to 10.240.6.56:47951 : PDU:A5:2:DiscoverServicesResponse:4:255.255.255.255/6225/4/0:x.y.zz.y:test-type:test-address:
In the example above, the client returned successfully after receiving the first matching response, because its request contained a fully qualified node name (see the Service Names section of the TIBCO StreamBase® Runtime Architects Guide).
When the service discovery request does not specify a fully qualified node name, the client waits for the full discovery timeout period (default: 1 second, see --discoverytimeout in the section called “Global parameters”). The client discards duplicate responses. This is shown below by running a standard epadmin display services command against the verification server that is still running from the example above.
$ epadmin --debug display services --servicetype test-type
2018-05-23 18:30:09.331217|DSV|DEBUG|13417|client.cpp(351)|Client sending: PDU:A5:2:DiscoverServicesRequest:2:10.240.6.255/13417/3/0::test-type:: on 10.240.6.255:54321
2018-05-23 18:30:09.331250|DSV|DEBUG|13417|client.cpp(351)|Client sending: PDU:A5:2:DiscoverServicesRequest:2:255.255.255.255/13417/4/0::test-type:: on 255.255.255.255:54321
2018-05-23 18:30:09.331314|DSV|DEBUG|13417|client.cpp(674)|Client getResults matched response: PDU:A5:2:DiscoverServicesResponse:4:10.240.6.255/13417/3/0:x.y.zz.y:test-type:test-address: from 10.240.6.56:37287
2018-05-23 18:30:09.331375|DSV|DEBUG|13417|client.cpp(674)|Client getResults matched response: PDU:A5:2:DiscoverServicesResponse:4:255.255.255.255/13417/4/0:x.y.zz.y:test-type:test-address: from 10.240.6.56:46595
2018-05-23 18:30:09.331382|DSV|DEBUG|13417|results.cpp(300)|Discarding duplicate response from x.y.zz.y:test-type:test-address
Service Name = x.y.zz.y
Service Type = test-type
Network Address = dtm://test-address
$
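The effect of discarding duplicates can be sketched as follows. This is an illustration of the idea only, not the client's actual implementation: it assumes that two responses count as duplicates when their service name, service type, and network address all match, which is what the "Discarding duplicate response" trace line suggests. The names `Service` and `unique_services` are hypothetical.

```python
from typing import Iterable, List, Tuple

# (service name, service type, network address): assumed duplicate key
Service = Tuple[str, str, str]


def unique_services(responses: Iterable[Service]) -> List[Service]:
    """Keep the first occurrence of each service, preserving arrival order."""
    seen = set()
    unique: List[Service] = []
    for response in responses:
        if response in seen:
            continue  # same service answered another broadcast request
        seen.add(response)
        unique.append(response)
    return unique


if __name__ == "__main__":
    # One server answering both broadcast requests, as in the trace above.
    answers = [
        ("x.y.zz.y", "test-type", "test-address"),
        ("x.y.zz.y", "test-type", "test-address"),
    ]
    for name, service_type, address in unique_services(answers):
        print(f"Service Name = {name}, Service Type = {service_type}, Address = {address}")
```

Because the same server answers each broadcast request it receives, a client that sends on two broadcast addresses should expect up to two copies of every service.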
In cases where UDP packet filtering is suspected, debug tracing can be used to determine if the discovery server is receiving the requests and responding, and if the client is receiving the responses.
Start a verification discovery server, with debug tracing enabled:
$ epadmin --debug verify services --mode server
2018-05-24 09:45:52.884984|DSV|INFO |8504|discovery.cpp(288)|SWDiscovery::Discovery for service x.y.zz.y, type test-type, address test-address, started on port 54321
Service discovery server started. Interrupt to exit.
In another terminal, on the same machine, or another machine if debugging cross-machine service discovery, run a verification discovery client, with debug tracing enabled. These traces show the client successfully sending two discovery requests but not receiving any responses:
$ epadmin --debug verify services --mode client
2018-05-24 09:55:01.533096|DSV|DEBUG|8521|client.cpp(351)|Client sending: PDU:A5:2:DiscoverServicesRequest:3:10.240.6.255/8521/3/0:x.y.zz.y:test-type:: on 10.240.6.255:54321
2018-05-24 09:55:01.533141|DSV|DEBUG|8521|client.cpp(351)|Client sending: PDU:A5:2:DiscoverServicesRequest:3:255.255.255.255/8521/4/0:x.y.zz.y:test-type:: on 255.255.255.255:54321
Service discovery verification failed: Service discovery did not find any results
Nothing was output in the discovery server terminal, showing that it didn't receive either of the requests.
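When traces are long, the comparison between the client and server terminals can be automated. The sketch below is a hypothetical helper, not a TIBCO tool. It assumes that the field following the PDU type and number (for example `10.240.6.255/8521/3/0`) identifies a request and is echoed in the matching response, which is how the traces in this section appear to behave. Each trace line must be on its own line.

```python
import re
from typing import Dict, Set

# Captures the PDU kind and the field that appears to identify the request.
PDU = re.compile(r"DiscoverServices(Request|Response):\d+:([^:\s]+):")


def request_ids(trace: str, marker: str, kind: str) -> Set[str]:
    """Collect request identifiers from trace lines that contain `marker`."""
    ids = set()
    for line in trace.splitlines():
        if marker not in line:
            continue
        match = PDU.search(line)
        if match and match.group(1) == kind:
            ids.add(match.group(2))
    return ids


def diagnose(client_trace: str, server_trace: str) -> Dict[str, str]:
    """Report, for each request the client sent, how far it got."""
    sent = request_ids(client_trace, "Client sending:", "Request")
    received = request_ids(server_trace, "received:", "Request")
    answered = request_ids(server_trace, "sending response", "Response")
    matched = request_ids(client_trace, "matched response:", "Response")
    report = {}
    for rid in sorted(sent):
        if rid not in received:
            report[rid] = "request did not reach the server"
        elif rid not in answered:
            report[rid] = "server received the request but sent no response"
        elif rid not in matched:
            report[rid] = "response sent but not matched by the client"
        else:
            report[rid] = "ok"
    return report


if __name__ == "__main__":
    client = (
        "...|Client sending: PDU:A5:2:DiscoverServicesRequest:3:"
        "10.240.6.255/8521/3/0:x.y.zz.y:test-type:: on 10.240.6.255:54321\n"
    )
    print(diagnose(client, server_trace=""))
```

A status of "response sent but not matched by the client" is not always a failure: a client that asks for a fully qualified node name returns after the first matching response, so later responses are never logged as matched.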