SSH tests do not work on Ubuntu 16.04

All questions related to installations, configurations and maintenance of Advanced Host Monitor (including additional tools such as RMA for Windows, RMA Manager, Web Servie, RCC).
KS-Soft
Posts: 12869
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA
Contact:

Post by KS-Soft »

try without ampersand, should work

Code: Select all

nohup /path/to/binary -parameters

Regards
Alex
v.yesakov
Posts: 13
Joined: Wed Jun 29, 2016 12:13 pm

Post by v.yesakov »

KS-Soft wrote:try without ampersand, should work

Code: Select all

nohup /path/to/binary -parameters

Regards
Alex
This will not work. Removing ampersand would keep process running in foreground, this is not the very last command in our script, so this would effectively prevent script to complete the rest of checks.

V.
KS-Soft
Posts: 12869
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA
Contact:

Post by KS-Soft »

What script?
If HostMonitor starts "nohup /path/to/binary -parameters" command then I do not understand where is the script.

Regards
Alex
v.yesakov
Posts: 13
Joined: Wed Jun 29, 2016 12:13 pm

Post by v.yesakov »

KS-Soft wrote:What script?
If HostMonitor starts "nohup /path/to/binary -parameters" command then I do not understand where is the script.

Regards
Alex
What's not working:
Background processes do not survive. Basically HM logs into system, checks if certain process is running and if not starts new instance using
The actual command is not just as simple as "nohup program &".

It's shell script designed with many pre- checks and post- checks for the "program". It checks if "program" is already running and within specified parameters and then makes a decision to start or not to start another instance of "program". If another instance needs to be started script checks if there was error starting "program", records various information like PID of this new instance and sends some other data to main database. After all is done this script returns status code to HM so operator can see if there was a problem or not.

V.
KS-Soft
Posts: 12869
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA
Contact:

Post by KS-Soft »

May be you can split it into 2 scripts?
Make 2 tests, one will start "checking" script and wait for result, another dependent test will start 2nd script (start process).

Regards
Alex
v.yesakov
Posts: 13
Joined: Wed Jun 29, 2016 12:13 pm

Post by v.yesakov »

KS-Soft wrote:May be you can split it into 2 scripts?
Make 2 tests, one will start "checking" script and wait for result, another dependent test will start 2nd script (start process).

Regards
Alex
Assuming I split this single test into two, we have approximately 350 servers, each with 3 to 8 instances of this test managing different processes. It will take enormous amount of time to change them all.

V.
KS-Soft
Posts: 12869
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA
Contact:

Post by KS-Soft »

So you have around 2000 tests related to this script (single script on single server)?
Then we have to find out what is wrong with this command
nohup /path/to/binary -parameters &

may be this version of "nohup" does not redirect output?
Then may be process "binary" terminates by SIGPIPE (when writing to a terminal which does not exist anymore because script terminated and session closed)?
Have you tried command like
nohup /path/to/binary -parameters >>logout.txt 2>>logerror.txt &

or may be better
setsid /path/to/binary -parameters >>logout.txt 2>>logerror.txt &

Regards
Alex
v.yesakov
Posts: 13
Joined: Wed Jun 29, 2016 12:13 pm

Post by v.yesakov »

KS-Soft wrote:So you have around 2000 tests related to this script (single script on single server)?
It's same script on multiple servers. around 1500 tests in total.
KS-Soft wrote:Then we have to find out what is wrong with this command
nohup /path/to/binary -parameters &

may be this version of "nohup" does not redirect output?
Then may be process "binary" terminates by SIGPIPE (when writing to a terminal which does not exist anymore because script terminated and session closed)?
Have you tried command like
nohup /path/to/binary -parameters >>logout.txt 2>>logerror.txt &

or may be better
setsid /path/to/binary -parameters >>logout.txt 2>>logerror.txt &

Regards
Alex
It's already redirected, same way you suggest. Including not just STDOUT and STDERR, but also STDIN:

/usr/bin/nohup /usr/local/bin/binary -parameters >> $HOME/logs/${instancel}.stdout 2>> $HOME/logs/${instancel}.stderr < /dev/null &

And remind you: it's working on Ubuntu 12.04 without any issues (without "SSHMode=1").

Maybe I should try SSHMode=1 on old servers to, to eliminate possible difference with "nohup" . I'll post results later today.

I'll try with "setsid" too.

V.
v.yesakov
Posts: 13
Joined: Wed Jun 29, 2016 12:13 pm

Post by v.yesakov »

v.yesakov wrote:Maybe I should try SSHMode=1 on old servers to, to eliminate possible difference with "nohup" . I'll post results later today.
SSHMode=1 does not work on "old" server. Process does not survive.
v.yesakov wrote:I'll try with "setsid" too.
"setsid" method works without "SSHMode=1", but does not work "new" way, same as "nohup"

I do not know how to attach files here, so I will post two SSH logs. Maybe it helps...

Both logs taken from Ubuntu 12.04 with the only difference in SSHMODE.

LOG1, "old" style, worked fine:

Code: Select all

debug1: Forked child 9889.
Set /proc/self/oom_score_adj to 0
debug1: rexec start in 5 out 5 newsock 5 pipe 7 sock 8
debug1: inetd sockets after dupping: 3, 3
Connection from IP.AD.RE.SS port 3042
debug1: Client protocol version 2.0; client software version KS_SSH
debug1: no match: KS_SSH
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_5.9p1 Debian-5ubuntu1.10
debug1: permanently_set_uid: 105/65534 [preauth]
debug1: list_hostkey_types: ssh-rsa,ssh-dss,ecdsa-sha2-nistp256 [preauth]
debug1: SSH2_MSG_KEXINIT sent [preauth]
debug1: SSH2_MSG_KEXINIT received [preauth]
debug1: kex: client->server aes256-cbc hmac-sha1 none [preauth]
debug1: kex: server->client aes256-cbc hmac-sha1 none [preauth]
debug1: expecting SSH2_MSG_KEXDH_INIT [preauth]
debug1: SSH2_MSG_NEWKEYS sent [preauth]
debug1: expecting SSH2_MSG_NEWKEYS [preauth]
debug1: SSH2_MSG_NEWKEYS received [preauth]
debug1: KEX done [preauth]
debug1: userauth-request for user alliance service ssh-connection method none [preauth]
debug1: attempt 0 failures 0 [preauth]
reverse mapping checking getaddrinfo for IP.AD.RE.SS.domain.tld [IP.AD.RE.SS] failed - POSSIBLE BREAK-IN ATTEMPT!
debug1: PAM: initializing for "alliance"
debug1: PAM: setting PAM_RHOST to "IP.AD.RE.SS"
debug1: PAM: setting PAM_TTY to "ssh"
debug1: userauth-request for user alliance service ssh-connection method password [preauth]
debug1: attempt 1 failures 0 [preauth]
debug1: PAM: password authentication accepted for alliance
debug1: do_pam_account: called
Accepted password for alliance from IP.AD.RE.SS port 3042 ssh2
debug1: monitor_read_log: child log fd closed
debug1: monitor_child_preauth: alliance has been authenticated by privileged process
debug1: PAM: establishing credentials
pam_unix(sshd:session): session opened for user alliance by (uid=0)
User child is on pid 10037
debug1: SELinux support disabled
debug1: PAM: establishing credentials
debug1: permanently_set_uid: 1000/1000
debug1: Entering interactive session for SSH2.
debug1: server_init_dispatch_20
debug1: server_input_channel_open: ctype session rchan 0 win 16384 max 16384
debug1: input_session_request
debug1: channel 0: new [server-session]
debug1: session_new: session 0
debug1: session_open: channel 0
debug1: session_open: session 0: link with channel 0
debug1: server_input_channel_open: confirm session
debug1: server_input_channel_req: channel 0 request exec reply 1
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 req exec
debug1: Received SIGCHLD.
debug1: session_by_pid: pid 10038
debug1: session_exit_message: session 0 channel 0 pid 10038
debug1: session_exit_message: release channel 0
Received disconnect from IP.AD.RE.SS: 11: session close
debug1: do_cleanup
debug1: do_cleanup
debug1: PAM: cleanup
debug1: PAM: closing session
pam_unix(sshd:session): session closed for user alliance
debug1: PAM: deleting credentials
LOG2, SSHMode=1, did not work:

Code: Select all

debug1: Forked child 9735.
Set /proc/self/oom_score_adj to 0
debug1: rexec start in 5 out 5 newsock 5 pipe 7 sock 8
debug1: inetd sockets after dupping: 3, 3
Connection from IP.AD.RE.SS port 4150
debug1: Client protocol version 2.0; client software version KS_SSH1030
debug1: no match: KS_SSH1030
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_5.9p1 Debian-5ubuntu1.10
debug1: permanently_set_uid: 105/65534 [preauth]
debug1: list_hostkey_types: ssh-rsa,ssh-dss,ecdsa-sha2-nistp256 [preauth]
debug1: SSH2_MSG_KEXINIT sent [preauth]
debug1: SSH2_MSG_KEXINIT received [preauth]
debug1: kex: client->server blowfish-cbc hmac-sha1 none [preauth]
debug1: kex: server->client blowfish-cbc hmac-sha1 none [preauth]
debug1: expecting SSH2_MSG_KEXDH_INIT [preauth]
debug1: SSH2_MSG_NEWKEYS sent [preauth]
debug1: expecting SSH2_MSG_NEWKEYS [preauth]
debug1: SSH2_MSG_NEWKEYS received [preauth]
debug1: KEX done [preauth]
debug1: userauth-request for user alliance service ssh-connection method none [preauth]
debug1: attempt 0 failures 0 [preauth]
reverse mapping checking getaddrinfo for IP.AD.RE.SS.domain.tld [IP.AD.RE.SS] failed - POSSIBLE BREAK-IN ATTEMPT!
debug1: PAM: initializing for "alliance"
debug1: PAM: setting PAM_RHOST to "IP.AD.RE.SS"
debug1: PAM: setting PAM_TTY to "ssh"
debug1: userauth-request for user alliance service ssh-connection method password [preauth]
debug1: attempt 1 failures 0 [preauth]
debug1: PAM: password authentication accepted for alliance
debug1: do_pam_account: called
Accepted password for alliance from IP.AD.RE.SS port 4150 ssh2
debug1: monitor_read_log: child log fd closed
debug1: monitor_child_preauth: alliance has been authenticated by privileged process
debug1: PAM: establishing credentials
pam_unix(sshd:session): session opened for user alliance by (uid=0)
User child is on pid 9882
debug1: SELinux support disabled
debug1: PAM: establishing credentials
debug1: permanently_set_uid: 1000/1000
debug1: Entering interactive session for SSH2.
debug1: server_init_dispatch_20
debug1: server_input_channel_open: ctype session rchan 1 win 131072 max 32768
debug1: input_session_request
debug1: channel 0: new [server-session]
debug1: session_new: session 0
debug1: session_open: channel 0
debug1: session_open: session 0: link with channel 0
debug1: server_input_channel_open: confirm session
debug1: server_input_channel_req: channel 0 request pty-req reply 0
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 req pty-req
debug1: Allocating pty.
debug1: session_new: session 0
debug1: SELinux support disabled
debug1: session_pty_req: session 0 alloc /dev/pts/1
debug1: server_input_channel_req: channel 0 request exec reply 0
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 req exec
debug1: Setting controlling tty using TIOCSCTTY.
debug1: Received SIGCHLD.
debug1: session_by_pid: pid 9883
debug1: session_exit_message: session 0 channel 0 pid 9883
debug1: session_exit_message: release channel 0
debug1: session_by_tty: session 0 tty /dev/pts/1
debug1: session_pty_cleanup: session 0 release /dev/pts/1
debug1: server_input_channel_req: channel 0 request exit-status reply 0
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 req exit-status
debug1: session_by_channel: session 0 channel 0
debug1: session_close_by_channel: channel 0 child 0
debug1: session_close: session 0 pid 0
debug1: channel 0: free: server-session, nchannels 1
Connection closed by IP.AD.RE.SS
debug1: do_cleanup
Transferred: sent 2112, received 1104 bytes
Closing connection to IP.AD.RE.SS port 4150
debug1: PAM: cleanup
debug1: PAM: closing session
pam_unix(sshd:session): session closed for user alliance
debug1: PAM: deleting credentials
KS-Soft
Posts: 12869
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA
Contact:

Post by KS-Soft »

And remind you: it's working on Ubuntu 12.04 without any issues (without "SSHMode=1").
I assume there are a lot of changes between Ubuntu 12.04 and 16.04 but this question should be addressed to Ubuntu developers.
Also I think you have different versions of ssh servers on these systems.

We made many changes in HostMonitor code as well, but we do not see how these changes may effect nohup program.
HostMonitor establishes SSH session - this part works fine.
Then it sends 'exec' command with parameter you have specified for the test - this part works fine as well, right? SSH server starts your script.

Then your script does not work as you expect. Why? We have no idea.
HostMonitor does not execute your script, Ubuntu does.

However we were able to reproduce the problem and I think we found solution - add sleep command
e.g. the following script does not work correctly (when ssh session closes, ping process terminates)

Code: Select all

nohup ping -c 50 8.8.8.8 >>/home/test/log/pinglog.txt &
while next script works fine
(sleep for 2 sec allows ping to work in backround for 50 sec, process works fine when ssh session closed)

Code: Select all

nohup ping -c 50 8.8.8.8 >>/home/test/log/pinglog.txt &
sleep 2s
Regards
Alex
v.yesakov
Posts: 13
Joined: Wed Jun 29, 2016 12:13 pm

Post by v.yesakov »

KS-Soft wrote:
And remind you: it's working on Ubuntu 12.04 without any issues (without "SSHMode=1").
I assume there are a lot of changes between Ubuntu 12.04 and 16.04
Also I think you have different versions of ssh servers on these systems.
Two debug logs from SSHD in my post above were taken from the very same server, running Ubuntu 12.04.5 x64. Not two different machines. The only difference is "SSHMode=1" added to RMA config in between.
KS-Soft wrote:We made many changes in HostMonitor code as well, but we do not see how these changes may effect nohup program.
This is the puzzle we have to solve together somehow, because it affects old servers as well as new ones.
KS-Soft wrote:However we were able to reproduce the problem and I think we found solution - add sleep
It will do as temporary workaround, but it's definitely not a solution.
KS-Soft wrote:while next script works fine (sleep for 2 sec allows ping to work in backround for 50 sec, process works fine when ssh session closed)

Code: Select all

nohup ping -c 50 8.8.8.8 >>/home/test/log/pinglog.txt &
sleep 2s
Yes, I confirm: adding "sleep" at the end of script helps. Process survives after HM closes SSH session.

V.
v.yesakov
Posts: 13
Joined: Wed Jun 29, 2016 12:13 pm

Post by v.yesakov »

v.yesakov wrote:Yes, I confirm: adding "sleep" at the end of script helps. Process survives after HM closes SSH session.
Anyway I think SSH must be fixed on HostMonitor site, because new SSH mode breaks backwards compatibility.

V.
KS-Soft
Posts: 12869
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA
Contact:

Post by KS-Soft »

We have this task in the list but we cannot consider it as high priority task.
We spent a week and we still do not see what exactly should be changed, will return to this task later...

Regards
Alex
Post Reply