KS-Soft. Network Management Solutions
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister    ProfileProfile    Log inLog in 

SSH tests do not work on Ubuntu 16.04
Goto page Previous  1, 2
 
Post new topic   Reply to topic    KS-Soft Forum Index -> Configuration, Maintenance, Troubleshooting
View previous topic :: View next topic  
Author Message
KS-Soft



Joined: 03 Apr 2002
Posts: 12792
Location: USA

PostPosted: Fri May 05, 2017 12:28 pm    Post subject: Reply with quote

try without ampersand, should work
Code:
nohup /path/to/binary -parameters

Regards
Alex
Back to top
View user's profile Send private message Visit poster's website
v.yesakov



Joined: 29 Jun 2016
Posts: 13

PostPosted: Fri May 05, 2017 2:11 pm    Post subject: Reply with quote

KS-Soft wrote:
try without ampersand, should work
Code:
nohup /path/to/binary -parameters

Regards
Alex


This will not work. Removing ampersand would keep process running in foreground, this is not the very last command in our script, so this would effectively prevent script to complete the rest of checks.

V.
Back to top
View user's profile Send private message
KS-Soft



Joined: 03 Apr 2002
Posts: 12792
Location: USA

PostPosted: Fri May 05, 2017 2:37 pm    Post subject: Reply with quote

What script?
If HostMonitor starts "nohup /path/to/binary -parameters" command then I do not understand where is the script.

Regards
Alex
Back to top
View user's profile Send private message Visit poster's website
v.yesakov



Joined: 29 Jun 2016
Posts: 13

PostPosted: Fri May 05, 2017 3:25 pm    Post subject: Reply with quote

KS-Soft wrote:
What script?
If HostMonitor starts "nohup /path/to/binary -parameters" command then I do not understand where is the script.

Regards
Alex


Quote:
What's not working:
Background processes do not survive. Basically HM logs into system, checks if certain process is running and if not starts new instance using


The actual command is not just as simple as "nohup program &".

It's shell script designed with many pre- checks and post- checks for the "program". It checks if "program" is already running and within specified parameters and then makes a decision to start or not to start another instance of "program". If another instance needs to be started script checks if there was error starting "program", records various information like PID of this new instance and sends some other data to main database. After all is done this script returns status code to HM so operator can see if there was a problem or not.

V.
Back to top
View user's profile Send private message
KS-Soft



Joined: 03 Apr 2002
Posts: 12792
Location: USA

PostPosted: Fri May 05, 2017 4:17 pm    Post subject: Reply with quote

May be you can split it into 2 scripts?
Make 2 tests, one will start "checking" script and wait for result, another dependent test will start 2nd script (start process).

Regards
Alex
Back to top
View user's profile Send private message Visit poster's website
v.yesakov



Joined: 29 Jun 2016
Posts: 13

PostPosted: Mon May 08, 2017 9:03 am    Post subject: Reply with quote

KS-Soft wrote:
May be you can split it into 2 scripts?
Make 2 tests, one will start "checking" script and wait for result, another dependent test will start 2nd script (start process).

Regards
Alex


Assuming I split this single test into two, we have approximately 350 servers, each with 3 to 8 instances of this test managing different processes. It will take enormous amount of time to change them all.

V.
Back to top
View user's profile Send private message
KS-Soft



Joined: 03 Apr 2002
Posts: 12792
Location: USA

PostPosted: Mon May 08, 2017 4:38 pm    Post subject: Reply with quote

So you have around 2000 tests related to this script (single script on single server)?
Then we have to find out what is wrong with this command
nohup /path/to/binary -parameters &

may be this version of "nohup" does not redirect output?
Then may be process "binary" terminates by SIGPIPE (when writing to a terminal which does not exist anymore because script terminated and session closed)?
Have you tried command like
nohup /path/to/binary -parameters >>logout.txt 2>>logerror.txt &

or may be better
setsid /path/to/binary -parameters >>logout.txt 2>>logerror.txt &

Regards
Alex
Back to top
View user's profile Send private message Visit poster's website
v.yesakov



Joined: 29 Jun 2016
Posts: 13

PostPosted: Tue May 09, 2017 7:48 am    Post subject: Reply with quote

KS-Soft wrote:
So you have around 2000 tests related to this script (single script on single server)?


It's same script on multiple servers. around 1500 tests in total.

KS-Soft wrote:
Then we have to find out what is wrong with this command
nohup /path/to/binary -parameters &

may be this version of "nohup" does not redirect output?
Then may be process "binary" terminates by SIGPIPE (when writing to a terminal which does not exist anymore because script terminated and session closed)?
Have you tried command like
nohup /path/to/binary -parameters >>logout.txt 2>>logerror.txt &

or may be better
setsid /path/to/binary -parameters >>logout.txt 2>>logerror.txt &

Regards
Alex


It's already redirected, same way you suggest. Including not just STDOUT and STDERR, but also STDIN:

/usr/bin/nohup /usr/local/bin/binary -parameters >> $HOME/logs/${instancel}.stdout 2>> $HOME/logs/${instancel}.stderr < /dev/null &

And remind you: it's working on Ubuntu 12.04 without any issues (without "SSHMode=1").

Maybe I should try SSHMode=1 on old servers to, to eliminate possible difference with "nohup" . I'll post results later today.

I'll try with "setsid" too.

V.
Back to top
View user's profile Send private message
v.yesakov



Joined: 29 Jun 2016
Posts: 13

PostPosted: Tue May 09, 2017 8:15 am    Post subject: Reply with quote

v.yesakov wrote:
Maybe I should try SSHMode=1 on old servers to, to eliminate possible difference with "nohup" . I'll post results later today.


SSHMode=1 does not work on "old" server. Process does not survive.

v.yesakov wrote:
I'll try with "setsid" too.


"setsid" method works without "SSHMode=1", but does not work "new" way, same as "nohup"

I do not know how to attach files here, so I will post two SSH logs. Maybe it helps...

Both logs taken from Ubuntu 12.04 with the only difference in SSHMODE.

LOG1, "old" style, worked fine:
Code:

debug1: Forked child 9889.
Set /proc/self/oom_score_adj to 0
debug1: rexec start in 5 out 5 newsock 5 pipe 7 sock 8
debug1: inetd sockets after dupping: 3, 3
Connection from IP.AD.RE.SS port 3042
debug1: Client protocol version 2.0; client software version KS_SSH
debug1: no match: KS_SSH
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_5.9p1 Debian-5ubuntu1.10
debug1: permanently_set_uid: 105/65534 [preauth]
debug1: list_hostkey_types: ssh-rsa,ssh-dss,ecdsa-sha2-nistp256 [preauth]
debug1: SSH2_MSG_KEXINIT sent [preauth]
debug1: SSH2_MSG_KEXINIT received [preauth]
debug1: kex: client->server aes256-cbc hmac-sha1 none [preauth]
debug1: kex: server->client aes256-cbc hmac-sha1 none [preauth]
debug1: expecting SSH2_MSG_KEXDH_INIT [preauth]
debug1: SSH2_MSG_NEWKEYS sent [preauth]
debug1: expecting SSH2_MSG_NEWKEYS [preauth]
debug1: SSH2_MSG_NEWKEYS received [preauth]
debug1: KEX done [preauth]
debug1: userauth-request for user alliance service ssh-connection method none [preauth]
debug1: attempt 0 failures 0 [preauth]
reverse mapping checking getaddrinfo for IP.AD.RE.SS.domain.tld [IP.AD.RE.SS] failed - POSSIBLE BREAK-IN ATTEMPT!
debug1: PAM: initializing for "alliance"
debug1: PAM: setting PAM_RHOST to "IP.AD.RE.SS"
debug1: PAM: setting PAM_TTY to "ssh"
debug1: userauth-request for user alliance service ssh-connection method password [preauth]
debug1: attempt 1 failures 0 [preauth]
debug1: PAM: password authentication accepted for alliance
debug1: do_pam_account: called
Accepted password for alliance from IP.AD.RE.SS port 3042 ssh2
debug1: monitor_read_log: child log fd closed
debug1: monitor_child_preauth: alliance has been authenticated by privileged process
debug1: PAM: establishing credentials
pam_unix(sshd:session): session opened for user alliance by (uid=0)
User child is on pid 10037
debug1: SELinux support disabled
debug1: PAM: establishing credentials
debug1: permanently_set_uid: 1000/1000
debug1: Entering interactive session for SSH2.
debug1: server_init_dispatch_20
debug1: server_input_channel_open: ctype session rchan 0 win 16384 max 16384
debug1: input_session_request
debug1: channel 0: new [server-session]
debug1: session_new: session 0
debug1: session_open: channel 0
debug1: session_open: session 0: link with channel 0
debug1: server_input_channel_open: confirm session
debug1: server_input_channel_req: channel 0 request exec reply 1
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 req exec
debug1: Received SIGCHLD.
debug1: session_by_pid: pid 10038
debug1: session_exit_message: session 0 channel 0 pid 10038
debug1: session_exit_message: release channel 0
Received disconnect from IP.AD.RE.SS: 11: session close
debug1: do_cleanup
debug1: do_cleanup
debug1: PAM: cleanup
debug1: PAM: closing session
pam_unix(sshd:session): session closed for user alliance
debug1: PAM: deleting credentials


LOG2, SSHMode=1, did not work:
Code:

debug1: Forked child 9735.
Set /proc/self/oom_score_adj to 0
debug1: rexec start in 5 out 5 newsock 5 pipe 7 sock 8
debug1: inetd sockets after dupping: 3, 3
Connection from IP.AD.RE.SS port 4150
debug1: Client protocol version 2.0; client software version KS_SSH1030
debug1: no match: KS_SSH1030
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_5.9p1 Debian-5ubuntu1.10
debug1: permanently_set_uid: 105/65534 [preauth]
debug1: list_hostkey_types: ssh-rsa,ssh-dss,ecdsa-sha2-nistp256 [preauth]
debug1: SSH2_MSG_KEXINIT sent [preauth]
debug1: SSH2_MSG_KEXINIT received [preauth]
debug1: kex: client->server blowfish-cbc hmac-sha1 none [preauth]
debug1: kex: server->client blowfish-cbc hmac-sha1 none [preauth]
debug1: expecting SSH2_MSG_KEXDH_INIT [preauth]
debug1: SSH2_MSG_NEWKEYS sent [preauth]
debug1: expecting SSH2_MSG_NEWKEYS [preauth]
debug1: SSH2_MSG_NEWKEYS received [preauth]
debug1: KEX done [preauth]
debug1: userauth-request for user alliance service ssh-connection method none [preauth]
debug1: attempt 0 failures 0 [preauth]
reverse mapping checking getaddrinfo for IP.AD.RE.SS.domain.tld [IP.AD.RE.SS] failed - POSSIBLE BREAK-IN ATTEMPT!
debug1: PAM: initializing for "alliance"
debug1: PAM: setting PAM_RHOST to "IP.AD.RE.SS"
debug1: PAM: setting PAM_TTY to "ssh"
debug1: userauth-request for user alliance service ssh-connection method password [preauth]
debug1: attempt 1 failures 0 [preauth]
debug1: PAM: password authentication accepted for alliance
debug1: do_pam_account: called
Accepted password for alliance from IP.AD.RE.SS port 4150 ssh2
debug1: monitor_read_log: child log fd closed
debug1: monitor_child_preauth: alliance has been authenticated by privileged process
debug1: PAM: establishing credentials
pam_unix(sshd:session): session opened for user alliance by (uid=0)
User child is on pid 9882
debug1: SELinux support disabled
debug1: PAM: establishing credentials
debug1: permanently_set_uid: 1000/1000
debug1: Entering interactive session for SSH2.
debug1: server_init_dispatch_20
debug1: server_input_channel_open: ctype session rchan 1 win 131072 max 32768
debug1: input_session_request
debug1: channel 0: new [server-session]
debug1: session_new: session 0
debug1: session_open: channel 0
debug1: session_open: session 0: link with channel 0
debug1: server_input_channel_open: confirm session
debug1: server_input_channel_req: channel 0 request pty-req reply 0
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 req pty-req
debug1: Allocating pty.
debug1: session_new: session 0
debug1: SELinux support disabled
debug1: session_pty_req: session 0 alloc /dev/pts/1
debug1: server_input_channel_req: channel 0 request exec reply 0
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 req exec
debug1: Setting controlling tty using TIOCSCTTY.
debug1: Received SIGCHLD.
debug1: session_by_pid: pid 9883
debug1: session_exit_message: session 0 channel 0 pid 9883
debug1: session_exit_message: release channel 0
debug1: session_by_tty: session 0 tty /dev/pts/1
debug1: session_pty_cleanup: session 0 release /dev/pts/1
debug1: server_input_channel_req: channel 0 request exit-status reply 0
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 req exit-status
debug1: session_by_channel: session 0 channel 0
debug1: session_close_by_channel: channel 0 child 0
debug1: session_close: session 0 pid 0
debug1: channel 0: free: server-session, nchannels 1
Connection closed by IP.AD.RE.SS
debug1: do_cleanup
Transferred: sent 2112, received 1104 bytes
Closing connection to IP.AD.RE.SS port 4150
debug1: PAM: cleanup
debug1: PAM: closing session
pam_unix(sshd:session): session closed for user alliance
debug1: PAM: deleting credentials
Back to top
View user's profile Send private message
KS-Soft



Joined: 03 Apr 2002
Posts: 12792
Location: USA

PostPosted: Tue May 09, 2017 8:43 am    Post subject: Reply with quote

Quote:
And remind you: it's working on Ubuntu 12.04 without any issues (without "SSHMode=1").

I assume there are a lot of changes between Ubuntu 12.04 and 16.04 but this question should be addressed to Ubuntu developers.
Also I think you have different versions of ssh servers on these systems.

We made many changes in HostMonitor code as well, but we do not see how these changes may effect nohup program.
HostMonitor establishes SSH session - this part works fine.
Then it sends 'exec' command with parameter you have specified for the test - this part works fine as well, right? SSH server starts your script.

Then your script does not work as you expect. Why? We have no idea.
HostMonitor does not execute your script, Ubuntu does.

However we were able to reproduce the problem and I think we found solution - add sleep command
e.g. the following script does not work correctly (when ssh session closes, ping process terminates)
Code:
nohup ping -c 50 8.8.8.8 >>/home/test/log/pinglog.txt &

while next script works fine
(sleep for 2 sec allows ping to work in backround for 50 sec, process works fine when ssh session closed)
Code:
nohup ping -c 50 8.8.8.8 >>/home/test/log/pinglog.txt &
sleep 2s


Regards
Alex
Back to top
View user's profile Send private message Visit poster's website
v.yesakov



Joined: 29 Jun 2016
Posts: 13

PostPosted: Tue May 09, 2017 3:00 pm    Post subject: Reply with quote

KS-Soft wrote:
Quote:
And remind you: it's working on Ubuntu 12.04 without any issues (without "SSHMode=1").


I assume there are a lot of changes between Ubuntu 12.04 and 16.04
Also I think you have different versions of ssh servers on these systems.


Two debug logs from SSHD in my post above were taken from the very same server, running Ubuntu 12.04.5 x64. Not two different machines. The only difference is "SSHMode=1" added to RMA config in between.

KS-Soft wrote:
We made many changes in HostMonitor code as well, but we do not see how these changes may effect nohup program.


This is the puzzle we have to solve together somehow, because it affects old servers as well as new ones.

KS-Soft wrote:
However we were able to reproduce the problem and I think we found solution - add sleep


It will do as temporary workaround, but it's definitely not a solution.

KS-Soft wrote:
while next script works fine (sleep for 2 sec allows ping to work in backround for 50 sec, process works fine when ssh session closed)
Code:
nohup ping -c 50 8.8.8.8 >>/home/test/log/pinglog.txt &
sleep 2s


Yes, I confirm: adding "sleep" at the end of script helps. Process survives after HM closes SSH session.

V.
Back to top
View user's profile Send private message
v.yesakov



Joined: 29 Jun 2016
Posts: 13

PostPosted: Mon May 15, 2017 3:04 pm    Post subject: Reply with quote

v.yesakov wrote:
Yes, I confirm: adding "sleep" at the end of script helps. Process survives after HM closes SSH session.


Anyway I think SSH must be fixed on HostMonitor site, because new SSH mode breaks backwards compatibility.

V.
Back to top
View user's profile Send private message
KS-Soft



Joined: 03 Apr 2002
Posts: 12792
Location: USA

PostPosted: Wed May 17, 2017 6:25 am    Post subject: Reply with quote

We have this task in the list but we cannot consider it as high priority task.
We spent a week and we still do not see what exactly should be changed, will return to this task later...

Regards
Alex
Back to top
View user's profile Send private message Visit poster's website
Display posts from previous:   
Post new topic   Reply to topic    KS-Soft Forum Index -> Configuration, Maintenance, Troubleshooting All times are GMT - 6 Hours
Goto page Previous  1, 2
Page 2 of 2

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group

KS-Soft Forum Index