Hi,
Got a lot of tests againt different servers.
Yesterday, some CPU, diskspace and service tests failed. Unknown username, access denied etc. Nothing changed on AHM or the server.
Connection using a wildcard at Connection Manager
Couldn't see anything in the securitylog at the monitored system.
Then I changed to IP-address instead of DNS name. Not FQDN. Now it works. But ping test keeps running against DNS name..
Any ideas?
AHM 7.11 running on W2k3 without antivirus
Target W2K3-R2
Hostname vs IP-address
-
- Posts: 2832
- Joined: Tue May 16, 2006 4:41 am
- Contact:
Could you provide more information please?
- What Windows do you use? Service Pack?
- Is the problem happened against one particular server or do you see the errors on a bunch of servers?
- Do you see any error messages in "Event Viewer" applet?
Looks like the problem is caused by some windows limitations.
Below are several registry settings, that may increase the max limit for concurrent TCP connections:
1. There is a parameter that limits the maximum number of connections that TCP may have open simultaneously.
[HKEY_LOCAL_MACHINE \System \CurrentControlSet \Services \Tcpip \Parameters]
TcpNumConnections = 0x00fffffe (Default = 16,777,214)
Note a 16 Million connection limit sounds very promising, but there are other parameters (See below), which keeps us from ever reaching this limit.
When a client makes a connect() call to make a connection to a server, then the client invisible/implicit bind the socket to a local dynamic (anonymous, ephemeral, short-lived) port number. The default range for dynamic ports in Windows is 1024 to 5000, thus giving 3977 outbound concurrent connections for each IP Address. It is possible to change the upper limit with this DWORD registry key:
[HKEY_LOCAL_MACHINE \System \CurrentControlSet \Services \Tcpip \Parameters]
MaxUserPort = 5000 (Default = 5000, Max = 65534)
Even when not having 3977 concurrent connections for each IP Address, then it is still possible to run out of available port numbers or TCB's. This can happen if quickly opening and closing connections, because after a connection is "closed" it enters the state TIME_WAIT, and will continue to occupy the port number for 4 minutes (2*Maximum Segment Live, MSL) before it is actually removed. This behavior is specified in RFC 793, and prevents attempts to reconnect to the same party, before the old socket is recognized as closed at both sides. It is possible to change how long a socket should be in TIME_WAIT state before it can be re-used freely:
[HKEY_LOCAL_MACHINE \System \CurrentControlSet \services \Tcpip \Parameters]
TcpTimedWaitDelay = 120 (Default = 240 secs, Range = 30-300)
Note with Win2k the reuse of sockets have been changed, so when reaching the limit of more than 1000 connections in TIME-WAIT state, then it starts to mark sockets that have been in TIME_WAIT state for morethan 60 secs as free. It is possible to configure this limit:
[HKEY_LOCAL_MACHINE \System \CurrentControlSet \services \Tcpip \Parameters]
MaxFreeTWTcbs = 1000 (Default = 1000 sockets)
Note with Win2k3 SP1 the reuse of sockets have been changed, so when it has to re-use sockets in TIME_WAIT state, then it checks whether the other party is different from the old socket. Eliminating the need to fiddle with (TcpTimedWaitDelay) and (MaxFreeTWTcbs) any more.
If using an application protocol that doesn't implement timeout checking, but relies on the TCPIP timeout checking without specifying how often it should be done, then it is possible to get connections that "never" closes, if the remote host disconnects without closing the connection properly. The TCPIP timeout checking is by default done every 2 hour, by sending a keep alive packet. It is possible to change how often TCPIP should check the connections (Affects all TCPIP connections):
[HKEY_LOCAL_MACHINE \System \CurrentControlSet \services \Tcpip \Parameters]
KeepAliveTime = 1800000 (Default = 7,200,000 milisecs)
For each connection a TCP Control Block (TCB - Data structure using 0.5 KB pagepool and 0.5 KB non-pagepool) is maintained. The TCBs are pre-allocated and stored in a table, to avoid spending time on allocating/deallocating the TCBs every time connections are created/closed. The TCB Table enables reuse/caching of TCBs and improves memory management, but the static size limits how many connections TCP can support simultaneously (Active + TIME_WAIT). Configure the size of the TCB Table with this DWORD registry key:
[HKEY_LOCAL_MACHINE \System \CurrentControlSet \Services \Tcpip \Parameters]
MaxFreeTcbs = 2000 (Default = RAM dependent, but usual Pro = 1000, Srv=2000)
To make lookups in the TCB table faster a hash table has been made, which is optimized for finding a certain active connection. If the hash table is too small compared to the total amount of active connections, then extra CPU time is required to find a connection. Configure the size of the hash table with this DWORD registry key (Is allocated from pagepool memory): [HKEY_LOCAL_MACHINE \System \CurrentControlSet \services \Tcpip \Parameters]
MaxHashTableSize = 512 (Default = 512, Range = 64-65536)
Also, we recommend to change MinFreeConnections and MaxFreeConnections values [HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\LanmanServer\Parameters] according to this article:
http://support.microsoft.com/kb/909262
Beside that, I think, command "netsh winsock reset" might help. You should execute it from command line prompt. Please note: restart is required after command's execution.
Regards,
Max
- What Windows do you use? Service Pack?
- Is the problem happened against one particular server or do you see the errors on a bunch of servers?
- Do you see any error messages in "Event Viewer" applet?
Looks like the problem is caused by some windows limitations.
Below are several registry settings, that may increase the max limit for concurrent TCP connections:
1. There is a parameter that limits the maximum number of connections that TCP may have open simultaneously.
[HKEY_LOCAL_MACHINE \System \CurrentControlSet \Services \Tcpip \Parameters]
TcpNumConnections = 0x00fffffe (Default = 16,777,214)
Note a 16 Million connection limit sounds very promising, but there are other parameters (See below), which keeps us from ever reaching this limit.
When a client makes a connect() call to make a connection to a server, then the client invisible/implicit bind the socket to a local dynamic (anonymous, ephemeral, short-lived) port number. The default range for dynamic ports in Windows is 1024 to 5000, thus giving 3977 outbound concurrent connections for each IP Address. It is possible to change the upper limit with this DWORD registry key:
[HKEY_LOCAL_MACHINE \System \CurrentControlSet \Services \Tcpip \Parameters]
MaxUserPort = 5000 (Default = 5000, Max = 65534)
Even when not having 3977 concurrent connections for each IP Address, then it is still possible to run out of available port numbers or TCB's. This can happen if quickly opening and closing connections, because after a connection is "closed" it enters the state TIME_WAIT, and will continue to occupy the port number for 4 minutes (2*Maximum Segment Live, MSL) before it is actually removed. This behavior is specified in RFC 793, and prevents attempts to reconnect to the same party, before the old socket is recognized as closed at both sides. It is possible to change how long a socket should be in TIME_WAIT state before it can be re-used freely:
[HKEY_LOCAL_MACHINE \System \CurrentControlSet \services \Tcpip \Parameters]
TcpTimedWaitDelay = 120 (Default = 240 secs, Range = 30-300)
Note with Win2k the reuse of sockets have been changed, so when reaching the limit of more than 1000 connections in TIME-WAIT state, then it starts to mark sockets that have been in TIME_WAIT state for morethan 60 secs as free. It is possible to configure this limit:
[HKEY_LOCAL_MACHINE \System \CurrentControlSet \services \Tcpip \Parameters]
MaxFreeTWTcbs = 1000 (Default = 1000 sockets)
Note with Win2k3 SP1 the reuse of sockets have been changed, so when it has to re-use sockets in TIME_WAIT state, then it checks whether the other party is different from the old socket. Eliminating the need to fiddle with (TcpTimedWaitDelay) and (MaxFreeTWTcbs) any more.
If using an application protocol that doesn't implement timeout checking, but relies on the TCPIP timeout checking without specifying how often it should be done, then it is possible to get connections that "never" closes, if the remote host disconnects without closing the connection properly. The TCPIP timeout checking is by default done every 2 hour, by sending a keep alive packet. It is possible to change how often TCPIP should check the connections (Affects all TCPIP connections):
[HKEY_LOCAL_MACHINE \System \CurrentControlSet \services \Tcpip \Parameters]
KeepAliveTime = 1800000 (Default = 7,200,000 milisecs)
For each connection a TCP Control Block (TCB - Data structure using 0.5 KB pagepool and 0.5 KB non-pagepool) is maintained. The TCBs are pre-allocated and stored in a table, to avoid spending time on allocating/deallocating the TCBs every time connections are created/closed. The TCB Table enables reuse/caching of TCBs and improves memory management, but the static size limits how many connections TCP can support simultaneously (Active + TIME_WAIT). Configure the size of the TCB Table with this DWORD registry key:
[HKEY_LOCAL_MACHINE \System \CurrentControlSet \Services \Tcpip \Parameters]
MaxFreeTcbs = 2000 (Default = RAM dependent, but usual Pro = 1000, Srv=2000)
To make lookups in the TCB table faster a hash table has been made, which is optimized for finding a certain active connection. If the hash table is too small compared to the total amount of active connections, then extra CPU time is required to find a connection. Configure the size of the hash table with this DWORD registry key (Is allocated from pagepool memory): [HKEY_LOCAL_MACHINE \System \CurrentControlSet \services \Tcpip \Parameters]
MaxHashTableSize = 512 (Default = 512, Range = 64-65536)
Also, we recommend to change MinFreeConnections and MaxFreeConnections values [HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\LanmanServer\Parameters] according to this article:
http://support.microsoft.com/kb/909262
Beside that, I think, command "netsh winsock reset" might help. You should execute it from command line prompt. Please note: restart is required after command's execution.
Regards,
Max