SITE-TO-SITE IPSEC VPN PHASE-1 AND PHASE-2 TROUBLESHOOTING STEPS, NEGOTIATION STATES AND MM_WAIT_MSG MESSAGES
Network troubleshooting is an art, and site-to-site VPN troubleshooting is one of my favorite networking jobs. I believe other networking folks feel the same. The first and most important step of troubleshooting is diagnosis: isolate the exact issue without wasting time.
In this article I describe the steps for troubleshooting a site-to-site VPN tunnel. Most VPN appliances provide plenty of debugging information to help an engineer diagnose the issue.
I love working on the CLI (command line), and the Cisco firewall is my favorite. I have successfully built VPN tunnels on Cisco ASA, SonicWALL, Cyberoam, Check Point, Palo Alto and many more. As a network engineer, it doesn't matter which VPN device is used at each end of the tunnel. The same common issues come up when creating VPN tunnels, and as a rule there are a few basic checks to run when a tunnel fails to establish.
There are four common issues we generally face when setting up a VPN tunnel:
- Phase 1 (ISAKMP) security associations fail
- Phase 2 (IPsec) security associations fail
- VPN tunnel is established, but no traffic passes through
- Intermittent VPN flapping and disconnection
Most of the time the remote end of the tunnel is configured by a different engineer, so make sure the Phase-1 and Phase-2 configuration is identical on both sides of the tunnel. It helps to use a common VPN template and to exchange the Phase-1 and Phase-2 SA (security association) parameters between both parties before setting up the tunnel.
Phase 1 (ISAKMP) security associations fail
When Phase-1 of the tunnel does not come up, the first step is to make sure that the encryption, authentication, hash, DH group and lifetime settings of the Phase-1 proposal are the same at both ends of the tunnel.
Here's a quick Phase-1 (ISAKMP) checklist:
- ISAKMP parameters match exactly.
- Pre-shared keys match exactly.
- There is an external route to the peer address, and the peer IP is reachable (ping) from your firewall.
- ISAKMP is enabled on the outside interface.
- ESP traffic is permitted through the outside interface.
- UDP port 500 is open on the outside ACL.
- In some situations (NAT traversal), UDP port 4500 also needs to be open on the outside.
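The first item on the checklist, matching ISAKMP parameters, is easy to check mechanically once both engineers have exchanged their settings. The Python sketch below is only an illustration: the dictionary layout, the field names and the `phase1_mismatches` helper are assumptions made for this example and are not part of any vendor tool.

```python
# Hypothetical field names, chosen to mirror the Phase-1 settings named above.
PHASE1_FIELDS = ("authentication", "encryption", "hash", "dh_group", "lifetime")


def phase1_mismatches(local, remote):
    """Return {field: (local_value, remote_value)} for every field that differs."""
    return {
        field: (local.get(field), remote.get(field))
        for field in PHASE1_FIELDS
        if local.get(field) != remote.get(field)
    }


# Example values only; the remote side is deliberately given a different cipher.
local_policy = {
    "authentication": "pre-share",
    "encryption": "aes",
    "hash": "sha",
    "dh_group": 2,
    "lifetime": 86400,
}
remote_policy = dict(local_policy, encryption="3des")

print(phase1_mismatches(local_policy, remote_policy))
```

With these sample values the script prints `{'encryption': ('aes', '3des')}`. An empty dictionary means the two proposals agree on every listed field.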
ISAKMP (IKE Phase 1) Negotiation States and MM_WAIT_MSG Messages
MM_WAIT_MSG2 – The initiator has sent its encryption, hash and DH (Diffie–Hellman) proposal to the responder and is awaiting the initial reply from the other gateway. An initiator stuck at MM_WAIT_MSG2 means the remote end is not responding. This can happen for the following reasons:
- Routing issue at the remote end
- ISAKMP is not enabled on the outside interface of the remote end
- Remote gateway IP is incorrect
- A firewall is blocking connectivity somewhere between the two peers
- A firewall is blocking ISAKMP (usually UDP port 500)
- Remote peer is down
MM_WAIT_MSG3 – This is a receiver (responder) state. The initiator sent its encryption, hash, DH and IKE policy details to create initial contact; the receiver has sent back the IKE policy it accepted and is waiting for the initiator's next message. Meanwhile the initiator waits at MM_WAIT_MSG2 until it hears back from the receiver. A tunnel can get stuck at MM_WAIT_MSG3 for the following reasons:
- Mismatch between device vendors
- Firewall in the way
- ASA version mismatch
- No return route to the initiating device
MM_WAIT_MSG4 – The initiator has received the receiver's IKE policy and has sent its key-exchange material (DH public value and nonce). The initiator stays at MM_WAIT_MSG4 until the receiver answers with its own. The pre-shared key itself is never sent over the wire, but it is used to derive the session keys from this point on, so if the receiver has no tunnel group or pre-shared key configured for the initiator, the initiator stays at MM_WAIT_MSG4.
A tunnel can get stuck at MM_WAIT_MSG4 for the following reasons:
- Missing tunnel group
- Pre-shared key mismatch at the receiver end
MM_WAIT_MSG5 – This is a receiver state. If the receiver has a tunnel group and PSK configured for the initiator's peer address, it answers the initiator's key exchange and waits for the initiator's identity and pre-shared-key hash. If the PSKs don't match, the receiver stays at MM_WAIT_MSG5. A tunnel can get stuck at MM_WAIT_MSG5 for the following reasons:
- The pre-shared keys do not match
- NAT-T is on and should be off
MM_WAIT_MSG6 – The initiator has sent its identity and pre-shared-key hash and waits for the receiver's hash so it can check whether they match. If the pre-shared keys match, the initiator's state becomes MM_ACTIVE and it acknowledges this to the receiver. If they do not match, the initiator stays at MM_WAIT_MSG6. A tunnel can get stuck at MM_WAIT_MSG6 for the following reasons:
- The pre-shared keys don't match
- NAT-T is on and should be off
Note: if the state goes straight to MM_WAIT_MSG6 and the tunnel then gets reset, Phase 1 has completed but Phase 2 is failing to establish the IPsec connection. Check that the IPsec Phase 2 settings match at both ends of the tunnel.
MM_ACTIVE – The receiver has received the MM_ACTIVE acknowledgement from the initiator and becomes MM_ACTIVE itself. The ISAKMP SA negotiation is now complete and Phase 1 has finished successfully.
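The state descriptions above can be turned into a small lookup aid. The Python sketch below pulls the `State` field out of text shaped like the ASA's `sh crypto isakmp sa` output and lists the likely causes named in this article. The helper names and the sample text are made up for illustration; the sample is modeled on the output format shown later in this article, not captured from a real device.

```python
import re

# Likely causes per state, taken from the lists in this article.
LIKELY_CAUSES = {
    "MM_WAIT_MSG2": [
        "routing issue at the remote end",
        "ISAKMP not enabled on the remote outside interface",
        "incorrect remote gateway IP",
        "firewall blocking connectivity or UDP port 500",
        "remote peer is down",
    ],
    "MM_WAIT_MSG3": [
        "mismatch between device vendors",
        "firewall in the way",
        "ASA version mismatch",
        "no return route to the initiating device",
    ],
    "MM_WAIT_MSG4": ["missing tunnel group", "pre-shared key mismatch at the receiver end"],
    "MM_WAIT_MSG5": ["pre-shared keys do not match", "NAT-T on and should be off"],
    "MM_WAIT_MSG6": ["pre-shared keys do not match", "NAT-T on and should be off"],
    "MM_ACTIVE": [],  # Phase 1 completed, nothing to fix at this stage
}


def extract_state(show_output):
    """Return the first 'State : XYZ' value found in the text, or None."""
    match = re.search(r"State\s*:\s*(\w+)", show_output)
    return match.group(1) if match else None


def likely_causes(state):
    """Return this article's list of causes for a state (empty if unknown)."""
    return LIKELY_CAUSES.get(state, [])


# Hypothetical sample text in the same layout as the ASA output.
sample = "Type : L2L Role : initiator\nRekey : no State : MM_WAIT_MSG2"
state = extract_state(sample)
print(state)
for cause in likely_causes(state):
    print(" -", cause)
```

The table is only as good as the lists it was copied from, so treat the result as a starting point for the checks rather than a diagnosis.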
Phase 2 (IPsec) security associations fail
Once the Phase 1 negotiation has completed, the tunnel moves on to IPsec Phase 2, where a different set of things needs to be checked:
- Check that the Phase 2 proposal's encryption algorithm, authentication algorithm (hash) and lifetime are the same on both sides.
- Check that the VPN encryption domain (local and remote subnets) matches on both sides: each side's local subnet is the other side's remote subnet.
- Check that the correct ACL is bound to the crypto map.
- Check the firewall's inside route to the internally hosted networks/servers.
- Make sure the remote subnet does not overlap with your local LAN.
- Check the NAT exemption.
- Check the PFS (perfect forward secrecy) setting, if you are using it.
- Make sure the tunnel is bound to the public-facing interface (crypto map outside_map interface outside).
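Two of the Phase 2 checks, the encryption domain match and the subnet overlap, can be verified with Python's standard `ipaddress` module. This is a minimal sketch: the function names and the (local, remote) tuple layout are assumptions for this example, and the subnets are the ones used in this article's configuration example.

```python
import ipaddress


def subnets_overlap(local_cidr, remote_cidr):
    """True if the two subnets share any addresses."""
    local_net = ipaddress.ip_network(local_cidr)
    remote_net = ipaddress.ip_network(remote_cidr)
    return local_net.overlaps(remote_net)


def domains_mirror(this_side, other_side):
    """Each side is a (local_cidr, remote_cidr) pair; they must be mirror images."""
    this_local, this_remote = (ipaddress.ip_network(c) for c in this_side)
    other_local, other_remote = (ipaddress.ip_network(c) for c in other_side)
    return this_local == other_remote and this_remote == other_local


# The local and remote subnets must not overlap.
print(subnets_overlap("172.16.100.0/24", "192.168.10.0/24"))  # False
print(subnets_overlap("172.16.0.0/16", "172.16.100.0/24"))    # True

# The remote engineer's view must be the exact mirror of ours, same CIDR included.
ours = ("172.16.100.0/24", "192.168.10.0/24")
print(domains_mirror(ours, ("192.168.10.0/24", "172.16.100.0/24")))  # True
print(domains_mirror(ours, ("192.168.10.0/24", "172.16.0.0/16")))    # False
```

The last call shows the typical mistake: one side defines a /16 where the other defines a /24, so the two sides do not agree on the encryption domain even though the addresses look similar.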
After the checks above, both Phase 1 and Phase 2 should be established and the VPN tunnel reported as up. Now make sure traffic is actually passing through the tunnel. Initiate some traffic (for example ICMP) from an inside host, or run packet tracer on the firewall to originate traffic that brings Phase 2 up, and watch the packet encap and packet decap counters increase.
VPN Tunnel is established, but traffic not passing through
If traffic is not passing through the VPN tunnel, the #pkts encaps and #pkts decaps counters will not increase as expected. These numbers tell us how many packets have traversed the IPsec tunnel and verify whether we are receiving traffic back from the remote end of the VPN tunnel. There are a couple of things to check:
- Check firewall policies and routing.
- Run packet tracer on the firewall and check the VPN traffic flow.
- Check the firewall's inside route to the internally hosted networks/servers.
- Make sure the remote subnet does not overlap with your local LAN.
- Make sure the new VPN policy does not overlap with an existing policy.
vpn-Firewall# sh crypto ipsec sa peer 90.1.1.1
peer address: 90.1.1.1
Crypto map tag: Outside_Map, seq num: 90, local addr: 200.100.0.1
access-list Test_vpn extended permit ip 172.16.10.0/24 192.168.10.0/24
local ident (addr/mask/prot/port): (172.16.10.0/255.255.255.0/0/0)
remote ident (addr/mask/prot/port): (192.168.10.0/255.255.255.0/0/0)
current_peer: 90.1.1.1
#pkts encaps: 294486, #pkts encrypt: 294485, #pkts digest: 294485
#pkts decaps: 306851, #pkts decrypt: 306851, #pkts verify: 306851
#pkts compressed: 0, #pkts decompressed: 0
#pkts not compressed: 294486, #pkts comp failed: 0, #pkts decomp failed: 0
#pre-frag successes: 0, #pre-frag failures: 0, #fragments created: 0
#PMTUs sent: 0, #PMTUs rcvd: 0, #decapsulated frgs needing reassembly: 0
#send errors: 0, #recv errors: 3416
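Verify that the #pkts encaps and #pkts decaps counters are both increasing. If encaps rises while decaps stays at zero, traffic is leaving through the tunnel but nothing is coming back from the remote end.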
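When the same check has to be repeated many times, the two counters can be read out of the command output with a short script. The Python sketch below is an illustration under stated assumptions: `parse_counters` and `diagnose` are hypothetical helpers, the text is assumed to have been copied from `sh crypto ipsec sa peer ...`, and the wording of the hints is this sketch's own.

```python
import re


def parse_counters(show_output):
    """Extract the '#pkts encaps' and '#pkts decaps' values (None if missing)."""
    counters = {}
    for name in ("encaps", "decaps"):
        match = re.search(r"#pkts " + name + r":\s*(\d+)", show_output)
        counters[name] = int(match.group(1)) if match else None
    return counters


def diagnose(counters):
    """Turn the two counters into a short hint about where to look."""
    encaps, decaps = counters["encaps"], counters["decaps"]
    if encaps is None or decaps is None:
        return "counters not found in output"
    if encaps == 0 and decaps == 0:
        return "no traffic in either direction"
    if decaps == 0:
        return "sending but not receiving: check the remote end"
    if encaps == 0:
        return "receiving but not sending: check local routing, NAT exemption and crypto ACL"
    return "traffic is flowing in both directions"


# Two lines taken from the output shown in this article.
sample = (
    "#pkts encaps: 294486, #pkts encrypt: 294485, #pkts digest: 294485\n"
    "#pkts decaps: 306851, #pkts decrypt: 306851, #pkts verify: 306851\n"
)
counters = parse_counters(sample)
print(counters)
print(diagnose(counters))
```

A single reading only shows totals since the SA came up, so in practice it helps to take two readings a few seconds apart and compare them.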
The steps above should resolve most VPN tunnel issues. If the tunnel still does not establish or traffic still does not pass, try a different set of encryption settings, since devices from different vendors sometimes have odd incompatibilities. Also check the release notes for the firmware version of your VPN appliance, particularly if you have recently upgraded the firmware. Finally, check the vendor's knowledge base and ask the vendor for input on your specific appliance, as it may provide further suggestions.
Intermittent VPN flapping and disconnection
Sometimes the VPN tunnel state goes up and down constantly, and users get frustrated because their connections to the servers keep dropping.
There are a couple of reasons why a VPN tunnel starts dropping all of a sudden, even though you have not made any change to the tunnel.
In this case, check the following:
- Make sure no change was made at the remote end without you being notified.
- Re-validate the encryption domain (local and remote subnets in the VPN): both ends should match exactly, including the CIDR.
- Re-check the Phase-1 and Phase-2 lifetime settings at both ends of the tunnel (the Phase-1 lifetime should be higher than the Phase-2 lifetime).
- Check the DPD (Dead Peer Detection) setting (between firewalls from different vendors, DPD may need to be disabled).
- Check the configuration in detail and make sure the peer IP is not NATted.
- Make sure the internet link is stable and there are no intermittent drops in connectivity.
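The lifetime check from the list above is simple arithmetic, so it can be written down as a small function. In this Python sketch the dictionary keys and the `lifetime_findings` helper are assumptions for illustration; the values 86400 and 28800 seconds are the Phase-1 and Phase-2 rekey intervals that appear in this article's sample output.

```python
def lifetime_findings(local, remote):
    """Return a list of human-readable problems with the lifetime settings."""
    findings = []
    # Rule from the checklist: Phase-1 lifetime should be higher than Phase-2.
    for name, side in (("local", local), ("remote", remote)):
        if side["phase1"] <= side["phase2"]:
            findings.append(
                f"{name}: Phase-1 lifetime ({side['phase1']}s) is not higher "
                f"than Phase-2 lifetime ({side['phase2']}s)"
            )
    # Both ends should also agree with each other.
    for phase in ("phase1", "phase2"):
        if local[phase] != remote[phase]:
            findings.append(
                f"{phase} lifetime differs: local {local[phase]}s, remote {remote[phase]}s"
            )
    return findings


good = {"phase1": 86400, "phase2": 28800}
bad_remote = {"phase1": 28800, "phase2": 28800}

print(lifetime_findings(good, good))
for finding in lifetime_findings(good, bad_remote):
    print(finding)
```

The first call prints an empty list. The second reports two findings: the remote Phase-1 lifetime is not higher than its Phase-2 lifetime, and the Phase-1 lifetimes differ between the two ends.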
Phase 1 (IKEv1) and Phase 2 (IPsec) Configuration Steps
Phase 1 (IKEv1) Configuration
Complete the following steps for the Phase 1 configuration:
In this example we use the CLI to enable IKEv1 on the outside interface:
crypto ikev1 enable outside
Create an IKEv1 Phase-1 policy that defines the authentication, encryption, hashing, DH (Diffie-Hellman) group and lifetime:
crypto ikev1 policy 1
authentication pre-share
encryption aes
hash sha
group 2
lifetime 86400
Phase 2 (IPsec) Configuration
Complete these steps for the Phase 2 configuration:
Create an access list that defines the traffic to be encrypted and sent through the tunnel. In this example, the interesting traffic is from the 172.16.100.0/24 subnet to the 192.168.10.0/24 subnet. The access list can contain multiple entries if multiple subnets are involved between the sites.
object network Obj_172.16.100.0
subnet 172.16.100.0 255.255.255.0
object network Obj_192.168.10.0
subnet 192.168.10.0 255.255.255.0
Note: in ASA versions 8.4 and later, objects or object groups can be created for networks, subnets and host IP addresses. Here we have created two objects that hold the local and remote subnets, and we use them for both the crypto Access Control List (ACL) and the NAT statements.
access-list test_vpn extended permit ip object Obj_172.16.100.0 object Obj_192.168.10.0
NAT Exemption Or NO NAT
nat (inside,outside) 1 source static Obj_172.16.100.0 Obj_172.16.100.0 destination static Obj_192.168.10.0 Obj_192.168.10.0 no-proxy-arp route-lookup
(Note: make sure that VPN traffic is not subject to any other NAT rule.)
Configure the IKEv1 transform set. An identical transform set must be created on the remote end as well.
crypto ipsec ikev1 transform-set myset esp-aes esp-sha-hmac
Configure the crypto map, which contains the following components:
- Peer IP address
- Access list
- Transform set
- An optional Perfect Forward Secrecy (PFS) setting, which creates a new pair of Diffie-Hellman keys that are used to protect the data (both sides must be PFS-enabled)
crypto map outside_map 10 match address test_vpn
crypto map outside_map 10 set peer 90.1.1.1
crypto map outside_map 10 set ikev1 transform-set myset
crypto map outside_map 10 set pfs
Create a tunnel group for the peer IP address and configure the IPsec VPN tunnel pre-shared key under its IPsec attributes:
tunnel-group 90.1.1.1 type ipsec-l2l
tunnel-group 90.1.1.1 ipsec-attributes
ikev1 pre-shared-key cisco
Apply the crypto map on the outside interface:
crypto map outside_map interface outside
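Since the same Phase 2 commands are repeated for every new peer, they lend themselves to the kind of common VPN template mentioned at the start of this article. The Python sketch below simply fills the peer address and key into the command lines shown above. The `render_l2l_config` function and its parameter names are hypothetical, the default names (`test_vpn`, `myset`, `outside_map`) are the ones from this example, and `EXAMPLE-KEY` is a placeholder rather than a recommended key.

```python
def render_l2l_config(peer_ip, psk, acl="test_vpn", transform_set="myset",
                      crypto_map="outside_map", seq=10, interface="outside"):
    """Return the crypto map and tunnel-group lines from this article for one peer."""
    return [
        f"crypto map {crypto_map} {seq} match address {acl}",
        f"crypto map {crypto_map} {seq} set peer {peer_ip}",
        f"crypto map {crypto_map} {seq} set ikev1 transform-set {transform_set}",
        f"crypto map {crypto_map} {seq} set pfs",
        f"tunnel-group {peer_ip} type ipsec-l2l",
        f"tunnel-group {peer_ip} ipsec-attributes",
        f" ikev1 pre-shared-key {psk}",
        f"crypto map {crypto_map} interface {interface}",
    ]


# Print the rendered lines so they can be reviewed before being pasted anywhere.
for line in render_l2l_config("90.1.1.1", "EXAMPLE-KEY"):
    print(line)
```

The script only produces text. Reviewing the output against the settings agreed with the remote engineer is still a manual step.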
VPN Troubleshooting and Verification Commands
VPN-Firewall# sh crypto isakmp sa | b 90.1.1.1
5 IKE Peer: 90.1.1.1
Type : L2L Role : responder
Rekey : no State : MM_ACTIVE
VPN-Firewall# sh crypto ipsec sa peer 90.1.1.1
peer address: 90.1.1.1
Crypto map tag: Outside_Map, seq num: 90, local addr: 200.100.0.1
access-list Test_vpn extended permit ip 172.16.10.0/24 192.168.10.0/24
local ident (addr/mask/prot/port): (172.16.10.0/255.255.255.0/0/0)
remote ident (addr/mask/prot/port): (192.168.10.0/255.255.255.0/0/0)
current_peer: 90.1.1.1
#pkts encaps: 294486, #pkts encrypt: 294485, #pkts digest: 294485
#pkts decaps: 306851, #pkts decrypt: 306851, #pkts verify: 306851
#pkts compressed: 0, #pkts decompressed: 0
#pkts not compressed: 294486, #pkts comp failed: 0, #pkts decomp failed: 0
#pre-frag successes: 0, #pre-frag failures: 0, #fragments created: 0
#PMTUs sent: 0, #PMTUs rcvd: 0, #decapsulated frgs needing reassembly: 0
#send errors: 0, #recv errors: 3416
local crypto endpt.: 200.100.0.1, remote crypto endpt.: 90.1.1.1
path mtu 1500, ipsec overhead 58, media mtu 1500
current outbound spi: A12ACD06
current inbound spi : ADA4ACB9
VPN-Firewall# sh vpn-sessiondb detail l2l | b 90.1.1.1
Connection : 90.1.1.1
Index : 48142 IP Addr : 90.1.1.1
Protocol : IKE IPsec
Encryption : 3DES Hashing : SHA1
Bytes Tx : 82449639 Bytes Rx : 262643640
Login Time : 16:26:32 EDT Tue Jul 11 2017
Duration : 11d 14h:16m:29s
IKE Tunnels: 1
IPsec Tunnels: 4
IKE:
Tunnel ID : 48142.1
UDP Src Port : 500 UDP Dst Port : 500
IKE Neg Mode : Main Auth Mode : preSharedKeys
Encryption : 3DES Hashing : SHA1
Rekey Int (T): 86400 Seconds Rekey Left(T): 39341 Seconds
D/H Group : 2
Filter Name :
IPsec:
Tunnel ID : 48142.2
Local Addr : 172.16.10.0/255.255.255.255/0/0
Remote Addr : 192.168.10.0/255.255.255.255/0/0
Encryption : 3DES Hashing : SHA1
Encapsulation: Tunnel
Rekey Int (T): 28800 Seconds Rekey Left(T): 6219 Seconds
Rekey Int (D): 4608000 K-Bytes Rekey Left(D): 4606645 K-Bytes
Idle Time Out: 30 Minutes Idle TO Left : 29 Minutes
Bytes Tx : 20200839 Bytes Rx : 65481714
Pkts Tx : 294551 Pkts Rx : 306920