It is widely understood that in a SIP INVITE transaction the ACK for a non-2xx final response is part of the same transaction as the INVITE, while the ACK for a 2xx response is a separate transaction. Everyone knows this as a fact because it is written almost exactly that way in RFC 3261.
Fewer people, however, understand why this is the case, and that understanding could be the key to better design.
The ACK for a 2xx response is not generated by the transaction layer but by the UAC core, and it may contain an answer to an offer received in the 2xx response.
Also, because SIP's designers aim to keep the transaction layer independent of the Transaction User (TU) layer, the transaction layer behaves the same way on receipt of a 2xx response in both a UA and a proxy: it terminates the transaction and passes the response to the TU. If the TU is a UA, it generates an ACK; if it is a proxy, it forwards the response.
Any retransmission of the 2xx response is also handled by the TU rather than the transaction layer, since the transaction was terminated on receipt of the first 2xx.
It is also possible for an INVITE to result in more than one 2xx response (for example, when a proxy forks the request), and since the first 2xx destroys the client transaction, all subsequent 2xx responses are handled directly by the TU.
Because the original transaction is destroyed so that the TU can deal with the 2xx response, the ACK is sent as part of a separate transaction.
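To make the asymmetry concrete, here is a minimal Python sketch of how an INVITE client transaction reacts to responses, loosely following the RFC 3261 state names (Calling, Proceeding, Completed, Terminated) but with timers and transport left out. The class and callback names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class InviteClientTransaction:
    """Simplified INVITE client transaction (timers and transport omitted)."""
    to_tu: Callable[[int], None]   # hands a response up to the Transaction User
    send_ack: Callable[[], None]   # ACK generated by the transaction itself
    state: str = "Calling"

    def on_response(self, status: int) -> None:
        if self.state == "Terminated":
            # The transaction no longer exists; a real stack would find no
            # match and pass later 2xx responses straight to the TU.
            raise RuntimeError("no such transaction")
        if self.state == "Completed":
            # Retransmitted non-2xx final response: re-send the ACK without
            # bothering the TU again. Anything else is ignored here.
            if status >= 300:
                self.send_ack()
            return
        if status < 200:
            self.state = "Proceeding"
            self.to_tu(status)
        elif status < 300:
            # 2xx: terminate and let the TU decide (UA core ACKs, proxy forwards).
            self.state = "Terminated"
            self.to_tu(status)
        else:
            # 300-699: this ACK is sent by, and belongs to, the INVITE transaction.
            self.state = "Completed"
            self.to_tu(status)
            self.send_ack()
```

Feeding it 180 and then 200 leaves it Terminated with no ACK sent by the transaction, so the ACK becomes the TU's job; feeding it 486 makes the transaction itself send the ACK, and a retransmitted 486 only triggers another ACK.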
The ACK for a 2xx is a subsequent request within a dialog like any other in-dialog request, except for its CSeq: the numeric value is the same as that of the original INVITE, while the method is ACK. This alone associates the ACK with the original INVITE at both UAs, which matters because more than one INVITE transaction (re-INVITEs) may take place within the same dialog.
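A minimal Python sketch of that association, assuming headers are modelled as a plain dict and the dialog is already identified (tags and the rest of the request are omitted); `Invite`, `build_ack_for_2xx` and `matching_invite` are hypothetical names.

```python
import secrets
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Invite:
    call_id: str
    cseq: int  # numeric part of the INVITE's CSeq header


def build_ack_for_2xx(invite: Invite) -> dict:
    """Hypothetical helper: the header fields that tie an ACK to its INVITE."""
    return {
        "Call-ID": invite.call_id,
        # Same number as the INVITE being acknowledged, but method ACK.
        "CSeq": f"{invite.cseq} ACK",
        # A fresh branch (with the RFC 3261 magic cookie), as for a new request.
        "Via-branch": "z9hG4bK" + secrets.token_hex(8),
    }


def matching_invite(ack: dict, pending: List[Invite]) -> Optional[Invite]:
    """UAS side: pick which INVITE (original or re-INVITE) this ACK acknowledges."""
    number = int(ack["CSeq"].split()[0])
    for inv in pending:
        if inv.call_id == ack["Call-ID"] and inv.cseq == number:
            return inv
    return None
```

With an original INVITE at CSeq 1 and a re-INVITE at CSeq 2 in the same dialog, an ACK carrying `CSeq: 2 ACK` is matched to the re-INVITE, even though its branch has nothing in common with either INVITE.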
All this is fine and, I hope, makes sense. The next logical question is: if the ACK really is a separate transaction, why does it not have a response?
Well, for one thing, a response is not required: if the UAS does not receive the ACK, it retransmits the 2xx response. Note that this is true for any transport, even a reliable one, because it ensures that the offer/answer exchange is completed end to end.
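RFC 3261 has the UAS core retransmit the 2xx with an interval that starts at T1 and doubles up to T2, stopping as soon as the ACK arrives and giving up after 64*T1 without one. The sketch below computes the resulting retransmission schedule with the default T1 = 500 ms and T2 = 4 s; the function name is hypothetical, and in a real UAS each send would be cancelled once the ACK shows up.

```python
from typing import List


def retransmit_times(t1: float = 0.5, t2: float = 4.0) -> List[float]:
    """Offsets (seconds after the first 2xx) at which the UAS re-sends it if no ACK arrives."""
    give_up = 64 * t1          # after 64*T1 the UAS stops waiting for the ACK
    times: List[float] = []
    now, interval = 0.0, t1
    while now + interval < give_up:
        now += interval
        times.append(now)
        interval = min(interval * 2, t2)  # double, but never beyond T2
    return times
```

With the defaults this gives ten retransmissions, at 0.5, 1.5, 3.5, 7.5 s and then every 4 s up to 31.5 s, regardless of whether the transport is UDP or TCP.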
You may have worked with RFC 3262, which introduces reliable provisional responses. The concept extends the INVITE-2xx-ACK exchange: the INVITE-reliable provisional-PRACK exchange makes the provisional response reliable. Unlike ACK, however, PRACK *does* have a response of its own. You may ask why, if a response is not required for this kind of transaction, it is required for PRACK. In theory, the same arguments that apply to ACK also apply to PRACK, and a response is not required for reliability. However, RFC 3262 is an extension of the base SIP protocol, and the base protocol accords this special status only to ACK. Existing proxies know nothing about PRACK and handle it like any other non-INVITE request, expecting a final response, so a response-less PRACK would have been a big compatibility issue for the existing infrastructure.
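The association between a PRACK and the provisional response it acknowledges works much like the CSeq trick used by ACK: per RFC 3262, the reliable provisional response carries an RSeq header, and the PRACK echoes it in an RAck header together with the CSeq number and method of that response. A minimal sketch, with a hypothetical helper name and headers represented as a plain dict:

```python
def build_prack(provisional: dict, local_cseq: int) -> dict:
    """Hypothetical helper: header fields for a PRACK acknowledging a reliable 1xx."""
    cseq_num, method = provisional["CSeq"].split()
    return {
        # PRACK is an ordinary in-dialog request with its own, incremented CSeq,
        # so it runs a normal non-INVITE transaction and gets its own 200.
        "CSeq": f"{local_cseq + 1} PRACK",
        # RAck = RSeq of the provisional, then the CSeq number and method it answered.
        "RAck": f"{provisional['RSeq']} {cseq_num} {method}",
    }
```

The contrast with ACK is visible in the first field: an ACK for a 2xx reuses the INVITE's CSeq number, whereas a PRACK takes a new one, exactly as any request that expects a response would.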
Coming back to the ACK for a 2xx response: this is really a special transaction unlike any other; in fact, the transaction machinery is hardly involved in creating and sending it.
A careful reading of RFC 3261 suggests that there is no client transaction at the UAC and no server transaction at the UAS associated with this ACK request. I say “hardly” involved because this ACK must still carry a Via header with a proper branch parameter, and the branch parameter is what identifies a transaction. The next question is: if no real transactions are involved, why bother with a branch parameter?
The reason is that, even though no transaction is involved, at the UAS the ACK for a 2xx passes through the transaction layer like any other request, and only when no matching server transaction is found is it handed over to the TU. At the UAC, by contrast, it is just a message with a new branch identifier, which makes it look like a transaction, but it does not need to pass through the transaction layer. In fact, the TU passes it directly to the transport and retransmits it whenever it receives a retransmission of the 2xx.
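The division of labour can be sketched in Python, under simplifying assumptions: the server transaction key follows the RFC 3261 matching rule for branches carrying the magic cookie (branch, sent-by and method, with an ACK matching the INVITE that created the transaction), messages are reduced to strings, and `ServerTransactionLayer` and `UacCore` are hypothetical names.

```python
from typing import Callable, Dict, Tuple

Key = Tuple[str, str, str]  # (Via branch, Via sent-by, method)


class ServerTransactionLayer:
    """UAS side: simplified RFC 3261 server transaction matching."""

    def __init__(self) -> None:
        self.transactions: Dict[Key, str] = {}

    def add(self, branch: str, sent_by: str, method: str, name: str) -> None:
        self.transactions[(branch, sent_by, method)] = name

    def dispatch(self, branch: str, sent_by: str, method: str) -> str:
        # An ACK is matched against the INVITE transaction it would belong to.
        lookup = "INVITE" if method == "ACK" else method
        tx = self.transactions.get((branch, sent_by, lookup))
        if tx is not None:
            return "transaction:" + tx
        # No match: an ACK for a 2xx carries a new branch, so it lands here
        # and is handed to the TU.
        return "TU"


class UacCore:
    """UAC side: the TU sends its ACK straight to transport and replays it.

    A single dialog is assumed; a forked INVITE would need one ACK per dialog.
    """

    def __init__(self, send: Callable[[str], None]) -> None:
        self.send = send
        self.acks: Dict[int, str] = {}  # INVITE CSeq number -> ACK already built

    def on_2xx(self, cseq: int) -> None:
        if cseq not in self.acks:  # first 2xx: build the ACK once
            self.acks[cseq] = f"ACK CSeq={cseq} (new branch)"
        self.send(self.acks[cseq])  # every 2xx, including retransmissions
```

An ACK reusing the INVITE's branch (the non-2xx case) is absorbed by the INVITE server transaction, while an ACK with a fresh branch falls through to the TU; on the UAC side, each retransmitted 2xx simply triggers the same ACK again.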
Omitting the branch (if SIP allowed it) would have produced the same behavior, but, as almost everywhere in SIP, generality rules.
Let me close with a tangent, which I will return to in more detail some other time.
If the SIP protocol definition were an object-oriented system, the designers of PRACK could have made it a subclass of ACK; the proxies in between would then not have had to learn new behavior, and PRACK could have done without a response.
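To make the thought experiment concrete, here is a purely hypothetical Python sketch. It is not how RFC 3262 defines PRACK (in reality PRACK is an ordinary request with its own response); it only shows how inheritance would let a proxy written against the base-class contract handle the subclass without any new code.

```python
class Request:
    method = ""
    expects_response = True  # ordinary requests run a full transaction


class Invite(Request):
    method = "INVITE"


class Ack(Request):
    method = "ACK"
    expects_response = False  # the special status base SIP gives to ACK


class Prack(Ack):
    # Hypothetical: had PRACK been specified as a kind of ACK, it would inherit
    # expects_response = False, and existing proxies would treat it like an ACK.
    method = "PRACK"


def proxy_handle(req: Request) -> str:
    """A proxy that only knows the base-class contract, not individual methods."""
    if req.expects_response:
        return f"forward {req.method} statefully, await response"
    return f"forward {req.method}, no transaction"
```

`proxy_handle(Prack())` takes the ACK path without the proxy ever having heard of PRACK, which is the "no new behavior to learn" benefit the paragraph describes.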
Unfortunately, SIP, like most other protocols, is not described as an object-oriented system in some DSL; it is written in English prose.
IFIP (International Federation for Information Processing) is doing some work on object-oriented protocol specifications, but that work is unfortunately not easily accessible.
I would love to see a SIP RFC accompanied, besides the English description it has now, by an interaction/behavior definition in a machine-consumable form, which could be used to automatically generate parsers, state machines and higher-order objects that implement the behavior. This is very attractive but extremely difficult to realize in practice; as I said, I will revisit this thought.
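As a small illustration of the idea (not an existing format or tool), the response handling of the RFC 3261 INVITE client transaction can be written as a data table, with a generic interpreter standing in for the generated state machine. The table format, the action names and `Machine` are assumptions; timers and transport events are omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical machine-readable fragment: (state, event) -> (next state, actions)
INVITE_CLIENT_SPEC: Dict[Tuple[str, str], Tuple[str, List[str]]] = {
    ("Calling", "1xx"): ("Proceeding", ["pass_to_tu"]),
    ("Calling", "2xx"): ("Terminated", ["pass_to_tu"]),
    ("Calling", "300-699"): ("Completed", ["pass_to_tu", "send_ack"]),
    ("Proceeding", "1xx"): ("Proceeding", ["pass_to_tu"]),
    ("Proceeding", "2xx"): ("Terminated", ["pass_to_tu"]),
    ("Proceeding", "300-699"): ("Completed", ["pass_to_tu", "send_ack"]),
    ("Completed", "300-699"): ("Completed", ["send_ack"]),  # re-ACK retransmissions
}


def classify(status: int) -> str:
    """Map a status code to the event names used in the table."""
    if status < 200:
        return "1xx"
    if status < 300:
        return "2xx"
    return "300-699"


class Machine:
    """A state machine 'generated' by interpreting the declarative table."""

    def __init__(self, spec: Dict[Tuple[str, str], Tuple[str, List[str]]], initial: str) -> None:
        self.spec = spec
        self.state = initial

    def step(self, status: int) -> List[str]:
        key = (self.state, classify(status))
        if key not in self.spec:
            return []  # event not defined for this state: ignore it
        self.state, actions = self.spec[key]
        return list(actions)
```

The appeal is that the table, not the interpreter, carries the protocol knowledge: adding a state or an event is a data change, and the same table could in principle drive code generators for other languages.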