WebSockets
WebSockets are a powerful communication protocol that enables real-time, two-way communication between applications. They are commonly used in applications such as chat systems, gaming platforms, and other scenarios that require immediate data exchange.
A WebSocket server is an application that listens on a TCP port and implements the protocol defined in RFC 6455, the WebSocket standard. You can implement a WebSocket server in any server-side programming language that supports Berkeley sockets (also known as BSD sockets or POSIX sockets).
The key advantages of using WebSocket for communication are:
**Two-way Communication:** With WebSockets, the server can initiate communication and send messages to clients at any time. With traditional HTTP requests, clients have to poll the server regularly for updates; with WebSockets, a client establishes one connection and then listens for incoming messages from the server. This enables real-time updates and eliminates the need for constant polling.
**Lower Overhead per Message:** In scenarios with high traffic between the client and server, WebSockets have lower overhead per message than traditional HTTP requests. Once the WebSocket connection is established, messages are sent over the existing TCP connection with only a few bytes of framing (2 to 14 bytes per frame). In contrast, each HTTP request carries a full set of headers, and any request that opens a new connection pays extra round trips for connection establishment, especially when using SSL/TLS.
**Higher Scalability:** Because of the lower overhead and the elimination of client polling, WebSocket-based applications can achieve higher scalability. The reduced network traffic and server load allow a single server to serve a larger number of connected clients. However, WebSocket scalability also depends on factors such as server capacity and resource availability.
**Stateful Connections:** WebSocket connections are inherently stateful, meaning you can store and manage connection-specific state on the server without relying on cookies or session IDs. While stateless connections are common in modern development practice, there are situations where a stateful connection simplifies the application logic.
The key disadvantages of using WebSocket for communication are:
**Scalability Challenges:** At large scales, particularly with hundreds of thousands of connected clients, special server configurations and optimizations may be necessary to handle the open WebSocket connections efficiently.
**Client and Toolset Support:** Not all clients and toolsets support WebSockets as well as they support HTTP requests. Make sure that the clients you are targeting can work effectively with WebSockets before adopting them.
**Server Environment Limitations:** Some server environments, particularly less expensive or shared hosting environments, may not support the long-running server processes required to handle WebSocket connections. Verify that your server environment can handle WebSocket connections effectively.
**Use Case Specificity:** If your application only needs to send progress notifications to clients, you can achieve this with either long-running HTTP connections or WebSockets, and WebSockets might be more convenient. However, if you need push only for the short duration of a specific activity, using WebSockets solely for pushing data and relying on HTTP requests for regular request/response activities could provide the best trade-off.
How WebSocket Works
The WebSocket handshake:
The handshake is the "Web" in WebSockets. It is the bridge from HTTP to WebSockets: the client connects to the server exactly as it would for any HTTP request, and the handshake itself is a single HTTP request and response on top of the usual TCP (and, for wss://, TLS) connection setup.
Client handshake request
First, the client establishes an HTTP connection with the server and sends an HTTP/1.1 GET request that asks to upgrade the connection. All the HTTP rules apply here, so the client can send additional headers such as Cookie, Origin, or Authorization. Example:
GET /chat HTTP/1.1
Host: example.com:8000
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
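As a rough illustration of the request above, the following Python sketch assembles the raw bytes a client might send. The function name `build_handshake_request` and its parameters are hypothetical and not taken from any library; the only protocol facts it relies on are the headers shown in the example and the RFC 6455 rule that `Sec-WebSocket-Key` is a base64-encoded 16-byte random value.

```python
import base64
import os
from typing import Tuple


def build_handshake_request(host: str, path: str = "/chat") -> Tuple[bytes, str]:
    """Return the raw opening handshake request and the key it contains.

    Hypothetical helper for illustration only.
    """
    # RFC 6455: the key is 16 random bytes, base64-encoded, fresh per connection.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",  # the two empty strings produce the blank line that ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii"), key


if __name__ == "__main__":
    request, key = build_handshake_request("example.com:8000")
    print(request.decode("ascii"))
```

The client should remember the key it sent, because it needs it later to check the server's `Sec-WebSocket-Accept` value.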
Server handshake response
When the server receives the handshake request, it should send back a special response that indicates that the protocol will be changing from HTTP to WebSocket. Example:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
The Sec-WebSocket-Accept header is important because the server must derive it from the Sec-WebSocket-Key that the client sent. To compute it, concatenate the client's Sec-WebSocket-Key and the string "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" (a "magic string"), take the SHA-1 hash of the result, and return the base64 encoding of that hash. All the HTTP rules still apply here, so the server can include other HTTP headers in its response, such as Set-Cookie or Cache-Control.
NOTE: The server must keep track of each client's socket so that it does not repeat the handshake with clients that have already completed it.
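The derivation of Sec-WebSocket-Accept can be sketched in a few lines of Python using only the standard library. The function names `compute_accept` and `build_handshake_response` are hypothetical and chosen for this illustration; the key and expected value in the usage lines are the ones from the request and response examples in this article.

```python
import base64
import hashlib

# The fixed "magic string" defined by RFC 6455.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from the client's key."""
    # Concatenate key and magic string, SHA-1 hash, then base64-encode the digest.
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_handshake_response(sec_websocket_key: str) -> bytes:
    """Build a minimal 101 response (hypothetical helper, no extra headers)."""
    lines = [
        "HTTP/1.1 101 Switching Protocols",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Accept: {compute_accept(sec_websocket_key)}",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A client can run the same computation on the key it sent and compare the result with the header it receives, which confirms that the server actually understood the WebSocket handshake.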
Connection established: Exchanging data frames
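Either the client or the server can send a message at any time. Every message travels in one or more "frames" that share the same format, and the receiver uses that format to extract the payload. Data going from the client to the server is masked by XOR-ing the payload with a 32-bit masking key. This is masking rather than encryption, because the key is sent in the same frame; frames going from the server to the client are not masked.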
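The XOR step with the 32-bit key can be sketched as follows. The function name `apply_mask` is hypothetical; the sample key and bytes in the usage lines come from the masked "Hello" example in RFC 6455. Because XOR is its own inverse, the same function both masks and unmasks.

```python
def apply_mask(payload: bytes, masking_key: bytes) -> bytes:
    """XOR each payload byte with the masking key, cycling through its 4 bytes."""
    if len(masking_key) != 4:
        raise ValueError("masking key must be exactly 4 bytes (32 bits)")
    return bytes(b ^ masking_key[i % 4] for i, b in enumerate(payload))


if __name__ == "__main__":
    key = bytes.fromhex("37fa213d")
    masked = apply_mask(b"Hello", key)
    print(masked.hex())             # 7f9f4d5158
    print(apply_mask(masked, key))  # b'Hello'
```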
Frame format:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-------+-+-------------+-------------------------------+
    |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
    |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
    |N|V|V|V|       |S|             |   (if payload len==126/127)   |
    | |1|2|3|       |K|             |                               |
    +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
    |     Extended payload length continued, if payload len == 127  |
    + - - - - - - - - - - - - - - - +-------------------------------+
    |                               |Masking-key, if MASK set to 1  |
    +-------------------------------+-------------------------------+
    | Masking-key (continued)       |          Payload Data         |
    +-------------------------------- - - - - - - - - - - - - - - - +
    :                     Payload Data continued ...                :
    + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
    |                     Payload Data continued ...                |
    +---------------------------------------------------------------+

Transferred payloads are encoded and decoded according to this format.
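To make the frame layout concrete, here is a minimal Python sketch that decodes one complete frame held in memory. The names `Frame` and `decode_frame` are hypothetical, and the sketch deliberately ignores the RSV bits, fragmentation, and partial reads from a socket. The two sample frames in the usage lines are the masked and unmasked "Hello" text frames given as examples in RFC 6455.

```python
import struct
from typing import NamedTuple


class Frame(NamedTuple):
    fin: bool       # True if this is the final fragment of a message
    opcode: int     # 0x1 text, 0x2 binary, 0x8 close, 0x9 ping, 0xA pong
    payload: bytes  # payload data, already unmasked


def decode_frame(data: bytes) -> Frame:
    """Decode a single, complete WebSocket frame (illustrative sketch)."""
    if len(data) < 2:
        raise ValueError("a frame has at least 2 header bytes")
    fin = bool(data[0] & 0x80)     # first bit of byte 0
    opcode = data[0] & 0x0F        # low 4 bits of byte 0
    masked = bool(data[1] & 0x80)  # MASK bit
    length = data[1] & 0x7F        # 7-bit payload length
    offset = 2
    if length == 126:              # next 2 bytes hold the real length
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
    elif length == 127:            # next 8 bytes hold the real length
        (length,) = struct.unpack_from("!Q", data, offset)
        offset += 8
    key = b""
    if masked:                     # 4-byte masking key precedes the payload
        key = data[offset:offset + 4]
        offset += 4
    payload = data[offset:offset + length]
    if len(payload) != length or (masked and len(key) != 4):
        raise ValueError("incomplete frame")
    if masked:
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return Frame(fin, opcode, payload)


if __name__ == "__main__":
    print(decode_frame(bytes.fromhex("818537fa213d7f9f4d5158")))  # masked, from a client
    print(decode_frame(bytes.fromhex("810548656c6c6f")))          # unmasked, from a server
```

Both calls yield a final text frame (opcode 0x1) with the payload `b'Hello'`. A real server would read from the socket incrementally and buffer until a whole frame is available before decoding it.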
Pings and Pongs: The Heartbeat of WebSockets
At any point after the handshake, either the client or the server can choose to send a ping to the other party. When the ping is received, the recipient must send back a pong as soon as possible. You can use this to make sure that the client is still connected, for example. A ping or pong is just a regular frame, but it's a control frame. Pings have an opcode of 0x9, and pongs have an opcode of 0xA. When you get a ping, send back a pong with the exact same Payload Data as the ping (for pings and pongs, the max payload length is 125). You might also get a pong without ever sending a ping; ignore this if it happens.
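The ping/pong rule described above can be sketched from the server's side like this. The function name `build_pong` is hypothetical; it assumes the ping payload has already been extracted and unmasked, and it produces an unmasked frame because the masking requirement applies to frames sent by the client.

```python
PING_OPCODE = 0x9
PONG_OPCODE = 0xA
MAX_CONTROL_PAYLOAD = 125  # control frames cannot use the extended length fields


def build_pong(ping_payload: bytes) -> bytes:
    """Build an unmasked pong frame echoing the payload of a received ping."""
    if len(ping_payload) > MAX_CONTROL_PAYLOAD:
        raise ValueError("control frame payload must be at most 125 bytes")
    first_byte = 0x80 | PONG_OPCODE    # FIN=1, opcode=0xA
    second_byte = len(ping_payload)    # MASK=0, 7-bit length
    return bytes([first_byte, second_byte]) + ping_payload


if __name__ == "__main__":
    print(build_pong(b"Hello").hex())  # 8a0548656c6c6f
```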
Closing the connection
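To close the connection, either the client or the server sends a Close control frame (opcode 0x8) to begin the closing handshake. The frame may carry a 2-byte status code followed by a UTF-8 reason, and the other peer responds with a Close frame of its own.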
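As a sketch of what such a closing control frame can look like on the wire: in RFC 6455 it uses opcode 0x8, and its payload may start with a 2-byte status code in network byte order (1000 means normal closure) followed by a UTF-8 reason. The function name `build_close_frame` is hypothetical, and the frame is built unmasked, as a server would send it.

```python
import struct

CLOSE_OPCODE = 0x8
NORMAL_CLOSURE = 1000  # status code for a normal close in RFC 6455


def build_close_frame(code: int = NORMAL_CLOSURE, reason: str = "") -> bytes:
    """Build an unmasked Close frame with a status code and optional reason."""
    payload = struct.pack("!H", code) + reason.encode("utf-8")
    if len(payload) > 125:
        raise ValueError("control frame payload must be at most 125 bytes")
    return bytes([0x80 | CLOSE_OPCODE, len(payload)]) + payload


if __name__ == "__main__":
    print(build_close_frame(1000, "bye").hex())  # 880503e8627965
```

After sending this frame, the sender should stop sending data frames and wait for the peer's Close frame before closing the TCP connection.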
You can find a standalone Java server and a simple JavaScript client here: https://gist.github.com/patrikbego/447c40f8eaee709d7a6a2d2cadbfd046 https://gist.github.com/patrikbego/acd1eb069195c78d9db7edb8a0eb6a3c
Refs: https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API/Writing_WebSocket_servers https://stackoverflow.com/questions/29925955/what-are-the-pitfalls-of-using-websockets-in-place-of-restful-http https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API/Writing_a_WebSocket_server_in_Java