Introduction
This is a summary of HTTP-related content after reading Xiao Jia’s book “HTTP Packet Sniffing Practice,” focusing mainly on message structures. (Note: Reading the book and writing this took 5 days).
HTTP Message Structure
There are two types of HTTP messages: Request and Response.
HTTP Request
An HTTP request consists of three parts: the Request Line, the Request Headers, and the Body.
- The first line (Request Line) contains the Method, URI, and protocol version. Example: GET https://blog.yexca.net/ HTTP/2
- The second part is the Headers.
- The third part is the Body.
Note: There is an empty line between the Headers and the Body.
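To make the three parts concrete, here is a minimal sketch that splits a raw request into request line, headers, and body. It assumes the textual HTTP/1.x wire format (HTTP/2 sends the same information in binary frames), and the function name parse_request, the host example.com, and the form data are made up for illustration.

```python
def parse_request(raw: str):
    """Split a raw HTTP/1.x request into request line, headers, and body."""
    # The first empty line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Request Line: Method, URI, protocol version.
    method, uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers, body


# Hypothetical request used only for this example.
raw_request = (
    "POST /login HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "Content-Length: 19\r\n"
    "\r\n"
    "user=alice&pw=12345"
)

method, uri, version, headers, body = parse_request(raw_request)
```

A real server would also handle repeated headers, chunked bodies, and malformed input; this sketch only shows where the boundaries are.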
HTTP Response
The structure of an HTTP response is basically the same as a request, also divided into three parts: the Response Line (Status Line), the Response Headers, and the Body.
- The first line contains the protocol version, status code, and status message. Example: HTTP/2 200
- The second part is the Headers.
- The third part is the Body.
Note: There is an empty line between the Headers and the Body.
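The status line can be taken apart the same way. This is only a sketch; parse_status_line is a hypothetical helper, and it treats the status message as optional because HTTP/2 responses (like the example above) carry no message.

```python
def parse_status_line(line: str):
    """Return (version, status_code, status_message) from a status line."""
    # maxsplit=2 keeps multi-word messages such as "Not Found" intact.
    parts = line.split(" ", 2)
    version = parts[0]
    code = int(parts[1])
    message = parts[2] if len(parts) > 2 else ""
    return version, code, message
```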
HTTP Methods and Status Codes
URL Format
URL stands for Uniform Resource Locator. It’s used to describe the address of a resource on the Internet.
The basic format of a URL is:
scheme://host[:port#]/path/.../[?query-string][#anchor]
| Attribute | Description |
|---|---|
| scheme | Specifies the underlying protocol (e.g., http, https, ftp) |
| host | The IP address or domain name of the HTTP server |
| port# | The default port for HTTP is 80 (443 for HTTPS); it can be omitted when the default is used. Otherwise, it must be specified |
| path | The path to the resource |
| query-string | Data sent to the HTTP server |
| anchor | A fragment identifier pointing to a location within the page; the browser does not send it to the server |
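As a quick illustration of the table above, Python's standard urllib.parse can split a URL into these parts. The URL itself is a made-up example.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL that exercises every part of the format.
parts = urlsplit("https://example.com:8443/docs/guide.html?lang=en&page=2#install")

protocol = parts.scheme            # protocol part
host = parts.hostname              # domain name or IP address
port = parts.port                  # None when the URL omits the port
path = parts.path                  # path to the resource
query = parse_qs(parts.query)      # query-string as a dict of lists
anchor = parts.fragment            # stays in the browser, not sent to the server
```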
HTTP Request Methods
| No. | Method | Description |
|---|---|---|
| 1 | GET | Requests a specific resource and returns the entity body |
| 2 | HEAD | Similar to GET, but the response has no body; used to retrieve headers |
| 3 | POST | Submits data to be processed (e.g., form submission or file upload). Data is included in the body. May create new resources or modify existing ones |
| 4 | PUT | Replaces the content of the target resource with the uploaded data |
| 5 | DELETE | Requests the server to delete the specified resource |
GET vs. POST
- GET data is appended to the URL after a ? (the query-string, as key-value pairs), and parameters are joined with &. POST puts data in the HTTP message Body.
- GET is limited in size in practice (browsers and servers cap URL length). HTTP itself sets no size limit on POST data, although servers may impose one.
- In code (ASP.NET, for example), GET typically uses Request.QueryString to get variables, while POST uses Request.Form.
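The difference in where the data travels can be sketched with the standard library. Nothing is sent over the network here; the requests are only constructed, and the URL and parameters are invented for the example.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"q": "http cache", "page": 2}
encoded = urlencode(params)  # key=value pairs joined with &

# GET: the data goes into the URL after "?".
get_req = Request("http://example.com/search?" + encoded)

# POST: the same data goes into the body; the URL stays clean.
post_req = Request("http://example.com/search", data=encoded.encode("ascii"))
```

urllib picks the method from the presence of a body: get_req.get_method() reports GET and post_req.get_method() reports POST.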
HTTP Status Codes
Status codes are in the HTTP response. The server uses them to tell the client what happened. They are divided into five categories.
| Status Code | Defined Range | Category |
|---|---|---|
| 1XX | 100–101 | Informational: Request received, continue processing |
| 2XX | 200–206 | Success: Request successfully received, understood, and accepted |
| 3XX | 300–305 | Redirection: Further action needed to complete the request |
| 4XX | 400–415 | Client Error: Request has bad syntax or cannot be fulfilled |
| 5XX | 500–505 | Server Error: Server failed to fulfill a valid request |
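Since the category is just the first digit, it can be derived with integer division. A small sketch; CATEGORIES, category, and describe are names chosen for this example, while HTTPStatus comes from Python's standard http module.

```python
from http import HTTPStatus

CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def category(code: int) -> str:
    """Map a status code to its category via the leading digit."""
    return CATEGORIES.get(code // 100, "Unknown")


def describe(code: int) -> str:
    # HTTPStatus raises ValueError for codes it does not know.
    return f"{code} {HTTPStatus(code).phrase}: {category(code)}"
```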
Common Status Codes
| Status Code | Meaning |
|---|---|
| 200 | OK: Server successfully processed the request |
| 301/302 | Moved Permanently / Found (Redirect): The URL has moved. The response should contain a Location URL |
| 304 | Not Modified: Client’s cached version is up to date; use the cache |
| 404 | Not Found: Resource not found |
| 401 | Unauthorized: Authentication required |
| 500 | Internal Server Error: The server encountered an unexpected error |
206 (Partial Content)
The 206 status code means the server successfully processed a partial GET request. This is used for resumable downloads or online video streaming.
Example of a video request:
- Browser sends GET with Header: Range: bytes=5303296-5336063.
- Server returns 206 with Header: Content-Range: bytes 5303296-5336063/12129376.
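The relationship between the two headers can be sketched as follows. This is a simplification that assumes a single bytes=start-end range (open-ended ranges like bytes=100- are accepted, suffix ranges like bytes=-500 and multi-range requests are not); content_range is a hypothetical helper.

```python
def content_range(range_header: str, total: int):
    """Return (Content-Range value, number of bytes in the chunk)."""
    unit, _, spec = range_header.partition("=")
    if unit != "bytes" or "," in spec:
        raise ValueError("only a single bytes range is handled in this sketch")
    start_text, _, end_text = spec.partition("-")
    start = int(start_text)
    # An open-ended range such as "bytes=100-" runs to the last byte.
    end = int(end_text) if end_text else total - 1
    # Byte positions are zero-based and inclusive, so the last one is total - 1.
    end = min(end, total - 1)
    if start > end:
        raise ValueError("range not satisfiable")
    return f"bytes {start}-{end}/{total}", end - start + 1


value, length = content_range("bytes=5303296-5336063", 12129376)
```

With the numbers from the example above, the chunk is 32768 bytes (32 KiB) out of a 12129376-byte file.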
301 vs. 302 (Redirects)
After receiving 301 or 302, the browser makes a new request to the URL in the Location header.
- 301: Old address is permanently moved. Search engines transfer authority to the new URL. (e.g., switching domains).
- 302: Old address still exists; the redirect is temporary. (e.g., redirecting to a login page).
304 (Not Modified)
Indicates the cached version is still valid and can be used.
400 (Bad Request)
Syntax error in the client request. The server cannot understand it (e.g., malformed form data or corrupted Cookies).
401 (Unauthorized)
Authentication error. Used when HTTP Basic Authentication is required but the Authorization header is missing or invalid.
404 (Not Found)
Server cannot find the resource. Can also be used to hide a resource (refuse request without reason). For example, BV1AB4y1D7Ft is only visible if logged in and favorited; otherwise, it returns 404.
403 (Forbidden)
The server understood the request but refuses to authorize it. Unlike 401, re-authenticating makes no difference.
500 (Internal Server Error)
Generic server error. Could be code bugs, DB connection issues, uncaught exceptions, or null pointers.
503 (Service Unavailable)
Server is temporarily unable to handle the request (overloaded or down for maintenance).
Full list of status codes
Visit: HTTP Status Codes - Runoob
HTTP Headers
Headers use the “key: value” format, one per line.
Cache-related Headers
Both requests and responses use headers for caching. Caching allows retrieving files from local storage instead of the original server.
Cookie
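A small piece of data that the server asks the browser to store; it is client-side storage rather than a cache in the strict sense. Key-value format (e.g., ip_country=CN). Browsers send them via the “Cookie” header; servers set them via “Set-Cookie”.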
Accept
Media types the client can handle. Accept: text/html means the browser wants HTML. * is a wildcard.
Accept-Encoding
Related to compression. The browser tells the server which compression algorithms it supports (e.g., Accept-Encoding: gzip, deflate).
Accept-Language
The languages the client understands (e.g., Accept-Language: en-US,en;q=0.8,zh-CN;q=0.6). Note: language ≠ character set.
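The q values are weights between 0 and 1; an entry without q counts as 1. A minimal sketch of how a server might rank the languages, assuming well-formed input (parse_accept_language is a made-up name):

```python
def parse_accept_language(value: str):
    """Return (language, weight) pairs sorted by descending weight."""
    ranked = []
    for item in value.split(","):
        tag, _, params = item.strip().partition(";")
        weight = 1.0  # default when no q parameter is given
        params = params.strip()
        if params.startswith("q="):
            weight = float(params[2:])
        ranked.append((tag.strip(), weight))
    # sorted() is stable, so equal weights keep their original order.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


preferences = parse_accept_language("en-US,en;q=0.8,zh-CN;q=0.6")
```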
User-Agent
Identifies the client’s OS, browser version, engine, etc. Example: User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:103.0) Gecko/20100101 Firefox/103.0.
Referer
Indicates the source page the user came from. Used for analytics and hotlink protection.
Connection
HTTP/1.1 uses persistent connections by default, which is equivalent to sending Connection: Keep-Alive. The TCP connection stays open and is reused for multiple requests.
Host
Specifies the target host and port. The port is usually omitted when it is the default (80 for HTTP).
HTTP Caching
Caching exists at the browser, server, and proxy levels. It reduces redundant data, saves time, and lowers server load.
Judging Cache Freshness
The server checks cache validity in two ways:
- Last-Modified / If-Modified-Since: Browser sends the last modification time. If the file hasn’t changed, the server returns 304.
- ETag / If-None-Match: Browser sends the ETag (a unique identifier, typically a hash of the file content) of its cached copy. If it still matches, the server returns 304.
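The ETag check can be sketched from the server's side. This is an illustration under assumptions: make_etag and respond are hypothetical helpers, and hashing the content with SHA-256 is just one possible way to produce an ETag (real servers choose their own scheme).

```python
import hashlib


def make_etag(content: bytes) -> str:
    # ETag values are quoted strings; the digest here is an arbitrary choice.
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def respond(content: bytes, request_headers: dict):
    """Return (status, headers, body) for a GET on a resource."""
    etag = make_etag(content)
    if request_headers.get("If-None-Match") == etag:
        # The cached copy is still current: no body is sent.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, content


first = respond(b"hello", {})
cached_etag = first[1]["ETag"]
second = respond(b"hello", {"If-None-Match": cached_etag})
third = respond(b"changed", {"If-None-Match": cached_etag})
```

The first request gets 200 with the full body, the second gets 304 with an empty body, and the third gets 200 again because the content (and therefore the ETag) changed.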
Cache Headers
- Request Headers
| Name | Description |
|---|---|
| Cache-Control: max-age=0 | Force revalidation |
| If-Modified-Since | Last modification time of the cached file |
| If-None-Match | ETag value of the cached file |
| Cache-Control: no-cache | Do not use cache |
| Pragma: no-cache | Do not use cache (Legacy) |
- Response Headers
| Name | Description |
|---|---|
| Cache-Control: public | Can be cached by public proxies/users |
| Cache-Control: private | Cache only for specific user |
| Cache-Control: no-cache | Must validate with server before using cache |
| Cache-Control: no-store | Never cache (for sensitive data) |
| Cache-Control: max-age=60 | Cache expires in 60s (relative) |
| Date | Time response was sent |
| Expires | Absolute expiration time |
| Last-Modified | Last modification time on server |
| ETag | Unique hash of the server file |
Note: Cache-Control takes precedence over Expires.
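The precedence rule can be shown with a simplified freshness check. This is a sketch, not how any particular browser implements it: usable_without_revalidation is a made-up function, it looks only at the headers listed above, and it assumes Date is present whenever max-age is.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def usable_without_revalidation(headers: dict, now: datetime) -> bool:
    """Decide whether a cached response may be used without asking the server."""
    cache_control = headers.get("Cache-Control", "")
    if "no-store" in cache_control or "no-cache" in cache_control:
        return False
    match = re.search(r"max-age=(\d+)", cache_control)
    if match:
        # max-age is relative to the time the response was sent (Date)
        # and wins over Expires when both are present.
        sent = parsedate_to_datetime(headers["Date"])
        return now < sent + timedelta(seconds=int(match.group(1)))
    if "Expires" in headers:
        return now < parsedate_to_datetime(headers["Expires"])
    return False


# Hypothetical response headers: max-age says 60 s, Expires says one hour.
cached = {
    "Date": "Mon, 01 Jan 2024 00:00:00 GMT",
    "Cache-Control": "max-age=60",
    "Expires": "Mon, 01 Jan 2024 01:00:00 GMT",
}
after_30s = datetime(2024, 1, 1, 0, 0, 30, tzinfo=timezone.utc)
after_2min = datetime(2024, 1, 1, 0, 2, 0, tzinfo=timezone.utc)
```

After two minutes the entry counts as stale even though Expires is still in the future, because max-age takes precedence.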
ETag
Entity Tag. A hash string representing the file state. It solves issues where Last-Modified isn’t precise enough (it only goes down to the second) or where modification times change but content doesn’t.
Forcing No Cache
Ctrl+Shift+R forces a refresh, adding Cache-Control: no-cache to the request.
Direct Cache Use
Typing the URL in the address bar usually results in a “cache hit” if valid, without even checking the server.
Compression and URL Encoding
HTTP compression reduces the size of text content (HTML, JS, CSS) during transfer.
Compression Process
- Browser sends Accept-Encoding: gzip, deflate.
- Server generates response, compresses the Body with gzip, updates Content-Length, and adds Content-Encoding: gzip.
- Browser receives response and decompresses it.
Note: Browsers usually don’t compress requests.
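The server and browser sides of this process can be imitated with Python's gzip module. The HTML body is invented for the example; repetitive text like this compresses very well.

```python
import gzip

# Server side: compress the body and describe it in the headers.
body = b"<html>" + b"<p>hello</p>" * 200 + b"</html>"
compressed = gzip.compress(body)
response_headers = {
    "Content-Encoding": "gzip",
    "Content-Length": str(len(compressed)),  # length of the compressed body
}

# Browser side: Content-Encoding says how to restore the original body.
if response_headers.get("Content-Encoding") == "gzip":
    restored = gzip.decompress(compressed)
else:
    restored = compressed
```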
Encoding Types
| Encoding | Description |
|---|---|
| gzip | GNU zip |
| compress | UNIX compress |
| deflate | zlib format |
| identity | No encoding (Default) |
Of these, gzip is the most widely used.
Deep Dive into Cookies
HTTP is stateless. Each request is independent. Sessions solve this by maintaining state between the browser and server.
Sessions vs. Cookies
- Server creates a session and sends the Session ID to the browser.
- Browser stores the ID and sends it back in subsequent requests.
- Server identifies the user via the ID.
Cookies are the mechanism browsers use to store this ID.
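The three steps above can be sketched with an in-memory session store. Everything here is illustrative: SESSIONS, login, and current_user are hypothetical names, the cookie is called session_id by assumption, and a real application would add expiry and persistent storage.

```python
import secrets

SESSIONS = {}  # session ID -> data kept on the server


def login(username: str) -> dict:
    """Create a session and return the response header that sets the cookie."""
    session_id = secrets.token_hex(16)  # random, hard-to-guess ID
    SESSIONS[session_id] = {"user": username}
    return {"Set-Cookie": f"session_id={session_id}; Path=/; HttpOnly"}


def current_user(request_headers: dict):
    """Look up the user from the Cookie header of a later request."""
    cookie = request_headers.get("Cookie", "")
    for pair in cookie.split(";"):
        name, _, value = pair.strip().partition("=")
        if name == "session_id":
            return SESSIONS.get(value, {}).get("user")
    return None


response = login("alice")
# The browser stores the name=value part and sends it back on later requests.
stored = response["Set-Cookie"].split(";")[0]
user = current_user({"Cookie": stored})
```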
What are Cookies?
Small data stored as key=value pairs, separated by semicolons. Primarily used for authentication, user preferences, and ad tracking. Some regions (EU) have laws requiring user consent for cookies.
Cookie Attributes
- Expires: When it expires. If omitted, it’s a session cookie (deleted when browser closes).
- Path: The scope of the cookie. / means the whole site.
- HttpOnly: Prevents JavaScript from reading the cookie. Essential for security against XSS.
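These attributes end up in the Set-Cookie header. A short sketch using Python's standard http.cookies module; the cookie name theme, its value, and the expiry date are invented.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["theme"] = "dark"
cookie["theme"]["path"] = "/"                                   # whole site
cookie["theme"]["expires"] = "Wed, 01 Jan 2025 00:00:00 GMT"    # persistent
cookie["theme"]["httponly"] = True                              # hidden from JavaScript

# Value for a Set-Cookie response header.
header_value = cookie["theme"].OutputString()

# Leaving out Expires gives a session cookie.
session_cookie = SimpleCookie()
session_cookie["cart"] = "42"
session_value = session_cookie["cart"].OutputString()
```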
Categories
- Session Cookie: Temporary, stored in memory.
- Persistent Cookie: Stored on disk with an expiration date. Used for “auto-login.”
HTTP Basic Authentication
Used by some desktop apps and routers. The client sends username:password encoded in Base64 via the Authorization header.
Process:
- Server returns 401 and a WWW-Authenticate header.
- Browser prompts for credentials.
- Browser sends Base64 encoded credentials.
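What the browser sends in the last step can be reproduced in a few lines. The helper names are made up, and the credentials are sample values; decoding is included to show that Base64 is an encoding, not encryption.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value for Basic Authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth(header_value: str):
    """Recover (username, password); anyone who sees the header can do this."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


auth_value = basic_auth_header("Aladdin", "open sesame")
```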
Disadvantages
- Stateless: Every request must be authenticated.
- Insecure: Base64 is easily reversed. Must use HTTPS.
- No Logout: Cannot log out without closing the browser or clearing the cached credentials.
- Vulnerable to replay attacks.
Digest Authentication
An improved version of Basic Auth. It uses hashes (MD5) and a “nonce” (number used once) from the server, so the password is never sent in clear text and replay attacks become harder.
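A rough sketch of how the client computes its answer, assuming the common MD5 variant with qop=auth as described in RFC 2617. The function names are made up; in practice the realm and nonce come from the server's WWW-Authenticate header, while cnonce and nc are chosen by the client.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    """Compute the 'response' field of a Digest Authorization header."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")   # who the user is
    ha2 = md5_hex(f"{method}:{uri}")                  # what is being requested
    # The server nonce ties the hash to this exchange, so the password
    # itself never travels over the wire.
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


# Sample values taken from the example in RFC 2617.
answer = digest_response(
    username="Mufasa",
    realm="testrealm@host.com",
    password="Circle Of Life",
    method="GET",
    uri="/dir/index.html",
    nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
    nc="00000001",
    cnonce="0a4f113b",
)
```

The server performs the same computation with its stored credentials and compares the results.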