HTTP Learning

📢 This article was translated by gemini-3-flash-preview

Introduction

This is a summary of HTTP-related content after reading Xiao Jia’s book “HTTP Packet Sniffing Practice,” focusing mainly on message structures. (Note: Reading the book and writing this took 5 days).

HTTP Message Structure

There are two types of HTTP messages: Request and Response.

HTTP Request

An HTTP request consists of three parts: the Request Line, the Request Headers, and the Body.

The first line (Request Line) contains the Method, URI, and protocol version. Example: GET https://blog.yexca.net/ HTTP/2
The second part is the Headers.
The third part is the Body.

Note: There is an empty line between the Headers and the Body.
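The three-part layout can be sketched in a few lines of Python. This is only an illustration of the HTTP/1.1 text format; the `build_request` helper, the `/login` path, and the form body are made up for the example.

```python
def build_request(method, target, headers, body=b""):
    """Serialize an HTTP/1.1 request: request line, headers, empty line, body."""
    lines = [f"{method} {target} HTTP/1.1"]                   # part 1: request line
    lines += [f"{name}: {value}" for name, value in headers.items()]  # part 2: headers
    head = "\r\n".join(lines) + "\r\n\r\n"                    # the empty line ends the headers
    return head.encode("ascii") + body                        # part 3: body


# Hypothetical request used only for illustration.
raw = build_request(
    "POST",
    "/login",
    {"Host": "blog.yexca.net", "Content-Length": "9"},
    b"user=demo",
)
print(raw.decode("ascii"))
```

Splitting `raw` on the first `\r\n\r\n` gives back the header section and the body, which is exactly how a receiver finds where the headers end.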

HTTP Response

The structure of an HTTP response is basically the same as a request, also divided into three parts: the Response Line (Status Line), the Response Headers, and the Body.

The first line contains the protocol version, status code, and status message. Example: HTTP/2 200
The second part is the Headers.
The third part is the Body.

Note: There is an empty line between the Headers and the Body.
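Going the other way, a response can be taken apart along the same boundaries. The sketch below assumes an HTTP/1.1-style text response held entirely in memory; `parse_response` is a hypothetical helper, not a library function.

```python
def parse_response(raw: bytes):
    """Split a raw response into status line parts, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")        # empty line separates headers from body
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, *reason = status_line.split(" ", 2)  # the status message may be absent
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()   # header names are case-insensitive
    return version, int(code), (reason[0] if reason else ""), headers, body


sample = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<p>hi</p>"
print(parse_response(sample))
```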

HTTP Methods and Status Codes

URL Format

URL stands for Uniform Resource Locator. It’s used to describe the address of a resource on the Internet.

The basic format of a URL is:

scheme://host[:port]/path/.../[?query-string][#anchor]

Attribute	Description
scheme	Specifies the underlying protocol (e.g., http, https, ftp)
host	The IP address or domain name of the HTTP server
port	The default port for HTTP is 80 (443 for HTTPS); it can be omitted if using the default. Otherwise, it must be specified
path	The path to the resource
query-string	Data sent to the HTTP server, as key=value pairs
anchor	A fragment identifier that points to a location inside the page; it is not sent to the server
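To see these parts on a concrete URL, Python's standard `urllib.parse` module can split one. The URL below is invented for the example (the port, path, query, and anchor are assumptions).

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL that exercises every part of the format.
parts = urlsplit("https://blog.yexca.net:8443/posts/http?tag=web&page=2#cache")

print(parts.scheme)           # protocol
print(parts.hostname)         # host
print(parts.port)             # port (None when omitted)
print(parts.path)             # path
print(parse_qs(parts.query))  # query-string as a dict of lists
print(parts.fragment)         # anchor
```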

HTTP Request Methods

No.	Method	Description
1	GET	Requests a specific resource and returns the entity body
2	HEAD	Similar to GET, but the response has no body; used to retrieve headers
3	POST	Submits data to be processed (e.g., form submission or file upload). Data is included in the body. May create new resources or modify existing ones
4	PUT	Replaces the content of the target resource with the uploaded data
5	DELETE	Requests the server to delete the specified resource

GET vs. POST

GET data is appended to the URL, separated by a ? (query-string, key-value pairs), and parameters are connected with &. POST puts data in the HTTP message Body.
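GET has practical size limits, because browsers and servers cap URL length. POST has no size limit in the protocol itself, although a server may impose one on the body.
In ASP.NET code, for example, GET parameters are read from Request.QueryString, while POST form fields are read from Request.Form.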
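The difference in where the data travels can be shown without touching the network. This sketch uses Python's standard `urllib`; the `example.com` URL and the parameters are placeholders, and no request is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"q": "http cache", "page": 2}
encoded = urlencode(params)  # key=value pairs joined with &

# GET: the data is appended to the URL after "?".
get_req = Request("https://example.com/search?" + encoded)

# POST: the same data goes into the message body instead.
post_req = Request(
    "https://example.com/search",
    data=encoded.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```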

HTTP Status Codes

Status codes are in the HTTP response. The server uses them to tell the client what happened. They are divided into five categories.

Status Code	Defined Range	Category
1XX	100–101	Informational: Request received, continue processing
2XX	200–206	Success: Request successfully received, understood, and accepted
3XX	300–305	Redirection: Further action needed to complete the request
4XX	400–415	Client Error: Request has bad syntax or cannot be fulfilled
5XX	500–505	Server Error: Server failed to fulfill a valid request
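Because the category is determined by the first digit, a client can classify any code with integer division. A minimal sketch, assuming Python's standard `http.HTTPStatus` enum for the status message; the `describe` helper is made up for the example.

```python
from http import HTTPStatus

CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code):
    """Return (category, status message) for a numeric status code."""
    category = CATEGORIES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # a code the standard library does not know
        phrase = ""
    return category, phrase


print(describe(404))
print(describe(206))
```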

Common Status Codes

Code	Meaning
200	OK: Server successfully processed the request
301/302	Moved Permanently / Found (Redirect): The URL has moved. The response should contain a `Location` URL
304	Not Modified: Client’s cached version is up to date; use the cache
404	Not Found: Resource not found
401	Unauthorized: Authentication required
501	Not Implemented: The server does not support the functionality needed to fulfill the request. (A generic server-side failure is reported as 500 Internal Server Error.)

206 (Partial Content)

The 206 status code means the server successfully processed a partial GET request. This is used for resumable downloads or online video streaming.

Example of a video request:

Browser sends GET with Header: Range: bytes=5303296-5336063.
Server returns 206 with Header: Content-Range: bytes 5303296-5336063/12129376.
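The server side of that exchange might look roughly like the following. It is a simplified sketch that handles a single byte range only; `serve_range` is a hypothetical function, and real servers also deal with multiple ranges and conditional headers.

```python
def serve_range(data: bytes, range_header: str):
    """Return (status, headers, body) for a single-range request."""
    unit, _, spec = range_header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        return 200, {}, data                       # unsupported form: send everything
    start_s, _, end_s = spec.strip().partition("-")
    total = len(data)
    if start_s == "":                              # "bytes=-N": the last N bytes
        start, end = max(total - int(end_s), 0), total - 1
    else:
        start = int(start_s)
        end = min(int(end_s), total - 1) if end_s else total - 1
    if start > end or start >= total:
        return 416, {"Content-Range": f"bytes */{total}"}, b""
    chunk = data[start:end + 1]                    # HTTP ranges are inclusive
    headers = {
        "Content-Range": f"bytes {start}-{end}/{total}",
        "Content-Length": str(len(chunk)),
    }
    return 206, headers, chunk


data = bytes(range(100))
print(serve_range(data, "bytes=10-19")[:2])
```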

301 vs. 302 (Redirects)

After receiving 301 or 302, the browser makes a new request to the URL in the Location header.

301: Old address is permanently moved. Search engines transfer authority to the new URL. (e.g., switching domains).
302: Old address still exists; the redirect is temporary. (e.g., redirecting to a login page).

304 (Not Modified)

Indicates the cached version is still valid and can be used.

400 (Bad Request)

Syntax error in the client request. The server cannot understand it (e.g., malformed form data or corrupted Cookies).

401 (Unauthorized)

Authentication error. Used when HTTP Basic Authentication is required but the Authorization header is missing or invalid.

404 (Not Found)

Server cannot find the resource. Can also be used to hide a resource (refuse request without reason). For example, BV1AB4y1D7Ft is only visible if logged in and favorited; otherwise, it returns 404.

403 (Forbidden)

The server understood the request but refuses to authorize it. Unlike 401, re-authenticating makes no difference.

500 (Internal Server Error)

Generic server error. Could be code bugs, DB connection issues, uncaught exceptions, or null pointers.

503 (Service Unavailable)

Server is temporarily unable to handle the request (overloaded or down for maintenance).

Full list of status codes

Visit: HTTP Status Codes - Runoob

HTTP Headers

Headers use the “key: value” format, one per line.

Cache-related Headers

Both requests and responses carry headers that control caching. A cache lets the client load a file from local storage instead of fetching it again from the origin server.

Cookie

A small piece of data that the browser stores on behalf of a site, in key=value format (e.g., ip_country=CN). Browsers send cookies via the “Cookie” header; servers set them via “Set-Cookie”.

Accept

Media types the client can handle. Accept: text/html means the browser wants HTML. * is a wildcard.

Accept-Encoding

Related to compression. The browser tells the server which compression algorithms it supports (e.g., Accept-Encoding: gzip, deflate).

Accept-Language

The languages the client understands (e.g., Accept-Language: en-US,en;q=0.8,zh-CN;q=0.6). Note: language is not the same as character set.
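The `q` values are weights between 0 and 1, and an entry without one counts as 1. A server could rank the client's preferences along these lines; `parse_accept_language` is a hypothetical helper that ignores edge cases such as malformed weights.

```python
def parse_accept_language(value):
    """Return (language, weight) pairs, most preferred first."""
    prefs = []
    for item in value.split(","):
        tag, _, params = item.strip().partition(";")
        params = params.strip()
        q = float(params[2:]) if params.startswith("q=") else 1.0  # default weight is 1
        prefs.append((tag.strip(), q))
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


print(parse_accept_language("en-US,en;q=0.8,zh-CN;q=0.6"))
```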

User-Agent

Identifies the client’s OS, browser version, engine, etc. Example: User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:103.0) Gecko/20100101 Firefox/103.0.

Referer

Indicates the source page the user came from. Used for analytics and hotlink protection.

Connection

In HTTP/1.1, persistent connections are the default, which is equivalent to sending Connection: keep-alive. The TCP connection stays open and is reused for multiple requests.

Host

Specifies the target host and port. The port can be omitted when it is the default for the scheme (80 for HTTP).

HTTP Caching

Caching exists at the browser, server, and proxy levels. It reduces redundant data, saves time, and lowers server load.

Judging Cache Freshness

The server checks cache validity in two ways:

Last-Modified / If-Modified-Since: Browser sends the last modification time. If the file hasn’t changed, the server returns 304.
ETag / If-None-Match: Browser sends the ETag (a unique identifier, typically a hash of the file content) that it received earlier. If it still matches, the server returns 304.
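Both checks can be sketched as one server-side function. This is an illustration under simplifying assumptions: the ETag is a truncated SHA-256 of the content (servers are free to choose any scheme), and `make_etag` and `respond` are names invented for the example.

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime


def make_etag(content: bytes) -> str:
    # Hypothetical ETag scheme: a quoted, truncated hash of the content.
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def respond(content: bytes, mtime: int, request_headers: dict):
    """Return (status, headers, body), answering 304 when the client copy is current."""
    etag = make_etag(content)
    validators = {"ETag": etag, "Last-Modified": formatdate(mtime, usegmt=True)}
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_none_match is not None:                 # the ETag check wins when both are sent
        unchanged = if_none_match == etag
    elif if_modified_since is not None:
        unchanged = mtime <= parsedate_to_datetime(if_modified_since).timestamp()
    else:
        unchanged = False
    if unchanged:
        return 304, validators, b""               # no body: the client reuses its cache
    return 200, validators, content


status, headers, body = respond(b"hello", 1700000000, {})
print(status, headers)
```

On the next request the browser would echo `headers["ETag"]` in If-None-Match (or `headers["Last-Modified"]` in If-Modified-Since) and receive 304 as long as the file is unchanged.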

Cache Headers

Request Headers

Name	Description
Cache-Control: max-age=0	Force revalidation
If-Modified-Since	Last modification time of the cached file
If-None-Match	ETag value of the cached file
Cache-Control: no-cache	Do not use cache
Pragma: no-cache	Do not use cache (Legacy)

Response Headers

Name	Description
Cache-Control: public	Can be cached by public proxies/users
Cache-Control: private	Cache only for specific user
Cache-Control: no-cache	Must validate with server before using cache
Cache-Control: no-store	Never cache (for sensitive data)
Cache-Control: max-age=60	Cache expires in 60s (relative)
Date	Time response was sent
Expires	Absolute expiration time
Last-Modified	Last modification time on server
ETag	Unique hash of the server file

Note: Cache-Control takes precedence over Expires.
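The precedence rule can be made concrete with a small function that computes how long a response stays fresh. This is a simplified sketch, not a full implementation of the caching rules; `freshness_lifetime` is a hypothetical helper and it only looks at the headers listed above.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Seconds a response may be reused without asking the server, or 0."""
    directives = [d.strip().lower() for d in headers.get("Cache-Control", "").split(",") if d.strip()]
    if "no-store" in directives or "no-cache" in directives:
        return 0
    for directive in directives:
        if directive.startswith("max-age="):      # Cache-Control is checked first
            return max(int(directive[len("max-age="):]), 0)
    if "Expires" in headers and "Date" in headers:  # fall back to the absolute time
        delta = parsedate_to_datetime(headers["Expires"]) - parsedate_to_datetime(headers["Date"])
        return max(int(delta.total_seconds()), 0)
    return 0


both = {
    "Cache-Control": "max-age=60",
    "Date": "Mon, 01 Jan 2024 00:00:00 GMT",
    "Expires": "Mon, 01 Jan 2024 01:00:00 GMT",
}
print(freshness_lifetime(both))
```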

ETag

Entity Tag. A hash string representing the file state. It solves issues where Last-Modified isn’t precise enough (it only goes down to the second) or where modification times change but content doesn’t.

Forcing No Cache

Ctrl+Shift+R forces a refresh, adding Cache-Control: no-cache to the request.

Direct Cache Use

Typing the URL in the address bar usually results in a “cache hit” if valid, without even checking the server.

Compression and URL Encoding

HTTP compression reduces the size of text content (HTML, JS, CSS) during transfer.

Compression Process

Browser sends Accept-Encoding: gzip, deflate.
Server generates response, compresses the Body with gzip, updates Content-Length, and adds Content-Encoding: gzip.
Browser receives response and decompresses it.

Note: Browsers usually don’t compress requests.
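The server's step in this process can be sketched with Python's standard `gzip` module. `maybe_compress` is a made-up helper; it ignores `q` weights and only knows about gzip.

```python
import gzip


def maybe_compress(body: bytes, accept_encoding: str):
    """Compress the body with gzip if the client advertised support for it."""
    encodings = [e.split(";")[0].strip().lower() for e in accept_encoding.split(",")]
    headers = {}
    if "gzip" in encodings:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))    # length of what is actually sent
    return headers, body


html = b"<p>hello</p>" * 200
headers, payload = maybe_compress(html, "gzip, deflate")
print(headers, len(html))

# The browser's step: undo the encoding named in Content-Encoding.
restored = gzip.decompress(payload)
```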

Encoding Types

Encoding	Description
gzip	GNU zip
compress	UNIX compress
deflate	zlib format
identity	No encoding (Default)

gzip is the most widely used of these.

Deep Dive into Cookies

HTTP is stateless. Each request is independent. Sessions solve this by maintaining state between the browser and server.

Sessions vs. Cookies

Server creates a session and sends the Session ID to the browser.
Browser stores the ID and sends it back in subsequent requests.
Server identifies the user via the ID.

Cookies are the mechanism browsers use to store this ID.
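The three steps can be sketched with an in-memory session table. This is a toy illustration: the `SESSIONS` dict, the `session_id` cookie name, and the `login` and `current_user` helpers are all assumptions, and a real application would add expiry and persistent storage.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # session ID -> data kept on the server


def login(username):
    """Step 1: create a session and return the Set-Cookie header value."""
    session_id = secrets.token_hex(16)            # unguessable random ID
    SESSIONS[session_id] = {"user": username}
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    return cookie["session_id"].OutputString()


def current_user(cookie_header):
    """Step 3: look up the user from the Cookie header the browser sent back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    session = SESSIONS.get(morsel.value) if morsel else None
    return session["user"] if session else None


set_cookie = login("alice")       # server -> browser
print(current_user(set_cookie))   # browser -> server (step 2 is the browser storing it)
```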

What are Cookies?

Small data stored as key=value pairs, separated by semicolons. Primarily used for authentication, user preferences, and ad tracking. Some regions (EU) have laws requiring user consent for cookies.

Cookie Attributes

Expires: When it expires. If omitted, it’s a session cookie (deleted when browser closes).
Path: The scope of the cookie. / means the whole site.
HttpOnly: Prevents JavaScript from reading the cookie. Essential for security against XSS.
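These attributes travel in the same Set-Cookie header as the value. A short sketch with Python's standard `http.cookies` module, reusing the ip_country=CN example; the expiry date is an arbitrary value chosen for illustration.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["ip_country"] = "CN"
cookie["ip_country"]["path"] = "/"                                  # valid for the whole site
cookie["ip_country"]["expires"] = "Wed, 01 Jan 2025 00:00:00 GMT"   # arbitrary example date
cookie["ip_country"]["httponly"] = True                             # hidden from JavaScript

header_value = cookie["ip_country"].OutputString()
print("Set-Cookie:", header_value)
```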

HTTP Basic Authentication

Used by some desktop apps and routers. The client sends username:password encoded in Base64 via the Authorization header.

Process:

Server returns 401 and a WWW-Authenticate header.
Browser prompts for credentials.
Browser sends Base64 encoded credentials.
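The last step amounts to one Base64 call, which also shows why the scheme offers no secrecy on its own. In this sketch the helper names and the credentials are illustrative.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value the browser sends."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic(header):
    """Anyone who sees the header can reverse it just as easily."""
    _scheme, _, token = header.partition(" ")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


header = basic_auth_header("Aladdin", "open sesame")
print(header)
print(decode_basic(header))
```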

Disadvantages

Stateless: Every request must be authenticated.
Insecure: Base64 is easily reversed. Must use HTTPS.
No Logout: Cannot log out without closing the browser or clearing history.
Vulnerable to replay attacks.

Digest Authentication

An improved version of Basic Auth. It uses hashes (MD5) and a “nonce” (number used once) from the server to prevent password sniffing and replay attacks.
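The core idea can be sketched as follows. This shows only the simplest form of the calculation, without the optional `qop`, `nc`, and `cnonce` fields that later versions of the scheme add; the function names, credentials, realm, and nonce values are made up for the example.

```python
import hashlib


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Compute the value the client sends instead of the password."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")   # identifies the user
    ha2 = md5_hex(f"{method}:{uri}")                  # identifies the request
    return md5_hex(f"{ha1}:{nonce}:{ha2}")            # bound to the server's nonce


first = digest_response("alice", "example", "secret", "GET", "/private", "nonce-1")
second = digest_response("alice", "example", "secret", "GET", "/private", "nonce-2")
print(first)
print(second)
```

The password never crosses the wire, and because the server issues a fresh nonce, a captured response is not valid for a later challenge.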
