HTTP Learning

📢 This article was translated by gemini-3-flash-preview

Introduction

This is a summary of HTTP-related content after reading Xiao Jia’s book “HTTP Packet Sniffing Practice,” focusing mainly on message structures. (Note: Reading the book and writing this took 5 days).

HTTP Message Structure

There are two types of HTTP messages: Request and Response.

HTTP Request

An HTTP request consists of three parts: the Request Line, the Request Headers, and the Body.

  • The first line (Request Line) contains the Method, URI, and protocol version. Example: GET https://blog.yexca.net/ HTTP/2
  • The second part is the Headers.
  • The third part is the Body.

Note: There is an empty line between the Headers and the Body.
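To make the three parts concrete, here is a minimal sketch that splits a raw request into its request line, headers, and body. It assumes the textual HTTP/1.1 wire format (HTTP/2 uses binary framing instead), and the request itself, including the host example.com and the /login path, is made up for illustration.

```python
def parse_request(raw: str) -> dict:
    """Split a raw HTTP/1.1 request into request line, headers, and body."""
    # The first empty line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "target": target, "version": version,
            "headers": headers, "body": body}


# Hypothetical request, written out by hand for illustration.
raw_request = (
    "POST /login HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "Content-Length: 18\r\n"
    "\r\n"
    "user=alice&lang=en"
)

request = parse_request(raw_request)
print(request["method"], request["target"], request["version"])
print(request["headers"])
print(request["body"])
```

A real parser has to handle more cases (repeated headers, chunked bodies, bytes instead of text), so treat this only as a picture of the layout.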

HTTP Response

The structure of an HTTP response is basically the same as a request, also divided into three parts: the Response Line (Status Line), the Response Headers, and the Body.

  • The first line contains the protocol version, status code, and status message. Example: HTTP/2 200
  • The second part is the Headers.
  • The third part is the Body.

Note: There is an empty line between the Headers and the Body.
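The status line can be picked apart the same way. The sketch below is a hypothetical helper, not part of any library. It also shows why the example above has no status message: the reason phrase is optional, and HTTP/2 responses do not carry one.

```python
def parse_status_line(line: str) -> tuple[str, int, str]:
    """Return (protocol version, status code, status message)."""
    # Split at most twice so a multi-word message such as "Not Found" stays whole.
    version, code, *reason = line.split(" ", 2)
    return version, int(code), reason[0] if reason else ""


print(parse_status_line("HTTP/1.1 404 Not Found"))
print(parse_status_line("HTTP/2 200"))
```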

HTTP Methods and Status Codes

URL Format

URL stands for Uniform Resource Locator. It’s used to describe the address of a resource on the Internet.

The basic format of a URL is:

scheme://host[:port#]/path/.../[?query-string][#anchor]

| Attribute | Description |
| --- | --- |
| scheme | Specifies the underlying protocol (e.g., http, https, ftp) |
| host | The IP address or domain name of the HTTP server |
| port# | The default port for HTTP is 80; it can be omitted if using the default. Otherwise, it must be specified |
| path | The path to the resource |
| query-string | Data sent to the HTTP server |
| anchor | A fragment identifier pointing to a location inside the page; it is handled by the browser |
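As a quick check of the format, Python's standard urllib.parse module can split a URL into these parts. The URL below is invented for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL that uses every part of the format.
url = "https://example.com:8443/docs/http/index.html?lang=en&page=2#status-codes"
parts = urlsplit(url)

print(parts.scheme)    # protocol
print(parts.hostname)  # host
print(parts.port)      # port, or None when the URL omits it
print(parts.path)      # path to the resource
print(parts.query)     # raw query string
print(parts.fragment)  # anchor

# parse_qs turns the query string into a dict of lists,
# because the same key may appear more than once.
query = parse_qs(parts.query)
print(query)
```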

HTTP Request Methods

| No. | Method | Description |
| --- | --- | --- |
| 1 | GET | Requests a specific resource and returns the entity body |
| 2 | HEAD | Similar to GET, but the response has no body; used to retrieve headers |
| 3 | POST | Submits data to be processed (e.g., form submission or file upload). Data is included in the body. May create new resources or modify existing ones |
| 4 | PUT | Replaces the content of the target resource with the uploaded data |
| 5 | DELETE | Requests the server to delete the specified resource |

GET vs. POST

  1. GET data is appended to the URL after a ? as a query string of key=value pairs joined with &. POST puts the data in the HTTP message Body.
  2. GET data is limited in size in practice, because browsers and servers cap URL length. The HTTP specification itself sets no limit, and POST bodies are limited only by server configuration.
  3. Server-side code reads the two differently. In ASP.NET, for example, GET parameters come from Request.QueryString, while POST form data comes from Request.Form.
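The first difference is easy to see by building both kinds of request with the same parameters. This is a sketch using Python's standard urllib; nothing is sent over the network, and the search endpoint on example.com is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# The same key-value pairs, encoded once as key=value joined with &.
params = {"q": "http caching", "page": 2}
encoded = urlencode(params)

# GET: the encoded data becomes the query string of the URL; there is no body.
get_req = Request("https://example.com/search?" + encoded, method="GET")

# POST: the URL stays clean and the encoded data travels in the body.
post_req = Request("https://example.com/search",
                   data=encoded.encode("ascii"), method="POST")

print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```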

HTTP Status Codes

Status codes are in the HTTP response. The server uses them to tell the client what happened. They are divided into five categories.

| Status Code | Defined Range | Category |
| --- | --- | --- |
| 1XX | 100–101 | Informational: Request received, continue processing |
| 2XX | 200–206 | Success: Request successfully received, understood, and accepted |
| 3XX | 300–305 | Redirection: Further action needed to complete the request |
| 4XX | 400–415 | Client Error: Request has bad syntax or cannot be fulfilled |
| 5XX | 500–505 | Server Error: Server failed to fulfill a valid request |
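Since the category depends only on the first digit, it can be computed from the code. The sketch below pairs that with the standard reason phrases from Python's http.HTTPStatus; the describe helper and the CATEGORIES mapping are my own names, not part of any library.

```python
from http import HTTPStatus

# First digit of the status code -> category.
CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code: int) -> str:
    category = CATEGORIES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes not registered in HTTPStatus still belong to a category.
        phrase = "(no standard reason phrase)"
    return f"{code} {phrase} [{category}]"


for code in (200, 206, 304, 404, 503):
    print(describe(code))
```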

Common Status Codes

| Code | Meaning |
| --- | --- |
| 200 | OK: Server successfully processed the request |
| 301/302 | Moved Permanently / Found (Redirect): The URL has moved. The response should contain a Location header with the new URL |
| 304 | Not Modified: Client’s cached version is up to date; use the cache |
| 404 | Not Found: Resource not found |
| 401 | Unauthorized: Authentication required |
| 500 | Internal Server Error: Server encountered an unexpected error |

206 (Partial Content)

The 206 status code means the server successfully processed a partial GET request. This is used for resumable downloads or online video streaming.

Example of a video request:

  1. Browser sends GET with Header: Range: bytes=5303296-5336063.
  2. Server returns 206 with Header: Content-Range: bytes 5303296-5336063/12129376.
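The numbers in that exchange can be checked with a few lines of code. This is a small sketch; parse_content_range is a hypothetical helper that only handles the "bytes start-end/total" form shown above.

```python
import re


def parse_content_range(value: str) -> tuple[int, int, int]:
    """Parse 'bytes start-end/total' into (start, end, total)."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", value)
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    start, end, total = map(int, match.groups())
    return start, end, total


start, end, total = parse_content_range("bytes 5303296-5336063/12129376")

# Both ends of a byte range are inclusive, hence the + 1.
chunk_length = end - start + 1
print(chunk_length, "bytes out of", total)

# The next chunk would start right after the last byte received.
next_range = f"bytes={end + 1}-{min(end + chunk_length, total - 1)}"
print("Range:", next_range)
```

The requested chunk works out to exactly 32 KiB, which is the kind of fixed block size a video player might fetch as playback advances.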

301 vs. 302 (Redirects)

After receiving 301 or 302, the browser makes a new request to the URL in the Location header.

  • 301: Old address is permanently moved. Search engines transfer authority to the new URL. (e.g., switching domains).
  • 302: Old address still exists; the redirect is temporary. (e.g., redirecting to a login page).

304 (Not Modified)

Indicates the cached version is still valid and can be used.

400 (Bad Request)

Syntax error in the client request. The server cannot understand it (e.g., malformed form data or corrupted Cookies).

401 (Unauthorized)

Authentication error. Used when HTTP Basic Authentication is required but the Authorization header is missing or invalid.

404 (Not Found)

The server cannot find the resource. A server may also return 404 to hide a resource it does not want to reveal, refusing the request without giving a reason. For example, BV1AB4y1D7Ft is only visible if logged in and favorited; otherwise, it returns 404.

403 (Forbidden)

The server understood the request but refuses to authorize it. Unlike 401, re-authenticating makes no difference.

500 (Internal Server Error)

Generic server error. Could be code bugs, DB connection issues, uncaught exceptions, or null pointers.

503 (Service Unavailable)

Server is temporarily unable to handle the request (overloaded or down for maintenance).

Full list of status codes

Visit: HTTP Status Codes - Runoob

HTTP Headers

Headers use the “key: value” format, one per line.

Headers also drive caching: both requests and responses carry cache-related headers, which let the client load a file from local storage instead of fetching it again from the origin server.

Cookie

Cookies are small pieces of client-side state in key=value format (e.g., ip_country=CN). Browsers send them via the “Cookie” header; servers set them via “Set-Cookie”.

Accept

Media types the client can handle. Accept: text/html means the browser wants HTML. * is a wildcard.

Accept-Encoding

Related to compression. The browser tells the server which compression algorithms it supports (e.g., Accept-Encoding: gzip, deflate).

Accept-Language

The languages the client understands (e.g., Accept-Language: en-US,en;q=0.8,zh-CN;q=0.6). Note: a language is not the same thing as a character set.
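The q values in that example are weights between 0 and 1, and an entry without one counts as q=1. A minimal sketch of how a server might rank the languages follows; parse_accept_language is a hypothetical helper and ignores edge cases such as malformed weights.

```python
def parse_accept_language(value: str) -> list[tuple[str, float]]:
    """Return (language tag, weight) pairs, most preferred first."""
    prefs = []
    for item in value.split(","):
        tag, _, params = item.strip().partition(";")
        q = 1.0  # default weight when no q parameter is given
        params = params.strip()
        if params.startswith("q="):
            q = float(params[2:])
        prefs.append((tag.strip(), q))
    # Highest weight first; the sort is stable, so ties keep header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


print(parse_accept_language("en-US,en;q=0.8,zh-CN;q=0.6"))
```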

User-Agent

Identifies the client’s OS, browser version, engine, etc. Example: User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:103.0) Gecko/20100101 Firefox/103.0.

Referer

Indicates the source page the user came from. Used for analytics and hotlink protection.

Connection

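In HTTP/1.1, connections are persistent by default, which is equivalent to Connection: Keep-Alive. The TCP connection stays open and is reused for multiple requests; Connection: close asks for it to be closed after the response.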

Host

Specifies the target host and port. The port can be omitted when it is the default for the scheme (80 for HTTP, 443 for HTTPS).

HTTP Caching

Caching exists at the browser, server, and proxy levels. It reduces redundant data, saves time, and lowers server load.

Judging Cache Freshness

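The client and server validate a cached copy in two ways:

  1. Last-Modified / If-Modified-Since: The browser sends back the last modification time it received. If the file hasn’t changed since then, the server returns 304.
  2. ETag / If-None-Match: The browser sends back the ETag (an identifier for a specific version of the file, often a content hash). If it still matches, the server returns 304.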
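The ETag variant of this check can be sketched from the server's side as follows. The make_etag and respond functions are hypothetical, and using a truncated SHA-256 of the content as the ETag is my assumption; real servers are free to build ETags differently.

```python
import hashlib


def make_etag(content: bytes) -> str:
    # Assumption: the ETag is derived from a hash of the content.
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def respond(content: bytes, request_headers: dict) -> tuple[int, dict, bytes]:
    """Return (status code, response headers, body) for a GET request."""
    etag = make_etag(content)
    headers = {"ETag": etag}
    if request_headers.get("If-None-Match") == etag:
        # The client's copy is still current: no body is sent.
        return 304, headers, b""
    return 200, headers, content


page = b"<h1>hello</h1>"

# First visit: nothing cached yet, so the full body comes back.
status, headers, body = respond(page, {})
print(status, headers["ETag"], len(body))

# Revisit: the browser echoes the ETag and gets an empty 304.
status, _, body = respond(page, {"If-None-Match": headers["ETag"]})
print(status, len(body))

# After the file changes, the old ETag no longer matches.
status, _, body = respond(b"<h1>updated</h1>", {"If-None-Match": headers["ETag"]})
print(status, len(body))
```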

Cache Headers

  • Request Headers

| Name | Description |
| --- | --- |
| Cache-Control: max-age=0 | Force revalidation |
| If-Modified-Since | Last modification time of the cached file |
| If-None-Match | ETag value of the cached file |
| Cache-Control: no-cache | Do not use a cached copy without revalidating it |
| Pragma: no-cache | Same as Cache-Control: no-cache (Legacy) |

  • Response Headers

| Name | Description |
| --- | --- |
| Cache-Control: public | Can be cached by public proxies/users |
| Cache-Control: private | Cache only for specific user |
| Cache-Control: no-cache | Must validate with server before using cache |
| Cache-Control: no-store | Never cache (for sensitive data) |
| Cache-Control: max-age=60 | Cache expires in 60s (relative) |
| Date | Time response was sent |
| Expires | Absolute expiration time |
| Last-Modified | Last modification time on server |
| ETag | Unique identifier (often a hash) of the server file |

Note: Cache-Control takes precedence over Expires.
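A rough sketch of how a client could turn these response headers into a freshness lifetime, including the precedence rule from the note above. It is deliberately simplified (real caches follow more rules, such as heuristic freshness), and both function names are made up for this example.

```python
from email.utils import parsedate_to_datetime


def parse_cache_control(value: str) -> dict:
    """Turn 'public, max-age=60' into {'public': True, 'max-age': '60'}."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') or True
    return directives


def freshness_lifetime(headers: dict) -> float:
    """Seconds the response may be reused without asking the server."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if "no-store" in cc or "no-cache" in cc:
        return 0.0
    if "max-age" in cc:
        # Cache-Control: max-age wins over Expires when both are present.
        return float(cc["max-age"])
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        sent = parsedate_to_datetime(headers["Date"])
        return max((expires - sent).total_seconds(), 0.0)
    return 0.0


response_headers = {
    "Cache-Control": "public, max-age=60",
    "Date": "Wed, 21 Oct 2015 07:28:00 GMT",
    "Expires": "Wed, 21 Oct 2015 08:28:00 GMT",
}
print(freshness_lifetime(response_headers))
```

With both headers present the result follows max-age (60 seconds), not the one hour implied by Expires.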

ETag

Entity Tag. An identifier, often a hash of the content, that represents the current state of the file. It solves issues where Last-Modified isn’t precise enough (it only goes down to the second) or where the modification time changes but the content doesn’t.

Forcing No Cache

Ctrl+Shift+R forces a refresh, adding Cache-Control: no-cache to the request.

Direct Cache Use

Typing the URL in the address bar usually results in a “cache hit” if valid, without even checking the server.

Compression and URL Encoding

HTTP compression reduces the size of text content (HTML, JS, CSS) during transfer.

Compression Process

  1. Browser sends Accept-Encoding: gzip, deflate.
  2. Server generates response, compresses the Body with gzip, updates Content-Length, and adds Content-Encoding: gzip.
  3. Browser receives response and decompresses it.

Note: Browsers usually don’t compress requests.
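The three steps of the compression process can be sketched with Python's standard gzip module. Both functions are hypothetical stand-ins for what a server and a browser do, and the negotiation here only looks for gzip in Accept-Encoding, ignoring q weights.

```python
import gzip


def compress_response(body: bytes, accept_encoding: str) -> tuple[dict, bytes]:
    """Server side: compress the body if the client offered gzip."""
    offered = {token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",")}
    headers = {}
    if "gzip" in offered:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    # Content-Length describes the bytes actually sent on the wire.
    headers["Content-Length"] = str(len(body))
    return headers, body


def read_response(headers: dict, body: bytes) -> bytes:
    """Client side: undo the encoding named in Content-Encoding."""
    if headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(body)
    return body


html = b"<p>hello world</p>" * 200
headers, wire_body = compress_response(html, "gzip, deflate")
print(headers, "original size:", len(html))
print(read_response(headers, wire_body) == html)
```

Repetitive text like this compresses very well, which is why compression pays off most for HTML, JS, and CSS.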

Encoding Types

| Encoding | Description |
| --- | --- |
| gzip | GNU zip |
| compress | UNIX compress |
| deflate | zlib format |
| identity | No encoding (Default) |

Of these, gzip is the most widely used.

Deep Dive into Cookies

HTTP is stateless. Each request is independent. Sessions solve this by maintaining state between the browser and server.

Sessions vs. Cookies

  1. Server creates a session and sends the Session ID to the browser.
  2. Browser stores the ID and sends it back in subsequent requests.
  3. Server identifies the user via the ID.

Cookies are the mechanism browsers use to store this ID.
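Here is a toy version of that round trip. The in-memory sessions dict, the SESSIONID cookie name, and the handle_request function are all assumptions made for this sketch; real frameworks manage expiry, signing, and storage for you.

```python
import secrets

sessions = {}  # server-side store: session ID -> per-user state


def handle_request(cookie_header, username=None):
    """Return (response headers, session state) for one request."""
    session_id = None
    if cookie_header and cookie_header.startswith("SESSIONID="):
        session_id = cookie_header[len("SESSIONID="):]

    response_headers = {}
    if session_id not in sessions:
        # Step 1: unknown client, so create a session and hand out its ID.
        session_id = secrets.token_hex(16)
        sessions[session_id] = {"user": username}
        response_headers["Set-Cookie"] = f"SESSIONID={session_id}; Path=/; HttpOnly"
    # Step 3: the ID identifies the user on every later request.
    return response_headers, sessions[session_id]


# First request: no cookie yet, the server issues one.
first_headers, _ = handle_request(None, username="alice")

# Step 2: the browser stores the name=value part and sends it back.
cookie = first_headers["Set-Cookie"].split(";")[0]
second_headers, state = handle_request(cookie)
print(state["user"], "Set-Cookie" in second_headers)
```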

What are Cookies?

Cookies are small pieces of data stored as key=value pairs, separated by semicolons. They are primarily used for authentication, user preferences, and ad tracking. Some regions (e.g., the EU) have laws requiring user consent for cookies. Common attributes:

  • Expires: When it expires. If omitted, it’s a session cookie (deleted when browser closes).
  • Path: The scope of the cookie. / means the whole site.
  • HttpOnly: Prevents JavaScript from reading the cookie. Essential for security against XSS.
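To see how these attributes look inside a header, Python's standard http.cookies module can build and parse cookie strings. The theme cookie and its expiry date are invented for illustration.

```python
from http.cookies import SimpleCookie

# Server side: build the value of a Set-Cookie header.
cookie = SimpleCookie()
cookie["theme"] = "dark"
cookie["theme"]["expires"] = "Wed, 21 Oct 2026 07:28:00 GMT"  # persistent cookie
cookie["theme"]["path"] = "/"        # valid for the whole site
cookie["theme"]["httponly"] = True   # hidden from JavaScript
set_cookie_value = cookie["theme"].OutputString()
print("Set-Cookie:", set_cookie_value)

# Server side again: parse the Cookie header the browser sends back.
# Only name=value pairs come back; the attributes stay in the browser.
received = SimpleCookie()
received.load("ip_country=CN; theme=dark")
for name, morsel in received.items():
    print(name, "=", morsel.value)
```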

Categories

  • Session Cookie: Temporary, stored in memory.
  • Persistent Cookie: Stored on disk with an expiration date. Used for “auto-login.”

HTTP Basic Authentication

Used by some desktop apps and routers. The client sends username:password encoded in Base64 via the Authorization header.

Process:

  1. Server returns 401 and a WWW-Authenticate header.
  2. Browser prompts for credentials.
  3. Browser sends Base64 encoded credentials.
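Step 3 amounts to a single Base64 call, and decoding it is just as easy. The sketch below uses the well-known example credentials from the HTTP Basic Authentication RFC; the two function names are my own.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of the Authorization header."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def decode_basic_auth(header: str) -> tuple[str, str]:
    """Recover (username, password) from an Authorization header value."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    # Split on the first colon only: passwords may contain colons.
    username, _, password = decoded.partition(":")
    return username, password


header = basic_auth_header("Aladdin", "open sesame")
print("Authorization:", header)
print(decode_basic_auth(header))
```

Anyone who can see the header can run the second function, which is why Basic Authentication is only reasonable over HTTPS.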

Disadvantages

  1. Stateless: Every request must be authenticated.
  2. Insecure: Base64 is an encoding, not encryption, and is trivially reversed. Must use HTTPS.
  3. No Logout: There is no standard way to log out short of closing the browser or clearing its cached credentials.
  4. Vulnerable to replay attacks.

Digest Authentication

An improved version of Basic Auth. It uses hashes (MD5) and a “nonce” (number used once) from the server to prevent password sniffing and replay attacks.
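A minimal sketch of the hash the client sends, assuming the simplest form of the scheme (MD5, no qop parameter). Modern deployments usually add qop, a client nonce, and a request counter, which this example leaves out; all credentials and values below are made up.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str) -> str:
    """Compute the 'response' field of a Digest Authorization header."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # who is asking
    ha2 = md5_hex(f"{method}:{uri}")                 # what is being asked for
    # The server's nonce is mixed in, so the password never crosses the
    # wire and a captured response is useless once the nonce changes.
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


# Hypothetical values; the nonce would come from the server's 401 response.
first = digest_response("alice", "example", "s3cret", "GET", "/private", "nonce-1")
second = digest_response("alice", "example", "s3cret", "GET", "/private", "nonce-2")
print(first)
print(second)
```

The server performs the same computation with its stored credentials and compares the results.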

References