Chapter 2: Application layer
2.1 Principles of network applications
Application architecture
Client-server

- server
- always-on host
- permanent IP address
- clients
- communicate with server
- may be intermittently connected
- may have dynamic IP address
- do not communicate directly with each other
Pure P2P architecture
Every peer is both a client and a server.
- no always-on server
- arbitrary end systems communicate directly
- peers are intermittently connected and change IP addresses
- example: Gnutella
- Highly scalable
- But difficult to manage
Hybrid of client-server and P2P
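A central server keeps track of which peer holds each file; a peer then contacts the relevant peers directly, and the file transfer itself does not pass through the server.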
- Napster
- File transfer P2P
- File search centralized
- Peers register content at central server
- Peers query same central server to locate content
- Instant messaging
- Chatting between two users is P2P (not relayed through the server)
- Presence detection/location centralized:
- User registers its IP address with central server when it comes online
- User contacts central server to find IP addresses of buddies
Processes communicating
- within same host, two processes communicate using inter-process communication (defined by OS).
- processes in different hosts communicate by exchanging messages
Client process: process that initiates communication
Server process: process that waits to be contacted
applications with P2P architectures have client processes & server processes
Processes communicating across network
- process sends/receives messages to/from its socket
socket
A socket is analogous to a door.
- sending process shoves message out door
- sending process relies on transport infrastructure on other side of door which brings message to socket at receiving process
API:
(1) choice of transport protocol; (TCP/UDP)
(2) ability to fix a few parameters (lots more on this later)
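The door analogy can be sketched with Python's standard socket module. This is a minimal illustration over the loopback address, not a production server: the echo behaviour and the names run_echo_server and echo_once are assumptions made for the example.

```python
import socket
import threading


def run_echo_server(server_sock: socket.socket) -> None:
    """Server process: waits to be contacted, then echoes one message back."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)


def echo_once(message: bytes) -> bytes:
    """Send one message through a client socket and return the server's reply."""
    # Choice of transport protocol: SOCK_STREAM selects TCP
    # (SOCK_DGRAM would select UDP).
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    host, port = server_sock.getsockname()

    server_thread = threading.Thread(target=run_echo_server, args=(server_sock,))
    server_thread.start()

    # Client process: initiates communication by naming the server's
    # IP address and port number.
    with socket.create_connection((host, port)) as client:
        client.sendall(message)      # shove the message out the door
        reply = client.recv(1024)    # the transport brings the reply back

    server_thread.join()
    server_sock.close()
    return reply
```

Calling echo_once(b"hello") should return the same bytes, since the server process writes back whatever arrived at its socket.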
Addressing processes
1. Locate the host (by IP address)
2. Locate the process on that host (by port number)
- For a process to receive messages, it must have an identifier
- Every host has a unique 32-bit IP address
- Does the IP address of the host on which the process runs suffice for identifying the process?
No, many processes can be running on the same host
- The identifier includes both the IP address and the port number associated with the process on the host
- Example port numbers:
- HTTP server: 80
- Mail server: 25
- File server: 20, 21
- DNS: 53
Port numbers below 1024 are reserved as well-known ports
App-layer protocol defines
- Types of messages exchanged, eg, request & response messages
- Syntax of message types: what fields in messages & how fields are delineated
- Semantics of the fields, ie, meaning of information in fields
- Rules for when and how processes send & respond to messages
Public-domain protocols - defined in RFCs
- allows for interoperability
- eg, HTTP, SMTP
Proprietary protocols (not publicly specified)
- e.g., KaZaA
What transport service does an app need?
Data loss
- some apps (e.g., audio) can tolerate some loss
- other apps (e.g., file transfer, telnet) require 100% reliable data transfer
Timing
- some apps (e.g., Internet telephony, interactive games) require low delay to be “effective”
Bandwidth
- some apps (e.g., multimedia) require minimum amount of bandwidth to be “effective”
- other apps (“elastic apps”) make use of whatever bandwidth they get
Transport service requirements of common apps
TCP service
- connection-oriented: setup required between client and server processes
- reliable transport between sending and receiving process
- flow control: sender won’t overwhelm receiver
- congestion control: throttle sender when network overloaded
- does not provide: timing, minimum bandwidth guarantees
UDP service
Commonly used for short queries, for example.
- unreliable data transfer between sending and receiving process
- does not provide: connection setup, reliability, flow control, congestion control, timing, or bandwidth guarantee
2.2 Web and HTTP
First some jargon
- Web page consists of objects
- An object is a file such as an HTML file, a JPEG image, a Java applet, an audio file,…
- A Web page consists of a base HTML-file and several referenced objects
- The base HTML file references the other objects in the page with the object’s URLs (Uniform Resource Locators)
HTTP overview
HTTP: hypertext transfer protocol
- Web’s application layer protocol
- client/server model
- client: browser that requests, receives, “displays” Web objects
- server: Web server sends objects in response to requests
- HTTP 1.0: RFC 1945
- HTTP 1.1: RFC 2616
Uses TCP
- client initiates TCP connection (creates socket) to server, port 80
- server accepts TCP connection from client
- HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server)
- TCP connection closed
HTTP is “stateless” (it keeps no record of earlier state)
- server maintains no information about past client requests
HTTP connections
Nonpersistent HTTP
A new connection is set up for every object fetched.
- At most one object is sent over a TCP connection.
- HTTP/1.0
Response time modeling
total = 2 RTT + transmit time
- one RTT to initiate TCP connection
- one RTT for HTTP request and first few bytes of HTTP response to return
- file transmission time
Definition of RTT: time for a small packet to travel from client to server and back.

Persistent HTTP
Several objects can be fetched over each connection.
- Multiple objects can be sent over single TCP connection between client and server.
- A new connection need not be set up for the transfer of each Web object
- HTTP/1.1
Nonpersistent HTTP issues
- requires 2 RTTs per object
- OS must work and allocate host resources for each TCP connection
- but browsers often open parallel TCP connections to fetch referenced objects
Persistent HTTP
server leaves connection open after sending response
subsequent HTTP messages between same client/server are sent over connection
Persistent without pipelining
- client issues new request only when previous response has been received
- one RTT for each referenced object
Persistent with pipelining
- default in HTTP/1.1
- client sends requests as soon as it encounters a referenced object
- as little as one RTT for all the referenced objects
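The RTT counts above can be compared with a small calculation. The sketch below assumes one base HTML file plus n referenced objects, every object taking the same transmit time, and no parallel connections; the function names are hypothetical.

```python
def nonpersistent_time(rtt: float, transmit: float, n_objects: int) -> float:
    """Base file plus n objects, each over its own TCP connection, one at a time.

    Every fetch costs 2 RTT (connection setup + request/response) plus transmit time.
    """
    return (1 + n_objects) * (2 * rtt + transmit)


def persistent_time(rtt: float, transmit: float, n_objects: int,
                    pipelined: bool) -> float:
    """One TCP connection is set up once and reused for all objects."""
    base = 2 * rtt + transmit  # connection setup + base HTML file
    if n_objects == 0:
        request_rtts = 0
    elif pipelined:
        request_rtts = 1           # all requests sent back to back
    else:
        request_rtts = n_objects   # one RTT per referenced object
    return base + request_rtts * rtt + n_objects * transmit
```

With rtt = 0.1 s, negligible transmit time and 10 referenced objects, this gives about 2.2 s for nonpersistent, 1.2 s for persistent without pipelining, and 0.3 s with pipelining.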
HTTP message
two types of HTTP messages: request, response
HTTP request message: general format
GET /somedir/page.html HTTP/1.1
-- Request to return the object /somedir/page.html
-- The browser implements version HTTP/1.1
Host: www.someschool.edu
-- Specifies the host on which the object resides
User-agent: Mozilla/4.0
-- Specifies the browser type that is making the request
Connection: close
-- Indicates that the connection SHOULD NOT be considered `persistent`. It wants the server to close the connection after the current request/response is complete
Accept-language: fr
-- Indicates that the user prefers to receive a French version of the object
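Because the request message is plain ASCII text, it can be built and parsed with ordinary string operations. The following is a minimal sketch that handles only the request line and header lines (no entity body); build_request and parse_request are hypothetical helper names.

```python
def build_request(method: str, path: str, headers: dict) -> str:
    """Assemble a request line plus header lines, each terminated by CRLF."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # A blank line (an extra CRLF) marks the end of the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_request(raw: str):
    """Split a request into (method, path, version, headers)."""
    head = raw.split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return method, path, version, headers
```

For example, build_request("GET", "/somedir/page.html", {"Host": "www.someschool.edu", "Connection": "close"}) produces the first lines of the message shown above, and parse_request recovers the same fields from it.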

Method types
HTTP/1.0
- GET: Return the object
- POST: Send information to be stored on the server
- HEAD: Return only information about the object, such as how old it is, but not the object itself
HTTP/1.1
- GET, POST, HEAD
- PUT: Uploads the object in the entity body to the path specified in the URL field
- DELETE: Deletes the object specified in the URL field
Uploading form input
- Post method
- Web page often includes form input
- Input is uploaded to server in entity body
- URL method
- Uses GET method
- Input is uploaded in URL field of request line:
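As a rough illustration of the URL method, the form fields can be encoded into the query part of the URL with Python's urllib.parse. The host www.example.com, the path and the field names are placeholders chosen for this sketch.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def form_to_url(base_url: str, fields: dict) -> str:
    """URL method: append the encoded form input to the URL of a GET request."""
    return f"{base_url}?{urlencode(fields)}"


def form_to_body(fields: dict) -> bytes:
    """POST method: the same encoded input travels in the entity body instead."""
    return urlencode(fields).encode("ascii")


def url_to_form(url: str) -> dict:
    """Server side: recover the form input from the request URL."""
    return parse_qs(urlsplit(url).query)
```

form_to_url("http://www.example.com/search", {"animal": "monkeys", "food": "banana"}) yields http://www.example.com/search?animal=monkeys&food=banana.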
HTTP response message
An HTTP response consists of the following:
1. A status line, which indicates the success or failure of the request
2. Header lines: a description of the information in the response. This is the metadata or meta information
3. The actual information requested
HTTP response status codes
200 OK
request succeeded, requested object later in this message
301 Moved Permanently
requested object moved, new location specified later in this message (Location:)
400 Bad Request
request message not understood by server
404 Not Found
requested document not found on this server
505 HTTP Version Not Supported
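A client reads the status code from the first line of the response. The sketch below splits a status line into its three parts and groups the code by its leading digit; the function names are hypothetical.

```python
def parse_status_line(line: str):
    """Split 'HTTP/1.1 200 OK' into (version, code, phrase)."""
    version, code, phrase = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), phrase


def status_class(code: int) -> str:
    """The first digit of the status code gives the class of response."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")
```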
User-server interaction: authorization
Authorization: control access to server content
- authorization credentials: typically name, password
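User-server state: cookies
Cookies are small pieces of data that give a Web site memory: they record certain information on a particular computer.
How cookies work: on the first visit, the server writes the cookie to the client's system. On every later visit to the site, the client first sends the cookie to the server; the server evaluates it and then generates the HTML that is returned to the client.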
Four components of cookie technology:
- 1) cookie header line in the HTTP response message
- 2) cookie header line in HTTP request message
- 3) cookie file kept on user’s browser
- 4) back-end database at Web site
Cookies: keeping “state”

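Using the cookie sent by the user, the Web site looks up its database, learns which pages the user has visited before, and makes personalized recommendations accordingly.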
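The four components can be sketched as a toy server. This is a simplified simulation, not real HTTP handling: the cookie value is a bare ID rather than a name=value pair, the starting ID is arbitrary, and the class name CookieSite is hypothetical.

```python
import itertools


class CookieSite:
    """Toy Web site that recognises returning browsers by cookie ID."""

    def __init__(self):
        self._next_id = itertools.count(1678)  # arbitrary starting ID
        self.database = {}  # back-end database: cookie ID -> visited paths

    def handle(self, path: str, request_headers: dict) -> dict:
        """Process one request and return the response header lines."""
        response_headers = {}
        cookie_id = request_headers.get("Cookie")
        if cookie_id not in self.database:
            # First visit: create an entry and tell the browser its ID.
            cookie_id = str(next(self._next_id))
            self.database[cookie_id] = []
            response_headers["Set-Cookie"] = cookie_id
        self.database[cookie_id].append(path)
        return response_headers
```

The browser's side is a cookie file: it stores the Set-Cookie value from the first response and sends it back in a Cookie header on later requests, so the site can accumulate the visit history under one ID.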
Web caches(proxy server)
Goal: satisfy client request without involving origin server

- user sets browser: Web accesses via cache
- browser sends all HTTP requests to cache
- if object in cache: cache returns the object directly
- else: cache requests the object from the origin server, then returns it to the client
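The hit/miss logic above can be sketched as follows. This toy cache never expires or revalidates objects, which a real proxy would have to do; WebCache and the fetch_from_origin callback are names invented for the example.

```python
class WebCache:
    """Toy proxy cache: serve from the local store, else ask the origin server."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # callable: url -> object
        self._store = {}                 # url -> cached copy
        self.origin_requests = 0

    def get(self, url: str):
        if url not in self._store:
            # Miss: request the object from the origin server and keep a copy.
            self.origin_requests += 1
            self._store[url] = self._fetch(url)
        # Hit (or freshly filled): return the cached object to the client.
        return self._store[url]
```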
Why Web caching
Reduce response time for client request.
Reduce traffic on an institution’s access link.
Internet dense with caches enables “poor” content providers to effectively deliver content (but so does P2P file sharing)
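(Content is placed in many caches; clients access a cache instead, which lightens the load on the origin server.)
If the access-link bandwidth is increased

If a cache is used
- utilization of the access link drops, so the delay on the link falls sharply and becomes negligible
- a comparatively small bandwidth yields a low delay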
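The effect of a cache on delay can be estimated as a weighted average over hits and misses. The numbers in the usage note are illustrative assumptions, not measurements from these notes.

```python
def average_delay(hit_rate: float, cache_delay: float, origin_delay: float) -> float:
    """Expected response time when a fraction hit_rate of requests is served by the cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_delay + (1.0 - hit_rate) * origin_delay
```

Assuming a hit rate of 0.4, 0.01 s for a cache hit and 2 s for a fetch from the origin server, the average delay is about 1.2 s, compared with 2 s when every request goes to the origin.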
2.3 FTP
file transfer protocol
- transfer file to/from remote host
- client/server model
- Client side: the side that initiates transfer(either to/from remote)
- Server side: remote host
- ftp: RFC 959
- ftp server: port 21
FTP: separate control, data connections
- FTP client contacts FTP server at port 21, specifying TCP as transport protocol
- Client obtains authorization over control connection: username, password
- Client browses remote directory by sending commands over control connection.
- When server receives a command for a file transfer, the server opens a TCP data connection to client
- After transferring one file, server closes the data connection
- Server opens a second TCP data connection to transfer another file
Control connection: “out of band”
FTP server maintains “state”: current directory, earlier authentication
FTP commands, responses

2.4 Electronic Mail
- SMTP, POP3, IMAP
Three major components of a mail system:
- user agents
- mail servers
- simple mail transfer protocol: SMTP
User Agent
- Also known as “mail reader”
- composing, editing, reading mail messages
- e.g., Eudora, Outlook, elm, Netscape Messenger
- outgoing, incoming messages stored on server
Mail servers
Each mail server acts as both a client (when sending mail) and a server (when receiving mail)
- mailbox: contains incoming messages for user
- message queue: holds outgoing (to be sent) mail messages
SMTP
Simple Mail Transfer Protocol
- uses TCP to reliably transfer email message from client to server, port 25
- direct transfer: sending server to receiving server
- three phases of transfer
- handshaking (greeting)
- transfer of messages
- closure
- command/response interaction
- commands: ASCII text
- response: status code and phrase
- messages must be in 7-bit ASCII
1) Alice uses user agent to compose a message addressed to bob@someschool.edu
2) Alice’s user agent sends message to her mail server; message placed in message queue
3) Alice’s mail server (Client side) of SMTP opens TCP connection with Bob’s mail server (server side)
4) SMTP client sends Alice’s message over the TCP connection
5) Bob’s mail server places the message in Bob’s mailbox
6) Bob invokes his user agent to read message
SMTP final words
- SMTP uses persistent connections
- SMTP requires message (header & body) to be in 7-bit ASCII
- SMTP server uses CRLF.CRLF to determine end of message
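The end-of-message rule and the 7-bit ASCII requirement can be sketched in a few lines. The helper names are hypothetical, and the sketch ignores details a real server must handle, such as lines in the body that begin with a period.

```python
END_OF_DATA = b"\r\n.\r\n"  # CRLF.CRLF: a line containing only a period


def extract_message(stream: bytes):
    """Return (message, remainder), or (None, stream) if the end marker has not arrived."""
    end = stream.find(END_OF_DATA)
    if end == -1:
        return None, stream
    return stream[:end], stream[end + len(END_OF_DATA):]


def is_7bit_ascii(message: bytes) -> bool:
    """SMTP requires every byte of header and body to fit in 7 bits."""
    return all(byte < 128 for byte in message)
```

Whatever follows the marker belongs to the next command on the same persistent connection, which is why the remainder is returned separately.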
Comparison with HTTP
- HTTP: pull protocol (client’s point of view)
- SMTP: push protocol
- both have ASCII command/response interaction, status codes
- HTTP does not require message to be in 7-bit ASCII
- HTTP: one object in one response message
- SMTP: multiple objects can be sent in one message
Mail access protocols
- SMTP: delivery/storage to receiver’s server
- Mail access protocol: retrieval from server
- POP3: Post Office Protocol, version 3
- authorization (agent <-> server) and download
- IMAP: Internet Message Access Protocol
- more features (more complex)
- manipulation of stored messages on server
- HTTP: Hotmail, Yahoo! Mail, etc.
2.5 DNS
Domain Name System (map between IP address and name)
Internet hosts, routers:
- IP address (32 bit) - used for addressing datagrams
- “name”, e.g., bbs.hupu.com
Domain Name System
- A distributed database implemented in hierarchy of many name servers
- An application-layer protocol that allows hosts, routers, and name servers to communicate to resolve names (address/name translation)
- DNS provides a core Internet function, implemented as application-layer protocol
- Hostname to IP address translation
- Host aliasing
- Canonical and alias names
- Relay1.west-coast.enterprise.com
- enterprise.com and www.enterprise.com
- Mail server aliasing
- bob@hotmail.com
- Relay1.west-coast.hotmail.com
- Load distribution
- Replicated Web servers: set of IP addresses for one canonical name
Why not centralize DNS?
doesn’t scale
single point of failure
traffic volume
distant centralized database
maintenance
Distributed,Hierarchical Database

Client wants IP for www.amazon.com; 1st approx:
- Client queries a root server to find com DNS server
- Client queries com DNS server to get amazon.com DNS server
- Client queries amazon.com DNS server to get IP address for www.amazon.com
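The three lookups can be mimicked with nested dictionaries standing in for the root, com, and amazon.com servers. All server names and the address 192.0.2.10 (from a range reserved for documentation) are placeholders invented for this sketch.

```python
# Each table stands in for the database held by one level of the hierarchy.
ROOT = {"com": "com-tld-server"}
TLD = {"com-tld-server": {"amazon.com": "amazon-auth-server"}}
AUTHORITATIVE = {"amazon-auth-server": {"www.amazon.com": "192.0.2.10"}}


def resolve(hostname: str) -> str:
    """Walk the hierarchy from the root down to the authoritative server."""
    labels = hostname.split(".")
    tld = labels[-1]                 # "com"
    domain = ".".join(labels[-2:])   # "amazon.com"

    tld_server = ROOT[tld]                      # 1) ask a root server
    auth_server = TLD[tld_server][domain]       # 2) ask the com server
    return AUTHORITATIVE[auth_server][hostname]  # 3) ask the amazon.com server
```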
DNS name servers
4 types of name servers
- 1) root name servers
- 2) top level name servers(to be explained next)
- 3) authoritative name servers
- 4) local name servers
Root name servers
- 13 root name servers worldwide
- contacted by local name server that cannot resolve name
- root name server:
- gets mapping
- returns mapping to local name server
Authoritative Servers
organization’s DNS servers, providing authoritative hostname to IP mappings for organization’s servers (e.g., Web and mail).
- Can be maintained by organization or service provider
Local Name Server
- Does not strictly belong to hierarchy
- Each ISP (residential ISP, company, university) has one.
- Also called “default name server”
- When a host makes a DNS query, query is sent to its local DNS server
- Acts as a proxy, forwards query into hierarchy.
Example
Iterated query

contacted server replies with name of server to contact
- “I don’t know this name, but ask this server”
Recursive query

puts burden of name resolution on contacted name server
- heavy load?

DNS: caching and updating records
- dynamic: entries are refreshed after a certain time
- static: entries are hard-coded and remain valid long-term
DNS records
DNS: distributed database storing resource records (RR)
- Type = A
- name is hostname
- value is IP address
- (relay1.bar.foo.com, 145.37.93.126, A)
- Type = NS
- name is domain (e.g. foo.com)
- value is host name of an authoritative name server for this domain
- (foo.com, dns.foo.com, NS)
- Type = CNAME
- name is alias name for some “canonical” (the real) name
- www.ibm.com is really servereast.backup2.ibm.com
- value is canonical name
- (foo.com, relay1.bar.foo.com, CNAME)
- Type = MX
- name is alias name for some mail server
- value is the canonical name of the mail server
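To see how the record types work together, the sketch below stores the example tuples from these notes in a list and follows CNAME records until an A record is found. The function names and the hop limit are assumptions made for the illustration.

```python
# (name, value, type) tuples copied from the examples in the notes.
RECORDS = [
    ("relay1.bar.foo.com", "145.37.93.126", "A"),
    ("foo.com", "dns.foo.com", "NS"),
    ("foo.com", "relay1.bar.foo.com", "CNAME"),
]


def lookup(name: str, rtype: str) -> list:
    """Return the values of all records matching a name and a type."""
    return [value for n, value, t in RECORDS if n == name and t == rtype]


def resolve_address(name: str, max_hops: int = 5):
    """Follow CNAME aliases until an A record yields an IP address."""
    for _ in range(max_hops):
        addresses = lookup(name, "A")
        if addresses:
            return addresses[0]
        aliases = lookup(name, "CNAME")
        if not aliases:
            return None
        name = aliases[0]  # continue with the canonical name
    return None  # give up on overly long alias chains
```

Here resolve_address("foo.com") first finds the CNAME pointing to relay1.bar.foo.com and then returns that host's A record.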
DNS protocol messages: header fields
- identification: 16-bit number for query; reply to query uses the same number
- flags:
- query or reply
- recursion desired
- recursion available
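The header fields listed above occupy the first four bytes of a DNS message. The sketch below packs and unpacks just the identification and these three flag bits, using the bit positions defined in RFC 1035; the remaining flag bits and the count fields are left out, and the function names are hypothetical.

```python
import struct

QR_REPLY = 0x8000             # bit 15: 0 = query, 1 = reply
RECURSION_DESIRED = 0x0100    # bit 8
RECURSION_AVAILABLE = 0x0080  # bit 7


def pack_header_start(identification: int, is_reply: bool,
                      recursion_desired: bool,
                      recursion_available: bool = False) -> bytes:
    """Pack the 16-bit identification and 16-bit flags in network byte order."""
    flags = 0
    if is_reply:
        flags |= QR_REPLY
    if recursion_desired:
        flags |= RECURSION_DESIRED
    if recursion_available:
        flags |= RECURSION_AVAILABLE
    return struct.pack("!HH", identification, flags)


def unpack_header_start(data: bytes) -> dict:
    """Decode the first four bytes of a DNS message."""
    identification, flags = struct.unpack("!HH", data[:4])
    return {
        "identification": identification,
        "is_reply": bool(flags & QR_REPLY),
        "recursion_desired": bool(flags & RECURSION_DESIRED),
        "recursion_available": bool(flags & RECURSION_AVAILABLE),
    }
```

A reply carries the same identification as the query it answers, which is how a client matches replies to outstanding queries.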
2.6 P2P file sharing
File distribution problem
Uploading is fast but downloading is slow; distributing a file with the client-server model wastes a lot of time.
P2P file sharing
All peers are servers and clients = highly scalable!
P2P: centralized directory
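When a peer comes online, it registers its IP address and content in the server's database.
When a peer needs some content, it first queries the server, then sets up a connection to the peer named in the result.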
1) when peer connects, it informs central server:
- IP address
- content
2) Alice queries for “Hey Jude”
3) Alice requests file from Bob
Problem
1) single point of failure
2) Performance bottleneck
3) Copyright infringement
file transfer is decentralized, but locating content is highly centralized
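The centralized directory amounts to a lookup table from content to peer addresses. A minimal in-memory sketch follows; the class and method names are hypothetical and the IP addresses in the usage note are placeholders.

```python
from collections import defaultdict


class CentralDirectory:
    """Toy directory server: knows who has what, but never carries the files."""

    def __init__(self):
        self._index = defaultdict(set)  # title -> set of peer IP addresses

    def register(self, peer_ip: str, titles: list) -> None:
        """Step 1: a connecting peer reports its IP address and content."""
        for title in titles:
            self._index[title].add(peer_ip)

    def unregister(self, peer_ip: str) -> None:
        """Remove a peer that has gone offline."""
        for peers in self._index.values():
            peers.discard(peer_ip)

    def query(self, title: str) -> list:
        """Step 2: return the peers that hold the requested title."""
        return sorted(self._index.get(title, ()))
```

After directory.register("192.0.2.2", ["Hey Jude"]), a query for "Hey Jude" returns ["192.0.2.2"]; the requesting peer then contacts that address directly for the file (step 3).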
P2P: Query flooding : Gnutella
Similar to broadcasting: no centralized server records which peer has which content
- fully distributed
- no central server
- public domain protocol
- many Gnutella clients implementing protocol
overlay network: graph
- edge between peer X and Y if there’s a TCP connection (bidirectional)
- all active peers and edges form the overlay network
- Edge is not a physical link
- Given peer will typically be connected with < 10 overlay neighbors
Gnutella: Peer joining
1. Joining peer X must find some other peer in Gnutella network: use list of candidate peers
2. X sequentially attempts to make TCP connections with peers on list until a connection is set up with Y
3. X sends Ping message to Y; Y forwards Ping message.
4. All peers receiving Ping message respond with Pong message
5. X receives many Pong messages. It can then set up additional TCP connections
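Forwarding a Ping (or a query) through the overlay is a breadth-first flood over the graph. The sketch below assumes the overlay is given as an adjacency dictionary and that forwarding stops after a fixed number of hops; the hop limit and the function name are assumptions of this example, not something stated in the notes.

```python
from collections import deque


def flood(overlay: dict, start: str, ttl: int) -> set:
    """Return the peers reached by a message flooded from start within ttl hops."""
    reached = {start}
    frontier = deque([(start, 0)])
    while frontier:
        peer, hops = frontier.popleft()
        if hops == ttl:
            continue  # hop limit reached: this peer does not forward further
        for neighbor in overlay[peer]:
            if neighbor not in reached:  # each peer handles the message once
                reached.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return reached - {start}
```

With overlay = {"X": ["Y"], "Y": ["X", "A", "B"], "A": ["Y", "C"], "B": ["Y"], "C": ["A"]}, a flood from X with a limit of 2 hops reaches Y, A and B; each of them would answer X with a Pong.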
Exploiting heterogeneity: KaZaA

- Each peer is either a group leader or assigned to a group leader.
- TCP connection between peer and its group leader.
- TCP connections between some pairs of group leaders.
- Group leader tracks the content in all its children.
Querying
- Each file has a hash and a descriptor
- Client sends keyword query to its group leader
- Group leader responds with matches:
- For each match: metadata (text describing the data), hash, IP address
- If group leader forwards query to other group leaders, they respond with matches
- Client then selects files for downloading
- HTTP requests using hash as identifier sent to peers holding desired file
Kazaa tricks
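- Request queuing (limits how many files others can fetch from your peer at one time, because too many uploads would use up your bandwidth)
- Limitation on the number of simultaneous uploads
- Incentive priorities (the more you upload, the higher your priority)
- Give priority to users who have uploaded more files than they have downloaded
- Parallel downloading (one file can be fetched from several peers at once, each supplying a part)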
- Use the byte-range header of HTTP to request different portion of the file from different peers
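Parallel downloading comes down to dividing the file into byte ranges, one per peer. The sketch below produces the value of an HTTP Range header for each peer, assuming equal-sized portions and a known file size; the function name and peer labels are hypothetical.

```python
def split_ranges(file_size: int, peers: list) -> dict:
    """Assign each peer a contiguous byte range, formatted as an HTTP Range value."""
    if file_size <= 0 or not peers:
        return {}
    chunk = -(-file_size // len(peers))  # ceiling division
    requests = {}
    for i, peer in enumerate(peers):
        first = i * chunk
        if first >= file_size:
            break  # more peers than bytes: the rest get nothing
        last = min(first + chunk, file_size) - 1  # byte ranges are inclusive
        requests[peer] = f"bytes={first}-{last}"
    return requests
```

For a 1000-byte file and three peers this gives bytes=0-333, bytes=334-667 and bytes=668-999, which together cover the file exactly once.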