Chapter 2: Application layer

2.1 Principles of network applications

Application architecture

Client-server

server

always-on host

permanent IP address

clients

communicate with server

may be intermittently connected

may have dynamic IP address

do not communicate directly with each other

Pure P2P architecture

Every peer is both a client and a server

no always-on server

arbitrary end systems communicate directly; peers are intermittently connected and change IP addresses

example: Gnutella

Highly scalable

But difficult to manage

Hybrid of client-server and P2P

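A central server keeps track of which peer holds which file; peers then communicate with each other directly, and file transfers do not go through the server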

Napster

File transfer P2P

File search centralized

Peers register content at central server

Peers query same central server to locate content

Instant messaging

Chatting between two users is P2P (messages are not relayed through the server)

Presence detection/location centralized:

User registers its IP address with central server when it comes online

User contacts central server to find IP addresses of buddies

Processes communicating

within same host, two processes communicate using inter-process communication (defined by OS).
processes in different hosts communicate by exchanging messages
Client process: process that initiates communication
Server process: process that waits to be contacted
applications with P2P architectures have client processes & server processes
Processes communicating across network
process sends/receives messages to/from its socket
A socket is analogous to a door
- sending process shoves message out door
- sending process relies on transport infrastructure on other side of door which brings message to socket at receiving process

API:
(1) choice of transport protocol (TCP/UDP);
(2) ability to fix a few parameters (lots more on this later)

Addressing processes

1. Find the host (by its IP address)
2. Find the process on that host (by its port number)

For a process to receive messages, it must have an identifier
Every host has a unique 32-bit IP address
- does the IP address of the host on which the process runs suffice for identifying the process?
  No, many processes can be running on same host

Example port numbers:

HTTP server: 80

Mail server: 25

File server: 20, 21

DNS server: 53
Port numbers 0-1023 are well-known ports, already reserved for standard services
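A minimal sketch of addressing a process by its (IP address, port number) pair, using Python's standard `socket` module over UDP on the loopback interface. Port 0 (let the OS choose) is an assumption made only so the sketch runs anywhere; a real server would bind a fixed, well-known port.

```python
import socket

# Receiving process: a UDP socket bound to (IP address, port number).
# Port 0 lets the OS pick a free port; a real server would bind a fixed one.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(5)
receiver_addr = receiver.getsockname()  # (IP, port) identifies the process

# Sending process: the destination is the (IP, port) pair, not the IP alone.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", receiver_addr)

data, sender_addr = receiver.recvfrom(1024)  # sender_addr is also an (IP, port) pair
print(f"{receiver_addr} received {data!r} from {sender_addr}")

sender.close()
receiver.close()
```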

App-layer protocol defines

Types of messages exchanged, e.g., request & response messages
Syntax of message types: what fields are in messages & how fields are delineated
Semantics of the fields, i.e., meaning of information in fields
Rules for when and how processes send & respond to messages
Public-domain protocols
- defined in RFCs
- allow for interoperability
- e.g., HTTP, SMTP
Proprietary protocols (not publicly documented)
- e.g., KaZaA

What transport service does an app need?

Data loss

some apps (e.g., audio) can tolerate some loss
other apps (e.g., file transfer, telnet) require 100% reliable data transfer
Timing
some apps (e.g., Internet telephony, interactive games) require low delay to be “effective”
Bandwidth
some apps (e.g., multimedia) require minimum amount of bandwidth to be “effective”
other apps (“elastic apps”) make use of whatever bandwidth they get
Transport service requirements of common apps

TCP service
connection-oriented: setup required between client and server processes
reliable transport between sending and receiving process
flow control: sender won’t overwhelm receiver
congestion control: throttle sender when network overloaded
does not provide: timing, minimum bandwidth guarantees
UDP service
commonly used for simple queries (e.g., DNS)
unreliable data transfer between sending and receiving process
does not provide: connection setup, reliability, flow control, congestion control, timing, or bandwidth guarantee
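A minimal TCP sketch on the loopback interface showing the connection setup (listen, accept, connect) that UDP skips. The uppercase-echo server and all names are hypothetical.

```python
import socket
import threading

def recv_all(sock):
    """Read until the peer closes its sending side (TCP is a byte stream)."""
    chunks = []
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)

def serve_once(listener):
    conn, _ = listener.accept()      # completes connection setup with the client
    with conn:
        conn.sendall(recv_all(conn).upper())

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))      # OS-chosen port, so the sketch runs anywhere
listener.listen(1)
server_thread = threading.Thread(target=serve_once, args=(listener,))
server_thread.start()

client = socket.create_connection(listener.getsockname(), timeout=5)  # TCP handshake
client.sendall(b"hello tcp")
client.shutdown(socket.SHUT_WR)      # tell the server we are done sending
reply = recv_all(client)
client.close()
server_thread.join()
listener.close()
print(reply)
```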
2.2 Web and HTTP
First, some jargon
Web page consists of objects
An object is a file such as an HTML file, a JPEG image, a Java applet, an audio file,…
A Web page consists of a base HTML-file and several referenced objects
The base HTML file references the other objects in the page with the object’s URLs (Uniform Resource Locators)
HTTP overview
HTTP: hypertext transfer protocol
Web’s application layer protocol
client/server model
- client: browser that requests, receives, “displays” Web objects
- server: Web server sends objects in response to requests
HTTP 1.0: RFC 1945
HTTP 1.1: RFC 2616
Uses TCP
client initiates TCP connection (creates socket) to server, port 80
server accepts TCP connection from client
HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server)
TCP connection closed
HTTP is “stateless” (it keeps no record of earlier requests)
server maintains no information about past client requests
HTTP connections
Nonpersistent HTTP
A separate TCP connection is set up for every object fetched
At most one object is sent over a TCP connection.
HTTP/1.0
Response time modeling
total = 2RTT + file transmission time
one RTT to initiate TCP connection
one RTT for HTTP request and first few bytes of HTTP response to return
file transmission time

Definition of RTT: time for a small packet to travel from client to server and back.

Persistent HTTP

Each connection can fetch multiple objects

Multiple objects can be sent over single TCP connection between client and server.
A new connection need not be set up for the transfer of each Web object
HTTP/1.1

Nonpersistent HTTP issues

requires 2 RTTs per object

the OS must do work and allocate host resources for each TCP connection

but browsers often open parallel TCP connections to fetch referenced objects
Persistent HTTP
server leaves connection open after sending response
subsequent HTTP messages between same client/server are sent over connection

Persistent without pipelining

client issues new request only when previous response has been received
one RTT for each referenced object
Persistent with pipelining
default in HTTP/1.1
client sends requests as soon as it encounters a referenced object
as little as one RTT for all the referenced objects
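A simplified response-time model for a base HTML file plus `n_refs` referenced objects, assuming one connection at a time (no parallel connections), a fixed RTT, and the same transmission time `tx` for every object. The function names are illustrative.

```python
def nonpersistent(n_refs, rtt, tx=0.0):
    # every object (base file + n_refs) needs its own connection: 2 RTT + tx each
    return (1 + n_refs) * (2 * rtt + tx)

def persistent_no_pipelining(n_refs, rtt, tx=0.0):
    # 2 RTT for the base file, then one request/response RTT per referenced object
    return 2 * rtt + tx + n_refs * (rtt + tx)

def persistent_pipelining(n_refs, rtt, tx=0.0):
    # all referenced objects are requested back to back: about one extra RTT in total
    base = 2 * rtt + tx
    return base if n_refs == 0 else base + rtt + n_refs * tx

for model in (nonpersistent, persistent_no_pipelining, persistent_pipelining):
    print(model.__name__, round(model(10, rtt=0.1), 3))
```

With 10 referenced objects, RTT = 100 ms and transmission time ignored (hypothetical values), the three models give about 2.2 s, 1.2 s and 0.3 s.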
HTTP message
two types of HTTP messages: request, response
HTTP request message: general format

GET /somedir/page.html HTTP/1.1

--  Request to return the object /somedir/page.html
--  The browser implements version HTTP/1.1

Host: www.someschool.edu

-- Specifies the host on which the object resides

User-agent: Mozilla/4.0

-- Specifies the browser type that is making the request

Connection: close

-- Indicates that the connection SHOULD NOT be considered `persistent`. It wants the server to close the connection after the current request/response is complete

Accept-language: fr

-- Indicates that the user prefers to receive a French version of the object
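A sketch that assembles the request message above as raw text. It assumes the HTTP/1.1 framing rules: every line ends with CRLF, and an empty line ends the header section. `build_request` is a hypothetical helper, not a library function.

```python
def build_request(method, path, host, headers=None):
    """Assemble an HTTP/1.1 request message as text (hypothetical helper)."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]  # request line + Host
    lines += [f"{name}: {value}" for name, value in (headers or {}).items()]
    return "\r\n".join(lines) + "\r\n\r\n"  # CRLF per line, blank line ends headers

request = build_request(
    "GET", "/somedir/page.html", "www.someschool.edu",
    {"User-agent": "Mozilla/4.0", "Connection": "close", "Accept-language": "fr"},
)
print(request)
```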

Method types

HTTP/1.0

GET: Return the object
POST: Send information to be stored on the server
HEAD: Return only information about the object, such as how old it is, but not the object itself

HTTP/1.1

GET, POST, HEAD
PUT: Uploads a new copy of existing object in entity body to path specified in URL field
DELETE: Deletes object specified in the URL field

Uploading form input

POST method
- Web page often includes form input
- Input is uploaded to server in entity body
URL method
- Uses GET method
- Input is uploaded in URL field of request line (as a query string after `?`)
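A sketch of the two upload styles using `urllib.parse.urlencode`. The form fields and the `/animalsearch` path are hypothetical.

```python
from urllib.parse import urlencode

form = {"animal": "monkeys", "food": "banana"}   # hypothetical form input
query = urlencode(form)

# URL method: the input travels in the request line of a GET
get_request_line = f"GET /animalsearch?{query} HTTP/1.1"

# POST method: the same encoded input travels in the entity body
post_request_line = "POST /animalsearch HTTP/1.1"
post_body = query.encode("ascii")
post_headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "Content-Length": str(len(post_body)),
}
print(get_request_line)
print(post_request_line, post_headers, post_body)
```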

HTTP response message

An HTTP response consists of the following:
1. A status line, which indicates the success or failure of the request
2. Header lines: a description of the information in the response. This is the metadata or meta information
3. The actual information requested

HTTP response status codes

200 OK

request succeeded, requested object later in this message

301 Moved Permanently

requested object moved, new location specified later in this message (Location:)

400 Bad Request

request message not understood by server

404 Not Found

requested document not found on this server

505 HTTP Version Not Supported
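A sketch that splits a status line into its three parts and groups codes by their first digit. It uses Python's `http.HTTPStatus` only to look up the standard reason phrases; the helper names are illustrative.

```python
from http import HTTPStatus

def parse_status_line(line):
    """Split e.g. 'HTTP/1.1 404 Not Found' into version, code, phrase."""
    version, code, phrase = line.split(" ", 2)
    return version, int(code), phrase

def status_class(code):
    # the first digit tells the client how to react
    return {2: "success", 3: "redirection", 4: "client error",
            5: "server error"}.get(code // 100, "other")

for code in (200, 301, 400, 404, 505):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```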

User-server interaction: authorization

Authorization: control access to server content

authorization credentials: typically name, password
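The notes do not name an authorization scheme; as an assumption, this sketch uses HTTP Basic authentication, where the client sends `name:password` base64-encoded in an `Authorization` header. Base64 is an encoding, not encryption, so Basic credentials are readable by anyone who sees the message unless the connection is protected (e.g., HTTPS). The account data is hypothetical.

```python
import base64

def basic_auth_header(name, password):
    """Client side: build an Authorization header line (HTTP Basic scheme)."""
    token = base64.b64encode(f"{name}:{password}".encode("utf-8")).decode("ascii")
    return f"Authorization: Basic {token}"

def check_basic_auth(header_value, accounts):
    """Server side: decode 'Basic <token>' and compare with stored credentials."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        return False
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return False
    name, sep, password = decoded.partition(":")
    return bool(sep) and accounts.get(name) == password

accounts = {"alice": "secret"}            # hypothetical user database
line = basic_auth_header("alice", "secret")
print(line)
print(check_basic_auth(line.split(": ", 1)[1], accounts))
```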
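User-server state: cookies

A cookie is a small piece of data that gives a Web site a memory: it lets the site remember certain information on a particular computer.
How cookies work: the server first writes the cookie to the client's system. On every later visit to the site, the client first sends the cookie to the server; the server uses it to decide what to do and then generates the HTML it returns to the client.

Four components of cookie technology:

1) cookie header line in the HTTP response message
2) cookie header line in the HTTP request message
3) cookie file kept on user’s browser
4) back-end database at Web site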

Cookies: keeping “state”

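Using the user's cookie file, the Web site looks up its back-end database, learns which pages the user has visited before, and can then make personalized recommendations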
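A toy simulation of the four cookie components. Class names, the host name and the starting ID 1678 are hypothetical, and the cookie is simplified to a bare ID (a real `Set-Cookie` header carries `name=value` plus attributes).

```python
import itertools

class ToySite:
    def __init__(self):
        self._ids = itertools.count(1678)  # hypothetical starting ID
        self.database = {}                 # (4) back-end database: cookie ID -> pages visited

    def handle(self, path, request_headers):
        response_headers = {}
        user_id = request_headers.get("Cookie")          # (2) cookie line in the request
        if user_id is None:
            user_id = str(next(self._ids))
            response_headers["Set-Cookie"] = user_id     # (1) cookie line in the response
        self.database.setdefault(user_id, []).append(path)
        return response_headers

class ToyBrowser:
    def __init__(self):
        self.cookie_file = {}  # (3) cookie file kept by the browser: host -> cookie

    def get(self, site, host, path):
        headers = {}
        if host in self.cookie_file:
            headers["Cookie"] = self.cookie_file[host]
        response = site.handle(path, headers)
        if "Set-Cookie" in response:
            self.cookie_file[host] = response["Set-Cookie"]
        return response

site = ToySite()
browser = ToyBrowser()
browser.get(site, "shop.example", "/books")
browser.get(site, "shop.example", "/music")
print(site.database)  # the site now knows this browser visited both pages
```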

Web caches (proxy servers)

Goal: satisfy client request without involving origin server

user sets browser: Web accesses via cache
browser sends all HTTP requests to cache
- if the object is in the cache: the cache returns the object directly
- else: the cache requests the object from the origin server, then returns it to the client
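A minimal sketch of the cache's decision logic. It assumes cached objects never go stale; a real proxy would also check freshness (e.g., with a conditional GET). `ProxyCache` and the fake origin are hypothetical.

```python
class ProxyCache:
    """Toy Web cache: serve from the local copy if present, else ask the origin."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin   # function: URL -> object
        self._store = {}                  # URL -> cached object
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self._store:            # object in cache: return it directly
            self.hits += 1
            return self._store[url]
        self.misses += 1                  # else: fetch from origin, keep a copy, return it
        obj = self._fetch(url)
        self._store[url] = obj
        return obj

origin_requests = []

def fake_origin(url):
    origin_requests.append(url)
    return f"<object from {url}>"

cache = ProxyCache(fake_origin)
cache.get("http://www.someschool.edu/page.html")
cache.get("http://www.someschool.edu/page.html")
print(cache.hits, cache.misses, origin_requests)
```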

Why Web caching

Reduce response time for client request.
Reduce traffic on an institution’s access link.
Internet dense with caches enables “poor” content providers to effectively deliver content (but so does P2P file sharing)
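(Content is replicated in many caches; users access a cache instead of the origin server, which reduces the load on the server)

Option 1: increase the access-link bandwidth

Option 2: use a cache

Requests served by the cache never cross the access link, so link utilization drops and the delay on the link falls sharply, to a negligible level.
A cache achieves low delay with a smaller bandwidth.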
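The original figures for this example are missing from the notes, so the numbers below are hypothetical. The sketch computes access-link utilization (traffic intensity) as request rate times object size divided by link rate, counting only cache misses when a cache is used; queuing delay grows very large as utilization approaches 1.

```python
def access_link_utilization(req_per_s, object_bits, link_bps, hit_rate=0.0):
    """Traffic intensity on the access link; only cache misses cross it."""
    return (1.0 - hit_rate) * req_per_s * object_bits / link_bps

# hypothetical numbers: 15 requests/s, 100 kbit objects, 1.5 Mbps access link
no_cache = access_link_utilization(15, 100_000, 1_500_000)                  # link saturated
bigger_link = access_link_utilization(15, 100_000, 10_000_000)              # option 1
with_cache = access_link_utilization(15, 100_000, 1_500_000, hit_rate=0.4)  # option 2
print(no_cache, bigger_link, with_cache)
```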

2.3 FTP

file transfer protocol

transfer file to/from remote host
client/server model
- Client side: the side that initiates transfer(either to/from remote)
- Server side: remote host
ftp: RFC 959
ftp server: port 21
FTP: separate control, data connections
FTP client contacts FTP server at port 21, specifying TCP as transport protocol
Client obtains authorization over control connection (username, password)
Client browses remote directory by sending commands over control connection.
When server receives a command for a file transfer, the server opens a TCP data connection to client
After transferring one file, the server closes the data connection

Server opens a second TCP data connection to transfer another file.
Control connection: “out of band”
FTP server maintains “state”: current directory, earlier authentication

FTP commands, responses
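As an illustration only, this sketch models the stateful control connection with a few commands and reply codes from RFC 959 (USER, PASS, CWD, PWD). It handles no data connection: in real FTP, commands such as RETR or LIST trigger a separate TCP data connection. `ToyFTPControl` is hypothetical.

```python
import posixpath

class ToyFTPControl:
    """Simplified control-connection handler that remembers login and directory."""

    def __init__(self, accounts):
        self.accounts = accounts   # hypothetical user -> password table
        self.user = None
        self.logged_in = False     # state kept across commands
        self.cwd = "/"             # state kept across commands

    def handle(self, line):
        cmd, _, arg = line.strip().partition(" ")
        cmd = cmd.upper()
        if cmd == "USER":
            self.user, self.logged_in = arg, False
            return "331 User name okay, need password."
        if cmd == "PASS":
            if self.user is not None and self.accounts.get(self.user) == arg:
                self.logged_in = True
                return "230 User logged in, proceed."
            return "530 Not logged in."
        if not self.logged_in:
            return "530 Not logged in."
        if cmd == "CWD":
            self.cwd = posixpath.normpath(posixpath.join(self.cwd, arg))
            return "250 Requested file action okay, completed."
        if cmd == "PWD":
            return f'257 "{self.cwd}" is current directory.'
        return "502 Command not implemented."

session = ToyFTPControl({"alice": "pw"})
for command in ("PWD", "USER alice", "PASS pw", "CWD pub", "PWD"):
    print(command, "->", session.handle(command))
```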

2.4 Electronic Mail

SMTP, POP3, IMAP
Three major components of a mail system:
user agents
mail servers
simple mail transfer protocol: SMTP
User Agent
Also known as “mail reader”
composing, editing, reading mail messages
e.g., Eudora, Outlook, elm, Netscape Messenger
outgoing, incoming messages stored on server
Mail servers
each mail server acts as both a client (the sending mail server) and a server (the receiving mail server)
mailbox
mailbox contains incoming messages for user
message queue
message queue of outgoing (to be sent) mail messages
SMTP
Simple Mail Transfer Protocol
port 25
uses TCP to reliably transfer email message from client to server, port 25
direct transfer: sending server to receiving server
three phases of transfer
- handshaking (greeting)
- transfer of messages
- closure
command/response interaction
- commands: ASCII text
- response: status code and phrase
messages must be in 7-bit ASCII

1) Alice uses user agent to compose message and “to” bob@someschool.edu

2) Alice’s user agent sends message to her mail server; message placed in message queue

3) Alice’s mail server (Client side) of SMTP opens TCP connection with Bob’s mail server (server side)

4) SMTP client sends Alice’s message over the TCP connection

5) Bob’s mail server places the message in Bob’s mailbox

6) Bob invokes his user agent to read message

SMTP final words

SMTP uses persistent connections
SMTP requires message (header & body) to be in 7-bit ASCII
SMTP server uses CRLF.CRLF to determine end of message
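A sketch of the client side of the SMTP exchange in Alice's example, with the server's reply codes (220, 250, 354, 221) left out. It checks the 7-bit ASCII rule and ends the message with a line holding only `.` (CRLF.CRLF); a body line that itself starts with `.` gets an extra `.` (dot-stuffing, RFC 5321) so it cannot be mistaken for the end marker. The client host name and Alice's address are hypothetical.

```python
def dot_stuff(lines):
    # a body line starting with "." gets one extra "." (RFC 5321 transparency)
    return ["." + line if line.startswith(".") else line for line in lines]

def smtp_client_commands(client_host, sender, recipient, body_lines):
    """Client side of one SMTP transfer, as the text sent over the TCP connection."""
    for line in body_lines:
        if not line.isascii():                # SMTP requires 7-bit ASCII
            raise ValueError(f"not 7-bit ASCII: {line!r}")
    commands = [
        f"HELO {client_host}",                # handshaking
        f"MAIL FROM:<{sender}>",              # transfer of message
        f"RCPT TO:<{recipient}>",
        "DATA",
        *dot_stuff(body_lines),
        ".",                                  # CRLF.CRLF marks the end of the message
        "QUIT",                               # closure
    ]
    return "\r\n".join(commands) + "\r\n"

def read_message(lines):
    """Server side: collect DATA lines up to the lone '.' and undo dot-stuffing."""
    body = []
    for line in lines:
        if line == ".":
            return body
        body.append(line[1:] if line.startswith(".") else line)
    raise ValueError("connection ended before CRLF.CRLF")

body = ["Do you like ketchup?", ".hidden leading dot", "How about pickles?"]
text = smtp_client_commands("client.example", "alice@example.com", "bob@someschool.edu", body)
print(text)
```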

Comparison with HTTP

HTTP: pull protocol (client’s point of view)
SMTP: push protocol
both have ASCII command/response interaction, status codes
HTTP does not require message to be in 7-bit ASCII
HTTP: one object in one response message
SMTP: multiple objects can be sent in one message
Mail access protocols
SMTP: delivery/storage to receiver’s server
Mail access protocol: retrieval from server
- POP3: Post Office Protocol, version 3
- - authorization (agent <-> server) and download
- IMAP: Internet Mail Access Protocol
- - more features (more complex)
- - manipulation of stored messages on server
- HTTP: Hotmail, Yahoo! Mail, etc.

2.5 DNS

Domain Name System (map between IP address and name)
Internet hosts, routers:

IP address (32 bit): used for addressing datagrams
“name”, e.g., bbs.hupu.com
Domain Name System
A distributed database implemented in hierarchy of many name servers
An application-layer protocol that allows hosts, routers, name servers to communicate to resolve names (address/name translation)
- DNS provides a core Internet function, implemented as application-layer protocol
- DNS is an example of the Internet design philosophy of placing complexity at network’s “edge”
DNS services
Hostname to IP address translation
Host aliasing
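- Canonical and alias names
- e.g., canonical name: relay1.west-coast.enterprise.com
- e.g., alias names: enterprise.com and www.enterprise.com
Mail server aliasing
- e.g., mail for bob@hotmail.com (alias: hotmail.com)
- is handled by the mail server with canonical name relay1.west-coast.hotmail.com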
Load distribution
- Replicated Web servers: set of IP addresses for one canonical name

Why not centralize DNS?

doesn’t scale

single point of failure
traffic volume
distant centralized database
maintenance

Distributed, Hierarchical Database

Client wants IP for www.amazon.com; 1st approx:

Client queries a root server to find com DNS server

Client queries com DNS server to get amazon.com DNS server

Client queries amazon.com DNS server to get IP address for www.amazon.com

DNS name servers

4 types of name servers

1) root name servers
2) top-level domain (TLD) name servers (explained next)
3) authoritative name servers
4) local name servers

Root name servers

13 root name servers worldwide
contacted by local name server that can not resolve name
root name server:
- gets mapping
- returns mapping to local name server
- contacts authoritative name server if name mapping not known
TLD (top-level domain) servers

responsible for com, org, net, edu, etc., and all top-level country domains, e.g., uk, fr, ca, jp.

Authoritative Servers

organization’s DNS servers, providing authoritative hostname to IP mappings for organization’s servers (e.g., Web and mail).

Can be maintained by organization or service provider

Local Name Server

Does not strictly belong to hierarchy
Each ISP (residential ISP, company, university) has one.
- Also called “default name server”
When a host makes a DNS query, query is sent to its local DNS server
- Acts as a proxy, forwards query into hierarchy.

Example

Iterated query

contacted server replies with name of server to contact

“I don’t know this name, but ask this server”

recursive query

puts burden of name resolution on contacted name server

heavy load on servers high in the hierarchy?
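A toy model of the www.amazon.com lookup, contrasting iterated and recursive queries. Server names, the tables, and the address 192.0.2.10 (a documentation-only range) are hypothetical. Each log entry records (who asked, who was asked): in the iterated case the local server sends every query; in the recursive case each contacted server passes the query on.

```python
SERVERS = {
    "root": {"referrals": {"com": "com-tld"}, "answers": {}},
    "com-tld": {"referrals": {"amazon.com": "amazon-auth"}, "answers": {}},
    "amazon-auth": {"referrals": {}, "answers": {"www.amazon.com": "192.0.2.10"}},
}

def ask(server, name):
    """One server's reply: an answer, or a referral to a server lower in the hierarchy."""
    info = SERVERS[server]
    if name in info["answers"]:
        return "answer", info["answers"][name]
    labels = name.split(".")
    for i in range(len(labels)):          # longest matching suffix first
        suffix = ".".join(labels[i:])
        if suffix in info["referrals"]:
            return "referral", info["referrals"][suffix]
    raise LookupError(f"{server} cannot resolve {name}")

def resolve_iterated(name, log):
    server = "root"
    while True:
        log.append(("local", server))     # the local server sends every query itself
        kind, value = ask(server, name)
        if kind == "answer":
            return value
        server = value                    # "I don't know this name, but ask this server"

def resolve_recursive(name, log, asker="local", server="root"):
    log.append((asker, server))
    kind, value = ask(server, name)
    if kind == "answer":
        return value
    # the contacted server takes on the work and queries the next server itself
    return resolve_recursive(name, log, asker=server, server=value)

iterated_log, recursive_log = [], []
print(resolve_iterated("www.amazon.com", iterated_log), iterated_log)
print(resolve_recursive("www.amazon.com", recursive_log), recursive_log)
```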

DNS: caching and updating records

dynamic: updated after a certain period of time
static: hard-coded, valid for a long time
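A minimal sketch of a DNS cache in which dynamic entries expire after a time-to-live (TTL) and static entries never expire. The clock is injectable so expiry can be tested; names and addresses are hypothetical.

```python
import time

class DNSCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}   # name -> (address, expiry time)

    def put(self, name, address, ttl=float("inf")):
        # dynamic entry: finite ttl; static entry: default ttl never expires
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:   # stale: drop it, caller must query again
            del self._entries[name]
            return None
        return address

now = [0.0]                               # fake clock for the example
cache = DNSCache(clock=lambda: now[0])
cache.put("www.amazon.com", "192.0.2.10", ttl=300)
cache.put("static.example", "192.0.2.20")
now[0] = 301.0
print(cache.get("www.amazon.com"), cache.get("static.example"))
```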
DNS records
DNS: distributed database storing resource records(RR)
Type = A
- name is hostname
- value is IP address
- (relay1.bar.foo.com, 145.37.93.126, A)
Type = NS
- name is domain (e.g. foo.com)
- value is host name of an authoritative name server for this domain
- (foo.com, dns.foo.com, NS)
Type = CNAME
- name is alias name for some “canonical” (the real) name
- www.ibm.com is really servereast.backup2.ibm.com
- value is canonical name
- (foo.com, relay1.bar.foo.com, CNAME)
Type = MX
- name is alias name for some mail server
- value is the canonical name of the mail server
- (foo.com, mail.bar.foo.com, MX)
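A sketch of a tiny RR table built from the (name, value, type) triples above, with a lookup that follows a CNAME to the canonical name. The A record for servereast.backup2.ibm.com uses a hypothetical documentation-range address.

```python
RECORDS = [
    ("relay1.bar.foo.com", "145.37.93.126", "A"),
    ("foo.com", "dns.foo.com", "NS"),
    ("foo.com", "mail.bar.foo.com", "MX"),
    ("www.ibm.com", "servereast.backup2.ibm.com", "CNAME"),
    ("servereast.backup2.ibm.com", "192.0.2.7", "A"),   # hypothetical address
]

def lookup(name, rtype, max_aliases=8):
    """Return values of type rtype for name, following CNAME aliases."""
    for _ in range(max_aliases + 1):
        values = [v for n, v, t in RECORDS if n == name and t == rtype]
        if values:
            return values
        canonical = [v for n, v, t in RECORDS if n == name and t == "CNAME"]
        if not canonical:
            return []
        name = canonical[0]          # continue with the canonical (real) name
    raise LookupError("CNAME chain too long")

print(lookup("www.ibm.com", "A"))
print(lookup("foo.com", "MX"))
```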
DNS protocol, messages
DNS protocol: query and reply messages, both with the same message format
message header
identification: 16-bit # for query; reply to query uses the same #
flags:
- query or reply
- recursion desired
- recursion available
- reply is authoritative
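A sketch of the 12-byte DNS header using `struct`. The flag bit positions (QR, AA, RD, RA) and the four count fields follow the DNS message format in RFC 1035; the query ID and name are hypothetical. The sketch also encodes a question for an A record.

```python
import struct

QR = 1 << 15  # 0 = query, 1 = reply
AA = 1 << 10  # reply is authoritative
RD = 1 << 8   # recursion desired
RA = 1 << 7   # recursion available

def encode_name(name):
    """www.amazon.com -> length-prefixed labels ending with a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"bad label: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"

def build_query(query_id, name, recursion_desired=True):
    flags = RD if recursion_desired else 0
    # identification, flags, #questions, #answer RRs, #authority RRs, #additional RRs
    header = struct.pack("!6H", query_id, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!2H", 1, 1)  # type A, class IN
    return header + question

def parse_header(message):
    ident, flags, qd, an, ns, ar = struct.unpack("!6H", message[:12])
    return {
        "id": ident,
        "reply": bool(flags & QR),
        "authoritative": bool(flags & AA),
        "recursion_desired": bool(flags & RD),
        "recursion_available": bool(flags & RA),
        "questions": qd,
        "answers": an,
    }

query = build_query(0x1234, "www.amazon.com")
# a reply to this query reuses the same identification number
reply_header = struct.pack("!6H", 0x1234, QR | AA | RD | RA, 1, 1, 0, 0)
print(parse_header(query))
print(parse_header(reply_header))
```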
Inserting records into DNS
DNS load balancing (DNS Round Robin)

Replicated Web servers: set of IP addresses for one canonical name
example
1st request: 203.34.23.3
2nd request: 203.34.23.4
3rd request: 203.34.23.5
4th request: 203.34.23.3
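A sketch of round-robin answers: the zone keeps the address list and rotates it after each reply, so successive requests see a different address first. Returning the whole rotated list (rather than a single address) is an assumption about how the rotation is done.

```python
from collections import deque

class RoundRobinZone:
    """Replicated Web servers: one canonical name, several IP addresses."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        reply = list(self._addresses)
        self._addresses.rotate(-1)   # next reply starts with the next address
        return reply

zone = RoundRobinZone(["203.34.23.3", "203.34.23.4", "203.34.23.5"])
first_choices = [zone.answer()[0] for _ in range(4)]
print(first_choices)
```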

File distribution problem
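With the client-server model, the server must upload a separate copy of the file to every client, so distributing a file to many clients takes a long time.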
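The notes do not give formulas here; this sketch uses the standard lower bounds on the time to distribute a file of F bits to N peers, with server upload rate u_s, slowest client download rate d_min and (assuming all peers are equal) per-peer upload rate u_peer: client-server needs max(N·F/u_s, F/d_min), P2P needs max(F/u_s, F/d_min, N·F/(u_s + N·u_peer)). The example numbers are hypothetical.

```python
def d_client_server(n, f_bits, u_s, d_min):
    # the server alone uploads n copies; the slowest client downloads one copy
    return max(n * f_bits / u_s, f_bits / d_min)

def d_p2p(n, f_bits, u_s, d_min, u_peer):
    # the server uploads at least one copy; total upload capacity grows with n
    return max(f_bits / u_s, f_bits / d_min, n * f_bits / (u_s + n * u_peer))

# hypothetical: F = 100 Mbit, u_s = 10 Mbps, d_min = 20 Mbps, u_peer = 5 Mbps
for n in (10, 100):
    print(n, d_client_server(n, 100, 10, 20), round(d_p2p(n, 100, 10, 20, 5), 2))
```

Client-server time grows linearly with N (100 s, then 1000 s), while the P2P bound stays below 20 s because every new peer also adds upload capacity.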

P2P file sharing
All peers are servers and clients = highly scalable!

P2P: centralized directory

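When a peer comes online, it registers its IP address and content in the server's database
When a peer needs some content, it first queries the server, then sets up a connection directly with the peer named in the query result
1) when peer connects, it informs central server:
- IP address
- content
2) Alice queries for “Hey Jude”
3) Alice requests file from Bob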
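A minimal sketch of the central directory; the peer IP addresses are hypothetical. Only lookup goes through the server; the file transfer itself would be a direct connection from Alice to Bob (not shown).

```python
class CentralDirectory:
    """Napster-style index: content -> IP addresses of peers that hold it."""

    def __init__(self):
        self._index = {}

    def register(self, peer_ip, contents):       # step 1: peer comes online
        for item in contents:
            self._index.setdefault(item, set()).add(peer_ip)

    def unregister(self, peer_ip):               # peer goes offline
        for holders in self._index.values():
            holders.discard(peer_ip)

    def query(self, item):                       # step 2: where is the content?
        return sorted(self._index.get(item, ()))

directory = CentralDirectory()
directory.register("10.0.0.2", ["Hey Jude", "Let It Be"])   # Bob (hypothetical IP)
directory.register("10.0.0.3", ["Yesterday"])
print(directory.query("Hey Jude"))   # Alice then contacts Bob directly (step 3)
```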

Problem
1) single point of failure
2) Performance bottleneck
3) Copyright infringement
file transfer is decentralized, but locating content is highly centralized

P2P: Query flooding: Gnutella

Similar to broadcasting: there is no centralized server that records which peer holds which content

fully distributed
- no central server
public domain protocol
many Gnutella clients implementing protocol

overlay network: graph

edge between peers X and Y if there’s a TCP connection between them (bidirectional)
all active peers and edges is overlay net.
Edge is not a physical link
Given peer will typically be connected with < 10 overlay neighbors

Gnutella: Peer joining

1. Joining peer X must find some other peer in the Gnutella network: use a list of candidate peers
2. X sequentially attempts to make a TCP connection with peers on the list until a connection is set up with Y
3. X sends a Ping message to Y; Y forwards the Ping message.
4. All peers receiving the Ping message respond with a Pong message
5. X receives many Pong messages. It can then set up additional TCP connections
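A sketch of Ping flooding over an overlay graph. Gnutella messages carry a hop limit (TTL) and peers drop messages they have already seen; here the overlay, peer names and TTL value are hypothetical, and every newly reached peer is counted as answering with a Pong.

```python
from collections import deque

def flood_ping(overlay, origin, ttl):
    """Return the peers that a Ping from `origin` reaches (each replies with a Pong)."""
    seen = {origin}
    frontier = deque([(origin, ttl)])
    responders = set()
    while frontier:
        peer, hops_left = frontier.popleft()
        if hops_left == 0:
            continue                      # TTL exhausted: stop forwarding here
        for neighbor in overlay[peer]:
            if neighbor not in seen:      # duplicates are dropped, not re-forwarded
                seen.add(neighbor)
                responders.add(neighbor)
                frontier.append((neighbor, hops_left - 1))
    return responders

overlay = {   # edges are TCP connections, not physical links
    "X": ["Y"],
    "Y": ["X", "A", "B"],
    "A": ["Y", "C"],
    "B": ["Y"],
    "C": ["A"],
}
print(sorted(flood_ping(overlay, "X", ttl=2)))
```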

Exploiting heterogeneity: KaZaA

Each peer is either a group leader or assigned to a group leader.
- TCP connection between peer and its group leader.
- TCP connections between some pairs of group leaders.
Group leader tracks the content in all its children.

Querying

Each file has a hash and a descriptor

Client sends keyword query to its group leader

Group leader responds with matches:

For each match: metadata (text describing the data), hash, IP address

Group leader may also forward the query to other group leaders, which respond with matches

Client then selects files for downloading

HTTP requests using hash as identifier sent to peers holding desired file
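A toy group-leader index for the query steps above. SHA-1 is a stand-in assumption (KaZaA used its own hashing scheme), the peer addresses are hypothetical, and the request-line format `GET /<hash>` is hypothetical too.

```python
import hashlib

class GroupLeader:
    """Toy KaZaA-style group leader indexing its children's files."""

    def __init__(self):
        self.index = []            # (descriptor, file hash, child IP)
        self.other_leaders = []    # TCP connections to some other group leaders

    def register(self, child_ip, descriptor, content):
        file_hash = hashlib.sha1(content).hexdigest()   # stand-in for KaZaA's own hash
        self.index.append((descriptor, file_hash, child_ip))
        return file_hash

    def query(self, keyword, forward=True):
        matches = [m for m in self.index if keyword.lower() in m[0].lower()]
        if forward:                # ask other leaders once; they do not forward again
            for leader in self.other_leaders:
                matches.extend(leader.query(keyword, forward=False))
        return matches

leader_a, leader_b = GroupLeader(), GroupLeader()
leader_a.other_leaders.append(leader_b)
leader_a.register("10.0.1.5", "Hey Jude - studio", b"studio audio bytes")
leader_b.register("10.0.2.9", "Hey Jude - live", b"live audio bytes")

for descriptor, file_hash, ip in leader_a.query("hey jude"):
    print(descriptor, ip, f"GET /{file_hash} HTTP/1.1")   # hypothetical request format
```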

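Kazaa tricks

Request queuing (limit how many of this peer's files can be downloaded at the same time, since too many simultaneous uploads would use up the peer's own bandwidth)
- Limitation on the number of simultaneous uploads
Incentive priorities (the more a user uploads, the higher its priority)
- Give priority to users who have uploaded more files than they have downloaded
Parallel downloading (to fetch one file, set up connections to several peers at once and request a different part from each)
- Use the byte-range header of HTTP to request different portions of the file from different peers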
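A sketch of parallel downloading: split the file into contiguous byte ranges, one per peer, and request each with an HTTP `Range` header (ranges are inclusive, e.g. `bytes=0-333`). The peer names and file size are hypothetical.

```python
def split_ranges(file_size, peers):
    """Assign each peer one contiguous, inclusive byte range of the file."""
    if not peers or file_size < len(peers):
        raise ValueError("need at least one byte per peer")
    base, extra = divmod(file_size, len(peers))
    requests, start = [], 0
    for i, peer in enumerate(peers):
        length = base + (1 if i < extra else 0)
        end = start + length - 1                      # Range is inclusive
        requests.append((peer, f"Range: bytes={start}-{end}"))
        start = end + 1
    return requests

for peer, header in split_ranges(1000, ["peerA", "peerB", "peerC"]):
    print(peer, header)
```

The client then concatenates the returned portions in range order to rebuild the file.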
