Optimized Memory Transfer and Flow Control
for High Speed Networks

Project Documents
Project Report(PDF)
Abstract(PDF)
IEEE-ACE 2003 Paper(PDF)

Abstract


On high-speed networks, data transmission is limited by the data copy that occurs in the network stack at the end hosts. A second fundamental problem on high-speed networks is that current flow control protocols cannot detect changes in the available network bandwidth within a short period of time.

This project aims to optimize the use of network bandwidth in high-speed networks by eliminating the data copy between the application and kernel domains during data transfer, and by implementing a flow control algorithm that adapts to changes in available network bandwidth in as little as one round trip time.

Elimination of the data copy between the application and the operating system kernel is achieved by the Zbufs Framework, which performs an explicit exchange of buffers during data transfer through the network stack. We also present an additional set of APIs to take advantage of the new I/O framework, which lends itself to an efficient elimination of this copy. The increase in network throughput and the reduction in CPU utilization justify the new set of APIs at the socket interface.

Packet-Pair is a feedback-based adaptive rate control scheme for faster networks that use Fair Queuing at switches. Under packet-pair, each source continually estimates the effective speed of its bottleneck link by transmitting its data as pairs of back-to-back packets and measuring the separation of the returning pairs of acknowledgements. The source periodically adjusts its sending rate in an effort to avoid both overflow and underflow of its logical buffer at the bottleneck switch. The packet-pair scheme includes a timeout and retransmission strategy optimized to deal with losses.


Objective


Networked PCs, despite their modest memory subsystems, can achieve high performance on high-speed networks when given a highly optimized software subsystem. The objective of this project is to provide an optimized TCP/IP stack by eliminating data copying at the socket interface and by implementing a feedback-based, adaptive, rate-based flow control algorithm that maximizes network bandwidth utilization.


Need


For high-speed networks, the limiting factor for data transmission is not CPU processing power but the ability to move data through the host network stack and memory. Data movement is expensive because every copy operation consumes many CPU cycles and tends to pollute the data cache. Eliminating the cost of data copying between the application and the operating system kernel is an important step towards an efficient I/O system.

We have implemented an efficient I/O framework called the Zbuf framework, which relies on sharing memory between the user and kernel address spaces. We perform an explicit exchange of buffers during data transfer, thus eliminating the data copy at the user-kernel interface.

In addition, the bandwidth-delay product of high-speed networks is high. The delay in learning the network state therefore causes buffer buildups and eventually congestion. The window-based flow control algorithm takes around log2(N) + N/2 round trip times to reach the optimal window size of N (i.e., the bandwidth-delay product).
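
To give a sense of scale, the estimate above can be evaluated for a concrete link. The following is a minimal sketch in Python, not part of the project code. The function names and the example link (1 Gbit/s, 10 ms round trip time, 1250-byte packets) are assumptions chosen only for illustration; the formula itself is the log2(N) + N/2 estimate quoted above.

```python
import math


def bandwidth_delay_product_packets(bandwidth_bps, rtt_seconds, packet_bytes):
    """Window size N, in packets, that fills the pipe (bandwidth times delay)."""
    return bandwidth_bps * rtt_seconds / (8 * packet_bytes)


def window_ramp_up_rtts(n_packets):
    """Round trip times needed to reach a window of N packets,
    using the log2(N) + N/2 estimate quoted in the text."""
    if n_packets < 1:
        raise ValueError("window size must be at least one packet")
    return math.log2(n_packets) + n_packets / 2


# Hypothetical link used only for illustration.
n = bandwidth_delay_product_packets(1e9, 0.010, 1250)
rtts = window_ramp_up_rtts(n)
print(f"window of about {n:.0f} packets, reached after about {rtts:.0f} RTTs")
```

For this assumed link the optimal window is about 1000 packets, so the window-based scheme needs several hundred round trip times to reach it, which is the gap a scheme that adapts in roughly one round trip time is meant to close.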

The Packet-Pair flow control algorithm responds to changes in the network state within approximately one round trip time, faster than other flow control schemes. The idea is to send data as back-to-back packets and measure the inter-ack spacing. This spacing determines the service rate that the network has allocated to the connection, and the source then sends data at this rate. However, the algorithm assumes that all the bottlenecks (routers and switches) in the network serve packets under the Fair Queuing discipline.
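
The rate estimate described above reduces to one division per pair of acknowledgements. The sketch below is a simplified illustration in Python, not the project's kernel code; the function names and units (bytes and seconds) are assumptions, and a real implementation would also need to filter noisy measurements.

```python
def estimate_service_rate(packet_bytes, first_ack_time, second_ack_time):
    """Estimate the bottleneck service rate (bytes per second) from the
    spacing between the acks of one back-to-back packet pair."""
    spacing = second_ack_time - first_ack_time
    if spacing <= 0:
        raise ValueError("acks must arrive in order with positive spacing")
    return packet_bytes / spacing


def inter_packet_gap(packet_bytes, rate_bytes_per_second):
    """Time to wait between packets so the source sends at the given rate."""
    if rate_bytes_per_second <= 0:
        raise ValueError("rate must be positive")
    return packet_bytes / rate_bytes_per_second


# Hypothetical measurement: acks of a 1000-byte pair arrive 0.5 s apart.
rate = estimate_service_rate(1000, 10.0, 10.5)
gap = inter_packet_gap(1000, rate)
print(f"estimated rate {rate} bytes/s, pacing gap {gap} s")
```

Because each pair of acks yields a fresh estimate, the source can follow a change in the bottleneck rate as soon as the next pair returns, which is where the one round trip time figure comes from.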


Design


Zbuf Framework



    Assumption
      After an application program does a send() on a buffer, it may reuse the buffer, but it generally does not reuse the buffer's contents. Therefore we explicitly exchange buffers when performing data transfer to achieve zero copy.

      Hardware Checksumming support would further enhance our implementation.

    Features
      In the zero bufs (zbufs) framework, the virtual addresses used by the application and the kernel are different but map onto the same physical address. In addition, an explicit exchange of buffers occurs when performing I/O operations.

      An application that intends to use the framework must first register every socket it uses with the framework.

      To implement this framework, a new set of APIs has been developed to allocate and deallocate zbufs, read and write data in zbufs, and send and receive zbufs.

      Two system calls have been implemented to enable this framework to service the allocation-deallocation of zbufs and to service the socket calls for registration.

      When an application does a send() on a Zbuf, the Zbuf is transferred from the application domain to the kernel domain. Before returning to the user, the framework allocates a new Zbuf and explicitly exchanges it for the one that was sent, so that the application has a buffer for further use. Also, if the NIC does not support hardware checksumming, TCP checksumming is done at the device driver while copying the data into its own buffers.

      On reception of a data packet, the device driver examines part of the packet to determine whether the port has been registered with the framework; if so, the device driver allocates a Zbuf and places the data in it. When the application program does a recv(), the socket layer dequeues the Zbuf from its receive queue and transfers the Zbuf to the application domain. The previously allocated user-level Zbuf is also deallocated before returning to the user process.

      The zbufs framework neither modifies nor deletes page table entries; hence there are no TLB flushes. In addition, the framework does not use any locks (including VM locks), so its performance will not degrade even on multiprocessor systems.

      The security and modularity aspects have been taken into consideration while designing and implementing this module.
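
The buffer exchange on the send and receive paths described above can be pictured with a small user-space model. This is only a toy sketch in Python and makes no claim about the actual kernel data structures or API names: ZbufPool, ToySocket, zsend, zrecv, and driver_receive are all hypothetical names, and passing an object reference stands in for handing over a shared physical page.

```python
from collections import deque


class ZbufPool:
    """Hypothetical pool of fixed-size buffers shared by both domains."""

    def __init__(self, count, size):
        self._free = deque(bytearray(size) for _ in range(count))

    def alloc(self):
        if not self._free:
            raise MemoryError("zbuf pool exhausted")
        return self._free.popleft()

    def free(self, zbuf):
        self._free.append(zbuf)

    def available(self):
        return len(self._free)


class ToySocket:
    """Toy model of a socket registered with the framework."""

    def __init__(self, pool):
        self.pool = pool
        self.send_queue = deque()  # buffers now owned by the "kernel"
        self.recv_queue = deque()  # buffers filled by the "driver"

    def zsend(self, zbuf):
        # Hand the buffer itself to the kernel side: no bytes are copied.
        self.send_queue.append(zbuf)
        # Exchange: the caller gets a fresh buffer for further use.
        return self.pool.alloc()

    def driver_receive(self, payload):
        # The driver allocates a buffer and places the packet data in it.
        zbuf = self.pool.alloc()
        if len(payload) > len(zbuf):
            self.pool.free(zbuf)
            raise ValueError("payload larger than zbuf")
        zbuf[:len(payload)] = payload
        self.recv_queue.append(zbuf)

    def zrecv(self, old_zbuf):
        if not self.recv_queue:
            raise BlockingIOError("no data queued")
        received = self.recv_queue.popleft()
        # The caller's previous buffer is released before returning.
        self.pool.free(old_zbuf)
        return received


pool = ZbufPool(count=4, size=16)
sock = ToySocket(pool)
buf = pool.alloc()
buf[:5] = b"hello"
spare = sock.zsend(buf)          # buf now belongs to the kernel side
sock.driver_receive(b"data")
incoming = sock.zrecv(spare)     # spare is returned to the pool
print(bytes(incoming[:4]))
```

The point of the model is that the object queued by zsend is the very buffer the application filled, and the object returned by zrecv is the very buffer the driver filled; in both directions ownership moves while the payload stays in place.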

Packet Pair Algorithm



    Assumption
      All servers in the network follow the fair queuing discipline.
      Routing table updates are infrequent.
      All packets are of the same size.

    Basic Idea
      The source transmits data in the form of packet-pair probes. When two packets belonging to the same connection enter a server back-to-back, they leave it separated by an interval that is inversely proportional to the connection's service rate at that server. This separation is largest at the bottleneck server. If the source measures the inter-ack spacing, it can directly determine the bottleneck service rate with every pair of acks it receives. If the bottleneck service rate changes with time, the source automatically detects and adapts to the changes within approximately one round trip time (RTT). Though rate control is used to choose the current sending rate, there is a window limit as well, which prevents the flow control protocol from overflowing buffers even when there are errors in rate monitoring.
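
The claim that the pair leaves the path with the spacing imposed by the bottleneck can be checked with a short simulation. The sketch below is an idealized model in Python, assuming each server gives the connection a fixed service rate, forwards a packet only after fully serving it, and adds no other delay; the function name and the example rates are hypothetical.

```python
def pair_spacing_after_path(packet_bytes, service_rates):
    """Spacing (seconds) between two equal-size packets that enter a chain
    of servers back-to-back, after they have passed through every server.

    service_rates lists the per-connection service rate of each server in
    bytes per second, in path order."""
    first_arrival = 0.0
    second_arrival = 0.0  # back-to-back: both packets are ready at time zero
    for rate in service_rates:
        service_time = packet_bytes / rate
        first_departure = first_arrival + service_time
        # The second packet waits for its own arrival and for the first
        # packet to finish service, whichever is later.
        second_departure = max(second_arrival, first_departure) + service_time
        first_arrival, second_arrival = first_departure, second_departure
    return second_arrival - first_arrival


# Hypothetical three-hop path; the middle hop is the bottleneck.
rates = [4000.0, 1000.0, 2000.0]
spacing = pair_spacing_after_path(1000, rates)
print(f"spacing {spacing} s, implied rate {1000 / spacing} bytes/s")
```

In this model the faster server after the bottleneck cannot close the gap, because the second packet does not arrive there any sooner; the spacing at the end of the path therefore corresponds to the slowest server, and dividing the packet size by it recovers that server's rate.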


Implemented in the Linux 2.4.18 kernel.