
Linux Kernel Networking

Rami Rosen
ramirose@gmail.com
Haifux, August 2007

Disclaimer
Everything in this lecture shall not, under any
circumstances, hold any legal liability whatsoever.
Any usage of the data and information in this document
shall be solely on the responsibility of the user.
This lecture is not given on behalf of any company
or organization.

Warning

This lecture will deal with design functional descriptions side by side with many implementation details; some knowledge of C is preferred.

General

The Linux networking kernel code (including network device drivers) is a large part of the Linux kernel code.

Scope: we will not deal with wireless, IPv6, and multicasting.

Nor will we deal with user space routing daemons/apps, or with security attacks (like DoS, spoofing, etc.).

Understanding a packet walkthrough in the kernel is a key to understanding kernel networking. Understanding it is a must if we want to understand Netfilter or IPSec internals, and more.

There is a 10-page Linux kernel networking walkthrough document.

General contd.

Though it deals with the 2.4.20 Linux kernel, most of it is still relevant.

This lecture will concentrate on this walkthrough (design and implementation details).

References to code in this lecture are based on linux-2.6.23-rc2.

There was some serious cleanup in 2.6.23.

Hierarchy of networking layers

The layers that we will deal with (based on the 7 layers model) are:

Transport Layer (L4) (udp, tcp, ...)

Network Layer (L3) (ip)

Link Layer (L2) (ethernet)

Networking Data Structures

The two most important structures of the Linux kernel network layer are:

sk_buff (defined in include/linux/skbuff.h)

net_device (defined in include/linux/netdevice.h)

It is better to know a bit about them before delving into the walkthrough code.

SK_BUFF

sk_buff represents data and headers.

sk_buff API (examples)

sk_buff allocation is done with alloc_skb() or dev_alloc_skb(); drivers use dev_alloc_skb(). (An sk_buff is freed by kfree_skb() and dev_kfree_skb().)

unsigned char *data: points to the current header.

skb_pull(int len) removes data from the start of a buffer by advancing data to data+len and by decreasing len.

Almost always sk_buff instances appear as skb in the kernel code.
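
The pointer arithmetic behind skb_pull() can be illustrated in user space. The following is only a minimal sketch: toy_skb and toy_skb_pull() are hypothetical names invented here, and the struct keeps just the two fields the description above mentions, not the real sk_buff layout.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two sk_buff fields that skb_pull() touches. */
struct toy_skb {
    unsigned char *data; /* start of the current header */
    unsigned int len;    /* bytes from data to the end of the packet */
};

/*
 * Remove n bytes from the front of the buffer: advance data, shrink len.
 * Returns the new data pointer, or NULL (buffer untouched) if n > len.
 */
unsigned char *toy_skb_pull(struct toy_skb *skb, unsigned int n)
{
    if (n > skb->len)
        return NULL;
    skb->len -= n;
    skb->data += n;
    return skb->data;
}
```

Pulling 14 bytes from a buffer that starts at an Ethernet header leaves data pointing at the L3 header, which is the step eth_type_trans() performs on receive.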

SK_BUFF contd.

sk_buff includes 3 header members (they were unions before 2.6.22); each corresponds to a kernel network layer:

transport_header (previously called h) for layer 4, the transport layer (can include tcp header or udp header or icmp header, and more)

network_header (previously called nh) for layer 3, the network layer (can include ip header or ipv6 header or arp header).

mac_header (previously called mac) for layer 2, the link layer.

skb_network_header(skb), skb_transport_header(skb) and skb_mac_header(skb) return a pointer to the header.

SK_BUFF contd.

struct dst_entry *dst: the route for this sk_buff; this route is determined by the routing subsystem.

It has 2 important function pointers:

int (*input)(struct sk_buff*);

int (*output)(struct sk_buff*);

input() can be assigned to one of the following: ip_local_deliver, ip_forward, ip_mr_input, ip_error or dst_discard_in.

output() can be assigned to one of the following: ip_output, ip_mc_output, ip_rt_bug, or dst_discard_out.

SK_BUFF contd.

In the usual case, there is only one dst_entry for every skb.

When using IPSec, there is a linked list of dst_entries and only the last one is for routing; all other dst_entries are for IPSec transformers; these other dst_entries have the DST_NOHASH flag set.

tstamp (of type ktime_t): timestamp of receiving the packet.

net_enable_timestamp() must be called in order to get values.

net_device

net_device represents a network interface card.

There are cases when we work with virtual devices.

For example, bonding (setting the same IP for two or more NICs, for load balancing and for high availability).

Many times this is implemented using the private data of the device (the void *priv member of net_device).

In OpenSolaris there is a special pseudo driver called vnic which enables bandwidth allocation (project CrossBow).

Important members:

net_device contd.

unsigned int mtu: Maximum Transmission Unit, the maximum size of frame the device can handle.

Each protocol has an mtu of its own; the default is 1500 for Ethernet.

You can change the mtu with ifconfig; for example, like this:

ifconfig eth0 mtu 1400

You cannot, of course, change it to values higher than 1500 on a 10Mb/s network:

ifconfig eth0 mtu 1501 will give:

SIOCSIFMTU: Invalid argument
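
The "Invalid argument" error comes from a range check on the requested value. A minimal user-space sketch of such a check follows; toy_change_mtu() and both constants are hypothetical names, and the lower bound of 68 is an assumption taken from the usual IPv4 minimum, not something stated in this lecture.

```c
#include <errno.h>

#define TOY_MIN_MTU      68   /* assumed lower bound (IPv4 minimum) */
#define TOY_ETH_DATA_LEN 1500 /* default Ethernet MTU, as stated above */

/*
 * Store new_mtu in *mtu if it is within range.
 * Returns 0 on success or -EINVAL, which user space sees as
 * "Invalid argument"; *mtu is left unchanged on failure.
 */
int toy_change_mtu(unsigned int *mtu, int new_mtu)
{
    if (new_mtu < TOY_MIN_MTU || new_mtu > TOY_ETH_DATA_LEN)
        return -EINVAL;
    *mtu = (unsigned int)new_mtu;
    return 0;
}
```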

net_device contd.

unsigned int flags (which you see or set using the ifconfig utility): for example, RUNNING or NOARP.

unsigned char dev_addr[MAX_ADDR_LEN]: the MAC address of the device (6 bytes).

int (*hard_start_xmit)(struct sk_buff *skb, struct net_device *dev);

a pointer to the device transmit method.

int promiscuity; (a counter of the times a NIC is told to work in promiscuous mode; used to enable more than one sniffing client.)

net_device contd.

You are likely to encounter macros starting with IN_DEV, like IN_DEV_FORWARD() or IN_DEV_RX_REDIRECTS(). How are they related to net_device? How are these macros implemented?

void *ip_ptr: IPv4 specific data. This pointer is assigned to a pointer to in_device in inetdev_init() (net/ipv4/devinet.c).

net_device contd.

struct in_device has a member named cnf (instance of ipv4_devconf). Setting /proc/sys/net/ipv4/conf/all/forwarding eventually sets the forwarding member of in_device to 1.

The same is true for accept_redirects and send_redirects; both are also members of cnf (ipv4_devconf).

In most distros, /proc/sys/net/ipv4/conf/all/forwarding=0.

But probably this is not so on your ADSL router.

Network interface drivers

Most of the NICs are PCI devices; there are also some USB network devices.

The drivers for network PCI devices use the generic PCI calls, like pci_register_driver() and pci_enable_device().

For more info on NIC drivers see the article Writing Network Device Driver for Linux (link no. 9 in links) and chap. 17 in ldd3.

There are two modes in which a NIC can receive a packet.

The traditional way is interrupt driven: each received packet is an asynchronous event which causes an interrupt.

NAPI

NAPI (new API).

The NIC works in polling mode.

In order for the NIC to work in polling mode, the driver should be built with a proper flag.

Most of the new drivers support this feature.

When working with NAPI and when there is a very high load, packets are lost; but this occurs before they are fed into the network stack. (In the non-NAPI driver they pass into the stack.)

In Solaris, polling is built into the kernel (no need to build
User Space Tools

iputils (including ping, arping, and more)

net-tools (ifconfig, netstat, route, arp and more)

IPROUTE2 (ip command with many options)

Uses the rtnetlink API.

Has much wider functionality; for example, you can create tunnels with the ip command.

Note: no need for the -n flag when using IPROUTE2 (because it does not work with DNS).

Routing Subsystem

The routing table and the routing cache enable us to find the net device and the address of the host to which a packet will be sent.

Reading entries in the routing table is done by calling fib_lookup(const struct flowi *flp, struct fib_result *res).

FIB is the Forwarding Information Base.

There are two routing tables by default (non Policy Routing case):

local FIB table (ip_fib_local_table; ID 255).

main FIB table (ip_fib_main_table; ID 254).

See: include/net/ip_fib.h.

Routing Subsystem contd.

Routes can be added into the main routing table in one of 3 ways:

By sysadmin command (route add / ip route).

By routing daemons.

As a result of ICMP (REDIRECT).

A routing table is implemented by struct fib_table.

Routing Tables

fib_lookup() first searches the local FIB table (ip_fib_local_table). In case it does not find an entry, it looks in the main FIB table (ip_fib_main_table).

Why is it in this order?

There is one routing cache, regardless of how many routing tables there are.

You can see the routing cache by running route -C.

Alternatively, you can see it by: cat /proc/net/rt_cache.

con: this way, the addresses are in hex format.

Routing Cache

The routing cache is built of rtable elements:

struct rtable (see: include/net/route.h)
{
    union {
        struct dst_entry dst;
    } u;
    ...
};

Routing Cache contd.

The dst_entry is the protocol-independent part.

Thus, for example, we have a dst_entry member (also called dst) in rt6_info in ipv6 (include/net/ip6_fib.h).

The key for a lookup operation in the routing cache is an IP address (whereas in the routing table the key is a subnet).

Inserting elements into the routing cache is done by rt_intern_hash().

There is an alternative mechanism for routing table (FIB) lookup, called fib_trie, which is inside the kernel tree (net/ipv4/fib_trie.c).

Routing Cache contd.

It is based on extending the lookup key.

You should set CONFIG_IP_FIB_TRIE (=y) (instead of CONFIG_IP_FIB_HASH).

By Robert Olsson et al. (see links).

Creating a Routing Cache Entry

Allocation of an rtable instance (rth) is done by dst_alloc().

dst_alloc() in fact creates and returns a pointer to dst_entry and we cast it to rtable (net/core/dst.c).

Setting the input and output methods of dst (rth->u.dst.input and rth->u.dst.output).

Setting the flowi member of dst (rth->fl).

Next time there is a lookup in the cache, for example in ip_route_input(), we will compare against rth->fl.

Routing Cache contd.

A garbage collection call deletes eligible entries from the routing cache.

Which entries are not eligible?

Policy Routing (multiple tables)

Generic routing uses destination-address based decisions.

There are cases when the destination address is not the sole parameter to decide which route to give; Policy Routing comes to enable this.

Policy Routing (multiple tables) contd.

Adding a routing table: by adding a line to /etc/iproute2/rt_tables.

For example: add the line 252 my_rt_table.

There can be up to 255 routing tables.

Policy routing should be enabled when building the kernel (CONFIG_IP_MULTIPLE_TABLES should be set).

Example of adding a route in this table:

ip route add default via 192.168.0.1 table my_rt_table

Show the table by:

ip route show table my_rt_table

Policy Routing (multiple tables) contd.

You can add a rule to the routing policy database (RPDB) by ip rule add ...

The rule can be based on input interface, TOS, fwmark (from netfilter).

ip rule list shows all rules.

Policy Routing: add/delete a rule example

ip rule add tos 0x04 table 252

This will cause packets with tos=0x04 (in the iphdr) to be routed by looking into the table we added (252).

So the default gw for this type of packets will be 192.168.0.1.

ip rule show will give:

32765: from all tos reliability lookup my_rt_table

...

Policy Routing: add/delete a rule example contd.

Delete a rule: ip rule del tos 0x04 table 252

Routing Lookup

ip_route_input() in net/ipv4/route.c: cache lookup
    Hit: ip_local_deliver() or ip_forward(), according to result
    Miss: ip_route_input_slow() in net/ipv4/route.c
        fib_lookup() in ip_fib_local_table
            Hit: deliver packet by ip_local_deliver()
            Miss: fib_lookup() in ip_fib_main_table
                Hit: ip_forward()
                Miss: drop packet

Routing Table Diagram

[Diagram: a fib_table (with the methods tb_lookup(), tb_insert() and tb_delete()) points to an array of struct fn_zone. Each fn_zone has fz_hash, an array of fz_divisor hlist_head buckets, holding struct fib_node elements (fn_key, fn_alias). fn_alias leads to struct fib_alias, whose fa_info points to a struct fib_info, which contains fib_nh.]

Routing Tables

Breaking the fib_table into multiple data structures gives flexibility and enables a fine-grained and high level of sharing.

Suppose that 10 routes to 10 different networks have the same next hop gw.

We can have one fib_info which will be shared by 10 fib_aliases.

fz_divisor is the number of buckets.

Routing Tables contd.

Each fib_node element represents a unique subnet.

The fn_key member of fib_node is the subnet (32 bit).

Routing Tables contd.

Suppose that a device goes down or is enabled.

We need to disable/enable all routes which use this device.

But how can we know which routes use this device?

In order to know it efficiently, there is the fib_info_devhash table.

This table is indexed by the device identifier.

See fib_sync_down() and fib_sync_up() in net/ipv4/fib_semantics.c.

Routing Table lookup algorithm

LPM (Longest Prefix Match) is the lookup algorithm.

The route with the longest netmask is the one chosen.

Netmask 0, which is the shortest netmask, is for the default gateway.

What happens when there are multiple entries with netmask=0?

fib_lookup() returns the first entry it finds in the fib table where the netmask length is 0.
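
Longest Prefix Match can be illustrated with a linear scan over a small table. This is only a sketch under stated assumptions: toy_route, toy_mask() and toy_lpm() are hypothetical names, addresses are kept in host byte order, and the kernel does not scan linearly (it walks the per-prefix-length fn_zone structures).

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical routing entry; addresses are in host byte order. */
struct toy_route {
    uint32_t prefix;     /* network address */
    int      prefix_len; /* 0..32; 0 is a default route */
    int      gw_id;      /* illustrative identifier of the next hop */
};

/* Netmask for a prefix length; length 0 gives mask 0 (matches everything). */
uint32_t toy_mask(int prefix_len)
{
    return prefix_len == 0 ? 0 : 0xFFFFFFFFu << (32 - prefix_len);
}

/*
 * Return the matching entry with the longest prefix, or NULL.
 * On equal prefix lengths the first entry found is kept, which mirrors
 * the "first default route found" behaviour described above.
 */
const struct toy_route *toy_lpm(const struct toy_route *tbl, size_t n,
                                uint32_t dst)
{
    const struct toy_route *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if ((dst & toy_mask(tbl[i].prefix_len)) != tbl[i].prefix)
            continue;
        if (best == NULL || tbl[i].prefix_len > best->prefix_len)
            best = &tbl[i];
    }
    return best;
}
```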

Routing Table lookup contd.

It may be that this is not the best choice of default gateway.

So in case the netmask is 0 (the prefixlen of the fib_result returned from fib_lookup() is 0) we call fib_select_default().

fib_select_default() will select the route with the lowest priority (metric) (by comparing the fib_priority values of all default gateways).
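
Choosing among several default gateways by metric reduces to picking the minimum. The sketch below shows only that comparison; toy_default_gw and toy_select_default() are hypothetical names, and any other criteria the kernel function may apply are deliberately left out.

```c
#include <stddef.h>

/* Hypothetical default-route candidate. */
struct toy_default_gw {
    int          gw_id;    /* illustrative identifier */
    unsigned int priority; /* metric; lower is preferred */
};

/* Return the candidate with the lowest priority value, or NULL if n == 0.
 * Ties keep the earlier entry. */
const struct toy_default_gw *
toy_select_default(const struct toy_default_gw *gws, size_t n)
{
    const struct toy_default_gw *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (best == NULL || gws[i].priority < best->priority)
            best = &gws[i];
    }
    return best;
}
```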

Receiving a packet

When working in the interrupt-driven model, the NIC registers an interrupt handler with the IRQ with which the device works by calling request_irq().

This interrupt handler will be called when a frame is received.

The same interrupt handler will be called when transmission of a frame is finished and under other conditions. (Depends on the NIC; sometimes, the interrupt handler will be called when there is some error.)

Receiving a packet contd.

Typically in the handler, we allocate an sk_buff by calling dev_alloc_skb(); also eth_type_trans() is called; among other things it advances the data pointer of the sk_buff to point to the IP header; this is done by calling skb_pull(skb, ETH_HLEN).

See: net/ethernet/eth.c

ETH_HLEN is 14, the size of the ethernet header.

Receiving a packet contd.

The handler for receiving an IP packet is ip_rcv() (net/ipv4/ip_input.c).

Handlers for the protocols are registered at init phase.

Likewise, arp_rcv() is the handler for ARP packets.

First, ip_rcv() performs some sanity checks. For example:

if (iph->ihl < 5 || iph->version != 4)
    goto inhdr_error;

iph is the ip header; iph->ihl is the ip header length (4 bits), counted in 32-bit words.

The ip header must be at least 20 bytes.

It can be up to 60 bytes (when we use ip options).
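
The relation between ihl and the 20 to 60 byte range can be checked on a raw byte buffer. The following is a user-space sketch, not the ip_rcv() code: toy_ip_header_ok() and toy_ip_header_len() are hypothetical helpers that read the first byte of the header directly instead of using struct iphdr.

```c
#include <stdint.h>
#include <stddef.h>

/* Header length in bytes: ihl is the low nibble of the first byte,
 * counted in 32-bit words. */
unsigned int toy_ip_header_len(const uint8_t *pkt)
{
    return (pkt[0] & 0x0Fu) * 4u;
}

/*
 * Basic sanity checks on an IPv4 header held in pkt[0..len-1].
 * Returns 1 if the header looks valid, 0 otherwise.
 */
int toy_ip_header_ok(const uint8_t *pkt, size_t len)
{
    unsigned int version, ihl;

    if (len < 20)                 /* shorter than the minimal header */
        return 0;
    version = pkt[0] >> 4;        /* high nibble of the first byte */
    ihl = pkt[0] & 0x0Fu;
    if (ihl < 5 || version != 4)  /* same condition as in the text above */
        return 0;
    if (len < ihl * 4u)           /* options must fit in the buffer */
        return 0;
    return 1;
}
```

With ihl = 5 the header is 20 bytes; the 4-bit field tops out at 15, which gives the 60-byte maximum.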

Receiving a packet contd.

Then it calls ip_rcv_finish(), by:

NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, dev, NULL, ip_rcv_finish);

This division of methods into two stages (where the second has the same name with the suffix finish or slow) is typical for networking kernel code.

In many cases the second method has a slow suffix instead of finish; this usually happens when the first method looks in some cache and the second method performs a lookup in a table, which is slower.

Receiving a packet contd.

ip_rcv_finish() implementation:

if (skb->dst == NULL) {
    int err = ip_route_input(skb, iph->daddr, iph->saddr, iph->tos,
                             skb->dev);
    ...
}
...

return dst_input(skb);

Receiving a packet contd.

ip_route_input():

First performs a lookup in the routing cache to see if there is a match. If there is no match (cache miss), calls ip_route_input_slow() to perform a lookup in the routing table. (This lookup is done by calling fib_lookup().)

fib_lookup(const struct flowi *flp, struct fib_result *res)

The results are kept in fib_result.

ip_route_input() returns 0 upon successful lookup (also when there is a cache miss but a successful lookup in the routing table).

Receiving a packet contd.

According to the results of fib_lookup(), we know if the frame is for local delivery or for forwarding or to be dropped.

If the frame is for local delivery, we will set the input() function pointer of the route to ip_local_deliver():

rth->u.dst.input = ip_local_deliver;

If the frame is to be forwarded, we will set the input() function pointer to ip_forward():

rth->u.dst.input = ip_forward;
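
The function-pointer dispatch described above can be sketched in user space. Everything here is hypothetical (toy_dst, toy_skb, toy_set_input(), toy_dst_input() and the two handlers); the handlers just return a code so the chosen path is visible.

```c
struct toy_skb;

/* Hypothetical route entry holding only the input() function pointer. */
struct toy_dst {
    int (*input)(struct toy_skb *skb);
};

/* Hypothetical packet carrying only its route. */
struct toy_skb {
    struct toy_dst *dst;
};

enum { TOY_LOCAL = 1, TOY_FORWARD = 2 };

/* Stand-ins for ip_local_deliver() and ip_forward(). */
int toy_local_deliver(struct toy_skb *skb) { (void)skb; return TOY_LOCAL; }
int toy_forward(struct toy_skb *skb)       { (void)skb; return TOY_FORWARD; }

/* Routing decision: pick the handler once, when the route is created. */
void toy_set_input(struct toy_dst *dst, int is_local)
{
    dst->input = is_local ? toy_local_deliver : toy_forward;
}

/* Wrapper in the spirit of dst_input(): just call through the pointer. */
int toy_dst_input(struct toy_skb *skb)
{
    return skb->dst->input(skb);
}
```

The point of the design is that the per-packet path makes no decision of its own: the routing lookup stored the decision in the route as a function pointer.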

Local Delivery

Prototype:
ip_local_deliver(struct sk_buff *skb) (net/ipv4/ip_input.c).

Calls NF_HOOK(PF_INET, NF_IP_LOCAL_IN, skb, skb->dev, NULL, ip_local_deliver_finish);

Delivers the packet to the higher protocol layers according to its type.

Forwarding

Prototype:

int ip_forward(struct sk_buff *skb)

(net/ipv4/ip_forward.c)

Decreases the ttl in the ip header.

If the ttl is <= 1, the method sends an ICMP message (ICMP_TIME_EXCEEDED) and drops the packet.

Calls NF_HOOK(PF_INET, NF_IP_FORWARD, skb, skb->dev, rt->u.dst.dev, ip_forward_finish);
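
The TTL rule can be reduced to a few lines. This is a sketch with hypothetical names (toy_verdict, toy_forward_ttl()); it only models the decision and the decrement, and leaves out the ICMP message and the header checksum update that a real forwarding path needs.

```c
#include <stdint.h>

enum toy_verdict {
    TOY_FORWARD,       /* ttl decremented, packet may go on */
    TOY_TIME_EXCEEDED  /* ttl too low: report and drop */
};

/*
 * Apply the forwarding TTL rule to *ttl.
 * A packet arriving with ttl <= 1 is not forwarded and its ttl is left
 * unchanged; otherwise ttl is decreased by one.
 */
enum toy_verdict toy_forward_ttl(uint8_t *ttl)
{
    if (*ttl <= 1)
        return TOY_TIME_EXCEEDED;
    (*ttl)--;
    return TOY_FORWARD;
}
```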

Forwarding contd.

ip_forward_finish(): sends the packet out by calling dst_output(skb).

dst_output(skb) is just a wrapper, which calls skb->dst->output(skb). (See include/net/dst.h.)

Sending a Packet

Handling of sending a packet is done by ip_route_output_key().

We need to perform a routing lookup also in the case of transmission.

In case of a cache miss, we call ip_route_output_slow(), which looks in the routing table (by calling fib_lookup(), as is also done in ip_route_input_slow()).

If the packet is for a remote host, we set dst->output to ip_output().

Sending a Packet contd.

ip_output() will call ip_finish_output().

This is the NF_IP_POST_ROUTING point.

ip_finish_output() will eventually send the packet via a neighbour by:

dst->neighbour->output(skb)

arp_bind_neighbour() sees to it that the L2 address of the next hop will be known. (net/ipv4/arp.c)

Sending a Packet contd.

If the packet is for the local machine:

dst->output = ip_output

dst->input = ip_local_deliver

ip_output() will send the packet on the loopback device.

Then we will go into ip_rcv() and ip_rcv_finish(), but this time dst is NOT null; so we will end in ip_local_deliver().

See: net/ipv4/route.c

Multipath routing

This feature enables the administrator to set multiple next hops for a destination.

To enable multipath routing, CONFIG_IP_ROUTE_MULTIPATH should be set when building the kernel.

There was also an option for multipath caching (by setting CONFIG_IP_ROUTE_MULTIPATH_CACHED).

It was experimental and was removed in 2.6.23. See links (6).

Netfilter

Netfilter is the kernel layer that supports applying iptables rules.

It enables:

Filtering

Changing packets (masquerading)

Connection Tracking

Netfilter rule example

Short example:

Applying the following iptables rule:

iptables -A INPUT -p udp --dport 9999 -j DROP

This is an NF_IP_LOCAL_IN rule.

The packet will go to:

ip_rcv()

and then: ip_rcv_finish()

and then: ip_local_deliver()

Netfilter rule example (contd.)

but it will NOT proceed to ip_local_deliver_finish() as in the usual case, without this rule.

As a result of applying this rule it reaches nf_hook_slow() with verdict == NF_DROP (which calls kfree_skb() to free the packet).

See net/netfilter/core.c.
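
The effect of a DROP verdict on the "finish" function can be sketched as follows. All names are hypothetical (toy_pkt, toy_hook_fn, toy_nf_hook() and the two callbacks); the sketch only shows that the continuation function runs when every hook accepts and is skipped on the first drop.

```c
#include <stddef.h>
#include <stdint.h>

enum { TOY_NF_DROP = 0, TOY_NF_ACCEPT = 1 };

/* Hypothetical packet summary: IP protocol number and destination port. */
struct toy_pkt {
    uint8_t  protocol; /* 17 is UDP, 6 is TCP */
    uint16_t dport;
};

typedef int (*toy_hook_fn)(const struct toy_pkt *pkt);

/* Stand-in for the rule "-p udp --dport 9999 -j DROP". */
int toy_drop_udp_9999(const struct toy_pkt *pkt)
{
    if (pkt->protocol == 17 && pkt->dport == 9999)
        return TOY_NF_DROP;
    return TOY_NF_ACCEPT;
}

/* Stand-in for ip_local_deliver_finish(); returns 1 to show it ran. */
int toy_local_deliver_finish(const struct toy_pkt *pkt)
{
    (void)pkt;
    return 1;
}

/*
 * Run the hooks in order. On the first TOY_NF_DROP return 0 without
 * calling okfn; if all hooks accept, return okfn(pkt).
 */
int toy_nf_hook(const toy_hook_fn *hooks, size_t n,
                const struct toy_pkt *pkt,
                int (*okfn)(const struct toy_pkt *pkt))
{
    for (size_t i = 0; i < n; i++) {
        if (hooks[i](pkt) == TOY_NF_DROP)
            return 0;
    }
    return okfn(pkt);
}
```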

ICMP redirect message

The ICMP protocol is used to notify about problems.

A REDIRECT message is sent in case the route is suboptimal (inefficient).

There are in fact 4 types of REDIRECT.

Only one is used:

Redirect Host (ICMP_REDIR_HOST)

See RFC 1812 (Requirements for IP Version 4 Routers).

ICMP redirect message contd.

To support sending ICMP redirects, the machine should be configured to send redirect messages.

/proc/sys/net/ipv4/conf/all/send_redirects should be 1.

In order for the other side to receive redirects, we should set /proc/sys/net/ipv4/conf/all/accept_redirects to 1.

ICMP redirect message contd.

Example:

Add a suboptimal route on 192.168.0.31:

route add -net 192.168.0.10 netmask 255.255.255.255 gw 192.168.0.121

Running route now on 192.168.0.31 will show a new entry:

Destination   Gateway        Genmask          Flags  Metric  Ref  Use  Iface
192.168.0.10  192.168.0.121  255.255.255.255  UGH    0       0    0    eth0

ICMP redirect message contd.

Send packets from 192.168.0.31 to 192.168.0.10:

ping 192.168.0.10 (from 192.168.0.31)

We will see (on 192.168.0.31):

From 192.168.0.121: icmp_seq=2 Redirect Host (New nexthop: 192.168.0.10)

Now, running on 192.168.0.121:

route -Cn | grep .10

shows that there is a new entry in the routing cache:

ICMP redirect message contd.

192.168.0.31  192.168.0.10  192.168.0.10  ri  0  0  34  eth0

The r in the flags column means: RTCF_DOREDIRECT.

The 192.168.0.121 machine had sent a redirect by calling ip_rt_send_redirect() from ip_forward().

(net/ipv4/ip_forward.c)

ICMP redirect message contd.

And on 192.168.0.31, running route -C | grep .10 now shows a new entry in the routing cache (in case accept_redirects=1):

192.168.0.31  192.168.0.10  192.168.0.10  0  0  1  eth0

In case accept_redirects=0 (on 192.168.0.31), we will see:

192.168.0.31  192.168.0.10  192.168.0.121  0  0  0  eth0

which means that the gw is still 192.168.0.121 (which is the

ICMP redirect message contd.

Adding an entry to the routing cache as a result of getting an ICMP REDIRECT is done in ip_rt_redirect(), net/ipv4/route.c.

The entry in the routing table is not deleted.

Neighboring Subsystem

Most known protocol: ARP (in IPv6: ND, neighbour discovery).

ARP table.

The Ethernet header is 14 bytes long:

Destination mac address (6 bytes).

Source mac address (6 bytes).

Type (2 bytes).

0x0800 is the type for an IP packet (ETH_P_IP)

0x0806 is the type for an ARP packet (ETH_P_ARP)

See: include/linux/if_ether.h
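
The 14-byte layout can be read straight from a frame buffer. In this sketch the TOY_ constants mirror the values quoted above, and toy_eth_dest(), toy_eth_source() and toy_eth_type() are hypothetical helpers; on the wire the destination address comes first and the type field is big-endian.

```c
#include <stdint.h>

#define TOY_ETH_ALEN  6      /* bytes in a MAC address */
#define TOY_ETH_HLEN  14     /* 6 + 6 + 2 */
#define TOY_ETH_P_IP  0x0800
#define TOY_ETH_P_ARP 0x0806

/* Destination MAC: bytes 0..5 of the frame. */
const uint8_t *toy_eth_dest(const uint8_t *frame)
{
    return frame;
}

/* Source MAC: bytes 6..11 of the frame. */
const uint8_t *toy_eth_source(const uint8_t *frame)
{
    return frame + TOY_ETH_ALEN;
}

/* Type field: bytes 12..13, most significant byte first. */
uint16_t toy_eth_type(const uint8_t *frame)
{
    return (uint16_t)((frame[12] << 8) | frame[13]);
}
```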

Neighboring Subsystem contd.

When there is no entry in the ARP cache for the destination IP address of a packet, a broadcast is sent (ARP request, ARPOP_REQUEST: who has IP address x.y.z...). This is done by a method called arp_solicit(). (net/ipv4/arp.c)

You can see the contents of the arp table by running cat /proc/net/arp or by running arp from a command line.

You can delete and add entries to the arp table; see man arp.

Bridging Subsystem

You can define a bridge and add NICs to it (enslaving ports) using brctl (from bridge-utils).

You can have up to 1024 ports for every bridge device (BR_MAX_PORTS).

Example:

brctl addbr mybr

brctl addif mybr eth0

brctl show

Bridging Subsystem contd.

When a NIC is configured as a bridge port, the br_port member of net_device is initialized.

(br_port is an instance of struct net_bridge_port.)

When we receive a frame, netif_receive_skb() calls handle_bridge().

Bridging Subsystem contd.

The bridging forwarding database is searched for the destination MAC address.

In case of a hit, the frame is sent to the bridge port with br_forward() (net/bridge/br_forward.c).

If there is a miss, the frame is flooded on all bridge ports using br_flood() (net/bridge/br_forward.c).

Note: this is not a broadcast!
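
The hit/miss decision on the forwarding database can be sketched with a linear table. toy_fdb_entry, TOY_FLOOD and toy_bridge_lookup() are hypothetical names; the sketch returns the port to forward to, or a marker telling the caller to flood all ports, and says nothing about how the kernel actually stores the database.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define TOY_FLOOD (-1) /* no entry: the caller should flood all ports */

/* Hypothetical forwarding database entry: MAC address and bridge port. */
struct toy_fdb_entry {
    uint8_t mac[6];
    int     port;
};

/*
 * Look up a destination MAC address.
 * Hit: return the port the frame should be forwarded to.
 * Miss: return TOY_FLOOD.
 */
int toy_bridge_lookup(const struct toy_fdb_entry *fdb, size_t n,
                      const uint8_t dst[6])
{
    for (size_t i = 0; i < n; i++) {
        if (memcmp(fdb[i].mac, dst, 6) == 0)
            return fdb[i].port;
    }
    return TOY_FLOOD;
}
```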

The ebtables mechanism is the L2 parallel of L3 Netfilter.

Bridging Subsystem contd.

Ebtables enables us to filter and mangle packets at the link layer (L2).

IPSec

Works at the network IP layer (L3).

Used in many forms of secured networks like VPNs.

Mandatory in IPv6 (not in IPv4).

Implemented in many operating systems: Linux, Solaris, Windows, and more.

RFC 2401

In the 2.6 kernel: implemented by Dave Miller and Alexey Kuznetsov.

Transformation bundles.

Chain of dst entries; only the last one is for routing.

IPSec contd.

User space tools: http://ipsectools.sf.net

Building VPN: http://www.openswan.org/ (Open Source).

There are also non-IPSec solutions for VPN.

Example: pptp

struct xfrm_policy has the following member:

struct dst_entry *bundles.

__xfrm4_bundle_create() creates dst_entries (with the DST_NOHASH flag); see: net/ipv4/xfrm4_policy.c

Transport Mode and Tunnel Mode.

IPSec contd.

Show the security policies:

ip xfrm policy show

Create RSA keys:

ipsec rsasigkey --verbose 2048 > keys.txt

ipsec showhostkey --left > left.publickey

ipsec showhostkey --right > right.publickey

IPSec contd.

Example: Host to Host VPN (using openswan)

in /etc/ipsec.conf:

conn linux-to-linux
    left=192.168.0.189
    leftnexthop=%direct
    leftrsasigkey=0sAQPPQ...
    right=192.168.0.45
    rightnexthop=%direct
    rightrsasigkey=0sAQNwb...
    type=tunnel
    auto=start

IPSec contd.

service ipsec start (to start the service)

ipsec verify: check your system to see if IPsec got installed and started correctly.

ipsec auto --status

If you see "IPsec SA established", this implies success.

Look for errors in /var/log/secure (Fedora Core) or in the kernel syslog.

Tips for hacking

Documentation/networking/ip-sysctl.txt: networking kernel tunables

Example of reading a hex address:

iph->daddr == 0x0A00A8C0

means checking if the address is 192.168.0.10 (C0=192, A8=168, 00=0, 0A=10).
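
Reading such a value by hand means taking the bytes from the lowest to the highest. The sketch below does the same in code; toy_format_le_addr() is a hypothetical helper, and it assumes the 32-bit value was read on a little-endian machine (as in the example above), so the lowest byte is the first octet.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format v as a dotted-quad string, taking the least significant byte
 * as the first octet. buf should hold at least 16 bytes.
 */
void toy_format_le_addr(uint32_t v, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%u.%u.%u.%u",
             (unsigned)(v & 0xFF),
             (unsigned)((v >> 8) & 0xFF),
             (unsigned)((v >> 16) & 0xFF),
             (unsigned)((v >> 24) & 0xFF));
}
```

For 0x0A00A8C0 the bytes come out as C0, A8, 00, 0A, which is 192.168.0.10.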

Tips for hacking contd.

Disable ping reply:

echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_all

Disable arp: ip link set eth0 arp off (the NOARP flag will be set)

Also ifconfig eth0 -arp has the same effect.

How can you get the Path MTU to a destination (PMTU)?

Use tracepath (see man tracepath).

Tracepath is from iputils.

Tips for hacking contd.

Keep the iphdr struct handy (printout): (from linux/ip.h)

struct iphdr {
    __u8    ihl:4,
            version:4;
    __u8    tos;
    __be16  tot_len;
    __be16  id;
    __be16  frag_off;
    __u8    ttl;
    __u8    protocol;
    __sum16 check;
    __be32  saddr;
    __be32  daddr;
    /* The options start here. */
};

Tips for hacking contd.

NIPQUAD(): macro for printing hex addresses

CONFIG_NET_DMA is for TCP/IP offload.

When you encounter xfrm / CONFIG_XFRM, this has to do with IPSec (transformers).

New and future trends

I/OAT.

NetChannels (Van Jacobson and Evgeniy Polyakov).

TCP Offloading.

RDMA.

Multiqueue: some new NICs, like e1000 and IPW2200, allow two or more hardware Tx queues. There are already patches to enable this.

New and future trends contd.

See: Enabling Linux Network Support of Hardware Multiqueue Devices, OLS 2007.

Some more info in: Documentation/networking/multiqueue.txt in recent Linux kernels.

Devices with multiple TX/RX queues will have the NETIF_F_MULTI_QUEUE feature (include/linux/netdevice.h).

MQ NIC drivers will call alloc_etherdev_mq() or alloc_netdev_mq() instead of alloc_etherdev() or alloc_netdev().

Links and more info

1) Linux Network Stack Walkthrough (2.4.20):
http://gicl.cs.drexel.edu/people/sevy/network/Linux_network_stack_walkth

2) Understanding the Linux Kernel, Second Edition, by Daniel P. Bovet, Marco Cesati; Second Edition December 2002; chapter 18: networking.
Understanding Linux Network Internals, Christian Benvenuti; O'Reilly, First Edition.

3) Linux Device Drivers, by Jonathan Corbet, Alessandro Rubini, Greg Kroah-Hartman; Third Edition February 2005. Chapter 17, Network Drivers.

4) Linux networking (a lot of docs about specific networking topics):
http://linuxnet.osdl.org/index.php/Main_Page

5) netdev mailing list: http://www.spinics.net/lists/netdev/

6) Removal of multipath routing cache from kernel code:
http://lists.openwall.net/netdev/2007/03/12/76
http://lwn.net/Articles/241465/

7) Linux Advanced Routing & Traffic Control:
http://lartc.org/

8) ebtables, a filtering tool for a bridging firewall:
http://ebtables.sourceforge.net/

9) Writing Network Device Driver for Linux (article):
http://app.linux.org.mt/article/writingnetdrivers?locale=en

10) Netconf: a yearly networking conference; the first was in 2004.
http://vger.kernel.org/netconf2004.html
http://vger.kernel.org/netconf2005.html
http://vger.kernel.org/netconf2006.html

Next one: Linux Conf Australia, January 2008, Melbourne.

David S. Miller, James Morris, Rusty Russell, Jamal Hadi Salim, Stephen Hemminger, Harald Welte, Hideaki YOSHIFUJI, Herbert Xu, Thomas Graf, Robert Olsson, Arnaldo Carvalho de Melo and others.

11) Policy Routing With Linux, Online Book Edition, by Matthew G. Marsh (Sams).
http://www.policyrouting.org/PolicyRoutingBook/

12) TRASH: A dynamic LC-trie and hash data structure; Robert Olsson, Stefan Nilsson, August 2006.
http://www.csc.kth.se/~snilsson/public/papers/trash/trash.pdf

13) IPSec howto:
http://www.ipsechowto.org/t1.html

14) Openswan: Building and Integrating Virtual Private Networks, by Paul Wouters, Ken Bantoft; publisher: Packt Publishing.
http://www.packtpub.com/book/openswan/mid/061205jqdnh2by
