<?xml version="1.0" ?>

<kc>

<title>Kernel Traffic</title>

<author contact="mailto:zbrown@tumblerings.org">Zack Brown</author>

<issue num="80" date="14 Aug 2000 00:00:00 -0800" />

<intro>

<p>

Thanks go to Frederic Stark, who noticed that the links into the mailing
list archives were not working, and who also found a broken cross reference.
Thanks, Fred!

</p><p>

There is also a new indexing feature this week, extending through all back
issues of KT and all the Cousins. You can find it in the left nav bar and in
the little '[*]' links in each issue's text. Please send me bug reports and
feature ideas.

</p>

</intro>

<stats posts="1263" size="5351" contrib="441" multiples="195" lastweek="157">

<person posts="50" size="144" who="Alan Cox " />
<person posts="32" size="128" who="Linus Torvalds " />
<person posts="31" size="155" who="Tigran Aivazian " />
<person posts="24" size="99" who="&quot;Jeff V. Merkey&quot; " />
<person posts="24" size="96" who="&quot;H. Peter Anvin&quot; " />
<person posts="22" size="76" who="Alexander Viro " />
<person posts="16" size="64" who="Keith Owens " />
<person posts="16" size="56" who="&quot;Theodore Y. Ts'o&quot; " />
<person posts="16" size="55" who="Jamie Lokier " />
<person posts="16" size="48" who="&quot;David S. Miller&quot; " />
<person posts="15" size="41" who="&quot;Garst R. Reese&quot; " />
<person posts="14" size="52" who="&quot;Richard B. Johnson&quot; " />
<person posts="13" size="51" who="&quot;Dunlap, Randy&quot; " />
<person posts="12" size="45" who="&quot;H. Peter Anvin&quot; " />
<person posts="12" size="44" who="" />
<person posts="12" size="41" who="Jes Sorensen " />
<person posts="11" size="55" who="Andre Hedrick " />
<person posts="11" size="53" who="Frank van Maarseveen " />
<person posts="11" size="37" who="Rik van Riel " />
<person posts="10" size="37" who="&quot;Khimenko Victor&quot; " />
<person posts="10" size="34" who=" (Miquel van Smoorenburg)" />
<person posts="9" size="65" who="Mark Hemment " />
<person posts="9" size="37" who="&quot;Linda Walsh&quot; " />
<person posts="9" size="37" who=" (Rogier Wolff)" />
<person posts="9" size="37" who="Mike Galbraith " />
<person posts="9" size="35" who="Ulrich Drepper " />
<person posts="9" size="33" who="James Simmons " />
<person posts="9" size="33" who="Andrew Morton " />
<person posts="9" size="29" who="Matthias Andree " />
<person posts="8" size="44" who="&quot;Michael H. Warfield&quot; " />
<person posts="8" size="38" who="Chris Meadors " />
<person posts="8" size="30" who="" />
<person posts="8" size="29" who="Oliver Xymoron " />
<person posts="8" size="28" who="&quot;Andi Kleen&quot; " />
<person posts="8" size="27" who="Ookhoi " />
<person posts="8" size="27" who="Philipp Rumpf " />
<person posts="7" size="36" who="Shawn Starr " />
<person posts="7" size="31" who="Jesse Pollard " />
<person posts="7" size="30" who="Russell King " />
<person posts="7" size="29" who=" (Linus Torvalds)" />
<person posts="7" size="27" who="James Stevenson " />
<person posts="7" size="27" who="clubneon " />
<person posts="7" size="26" who="&quot;Juan J. Quintela&quot; " />
<person posts="7" size="26" who="&quot;Stephen C. Tweedie&quot; " />
<person posts="7" size="26" who="Andrea Arcangeli " />
<person posts="7" size="24" who="Andries Brouwer " />
<person posts="6" size="61" who="Andi Kleen " />
<person posts="6" size="26" who="Alexander Viro " />
<person posts="6" size="22" who=" (Kai Henningsen)" />
<person posts="6" size="21" who="Henry Spencer " />
<person posts="6" size="20" who="Thomas Molina " />
<person posts="6" size="19" who="David Woodhouse " />
<person posts="6" size="19" who="" />
<person posts="5" size="28" who="&quot;Madan A S&quot; " />
<person posts="5" size="27" who="bert hubert " />
<person posts="5" size="24" who="&quot;Mike A. Harris&quot; " />
<person posts="5" size="23" who="Daniel Phillips " />
<person posts="5" size="22" who="Khimenko Victor " />
<person posts="5" size="21" who="Mikael Pettersson " />
<person posts="5" size="21" who="&quot;Albert D. Cahalan&quot; " />
<person posts="5" size="18" who="Chris Quinn " />
<person posts="5" size="18" who="Thomas Graichen " />
<person posts="5" size="18" who="Mike Castle " />
<person posts="5" size="17" who="Thomas Davis " />
<person posts="5" size="17" who="Tim Waugh " />
<person posts="5" size="16" who="Christoph Hellwig " />
<person posts="5" size="16" who="Adam Sampson " />
<person posts="5" size="16" who="Ralf Baechle " />
<person posts="5" size="15" who="" />
<person posts="5" size="15" who="Matthew Wilcox " />
<person posts="4" size="33" who="Chipzz " />
<person posts="4" size="21" who="Roger Larsson " />
<person posts="4" size="19" who="James Lewis Nance " />
<person posts="4" size="16" who="Peter Jones " />
<person posts="4" size="15" who="Andreas Bombe " />
<person posts="4" size="15" who="Neil Brown " />
<person posts="4" size="14" who="&quot;Thomas E. Vaughan&quot; " />
<person posts="4" size="13" who="David Caswell " />
<person posts="4" size="13" who="David Hinds " />
<person posts="4" size="13" who="&quot;Copeland, Matthew&quot; " />
<person posts="4" size="13" who="octave klaba " />
<person posts="4" size="13" who="&quot;Theodore Ts'o&quot; " />
<person posts="4" size="13" who="Vojtech Pavlik " />
<person posts="4" size="10" who="Michael Elizabeth Chastain " />
<person posts="3" size="20" who="&quot;Mr. James W. Laferriere&quot; " />
<person posts="3" size="16" who="Mitchell Blank Jr " />
<person posts="3" size="15" who="&quot;Robert M. Love&quot; " />
<person posts="3" size="15" who="Sandy Harris " />
<person posts="3" size="13" who="Xuan Baldauf " />
<person posts="3" size="13" who="&quot;Johan Kullstam&quot; " />
<person posts="3" size="13" who="Simon Kirby " />
<person posts="3" size="12" who="" />
<person posts="3" size="12" who="Dieter Nützel " />
<person posts="3" size="11" who="&quot;Matt D. Robinson&quot; " />
<person posts="3" size="11" who="Boszormenyi Zoltan " />
<person posts="3" size="11" who="Andrzej Krzysztofowicz " />
<person posts="3" size="10" who="James Sutherland " />
<person posts="3" size="10" who="Urban Widmark " />
<person posts="3" size="10" who="Mahesh Mahadevan " />
<person posts="3" size="10" who="Matti Aarnio " />
<person posts="3" size="10" who="" />
<person posts="3" size="10" who="Erik Mouw " />
<person posts="3" size="10" who="Andreas Dilger " />
<person posts="3" size="10" who="Andrey Savochkin " />
<person posts="3" size="9" who="Stephen Frost " />
<person posts="3" size="9" who="Jakub Jelinek " />
<person posts="3" size="9" who="Jeff Garzik " />
<person posts="3" size="9" who="Bill Maidment " />
<person posts="3" size="9" who="Chris Wedgwood " />
<person posts="3" size="9" who="David Lombard " />
<person posts="3" size="9" who="Andreas Dilger " />
<person posts="3" size="9" who="Pavel Machek " />
<person posts="3" size="8" who="Christoph Hellwig " />
<person posts="3" size="8" who="Michael Elizabeth Chastain " />
<person posts="3" size="8" who="Tony Hoyle " />
<person posts="3" size="8" who="" />
<person posts="3" size="8" who="" />
<person posts="3" size="8" who="David Howells " />
<person posts="3" size="8" who=" (Arjan van de Ven)" />
<person posts="3" size="7" who="Jeffrey Fielding " />
<person posts="2" size="143" who="Patrick van de Lageweg " />
<person posts="2" size="57" who="Simon Trimmer " />
<person posts="2" size="26" who="Dawson Engler " />
<person posts="2" size="20" who="Daniel Phillips " />
<person posts="2" size="16" who="Alex Romosan " />
<person posts="2" size="15" who="Jeff Hartmann " />
<person posts="2" size="14" who="Guido Bartsch " />
<person posts="2" size="13" who="Hans Reiser " />
<person posts="2" size="13" who="&quot;S. Shore&quot; " />
<person posts="2" size="12" who="Pauline Middelink " />
<person posts="2" size="12" who="Crispin Cowan " />
<person posts="2" size="11" who="Zack Geers " />
<person posts="2" size="11" who="Richard Brunner " />
<person posts="2" size="10" who="BaRT " />
<person posts="2" size="10" who="&quot;Strahm, Bill&quot; " />
<person posts="2" size="10" who="Donald Becker " />
<person posts="2" size="9" who="Derek Martin " />
<person posts="2" size="9" who="Cesar Eduardo Barros " />
<person posts="2" size="9" who="" />
<person posts="2" size="8" who="Andreas Haumer " />
<person posts="2" size="8" who=" (Gary Funck)" />
<person posts="2" size="8" who="&quot;Grover, Andrew&quot; " />
<person posts="2" size="8" who="Jan-Benedict Glaw " />
<person posts="2" size="8" who="Roy Sigurd Karlsbakk " />
<person posts="2" size="8" who="&quot;Jeffrey E. Hundstad&quot; " />
<person posts="2" size="7" who="Thomas Graichen " />
<person posts="2" size="7" who="Sunny Zhou " />
<person posts="2" size="7" who="Malcolm Beattie " />
<person posts="2" size="7" who="Christopher Thompson " />
<person posts="2" size="7" who=" (Alexander Schulz)" />
<person posts="2" size="7" who="Miles Lane " />
<person posts="2" size="7" who="Rusty Russell " />
<person posts="2" size="7" who="Drew Sanford " />
<person posts="2" size="7" who=" (Kirk Smith)" />
<person posts="2" size="7" who="Karl-Heinz Herrmann " />
<person posts="2" size="7" who="Greg KH " />
<person posts="2" size="7" who="Christoph Egger " />
<person posts="2" size="7" who="&quot;Daniel Lafraia&quot; " />
<person posts="2" size="7" who="Mircea Damian " />
<person posts="2" size="7" who="Werner Almesberger " />
<person posts="2" size="7" who="Patrick Michael Kane " />
<person posts="2" size="7" who="Brian Gerst " />
<person posts="2" size="6" who="&quot;Petr Vandrovec&quot; " />
<person posts="2" size="6" who="Abramo Bagnara " />
<person posts="2" size="6" who="Matthew Dharm " />
<person posts="2" size="6" who="Hildo Biersma " />
<person posts="2" size="6" who="&quot;Brian W. Johanson&quot; " />
<person posts="2" size="6" who="Mjo " />
<person posts="2" size="6" who="Jeremy Hansen " />
<person posts="2" size="6" who="Ravi Wijayaratne " />
<person posts="2" size="6" who="&quot;Diego Messano&quot; " />
<person posts="2" size="6" who="&quot;Sasa Ostrouska&quot; " />
<person posts="2" size="6" who="&quot;Jeroen Geusebroek&quot; " />
<person posts="2" size="6" who="" />
<person posts="2" size="6" who="Mark Gray " />
<person posts="2" size="6" who="sat " />
<person posts="2" size="6" who="Martin Mares " />
<person posts="2" size="6" who="Andrew McNabb " />
<person posts="2" size="6" who="&quot;Oliver Antwerpen&quot; " />
<person posts="2" size="6" who="Florian Weimer " />
<person posts="2" size="5" who="" />
<person posts="2" size="5" who="Philip Blundell " />
<person posts="2" size="5" who=" (goingware.com)" />
<person posts="2" size="5" who="&quot;Michael J. Dikkema&quot; " />
<person posts="2" size="5" who="&quot;B. James Phillippe&quot; " />
<person posts="2" size="5" who="Marcelo Tosatti " />
<person posts="2" size="5" who="Ingo Molnar " />
<person posts="2" size="5" who="Rui Sousa " />
<person posts="2" size="5" who="Jerry Frana " />
<person posts="2" size="5" who="" />
<person posts="2" size="5" who="Pietje pukkemans " />
<person posts="2" size="4" who="Elmer Joandi " />
<person posts="2" size="4" who="&quot;Conrad Heiney&quot; " />
<person posts="2" size="4" who="&quot;William Scott Lockwood III&quot; " />
<person posts="2" size="4" who="Dan Hollis " />
<person posts="1" size="54" who="Andy Chou " />
<person posts="1" size="45" who="&quot;Christopher E. Brown&quot; " />
<person posts="1" size="43" who="Pablo De Napoli " />
<person posts="1" size="28" who="" />
<person posts="1" size="27" who="" />
<person posts="1" size="20" who="FORT David " />
<person posts="1" size="18" who="Roger Gammans " />
<person posts="1" size="16" who="Dan Aloni " />
<person posts="1" size="14" who="Matija Nalis " />
<person posts="1" size="12" who="" />
<person posts="1" size="10" who="Byron Stanoszek " />
<person posts="1" size="10" who="" />
<person posts="1" size="9" who="Matthew Harrell " />
<person posts="1" size="9" who="OGAWA Hirofumi " />
<person posts="1" size="7" who="Frank van Maarseveen " />
<person posts="1" size="7" who="Radovan Garabik " />
<person posts="1" size="6" who="Torsten Landschoff " />
<person posts="1" size="6" who="" />
<person posts="1" size="6" who="Martin Tessun " />
<person posts="1" size="6" who="André Dahlqvist " />
<person posts="1" size="6" who="John Covici " />
<person posts="1" size="6" who="&quot;Ph. Marek&quot; " />
<person posts="1" size="5" who="Tomasz Wegrzanowski " />
<person posts="1" size="5" who="&quot;Vinche&quot; " />
<person posts="1" size="5" who="khromy " />
<person posts="1" size="5" who="Richard Rager " />
<person posts="1" size="5" who="&quot;Chris 'Chipper' Chiapusio&quot; " />
<person posts="1" size="5" who="Anton Ivanov " />
<person posts="1" size="5" who=" (Denis Vlasenko)" />
<person posts="1" size="5" who="&quot;Timothy A. DeWees&quot; " />
<person posts="1" size="5" who="Drew Sanford " />
<person posts="1" size="5" who="Kanoj Sarcar " />
<person posts="1" size="5" who="Michael Bacarella " />
<person posts="1" size="5" who="Riley Williams " />
<person posts="1" size="5" who="Vincent Stemen " />
<person posts="1" size="5" who="&quot;Maciej W. Rozycki&quot; " />
<person posts="1" size="5" who="Jeff McNeil " />
<person posts="1" size="5" who="" />
<person posts="1" size="5" who="Brian Warner " />
<person posts="1" size="4" who="Jens Benecke " />
<person posts="1" size="4" who="&quot;John Silva&quot; " />
<person posts="1" size="4" who="Animesh_Singh " />
<person posts="1" size="4" who="" />
<person posts="1" size="4" who="Greg KH " />
<person posts="1" size="4" who="&quot;Victor&quot; " />
<person posts="1" size="4" who="Andreas Tobler " />
<person posts="1" size="4" who="" />
<person posts="1" size="4" who="Lee Howard " />
<person posts="1" size="4" who="Bryan -TheBS- Smith " />
<person posts="1" size="4" who=" (Stuart Lynne)" />
<person posts="1" size="4" who="David Gould " />
<person posts="1" size="4" who="Thomas Pornin " />
<person posts="1" size="4" who="Adam Radford " />
<person posts="1" size="4" who="" />
<person posts="1" size="4" who="Hideaki YOSHIFUJI " />
<person posts="1" size="4" who="Brian Pomerantz " />
<person posts="1" size="4" who="&quot;Stuart MacDonald&quot; " />
<person posts="1" size="4" who="George Anzinger " />
<person posts="1" size="4" who="Andrzej Krzysztofowicz " />
<person posts="1" size="4" who="David Moffatt " />
<person posts="1" size="4" who="&quot;Jan Gyselinck&quot; " />
<person posts="1" size="4" who="Vincent Stemen " />
<person posts="1" size="4" who="Karl Hammar " />
<person posts="1" size="4" who="Andreas Schwab " />
<person posts="1" size="4" who="Matthew Jacob " />
<person posts="1" size="4" who="&quot;Mohammad A. Haque&quot; " />
<person posts="1" size="4" who="Stéphane Doyon " />
<person posts="1" size="4" who=" (Stephen Harris)" />
<person posts="1" size="4" who="Stanislav Rost " />
<person posts="1" size="4" who="" />
<person posts="1" size="4" who="Adrian Bridgett " />
<person posts="1" size="4" who="Gerhard Mack " />
<person posts="1" size="4" who="peter swain " />
<person posts="1" size="4" who="gus " />
<person posts="1" size="4" who="Mathieu Chouquet-Stringer " />
<person posts="1" size="4" who=" (Rogier Wolff)" />
<person posts="1" size="3" who=" (Henrique M. Holschuh)" />
<person posts="1" size="3" who="John Levon " />
<person posts="1" size="3" who="&quot;Kenneth C. Arnold&quot; " />
<person posts="1" size="3" who="Koblinger Egmont " />
<person posts="1" size="3" who="Martin MaD Douda " />
<person posts="1" size="3" who="Rasmus Andersen " />
<person posts="1" size="3" who="&quot;Christian Stuke&quot; " />
<person posts="1" size="3" who="Adam " />
<person posts="1" size="3" who="David Schleef " />
<person posts="1" size="3" who=" (A. Ott)" />
<person posts="1" size="3" who="Kristoffer von Sydow " />
<person posts="1" size="3" who="Bob Frey " />
<person posts="1" size="3" who="Marc Lehmann " />
<person posts="1" size="3" who="Christer Weinigel " />
<person posts="1" size="3" who="Igmar Palsenberg " />
<person posts="1" size="3" who="rtviado " />
<person posts="1" size="3" who="Vince Weaver " />
<person posts="1" size="3" who="&quot;Detlef Schmicker&quot; " />
<person posts="1" size="3" who="Chuck Lever " />
<person posts="1" size="3" who="Jeffry McNeil " />
<person posts="1" size="3" who="Junjiro Okajima " />
<person posts="1" size="3" who="" />
<person posts="1" size="3" who="Jean-Francois Landry " />
<person posts="1" size="3" who="Borislav Deianov " />
<person posts="1" size="3" who="" />
<person posts="1" size="3" who=" (Arjan van de Ven)" />
<person posts="1" size="3" who="Hans-Joachim Hetscher " />
<person posts="1" size="3" who="Sasa Ostrouska " />
<person posts="1" size="3" who="Chmouel Boudjnah " />
<person posts="1" size="3" who="Happy " />
<person posts="1" size="3" who="David Richardson " />
<person posts="1" size="3" who="&quot;Ralston, Steve&quot; " />
<person posts="1" size="3" who="Ian Eure " />
<person posts="1" size="3" who="&quot;Ian S. Nelson&quot; " />
<person posts="1" size="3" who=" (Kevin Buhr)" />
<person posts="1" size="3" who="Eddie Williams " />
<person posts="1" size="3" who="Joel Jaeggli " />
<person posts="1" size="3" who="Mark-Andre Hopf " />
<person posts="1" size="3" who="Christoph Hellwig " />
<person posts="1" size="3" who="&quot;Brodmann, Andreas&quot; " />
<person posts="1" size="3" who="Admin Mailing Lists " />
<person posts="1" size="3" who="kmb " />
<person posts="1" size="3" who="Richard Guenther " />
<person posts="1" size="3" who="Bill Huey " />
<person posts="1" size="3" who="Gerard Beekmans " />
<person posts="1" size="3" who="Jorge Nerin " />
<person posts="1" size="3" who="Derek Fawcus " />
<person posts="1" size="3" who="Thorsten Kranzkowski " />
<person posts="1" size="3" who="Chris Mason " />
<person posts="1" size="3" who="&quot;Pat O'Rourke&quot; " />
<person posts="1" size="3" who="&quot;Mike Black&quot; " />
<person posts="1" size="3" who="Dimitris Michailidis " />
<person posts="1" size="3" who="Douglas Gilbert " />
<person posts="1" size="3" who="Samuel Thompson " />
<person posts="1" size="3" who="Ahmed El-Mahmoudy " />
<person posts="1" size="3" who="Frank Jacobberger " />
<person posts="1" size="3" who="" />
<person posts="1" size="3" who="Ruth Ivimey-Cook " />
<person posts="1" size="3" who="Jeff Lightfoot " />
<person posts="1" size="3" who="Simon Richter " />
<person posts="1" size="3" who="Gerrit Huizenga " />
<person posts="1" size="3" who="Romano Giannetti " />
<person posts="1" size="3" who="Horst von Brand " />
<person posts="1" size="3" who="Dave Cecil " />
<person posts="1" size="3" who="Richard Zidlicky " />
<person posts="1" size="3" who="&quot;Rask Ingemann Lambertsen&quot; " />
<person posts="1" size="3" who="Jon Mitchell " />
<person posts="1" size="3" who="Steve Whitehouse " />
<person posts="1" size="3" who=" (Erik Mouw)" />
<person posts="1" size="3" who="Ricky Beam " />
<person posts="1" size="3" who="&quot;Sam Thompson&quot; " />
<person posts="1" size="3" who=" (Eugene Crosser)" />
<person posts="1" size="3" who="Arjan van de Ven " />
<person posts="1" size="3" who="Damon LoCascio " />
<person posts="1" size="3" who="David Hinds " />
<person posts="1" size="3" who="Markus Döhr " />
<person posts="1" size="3" who="Gary Lawrence Murphy " />
<person posts="1" size="3" who="Ivan Passos " />
<person posts="1" size="3" who="John Kodis " />
<person posts="1" size="3" who="Nick Cabatoff " />
<person posts="1" size="3" who="Dax Kelson " />
<person posts="1" size="3" who="Erik Arjan Hendriks " />
<person posts="1" size="3" who="Michael " />
<person posts="1" size="3" who="Aleksandr Koltsoff " />
<person posts="1" size="3" who="Moritz Schulte " />
<person posts="1" size="3" who=" (Crossfire)" />
<person posts="1" size="3" who="" />
<person posts="1" size="3" who="Mike Davis " />
<person posts="1" size="3" who="&quot;Robinson, Daniel&quot; " />
<person posts="1" size="3" who="" />
<person posts="1" size="3" who="&quot;John Anthony Kazos Jr.&quot; " />
<person posts="1" size="3" who="" />
<person posts="1" size="3" who="Bill Huey " />
<person posts="1" size="2" who="Stephen Torri " />
<person posts="1" size="2" who="&quot;Andrew Stubbs&quot; " />
<person posts="1" size="2" who="Trond Myklebust " />
<person posts="1" size="2" who="Ari Pollak " />
<person posts="1" size="2" who="David Weinehall " />
<person posts="1" size="2" who="Miquel van Smoorenburg " />
<person posts="1" size="2" who="&quot;ryan.tecco&quot; " />
<person posts="1" size="2" who="Cindy Cohn " />
<person posts="1" size="2" who="Marcus Meissner " />
<person posts="1" size="2" who="Larry McVoy " />
<person posts="1" size="2" who="Frank Mehnert " />
<person posts="1" size="2" who="Francois Wautier " />
<person posts="1" size="2" who="&quot;Brian S. Julin&quot; " />
<person posts="1" size="2" who=" (John Alvord)" />
<person posts="1" size="2" who="Walter Hofmann " />
<person posts="1" size="2" who="Seth Vidal " />
<person posts="1" size="2" who="Ben Pfaff " />
<person posts="1" size="2" who="Olivier Galibert " />
<person posts="1" size="2" who="I Lee Hetherington " />
<person posts="1" size="2" who="Benno Senoner " />
<person posts="1" size="2" who="Curt McCutchin " />
<person posts="1" size="2" who="Manfred Spraul " />
<person posts="1" size="2" who="" />
<person posts="1" size="2" who="Andreas Jellinghaus " />
<person posts="1" size="2" who="&quot;Michael T. Babcock&quot; " />
<person posts="1" size="2" who="&quot;Alan Curry&quot; " />
<person posts="1" size="2" who="Terence Ripperda " />
<person posts="1" size="2" who="" />
<person posts="1" size="2" who="andy thomas " />
<person posts="1" size="2" who="" />
<person posts="1" size="2" who="Claus-Justus Heine " />
<person posts="1" size="2" who="&quot;S Park&quot; " />
<person posts="1" size="2" who="Tim Hockin " />
<person posts="1" size="2" who="Giampaolo Gallo " />
<person posts="1" size="2" who="Md A Saifulla " />
<person posts="1" size="2" who=" (Steve Ralston)" />
<person posts="1" size="2" who="&quot;Aamir Shaikh&quot; " />
<person posts="1" size="2" who="Marc SCHAEFER " />
<person posts="1" size="2" who="" />
<person posts="1" size="2" who="ADAM Sulmicki " />
<person posts="1" size="2" who="&quot;Kissandrakis S. George&quot; " />
<person posts="1" size="2" who="Alex Buell " />
<person posts="1" size="2" who="Abhishek Khaitan " />
<person posts="1" size="2" who="Juanjo Ciarlante " />
<person posts="1" size="2" who="Martin Brooks " />
<person posts="1" size="2" who=" (Trond Eivind Glomsrød)" />
<person posts="1" size="2" who="Peter Blomgren " />
<person posts="1" size="2" who="Eric Buddington " />
<person posts="1" size="2" who="Myrddin Emrys " />
<person posts="1" size="2" who="Gregory Maxwell " />
<person posts="1" size="2" who="&quot;Raymond Miller&quot; " />
<person posts="1" size="2" who="Sean Harding " />
<person posts="1" size="2" who="Alexander Gordeyev " />
<person posts="1" size="2" who="Don Geddes " />
<person posts="1" size="2" who="Michael Meding " />
<person posts="1" size="2" who="Samuel Thibault " />
<person posts="1" size="2" who="Horst von Brand " />
<person posts="1" size="2" who="root " />
<person posts="1" size="2" who="&quot;imel...&quot; " />
<person posts="1" size="2" who="Andrew Lagun " />
<person posts="1" size="2" who="Mark Hahn " />
<person posts="1" size="2" who="" />
<person posts="1" size="2" who="&quot;Petr Soucek&quot; " />
<person posts="1" size="2" who="Terry Hardie " />
<person posts="1" size="2" who="Anton Blanchard " />
<person posts="1" size="2" who="Jean-Luc Pedneault " />
<person posts="1" size="2" who="Olaf Titz " />
<person posts="1" size="2" who="Velizar Bodoursky " />
<person posts="1" size="2" who="Justin " />
<person posts="1" size="2" who="BERTRAND Joël " />
<person posts="1" size="2" who="Britton " />
<person posts="1" size="2" who="Mike Sklar " />
<person posts="1" size="2" who=" (Bruce Perens)" />
<person posts="1" size="2" who="&quot;clemej&quot; " />
<person posts="1" size="2" who="&quot;David Feuer&quot; " />
<person posts="1" size="2" who="" />

</stats>

<section
  title="ext3-0.0.2f Released; Consistency Checkers; New &quot;Phase Tree&quot; Algorithm"
  subject="ext3-0.0.2e released"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0007_01/msg00665.html"
  posts="56"
  startdate="05 Jul 2000 00:00:00 -0800"
  enddate="02 Aug 2000 00:00:00 -0800"
>
<topic>BSD: FreeBSD</topic>
<topic>Disk Arrays: LVM</topic>
<topic>FS: JFS</topic>
<topic>FS: NFS</topic>
<topic>FS: ext2</topic>
<topic>FS: ext3</topic>
<topic>Virtual Memory</topic>
<topic>Web Servers</topic>

<mention>Andreas Dilger</mention>
<mention>Victor Yodaiken</mention>
<mention>Bill Huey</mention>
<mention>Andrew Morton</mention>

<p>

Stephen C. Tweedie announced:

</p>

<quote who="Stephen C. Tweedie">

<p>

ext3-0.0.2e has been uploaded to

</p><p>

<a
href="ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/ext3-0.0.2e.tar.gz">ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/ext3-0.0.2e.tar.gz</a>

</p><p>

This release fixes a few problems seen on rare occasions, plus one much more
serious crash-on-unmount. It includes patches for both 2.2.17pre9, and the
current Red Hat 2.2.16-3 errata kernel.

</p><p>

It also includes "orphan-list" code based on an implementation by Andreas
Dilger, for cleaning up inodes which have been unlinked but are still held
open by a process --- such inodes need to be deleted properly on a crash.

</p><p>

The full list of changes is below.

</p><p>

I will now be starting to merge in a substantial amount of newer code,
including the new error-handling infrastructure for ext3 and the
metadata-only journaling. The plan is to keep this 0.0.2e release as a 0.1
stable branch for those relying on ext3 while the new code is being merged
in.

</p><p>

Thanks to all who have helped with the testing of 0.0.2d so far.

</p>

</quote>

<p>

He listed the changes in this release:

</p>

<quote who="Stephen C. Tweedie">

<p>

Port forward to current (2.2.17pre9, and Red
Hat errata 2.2.16-3) kernels

</p><p>

Merge in a number of ext2 fixes from 2.2.15+:

</p><p>

<ul>

<li>NFS versioning</li>
<li>Set directory type information correctly on sockets</li>

</ul>

</p><p>

Fix a number of buffer leaks in recovery (prevents set_blocksize errors on
mounting filesystems)

</p><p>

sync(2) waits for current transactions correctly

</p><p>

Set the superblock s_dirt flag on all transaction completions

</p><p>

Fixed the order of asserts and buffer writes in fs/buffer.c: this was
causing false assertion failures on Mylex raid controllers

</p><p>

Delete the filesystem commit timer on unmount in all cases

</p><p>

Include Andreas Dilger's implementation of the "orphan list" code:

</p><p>

  The orphan list maintains an on-disk list of inodes needing cleaned up
  on recovery, including:

</p><p>

<ul>

<li>Deletion of unlinked, but still opened, files after a reboot;</li>

<li>Completion after recovery of truncates which were in progress but which
had to be split across a transaction boundary</li>

</ul>

</p>

</quote>
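
<p>

The orphan list idea can be illustrated with a toy model. The sketch below is
not ext3 code and does not reflect its on-disk format; the class
<code>ToyFS</code> and all of its method names are hypothetical, chosen only to
show why recording unlinked-but-open inodes on disk lets recovery finish the
deletion after a crash.

</p>

<pre><![CDATA[
```python
class ToyFS:
    """Toy model: 'disk_*' state survives a crash, open handles do not."""

    def __init__(self):
        self.disk_inodes = {}      # inode number -> link count (on disk)
        self.disk_orphans = set()  # on-disk orphan list
        self.open_handles = {}     # inode number -> open count (memory only)
        self._next_ino = 1

    def create(self):
        ino = self._next_ino
        self._next_ino += 1
        self.disk_inodes[ino] = 1
        return ino

    def open(self, ino):
        self.open_handles[ino] = self.open_handles.get(ino, 0) + 1

    def unlink(self, ino):
        self.disk_inodes[ino] -= 1
        if self.disk_inodes[ino] == 0:
            if self.open_handles.get(ino, 0) > 0:
                # Still held open: remember it on disk for recovery.
                self.disk_orphans.add(ino)
            else:
                del self.disk_inodes[ino]

    def close(self, ino):
        self.open_handles[ino] -= 1
        if self.open_handles[ino] == 0:
            del self.open_handles[ino]
            if ino in self.disk_orphans:
                # Last user is gone: delete normally, drop from the list.
                self.disk_orphans.discard(ino)
                del self.disk_inodes[ino]

    def crash(self):
        # All in-memory state is lost; on-disk state is kept.
        self.open_handles = {}

    def recover(self):
        # Walk the orphan list and finish the pending deletions.
        cleaned = sorted(self.disk_orphans)
        for ino in cleaned:
            del self.disk_inodes[ino]
        self.disk_orphans.clear()
        return cleaned
```
]]></pre>

<p>

In this model, a file that is created, opened, unlinked and then caught by a
crash is still present in <code>disk_inodes</code> afterwards, and
<code>recover()</code> removes it by consulting only the on-disk list.

</p>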

<p>

He replied to himself the next day with a patch and a warning:

</p>

<quote who="Stephen C. Tweedie">

<p>

This has now been superseded by ext3-0.0.2f (patch enclosed), which fixes a
major bug --- the new truncate code in 0.0.2e did not propagate extensions
of existing directories to disk (existing, sufficiently-padded directories
would not be affected, but appending a lot of new dirents to an existing
directory could leave the new dirents unreachable after a reboot). e2fsck
should be able to restore the directories if this has caught anybody --- the
contents of the directories was not lost, only the update of the on-disk
copy of the directory's size was being missed.

</p><p>

I'll push a complete set of clean ext3-0.0.2f patches out to
ftp.uk.linux.org shortly, but in the mean time please apply the patch below
if you are running 0.0.2e.

</p><p>

Andreas, I also found that your orphan list code was missing the case of a
"rmdir" of a directory still being used as the working directory of an
existing process. 0.0.2f should also clean up such a case on reboot.

</p>

</quote>

<p>

Later he gave a link to <a
href="ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/ext3-0.0.2f.tar.gz">the
patch</a>.

</p><p>

In the course of discussion, Theodore Y. Ts'o said:

</p>

<quote who="Theodore Y. Ts'o">

<p>

Even with journaling filesystems, there will be cases you will need to run
some kind of filesystem consistency checker.

</p><p>

<ol>

<li>In case of disk drive problems.</li>

<li>In case of memory problems (particularly cache memory)</li>

<li>In case of kernel bugs (many times what people think of as "bugs" in
filesystem code is really bugs in the VM or buffer cache parts of the
kernel.)</li>

</ol>

</p><p>

This is true for all journaling filesystems; they aren't magic. What
journaling filesystems do protect you against is the need to run fsck in
case of a power failure or a kernel crash, or some other kind of unclean
shutdown (so long as that unclean shutdown doesn't cause any other forms of
on-disk corruption.)

</p>

</quote>

<p>

Andrew Morton asked whether this consistency checking could be done while the
filesystem was online. Theodore replied:

</p>

<quote who="Theodore Y. Ts'o">

<p>

The Multics operating system was able to run its filesystem recovery tool while
the filesystem was online. Then again, Multics was also designed so that if
a circuit breaker snapped off and one of its three memory cabinets got
uncleanly shut down, only processes that had memory pages on the downed
memory subsystem would get killed. The thinking was: just because you lost
1/3 of your memory and you have to kill off 22 users' processes, why should
you have to ruin the other 45 users' day? :-)

</p><p>

In practice, though, I'm not aware of any filesystem consistency checker
since the days of Multics that could do this. It's possible, but you have to
put all sorts of very careful interlocking between the checking code and the
filesystem code, and this adds a *lot* of complexity. In the case of
Multics, the filesystem consistency checker was actually part of the kernel
(it ran in Ring 0), and this tends to go against the general Unix and Linux
design principles of keeping as much as possible in userspace.

</p>

</quote>

<p>

Manfred Spraul replied:

</p>

<quote who="Manfred Spraul">

<p>

Windows 95/98 scandisk 8-)

</p><p>

Their implementation is well documented (they need 3 lock levels), but very
slow (scandisk restarts from scratch if someone else writes to the disk)

</p><p>

The filesystem reduces metadata caching, and scandisk uses a special
interface for atomic read-fsck-write cycles. (conditional writes, similar to
the atomic instructions on most RISC cpus)

</p>

</quote>
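
<p>

The "conditional write" idea mentioned above resembles the
load-linked/store-conditional pattern: a write is accepted only if the block
has not changed since it was read. The following is a minimal sketch under
that assumption. It is not the Windows interface; <code>BlockStore</code>,
<code>conditional_write</code> and <code>repair_block</code> are hypothetical
names used only for illustration.

</p>

<pre><![CDATA[
```python
class BlockStore:
    """Toy block device that tracks a version counter per block."""

    def __init__(self, nblocks):
        self.data = [b""] * nblocks
        self.version = [0] * nblocks

    def read(self, n):
        # Return the payload together with the version that was seen.
        return self.data[n], self.version[n]

    def write(self, n, payload):
        # Ordinary write, as the live filesystem would issue it.
        self.data[n] = payload
        self.version[n] += 1

    def conditional_write(self, n, payload, seen_version):
        # Succeed only if nobody wrote the block since it was read.
        if self.version[n] != seen_version:
            return False
        self.write(n, payload)
        return True


def repair_block(store, n, fix, max_tries=10):
    """Read, check/fix, and conditionally write back; retry on conflict."""
    for _ in range(max_tries):
        payload, seen = store.read(n)
        if store.conditional_write(n, fix(payload), seen):
            return True
    return False
```
]]></pre>

<p>

A checker built this way never overwrites a concurrent update. The price is
the retry loop, which may be one reason such a tool is slow on a busy disk.

</p>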

<p>

Nick Cabatoff also said, <quote who="Nick Cabatoff">There's one for UFS/FFS
now on the way in FreeBSD 5.0: Kirk McKusick just released alpha code to do
what he calls snapshots, which I'm told will enable background fscking,
among other things. See <a
href="http://people.freebsd.org/~mckusick/snap.tgz">http://people.freebsd.org/~mckusick/snap.tgz</a>
(or the freebsd-arch archives) if you're curious.</quote> Theodore replied:

</p>

<quote who="Theodore Y. Ts'o">

<p>

That's not a full filesystem consistency
checker, though. He's running fsck on a consistent snapshot of the
filesystem in order to detect orphaned blocks which can then be freed in the
live filesystem. (The BSD soft update code can leak blocks from inodes which
are open at the time of a system crash, which is why this is necessary.)

</p><p>

This technique can't be used to deal with arbitrary filesystem corruption,
however. It only addresses a very specific case which can't be handled any
other way given the BSD Soft Updates approach.

</p>

</quote>
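
<p>

The narrow check Theodore describes can be sketched as a set difference over
a frozen snapshot: any block marked allocated but referenced by no inode has
leaked. This is only an illustration of the idea, not McKusick's code; the
function name and the data layout are assumptions.

</p>

<pre><![CDATA[
```python
def find_leaked_blocks(allocated, inode_blocks):
    """Return blocks marked allocated in the snapshot that no inode references.

    allocated:    iterable of block numbers marked in-use in the snapshot
    inode_blocks: mapping of inode number -> list of block numbers it owns
    """
    referenced = set()
    for blocks in inode_blocks.values():
        referenced.update(blocks)
    return set(allocated) - referenced
```
]]></pre>

<p>

Because a leaked block is still marked allocated, the live filesystem will not
hand it out in the meantime, so freeing it afterwards is safe. Arbitrary
corruption does not fit this pattern, which matches the limitation described
above.

</p>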

<p>

And Andi Kleen added, <quote who="Andi Kleen">You can do the same today on
Linux with LVM snapshots. They are only useful for read-only consistency
checking because they are read-only. (so in case of a problem you'll need to
umount and rerun fsck on the normal device)</quote> Alan Cox put in, <quote
who="Alan Cox">Also there is ext2 based work going on using phase tree
rather than journalling which gives you similar journal properties, in
future snapshots and also very nice handling of multipath error
recovery.</quote>

</p><p>

Steve Whitehouse asked for an explanation/reference for "phase tree", and
Daniel Phillips replied at length:

</p>

<quote who="Daniel Phillips">

<p>

It's the algorithm used in my Tux2 filesystem
that I've been working on since around Christmas, or longer than that if you
count 10 years of thinking about doing it :-)

</p><p>

Phase tree is my name for an algorithm similar to one that has been used in
WAFL and in an OS named Auragen that you can ask Victor Yodaiken about. I
developed the algorithm independently and it was only on reading a posting
from Victor on linux-kernel from June, 1997, that I realized I wasn't the
only one to have thought of it.

</p><p>

My phase tree algorithm is different enough from the other two that I think
it's fair to call it a new algorithm, or at least a close cousin. I'm
writing a white paper on it for presentation at the ALS this fall. An
abstract is available now. I have a working prototype of Tux2 "with some
issues" that I'm now busily porting from 2.2.12 to 2.4.0.test.

</p><p>

I have attached Victor's original email, which makes very good reading.  You
can get the Tux2 abstract by <a
href="mailto:phillips@innominate.de">emailing me</a>... (I'm very interested
in finding out exactly who is interested.)

</p><p>

Here is a brief description:

</p><p>

Tux2 is based on Ext2.  It is not a journalling filesystem, but it does what
a journalling filesystem does: keep your files safe in the event of a
processing interruption. It does that for both data and metadata and,
according to my early benchmarks, should do it at about the same speed at
which a JFS does metadata-only. We shall see.

</p><p>

Tux2 uses my "phase tree" algorithm (so christened by Alan Cox - I called it
tree/phase but I like his name more). Phase tree imposes a partial ordering
on disk writes to ensure that a filesystem on disk is always updated
atomically, with a single write of the filesystem metaroot. To work
properly, the entire filesystem including all metadata, must be structured
as a tree. Ext2 is not structured as a tree, therefore, the major difference
between Ext2 and Tux2 is that all metadata has been rearranged into a tree.

</p><p>

Once you have the filesystem in the form of a tree you can make a copy of
the metaroot, then for all updates, apply a "branching" algorithm that works
from the updated block towards the metaroot doing a copy-on-write at each
node that needs updating. After some number of updates (the exact number is
a performance-tuning parameter) you store the new metaroot on disk, which
gives you an atomic update. So far this is similar to Auragen and WAFL.

</p><p>

Tux2's phase tree algorithm works almost entirely in cache and is intimately
coupled to the buffer cache system. A third metatree is added, to allow
filesystem updating to continue without pause while the second tree is
committed to disk, eventually replacing the first metatree using the
above-mentioned atomic write.

</p><p>

In tree phase terminology, the three trees are called "phases".  The three
phases are:

</p><p>

<ul>

<li>recorded phase (the consistent filesystem image currently on disk)</li>
<li>recording phase (diffs for a new consistent image currently being written)</li>
<li>branching phase (the changing filesystem as applications see it)</li>

</ul>

</p><p>

Tux2 has its own update daemon that handles its "phase transitions".  A
phase transition is the act of committing a new metaroot wherein the second
phase tree becomes the first, the third becomes the second and a new
metaroot is created. (This is a function analogous to kflushd, though kflushd
in its current form can't possibly know what it would need to know to cause
phase transitions at appropriate times, and in any event, it has no way to
initiate one.)

</p><p>

That's basically it.  There are some other wrinkles in Tux2 that serve to
flatten the filesystem tree, reduce the number of block writes required and
keep cpu usage to a reasonable level.

</p><p>

As Alan mentioned, there are many interesting things you can do when you
have a filesystem's metadata in the form of a tree. Tux2 doesn't do most of
those things at this point, since its main purpose in life is to demonstrate
the efficacy of the phase tree algorithm and to allow me to do kernel
development without putting my precious files at risk every time I need to
reset the system.

</p>

</quote>
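
<p>

The branching and phase-transition steps Daniel describes can be sketched in
a few lines. This is only an in-memory illustration under stated assumptions,
not Tux2 code: the names Node, cow_update, lookup and PhaseTree are
hypothetical, nothing is written to disk, and each phase is assumed to be
fully flushed before the next transition.

</p>

```python
class Node:
    """Tree node treated as immutable: updates build new nodes instead."""

    def __init__(self, children=None, data=None):
        self.children = dict(children or {})
        self.data = data


def cow_update(root, path, data):
    """Return a new root with `path` set to `data`.

    Only the nodes on the path from the updated block up to the root are
    copied; every untouched subtree is shared with the old tree.
    """
    if not path:
        return Node(data=data)
    head, rest = path[0], path[1:]
    child = root.children.get(head, Node())
    new_children = dict(root.children)
    new_children[head] = cow_update(child, rest, data)
    return Node(children=new_children, data=root.data)


def lookup(root, path):
    """Walk `path` from `root` and return the data stored there."""
    node = root
    for name in path:
        node = node.children[name]
    return node.data


class PhaseTree:
    """Hypothetical holder for the three metaroots ("phases")."""

    def __init__(self):
        empty = Node()
        self.recorded = empty    # consistent image currently "on disk"
        self.recording = empty   # image currently being written out
        self.branching = empty   # filesystem as applications see it

    def write(self, path, data):
        # Updates only ever touch the branching phase.
        self.branching = cow_update(self.branching, path, data)

    def phase_transition(self):
        # Assumes the recording phase has been completely written, so the
        # single metaroot switch below stands in for the atomic disk write.
        self.recorded = self.recording
        self.recording = self.branching
        # A fresh metaroot for further updates, sharing all subtrees.
        self.branching = Node(self.recording.children, self.recording.data)


if __name__ == "__main__":
    fs = PhaseTree()
    fs.write(("etc", "motd"), "hello")
    fs.phase_transition()
    fs.phase_transition()
    fs.write(("etc", "motd"), "changed")
    print(lookup(fs.recorded, ("etc", "motd")))   # prints: hello
    print(lookup(fs.branching, ("etc", "motd")))  # prints: changed
```

<p>

In this sketch a crash at any point would leave the recorded phase untouched,
which is the property the single metaroot write is meant to provide.

</p>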

<p>

Bill Huey said this looked a lot like journalling, and Daniel replied:

</p>

<quote who="Daniel Phillips">

<p>

Yes, that's the point.  It is supposed to do what a journaled FS does, i.e.,
keep your files safe and make fsck go away, but with less overhead. There
are other advantages over journalling: there are *far* fewer boundary
conditions to deal with. Basically, there is only one ordering constraint to
worry about per phase: write one entire batch of updates before writing the
next. Within a phase the order of writing is completely unconstrained, so an
elevator algorithm is free to choose the shortest path across the disk
surface. This decouples the filesystem from the lowlevel I/O in a very
satisfying way.

</p><p>

Note also that most journalling filesystems do not attempt to preserve the
integrity of data within files rigorously because of the associated overhead
of writing every data block twice (roughly speaking). Tux2 does provide an
integrity guarantee for *both* data and metadata.

</p><p>

To be fair, there are two things a journal can do that phase tree cannot:
(1) roll forward and (2) preserve filesystem integrity right up to the last
completed disk write. It's not really clear to me why (1) is useful. But (2)
is important for something like a network transaction server that wants to
report each transaction "complete and safe" absolutely as soon as possible.
So an aggressive transaction server would report completion as soon as the
journal entry had been made, allowing the client application to stop waiting
and go on about its business. Phase tree has to wait until the upcoming
metaroot write completes before it can report any transaction complete; this
will be any time from a few tens to a few thousands of disk operations
later, depending on how the phase change heuristics are designed and
configured.

</p><p>

This means that journalling is better than phase tree for transaction
serving. In most other applications, IMHO, phase tree will offer higher
throughput while still giving an acceptable transaction latency. I think
that is good.

</p>

</quote>

</section>

<section
  title="Linux CVS Archive"
  subject="[ANN] Linux Kernel Source Reference"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0007_04/msg00436.html"
  posts="14"
  startdate="24 Jul 2000 00:00:00 -0800"
  enddate="01 Aug 2000 00:00:00 -0800"
>
<topic>Version Control</topic>

<mention>Ivan Passos</mention>
<mention>Riley Williams</mention>
<mention>Gary Lawrence Murphy</mention>

<p>

Thomas Graichen gave a link to <a
href="http://innominate.org/~graichen/projects/lksr/">The Linux Kernel
Source Reference</a> and described, <quote who="Thomas Graichen">it's
basically a cvs tree with all linux versions starting from 1.0 until the
latest one with a cvsweb www frontend and pserver remote functionality on
top of it ... this way you can easily get or diff or whatever any ever
released (since 1.0 :-) version of the linux kernel source.</quote> He
invited comments and criticism, and some folks mentioned Riley Williams' <a
href="http://www.memalpha.cx/Linux/Kernel">online index</a> of all known
kernels. Riley added that he'd just finished updating it after a brief
hiatus. Ivan Passos pointed out that Riley's collection wasn't CVSed, and
folks agreed that the two projects complemented each other. Gary Lawrence
Murphy suggested merging the two, and also mentioned <a
href="http://lxr.linux.no/">LXR (Linux Cross Reference)</a>, a searchable
kernel archive.

</p><p>

Regarding Thomas' archive, David Schleef asked if the whole thing were
available as a tarball for download, so he could set up his own local
high-speed repository. Thomas thought something like 'rsync' would be great,
and said he'd try to work on that in the next few days.

</p>

</section>

<section
  title="Linus Still Accepting Major Rewrites To USB Code"
  subject="[linux-usb-devel] USB status in 2.4.0-test5"
  archive="http://marc.theaimsgroup.com/?l=linux-usb-devel&amp;m=96499407511930&amp;w=2"
  posts="19"
  startdate="30 Jul 2000 00:00:00 -0800"
  enddate="06 Aug 2000 00:00:00 -0800"
>
<topic>FS: devfs</topic>
<topic>Hot-Plugging</topic>
<topic>Modems</topic>
<topic>Networking</topic>
<topic>PCI</topic>
<topic>SMP</topic>
<topic>USB</topic>

<mention>Alan Cox</mention>

<p>

In the linux-usb-devel mailing list, Randy Dunlap explained and announced:

</p>

<quote who="Randy Dunlap">

<p>

For some time now, Alan Cox has maintained a list
of problems of various severity for 2.4. He and his gnomes have given up the
ghost on this list, but Linus wanted it to be kept up, so Ted Ts'o
volunteered to take it over. Ted updates the list and posts it to the kernel
mailing list and to http://linux24.sourceforge.net/ .

</p><p>

I should have done this long ago (and maybe some of you thought that I did),
but I'm trying to use this same method to track USB problems/status in
2.4.0-testN. I'm using the same format that Alan/Ted use. After this list
(that I just threw together) has been sanitized/reviewed/corrected, Ted can
have it..... so please send me updates, corrections, additions, etc., for
this USB 2.4 status list.

</p>

</quote>

<p>

He posted his list:

</p>

<quote who="Randy Dunlap">

<p>

USB Status/Problems in 2.4.0-testN<br />
2000-July-30

</p><p>

<ol>
  
<li>Should Be Fixed</li>
  
<li>Capable of corrupting your FS</li>
  
<ol>

<li>Problems with USB storage drives (ORB, maybe Zip) during APM sleep/suspend</li>
  
</ol>

<li>Security</li>

<li>Boot-Time Failures</li>

<li>Compile-Time Failures</li>

<li>In Progress</li>

<ol>

<li>usb-uhci and uhci to handle control/bulk IN STALLS better</li>
<li>usb-uhci should not set PCI Latency Timer register to 0</li>
<li>usb-uhci SMP spinlock/bad pointer crash</li>
<li>hotplug (PNP) and module autoloader support</li>

</ol>

<li>Obvious Projects for People (well if you have the hardware..)</li>

<li>Fix Exists But Isn't Merged</li>

<li>To Do</li>

<ol>

<li>race conditions on devices in use and being unplugged</li>
<li>cpia camera driver with OHCI HCD locks up or fails</li>
<li>pegasus (ethernet) driver crashes often</li>
<li>SANE backend can't communicate to its scanner (sometimes, some scanners)</li>
<li>OHCI memory corruption problem</li>
<li>Fix differences in UHCI and OHCI HCD behaviors/semantics</li>

</ol>

<li>To Do But Non-Showstopper</li>

<ol>
<li>add bandwidth allocation support to usb-uhci and OHCI HCDs</li>
<li>acm (modem) driver is slow compared to Windows drivers for same modems (probably a host controller driver problem, not acm driver)</li>
<li>printer driver can lose data when printing huge files (like 100 MB)</li>
<li>printer driver aborts on out-of-paper or off-line conditions instead of retrying until the condition is fixed</li>
<li>speed up device enumeration (hub driver has large delays in it)</li>
<li>add devfs support to drivers that don't have it</li>
<li>add DocBook info to main USB driver interfaces (usb.c)</li>
</ol>

<li>Compatibility Errors</li>

<li>Probably Post 2.4</li>

<ol>

<li>spread out interrupt frames for devices that use the same interrupt period (interval)</li>
<li>add USB 2.0 EHCI HCD</li>

</ol>

<li>Drivers in 2.2 and not 2.4</li>

<li>To Check</li>

<li>Fixed</li>

</ol>

</p>

</quote>

<p>

To item 9.3 (To Do: pegasus (ethernet) driver crashes often), Petko Manolov
replied that he was suspicious of these crashes, which seemed to be more HCD
and USB core related, but he said he'd go change a lot of Pegasus code
anyway. Miles Lane pointed out, <quote who="Miles Lane">That, perhaps, is
not the greatest idea. My understanding is that Linus has been quite adamant
about only accepting bug fixes. A major rewrite of any big chunk of code may
simply introduce many new bugs.</quote> But he replied to himself the next
day, <quote who="Miles Lane">My apologies to you, Petko. Randy has informed
me that Linus is still taking major rewrites of USB driver code. I guess I
should have gathered that without having to be told, but sometimes the
obvious eludes me.</quote>

</p>

</section>

<section
  title="Symlinks In The Kernel; Kernel/Library/etc Interface Dispute"
  subject="RLIM_INFINITY inconsistency between archs"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0007_04/msg00876.html"
  posts="185"
  startdate="27 Jul 2000 00:00:00 -0800"
  enddate="03 Aug 2000 00:00:00 -0800"
>
<topic>Backward Compatibility</topic>
<topic>FS: NFS</topic>
<topic>FS: ext2</topic>
<topic>SMP</topic>
<topic>USB</topic>

<mention>James Lewis Nance</mention>
<mention>Adam Sampson</mention>

<p>

Boszormenyi Zoltan had some trouble compiling the latest 'egcs' snapshots on
a Linux 2.4.0 system, and traced the problem to the fact that <quote
who="Boszormenyi Zoltan">/usr/include/asm is a symlink to
/usr/src/linux/include/asm, as in the original distribution but
/usr/src/linux is a 2.4.0-testX tree. With a 2.2.X source tree, it does not
produce any warning.</quote> Linus Torvalds replied:

</p>

<quote who="Linus Torvalds">

<p>

I've asked glibc maintainers to stop the symlink
insanity for the last few years now, but it doesn't seem to happen.

</p><p>

Basically, that symlink should not be a symlink.  It's a symlink for
historical reasons, none of them very good any more (and haven't been for a
long time), and it's a disaster unless you want to be a C library developer.
Which not very many people want to be.

</p><p>

The fact is, that the header files should match the library you link
against, not the kernel you run on.

</p><p>

Think about it a bit..  Imagine that the kernel introduces a new "struct X",
and maintains binary backwards compatibility by having an old system call in
the old place that gets passed a pointer to "struct old_X". It's all
compatible, because binaries compiled for the old kernel will still continue
to run - they'll use the same old interfaces they are still used to, and
they obviously do not know about the new ones.

</p><p>

Now, if you start mixing a new kernel header file with an old binary
"glibc", you get into trouble. The new kernel header file will use the
_new_ "struct X", because it will assume that anybody compiling against
it is after the new-and-improved interfaces that the new kernel
provides.

</p><p>

But then you link that program (with the new "struct X") to the binary
library object archives that were compiled with the old header files, that
use the old "struct old_X" (which _used_ to be X), and that use the old
system call entry-points that have the compatibility stuff to take "struct
old_X".

</p><p>

Boom! Do you see the disconnect?

</p><p>

In short, the _only_ people who should update their /usr/include/linux tree
are the people who actually make library releases and compile their own
glibc, because if they want to take advantage of new kernel features they
need those new definitions. That way there is never any conflict between the
library and the headers, and you never get warnings like the above..

</p>

</quote>
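
<p>

The mismatch Linus describes can be made concrete with a small sketch. The two
layouts below are invented purely for illustration (they are not real kernel
structures): a program built against new headers fills in a widened "struct
X", and a library built against the old headers reads the same bytes as
"struct old_X".

</p>

```python
import struct

# Hypothetical layouts, invented for illustration only.
# ">" selects big-endian, standard-size fields so the result is the same
# on every machine.
OLD_X = struct.Struct(">HH")   # "struct old_X": two 16-bit fields (uid, gid)
NEW_X = struct.Struct(">II")   # "struct X": the same fields widened to 32 bits


def program_built_with_new_headers(uid, gid):
    """The application fills in the structure using the new definition."""
    return NEW_X.pack(uid, gid)


def library_built_with_old_headers(buf):
    """The precompiled library still interprets the bytes the old way."""
    return OLD_X.unpack_from(buf)


if __name__ == "__main__":
    buf = program_built_with_new_headers(1000, 100)
    print(library_built_with_old_headers(buf))   # prints: (0, 1000)
```

<p>

Nothing crashes here, which is part of the problem: the library quietly sees
the wrong values because the headers and the binary library disagree.

</p>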

<p>

He went on:

</p>

<quote who="Linus Torvalds">

<p>

I would suggest that people who compile new kernels should:

</p><p>

<ul>

<li>NOT do so in /usr/src. Leave whatever kernel (probably only the header
files) that the distribution came with there, but don't touch it.</li>

<li>compile the kernel in their own home directory, as their very own
selves. No need to be root to compile the kernel. You need to be root to
_install_ the kernel, but that's different.</li>

<li>not have a single symbolic link in sight (except the one that the kernel
build itself sets up, namely the "linux/include/asm" symlink that is only
used for the internal kernel compile itself)</li>

</ul>

</p><p>

And yes, this is what I do. My /usr/src/linux still has the old 2.2.13
header files, even though I haven't run a 2.2.13 kernel in a _loong_ time.
But those headers were what glibc was compiled against, so those headers are
what matches the library object files.

</p><p>

And this is actually what has been the suggested environment for at least
the last five years. I don't know why the symlink business keeps on living
on, like a bad zombie. Pretty much every distribution still has that broken
symlink, and people still remember that the linux sources should go into
"/usr/src/linux" even though that hasn't been true in a _loong_ time.

</p><p>

Is there some documentation file that I've not updated and that people are
slavishly following outdated information in? I don't read the documentation
myself, so I'd never notice ;)

</p>

</quote>

<p>

Mike A. Harris commented, <quote who="Mike A. Harris">I very much like the
idea of what you describe below however as it solves NUMEROUS problems
indeed. This information should be put in the top level README file, and
emphasis put on the 'dont compile in /usr/local' part, because it would sure
save people a lot of headaches IMHO.</quote> Also in reply to Linus, Kai
Henningsen pointed out that in Debian at least, <quote who="Kai
Henningsen">/usr/include/asm is a directory, and its contents come with the
libc6-dev package.</quote>

</p><p>

In reply to Linus' question about misleading docs that might be floating
around, several folks piped up. Jeff Lightfoot pointed out that a ton of
files in the 'Documentation' directory referenced '/usr/src/linux', and
James Lewis Nance and Andr&#233; Dahlqvist independently posted patches to clean
that up in the main README. Adam Sampson added that the 'glibc' installation
instructions had similar problems, and Kai added that in the Linux sources,
the problem existed in <quote who="Kai Henningsen">Lots of places, actually.
'find -type f | xargs grep /usr/include' and shudder.</quote>

</p><p>

Also in reply to Linus, Theodore Y. Ts'o suggested having /usr/src/linux be
a symlink to the header files of whatever kernel was booted by default. Since
only root could actually install a kernel (even though any user could do the
compilation themselves), the question of where the link should point would
always be clear. He explained, <quote who="Theodore Y. Ts'o">The problem is
that unless you are trying to say that you want to outlaw external source
packages which generate kernel modules, there needs to be some way for such
packages to be able to find the kernel header files.</quote> But Linus
replied that this would force kernel header files to maintain source-level
backward compatibility forever, which would cause big problems. In terms of
how external packages could find header files, Linus replied:

</p>

<quote who="Linus Torvalds">

<p>

By hand. By the maintainer. And _independently_
of what random user Joe Blow has on his particular installation.

</p><p>

Because it's not unreasonable AT ALL to have those packages be compiled with
newer header files than the user even has access to. Imagine a ext2 library
that wants to support new features of the filesystem, compiled on a box that
only has 2.2.13 installed. Never ever had anything newer.

</p><p>

Should that newer source package dumb itself down to 2.2.13 level, so that
the e2fsck doesn't know how to handle new filesystems? Sure, the user
obviously isn't using them _now_, but wouldn't it be a lot nicer if you just
had a source tree that ended up generating the same binary that you as the
maintainer has? With all the new features, just suppressed by the fact that
it ends up running on a old-style filesystem image..

</p><p>

Trust me, it's STUPID to have user-level binaries that end up different
depending on what machine they were compiled on. We've had exactly that
happen, and it's a BUG. It's nasty to debug.

</p><p>

Think about it. You have machine X and machine Y, and they both have the
ext2-programs compiled with the same compiler from the same sources with the
same libraries. Would you _really_ consider it acceptable if they act
differently?

</p><p>

I don't. And that is why I will continue to maintain that it is WRONG to
have that symlink. No ifs, buts or other crap. Just face reality.

</p>

</quote>

<p>

Elsewhere in the same vein, he went on:

</p>

<quote who="Linus Torvalds">

<p>

I know people who _routinely_ compile stuff over NFS on another machine
simply because that other machine is a lot faster, and the network is fast.
They expect the binary to be the same. And I agree 100% percent. It should
NOT depend on your particular kernel configuration (and yes, some kernel
header files actually _change_: they depend on whether the kernel was
compiled for a PII or a i386 etc).

</p><p>

Say you have a build-server that runs an older kernel because it doesn't
really matter, and it's not running gnome etc. Say your desktop uses USB and
you've upgraded. Or the reverse may be true, where the build-server is a SMP
machine that uses a newer kernel because it handles the load better.

</p><p>

With your approach, that build-server would be unable to generate programs
that take advantage of the new features that somebody wanted to have in the
program. They would generate programs that are doing things that the locally
generated programs wouldn't be doing.

</p><p>

What I mean is that the above generation-script should be generated _once_.
The source gets distributed with the generated file, so that whatever
happens you at least get reliable results in a reasonably heterogenous
environment.

</p><p>

A "normal user" would never generate nofollow.h at all. The generation
script would be used by the _maintainer_ or by people who add new features
(And yes, in the above example it's rather simplistic. A real example would
generate the proper architecture ifdef's etc).

</p><p>

I expect that library versions and compiler versions should matter to
compiling programs. But I do _not_ want kernel versions to do that. It's
already painful for people that you have to have the right library version.
I'd _hate_ to see source code that says "requires kernel 2.3.99 or higher
sources in /usr/src/linux" in addition to saying "needs glibc-2.1.2 or newer
for threading reasons".

</p>

</quote>

<p>

He replied to himself:

</p>

<quote who="Linus Torvalds">

<p>

Put another way that maybe is a clearer example:

</p><p>

<ul>

<li>when I download the binary rpm of package "foo-2.3.5.rpm", should I
really have to care on what machine it was compiled?</li>

</ul>

</p><p>

A lot of old-time UNIX people seem to think that everybody compiles sources
themselves. That's madness. Yes, it's important that you _can_. But you
shouldn't have to. If I hear that the new feature 2.3.5 of package "foo"
supports the new filesystem layout that I've been waiting for, should I have
to pray that the person who compiled the binary happened to use one of the
development kernels where that feature was actually implemented?

</p><p>

Or should I have to recompile it myself to make sure?

</p><p>

Or, wonder of wonders, should it just WORK?

</p><p>

I think the latter. And I hope I've made clear to everybody why a software
package must NOT EVER depend on what kernel version happened to be installed
when it was compiled. And why it is so _important_ that nobody even by
mistake does this. EVER.

</p><p>

The defense rests.

</p>

</quote>

<p>

Theodore replied that he hadn't meant userland programs, and said:

</p>

<quote who="Theodore Y. Ts'o">

<p>

I'm talking about kernel modules.  Like the external PCMCIA package;
remember? The one which you recommended distro's should use because the 2.4
PCMCIA code wasn't quite up to snuff yet.

</p><p>

Kernel modules *inherently* depend on which kernel happens to be running on
which machine. We can't change that, because we don't want to lock down
kernel interfaces.

</p><p>

It would be nice, however, if there was a painless way to compile such
external kernel modules so they easily work with whatever kernel happens to
be on the machine.

</p><p>

I accept your arguments that user-mode programs shouldn't depend on the
kernel which you happen to be compiling on. But this simply doesn't work for
kernel modules.

</p>

</quote>

<p>

Linus replied, <quote who="Linus Torvalds">You're right, right now kernel
modules need some way of specifying where the kernel is. I've always just
had a define at the top of a makefile that the user actually had to edit by
hand (this was how early USB-development was done, for example). Not very
pretty, I guess. But at least it doesn't screw the "normal" user
packages.</quote> And Theodore said, <quote who="Theodore Y. Ts'o">I'd
really, really, like some kind of convention that could be
standardized.</quote> He proposed one of:

</p><p>

<ul>

<li>/usr/src/linux</li>
<li>/lib/include/`uname -r`</li>
<li>../linux</li>

</ul>

</p><p>

He went on:

</p>

<quote who="Theodore Y. Ts'o">

<p>

I could live with any of these; as long as we all can agree on a single
convention, so that default is always right. If you don't like
/usr/src/linux because of the past history, and how user-mode packages
are using it incorrectly, let's create a new convention.  I personally
think /lib/include (ala /lib/modules) is probably the best one but it
means dropping approximately 4 megabytes into /lib, which might cause
some problems for some partitioning schemes.

</p><p>

My external kernel module packages use a define at the top of a makefile as
well (and currently defaulted to /usr/src/linux; I can change that). This is
fine for me, but I'd like to be able to support users that don't necessarily
know how to edit Makefiles. I'd like for them to be able to type "make" and
"make install" as root, and that's about it. In order to do this, we need
some kind of convention. Conventions are Good Things.

</p>

</quote>

<p>

Alan Cox also advocated standardizing on something, and suggested:

</p>

<quote who="Alan Cox">

<p>

Symlinks are wonderful things

</p><p>

        /lib/modules/2.2.14/build

</p><p>

neither needs to be a source tree in full nor a copy. In fact it's ideal
since make modules_install will know enough to make the link so the link
will de facto get put in the right place when people install new kernels.
Self updating to new features is good.

</p>

</quote>
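
<p>

As a rough sketch of how an external module package might use the convention
Alan suggests, the helper below builds the path under /lib/modules for the
running kernel and follows the "build" symlink. The function names are
hypothetical, and the code assumes only that the link exists once the kernel's
modules have been installed.

</p>

```python
import os


def kernel_build_dir(release=None, modules_root="/lib/modules"):
    """Return the conventional path of the "build" link for a kernel release."""
    if release is None:
        release = os.uname().release   # the same string `uname -r` prints
    return os.path.join(modules_root, release, "build")


def resolve_kernel_tree(release=None, modules_root="/lib/modules"):
    """Follow the link to the real tree, or return None if it is absent."""
    link = kernel_build_dir(release, modules_root)
    if not os.path.isdir(link):
        return None
    return os.path.realpath(link)


if __name__ == "__main__":
    print(kernel_build_dir("2.2.14"))   # prints: /lib/modules/2.2.14/build
    print(resolve_kernel_tree())        # depends on the local machine
```

<p>

A package's build script could call something like resolve_kernel_tree() and
fall back to asking the user only when it returns None.

</p>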

<p>

Linus replied, <quote who="Linus Torvalds">I like this one. It puts the
thing in the same tree as the modules themselves, so it's self-contained.
Let's _document_ it as a symlink, and make "make modules_install" do that
part too (I don't use modules so I'd rather somebody else sent me the tested
- likely one-liner - patch to do this).</quote> Theodore posted a very small
patch, and added, <quote who="Theodore Y. Ts'o">Vendors should test this
against their kernel packaging tools, which tend to do all sorts of
non-standard stuff because they try to build multiple kernels and
multiple sets of modules from a single kernel source tree.</quote> There
followed some implementation discussion about various pitfalls to be
avoided, and how best to code the patch to avoid them.

</p><p>

Elsewhere, Ulrich Drepper had some angry words for Linus regarding the whole
discussion:

</p>

<quote who="Ulrich Drepper">

<p>

Your style of development, these sudden,
unplanned changes, is what makes it necessary to not add all the content to
the libc headers. In addition, and I repeat this probably for the thousandth
time, where the f*ck is the sysconf() function which is so very much
needed?

</p><p>

Until you provide solutions for this you cannot expect others to do more
work. I would have to release a new glibc version every week since something
changed and somebody will run into the problems. And no, your argument that
the people who are doing such low-level work should know what to do doesn't
cut. Those people might know, but what they produce and ideally distribute
in source form has to be compiled by the clueless. They don't know how to
change their system (if they even have the permission) and hardcoding new
values is also out of question.

</p><p>

Maybe you should spend some time thinking how *you* can improve the process
of using more recent kernels before complaining about others. The first and
obvious thing is to implement __sys_sysconf (maybe do it on top of sysctl, I
don't care).

</p>

</quote>

<p>

Regarding sysconf(), Linus replied:

</p>

<quote who="Linus Torvalds">

<p>

I've never needed it.

</p><p>

Uli, maybe you forgot about that "open source" thing?

</p><p>

And btw, the kernel doesn't even _know_ many of the sysconf values. They
depend on library implementation, and apparently even on things like the
implementation of the "expr" binary. So "sysconf()" is not a kernel thing.

</p><p>

A subset of those sysconf values are things that you should ask the kernel,
but go look at what sysconf should return: it's definitely not a system
call.

</p><p>

You're barking up the wrong tree.

</p>

</quote>

<p>

Ulrich replied that he had no time to work on the kernel; and that in any
case he wasn't asking for a full implementation, only one that would expose
the kernel parameters, and the library could take care of the rest. He went
on, <quote who="Ulrich Drepper">Just recently I needed the real value of
NGROUPS_MAX. How should I get it? Also, fpathconf() is needed. And no, I'm
not misdirecting this. There were in the past some tries to implement this
and you ignored them.</quote> To Ulrich's time constraints, Linus replied:

</p>

<quote who="Linus Torvalds">

<p>

So don't complain if I'm not interested in some esoteric glibc issue that I
find totally removed from the kernel.

</p><p>

In particular, why curse at me when it's your own problem.

</p><p>

In short, go away until you can behave.

</p>

</quote>

<p>

Ulrich replied:

</p>

<quote who="Ulrich Drepper">

<p>

It's a problem caused by you and the short-sighted way the kernel interfaces
are designed so that they need constant attention. You are unwilling to
cooperate in any way. Saying that writing a kernel version of
sysconf/fpathconf is *my* problem is simply ridiculous. According to your
logic it is my problem to keep the libc interface and it is my problem to
keep (ehm, make) the kernel interface sane. You are happy living in the
kernel-only world. Probably using a shell kernel module or so since, as you
mentioned, the libc problems are only "esoteric" problems for you.

</p><p>

Why are you constantly rejecting advices and even implementations of proper
interface for the kernel? I know that you don't think it's fair to compare
yourself to the developers of the other (commercial) Unix kernels. But how
about just taking a look at the interfaces? Why do you think they have, for
instance, kernel sysconf and fpathconf interfaces? The reason is very similar
to the situation we are in here: they have separate groups working on the
kernel and user-level stuff, they allow the admin to reconfigure the kernel.
This all cries out loud for a sane and stable interface.

</p><p>

If you don't want to work on these things, fine, nobody can blame you. But I
think you owe it to all the other people working on and using the system to
listen to their comments and accept some changes for which you in the
kernel-only world see absolutely no need.

</p><p>

Having said this it is I think time to call for volunteers once again. Maybe
we can actually find some if you are stating that you are actually willing
to seriously consider using what they are coming up with. What is needed is:

</p><p>

<ul>

<li>very clear markings of the data structures in the kernel headers which
can be used at userlevel. Tools of whatever fashion can then be used to
extract them. I agree that the headers definitions should not affect the
possibility to run the application on other platforms and you'll find in
glibc plenty of code just to allow this. We are always proposing to use the
very latest kernel headers even if the kernel which is actually used is very
old.</li>

<li>write interfaces to get kernel parameters.  This must come in two
flavours:

<ul>

  <li>sysconf-like for generic parameters like<br />

    <ul>

    <li>NGROUPS_MAX</li>
    <li>ARG_MAX</li>
    <li>OPEN_MAX</li>
    <li>CLK_TCK</li>
    <li>NSIG (I know that you don't think more than 64 signals are useful
          but people can and do reconfigure the sources)</li>
    <li>total number of processors</li>
    <li>online processors (parsing /proc/cpuinfo, as we do it now, is a pain)</li>
    <li>MSGMAX</li>

    </ul>

    and possibly more in future.  All these are kernel related and can be
    reconfigured. Just recently I've rewritten parts of the group handling
    in glibc since using a constant NGROUPS_MAX is not possible. There are
    people changing the constant and recompiling the kernel. We are even
    doing this in-house for some special purpose machines.

  </li>

  <li>fpathconf-like<br />

    must handle something like<br />

    LINK_MAX     which varies with the filesystem

  </li>

</ul>

</li>

<li>design interfaces and data structures with a little bit of farsightedness.
I know you want to keep the resource requirements as low as possible and
this is fine. But being limited by existing APIs which cannot be changed is
worse and finally adds more baggage since you have to drag compatibility
code around with you.</li>

</ul>

</p><p>

Please consider this advice.

</p>

</quote>
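
<p>

As a rough illustration of the two interface flavours Ulrich describes (this
is not code from the thread), the sketch below queries a few of the
system-wide parameters on his list through the existing user-level
sysconf() call, and the per-filesystem LINK_MAX through pathconf(). It
assumes a Unix-like system where Python exposes these calls; the helper
names query_sysconf and link_max are made up for the example, and any
parameter the platform does not know is reported as None.

</p>

<pre><![CDATA[
```python
import os

# Keys taken from Ulrich's list, spelled the way Python's os module names
# them. Which keys exist is platform dependent (assumption: Unix-like host).
SYSCONF_KEYS = [
    "SC_NGROUPS_MAX",
    "SC_ARG_MAX",
    "SC_OPEN_MAX",
    "SC_CLK_TCK",
    "SC_NPROCESSORS_CONF",  # total number of processors
    "SC_NPROCESSORS_ONLN",  # processors currently online
]


def query_sysconf(keys):
    """Return {key: value or None} for system-wide parameters."""
    result = {}
    for key in keys:
        try:
            result[key] = os.sysconf(key)
        except (ValueError, OSError):
            # ValueError: name unknown on this platform.
            # OSError: name known, but the system cannot answer.
            result[key] = None
    return result


def link_max(path):
    """Per-filesystem parameter: the answer depends on where path lives."""
    try:
        return os.pathconf(path, "PC_LINK_MAX")
    except (ValueError, OSError):
        return None


if __name__ == "__main__":
    for key, value in query_sysconf(SYSCONF_KEYS).items():
        print(key, value)
    print("PC_LINK_MAX for /:", link_max("/"))
```
]]></pre>

<p>

The split mirrors the distinction in the list above: sysconf-style values
are the same everywhere on a running system, while fpathconf-style values
need a file or path because they vary with the filesystem.

</p>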

<p>

To Ulrich's statement that kernel interfaces were shortsightedly designed
and required constant attention, Linus replied, <quote who="Linus
Torvalds">No. I've told you (in fact this whole thread is all _about_ that)
how to not need constant attention. The fact that you repeatedly ignore this
is your problem, not mine.</quote>

</p><p>

Ulrich accused Linus of taking the easy way out, and at one point Alan said
he felt the balance was somewhere in the middle, and elsewhere volunteered
to work on sysconf(), if Ulrich would provide a precise list of items.
Ulrich replied, <quote who="Ulrich Drepper">The ones I mentioned in one of
the last mails are those I'm currently aware of. But the scheme should be
easily extendible anyway since there will be new requirements in the future.
And ideally modules will be able to register their own extensions.</quote>

</p><p>

But Linus interposed:

</p>

<quote who="Linus Torvalds">

<p>

Don't do this.

</p><p>

Make it a _minimal_ list, not the kind of "this is everything I can think
of, and I'll also add a way for modules to add their own" stuff.

</p><p>

Yes, Uli, I know you like overdesigning things.

</p>

</quote>

<p>

Ulrich said he wasn't overdesigning, just looking to the future, and cited,
<quote who="Ulrich Drepper">For example, there are still people using the
STREAMS stuff. This code should also export its parameters.</quote> But
Linus replied:

</p>

<quote who="Linus Torvalds">

<p>

NO.

</p><p>

That code should just DIE.

</p><p>

sysconf() isn't even important enough to overdesign for. Who really cares
whether _SC_STREAM_MAX gets the exact right value? I've never seen anybody
use it.

</p><p>

The way code gets added to the kernel is when somebody cares enough to write
it, and it looks good enough to add.

</p><p>

Code does NOT get added to the kernel just because somebody makes a big deal
of nothing.

</p>

</quote>

<p>

Ulrich said the 'streams' thing was just an example of the kind of thing he
was talking about, not a specific case where it definitely should be done.
But he concluded, <quote who="Ulrich Drepper">I'll stop trying to convince
you since I don't have much hope. When glibc 2.2 comes out I'll provide some
information on how much code is necessary to handle all the different kernel
versions. Almost all of these changes could have been avoided if this purely
minimalistic approach to kernel interface design would be replaced by
something more flexible.</quote>

</p><p>

At this point the discussion veered off.

</p>

</section>

<section
  title="Trouble With PS/2 Hotplugging In Stable Series"
  subject="ps/2 mouse (synaptics touchpad)"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0007_04/msg01086.html"
  posts="16"
  startdate="28 Jul 2000 00:00:00 -0800"
  enddate="01 Aug 2000 00:00:00 -0800"
>
<topic>Hot-Plugging</topic>
<topic>Version Control</topic>

<mention>Vladimir Dergachev</mention>
<mention>Vojtech Pavlik</mention>
<mention>Andrew McNabb</mention>

<p>

Vladimir Dergachev noticed that 'gpm 1.19.3' gave tons of errors in
2.4.0-test4, while under 2.2.14 it worked fine. Andrew McNabb reported
seeing the same problem when he'd upgraded to 2.2.16, and recommended just
removing 'gpm', since it was unmaintained anyway. But Vladimir replied with
a patch, having tracked the problem to some PS/2 reconnect code, introduced
into 2.4 and 2.2 at about the same time. His fix was to remove the new code,
after which the system worked fine again (although it would be impossible to
hotplug PS/2 devices). He also mentioned that 'gpm' was being maintained
again, at least as of June. Alan Cox replied that removing the code was not
the right answer, and started asking debugging questions. No solution
presented itself, aside from re-implementing reconnect-event determination
(Vojtech Pavlik gave a link to <a
href="http://www.suse.cz/development/input">Linux Input Drivers</a>), and at
one point Alan said, <quote who="Alan Cox">If someone has infinite bandwidth
to go digging in that CVS and cares to send me the relevant pieces let me
know. Otherwise I'll worry about this after 2.2.17</quote>

</p>

</section>

<section
  title="Feature Consideration"
  subject="[PATCH] Decrease hash table memory overhead"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0007_04/msg01100.html"
  posts="25"
  startdate="28 Jul 2000 00:00:00 -0800"
  enddate="04 Aug 2000 00:00:00 -0800"
>
<topic>Networking</topic>

<p>

Andi Kleen posted a patch for 2.4, and reported:

</p>

<quote who="Andi Kleen">

<p>

Linux uses double linked list heads in the inode and dcache hash tables.
That wastes a lot of memory, especially since neither inode nor dcache ever
try to access the tail of the hash list. The following patch adds a new
hlist_* implementation that works on double linked lists with a single
pointer head. It adds a few jumps over the list_* rings, but IMHO the
decreased cache line usage in the hash heads is more than worth it (you can
do a lot of jumps in a single cache miss)

</p><p>

This saves about 96K memory on my 128MB machine, more on machines with
bigger ram.

</p>

</quote>
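
<p>

A back-of-the-envelope check of that figure, as a sketch only: assuming
32-bit pointers and exactly one pointer saved per hash bucket (a two-pointer
list head becoming a single-pointer head), neither of which is stated in the
post, a saving of 96K corresponds to the following number of buckets. The
function name is made up for the example.

</p>

<pre><![CDATA[
```python
# Assumptions (not from the thread): 4-byte pointers, and one pointer
# saved per hash bucket when the head shrinks from two pointers to one.
POINTER_BYTES = 4
SAVED_BYTES = 96 * 1024  # "about 96K"


def buckets_for_saving(saved_bytes, pointer_bytes=POINTER_BYTES):
    """Number of hash buckets that would account for the saving."""
    return saved_bytes // pointer_bytes


if __name__ == "__main__":
    print(buckets_for_saving(SAVED_BYTES))  # 24576
```
]]></pre>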

<p>

But Linus Torvalds replied, <quote who="Linus Torvalds">I'd rather have just
one list function than save a few kB of RAM. Avoid confusion, and make
people so used to that one list-handling functionality that bugs don't crop
up as easily.</quote> Andi said he was almost certain that the patch would
also speed up the system, and offered to do a benchmark; and Linus replied,
<quote who="Linus Torvalds">Hey, feel free. That might motivate me if it is
noticeable.</quote> Andi posted some good numbers, but Linus objected:

</p>

<quote who="Linus Torvalds">

<p>

I'm not interested in made-up benchmarks that cannot be reproduced under
real load.

</p><p>

Can you make it show up on a real filesystem even with a contrived
user-mode benchmark?

</p><p>

(Btw, even if you do convince me, please don't use a name like "hlist".
"hlist WHAT?" What's the "h" for? "hash"? Why? Basically, it sounds
nonsensical).

</p><p>

Btw, from past experience I've found that it can be a lot more advantageous
to just dynamically move the hash entries to the front of the list when
accessed, rather than worry about how the list is set up. Hashes are bad on
the caches by design, and whether the hash table takes up x or 2x of memory
is pretty much immaterial for performance. But whether you find the entry on
the first or the fifth try is noticeable.

</p><p>

I suspect you'd find more of a performance advantage from trying something
like that instead..

</p>

</quote>
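
<p>

The move-to-front idea Linus mentions can be sketched in a few lines. This
is a toy model, not kernel code: the class name MoveToFrontTable and its
methods are invented for the illustration, and Python lists stand in for the
hash chains. Each successful lookup moves the entry to the head of its
chain, so a repeated lookup is found on the first probe rather than the
fifth.

</p>

<pre><![CDATA[
```python
class MoveToFrontTable:
    """Chained hash table that moves each hit to the head of its chain."""

    def __init__(self, nbuckets=8):
        self.buckets = [[] for _ in range(nbuckets)]

    def _chain(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        # New entries go to the front of the chain.
        self._chain(key).insert(0, (key, value))

    def lookup(self, key):
        """Return (value, probes); probes counts the entries examined."""
        chain = self._chain(key)
        for i, (k, v) in enumerate(chain):
            if k == key:
                if i:  # not already at the front: move it there
                    chain.insert(0, chain.pop(i))
                return v, i + 1
        return None, len(chain)


if __name__ == "__main__":
    table = MoveToFrontTable(nbuckets=1)  # one bucket forces a long chain
    for name in "abcde":
        table.insert(name, name.upper())
    print(table.lookup("a"))  # ('A', 5): found on the fifth probe
    print(table.lookup("a"))  # ('A', 1): now at the front
```
]]></pre>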

<p>

Andi explained, <quote who="Andi Kleen">Doing it completely from user space
would probably add so many other variables and variances that the results
would be hard to interpret,</quote> and Linus came back with:

</p>

<quote who="Linus Torvalds">

<p>

Yes.

</p><p>

The other way to say the same thing is

</p><p>

        "Doing it from user space might show that it's not a performance
         optimization that can be noticed".

</p><p>

See?

</p>

</quote>

<p>

Andi gave up on the benchmark idea as being too much work, but added, <quote
who="Andi Kleen">Anyways, hlists are already used all over the kernel (e.g.
try grep pprev net/ipv4/*), just everybody is reinventing the wheel on them
all the time. I did that myself several times. It would be nicer to use
list_*() macros all the time, just without the bloat of the list_* list
heads.</quote> To which Linus agreed:

</p>

<quote who="Linus Torvalds">

<p>

Now THIS is a valid argument that I can find no holes in.
 
</p><p>

The argument of "inode.c could be speeded up/shrunk/xxxx" doesn't strike me
as being a very good argument especially just before 2.4.x.

</p><p>

The argument that "lots of code already does this, except they aren't very
clean about it and do it by hand", is an argument I can buy into.

</p><p>

You might consider just going about it a different way: pick the places that
_already_ use this kind of list, and clean them up using a generic list
package. I still don't like "hlists" as a name, because I still don't see
the "hash" in them conceptually, but I would certainly consider any cleanup
a good thing.

</p><p>

And once you come from that direction, it's going to be a lot easier
convincing me to eventually potentially switch over some of the current
lists.h users to a new implementation.

</p>

</quote>

<p>

Andi replied that he'd look into this for 2.5, and posted a new patch
containing only the pure bugfixes from his initial code.

</p>

</section>

<section
  title="Stopping Buffer-Overrun Attacks"
  subject="Stopping buffer-overflow security exploits using page protection"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0007_04/msg01322.html"
  posts="14"
  startdate="28 Jul 2000 00:00:00 -0800"
  enddate="01 Aug 2000 00:00:00 -0800"
>
<topic>Security</topic>

<mention>Derek Martin</mention>

<p>

Bruce Perens gave a pointer to <a href="http://technocrat.net/964824712/">an
article on technocrat.net</a> and asked, <quote who="Bruce Perens">Is there
any good reason that we can not run Linux executables with the execute
permission turned off, by default, on all stack and data pages? Wouldn't
this stop buffer-overflow security exploits that try to inject executable
code onto the stack or into function tables? i386 won't support it, but
other architectures do.</quote> James Sutherland replied that this sounded
like the "nonexecutable stack" idea that had been floating around for a while
(see <kcref subject="Unexecutable stack" startdate="27 Dec 1999 00:00:00 -0800"></kcref> and
<kcref subject="Unexecutable Stack / Buffer Overflow Exploits..."
startdate="28 Dec 1999 00:00:00 -0800"></kcref>). He added, <quote who="James
Sutherland">It doesn't stop anything - just changes the nature of the
exploit needed (i.e. the skr1pt k1dd13s need to find v2 of their little
skr1pt). The opinion round here seems to be that this isn't worth the
hassle?</quote> Alan Cox added:

</p>

<quote who="Alan Cox">

<p>

As for the number of exploits this would stop,
including this in the mainstream kernel would only be a stopgap measure. All
it took to open the floodgates for stack smashing exploits was a single
well-written article - Aleph One's "<a
href="http://www.codetalker.com/whitepapers/other/p49-14.html">Smashing the
Stack for Fun and Profit</a>". Now writing an exploit once you find an
overflow is a cookbook exercise. A 2nd edition of the cookbook would be all
it would take to render the patch meaningless.

</p><p>

The problem isn't Intel's fault or any OS's, it's a problem in the C
language and compiler. There are 5 fixes:

</p><p>

<ol>

<li>write safe code (which has so far proved hard)</li>
<li>compile with bounds-checking (big performance hit)</li>
<li>compile with StackGuard, etc. (doesn't stop exploits that corrupt
   other locals)</li>
<li>separate the return address stack from the automatic variable stack
   (ditto)</li>
<li>use another language (performance)</li>

</ol>

</p>

</quote>

<p>

Derek Martin said he didn't understand why folks were against closing up the
security holes that they could, even if there were others they couldn't.
Oliver Xymoron explained, <quote who="Oliver Xymoron">We have n exploitable
buffer overruns. The non-exec patch will leave us with n exploitable buffer
overruns next week and a false sense of security. Meanwhile the patch is
disgustingly complex - it's like putting four deadbolts on your front door
while leaving your back door open..</quote>

</p><p>

And Lamont Granquist put in:

</p>

<quote who="Lamont Granquist">

<p>

This should really be a FAQ.

</p><p>

The problem is that you don't reduce any potential vulnerabilities at all.
For every buffer overflow exploit out there you can modify it and produce a
version which will work against a non-exec stack page on an x86. It is not
hard. I was actually considering producing a "Smashing the Stack for Fun and
Profit, Part II: Non-Exec Stacks" text to show just how easy it really is.
For now, I suggest you check out the VULN-DEV archives -- a few very helpful
people on that list <a
href="http://www.hideaway.net/vuln-dev/april/subject.html">walked me
through</a> how to produce non-exec stack exploits.

</p><p>

If a non-exec stack ever got accepted into the kernel, then exploit writers
would simply start coding for non-exec stacks. The end result is that you
would gain precisely nothing. And what you would lose is that you would have
broken the x86 API -- for nothing. So, yes, there is a drawback, and no you
don't reduce any vulnerabilities. Linus has already rejected such patches
for this reason. Check out the Libsafe documentation for a little bit of
background and references.

</p>

</quote>

</section>

<section
  title="mount() History And Proposal"
  subject="[RFC][Long][Horror story] Mount flags"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0007_05/msg00150.html"
  posts="13"
  startdate="30 Jul 2000 00:00:00 -0800"
  enddate="07 Aug 2000 00:00:00 -0800"
>
<topic>FS: NFS</topic>
<topic>POSIX</topic>

<mention>Andries Brouwer</mention>

<p>

Alexander Viro gave an amazing history of mount(), and proposed (quoted in full):

</p>

<quote who="Alexander Viro">

<p>

Sorry for the length of that, but I really felt that the whole story was
needed to appreciate the situation. In short, I think that there is a need
of new variant of mount(2). Yep, new syscall number. See below for the
reasons. Here it comes:

</p><p align="center">

                            Mount Flags, or<br />
                          A Story of Interface Rot.

</p><p>

Once upon a time life was simple, interfaces pleasant and look at the
mount(2) didn't raise a suspicion that Frankenstein's monster got what he
wanted. Back then mount(2) had 3 arguments - directory, device and rw flag
(unused, by the way). Alas, it didn't last. In March '92 mount(2) got a new
argument - fs type. So far, so good, but the story didn't end on that -
somewhere in July '92 msdosfs went in and brought mount(8) options that were
obviously fs-specific. And that brought a new argument - void *data.
sys_newmount(9), you are saying? You wish... That's what had actually
happened:

</p>

<blockquote>
 
<p>

Flags is a 16-bit value that allows up to 16 non-fs dependent flags to be
given to the mount() call (ie: read-only, no-dev, no-suid etc).

</p><p>

data is a (void *) that can point to any structure up to 4095 bytes, which
can contain arbitrary fs-dependent information (or be NULL).

</p><p>

NOTE! As old versions of mount() didn't use this setup, the flags has to
have a special 16-bit magic number in the high word: 0xC0ED. If this magic
word isn't present, the flags and data info isn't used, as the syscall
assumes we are talking to an older version that didn't understand them.

</p>

</blockquote>
 
<p>

and
 
</p>

<blockquote>

<p>

do_mount() does the actual mounting after sys_mount has done the ugly
parameter parsing. When enough time has gone by, and everything uses the
new mount() parameters, sys_mount() can then be cleaned up.

</p>

</blockquote>

<p>

Needless to say, this interface is still with us. Nevermind that current
kernel will simply refuse to exec() a binary from '92, the kludge is still
there. First bunch of flags was nice and sweet: ro, nodev, nosuid, noexec
and sync. Bits 0--4, indeed. But in January '93 we've got remount and it had
been implemented as a new flag. It wasn't a flag, indeed, but hey, why not
encode the action into the same argument and avoid API changes? So there it
went and got the bit #5. And so it stayed for a while. Flags (real flags,
that is, not remount one) were mirrored into -&gt;i_flags of every inode and
everyone was happy.

</p><p>

In August '94 -&gt;i_flags got two new bits - S_APPEND and S_IMMUTABLE. They
could not be passed by mount(2), indeed. They got bits #8 and #9, apparently
to make them visibly separate from the rest. Well, putting them at #16 and
above might be wiser, but hey, who will ever need more than 8 (OK, 7) mount
flags?

</p><p>

In the late '95 we got an implementation of quota. And -&gt;i_flags got a
new bit - S_QUOTA, #7. Originally it got an inventive name S_WRITE, but that
insanity had been fixed in '98.

</p><p>

Fast-forward to October '96. POSIX mandatory braindam^Wlocking goes in.
Since nobody wants the overhead hitting all filesystems we are getting a new
mount flag. This time - real. OK, #6 is still free, so there it goes.

</p><p>

November '96, and we have one more flag - noatime. Oops, looks like we had
made a bad choice when append-only and immutable went in. Oh, well, who
actually cares? #10 it is.

</p><p>

Originally remount could change only read-only bit. Well, mandatory locking
and noatime also became changeable, so in September '97 somebody asked
himself why the rest didn't? At that point MS_RMT_MASK (flags that can be
changed by remount) started to look somewhat ugly. It got worse three months
later, when nodiratime went in (bit #11).

</p><p>

In October '98 RMK noticed that remount doesn't update -&gt;i_flags, so
macros got uglier - now we were checking both for -&gt;i_flags and
-&gt;s_flags.

</p><p>

In April '99 unrelated events (rename() cleanup) had led to Yet Another
Mount Flags Ugliness(tm). This time the guilty party is known - it's me
(AV). I needed a way to tell rename() that some filesystems need special
treatment (silly-rename ones). Instead of putting that into
-&gt;s_type-&gt;fs_flags (after all, that's a property of filesystem type)
I've added a new bit to -&gt;s_flags (#15 - at that point we were visibly
low on space; why not #16? Hell knows, I plead temporary braindamage
inflicted by contact with NFS).

</p><p>

A year later one more bit got there, this time in -&gt;i_flags - S_DEAD.
That time I had finally seen the light (OK, actually I had seen the dire
lack of space, but let's pretend that I was clever) and it went into #16.

</p><p>

About the same time we've got Plan9-ish bindings. I made some noises about a
new syscall, but they were not too convincing. For several reasons: first of
all, the name (bind) had been already taken and bind9(2) was a half-hearted
proposal at best. Moreover, I wanted to debug it fast and didn't want to
change mount(8) source. So the quick kludge^Whack went in - -t bind. In
other words, passing the thing through the "type" argument.

</p><p>

The same batch of changes introduced unlimited stacking. Which looked fine
at first, but brought a lot of complaints, arguments and finally such an
example of misuse that drove the point through. It was an obvious exploit,
letting any user who can mount something (floppy, CD, whatever) to drive the
system into OOM. Worse yet, cleaning up after that was damn hard, and I
don't mean washing the LART. That was it - we need more flags, since the
ability to overmount must be root-only. And checks should be in mount(8),
since it's suid-root and from the kernel POV all calls of mount(2) are done
by root. On the other hand, the actual test for presence of another
filesystem at the mountpoint must be left to mount(2) to avoid races.

</p><p>

OK, but we also want to be able to support union-mounts at some future
point. That means two more flags (head/tail of the union). We also want to
get rid of the -t bind kludge, so that's one more bit going our way.
However, currently we have only 3 unused bits - #12, #13 and #14. We can get
more if we relocate S_QUOTA, S_APPEND, S_IMMUTABLE and MS_ODD_RENAME,
though... OK, assume that we've done that, what do we have?

</p><p>

#0 to #4, #6, #10 and #11 are used for real flags. Fine. #5 and some of the
rest are used for "action" flags. So we can fit into 16 bits, but it's
getting really, really crowded here. We can get a bit more if we notice that
MS_RENAME, MS_AFTER, MS_BEFORE and MS_OVER are mutually exclusive, but that
gets really ugly - we could fit into 3 bits instead of 4, but they would be
spread over not-contiguous area. And we can't do anything about that without
breaking every existing binary of mount(8). Moreover, we can't do anything
about the 0xc0ed kludge - all kernels since '92 are going to send us to hell
if we change that. Yes, Virginia, removal of that check had been overdue for
some 7 years, but there is no helping to that.

</p><p>

_Or_ we can do what needed to be done back in '92 and '94 and introduce
sys_newmount(action, mountpoint, type, flags, device, data). Why "action"
separate from "flags"? Well, see the story above. Mixing the bitmap and
number into one integer _never_ pays. And inside the kernel we will have to
start with separating them anyway. I could buy an argument about the
register pressure, but damnit, it's mount(2) we are talking about. If it's a
hotspot of your program I want to know what the hell are you trying to do.

</p>

</quote>
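
<p>

To make the packing concrete, here is a small sketch (not kernel code) of
how a legacy flags word can be pulled apart into an action and a set of real
flags. The magic value and the bit numbers for remount, mandatory locking,
noatime and nodiratime are the ones given in the history above; the
individual assignment of bits 0 to 4 follows the Linux MS_* constants, and
the treatment of a word without the magic follows the quoted source comment.
The function name split_mount_flags is made up for the example.

</p>

<pre><![CDATA[
```python
MS_MGC_VAL = 0xC0ED0000  # magic expected in the high 16 bits
MS_MGC_MSK = 0xFFFF0000

# Real flags: properties of the mounted filesystem.
FLAG_BITS = {
    "ro": 0, "nosuid": 1, "nodev": 2, "noexec": 3, "sync": 4,
    "mandlock": 6, "noatime": 10, "nodiratime": 11,
}
REMOUNT_BIT = 5  # an action squeezed in among the flags


def split_mount_flags(raw):
    """Split a legacy flags word into (action, set of flag names)."""
    if (raw & MS_MGC_MSK) != MS_MGC_VAL:
        # No magic: treated as an old-style call, so the flags are not used.
        return "mount", set()
    low = raw & 0xFFFF
    action = "remount" if low & (1 << REMOUNT_BIT) else "mount"
    flags = {name for name, bit in FLAG_BITS.items() if low & (1 << bit)}
    return action, flags


if __name__ == "__main__":
    word = MS_MGC_VAL | (1 << REMOUNT_BIT) | (1 << 0) | (1 << 10)
    print(split_mount_flags(word))
```
]]></pre>

<p>

The first thing the sketch has to do is separate the action from the bitmap
again, which is the step a separate "action" argument in the proposed call
would make unnecessary.

</p>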

<p>

Hans Reiser replied, <quote who="Hans Reiser">Changing mount for the reasons
you cite sounds reasonable as a general proposition, I'll let others comment
on whether you picked the best possible parameters definition for
mount().</quote> H. Peter Anvin also said, <quote who="H. Peter Anvin">It
seems to me that it would make more sense to introduce Viro's proposed
mount6() system call if we're going to introduce a new API.</quote>

</p><p>

Andries Brouwer (mount() maintainer) was not in favor of the proposal, and
suggested that all the problems could be solved without major changes. He
gave specific technical examples, but Matthias Andree objected, <quote
who="Matthias Andree">Enhancing the mess is no good. It's no good joining
things together that the kernel needs to separate again later either. I vote
in favor of Al's approach.</quote> Andries was not convinced, and the thread
petered out.

</p>

</section>

<section
  title="Some Discussion Of gcc/Kernel interactions"
  subject="2GIG-file"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0007_05/msg00256.html"
  posts="78"
  startdate="31 Jul 2000 00:00:00 -0800"
  enddate="04 Aug 2000 00:00:00 -0800"
>
<topic>Version Control</topic>

<p>

In the course of discussion, Victor Khimenko mentioned, <quote who="Victor
Khimenko">If RELEASED gcc miscompiles kernel it's kernel problem (BTW I've
using gcc 2.95 compiled 2.2.x kernels for last year without problems). If
UNSTABLE gcc miscompiles kernel then it's not even kernel issue ...</quote>
To the 'released gcc' proposition, Linus Torvalds replied, <quote who="Linus
Torvalds">Not always. There have been gcc releases that are buggy too.
Sometimes the kernel ends up having work-arounds. Sometimes the end result
is to tell people not to use them.</quote> And to the 'unstable gcc'
proposition, he went on:

</p>

<quote who="Linus Torvalds">

<p>

Not necessarily true either.  Quite often new
compilers just do optimizations that were always legal but just didn't
trigger, and nobody noticed some bug in the kernel. So even a new snapshot
of gcc may be fine, and miscompile the kernel even so. I'll try to fix the
kernel asap, of course (sometimes that fix is to simply disable an
optimization that isn't appropriate for the kernel - this was the case with
the strict alias analysis code, for example).

</p><p>

It _sounds_ like gcc-2.96 is just not quite stable. Somebody claimed that
the new 2.96-based one in 7.0beta was ok again. I certainly know of people
using the latest CVS snapshots to compile the kernel, and it can often be a
case of "it works for them" and then end up that some other configuration of
Linux might show problems.

</p><p>

It's not a clear-cut problem. There have certainly been bugs in both gcc and
the kernel, in all combinations of "stable vs experimental".

</p>

</quote>

<p>

For more on 'gcc' issues, see <kcref subject="gcc-2.7.2.3 warnings [PATCH]"
startdate="14 Jul 2000 00:00:00 -0800"></kcref>.

</p>

</section>

<section
  title="Twisted VM Tweaking"
  subject="kupdate, high CPU usage"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0007_05/msg00260.html"
  posts="7"
  startdate="31 Jul 2000 00:00:00 -0800"
  enddate="01 Aug 2000 00:00:00 -0800"
>
<topic>Virtual Memory</topic>

<p>

At one point Rik van Riel said, <quote who="Rik van Riel">Disabling kupdate
or kflushd is dangerous to your data and should never ever be done.</quote>
Andrea Arcangeli replied:

</p>

<quote who="Andrea Arcangeli">

<p>

Disabling kflushd can impact the stability of the system.
 
</p><p>

Disabling kupdate is useful on the airplane to save battery power 8). (do
it at your own risk of course)

</p>

</quote>

<p>

And Rik said:

</p>

<quote who="Rik van Riel">

<p>

You don't want to disable it.
 
</p><p>

Just set it to one-hour wakeup intervals so it'll flush every piece of data
once per hour ;)

</p>

</quote>

<p>

The friendliest exchange Rik and Andrea have shared in a while...

</p>

</section>

<section
  title="Status Of Crypto Patches"
  subject="Crypto"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0008_01/msg00171.html"
  posts="66"
  startdate="01 Aug 2000 00:00:00 -0800"
  enddate="05 Aug 2000 00:00:00 -0800"
>

<p>

Cindy Cohn asked if cryptography would be included in the mainstream
sources, and at one point Sandy Harris said, <quote who="Sandy Harris">As I
recall, someone from kernel.org said a few months back that their lawyers
were looking at this. Anyone know the results?</quote> H. Peter Anvin
replied:

</p>

<quote who="H. Peter Anvin">

<p>

Indeed we do :) The current policy on kernel.org now is that cryptographic
software is OK as long as it's Open Source and the source is available on
kernel.org itself.

</p><p>

The "no government end user" restriction -- which we were originally very
concerned about -- turns out not to apply for Open Source software. Also,
the BXA seems to have recently expressed "intent to clarify" (<a
href="http://www.bxa.doc.gov/Encryption/July2KProposedRegSum.html">http://www.bxa.doc.gov/Encryption/July2KProposedRegSum.html</a>)
that the Open Source exception applies as well to "object code compiled from
source code that is considered publicly available".

</p>

</quote>

</section>

<section
  title="Ancient ext2 Race Uncovered"
  subject="BUG in ext2"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0008_01/msg00765.html"
  posts="6"
  startdate="05 Aug 2000 00:00:00 -0800"
  enddate="05 Aug 2000 00:00:00 -0800"
>
<topic>FS: ext2</topic>

<mention>Andrew Morton</mention>

<p>

Andrew Morton was getting repeatable assertion failures on 2.4.0-test6-pre2
in the ext2 block allocation code, and Andreas Dilger replied:

</p>

<quote who="Andreas Dilger">

<p>

Unfortunately, the whole ext2 block allocation code was re-written recently
by Al Viro for test6-pre1, and it looks like it has bugs (see also thread
&lt;test6-pre2 loop in ext2_get_block&gt;)... I understand that there may
have been some locking problems with the old code because of the VFS
re-design, but it seems like a bad move IMHO to change such an important
piece of code in a drastic way right now.

</p><p>

I think the majority of the change was a FEATURE to have zero-locking block
allocation and while this itself is a good thing, RIGHT NOW is not the time
to do it. My online resize patch, which is by far less intrusive since it is
only called at mount time and resize time and is mostly just moving existing
code into a subroutine, was rejected (rightfully so) because ext2 is too
important to break at this late date.

</p><p>

If it were up to me, I'd back out this patch and fix only the minimum
required areas.

</p>

</quote>

<p>

To the idea that the changes were to implement a feature and not to fix a
bug, Alexander Viro replied, <quote who="Alexander Viro">No, it was not. The
reason of the change was to close several bad (read: fs-corrupting) races in
ext2. Locking didn't change, BTW - it's still under BKL. List of the crap
that required that fixing will be posted as soon as fix will be in 2.2 - I'm
not too happy about posting "here's how user nobody can chew the fs, fsck
quotas, eat reserved blocks and panic your box" recipes. If you want it
right now - ask and I'll send it off-list. BTW, if you volunteer to help
with minix/sysv/UFS - be my guest, they require the same bunch of
fixes.</quote>

</p><p>

Andrew also replied to Andreas, saying that he suspected the problem was not
as bad as Andreas feared. He posted some data, and Alexander pegged it as a
slow block leak, adding, <quote who="Alexander Viro">that leak had been
there since long (I'm afraid that "long" may be something about '93). Oh,
well, one more race that needs fixing...</quote>

</p>

</section>

<section
  title="VM Hangs On For 2.5"
  subject="[PATCH] lock troubles in pre6-2"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0008_01/msg00789.html"
  posts="9"
  startdate="05 Aug 2000 00:00:00 -0800"
  enddate="06 Aug 2000 00:00:00 -0800"
>
<topic>Virtual Memory</topic>

<mention>David S. Miller</mention>
<mention>Rusty Russell</mention>

<p>

Paul Rusty Russell posted a patch, and David S. Miller pointed out a problem.
Rusty replied:

</p>

<quote who="Paul Rusty Russell">

<p>

Errr... Yeah, I guess I'll take your word for it, because I can't follow
that code at all 8(. I see that try_to_swap_out() does an unlock without a
lock anywhere in sight, but I can't see the path between this and
swap_out_mm().

</p><p>

Please: am I too stupid to understand this, or is the code a convoluted
mess?

</p><p>

Must be this warm English beer.

</p>

</quote>

<p>

Rik van Riel explained:

</p>

<quote who="Rik van Riel">

<p>

It's not the English beer that's bothering you (well, maybe it is, but it's
not the cause of this particular itch).

</p><p>

I'm ashamed to admit that the VM code still is a horrible mess, but code
readability will be a major goal in the new VM implementation.

</p>

</quote>

<p>

For more on the VM situation, see <kcref subject="2.4 / 2.5 VM plans"
startdate="25 Jun 2000 00:00:00 -0800"></kcref>.

</p>

</section>

<section
  title="Building XFS; Some Experiences With Other FSes"
  subject="how to actually build SGI's xfs?"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0008_01/msg00814.html"
  posts="9"
  startdate="05 Aug 2000 00:00:00 -0800"
  enddate="06 Aug 2000 00:00:00 -0800"
>
<topic>FS: JFS</topic>
<topic>FS: ReiserFS</topic>
<topic>FS: XFS</topic>

<mention>Keith Owens</mention>

<p>

Jeremy Hansen was anxious to try out XFS, but couldn't find any docs on how
to actually build the filesystem. There didn't seem to be any XFS-related
mailing lists on the SGI site, so he posted to linux-kernel. Keith Owens
pointed him to <a href="http://oss.sgi.com/projects/xfs/index.html">a
general info page</a> and <a
href="http://oss.sgi.com/projects/xfs/mail.html">a mailing list page</a>,
but there was no reply. James Lewis Nance was also interested in getting XFS
to work, and mentioned peripherally:

</p>

<quote who="James Lewis Nance">

<p>

I spent last week playing with different file
systems. Reiserfs and jfs patch and compile w/o too much trouble. Xfs and
NWFS seem to be missing good instructions on exactly how to patch them into
the kernel, so I did not play with them.

</p><p>

BTW, I was quite impressed with JFS.  It's definitely not ready for production
yet, but it seems slightly faster than reiserfs (for my single benchmark,
which is to build mozilla on the fs) and its only 2/3 of the size of
reiserfs.

</p>

</quote>

<p>

He gave a link to <a
href="http://oss.software.ibm.com/developerworks/opensource/jfs/index.html">IBM's
JFS page</a>, but Andi Kleen replied, <quote who="Andi Kleen">The Linux
implementation of JFS is not journaled yet, so it is very likely to be
faster than reiserfs for meta data intensive operations (like creating lots
of small files in a compile) because it doesn't do any journal IO.</quote>

</p>

</section>

</kc>
