<?xml version="1.0" ?>

<kc>

<title>Kernel Traffic</title>

<author contact="mailto:zbrown@tumblerings.org">Zack Brown</author>

<issue num="81" date="21 Aug 2000 00:00:00 -0800" />

<intro>

<p>

I'd like to thank Robert Szokovacs for finding a serious bug with the new
indexing feature, in which index links from printer-friendly issues all gave
404s. That's been fixed, and many thanks for the report!

</p><p>

Thanks also go to Robert Casties for feedback on how the index links should
display in the printer-friendly versions. They no longer expand to show the
full path, the way other links do in printer-friendly pages. Thanks for the
suggestion, Robert!

</p><p>

Joe Buehler reported a silly typo in one of the section titles last week
(I'd used the nonword "Consistancy"). Thanks, Joe! ;-)

</p>

</intro>

<stats posts="1891" size="8585" contrib="551" multiples="249" lastweek="184">

<person posts="64" size="274" who="Linus Torvalds " />
<person posts="62" size="183" who="Alan Cox " />
<person posts="53" size="178" who="Alexander Viro " />
<person posts="44" size="162" who="Mo McKinlay " />
<person posts="37" size="318" who="Arnaldo Carvalho de Melo " />
<person posts="28" size="96" who="Pavel Machek " />
<person posts="28" size="91" who="Michael Rothwell " />
<person posts="26" size="239" who="Rik van Riel " />
<person posts="26" size="104" who="James Sutherland " />
<person posts="26" size="89" who="Andre Hedrick " />
<person posts="26" size="83" who="Jamie Lokier " />
<person posts="26" size="83" who="Matthew Wilcox " />
<person posts="23" size="93" who=" (Kai Henningsen)" />
<person posts="22" size="92" who="&quot;Jeff V. Merkey&quot; " />
<person posts="22" size="83" who="&quot;Theodore Y. Ts'o&quot; " />
<person posts="22" size="76" who="Jens Axboe " />
<person posts="21" size="119" who="&quot;Mike A. Harris&quot; " />
<person posts="20" size="98" who="Tigran Aivazian " />
<person posts="20" size="71" who="&quot;Andi Kleen&quot; " />
<person posts="17" size="60" who="Jeff Garzik " />
<person posts="16" size="85" who="Andrew Morton " />
<person posts="16" size="65" who="David Ford " />
<person posts="15" size="74" who=" (Rogier Wolff)" />
<person posts="15" size="73" who="&quot;Theodore Ts'o&quot; " />
<person posts="15" size="66" who=" (Linus Torvalds)" />
<person posts="15" size="54" who="Philipp Rumpf " />
<person posts="15" size="48" who="Jes Sorensen " />
<person posts="14" size="65" who="Russell King " />
<person posts="14" size="47" who="Mike Galbraith " />
<person posts="13" size="43" who="David Woodhouse " />
<person posts="13" size="39" who="&quot;David S. Miller&quot; " />
<person posts="13" size="38" who="Igmar Palsenberg " />
<person posts="12" size="67" who="Michael Rothwell " />
<person posts="12" size="36" who="Keith Owens " />
<person posts="11" size="86" who="Urban Widmark " />
<person posts="11" size="55" who="Michael W Zappe " />
<person posts="11" size="43" who="&quot;H. Peter Anvin&quot; " />
<person posts="10" size="43" who="Michael Rothwell " />
<person posts="9" size="40" who="Andrea Arcangeli " />
<person posts="9" size="32" who="James Simmons " />
<person posts="9" size="29" who="Chris Wedgwood " />
<person posts="9" size="29" who="Marcelo Tosatti " />
<person posts="9" size="27" who="David Woodhouse " />
<person posts="8" size="106" who="Dawson Engler " />
<person posts="8" size="44" who="Horst von Brand " />
<person posts="8" size="42" who="Byron Stanoszek " />
<person posts="8" size="38" who="Bill Wendling " />
<person posts="8" size="31" who="Abramo Bagnara " />
<person posts="8" size="26" who="&quot;Maciej W. Rozycki&quot; " />
<person posts="8" size="25" who="" />
<person posts="7" size="26" who="Roger Gammans " />
<person posts="7" size="26" who="Benno Senoner " />
<person posts="7" size="25" who="Ingo Molnar " />
<person posts="7" size="25" who="Mikael Pettersson " />
<person posts="7" size="24" who="Marc Lehmann " />
<person posts="7" size="22" who="Matthias Andree " />
<person posts="7" size="22" who="&quot;Garst R. Reese&quot; " />
<person posts="7" size="20" who="Andries Brouwer " />
<person posts="7" size="17" who="Dan Hollis " />
<person posts="6" size="42" who="Vojtech Pavlik " />
<person posts="6" size="38" who="Christoph Hellwig " />
<person posts="6" size="35" who="Jesse Pollard " />
<person posts="6" size="27" who="Rob Landley " />
<person posts="6" size="25" who="Nathan Straz " />
<person posts="6" size="24" who="ADAM Sulmicki " />
<person posts="6" size="24" who="Bjorn Wesen " />
<person posts="6" size="21" who="Stephen Rothwell " />
<person posts="6" size="19" who="Dave Cecil " />
<person posts="6" size="18" who="Bernd Kischnick " />
<person posts="6" size="18" who="Alex Buell " />
<person posts="6" size="17" who="Michael Elizabeth Chastain " />
<person posts="6" size="16" who=" (Miquel van Smoorenburg)" />
<person posts="5" size="40" who="Rusty Russell " />
<person posts="5" size="35" who="Corin Hartland-Swann " />
<person posts="5" size="29" who="Xuan Baldauf " />
<person posts="5" size="27" who="Jakub Jelinek " />
<person posts="5" size="23" who="&quot;Adam J. Richter&quot; " />
<person posts="5" size="21" who="Ingo Oeser " />
<person posts="5" size="21" who="octave klaba " />
<person posts="5" size="21" who="Petr Vandrovec " />
<person posts="5" size="20" who="Frank van Maarseveen " />
<person posts="5" size="18" who="Chris Mason " />
<person posts="5" size="17" who="Brian Gerst " />
<person posts="5" size="16" who="Michael Peddemors " />
<person posts="5" size="15" who="Matthew Jacob " />
<person posts="5" size="14" who="Oliver Xymoron " />
<person posts="5" size="13" who="Gregory Maxwell " />
<person posts="5" size="13" who="" />
<person posts="5" size="13" who="Anton Blanchard " />
<person posts="4" size="52" who="Elmer Joandi " />
<person posts="4" size="46" who="John Silva " />
<person posts="4" size="24" who="&quot;Chris McClellen&quot; " />
<person posts="4" size="20" who="Cesar Eduardo Barros " />
<person posts="4" size="19" who="Matthew Dillon " />
<person posts="4" size="19" who="John Franklin " />
<person posts="4" size="18" who="David Gould " />
<person posts="4" size="17" who="Horst von Brand " />
<person posts="4" size="16" who="Michael Meding " />
<person posts="4" size="16" who="Donald Becker " />
<person posts="4" size="16" who="Hans Reiser " />
<person posts="4" size="15" who="&quot;Mr. James W. Laferriere&quot; " />
<person posts="4" size="15" who="&quot;Andrew Stubbs&quot; " />
<person posts="4" size="15" who="Henrik Størner " />
<person posts="4" size="14" who="Oscar Roozen " />
<person posts="4" size="14" who="Trond Myklebust " />
<person posts="4" size="14" who="D Milburn " />
<person posts="4" size="13" who="" />
<person posts="4" size="13" who=" (Bob_Tracy)" />
<person posts="4" size="12" who="Thomas Molina " />
<person posts="4" size="11" who="Jeff Dike " />
<person posts="4" size="10" who="Pierre Rousselet " />
<person posts="4" size="10" who="Ricky Beam " />
<person posts="4" size="10" who="Justin " />
<person posts="3" size="105" who="Christoph Hellwig " />
<person posts="3" size="86" who="Philippe Troin " />
<person posts="3" size="30" who="&quot;J. Robert von Behren&quot; " />
<person posts="3" size="21" who="" />
<person posts="3" size="20" who="Neil Brown " />
<person posts="3" size="20" who="&quot;List User&quot; " />
<person posts="3" size="14" who="Dieter Nützel " />
<person posts="3" size="14" who="George Anzinger " />
<person posts="3" size="13" who="&quot;Ian S. Nelson&quot; " />
<person posts="3" size="13" who="Don Faulkner " />
<person posts="3" size="13" who="Michael Rothwell " />
<person posts="3" size="13" who="Larry McVoy " />
<person posts="3" size="13" who="Kanoj Sarcar " />
<person posts="3" size="12" who="&quot;Phillips, Mike&quot; " />
<person posts="3" size="12" who="Jan Dvorak " />
<person posts="3" size="12" who="Christer Weinigel " />
<person posts="3" size="11" who="David Hinds " />
<person posts="3" size="11" who="" />
<person posts="3" size="11" who="Adam Sampson " />
<person posts="3" size="11" who="Chris Meadors " />
<person posts="3" size="10" who="Charles Samuels " />
<person posts="3" size="10" who="Roman Zippel " />
<person posts="3" size="10" who="Guest section DW " />
<person posts="3" size="9" who="Yong-iL Joh " />
<person posts="3" size="9" who="&quot;H. Peter Anvin&quot; " />
<person posts="3" size="9" who="Gerhard Mack " />
<person posts="3" size="9" who="Christopher Vickery " />
<person posts="3" size="9" who="Bill Huey " />
<person posts="3" size="9" who="Chris Kloiber " />
<person posts="3" size="9" who="David Lombard " />
<person posts="3" size="9" who=" (Eugene Crosser)" />
<person posts="3" size="9" who="Wade Hampton " />
<person posts="3" size="9" who="Tim Waugh " />
<person posts="3" size="9" who="&quot;Stuart MacDonald&quot; " />
<person posts="3" size="8" who="Adam " />
<person posts="3" size="8" who="Martin Costabel " />
<person posts="3" size="8" who="&quot;Richard B. Johnson&quot; " />
<person posts="3" size="7" who="Kent Hunt " />
<person posts="3" size="7" who="Mark Lehrer " />
<person posts="3" size="7" who="Meino Christian Cramer " />
<person posts="2" size="96" who="Wallace Huang " />
<person posts="2" size="42" who="Rok Papez " />
<person posts="2" size="40" who="ludovic fernandez " />
<person posts="2" size="19" who="Martin Schenk " />
<person posts="2" size="17" who="Josh Huber " />
<person posts="2" size="16" who="Gregory Leblanc " />
<person posts="2" size="15" who="Miles Lane " />
<person posts="2" size="15" who="James Stevenson " />
<person posts="2" size="14" who="Adam McKenna " />
<person posts="2" size="13" who="Shane Shrybman " />
<person posts="2" size="12" who="Matthew Darwin " />
<person posts="2" size="11" who="Christian Bricart " />
<person posts="2" size="10" who="Daniel Phillips " />
<person posts="2" size="10" who="Bob Taylor " />
<person posts="2" size="9" who="" />
<person posts="2" size="9" who="&quot;Strahm, Bill&quot; " />
<person posts="2" size="9" who="Keir Fraser " />
<person posts="2" size="9" who="Zoran Davidovac " />
<person posts="2" size="9" who="Rares Marian " />
<person posts="2" size="9" who="&quot;Albert D. Cahalan&quot; " />
<person posts="2" size="8" who="Ulrich Drepper " />
<person posts="2" size="8" who=" (Aaron Denney)" />
<person posts="2" size="8" who="khromy " />
<person posts="2" size="8" who="Rasmus Andersen " />
<person posts="2" size="8" who="kevin " />
<person posts="2" size="8" who="Gérard Roudier " />
<person posts="2" size="8" who="Jesse Pollard " />
<person posts="2" size="8" who="Matt Spong " />
<person posts="2" size="8" who="Andrew Pimlott " />
<person posts="2" size="8" who="Andreas Dilger " />
<person posts="2" size="8" who="Carsten Lang " />
<person posts="2" size="8" who="Bernd Kischnick " />
<person posts="2" size="8" who="Doug Ledford " />
<person posts="2" size="8" who="Paul Vojta " />
<person posts="2" size="8" who="" />
<person posts="2" size="8" who="Kenneth J Baker " />
<person posts="2" size="7" who="Michael Westermann " />
<person posts="2" size="7" who="Frank da Cruz " />
<person posts="2" size="7" who="Christoph Egger " />
<person posts="2" size="7" who="Alex Belits " />
<person posts="2" size="7" who="&quot;Rob Taylor&quot; " />
<person posts="2" size="7" who="James Gosnell " />
<person posts="2" size="7" who="Richard Torkar " />
<person posts="2" size="7" who="Simon Richter " />
<person posts="2" size="7" who="Alexander Viro " />
<person posts="2" size="7" who="&quot;Petr Vandrovec&quot; " />
<person posts="2" size="7" who="Greg KH " />
<person posts="2" size="7" who="&quot;Nikolaiev, Mike&quot; " />
<person posts="2" size="7" who="Bill Maidment " />
<person posts="2" size="7" who="Frank Mehnert " />
<person posts="2" size="7" who="&quot;A. Hook&quot; " />
<person posts="2" size="6" who="Miquel van Smoorenburg " />
<person posts="2" size="6" who="&quot;Leeuw van der, Tim&quot; " />
<person posts="2" size="6" who="Adrian Baugh " />
<person posts="2" size="6" who="Johannes Erdfelt " />
<person posts="2" size="6" who="Cedric Ware " />
<person posts="2" size="6" who="Greg KH " />
<person posts="2" size="6" who="Jens Taprogge " />
<person posts="2" size="6" who="Paul Gearon " />
<person posts="2" size="6" who="Zack Brown " />
<person posts="2" size="6" who="Bill Maidment " />
<person posts="2" size="6" who="Norbert Tretkowski " />
<person posts="2" size="6" who="Dan Aloni " />
<person posts="2" size="6" who="Christoph Rohland " />
<person posts="2" size="6" who=" (Erik Mouw)" />
<person posts="2" size="6" who="Matthew Hawkins " />
<person posts="2" size="6" who="Alessandro Suardi " />
<person posts="2" size="6" who="Young-Ho Cha " />
<person posts="2" size="6" who="Jeff Hoffman " />
<person posts="2" size="6" who="Frank Davis " />
<person posts="2" size="6" who="Anton Petrusevich " />
<person posts="2" size="6" who="Lincoln Dale " />
<person posts="2" size="6" who="Steve Dodd " />
<person posts="2" size="6" who="dean gaudet " />
<person posts="2" size="6" who="Jason Venner " />
<person posts="2" size="6" who="" />
<person posts="2" size="6" who="&quot;Robert H. de Vries&quot; " />
<person posts="2" size="6" who="Arjan van de Ven " />
<person posts="2" size="6" who="Sven Koch " />
<person posts="2" size="6" who=" (Arjan van de Ven)" />
<person posts="2" size="5" who="" />
<person posts="2" size="5" who="Pau Aliagas " />
<person posts="2" size="5" who=" (Henrik Størner)" />
<person posts="2" size="5" who="&quot;John Hayward-Warburton (Programming account)&quot; " />
<person posts="2" size="5" who="André Dahlqvist " />
<person posts="2" size="5" who="Wakko Warner " />
<person posts="2" size="5" who="Willis Sarka III " />
<person posts="2" size="5" who="&quot;William Scott Lockwood III&quot; " />
<person posts="2" size="5" who=" (Arjan van de Ven)" />
<person posts="2" size="5" who="Agust Karlsson " />
<person posts="2" size="5" who="Tom Leete " />
<person posts="2" size="5" who="Ralf Baechle " />
<person posts="2" size="5" who="&quot;Niall Gormley&quot; " />
<person posts="2" size="4" who="clubneon " />
<person posts="2" size="4" who="Carrer Yuri " />
<person posts="2" size="4" who="" />
<person posts="1" size="97" who="Antonello Biancalana " />
<person posts="1" size="56" who="Sandy Harris " />
<person posts="1" size="41" who="Dimitris Michailidis " />
<person posts="1" size="39" who="dr john halewood " />
<person posts="1" size="34" who="Seth Andrew Hallem " />
<person posts="1" size="26" who="&quot;Theodore Ts'o&quot; " />
<person posts="1" size="22" who="Michael Zappe " />
<person posts="1" size="20" who="&quot;Tim N . van der Leeuw&quot; " />
<person posts="1" size="20" who="David Gibson " />
<person posts="1" size="18" who="Nils Faerber " />
<person posts="1" size="17" who="Mircea Damian " />
<person posts="1" size="16" who="Lawrence Walton " />
<person posts="1" size="16" who="" />
<person posts="1" size="15" who="&quot;Sheldon Easterbrook&quot; " />
<person posts="1" size="13" who="Michael W Zappe " />
<person posts="1" size="12" who="" />
<person posts="1" size="12" who="Serguei Miridonov " />
<person posts="1" size="12" who="bug1 " />
<person posts="1" size="12" who="Joseph Elwell " />
<person posts="1" size="12" who="Fred Feirtag " />
<person posts="1" size="11" who="&quot;Madan A S&quot; " />
<person posts="1" size="11" who="Matthew Kirkwood " />
<person posts="1" size="10" who="Paul Gortmaker " />
<person posts="1" size="10" who="Gary Lawrence Murphy " />
<person posts="1" size="9" who="&quot;Justin C. Ferguson&quot; " />
<person posts="1" size="7" who="Anders Fugmann " />
<person posts="1" size="7" who="Diogo Zulli " />
<person posts="1" size="7" who=" (Rogier Wolff)" />
<person posts="1" size="7" who="Aaron Laffin " />
<person posts="1" size="7" who="Benjamin Redelings I " />
<person posts="1" size="7" who="=?big5?B?2HDYcA==?= " />
<person posts="1" size="7" who="&quot;Joe Pranevich&quot; " />
<person posts="1" size="7" who="&quot;Steve Cooper&quot; " />
<person posts="1" size="6" who="Andrew Sharp " />
<person posts="1" size="6" who="" />
<person posts="1" size="6" who="&quot;Skipper&quot; " />
<person posts="1" size="6" who="Kurt Garloff " />
<person posts="1" size="6" who=" (Jaroslaw Miszkinis)" />
<person posts="1" size="6" who="&quot;Eric S. Raymond&quot; " />
<person posts="1" size="6" who="KMF AV " />
<person posts="1" size="5" who="Anton Ghiugan " />
<person posts="1" size="5" who="Martin Tessun " />
<person posts="1" size="5" who="Steve Lord " />
<person posts="1" size="5" who="" />
<person posts="1" size="5" who=" (Steven S. Dick)" />
<person posts="1" size="5" who="aris " />
<person posts="1" size="5" who="Giuliano Pochini " />
<person posts="1" size="5" who="Programas completos para su PC " />
<person posts="1" size="5" who="Marc SCHAEFER " />
<person posts="1" size="5" who="John Kennedy " />
<person posts="1" size="5" who="Steffen Seeger " />
<person posts="1" size="5" who="David Luyer " />
<person posts="1" size="5" who="Bryan Paxton " />
<person posts="1" size="5" who="Jan Kara " />
<person posts="1" size="5" who="Udo Held " />
<person posts="1" size="4" who="Jan-Benedict Glaw " />
<person posts="1" size="4" who="Thomas Zehetbauer " />
<person posts="1" size="4" who="" />
<person posts="1" size="4" who="Robert Norris " />
<person posts="1" size="4" who="Adrian Baugh " />
<person posts="1" size="4" who="Ruth Ivimey-Cook " />
<person posts="1" size="4" who="Mark McClelland " />
<person posts="1" size="4" who="Tim Timmerman " />
<person posts="1" size="4" who="Nathan Hand " />
<person posts="1" size="4" who="Timothy Knox " />
<person posts="1" size="4" who="&quot;Carlo E. Prelz&quot; " />
<person posts="1" size="4" who="Scott Henry " />
<person posts="1" size="4" who="&quot;Kevin Winchester&quot; " />
<person posts="1" size="4" who="Richard Guy Briggs " />
<person posts="1" size="4" who="Rob Newberry " />
<person posts="1" size="4" who="&quot;Darrell Wright&quot; " />
<person posts="1" size="4" who="Chuck Lever " />
<person posts="1" size="4" who="&quot;Timothy A. DeWees&quot; " />
<person posts="1" size="4" who="Malcolm Beattie " />
<person posts="1" size="4" who="David Lawyer " />
<person posts="1" size="4" who=" (Jim Gettys)" />
<person posts="1" size="4" who=" (Jonathan Corbet)" />
<person posts="1" size="4" who="Jeff Mcadams " />
<person posts="1" size="4" who="Stefan Traby " />
<person posts="1" size="4" who="Roger Gammans " />
<person posts="1" size="4" who="Harald Welte " />
<person posts="1" size="3" who="Rogerio Brito " />
<person posts="1" size="3" who="" />
<person posts="1" size="3" who="&quot;Pekka Riikonen [Adm]&quot; " />
<person posts="1" size="3" who="Amit D Chaudhary " />
<person posts="1" size="3" who="David Dyck " />
<person posts="1" size="3" who="TimO " />
<person posts="1" size="3" who="&quot;Anthony Barbachan&quot; " />
<person posts="1" size="3" who="Michael Meissner " />
<person posts="1" size="3" who="Bernhard Pelz " />
<person posts="1" size="3" who="Mark Cooke " />
<person posts="1" size="3" who="William Stearns " />
<person posts="1" size="3" who="Matthew Darwin " />
<person posts="1" size="3" who="Jan Kara " />
<person posts="1" size="3" who="Dirk Hohndel " />
<person posts="1" size="3" who="&quot;Grover, Andrew&quot; " />
<person posts="1" size="3" who="Kunihiko IMAI " />
<person posts="1" size="3" who="Gabor Lenart " />
<person posts="1" size="3" who="&quot;Raj, Ashok&quot; " />
<person posts="1" size="3" who="Michael Poole " />
<person posts="1" size="3" who="" />
<person posts="1" size="3" who="Sam Watters " />
<person posts="1" size="3" who="Athanasius " />
<person posts="1" size="3" who="Jonathan Corbet " />
<person posts="1" size="3" who="David Mansfield " />
<person posts="1" size="3" who="Andrey Savochkin " />
<person posts="1" size="3" who="&quot;Dunlap, Randy&quot; " />
<person posts="1" size="3" who="Drew Sanford " />
<person posts="1" size="3" who="" />
<person posts="1" size="3" who="&quot;B. Evans&quot; " />
<person posts="1" size="3" who="Zdenek Kabelac " />
<person posts="1" size="3" who="&quot;Pat O'Rourke&quot; " />
<person posts="1" size="3" who="Martin MaD Douda " />
<person posts="1" size="3" who="&quot;Michael T. Babcock&quot; " />
<person posts="1" size="3" who="Andrew Pochinsky " />
<person posts="1" size="3" who="Andreas Bombe " />
<person posts="1" size="3" who="" />
<person posts="1" size="3" who="Sasa Ostrouska " />
<person posts="1" size="3" who="CaT " />
<person posts="1" size="3" who="Oliver Neukum " />
<person posts="1" size="3" who="&quot;Stephen R. Gore&quot; " />
<person posts="1" size="3" who="Toon van der Pas " />
<person posts="1" size="3" who="Uman " />
<person posts="1" size="3" who="Erik Andersen " />
<person posts="1" size="3" who="Riley Williams " />
<person posts="1" size="3" who="&quot;Jonathan Day&quot; " />
<person posts="1" size="3" who="Tobias Ringström " />
<person posts="1" size="3" who="Martin Dalecki " />
<person posts="1" size="3" who="" />
<person posts="1" size="3" who="Nigel Gamble " />
<person posts="1" size="3" who="Michal Jaegermann " />
<person posts="1" size="3" who="Mark Orr " />
<person posts="1" size="3" who="Takashi Oe " />
<person posts="1" size="3" who="Andreas Haumer " />
<person posts="1" size="3" who="Bert de Bruijn " />
<person posts="1" size="3" who="Jerry Frana " />
<person posts="1" size="3" who="Wayne Pascoe " />
<person posts="1" size="3" who="Jan Behrend " />
<person posts="1" size="3" who="Allan Duncan " />
<person posts="1" size="3" who="&quot;Mark H. Wood&quot; " />
<person posts="1" size="3" who="Joel Jaeggli " />
<person posts="1" size="3" who="&quot;Michael Rothwell&quot; " />
<person posts="1" size="3" who=" (Graham Stoney)" />
<person posts="1" size="3" who="&quot;Amit S. Kale&quot; " />
<person posts="1" size="3" who="Bernd Eckenfels " />
<person posts="1" size="3" who="Daniel Phillips " />
<person posts="1" size="3" who="&quot;Michael Zappe&quot; " />
<person posts="1" size="3" who="Håvard Garnes " />
<person posts="1" size="3" who="john halewood " />
<person posts="1" size="3" who="" />
<person posts="1" size="3" who="herman dumont " />
<person posts="1" size="3" who="john slee " />
<person posts="1" size="3" who="&quot;Grzegorz A. Wieczorek&quot; " />
<person posts="1" size="3" who="Sébastien Côté " />
<person posts="1" size="3" who="Robert Broughton " />
<person posts="1" size="3" who="Eric Buddington " />
<person posts="1" size="3" who="&quot;Khimenko Victor&quot; " />
<person posts="1" size="3" who="Marco d'Itri " />
<person posts="1" size="3" who="&quot;Vadim Lebedev&quot; " />
<person posts="1" size="3" who="Jorge Nerin " />
<person posts="1" size="3" who="root " />
<person posts="1" size="3" who="Mark Kettenis " />
<person posts="1" size="3" who="&quot;Andrew Smart&quot; " />
<person posts="1" size="3" who="Christian Ehrhardt " />
<person posts="1" size="3" who="&quot;James H. Cloos Jr.&quot; " />
<person posts="1" size="3" who="Andreas Schwab " />
<person posts="1" size="3" who="Ben OShea " />
<person posts="1" size="3" who="&quot;Frank Jacobberger&quot; " />
<person posts="1" size="3" who="&quot;Kasatenko Ivan Alex.&quot; " />
<person posts="1" size="3" who="Peter Enderborg " />
<person posts="1" size="3" who="&quot;N. D. Culver&quot; " />
<person posts="1" size="3" who="david " />
<person posts="1" size="3" who="Jeff Epler " />
<person posts="1" size="3" who="stulle " />
<person posts="1" size="3" who="&quot;David Schwartz&quot; " />
<person posts="1" size="3" who="&quot;Christopher E. Brown&quot; " />
<person posts="1" size="3" who="Daniel Stone " />
<person posts="1" size="3" who="Roman Zippel " />
<person posts="1" size="3" who="=?ISO-2022-JP?B?GyRCRmBERUh+GyhC?= " />
<person posts="1" size="3" who="Bob Gustafson " />
<person posts="1" size="3" who="&quot;Bernd Jendrissek&quot; " />
<person posts="1" size="3" who="Andrew Ryan " />
<person posts="1" size="3" who="Timothy Roscoe " />
<person posts="1" size="3" who="Aaron Macks " />
<person posts="1" size="3" who=" (Mike Civil)" />
<person posts="1" size="3" who="Martin Mares " />
<person posts="1" size="3" who="Jeffry McNeil " />
<person posts="1" size="3" who="Dionysius Wilson Almeida " />
<person posts="1" size="3" who="Christian Bricart " />
<person posts="1" size="3" who="Ben Collins " />
<person posts="1" size="3" who="Camm Maguire " />
<person posts="1" size="3" who="Prasanna Subash " />
<person posts="1" size="3" who="Derek Fawcus " />
<person posts="1" size="3" who="Meelis Roos " />
<person posts="1" size="3" who="Andreas Gruenbacher " />
<person posts="1" size="3" who="Jean-Eric Cuendet " />
<person posts="1" size="3" who="Sasa Ostrouska " />
<person posts="1" size="3" who="Heinz Diehl " />
<person posts="1" size="3" who="Adrian Cox " />
<person posts="1" size="3" who="HA Quoc-Viet " />
<person posts="1" size="3" who="BenHanokh Gabriel " />
<person posts="1" size="3" who="Peter Chubb " />
<person posts="1" size="3" who="German Jose Gomez Garcia " />
<person posts="1" size="3" who="Nathan Simons " />
<person posts="1" size="3" who="&quot;J. S. Connell&quot; " />
<person posts="1" size="2" who="Davidovac Zoran " />
<person posts="1" size="2" who="" />
<person posts="1" size="2" who="Ville Voutilainen " />
<person posts="1" size="2" who="&quot;Lee Mitchell&quot; " />
<person posts="1" size="2" who="MK Elektronik " />
<person posts="1" size="2" who="&quot;Dr. Kelsey Hudson&quot; " />
<person posts="1" size="2" who="&quot;Bonds, Deanna&quot; " />
<person posts="1" size="2" who=" (Robert Broughton)" />
<person posts="1" size="2" who="Dave Porta " />
<person posts="1" size="2" who="Derek Martin " />
<person posts="1" size="2" who="Andrew Clausen " />
<person posts="1" size="2" who="Christoph Hellwig " />
<person posts="1" size="2" who="Sean Harding " />
<person posts="1" size="2" who="f5ibh " />
<person posts="1" size="2" who="Russell Coker " />
<person posts="1" size="2" who="Ari Pollak " />
<person posts="1" size="2" who="Kay Salzwedel " />
<person posts="1" size="2" who="Borislav Deianov " />
<person posts="1" size="2" who="&quot;Benjamin C.R. LaHaise&quot; " />
<person posts="1" size="2" who=" (Robert Collier)" />
<person posts="1" size="2" who="&quot;Eric tse&quot; " />
<person posts="1" size="2" who=" (Kees Bakker)" />
<person posts="1" size="2" who="Derek J Witt " />
<person posts="1" size="2" who="Niels Kristian Bech Jensen " />
<person posts="1" size="2" who="&quot;Chris Ross&quot; " />
<person posts="1" size="2" who="&quot;Darrell Wright&quot; " />
<person posts="1" size="2" who="Al Borchers " />
<person posts="1" size="2" who="&quot;jack jack&quot; " />
<person posts="1" size="2" who="MPInet User " />
<person posts="1" size="2" who="Bengt Gördén " />
<person posts="1" size="2" who="Matt " />
<person posts="1" size="2" who="Phil " />
<person posts="1" size="2" who="Philip Blundell " />
<person posts="1" size="2" who="Chris Quinn " />
<person posts="1" size="2" who=" (Hans-Joachim Baader)" />
<person posts="1" size="2" who="&quot;Vernon H. Soden&quot; " />
<person posts="1" size="2" who="David Caswell " />
<person posts="1" size="2" who="&quot;Johannes Richter&quot; " />
<person posts="1" size="2" who="Andrea Glorioso " />
<person posts="1" size="2" who="" />
<person posts="1" size="2" who="Adrian Bunk " />
<person posts="1" size="2" who="Mahesh Mahadevan " />
<person posts="1" size="2" who="&quot;Yaroslav S. Polyakov&quot; " />
<person posts="1" size="2" who="&quot;Kaj-Michael Lang&quot; " />
<person posts="1" size="2" who="Dax Kelson " />
<person posts="1" size="2" who="Philip Armstrong " />
<person posts="1" size="2" who="&quot;lhyang&quot; " />
<person posts="1" size="2" who="" />
<person posts="1" size="2" who="Marc MERLIN " />
<person posts="1" size="2" who="Anton Petrusevich " />
<person posts="1" size="2" who="Anandavel P " />
<person posts="1" size="2" who="&quot;Ryan M. Hager&quot; " />
<person posts="1" size="2" who="Chris Loveland " />
<person posts="1" size="2" who="&quot;Adam Watson&quot; " />
<person posts="1" size="2" who="&quot;Alexander V. Valys&quot; " />
<person posts="1" size="2" who="Andreas Tobler " />
<person posts="1" size="2" who="" />
<person posts="1" size="2" who="&quot;Eric S. Raymond&quot; " />
<person posts="1" size="2" who="&quot;Raymond Miller&quot; " />
<person posts="1" size="2" who="&quot;Jeong Hwan Park&quot; " />
<person posts="1" size="2" who="Stephen Frost " />
<person posts="1" size="2" who="Craig Ruff " />
<person posts="1" size="2" who="michael " />
<person posts="1" size="2" who="Jeff Garzik " />
<person posts="1" size="2" who="Olaf Titz " />
<person posts="1" size="2" who="Alan Cox " />
<person posts="1" size="2" who="&quot;Aamir Shaikh&quot; " />
<person posts="1" size="2" who="Syd Alsobrook " />
<person posts="1" size="2" who="Dan Mueth " />
<person posts="1" size="2" who="&quot;magiwa.com&quot; " />
<person posts="1" size="2" who="Petko Manolov " />
<person posts="1" size="2" who="Mitchell Blank Jr " />
<person posts="1" size="2" who="Mike Frisch " />
<person posts="1" size="2" who="Malte Thoma " />
<person posts="1" size="2" who="&quot;wu_yb&quot; " />
<person posts="1" size="2" who="&quot;John B. Jacobsen&quot; " />
<person posts="1" size="2" who="jovish jose " />
<person posts="1" size="2" who="Stephen Torri " />
<person posts="1" size="2" who="Aleksandr Koltsoff " />
<person posts="1" size="2" who="" />
<person posts="1" size="2" who="Reto Baettig " />
<person posts="1" size="2" who="Geoffrey Gallaway " />
<person posts="1" size="2" who="" />
<person posts="1" size="2" who="Matt Kemner " />
<person posts="1" size="2" who="Sid Boyce " />
<person posts="1" size="2" who="Lars Callenbach " />
<person posts="1" size="2" who="Alejandro Conty " />
<person posts="1" size="2" who="" />
<person posts="1" size="2" who="ebi4 " />
<person posts="1" size="2" who="Velizar Bodoursky " />
<person posts="1" size="2" who="Alessandro Rubini " />
<person posts="1" size="2" who="Mark Hahn " />
<person posts="1" size="2" who=" (Chris Good)" />
<person posts="1" size="2" who="Aaron Tiensivu " />
<person posts="1" size="2" who="&quot;Csaba Nemeth&quot; " />
<person posts="1" size="2" who="Harley Privitera " />

</stats>

<section
  title="Ramdisks, Compression, Embedded Systems, Loopback, And The VM Situation"
  subject="Do ramdisk exec's map direct to buffer cache?"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0007_02/msg00502.html"
  posts="81"
  startdate="12 Jul 2000 00:00:00 -0800"
  enddate="11 Aug 2000 00:00:00 -0800"
>
<topic>BSD: FreeBSD</topic>
<topic>Compression</topic>
<topic>FS: ReiserFS</topic>
<topic>FS: XFS</topic>
<topic>FS: ext2</topic>
<topic>FS: ext3</topic>
<topic>FS: ramfs</topic>
<topic>Small Systems</topic>
<topic>Virtual Memory</topic>

<mention>Andrea Arcangeli</mention>
<mention>David Woodhouse</mention>
<mention>Pavel Machek</mention>
<mention>Jeff Garzik</mention>
<mention>Stephen Tweedie</mention>
<mention>Mike Galbraith</mention>
<mention>Rusty Russell</mention>

<p>A serpentine discussion.</p>

<p align="center">

<b>RAM-Based Filesystems</b>

</p>

<p>Graham Stoney started it off with a question on how to minimize RAM
requirements on embedded systems. He asked:</p>

<quote who="Graham Stoney">

<p>I know the Linux ramdisk uses the buffer cache, but when the kernel exec's a
file from the ramdisk, is it smart enough to map the virtual address space
for .text and .data directly into the buffer cache without copying?</p>

<p>Can it do a similar job when "loading" a shared library?  And if so, what
impact do shared library fixups have on the memory space used by the code of
a dynamically linked executable? Are these likely to cause a significant
number of pages to be copied-on-write?</p>

</quote>
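
<p>The copy-on-write behavior Graham asks about can be seen from user space.
The following is only a rough sketch written for this summary, not code from
the thread; the helper name private_write_is_cow() is hypothetical. It assumes
a POSIX system and relies on the documented behavior of a MAP_PRIVATE file
mapping: the mapping can share the file's cached page until the process writes
to it, at which point the process gets its own copy and the file itself is left
unchanged. Fixups applied to a privately mapped library page would dirty it in
much the same way.</p>

<pre><![CDATA[
```c
/* Illustrative sketch only: private_write_is_cow() is a hypothetical helper. */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Create a small file at "path", map it MAP_PRIVATE, modify the mapping,
 * then re-read the file through the descriptor.
 * Returns 1 if the mapping shows the change while the file does not
 * (i.e. the write went to a private copy), 0 otherwise, -1 on error.
 * The file is removed again before returning.
 */
int private_write_is_cow(const char *path)
{
    static const char text[] = "original";
    char readback[sizeof text];
    char *map;
    int result = -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, text, sizeof text) != (ssize_t)sizeof text)
        goto out;

    /* Until it is written to, a private mapping can share the cached page. */
    map = mmap(NULL, sizeof text, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    /* The first write faults, and this process is given its own copy. */
    map[0] = 'O';

    /* Re-read the file itself; it should still hold the original bytes. */
    if (lseek(fd, 0, SEEK_SET) == 0 &&
        read(fd, readback, sizeof readback) == (ssize_t)sizeof readback)
        result = (map[0] == 'O' && memcmp(readback, text, sizeof text) == 0);

    munmap(map, sizeof text);
out:
    close(fd);
    unlink(path);
    return result;
}
```
]]></pre>

<p>Calling private_write_is_cow() with the path of a scratch file on any
writable filesystem should return 1 when the write was made to a private copy.
How many pages a real dynamically linked executable dirties this way depends on
how the library was built, which this sketch does not try to measure.</p>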

<p>David Woodhouse was dubious about using ramdisks for embedded systems. He
felt embedded systems programmers resorted to ramdisks by default, because
there was no support yet for flash chips or flash filesystems. But he felt a
ramdisk actually wasted space when the bootloader preloaded it from a place
the kernel could already access. He recommended JFFS on flash chips, with
ramfs for /tmp and possibly /var. To answer Graham's question, he added that
he was pretty sure ramfs did share pages directly with the buffer cache.
Linus Torvalds replied:</p>

<quote who="Linus Torvalds">

<p>ramfs does indeed share pages, and does more than that: it shares the
directory structure with the directory cache directly, so there is not any
wasted memory even in meta-data. That was one of the design goals (along
with extreme simplicity - it's the smallest and fastest filesystem around).</p>

<p>That said, ramfs isn't perfect.  It's not a "tmpfs" in that it cannot page
anything out to disk (a non-issue in embedded devices, but it would make
ramfs more useful in general use), and it's new code that isn't used by all
that many people - so it could have (and has had) problems. As 99% of the
functionality of ramfs is actually just using VFS code directly the basics
of ramfs are very solid, but the devil is in the details..</p>

<p>If you want to pre-populate ramfs (the way initrd does the old-style
ramdisk), I would suggest using a compressed tar-file approach. In the long
run I definitely want to get rid of the old-style ramdisk, as it has some
serious problems both from a design standpoint and from a maintenance angle
(the mm games it plays are much worse than the simple page-pinning stuff
that ramfs can do - ramfs plays at a much higher level and has access to
better abstractions through that).</p>

<p>The only problem with JFFS is that it doesn't do compression, so there's a
lot to be said for using cramfs if you have space constraints (and in
embedded devices, if you don't have space constraints you're doing something
wrong ;). So mixture of ramfs, cramfs and JFFS can be a good thing.</p>

</quote>

<p align="center">

<b>Compressed Filesystems</b>

</p>

<p>Bjorn Wesen (one of the JFFS developers) replied that compression was
already on their To Do list and would be fairly easy to do, in spite of
needing a few special hacks.</p>

<p>Pavel Machek was happy to hear about cramfs, and asked if there were a
read/write compressed filesystem driver available. Linus replied that as far
as he knew, there wasn't. He said, <quote who="Linus Torvalds">cramfs is
_wonderful_ as long as you don't have to write. Very simple, and very
efficient.</quote> But he went on:</p>

<quote who="Linus Torvalds">

<p>All the read-write compressions tend to also compress much less than
something like cramfs, because the metadata requirements for a read-write
filesystem are usually a lot stricter and more complicated.</p>

<p>A compressing JFFS sounds wonderful, although I suspect that even then it
might be useful to have a read-only side that compresses even better.</p>

<p>Compressed ext2 is horrible compression-wise. The metadata takes up a large
amount of space..</p>

</quote>

<p>Theodore Y. Ts'o gave his own take on the problems with read/write
compressed filesystems:</p>

<quote who="Theodore Y. Ts'o">

<p>It's not the metadata that takes a lot of space, it's the fact that it's
using compression clusters. That is, ext2compr takes a chunk of 8 1k blocks (for
example), compresses it down to 3100 bytes which takes 4 1k blocks to store,
and then stores it in 4 blocks, leaving a "hole" of 4 blocks. Repeat for the
next 8k chunk.</p>

<p>There are two major problems with this approach.</p>

<p>

<ol>

<li>Every 8k (or whatever your compression cluster size is), you end up
resetting the compression algorithm. This makes for very lousy compression
ratios.</li>
  
<li>Because of the compression clusters, it means that you suffer internal
fragmentation and lose an average of 512 bytes (half the 1k block size) for
every 8k of compressed data.</li>

</ol>

</p>

<p>Why is it done this way?  So that random-access reads (and especially
writes) work efficiently. If you are willing to live with two constraints:</p>

<p>

<ol>

<li>Files are written sequentially once, and never written to again.
(appending is possible, but will be *slow*)</li>
  
<li>You have enough ram that you can afford to keep the entire compressed
file in memory at once --- or be willing to suffer nasty performance
penalties if you do random access seeks into the file.</li>

</ol>

</p>

<p>It's possible to have much better compression ratios, since you don't have
to play the compression clusters game. However, many people would find these
constraints untenable, especially for a general purpose filesystem. Life is
full of tradeoffs.....</p>

</quote>
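
<p>The arithmetic in Theodore's example can be checked with a short Python
sketch. This is not from the thread: the function names are invented for
illustration, and the only numbers taken from the post are the 1k block size,
the 8-block cluster and the 3100-byte compressed result.</p>

```python
import math

BLOCK_SIZE = 1024      # 1k filesystem blocks, as in the example above
CLUSTER_BLOCKS = 8     # 8 blocks per compression cluster


def blocks_needed(compressed_bytes, block_size=BLOCK_SIZE):
    """Whole blocks required to store one compressed cluster."""
    return math.ceil(compressed_bytes / block_size)


def cluster_report(compressed_bytes):
    """Describe how one compressed cluster lands on disk (hypothetical helper)."""
    used = blocks_needed(compressed_bytes)
    return {
        "blocks_used": used,
        "blocks_freed": CLUSTER_BLOCKS - used,                 # the "hole"
        "slack_bytes": used * BLOCK_SIZE - compressed_bytes,   # internal fragmentation
    }


report = cluster_report(3100)
print(report)

# Average slack if every compressed size from 1 to 8192 bytes were equally likely
# (an assumption made only to illustrate the "about half a block" figure).
average_slack = sum(
    cluster_report(n)["slack_bytes"] for n in range(1, 8193)
) / 8192
print(average_slack)
```

<p>Under that uniform-size assumption the average loss comes out at roughly
half a block per cluster, which matches the 512-byte figure quoted above.</p>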

<p>Linus continued:</p>

<quote who="Linus Torvalds">

<p>Note that cramfs shares the compression algorithm side: everything is
compressed as a 4kB block, because of the random-access issues. Going to
bigger blocks is not _that_ much of a win, and gets painful on a small
machine (and small machines is where this usually matters the most).</p>

<p>However, where cramfs shines is: _no_ fragmentation. Forget about block
device issues, it does data on 4 byte boundaries. That, together with
basically having very minimalistic meta-data (who needs meta-data anyway,
when it's all read-only: _just_ enough to find stuff and no more) is the
biggest win.</p>

<p>But you can't basically do these things if you want to be read-write. A
truly log-based approach (ie not just meta-data journalling) might work out
ok, actually, but most log-based stuff seem to want to have fairly large
caches in order to work well. Which in embedded spaces isn't exactly a good
idea.</p>

</quote>

<p>Theodore continued:</p>

<quote who="Theodore Y. Ts'o">

<p>The large caches are needed because log-based filesystems very quickly tend
to fragment files all over hell-and-gone. But if you're using flash memory
in your embedded device, this is much less of an issue, since you aren't
impacted by the seek times that you have when you have to move heads over
spinning media.</p>

<p>The real trick is being able to allocate on non-block boundaries, and
dealing with fragmentation issues as you delete files and create irregularly
shaped holes. Making a read/write filesystem that is optimized for the
characteristics of flash memory would certainly be "interesting".</p>

<p>One potential problem with log-based schemes is that they tend to rewrite
many more blocks (for example, normally you have to rewrite every single
directory up to the root every time you so much as touch an inode to update
the atime). For flash memory, this is non-optimal since you have a limited
number of write cycles. Although with modern flash memories they've extended the
number of write cycles significantly, it's still an issue.</p>

<p>Of course, if you don't care about backing store, and want a pure
memory-based compressed ram disk, life is much easier --- but writing to it
is much less interesting, since it won't survive a reboot.</p>

</quote>

<p>Bjorn replied that JFFS was already a working read/write filesystem
optimized for flash memory - though he acknowledged that the 2.4 port was
"early alpha". The 2.0 version, he said, was the one that really worked.
Theodore was thrilled to hear this, and asked, <quote who="Theodore Y.
Ts'o">So does it do compression as well? If not, please consider adding it,
as the iPAQ folks would love JFFS to pieces if it had that. (They're very
space limited on the amount of flash they have --- which is not surprising
if you're trying to run a complete Linux operating system plus XFree86 on
something that consumers can afford to buy.....)</quote> Bjorn replied,
<quote who="Bjorn Wesen">It does not do compression yet although it would be
simple to add it. Probably we'll add it in the 2.0 branch in parallel with
the 2.4 version getting mature.</quote> He recommended that the iPAQ folks
<quote who="Bjorn Wesen">run cramfs for all the read-only stuff and just use
the current JFFS for configuration/params.</quote></p>

<p align="center">

<b>Embedded Systems</b>

</p>

<p>Theodore agreed, and Jim Gettys also replied:</p>

<quote who="Jim Gettys">

<p>We suspect that a combination of cramfs and jffs will serve handhelds very
well... I can say from first hand experience that compressed ramdisks and/or
cramfs gets very old, very quickly: we really want a writable file system,
with (read) compression.</p>

<p>We have both cramfs and jffs cranking over right now on the iPAQ, but
haven't cut over to using either quite yet (but probably will in the next
few weeks sometime). We initially tried using cramfs, but had a flash driver
bug we didn't know about at the time so did ramdisk to get the iPAQ to
Usenix. But the current state is not ideal.</p>

<p>NOTE: we don't want/need general compression of data being written. Most
data being written (in terms of volume) is likely to be in already
compressed formats (e.g. note taking via audio which will then be compressed
before being written). You don't want to pay the joules or performance to
compress the data twice. Most of the flash is likely to be executables or
shared libraries.</p>

<p>A scheme Keith Packard proposed which would work very well for read only
data is somewhat similar to cramfs: just compress on 4K boundaries and store
an offset table, and mark the file as compressed when you are done; do the
obvious uncompression on the marked files on read. My intuition is that this
might be a performance win even on vanilla machines today.</p>

<p>So we argue that in fact full general compression of files automatically
behind the application's back will in fact be highly counter productive on
handheld devices. The stuff we will write the most of will already be
compressed...</p>

<p>Ted says Keith's scheme can be done with a stacking file system: this would
get it for all file system types, which strikes me as a win. We'll do it
someday, if no one gets to it in the meanwhile (it will probably still be
months before we get to that point).</p>

</quote>
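
<p>The scheme Jim attributes to Keith Packard (compress on 4K boundaries,
keep an offset table, decompress only what a read touches) can be sketched
in a few lines of Python. This is an illustration only, not code from the
thread: zlib stands in for whatever compressor would actually be used, and
the names pack and read_range are hypothetical.</p>

```python
import zlib

CHUNK = 4096  # compress on 4K boundaries


def pack(data):
    """Compress each 4K chunk independently; return (offset table, blob)."""
    offsets, parts, pos = [], [], 0
    for start in range(0, len(data), CHUNK):
        comp = zlib.compress(data[start:start + CHUNK])
        offsets.append(pos)      # where this chunk begins inside the blob
        parts.append(comp)
        pos += len(comp)
    offsets.append(pos)          # sentinel marking the end of the last chunk
    return offsets, b"".join(parts)


def read_range(offsets, blob, offset, length):
    """Random-access read that decompresses only the chunks it needs."""
    if length == 0:
        return b""
    first = offset // CHUNK
    # Clamp to the last real chunk so reads past the end return short data.
    last = min((offset + length - 1) // CHUNK, len(offsets) - 2)
    plain = b"".join(
        zlib.decompress(blob[offsets[i]:offsets[i + 1]])
        for i in range(first, last + 1)
    )
    skip = offset - first * CHUNK
    return plain[skip:skip + length]


data = bytes(range(256)) * 50            # 12800 bytes of sample data
offsets, blob = pack(data)
piece = read_range(offsets, blob, 4000, 300)
```

<p>The read at offset 4000 straddles the first two chunks, so only those two
are decompressed; the offset table is what makes that possible without
scanning the compressed stream from the start.</p>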

<p>Paul Rusty Russell had a lower opinion of cramfs, and said, <quote who="Paul
Rusty Russell">If you've already got a filesystem, may I recommend you drop
cramfs, and use jffs over readonly compressed loopback? That way you don't
need the cramfs code (or its in-built limitations), and you get much better
compression (in fact, despite its cuteness, I believe cramfs is the wrong
solution for everything). I hacked together a compressed loopback one
afternoon for the <a
href="http://www.linuxcare.com/bootable_cd/index.epl">Linuxcare Bootable
Business Card</a>: I think the source is on the LC site somewhere.</quote></p>

<p>But David pointed out that JFFS wouldn't run on block devices, only memory
devices like flash chips. And Linus remarked, <quote who="Linus Torvalds">At
least cramfs works. I have about ten reports of loopback not working lately,
and I'm likely to disable it completely unless somebody steps in to maintain
the damn thing.</quote></p>

<p align="center">

<b>Loopback</b>

</p>

<p>Steve Dodd felt that loopback should really be maintained alongside the
block device/buffer cache/page cache layers, since it was a fairly special
case, and likely always to be fragile. Jeff Garzik pointed out that Mandrake
relied on loopback for various things, but agreed it should be disabled if
it was broken - as long as it would be fixed eventually. Chris Wedgwood
pointed out, <quote who="Chris Wedgwood">If it is broken -- then it is less
so than the 2.3.99-pre kernels. Back then most certainly I couldn't use it,
these days I use it all the time -- and I've yet to had it fail on a recent
kernel.</quote> And Steve Dodd added, <quote who="Steve Dodd">Before
.99-pre4 (ish), there was a deadlock which kicked in more or less instantly
(related to tq_disk). Disabling plugging on loop cured that, but there are
still ways to make it deadlock pretty quickly. Booting with mem=8m and
running iozone -a on an ext2-backed loop device dies pretty quickly for
me.</quote></p>

<p>Mike Galbraith posted a patch which seemed to fix the deadlock in loopback
he'd been seeing. Linus replied:</p>

<quote who="Linus Torvalds">

<p>This is exactly the kind of patch that the loopback device has always
needed, and is exactly the reason why I would prefer to kill loopback as
soon as possible.</p>

<p>Either loopback is a block device driver, or it isn't. If it is, then it has
absolutely no reason to start messing with fs/buffers.c and add special case
logic for itself. And if it isn't, then the whole point of loopback is gone.</p>

<p>I'm inclined to mark loopback DANGEROUS because there apparently still isn't
a maintainer for it. And the next person who suggests using it instead of a
real filesystem (ramfs, cramfs, JFFS) should be forced to actually make it
work right first!</p>

</quote>

<p>Alan Cox explained, <quote who="Alan Cox">Several folks have tried fixing
it. The idea of replacing it with a raid layer equivalent was also kicked
around at UKUUG and other places. The theory being that loopback is better
done as a block remapping algorithm at the block layer, thus killing the
double caching problem, sorting out the lack of read ahead and more.</quote>
Mike agreed it should be marked dangerous until seriously rewritten, but he
said he didn't know how to fix it himself. Linus took another look at
loopback and noticed what appeared to be a bug. He gave a fix and there was
a bit of technical discussion until Rik van Riel burst through the door,
yelling, <quote who="Rik van Riel">NOOOOOOOOOO!!!!!</quote></p>

<p align="center">

<b>Virtual Memory Redesign</b>

</p>

<p>Rik said the fixes they were discussing had problems, and proposed a more
invasive one of his own. But Linus replied:</p>

<quote who="Linus Torvalds">

<p>Ehh..</p>

<p>We're close to 2.4.x</p>

<p>We need to fix this bug.</p>

<p>We're not adding new untested code. We're fixing bugs.</p>

</quote>

<p>Rik protested:</p>

<quote who="Rik van Riel">

<p>But with this "fix" you'd be adding another one in the process.</p>

<p>Admitted, it's only a performance bug, but I found it to grind the machine
to an absolute halt when doing IO intensive stuff or running large
programs...</p>

<p>Stephen Tweedie, Andrea Arcangeli and me have been looking at this bug and
others and have found there's pretty much NO WAY to fix this without some
bigger changes in the VM code. Performance will suck in the earlier 2.4
kernels, but I hope to have some new VM code ready later on for a more
readable, better maintainable, more stable VM subsystem with somewhat higher
performance.</p>

</quote>

<p>He said he'd write up the new design and post it soon, but Linus replied,
<quote who="Linus Torvalds">Performance bugs are definitely secondary,</quote>
and added:</p>

<quote who="Linus Torvalds">

<p>Quite frankly, nobody has convinced me that
there is any way to fix VM balancing issues even _if_ people were to re-write
the VM.</p>

<p>Yes, I've seen a lot of hot air.</p>

<p>The fact is that I suspect that it is fundamentally impossible to balance
the VM so that everybody is always happy. People should realize that making
more changes in the hope of finally reaching some elusive goal is not always
worthwhile.</p>

<p>Strive for a good, stable system that avoids _most_ of the bad performance
under normal load. And be prepared to live with the fact that there will
always be things you can do to make it behave in nasty ways.</p>

<p>Right now I want things to _work_. Big VM changes are for 2.5.x anyway.</p>

<p>(See 2.2.x for how playing with the VM can cause untold stability woes. I
think Alan learned that the hard way).</p>

</quote>

<p>To this last point, Alan put in:</p>

<quote who="Alan Cox">

<p>Yes. Its taken from 2.1.121 or so to 2.2.17pre to get the VM acceptable, and
it'll take another 2 or 3 releases doing only gradual tested changes to
verify the final few bits to get it to be almost as good as 2.0.</p>

<p>For just about every load I've tested 2.0 is the best stable VM we ever had,
late 2.1 was better, 2.2 was bad, 2.4 I can get to the point the box stalls
for 45 seconds - as any user.</p>

<p>Most of the post 2.4 proposals look good, because we know they work well and
the overhead looks like it can be no worse than in 2.4 for the light load
cases. FreeBSD is a very nice test suite for that.</p>

<p>No argument - we cannot do major VM work for 2.4, it'll just have to be
tuned to try and get it as good as 2.2. Post 2.4 I'll take a look at doing a
2.4.x-ac with the newer VM work and whatever else escapes for later folding
in, providing 2.2 is rock solid by then.</p>

</quote>

<p>Rik pointed out that Linus had been CCed on most of the emails discussing
the new VM system, and Linus replied:</p>

<quote who="Linus Torvalds">

<p>I've seen a lot of discussion, yes.</p>

<p>I haven't seen any really convincing arguments that any of the rewrites would
really make things all that better.</p>

<p>Yes, they'll probably fix the thing that you try to fix. And they'll
introduce new cases where _they_ work badly, and the old code happened to
work fine.</p>

<p>For example, the "dd if=/dev/zero of=file" thing can be made to be very nice
on interactive behaviour, and you can obviously design a VM subsystem that
does that on purpose. Fine. I bet you that such a VM subsystem has serious
problems with some other workloads..</p>

<p>Or the old idea to start writebacks early in order to try to minimize having
dirty pages in memory that are hard to get rid of. It's wonderful. For
certain loads. And it really sucks on others that have big temp-files that
will get deleted (like bench).</p>

<p>The thing that is dangerous about designing a new VM is that you can design
it so that it avoids the current pitfalls. But you won't even be aware of
the things that the current thing does well, and you may not design it to do
as well on those.</p>

<p>And in the end, reality always tends to hit theory hard in the face when you
least expect it. That's why I'm not holding my breath for some magical VM
rewrite that will fix all performance problems. No matter _how_ much people
talk about it..</p>

</quote>

<p>Alan pointed out that for this very reason, <quote who="Alan Cox">the fact
these folks are understanding why the FreeBSD VM works (something not all
the freebsd folks seemed to know) and are working from a known good VM
implementation is promising. 2.5 will tell.</quote> Rik explained:</p>

<quote who="Rik van Riel">

<p>Yup, we know why FreeBSD VM works and what its weak points could be. We've
also had some help from SGI and Sequent/IBM people as to what the
scalability problems of our new VM design could be.</p>

<p>The new VM will be heavily based on FreeBSD VM, which we know works, with
some small tweaks where we've tried to come up with scenarios where they'd
break (and we failed, so we'll try those tweaks).</p>

</quote>

<p>Linus replied, <quote who="Linus Torvalds">The new VM _will_ be explained to
me before anything else.</quote> To which Rik agreed, and reiterated that he
planned to post the description later that day. Linus also compared the
situation to ext3 of 4 years before, in which similar "hot air" was
released without patches. Alan pointed out that ext3 had been out and
working for a while already, and Linus replied:</p>

<quote who="Linus Torvalds">

<p>yes, within the last four months or so ext3 has
actually become reality.</p>

<p>In large part, I suspect, because it became so painfully obvious that
ReiserFS was getting quite a lot of attention.</p>

<p>THAT is what I'm complaining about. Not the last couple of months. But the
years that preceded it.</p>

<p>I hope the MM thing doesn't turn into that. We need incremental
improvements, not grand schemes that get talked about.</p>

</quote>

<p>Alan replied that it had been a good bit longer than 4 months (10 seemed
more accurate), and had the last word:</p>

<quote who="Alan Cox">

<p>You've been reading too many conspiracy theories. Ext3 and Reiserfs are not
competitors. Ext3 is a tool to journal ext2fs. Its still slow on huge
directories and its still got every other ext2 feature good and bad</p>

<p>Reiserfs has fast handling of large file trees, efficient packing of small
files and a whole pile of stuff which puts it and XFS as the competitors.</p>

<p>Really its</p>

<p>

<ul>

<li>ext3fs  -       migration path, highly stable, no other feature gain</li>

</ul>

</p>

<p>versus</p>

<p>

<ul>

<li>reiserfs -      packing, name spaces, btrees lots of new goodies</li>
<li>xfs -           lots of scalable new (to Linux) goodies</li>
<li>ibm jfs -       to be seen - but btrees and all the rest</li>

</ul>

</p>

<p>and more researchy stuff like the tree-phase ext2 which offers a whole pile
of interesting future paths that journalling doesn't handle well</p>

</quote>

<p>Thus endeth the thread.</p>

</section>

<section
  title="DTR/DSR Handshaking Deferred To 2.5; Linus Firm On Code Freeze"
  subject="[patch] DTR/DSR hardware handshake support for 2.0/2.2/2.4"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0008_01/msg00080.html"
  posts="23"
  startdate="01 Aug 2000 00:00:00 -0800"
  enddate="10 Aug 2000 00:00:00 -0800"
>
<topic>Code Freeze</topic>

<p>

Martin Schenk posted small patches against 2.0, 2.2 and 2.4 and explained,
<quote who="Martin Schenk">As I needed support for <a
href="http://metalab.unc.edu/pub/Linux/docs/HOWTO/other-formats/html_single/Text-Terminal-HOWTO.html#s10">DTR/DSR</a>
hardware handshake to communicate with a serial printer (and the "wire
RTS/CTS to DTR/DSR" tip from the serial-HOWTO works fine for a demo, but is
not applicable for a few thousand POS terminals *gg*), I implemented this
functionality.</quote> Theodore Y. Ts'o was nonplussed, and replied:

</p>

<quote who="Theodore Y. Ts'o">

<p>

Yilch.   This is specialized enough that I'd much rather this *not* go into
the kernel. Next thing we know, someone will want a DTR/CD handshaking
mechanism, etc.,etc.

</p><p>

This should probably be a private kernel patch, or (much more strongly
suggested) that you just get specially wired RS-232 cables. This is in fact
a very standard thing to do, and you can order cables from Black Box or some
other company specializing in selling cables which connect crufty legacy
hardware.

</p>

</quote>

<p>

But H. Peter Anvin objected, <quote who="H. Peter Anvin">DTR/DSR is the most
common handshake mechanism for RS-232-B (as opposed to RS-232-C and
RS-232-D) devices. This feature has been requested on and off for the last
five years. I think it's worthwhile.</quote> Martin added:

</p>

<quote who="Martin Schenk">

<p>

A lot of supermarket cash registers are in fact PCs with special hardware,
which for some unclear reason typically knows only about DTR/DSR handshaking
(even if it is from other vendors: SNI or EPSON does not make a difference:
only DTR/DSR).

</p><p>

Sending people to a thousand stores around the country putting special
cables between printers and computers is simply not acceptable (if you know
POS service people, you know that about half of the cables would be put on
the wrong serial ports).

</p>

</quote>

<p>

Frank da Cruz also got requests for DTR/DSR handshaking in kermit, and
recommended putting the patch in. Theodore was not dead-set against the
patch, though he still said, <quote who="Theodore Y. Ts'o">I really do
wonder how many people *really* need this. Maybe I'll set up a survey on
serial.sourceforge.net to determine whether or not there's enough people to
want this kind of feature bloat..... in any case, I'm not terribly inclined
to consider this before 2.4, especially since Linus finally seems to be
serious about the code freeze.</quote> Later he confirmed that he'd only
consider adding the patch after 2.5 started up.

</p>

</section>

<section
  title="Status Of Dual Athlon Support"
  subject="Dual athlon support?"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0008_01/msg00216.html"
  posts="44"
  startdate="02 Aug 2000 00:00:00 -0800"
  enddate="08 Aug 2000 00:00:00 -0800"
>
<topic>SMP</topic>

<mention>Tom Leete</mention>
<mention>Dan Hollis</mention>

<p>

Pavel Machek asked about the status of dual Athlon support. Since AMD sold
boxes with AMD-760 and SMP Athlons, he was keen to find out if anyone was
working on this, and how far they'd gotten. Stephen Frost remarked, <quote
who="Stephen Frost">Athlons aren't really SMP but operate more like Alphas
with a P-T-P architecture, from my understanding.</quote> He asked for a URL
since he hadn't seen any Athlon motherboards for sale anywhere, but there
was no reply. Tom Leete also replied to Pavel, saying he'd previously posted
a patch to compile Athlon SMP, though he'd only tested it on UP systems.

</p><p>

Alan Cox had also not seen any dual Athlon boards, and didn't know the
status of any work on them, but he did say, <quote who="Alan Cox">I've
always understood from AMD that because of the way the apic appears that SMP
will just work although the hardware behind the apparent APIC is
unrelated.</quote> Dan Hollis mentioned here that, as far as he knew, dual
Athlons were not scheduled for release until the 4th quarter of 2000 at the
earliest.

</p><p>

At one point in the course of discussion, Pavel added, <quote who="Pavel
Machek">AMD now considers us pretty important, which seems like good news
for linux community.</quote>

</p>

</section>

<section
  title="VM Design Dispute"
  subject="RFC: design for new VM"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0008_01/msg00350.html"
  posts="52"
  startdate="02 Aug 2000 00:00:00 -0800"
  enddate="13 Aug 2000 00:00:00 -0800"
>
<topic>BSD: FreeBSD</topic>
<topic>Clustering</topic>
<topic>Ottawa Linux Symposium</topic>
<topic>Virtual Memory</topic>

<mention>Andrea Arcangeli</mention>
<mention>Ben LaHaise</mention>
<mention>Chris Wedgwood</mention>
<mention>Stephen Tweedie</mention>

<p>

Rik van Riel proposed (quoted in full):

</p>

<quote who="Rik van Riel">

<p>

here is a (rough) draft of the design for the new VM, as discussed at UKUUG
and OLS. The design is heavily based on the FreeBSD VM subsystem - a proven
design - with some tweaks where we think things can be improved. Some of the
ideas in this design are not fully developed, but none of those "new" ideas
are essential to the basic design.

</p><p>

The design is based around the following ideas:

</p><p>

<ol>

<li>center-balanced page aging, using

    <ol>
    <li>multiple lists to balance the aging</li>
    <li>a dynamic inactive target to adjust
      the balance to memory pressure</li>
    </ol>

</li>
<li>physical page based aging, to avoid the "artifacts"
  of virtual page scanning</li>
<li>separated page aging and dirty page flushing

    <ol>
    <li>kupdate flushing "old" data</li>
    <li>kflushd syncing out dirty inactive pages</li>
    <li>as long as there are enough (dirty) inactive pages,
      never mess up aging by searching for clean active
      pages ... even if we have to wait for disk IO to
      finish</li>
    </ol>

</li>
<li>very light background aging under all circumstances, to
  avoid half-hour old referenced bits hanging around</li>

</ol>

</p><p align="center">

                Center-balanced page aging:

</p><p>

<ol>

<li>goals
    <ol>
    <li>always know which pages to replace next</li>
    <li>don't spend too much overhead aging pages</li>
    <li>do the right thing when the working set is
      big but swapping is very very light (or none)</li>
    <li>always keep the working set in memory in
      favour of use-once cache</li>
    </ol>
</li>
<li>page aging almost like in 2.0, only on a physical page basis
    <ol>
    <li>page-&gt;age starts at PAGE_AGE_START for new pages</li>
    <li>if (referenced(page)) page-&gt;age += PAGE_AGE_ADV;</li>
    <li>else page-&gt;age is made smaller (linear or exponential?)</li>
    <li>if page-&gt;age == 0, move the page to the inactive list</li>
    <li>NEW IDEA: age pages with a lower page age more often</li>
    </ol>
</li>
<li>data structures (page lists)
    <ol>
    <li>active list
        <ol>
        <li>per node/pgdat</li>
        <li>contains pages with page-&gt;age &gt; 0</li>
        <li>pages may be mapped into processes</li>
        <li>scanned and aged whenever we are short
          on free + inactive pages</li>
        <li>maybe multiple lists for different ages,
          to be better resistant against streaming IO
          (and for lower overhead)</li>
        </ol>
    </li>
    <li>inactive_dirty list
        <ol>
        <li>per zone</li>
        <li>contains dirty, old pages (page-&gt;age == 0)</li>
        <li>pages are not mapped in any process</li>
        </ol>
    </li>
    <li>inactive_clean list
        <ol>
        <li>per zone</li>
        <li>contains clean, old pages</li>
        <li>can be reused by __alloc_pages, like free pages</li>
        <li>pages are not mapped in any process</li>
        </ol>
    </li>
    <li>free list
        <ol>
        <li>per zone</li>
        <li>contains pages with no useful data</li>
        <li>we want to keep a few (dozen) of these around for
          recursive allocations</li>
        </ol>
    </li>
    </ol>

</li>
<li>other data structures
    <ol>
    <li>int memory_pressure
        <ol>
        <li>on page allocation or reclaim, memory_pressure++</li>
        <li>on page freeing, memory_pressure--  (keep it &gt;= 0, though)</li>
        <li>decayed on a regular basis (eg. every second x -= x&gt;&gt;6)</li>
        <li>used to determine inactive_target</li>
        </ol>
    </li>
    <li>inactive_target == one (two?) second(s) worth of memory_pressure,
      which is the amount of page reclaims we'll do in one second
        <ol>
        <li>free + inactive_clean &gt;= zone-&gt;pages_high</li>
        <li>free + inactive_clean + inactive_dirty &gt;= zone-&gt;pages_high + one_second_of_memory_pressure * (zone_size / memory_size)</li>
        </ol>
    </li>
    <li>inactive_target will be limited to some sane maximum
      (like, num_physpages / 4)</li>
    </ol>
</li>

</ol>

</p><p>

The idea is that when we have enough old (inactive + free) pages, we will
NEVER move pages from the active list to the inactive lists. We do that
because we'd rather wait for some IO completion than evict the wrong page.

</p><p>

Kflushd / bdflush will have the honourable task of syncing the pages in the
inactive_dirty list to disk before they become an issue. We'll run
balance_dirty over the set of free + inactive_clean + inactive_dirty AND
we'll try to keep free+inactive_clean &gt; pages_high .. failing either of
these conditions will cause bdflush to kick into action and sync some pages
to disk.

</p><p>

If memory_pressure is high and we're doing a lot of dirty disk writes, the
bdflush percentage will kick in and we'll be doing extra-aggressive cleaning.
In that case bdflush will automatically become more aggressive the more page
replacement is going on, which is a good thing.

</p><p align="center">

                Physical page based page aging

</p><p>

In the new VM we'll need to do physical page based page aging for a number
of reasons. Ben LaHaise said he already has code to do this and it's "dead
easy", so I take it this part of the code won't be much of a problem.

</p><p>

The reasons we need to do aging on a physical page are:

</p><p>

<ol>
<li>avoid the virtual address based aging "artifacts"</li>
<li>more efficient, since we'll only scan what we need
    to scan  (especially when we'll test the idea of
    aging pages with a low age more often than pages
    we know to be in the working set)</li>
<li>more direct feedback loop, so less chance of
    screwing up the page aging balance</li>
</ol>

</p><p align="center">

                IO clustering

</p><p>

IO clustering is not done by the VM code, but nicely abstracted away into a
page-&gt;mapping-&gt;flush(page) callback. This means that:

</p><p>

<ol>
<li>each filesystem (and swap) can implement their own, isolated
  IO clustering scheme</li>
<li>(in 2.5) we'll no longer have the buffer head list, but a list
  of pages to be written back to disk, this means doing stuff like
  delayed allocation (allocate on flush) or kiobuf based extents
  is fairly trivial to do</li>
</ol>

</p><p align="center">

                Misc

</p><p>

Page aging and flushing are completely separated in this scheme. We'll never
end up aging and freeing a "wrong" clean page because we're waiting for IO
completion of old and to-be-freed pages.

</p><p>

Write throttling comes quite naturally in this scheme. If we have too many
dirty inactive pages we'll write throttle. We don't have to take dirty
active pages into account since those are not candidates for freeing anyway.
Under light write loads we will never write throttle (good) and under heavy
write loads the inactive_target will be bigger and write throttling is more
likely to kick in.

</p><p>

Some background page aging will always be done by the system. We need to do
this to clear away referenced bits every once in a while. If we don't do
this we can end up in the situation where, once memory pressure kicks in,
pages which haven't been referenced in half an hour still have their
referenced bit set and we have no way of distinguishing between newly
referenced pages and ancient pages we really want to free. (I believe this
is one of the causes of the "freeze" we can sometimes see in current
kernels)

</p><p>

Over the next weeks (months?) I'll be working on implementing the new VM
subsystem for Linux, together with various other people (Andrea Arcangeli??,
Ben LaHaise, Juan Quintela, Stephen Tweedie). I hope to have it ready in
time for 2.5.0, but if the code turns out to be significantly more stable
under load than the current 2.4 code I won't hesitate to submit it for
2.4.bignum...

</p>

</quote>
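
<p>

Rik's case for background aging can be illustrated with a toy model. The
sketch below is not kernel code: the Page class, the age_page() helper and
the constants (AGE_ADVANCE, AGE_MAX, halving the age on an idle scan) are
all assumptions made for illustration. It compares a page touched long ago
with a page touched just now, with and without aging passes in between.

</p>

<pre><![CDATA[
```python
from dataclasses import dataclass

AGE_ADVANCE = 3   # assumed bonus for a page seen referenced during a scan
AGE_MAX = 64      # assumed cap so a hot page cannot accumulate age forever


@dataclass
class Page:
    age: int = 0
    referenced: bool = False   # stands in for the hardware referenced bit


def age_page(page):
    """One aging pass over a page: reward recent use, decay idle pages."""
    if page.referenced:
        page.age = min(page.age + AGE_ADVANCE, AGE_MAX)
        page.referenced = False    # clear the bit so the next scan sees fresh data
    else:
        page.age //= 2             # unused since the last scan: decay toward 0


def simulate(background_scans):
    """Return (old_page.age, new_page.age) once memory pressure arrives."""
    old, new = Page(), Page()
    old.referenced = True              # touched long ago
    for _ in range(background_scans):
        age_page(old)                  # background aging while there is no pressure
    new.referenced = True              # touched just now
    age_page(old)                      # first scan under memory pressure
    age_page(new)
    return old.age, new.age
```
]]></pre>

<p>

With no background scans, both pages come out of the first pressure scan
with the same age, so the stale page cannot be told apart from the fresh
one. With a handful of background scans the stale page has already decayed
to age 0.

</p>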

<p>

There was some documentation discussion: since Rik had based his proposal on
the FreeBSD design, Chris Wedgwood asked if the differences between the two
could be clarified, so that any performance differences could be
attributed to specific parts of the design. Rik agreed
that the differences should be clearly indicated, and went on to bemoan,
<quote who="Rik van Riel">The amount of documentation (books? nah..) on VM
is so sparse that it would be good to have both systems properly documented.
That would fill a void in CS theory and documentation that was painfully
there while I was trying to find useful information to help with the design
of the new Linux VM...</quote> Matthew Dillon had also found it difficult to
find anything that covered more than a single aspect of VM design.

</p><p>

Linus Torvalds came down pretty hard on Rik's design, saying that using a
multi-list approach would be more difficult, and wouldn't help balancing. He
acknowledged that it would help avoid the overhead of walking extra pages,
but this seemed beside the point. Rik's claim that the current VM was
irreparably broken, he felt, didn't jibe with his own view that the old
and new designs were functionally equivalent. He also accused Rik of
"selling" his design, rather than putting it forward on technical merits. He
asked Rik to explain why the new design was so much better than the old. He
summarized at length:

</p>

<quote who="Linus Torvalds">

<p>

The reason I'm unconvinced about multiple lists is basically:

</p><p>

<ul>

<li>

<p>they are inflexible. Each list has a meaning, and a page cannot easily
   be on more than one list. It's really hard to implement overlapping
   meanings: you get exponential expansion of combinations, and everybody
   has to be aware of them.</p>

<p>   For example, imagine that the definition of "dirty" might be different
   for different filesystems. Imagine that you have a filesystem with its
   own specific "walk the pages to flush out stuff", with special logic that
   is unique to that filesystem ("you cannot write out this page until
   you've done 'Y'" or whatever). This is hard to do with your approach. It
   is trivial to do with the single-list approach above.</p>

<p>   More realistic (?) example: starting write-back of pages is very
   different from waiting on locked pages. We may want to have a "dirty but
   not yet started" list, and a "write-out started but not completed" locked
   list. Right now we use the same "clock" for them (the head of the LRU
   queue with some ugly heuristic to decide whether we want to wait on
   anything).</p>

<p>   But we potentially really want to have separate logic for this: we want
   to have a background "start writeout" that goes on all the time, and then
   we want to have a separate "start waiting" clock that uses different
   principles on which point in the list to _wait_ on stuff.</p>

<p>   This is what we used to have in the old buffer.c code (the 2.0 code that
   Alan likes). And it was _horrible_ to have separate lists, because in
   fact pages can be both dirty and locked and they really should have been
   on both lists etc..</p>

</li>

<li>

<p>in contrast, scan-points (without LRU, but instead working on the basis
   of the age of the page - which is logically equivalent) offer the
   potential for specialized scanners. You could have "statistics gathering
   robots" that you add dynamically. Or you could have per-device flush
   daemons.</p>

<p>   For example, imagine a common problem with floppies: we have a timeout
   for the floppy motor because it's costly to start them up again. And they
   are removable. A perfect floppy driver would notice when it is idle, and
   instead of turning off the motor it might decide to scan for dirty pages
   for the floppy on the (correct) assumption that it would be nice to have
   them all written back instead of turning off the motor and making the
   floppy look idle.</p>

<p>   With a per-device "dirty list" (which you can test out with a page
   scanner implementation to see if it ends up really improving floppy
   behaviour) you could essentially have a guarantee: whenever the floppy
   motor is turned off, the filesystem on that floppy is synced. Test
   implementation: floppy daemon that walks the list and turns off the
   engine only after having walked it without having seen any dirty blocks.</p>

<p>   In the end, maybe you realize that you _really_ don't want a dirty list
   at all. You want _multiple_ dirty lists, one per device.</p>

<p>   And that's really my point. I think you're too eager to rewrite things,
   and not interested enough in verifying that it's the right thing. Which I
   think you can do with the current one-list thing easily enough.</p>

</li>

<li>

<p>In the end, even if you don't need the extra flexibility of multiple
   clocks, splitting them up into separate lists doesn't change behaviour,
   it's "only" a CPU time optimization.</p>

<p>   Which may well be worth it, don't get me wrong. But I don't see why you
   tout this as being something radically needed in order to get better VM
   behaviour. Sure, multiple lists avoids the unnecessary walking over pages
   that we don't care about for some particular clock. And they may well end
   up being worth it for that reason. But it's not a very good way of doing
   prototyping of the actual _behaviour_ of the lists.</p>

</li>

</ul>

</p><p>

To make a long story short, I'd rather see a proof-of-concept thing. And I
distrust your notion that "we can't do it with the current setup, we'll have
to implement something radically different".

</p><p>

Basically, IF you think that your newly designed VM should work, then you
should be able to prototype and prove it easily enough with the current one.

</p><p>

I'm personally of the opinion that people see that page aging etc is hard,
so they try to explain the current failures by claiming that it needs a
completely different approach. And in the end, I don't see what's so
radically different about it - it's just a re-organization. And as far as I
can see it is pretty much logically equivalent to just minor tweaks of the
current one.

</p><p>

(The _big_ change is actually the addition of a proper "age" field. THAT is
conceptually a very different approach to the matter. I agree 100% with
that, and the reason I don't get all that excited about it is just that we
_have_ done page aging before, and we dropped it for probably bad reasons,
and adding it back should not be that big of a deal. Probably less than 50
lines of diff).

</p>

</quote>
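
<p>

Linus' floppy example amounts to a specialized scanner walking one shared
page list. A minimal sketch of that test implementation follows; it is a
toy model rather than driver code, and the Page class, the device names and
the write_page hook are hypothetical.

</p>

<pre><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class Page:
    device: str
    dirty: bool = False


def flush_device(pages, device, write_page):
    """Walk the single page list until one full pass sees no dirty page
    belonging to `device`. Returns the number of passes made."""
    passes = 0
    while True:
        passes += 1
        saw_dirty = False
        for page in pages:
            if page.device == device and page.dirty:
                write_page(page)       # hypothetical write-back hook
                page.dirty = False
                saw_dirty = True
        if not saw_dirty:
            return passes              # only now is it safe to stop the motor
```
]]></pre>

<p>

The scanner ignores pages belonging to other devices, so it costs a walk
over the whole list; that walk is exactly the CPU time a per-device dirty
list would save.

</p>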

<p>

Rik countered that basing the lists on the page age made a big difference;
there was more to be gained from multiple lists, he said, than just to save
time walking pages. He explained:

</p>

<quote who="Rik van Riel">

<p>

We need different queues so waiting for pages to be flushed to disk doesn't
screw up page aging of the other pages (the ones we absolutely do not want
to evict from memory yet).

</p><p>

That the inactive list is split into two lists has nothing to do with page
aging or balancing. We just do that to make it easier to kick bdflush and to
have the information available we need for eg. write throttling.

</p>

</quote>

<p>

He added that the current scheme didn't have enough information available to
do proper balancing, but that having multiple lists would automatically
provide all needed information. This, he went on, was the difference between
his scheme and Linus' counter-proposal of 'scan points'. He added:

</p>

<quote who="Rik van Riel">

<p>

If there was any hope that the current VM would be a good enough basis to
work from I would have done that. In fact, I tried this for the last 6
months and horribly failed.

</p><p>

Other people have also tried (and failed). I'd be surprised if you could do
better, but it sure would be a pleasant surprise...

</p>

</quote>

<p>

Finally, he concluded:

</p>

<quote who="Rik van Riel">

<p>While page aging is a fairly major part, it is
certainly NOT the big issue here...</p>

<p>

The big issues are:

</p><p>

<ul>

<li>separate page aging and page flushing, so lingering dirty
  pages don't fuck up page aging</li>
<li>organise the VM in such a way that we actually have the
  information available we need for balancing the different
  VM activities</li>
<li>abstract away dirty page flushing in such a way that we
  give filesystems (and swap) the opportunity for their own
  optimisations</li>

</ul>

</p>

</quote>

<p>

Linus exhorted Rik to go back and reread Linus' previous email, and

</p>

<quote who="Linus Torvalds">

<p>Realize that your "multiple queues" is nothing
more than "cached information". They do not change _behaviour_ at all. They
only change the amount of CPU-time you need to parse it.

</p><p>

Your arguments do not seem to address this issue at all.

</p><p>

In my mailbox I have an email from you as of yesterday (or the day before)
which says:

</p><p>

<blockquote>

I will not try to balance the current MM because it is not doable

</blockquote>

</p>

<p>And I don't see that your suggestion is fundamentally adding anything but a
CPU timesaver.

</p><p>

Basically, answer me this _simple_ question: what _behavioural_ differences
do you claim multiple queues have? Ignore CPU usage for now.

</p><p>

I'm claiming they are just a cache.

</p><p>

And you claim that the current MM cannot be balanced, but your new one can.

</p><p>

Please reconcile these two things for me.

</p>

</quote>

<p>

Rik agreed that his multiple lists were functionally the same as a single
list, with, he added, <quote who="Rik van Riel">statistics about how many
pages of age 0 there are.</quote> He agreed there were other ways to do what
he'd proposed, such as having a single list and keeping multiple counters
for the stats he felt would enable proper balancing. But he added, <quote
who="Rik van Riel">What I fail to see is why this would be preferable to a
code base where all the different pages are neatly separated and we don't
have N+1 functions that are all scanning the same list, special-casing out
each other's pages and searching the list for their own special
pages...</quote> Linus replied:

</p>

<quote who="Linus Torvalds">

<p>

I disagree just with the "all improved, radically new, 50% more for the same
price" ad-campaign I've seen.

</p><p>

I don't like the fact that you said that you don't want to worry about 2.4.x
because you don't think it can be fixed as it stands. I think that's a
cop-out and dishonest. I think I've explained why.

</p><p>

I could fully imagine doing even multi-lists in 2.4.x. I think performance
bugs are secondary to stability bugs, but hey, if the patch is clean and
straightforward and fixes a performance bug, I would not hesitate to apply
it. It may be that going to multi-lists actually is easier just because of
some things being more explicit. Fine.

</p><p>

But stop the ad-campaign. We get too many biased ads for presidents-to-be
already, no need to take that approach to technical issues. We need to fix
the VM balancing, we don't need to sell it to people with
buzz-words.

</p>

</quote>
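
<p>

The point both sides ended up agreeing on, that separate lists hold the
same information as one list plus per-state counters, can be sketched in a
few lines. The list names are the ones from Rik's proposal; the rule that
"age greater than 0 means active" and the Page class itself are
simplifying assumptions, not anything taken from the kernel.

</p>

<pre><![CDATA[
```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    age: int
    dirty: bool


def classify(page):
    """Map a page to the list it would sit on in a multi-list design."""
    if page.age > 0:
        return "active"
    return "inactive_dirty" if page.dirty else "inactive_clean"


def counts_from_single_list(pages):
    """Single-list design: derive the statistics by scanning every page."""
    return Counter(classify(p) for p in pages)


def split_into_lists(pages):
    """Multi-list design: the same classification, stored instead of recomputed."""
    lists = {"active": [], "inactive_dirty": [], "inactive_clean": []}
    for p in pages:
        lists[classify(p)].append(p)
    return lists
```
]]></pre>

<p>

Both functions apply the same classification, so the list lengths always
match the counters. What differs is when the work is done: once per state
change with separate lists, or once per scan with a single list.

</p>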

</section>

<section
  title="Latest Lowlatency Patch For 2.4"
  subject="[patch] lowlatency patch for 2.4, lowlatency-2.4.0-test6-B5"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0008_01/msg00444.html"
  posts="24"
  startdate="03 Aug 2000 00:00:00 -0800"
  enddate="14 Aug 2000 00:00:00 -0800"
>
<topic>Assembly</topic>
<topic>Virtual Memory</topic>

<mention>Jamie Lokier</mention>

<p>

Ingo Molnar announced:

</p>

<quote who="Ingo Molnar">

<p>

i've ported my 2.2 lowlatency patch to 2.4.0-test6-pre1. The vanilla 2.4
kernel fixed some latencies present in 2.2.16, but it also introduced a few
new ones - and it keeps the fundamental latency sources largely unchanged,
so the size and scope of the lowlatency patch has not changed much:

</p><p>

<a
href="http://www.redhat.com/~mingo/lowlatency-patches/lowlatency-2.4.0-test6-B5">http://www.redhat.com/~mingo/lowlatency-patches/lowlatency-2.4.0-test6-B5</a>

</p><p>

this patch is *not* yet intended to be merged into the mainstream kernel.
I'd first like to see what kind of latencies and behavior people see, then
i'll split the patch up into an 'uncontroversial' and 'controversial' part.

</p><p>

especially due to the VM changes i'd like people to try this, as in my
experience it makes the system much 'smoother' during heavy VM load. The
stock VM creates latencies up to 200 msec (!) on a 256MB box, and 200 msec
can be easily noticed by humans as well.

</p><p>

the patch is a 'take no prisoners' solution, ie. i fixed all latency sources
i could identify, no matter what the fix does to code 'beauty'. I strongly
disagree with the "it's ok in 99.9% of the cases" approach, because in fact
it's very easy to trigger bad latencies under various (common) workloads.
And i just do not want Linux to become another Windows: "well you can play
music just fine, as long as you dont do this and dont do that, and for God's
sake, put enough RAM into your system.".

</p><p>

With this patch applied i was unable to trigger larger than 0.5 msec
latencies even under extreme VM load in 100.0% of the cases - with the
typical latencies in an unloaded system being around 0.1 msec. The patch
fixes some 'scalability latency sources', ie. extreme latencies which show
up only if a process has many open files, has lots of VM allocated.

</p><p>

95% of the conditional schedule points the patch adds fix some real latency
that caused bigger than 1msec latencies under realistic (and common)
workloads.

</p><p>

Main changes:

</p><p>

<ul>

<li>

<p>moved conditional_schedule() into assembly, to reduce the impact of
conditional_schedule(). The 'slow path' is moved into a separate code
section, so the normal codepath is not impacted. A conditional schedule is
now typically just 3-4 x86 instructions and no (inline) branch, and if
'current' is used in the code already then it's just 2 instructions. (see
asm-i386/condsched.h for more.)

</p><p>

my kernel has 332 conditional schedule points in its binary image. A
condsched slow-path is 22 bytes, so the offline section is ~7k (kernel RAM)
- sounds acceptable.</p>

</li>

<li>identified and fixed a couple of new latency sources.</li>

<li>the VM's pressure handling code is much less atomic now, and in many
cases does work in many smaller steps instead of one large step.</li>

</ul>

</p><p>

reports, comments, suggestions welcome!

</p>

</quote>

<p>

Later, he added:

</p>

<quote who="Ingo Molnar">

<p>

the newest version of the lowlatency patch can be downloaded from:

</p><p>

<a
href="http://www.redhat.com/~mingo/lowlatency-patches/lowlatency-2.4.0-test6-C4">http://www.redhat.com/~mingo/lowlatency-patches/lowlatency-2.4.0-test6-C4</a>

</p><p>

Changes:

</p><p>

<ul>

<li>fixes /dev/urandom (reported by Andreas Jellinghaus)</li>
 
<li>adds back the entry.S critical-section-event fixes (Jamie Lokier)</li>

<li>fixes latencies triggered by Quintela's mmap002 (me)</li>

<li>fixes latencies triggered by Andreas Jellinghaus's utility (me)</li>
 
<li>fixes some other latencies as well (me)</li>
 
</ul>

</p><p>

enjoy - reports, comments, suggestions welcome.

</p>

</quote>
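
<p>

The effect of the conditional schedule points Ingo described can be shown
with a toy model. Nothing below comes from the patch itself: the Cpu
class, its timer and the unit of "work" are assumptions, and the real
conditional_schedule() is a few x86 instructions rather than a Python
function. The model measures how long a task that wants to run has to wait
while the kernel is inside a long loop.

</p>

<pre><![CDATA[
```python
class Cpu:
    """Toy CPU: a timer sets need_resched; schedule() clears it."""

    def __init__(self, timer_period):
        self.timer_period = timer_period
        self.clock = 0
        self.need_resched = False
        self.pending_since = None
        self.worst_latency = 0

    def tick(self):
        """The timer may fire at the start of a work unit; the unit then
        runs to completion before anything else can happen."""
        due = self.clock > 0 and self.clock % self.timer_period == 0
        if due and not self.need_resched:
            self.need_resched = True
            self.pending_since = self.clock
        self.clock += 1

    def schedule(self):
        """Run the waiting task and record how long it had to wait."""
        waited = self.clock - self.pending_since
        self.worst_latency = max(self.worst_latency, waited)
        self.need_resched = False


def conditional_schedule(cpu):
    # Fast path is a single flag test; the slow path runs only when needed.
    if cpu.need_resched:
        cpu.schedule()


def long_loop(cpu, units, with_schedule_points):
    """Do `units` of kernel work and return the worst scheduling latency."""
    for _ in range(units):
        cpu.tick()
        if with_schedule_points:
            conditional_schedule(cpu)
    if cpu.need_resched:          # loop finished: reschedule on the way out
        cpu.schedule()
    return cpu.worst_latency
```
]]></pre>

<p>

In this model a 1000-unit loop with a timer every 10 units makes the
waiting task wait 990 units without schedule points, and a single unit
with them.

</p>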

<p>

Andrew Morton replied:

</p>

<quote who="Andrew Morton">

<p>

Comments and testing results:

</p><p>

<ul>
 
<li>Measured a 2 millisec hit in lat_tcp and 4 millisec in bw_tcp. This was
  in tasklet context :(</li>
 
<li>The 5 millisec pc_keyb X startup thing is still there.  My vote is to
  leave it alone.</li>
 
<li>You need to add a reschedule in ipc/shm.c:shm_free().  Quintela's
  shm-stress will show why.</li>
 
<li>

<p>mmap002 is killed by the VM when running `amlat' at 1024 Hz to create
  scheduling pressure.

</p><p>

  This is changed behaviour: it does not happen without this patch.</p>

</li>
 
<li>

<p>In swap_out_mm() it is not correct to restart the vma scan after
  scheduling.  If the current vma will take over a millisec to scan,
  and there is a process being scheduled once per millisec, swap_out_mm
  will never terminate.  This is fairly easy to demonstrate with
  mmap002.

</p><p>

  The fix is to resume the scan from `vma-&gt;vm_start'.</p>

</li>
 
<li>mmap002 sometimes hangs when run under scheduling pressure. Probably
  due to the above problem.</li>
 
<li>Running Quintela's ipc001 and then running mmap002 causes 8 millisec
  scheduling holes.</li>
 
<li>`lilo' hangs when run under scheduling pressure if there are a lot of
  dirty blocks on the device. fsync_dev() changes.</li>
 
<li>If you run bonnie++ to create lots of slab cache entries and then run
  mmap002, the resulting call to kmem_cache_reap() causes basically
  unbounded scheduling delays. I observed ~6 millisecs. I didn't fix this
  either.</li>

<li>The patch significantly worsens bonnie++ figures.  The overall
  execution time went from 9:04 to 9:23 when run during 1024 Hz scheduling
  pressure. From 8:44 to 8:52 when run without scheduling pressure.</li>

</ul>

</p>

</quote>
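
<p>

Andrew's swap_out_mm() observation is a general property of any scan that
starts over after being preempted. The sketch below is a toy model of that
livelock, not the kernel function: the scan() helper, its "units" and the
max_steps cut-off used to detect non-termination are all assumptions.

</p>

<pre><![CDATA[
```python
def scan(length, quantum, resume, max_steps=10_000):
    """Scan `length` units of one vma, being preempted every `quantum` units.

    With resume=False the scan restarts from the beginning after each
    preemption; with resume=True it continues from the saved position.
    Returns the number of units of work done, or None if the scan had
    not finished after `max_steps` units (treated here as a livelock).
    """
    position = 0
    steps = 0
    while position < length:
        if steps >= max_steps:
            return None
        position += 1
        steps += 1
        if steps % quantum == 0 and position < length:
            # Preempted mid-scan: either keep our place or start over.
            if not resume:
                position = 0
    return steps
```
]]></pre>

<p>

A scan shorter than the scheduling interval finishes either way, which is
why the problem only appears under scheduling pressure. A scan longer than
the interval finishes only if it keeps its place.

</p>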

</section>

<section
  title="Per-User Resources In 2.4 And 2.5"
  subject="can't mlockall() more than 128MB, is this a kernel limitiation ?"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0008_01/msg00800.html"
  posts="16"
  startdate="05 Aug 2000 00:00:00 -0800"
  enddate="08 Aug 2000 00:00:00 -0800"
>

<p>

In the course of discussion, Alan Cox remarked, <quote who="Alan Cox">Right
now Linux isnt tracking per user resources. You need the beancounter addons
to implement per user memory.</quote> Andi Kleen qualified this, <quote who="Andi
Kleen">Actually test6-pre* seems to, at least for files and processes. See
linux/kernel/user.c</quote> and Robert H. de Vries also replied to Alan,
<quote who="Robert H. de Vries">I think Linus has just (test6-pre series)
put this facility in the kernel. See the new kernel/user.c</quote>

</p><p>

Alan replied, <quote who="Alan Cox">Yep - sort of a toy edition of
beancounter. Thats not knocking it - the full beancounter isnt 2.4
material..</quote> And Linus Torvalds also said to Robert:

</p>

<quote who="Linus Torvalds">

<p>

Yes and no.

</p><p>

The "new" user.c is not actually new at all.  It's the same old "struct
user_struct" that we've had for a long time, and that tracks the number of
processes a specific user has. You'll find the same "struct user_struct" in
linux-2.2 too - this is much older than the 2.3.x development tree.

</p><p>

The new thing is that it's just separated out - it used to be in
kernel/fork.c, and nothing else really knew about it. But it is basically
the same old code in a new location: kernel/user.c.

</p><p>

The only _new_ thing in the code is due to "future expansion" changes: the
"struct user_struct" thing has always had a reference counter, and that
reference counter was also used as the "nr of processes using this" counter:
they were one and the same. For future expansion, I split up the reference
counter and the process counter into two: they should currently always be
the same, but they won't be forever.

</p><p>

The reason? We can expand it to count more than just processes. And when we
do that, we'll need to have the reference counter be independent of the
things we count.

</p><p>

But no, it's not really new code, just a re-organization of something we've
had for a long time (along with bug-fixes: the stuff in kernel/sys.c are
real fixes for cases that could have caused us to ignore the process counts
completely under low memory circumstances. The new code will correctly
handle the case of not having enough memory to create a new virtual
user).

</p>

</quote>
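
<p>

The split Linus described, a reference counter kept separate from the
process counter, can be sketched as follows. This is a toy model and not
the contents of kernel/user.c: the UserStruct class and its method names
are assumptions, and the file counter is a hypothetical second resource
included only to show why the two counters have to be independent.

</p>

<pre><![CDATA[
```python
class UserStruct:
    """Toy model of per-user accounting with two independent counters."""

    def __init__(self, uid):
        self.uid = uid
        self.refcount = 0    # how many things keep this structure alive
        self.processes = 0   # how many processes the user is running
        self.files = 0       # hypothetical second resource, for illustration

    def get(self):
        self.refcount += 1

    def put(self):
        self.refcount -= 1
        return self.refcount == 0    # True means the structure can be freed

    def fork(self, limit):
        """Charge one process to the user, enforcing a per-user limit."""
        if self.processes >= limit:
            return False
        self.get()
        self.processes += 1
        return True

    def exit(self):
        self.processes -= 1
        return self.put()

    def open_file(self):
        """Hypothetical: a file also pins the structure but is not a process."""
        self.get()
        self.files += 1
```
]]></pre>

<p>

As long as processes are the only thing counted, refcount and processes
stay equal. As soon as something else takes a reference they diverge, and
a process limit checked against the reference count would be wrong.

</p>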

</section>

<section
  title="RAID Docs Out Of Date"
  subject="RAID questions"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0008_01/msg01083.html"
  posts="5"
  startdate="07 Aug 2000 00:00:00 -0800"
  enddate="08 Aug 2000 00:00:00 -0800"
>
<topic>Disk Arrays: RAID</topic>

<p>

Adam McKenna complained that documentation for software RAID was way out of
date and gave misleading and inaccurate information. Considering that he was
working on a stable kernel, this was very surprising to him, and he asked
for info on how to get software RAID working on the latest stable kernels.
Andrew Pochinsky gave a link to <a
href="http://people.redhat.com/mingo/raid-patches/raid-2.2.16-A0">http://people.redhat.com/mingo/raid-patches/raid-2.2.16-A0</a>,
which he said worked fine on a stock 2.2.16 tree, with the Red Hat 6.2 RAID
tools. Gregory Leblanc also replied to Adam, saying he'd started a FAQ that
went monthly to the Linux-RAID mailing list.

</p>

</section>

<section
  title="New Tool For Kernel Configuration"
  subject="A new config program -- anyone interested?"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0008_01/msg01126.html"
  posts="6"
  startdate="07 Aug 2000 00:00:00 -0800"
  enddate="08 Aug 2000 00:00:00 -0800"
>

<p>

Paul Vojta was fed up with the standard kernel configuration tools, since
'make config', 'make menuconfig', and 'make xconfig' were interactive by
nature. All he wanted was to be able to transfer the configuration of one
kernel to another, so he wrote the 'qconfig' program. Several people pointed
out that 'make oldconfig' would have given him the noninteractive configuration
he was after, and he later agreed that if he'd remembered 'make oldconfig', he
probably wouldn't have bothered with 'qconfig'. But he and Michael Elizabeth
Chastain also pointed out that 'qconfig' did have some significant
improvements over 'make oldconfig'. As Michael put it, <quote who="Michael
Elizabeth Chastain">With qconfig, if a variable is not in qconfig.in, and
Linus changes the default value for that variable in arch/$(ARCH)/defconfig,
qconfig will incorporate that change. oldconfig won't.</quote> And Paul
added, <quote who="Paul Vojta">If you diff the qconfig.out files, you find
out what questions have disappeared, and what questions have additional
options or changed status (e.g., no longer experimental, different text,
etc.). You're better able to track how your setup differs from the
default.</quote>

</p><p>

Michael added that it should be possible to implement 'qconfig' with a lot
less code, and posted a brief Makefile recipe:

</p>

<p>

<blockquote>

        # Makefile rules<br />
 
        qconfig:<br />

<blockquote>
                cat arch/$(ARCH)/defconfig qconfig.in &gt; .config<br />
                $(MAKE) oldconfig       # or copy the oldconfig rules here<br />
                diff arch/$(ARCH)/defconfig .config | awk {blah, blah, blah} ... &gt; qconfig.out<br />
</blockquote>

</blockquote>

</p>
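
<p>

The recipe relies on a later assignment overriding an earlier one once the
two files are concatenated. A minimal sketch of that merge-and-diff idea is
below. It is neither 'qconfig' nor 'make oldconfig': the parse_config() and
qconfig() helpers are hypothetical, and "last assignment wins" is an
assumption about how the concatenated file is read.

</p>

<pre><![CDATA[
```python
NOT_SET = " is not set"


def parse_config(text):
    """Parse CONFIG_FOO=value lines; a later assignment overrides an earlier one."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET):
            # "# CONFIG_FOO is not set" is how a disabled option is recorded.
            settings[line[2:-len(NOT_SET)]] = "n"
        elif line and not line.startswith("#") and "=" in line:
            name, value = line.split("=", 1)
            settings[name] = value
    return settings


def qconfig(defconfig_text, overrides_text):
    """Merge overrides onto the defaults and report what differs."""
    defaults = parse_config(defconfig_text)
    merged = parse_config(defconfig_text + "\n" + overrides_text)
    changed = {k: v for k, v in merged.items() if defaults.get(k) != v}
    return merged, changed
```
]]></pre>

<p>

Because the defaults are read fresh each time, a changed default for an
option that is not overridden shows up in the merged result, which is the
advantage Michael described.

</p>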

</section>

<section
  title="&quot;Heap Of Bugs&quot; Found In 2.4 Drivers"
  subject="[PATCH] checking kmalloc, init_etherdev and other fixes"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0008_01/msg01155.html"
  posts="8"
  startdate="07 Aug 2000 00:00:00 -0800"
  enddate="08 Aug 2000 00:00:00 -0800"
>

<mention>David S. Miller</mention>
<mention>Linus Torvalds</mention>

<p>

Arnaldo Carvalho de Melo posted a patch and explained, <quote who="Arnaldo
Carvalho de Melo">This patch mostly includes checks for kmalloc and
init_etherdev in the net drivers, but also fixes some bugs on some drivers,
please take a look and consider applying.</quote> David S. Miller pointed out
some problems with the patch, and Arnaldo posted a corrected patch. Jeff
Garzik replied, <quote who="Jeff Garzik">You have definitely found a heap of
bugs. The patch does need a little work though.</quote> He went on to
describe various problems with the patch, and apparently Linus Torvalds also
worked on it a bit with Arnaldo in private.

</p>

</section>

<section
  title="SGI Starts &quot;Linux Test Project&quot; Testing Suite"
  subject="[Announce] Linux Test Project"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0008_02/msg00115.html"
  posts="22"
  startdate="08 Aug 2000 00:00:00 -0800"
  enddate="12 Aug 2000 00:00:00 -0800"
>
<topic>SMP</topic>
<topic>User-Mode Linux</topic>

<mention>Horst von Brand</mention>

<p>

Nathan Straz announced:

</p>

<quote who="Nathan Straz">

<p>

SGI would like to announce the Linux Test Project.  The goal of this project
is to create a formalized test system for the Linux kernel.

</p><p>

We have released a set of 96 tests on the project's website (<a
href="http://oss.sgi.com/projects/ltp/">http://oss.sgi.com/projects/ltp/</a>).
These tests exercise file systems and system calls and can be used for
stress testing or sanity tests.

</p><p>

We would like to discuss the following topics with the community.

</p><p>

<ol>

<li>The testing philosophy that is most important to the kernel developers.
What approach best fits the development process? Regression? Functional?
Stress? Performance?</li>
 
<li>What is needed immediately?  Building a test suite for the kernel is
going to take time. What tests or tools are most important?</li>
 
<li>We need to plan a development road map that works with the Linux kernel
development road map.</li>

</ol>

</p><p>

We are hoping to hold an unscheduled BOF at LinuxWorld on Wednesday. Aaron
Laffin and Richard Logan will be there to discuss these issues. If you are
interested in testing and are attending LinuxWorld, please keep an eye open
for our BOF.

</p>

</quote>

<p>

A lot of folks cheered this idea, and gave feedback. Jeff Garzik felt that
regression testing (testing old features to make sure new ones don't break
stuff) and stress testing would be the two most important things to work on,
with regression having first priority, adding, <quote who="Jeff
Garzik">Regression testing provides more stability of interface and code in
the long run. Stress testing tools tend to focus on a few specific areas of
the code, and be completely inadequate for covering certain cases.</quote>
He also suggested culling and unifying test suites from the various
distribution vendors, since they each seemed to have their own unique set of
tests. Horst von Brand added that a good regression suite would pretty much
be dependent on an existing functional test suite, so he recommended putting
functional tests first in priority.

</p><p>

Jeff Dike also suggested, <quote who="Jeff Dike">Coverage isn't mentioned.
If you are interested in doing a coverage test suite, then you should look
into using gcov in conjunction with user-mode linux (<a
href="http://user-mode-linux.sourceforge.net">http://user-mode-linux.sourceforge.net</a>).
I've done this in the past, and it works just fine.</quote> Nathan felt that
a coverage suite wouldn't do much for the Linux kernel, since they wanted to
test functionality more than the code itself. But he added that if they did
decide to start covering code, user-mode Linux would be the way to go.

</p><p>

Andi Kleen also suggested that a suite to test common system calls in
parallel on multiple CPUs could catch locking bugs that might have been
introduced by 2.4's SMP scaling work. Nathan replied, <quote who="Nathan
Straz">You should check out the tests we just released. They are
"quickhitters" which are very simple tests that exercise system calls. You
can run quickhitters with a "-c n" option which creates n copies of the test
and runs them simultaneously.</quote> He added that these tests had not been
created with Andi's idea in mind, and tended to interfere with each other,
but he felt a fix would be possible, and offered to supply more
information to anyone interested in working on it.

</p><p>

David Mansfield also asked how the test suite would deal with tests that
caused pathological behavior in the kernel, <quote who="David Mansfield">for
example, 'infinite' hangs in the MM system during OOM, or crashes (OOPSes,
panics) or deadlocks (process stuck in 'D' state).</quote> All the tests he
normally performed on kernels, he went on, involved situations like these,
and he felt that any test suite would have to deal with them somehow. Pavel
suggested running those tests in user-mode Linux, which would keep the
machine up even if the user-mode kernel crashed. Nathan replied to David,
<quote who="Nathan Straz">We definitely need to build into the framework a
way to recover from problems like this. If this will be some type of
automated reboot, or someone walking in and rebooting the machine manually, I
don't know. My goals are to get the system as automated as possible. It may
turn out that we will include these tests as manual tests for
completeness.</quote> Andi replied to this, suggesting the "software
watchdog", which would reboot the system if its daemon failed to write to a
specific file at regular intervals. He went on, <quote who="Andi Kleen">I
would recommend configuring the software watchdog before running any
critical tests. The test procedure could also use a simple log mechanism
(write a START TEST record to a log file, fsync it) and a restart mechanism
that tries to figure out any crashes so they can be logged.</quote>

</p>
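
<p>

The log-and-restart mechanism Andi suggested might look something like the
sketch below. It is not part of the Linux Test Project: the record format
and the log_record(), run_test() and interrupted_tests() helpers are
assumptions. The one essential step is the fsync after the START record,
so that the record is on disk before the test gets a chance to take the
machine down.

</p>

<pre><![CDATA[
```python
import os


def log_record(path, record):
    """Append one record and force it to disk before the test proceeds."""
    with open(path, "a") as log:
        log.write(record + "\n")
        log.flush()               # push Python's buffer to the OS
        os.fsync(log.fileno())    # ask the OS to write its cache to disk


def run_test(path, name, test):
    log_record(path, "START " + name)
    test()                        # may hang or crash the machine
    log_record(path, "END " + name)


def interrupted_tests(path):
    """After a reboot: tests with a START record but no END record."""
    started, finished = [], set()
    with open(path) as log:
        for line in log:
            kind, _, name = line.strip().partition(" ")
            if kind == "START":
                started.append(name)
            elif kind == "END":
                finished.add(name)
    return [name for name in started if name not in finished]
```
]]></pre>

<p>

After the watchdog reboots the machine, a restart script can call
interrupted_tests() on the log and record each name it returns as a crash
or hang.

</p>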

</section>

<section
  title="Linux 2.2.17pre16"
  subject="Linux 2.2.17pre16"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0008_02/msg00367.html"
  posts="13"
  startdate="10 Aug 2000 00:00:00 -0800"
  enddate="14 Aug 2000 00:00:00 -0800"
>
<topic>Networking</topic>
<topic>PCI</topic>
<topic>Sound: i810</topic>
<notopic>Clustering: Beowulf</notopic>

<mention>Paul Mackerras</mention>
<mention>Jan Harkes</mention>
<mention>Andrey Savochkin</mention>
<mention>Pontus Fuchs</mention>
<mention>Marcelo Tosatti</mention>
<mention>Bill Nottingham</mention>
<mention>Dave Jones</mention>
<mention>Donald Becker</mention>
<mention>Andi Kleen</mention>
<mention>Patrick van de Lageweg</mention>



<p>

Alan Cox posted the CHANGELOG for 2.2.17pre16:

</p>

<quote who="Alan Cox">

<p>

2.2.17pre16

</p><p>

<ol>

<li>Thinkpad hacks and external amp support for CS46xx, also fix mono    (Bill Nottingham, me, David Kaiser) </li>
<li>Actually fix i810 audio hangs and other stuff   (me)</li>
<li>Dave Jones addr change                          (Dave Jones)</li>
<li>Fix long standing vm hang bug                   (Marcelo Tosatti)</li>
<li>Fix irda memory leak                            (Pontus Fuchs)</li>
<li>Minor further PPC fixes                         (Paul Mackerras)</li>
<li>Fix PCI id ordering                             (Paul Mackerras)</li>
<li>3Ware corrected update                          (Adam Radford Joel Jacobson)</li>
<li>Fix stale documentation in proc.txt             (Paonia Ezrine)</li>
<li>Fix the TCP/vm bug nicely                       (Andi Kleen)</li>
<li>Add 3c556 support to the 3c59x driver           (Andrew Morton)</li>
<li>Switch eepro100 to I/O mode pending investigation (Andrey Savochkin)</li>
<li>Fix 'Donald Duck impressions' in ES1879 audio   (Bruce Forsberg)</li>
<li>CODA fs fixes for 2.2.17pre                     (Jan Harkes)</li>
<li>RIO serial driver update                        (Patrick van de Lageweg)</li>
<li>Minimal version of the at1700 fix [From Hiroaki Nagoya's original stuff]              (Brian S. Julin)</li>
<li>Typo fix in sysctl vm docs                      (Dave Jones)</li>
<li>DAC960 update to rev 2.2.7                      (Leonard Zubkoff)</li>

</ol>

</p>

</quote>

<p>

To item 11 ("Add 3c556 support to the 3c59x driver"), Andrew Morton
corrected:

</p>

<quote who="Andrew Morton">

<p>

Support is partial because the 3c59x.c in kernel 2.2 does not support power
management. A moderate amount of mangling will be needed to make it do so.

</p><p>

The workaround is to add something like the following to your power
management `resume' script:

</p><p>

        ifdown eth0<br />
        rmmod 3c59x<br />
        modprobe 3c59x<br />
        ifup eth0

</p><p>

The 3c556 is also supported by Donald Becker's driver (<a
href="http://www.scyld.com">http://www.scyld.com</a>). Although that driver
does support power management, it does not yet do so for the 3c556.

</p><p>

Another variant of this device has been reported.  It has a PCI device ID of
0x6056. It has not yet responded to resuscitation attempts.

</p><p>

Additional details are on Fred Maciel's page at <a
href="http://www2.neweb.ne.jp/wd/fbm/3c556">http://www2.neweb.ne.jp/wd/fbm/3c556</a>

</p>

</quote>

</section>

<section
  title="Linux-2.4.0-test6"
  subject="Linux-2.4.0-test6"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0008_02/msg00401.html"
  posts="6"
  startdate="09 Aug 2000 00:00:00 -0800"
  enddate="10 Aug 2000 00:00:00 -0800"
>
<topic>Disks: IDE</topic>
<topic>FS: UMSDOS</topic>
<topic>FS: ext2</topic>
<topic>Kernel Release Announcement</topic>
<topic>Networking</topic>
<topic>PCI</topic>

<p>

Linus Torvalds announced Linux 2.4.0-test6, saying:

</p>

<quote who="Linus Torvalds">

<p>

Ok, test6 is there now:

</p><p>

Changes in test6:

</p><p>

<ol>

<li>speling fixces.</li>
<li>fix drm/agp initialization issue</li>
<li>saner modules installation
        (*) NOTE! This may/will break some module setups.  Files go in
        different places. Better places.</li>
<li>per-CPU irq count area. Better for caches, simpler code.</li>
<li>"mem_map + MAP_NR(x)" =&gt; virt_to_page(x)
        (*) Purely syntactic change at this point. NUMA memory handling
        will take advantage of this during 2.5.x</li>
<li>page_address() returns (void *) to make it clearer that it is a
   virtual address (it's the reverse of "virt_to_page()", see above).</li>
<li>zimage builds should work again.</li>
<li>Make current gcc's able to compile the kernel.</li>
<li>fix irq probing in IDE driver: this caused strange irq problems for
   other drivers later on (notably PCMCIA, which is one of the few
   drivers to still probe for ISA interrupts on modern machines).</li>
<li>Intel microcode update update.</li>
<li>mips/mips64/sh/sparc/sparc64/acorn updates</li>
<li>DAC960 driver update</li>
<li>floppy shouldn't scream on open/close</li>
<li>console driver does correct palette setting.  No more black screens
   with XF86-4.x</li>
<li>ISDN updates</li>
<li>PCI layer can assign resources from multiple IO and memory windows</li>
<li>yenta_socket driver no longer oopsable on unload.</li>
<li>flush_dcache_page() for more virtual dcache coherency issues</li>
<li>ext2_get_block() races fixed</li>
<li>jffs bugfixes galore.</li>
<li>user resource tracking infrastructure re-organization.</li>
<li>umsdos works again.</li>
<li>loopback shouldn't deadlock</li>
 
</ol>

</p><p>

Tons of small stuff. Holler if there's something bad.

</p>

</quote>

</section>

<section
  title="NT/HFS-Style Multiple &quot;Resources&quot; In A Single File"
  subject="NTFS-like streams?"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0008_02/msg00723.html"
  posts="332"
  startdate="11 Aug 2000 00:00:00 -0800"
  enddate="15 Aug 2000 00:00:00 -0800"
>
<topic>Extended Attributes</topic>
<topic>FS: NFS</topic>
<topic>FS: NTFS</topic>
<topic>POSIX</topic>

<mention>Pavel Machek</mention>

<p>

Christopher Vickery suggested an NT-like feature, whereby a single file
could have several streams of data, each of which could be operated on as a
unique file. Many people opposed this, and various alternatives were
suggested, such as simply using directories with multiple files. In one of
the many subthreads branching off of this post, Michael Rothwell pointed out
that no such thing yet existed for Linux, and explained, <quote
who="Michael Rothwell">There's two different ways of doing it currently; the
BeOS way and the NT way. As you said, NT makes a namespace augmentation,
using the ":" character to delineate attribute names from file names. This
is called "named streams". BeOS does not do that, but provides special
accessor functions instead; this is called "extended attributes." They both
accomplish the same goal though: keeping extra data about a file with the
file.</quote> Linus Torvalds replied:

</p>

<quote who="Linus Torvalds">

<p>

Note that this is a subset of what I wanted to make sure the Linux VFS layer
can do: if a filesystem has multiple forks in a file, the VFS layer should
be able to handle it by just doing the normal "readdir()" and "lookup()" on
such regular files.

</p><p>

Of course, no UNIX filesystem does this, so it has never gotten any testing.
But the plan was (and is) that if somebody wants to implement resource
forks, then it should be possible without any hackery.

</p><p>

Linux does _not_ use the ":" character, of course. Linux uses the same old
"/" that it always uses for delineating names. That's pretty built-in into
the VFS layer.

</p><p>

But it definitely should not be impossible to have a file called

</p><p>

        ~/myfile

</p><p>

and then access the "Icon" resource in it by just doing

</p><p>

        xv ~/myfile/Icon

</p><p>

It requires that the low-level FS know what it is doing, and it may require
some changes (small) to the VFS layer just because it has never been done
before (and I'd be surprised if such resource forks didn't uncover
_something_), but it should be entirely doable.

</p>

</quote>

<p>

When the "use directories" argument was put forward again, Linus drove his
point home:

</p>

<quote who="Linus Torvalds">

<p>

I'll talk really slowly.

</p><p>

HFS has resource forks.  They are not directories.  Linux cannot handle them
well.

</p><p>

I'm all for handling HFS resource forks. It's called "interoperability".

</p><p>

It's also realizing that maybe, just maybe, UNIX didn't invent every clever
idea out there. Maybe, just maybe, resource forks are actually a good idea.
And maybe we shouldn't just say "Oh, UNIX already has directories, we don't
need no steenking resource forks".

</p><p>

Put this another way: don't think about "directories vs resource forks" at
all. Instead, think about the problem of supporting something like HFS or
NTFS _well_ from Linux. How would you do it?

</p><p>

Suggestions welcome. What's your interface of choice for a filesystem like
HFS that _does_ have resource forks? Whether you like them or not is
completely immaterial - they exist.

</p><p>

And usability concerns _are_ real concerns. I'm claiming that the best
interface for such a filesystem would be

</p><p>

        open("file", O_RDONLY)          - opens the default fork<br />
        open("file/Icon", O_RDONLY)     - opens the Icon fork<br />
        open("file/Creator"...<br />
        readdir("file")                 - lists the resources that the file has

</p><p>

and I'm also claiming that the Linux VFS layer actually shouldn't have any
fundamental problems with something like this.

</p><p>

Tell me why we shouldn't do it like the above? And DON'T give any crap about
whether resource forks are useful or not, because I claim that they exist
regardless of their usefulness and that we shouldn't just put our heads in
the sand and try to hope that the issue doesn't exist.

</p>

</quote>
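
<p>

The interface Linus describes can be mimicked with a toy, in-memory model.
This is only a sketch to make the naming scheme concrete: ToyFS and
ForkedFile are made-up names, nothing here touches the real VFS, and it
handles only flat file names (no real directories).

</p>

<p><![CDATA[
```python
class ForkedFile:
    """A file with a default data fork plus named resource forks."""

    def __init__(self, data, forks=None):
        self.data = data
        self.forks = dict(forks or {})


class ToyFS:
    """In-memory stand-in for a filesystem that has resource forks."""

    def __init__(self):
        self.files = {}

    def open(self, path):
        """Return the default fork for 'name', or a named fork for 'name/Fork'."""
        name, _, fork = path.partition("/")
        entry = self.files[name]      # KeyError if the file does not exist
        if fork == "":
            return entry.data         # plain name: the default fork
        return entry.forks[fork]      # KeyError if the fork does not exist

    def readdir(self, name):
        """List the resource forks attached to a file."""
        return sorted(self.files[name].forks)


if __name__ == "__main__":
    fs = ToyFS()
    fs.files["myfile"] = ForkedFile(
        b"document body", {"Icon": b"icon bits", "Creator": b"xv"}
    )
    print(fs.open("myfile"))          # b'document body'
    print(fs.open("myfile/Icon"))     # b'icon bits'
    print(fs.readdir("myfile"))       # ['Creator', 'Icon']
```
]]></p>

<p>

The point of the model is that "myfile" answers both to open() as an
ordinary file and to readdir() as if it were a directory, which is exactly
the mixing that the objections below are about.

</p>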

<p>

At one point in the discussion, Alexander Viro objected, <quote
who="Alexander Viro">POSIX has a lot of nasty words about mixing files and
directories. And I'm afraid that saying "no, foo is file, it just happens to
have children" won't work - that way you are going to screw a lot of
userland stuff.</quote> To which Linus replied, <quote who="Linus
Torvalds">Note that NFS isn't strictly a POSIX filesystem. And certainly
neither is MSDOSfs or /proc. Not being POSIX doesn't mean that they are
useless.</quote>

</p><p>

Pavel Machek and others worried that supporting file "resources" in this way
would break a lot of userland apps, but Linus countered:

</p>

<quote who="Linus Torvalds">

<p>

I don't think this is a strong argument. Any program that "knows" that it is
handling a POSIX filesystem and simply does part of the work itself is
always going to break on extensions. That's just unavoidable. Adding the
magic string at the end makes "xv" happy, but might easily make something
else that assumes POSIX behaviour unhappy instead (ie somebody else does
'stat("myfile#utar")' and is unhappy because it doesn't exist).

</p><p>

Tough. Whatever we do, complex files are going to act differently from
regular files. Even a HFS approach that looks _exactly_ like a UNIX
filesystem will confuse programs that get unhappy when the resource files
magically disappear when the non-resource file is deleted.

</p>

</quote>

<p>

He went on:

</p>

<quote who="Linus Torvalds">

<p>

I'm personally worried not about individual programs not being able to take
advantage of the resources, but about Linux fundamentally not _supporting_
the notion of resources at all.

</p><p>

So what I want to make sure is that Linux supports the infrastructure for
people to take advantage of resource forks. The fact that not everybody is
going to be able to do so automatically is not my problem.

</p><p>

Put another way: I suspect that we won't support resource forks natively for
another few years, and HFS etc will have their own specialized stuff. I
don't care all that much. But at the same time I do believe that eventually
we'll probably have to handle it. And at _that_ point I care about the fact
that our internal design has to be robust. It doesn't have to make everybody
happy, but it has to be clean both conceptually and from a pure
implementation standpoint. I don't want a "hack that works".

</p>

</quote>

<p>

There were a lot of other implementation concerns from various folks, and
the discussion continued for a good while, as various issues were hashed
out.

</p>

</section>

<section
  title="USB Initialization Cleanup"
  subject="USB initialisation"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0008_02/msg00916.html"
  posts="3"
  startdate="12 Aug 2000 00:00:00 -0800"
  enddate="12 Aug 2000 00:00:00 -0800"
>
<topic>USB</topic>

<p>

Russell King reported:

</p>

<quote who="Russell King">

<p>

On one of the ARM platforms, we have encountered a problem with the order of
initialisation of USB vs the initial "bus"-type hardware setup.

</p><p>

Since Linus doesn't like new init calls going into init/main.c, the
initialisation of a chip which has PCMCIA and USB hardware incorporated is
placed at the head of the initcall list.

</p><p>

However, the USB drivers (OHCI) are initialised before this time by an
explicit call in init/main.c.

</p><p>

Can we initialise the USB hardware drivers via the initcall method, or is
there some reason why it's done the way it is?

</p>

</quote>

<p>

Linus Torvalds replied, <quote who="Linus Torvalds">I would _much_ prefer to
have the USB drivers fully initialized with "initcalls()". The only reason
it's done like it is right now is that I suspect the USB maintainers didn't
realize that you can fully order the initcall sequence by just changing the
link order. If I get a patch that removes "usb_init()" from init/main.c,
I'll apply it right away.</quote> Russell posted a preliminary patch, though
he suggested running it by the USB folks first; and the thread ended.

</p>
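
<p>

Linus's remark about link order can be pictured with a small model. This is
a loose sketch, not kernel code: the object and function names are invented,
and the "linker" here is just list concatenation. The idea it illustrates is
that each object contributes its initcalls to one table in the order the
objects are linked, and the table is then walked front to back.

</p>

<p><![CDATA[
```python
def link(objects):
    """Concatenate each object's initcalls in the order the objects are linked."""
    table = []
    for _name, initcalls in objects:
        table.extend(initcalls)
    return table


def do_initcalls(table):
    """Run every initcall in table order and return the names that ran."""
    trace = []
    for call in table:
        call(trace)
    return trace


def bus_chip_init(trace):
    # Hypothetical setup for a chip that carries both PCMCIA and USB hardware.
    trace.append("bus_chip_init")


def usb_ohci_init(trace):
    # The hypothetical OHCI driver needs the bus chip to be set up first.
    if "bus_chip_init" not in trace:
        raise RuntimeError("OHCI initialised before its bus chip")
    trace.append("usb_ohci_init")


if __name__ == "__main__":
    good = link([("arch.o", [bus_chip_init]), ("usb.o", [usb_ohci_init])])
    print(do_initcalls(good))     # ['bus_chip_init', 'usb_ohci_init']

    bad = link([("usb.o", [usb_ohci_init]), ("arch.o", [bus_chip_init])])
    try:
        do_initcalls(bad)
    except RuntimeError as err:
        print(err)                # OHCI initialised before its bus chip
```
]]></p>

<p>

In this model, swapping the two entries passed to link() is the only change
needed to fix the ordering, with no special-case call required anywhere
else.

</p>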

</section>

<section
  title="2.4.0-test7-pre3 And ChangeLogs"
  subject="test7-pre3"
  archive="http://kernelnotes.org/lnxlists/linux-kernel/lk_0008_02/msg01025.html"
  posts="3"
  startdate="12 Aug 2000 00:00:00 -0800"
  enddate="14 Aug 2000 00:00:00 -0800"
>
<topic>FS: FAT</topic>
<topic>Kernel Release Announcement</topic>
<topic>Networking</topic>
<topic>PCI</topic>
<topic>Sound: i810</topic>
<topic>USB</topic>

<mention>Chris Good</mention>

<p>

Linus Torvalds announced 2.4.0-test7-pre3, saying:

</p>

<quote who="Linus Torvalds">

<p>

Trying something new: keeping rudimentary change-logs. I should keep this up
until final 2.4.0. Watch me.

</p><p>

test7:

</p><p>

<ul>

<li>pre1:
    <ul>
    <li>fix PCI resource bug that crept in in test6 due to the new
      requirements to handle multiple bus regions transparently</li>
    <li>ll_rw_block documentation</li>
    <li>sound driver module counting bugfix and cleanup (move to named
      initializers)</li>
    <li>directory rename bug fix for busy directories (oops)</li>
    <li>allow "init_new_context()" to fail - it can do so on some
      architectures when out of memory.</li>
    <li>networking updates - TCP retransmission and ordering logic</li>
    <li>fix strsep(). Not that anybody cared.</li>
    </ul>
</li>
<li>pre2:
    <ul>
    <li>fix modversions.h generation ("make -j dep" works now)</li>
    <li>finish 64-bit VFS: getdents64 and fcntl64 (getdents64 also adds
      the "file type" to the readdir data - VFS layer change.  fcntl64
      allows 64-bit file locking)</li>
    <li>Intel i810 watchdog driver and NS DP83810 network driver</li>
    <li>dup2() cannot screw up the file table with threads any more.</li>
    </ul>
</li>
<li>pre3:
    <ul>
    <li>nfs_commit_rpcsetup() signed comparison bugfix and cleanup</li>
    <li>sparc updates and TLB invalidation fix</li>
    <li>networking updates (less verbose on the new reordering messages)</li>
    <li>network driver Makefile cleanup</li>
    <li>Fix segment copy on fork.</li>
    <li>tsk-&gt;files race fixes: close-on-exec etc.</li>
    <li>sound #define cleanups</li>
    <li>fs/proc/array.c task_lock cleanup</li>
    </ul>
</li>

</ul>

</p>

</quote>

<p>

He replied to himself a couple of days later with pre4:

</p>

<quote who="Linus Torvalds">

<p>

<ul>

<li>pre4:
    <ul>
    <li>"USE_STANDARD_AS_RULE" - generic Rules.make as rule</li>
    <li>arm update (arch/arm, asm-arm, drivers/acorn, Documentation/arm etc)</li>
    <li>eicon ISDN driver update (big).</li>
    <li>serial.c warnings removal.</li>
    <li>compilation fixes under different configurations..</li>
    <li>bounds checking for hpfs code page index.</li>
    <li>sparc64 bugfix for atomic_dec_and_lock. Oops. And use flock64.</li>
    <li>FAT missed the d_type thing from readdir.</li>
    <li>fix tsk-&gt;files race fixes from -pre3 ("struct files_struct", not
      "struct file" and make sure to register the socket fs before we
      use a pointer to it)</li>
    <li>ns558.c: don't leave the driver registered after a failed module
      load.  Either return success, or unregister the PCI driver. And
      don't leak IO port allocations.</li>
    <li>USB OHCI controller fixes for oopses due to races..</li>
    <li>usb updates</li>
    <li>3c59x driver update</li>
    <li>VIA KX-133/KT-133 chipset detection and AGP bridge support</li>
    <li>raid/raw-io cleanup: use generic_make_request instead of ll_rw_block.</li>
    <li>Emu10k1 sound driver update</li>
    </ul>
</li>

</ul>

</p>

</quote>
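
<p>

The "file type" that getdents64 adds to the readdir data (the d_type that
FAT had missed) is the sort of information that lets a program classify
directory entries without a separate stat() call on each one. As a rough
userland illustration, Python's os.scandir() uses the type information from
the directory entry when the operating system provides it. The helper name
and the temporary directory layout below are made up for the example.

</p>

<p><![CDATA[
```python
import os
import tempfile


def list_with_types(directory):
    """Return sorted (name, kind) pairs using the type info readdir supplies."""
    result = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                kind = "dir"
            elif entry.is_file(follow_symlinks=False):
                kind = "file"
            else:
                kind = "other"
            result.append((entry.name, kind))
    return sorted(result)


if __name__ == "__main__":
    top = tempfile.mkdtemp()
    os.mkdir(os.path.join(top, "sub"))
    with open(os.path.join(top, "notes.txt"), "w") as handle:
        handle.write("hello\n")
    print(list_with_types(top))   # [('notes.txt', 'file'), ('sub', 'dir')]
```
]]></p>

<p>

If a filesystem does not fill in the type, os.scandir() falls back to
asking for it separately, so the result is the same but slower.

</p>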

<p>

Chris Good suggested putting the latest changes at the top of the announcements...

</p><p>

<editorialize>I think ChangeLogs are a great improvement, especially over
what we had before, which was, basically, no announcements at all... I doubt
we'll see ChangeLogs once we hit 2.5, but hopefully someone will give
Linus a nudge to start up again when we get close to 2.6.</editorialize>

</p>

</section>

</kc>
