There is also an optional daemon that may be used with devfs. You can find out more about it at: http://www.atnf.csiro.au/~rgooch/linux/
NEWSFLASH: The official 2.3.46 kernel has included the devfs patch. Future patches will be released which build on this. These patches are rolled into Linus' tree from time to time.
A mailing list is available which you may subscribe to. Send email to majordomo@oss.sgi.com with the following line in the body of the message:
    subscribe devfs
The list is archived at http://oss.sgi.com/projects/devfs/archive/.
NOTE that devfs is entirely optional. If you prefer the old disc-based device nodes, then simply leave CONFIG_DEVFS_FS=n (the default). In this case, nothing will change. ALSO NOTE that if you do enable devfs, the defaults are such that full compatibility is maintained with the old device names.
There are two aspects to devfs: one is the underlying device namespace, which is a namespace just like any mounted filesystem. The other aspect is the filesystem code which provides a view of the device namespace. The reason I make a distinction is because devfs can be mounted many times, with each mount showing the same device namespace. Changes made are global to all mounted devfs filesystems. Also, because the devfs namespace exists without any devfs mounts, you can easily mount the root filesystem by referring to an entry in the devfs namespace.
The cost of devfs is a small increase in kernel code size and memory
usage. About 7 pages of code (some of that in __init sections) and 72
bytes for each entry in the namespace. A modest system has only a
couple of hundred device entries, so this costs a few more
pages. Compare this with the suggestion to put /dev on a ramdisc.
On a typical machine, the cost is under 0.2 percent. On a modest
system with 64 MBytes of RAM, the cost is under 0.1 percent. The
accusations of "bloatware" levelled at devfs are not justified.
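The figures above can be checked with a little arithmetic. The sketch below assumes a 4096-byte page (the usual x86 page size; the text does not state one) and takes "a couple of hundred" to mean 200 entries. The function name is purely illustrative.

```python
PAGE_SIZE = 4096          # assumption: x86 page size in bytes
CODE_PAGES = 7            # from the text: about 7 pages of code
BYTES_PER_ENTRY = 72      # from the text: 72 bytes per namespace entry


def devfs_cost_bytes(entries: int) -> int:
    """Return the approximate memory cost of devfs in bytes."""
    return CODE_PAGES * PAGE_SIZE + entries * BYTES_PER_ENTRY


def cost_percent(entries: int, ram_bytes: int) -> float:
    """Return the devfs cost as a percentage of total RAM."""
    return 100.0 * devfs_cost_bytes(entries) / ram_bytes
```

With 200 entries and 64 MBytes of RAM, cost_percent(200, 64 * 1024 * 1024) comes out well under 0.1, in line with the claim above.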
The choice is between a patchwork of inefficient user-space solutions, which are complex and likely to be fragile, and a simple and efficient devfs, which is robust.
There have been many counter-proposals to devfs, all seeking to provide some of the benefits without actually implementing devfs. So far there has been an absence of code and no proposed alternative has been able to provide all the features that devfs does. Further, alternative proposals require far more complexity in user-space (and still deliver less functionality than devfs). Some people have the mantra of reducing "kernel bloat", but don't consider the effects on user-space.
A good solution limits the total complexity of kernel-space and user-space.
    host           6 bits  (say up to 64 hosts on a really big machine)
    channel        4 bits  (say up to 16 SCSI buses per host)
    id             4 bits
    lun            3 bits
    partition      6 bits
    TOTAL         23 bits
This requires 8 Mega (1024*1024) inodes if we want to store all possible device nodes. Even if we scrap everything but id,partition and assume a single host adapter with a single SCSI bus and only one logical unit per SCSI target (id), that's still 10 bits or 1024 inodes. Each VFS inode takes around 256 bytes (kernel 2.1.78), so that's 256 kBytes of inode storage on disc (assuming real inodes take a similar amount of space as VFS inodes). This is actually not so bad, because disc is cheap these days. Embedded systems would care about 256 kBytes of /dev inodes, but you could argue that embedded systems would have hand-tuned /dev directories. I've had to do just that on my embedded systems, but I would rather just leave it to devfs.
Another issue is the time taken to lookup an inode when first referenced. Not only does this take time in scanning through a list in memory, but also the seek times to read the inodes off disc. This could be solved in user-space using a clever programme which scanned the kernel logs and deleted /dev entries which are not available and created them when they were available. This programme would need to be run every time a new module was loaded, which would slow things down a lot.
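The inode counts quoted above follow directly from the field widths. Here is a minimal sketch of that arithmetic; the dictionary and function names are illustrative only, and the 256-byte inode size is the approximate figure given in the text.

```python
# Field widths in bits, taken from the table above.
FIELD_BITS = {"host": 6, "channel": 4, "id": 4, "lun": 3, "partition": 6}
VFS_INODE_BYTES = 256     # from the text: approximate VFS inode size


def node_count(fields):
    """Number of distinct device nodes addressable by the given fields."""
    return 2 ** sum(FIELD_BITS[name] for name in fields)


def inode_storage_bytes(fields):
    """Storage needed to hold one inode for every addressable node."""
    return node_count(fields) * VFS_INODE_BYTES
```

node_count(FIELD_BITS) covers all 23 bits, while node_count(["id", "partition"]) covers the reduced 10-bit case with a single host, bus and logical unit.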
There is an existing programme called scsidev which will automatically create device nodes for SCSI devices. It can do this by scanning files in /proc/scsi. Unfortunately, to extend this idea to other device nodes would require significant modifications to existing drivers (so they too would provide information in /proc). This is a non-trivial change (I should know: devfs has had to do something similar). Once you go to this much effort, you may as well use devfs itself (which also provides this information). Furthermore, such a system would likely be implemented in an ad-hoc fashion, as different drivers will provide their information in different ways.
Devfs is much cleaner, because it (naturally) has a uniform mechanism to provide this information: the device nodes themselves!
With the current 8 bit major and minor numbers the connection between disc-based c&b nodes and per-major drivers is done through a fixed-length table of 128 entries. The various filesystem types set the inode operations for c&b nodes to {chr,blk}dev_inode_operations, so when a device is opened a few quick levels of indirection bring us to the driver file_operations.
For miscellaneous character devices a second step is required: there is a scan for the driver entry with the same minor number as the file that was opened, and the appropriate minor open method is called. This scanning is done *every time* you open a device node. Potentially, you may be searching through dozens of misc. entries before you find your open method. While not an enormous performance overhead, this does seem pointless.
Linux *must* move beyond the 8 bit major and minor barrier, somehow. If we simply increase each to 16 bits, then the indexing scheme used for major driver lookup becomes untenable, because the major tables (one each for character and block devices) would need to be 64 k entries long (512 kBytes on x86, 1 MByte for 64 bit systems). So we would have to use a scheme like that used for miscellaneous character devices, which means the search time goes up linearly with the average number of major device drivers on your system. Not all "devices" are hardware, some are higher-level drivers like KGI, so you can get more "devices" without adding hardware. You can improve this by creating an ordered (balanced:-) binary tree, in which case your search time becomes log(N). Alternatively, you can use hashing to speed up the search. But why do that search at all if you don't have to? Once again, it seems pointless.
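The table sizes quoted above can be reproduced as follows. This sketch assumes that each table entry holds two pointers (for example a name and a file_operations pointer); that layout is an assumption chosen because it matches the figures in the text, not something the text states.

```python
def major_table_bytes(major_bits: int, pointer_bytes: int) -> int:
    """Size in bytes of one directly indexed major table.

    Assumption: each entry consists of two pointers.
    """
    entries = 2 ** major_bits
    return entries * 2 * pointer_bytes
```

With 16-bit majors, major_table_bytes(16, 4) gives the 32-bit x86 figure and major_table_bytes(16, 8) gives the 64-bit figure.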
Note that devfs doesn't use the major&minor system. For devfs entries, the connection is done when you lookup the /dev entry. When devfs_register() is called, an internal table is appended which has the entry name and the file_operations. If the dentry cache doesn't have the /dev entry already, this internal table is scanned to get the file_operations, and an inode is created. If the dentry cache already has the entry, there is *no lookup time* (other than the dentry scan itself, but we can't avoid that anyway, and besides Linux dentries cream other OS's which don't have them:-). Furthermore, the number of node entries in a devfs is only the number of available device entries, not the number of *conceivable* entries. Even if you remove unnecessary entries in a disc-based /dev, the number of conceivable entries remains the same: you just limit yourself in order to save space.
Devfs provides a fast connection between a VFS node and the device driver, in a scalable way.
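The lookup path described above can be modelled in a few lines. This is a toy illustration in Python, not kernel code: the class, its attributes and the use of a dictionary to stand in for the dentry cache are all assumptions made for the sake of the example.

```python
class DevfsSketch:
    """Toy model of the devfs lookup path (not kernel code)."""

    def __init__(self):
        self._table = []      # internal table of (name, file_operations) pairs
        self._dcache = {}     # stands in for the dentry cache
        self.table_scans = 0  # how many times the internal table was scanned

    def register(self, name, file_operations):
        """Model of devfs_register(): append an entry to the internal table."""
        self._table.append((name, file_operations))

    def lookup(self, name):
        """Return the file_operations for name, scanning only on a cache miss."""
        if name in self._dcache:
            return self._dcache[name]
        self.table_scans += 1
        for entry_name, fops in self._table:
            if entry_name == name:
                self._dcache[name] = fops   # inode created, entry now cached
                return fops
        raise FileNotFoundError(name)
```

Only the first lookup of a given name scans the table; later lookups are served from the cache, and the table only ever contains entries for devices that were actually registered.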
Solving this requires a kernel change.
Since writing this, the kernel has been modified so that the SCSI disc driver has more major numbers allocated to it and now supports up to 128 discs. Since these major numbers are non-contiguous (a result of unplanned expansion), the implementation is a little more cumbersome than originally.
Just like the changes to IPv4 to fix impending limitations in the address space, people find ways around the limitations. In the long run, however, solutions like IPv6 or devfs can't be put off forever.
Also, you can't use a shared NFS root filesystem for a cluster of discless Linux machines (having tty ownerships changed on a common /dev is not good). Nor can you embed your root filesystem in a ROM-FS.
You can get around this by creating a RAMDISC at boot time, making an ext2 filesystem in it, mounting it somewhere and copying the contents of /dev into it, then unmounting it and mounting it over /dev.
A devfs is a cleaner way of solving this.
Devfs solves this in a robust and conceptually simple way.
An alternative is to create a new open_pty() syscall which does much the same thing as the user-space daemon. Once again, this requires modifications to pty-handling programmes.
The devfs solution allows a device driver to "tag" certain device files so that when an unopened device is opened, the ownerships are changed to the current euid and egid of the opening process, and the protections are changed to the default registered by the driver. When the device is closed ownership is set back to root and protections are set back to read-write for everybody. No programme need be changed. The devpts filesystem provides this auto-ownership feature for Unix98 ptys. It doesn't support old-style pty devices, nor does it have all the other features of devfs.
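The ownership changes for a "tagged" device file can be pictured as a small state machine. The following Python model is only a sketch: the class and attribute names are invented, and it assumes that ownership is restored on the last close when a device is opened more than once.

```python
class TaggedDevice:
    """Toy model of a device file tagged for automatic ownership."""

    def __init__(self, default_mode):
        self.default_mode = default_mode   # protections registered by the driver
        self.uid, self.gid = 0, 0          # owned by root while closed
        self.mode = 0o666                  # read-write for everybody while closed
        self.open_count = 0

    def open(self, euid, egid):
        if self.open_count == 0:           # an unopened device is being opened
            self.uid, self.gid = euid, egid
            self.mode = self.default_mode
        self.open_count += 1

    def close(self):
        self.open_count -= 1
        if self.open_count == 0:           # assumption: restore on last close
            self.uid, self.gid = 0, 0
            self.mode = 0o666
```

No change to the opening programme is needed in this model; all the bookkeeping happens on the device side.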
Device entry registration events can be used by devfsd to change permissions of newly-created device nodes. This is one mechanism to control device permissions.
Device entry registration/unregistration events can be used to run programmes or scripts. This can be used to provide automatic mounting of filesystems when new block device media is inserted into the drive.
Asynchronous device open and close events can be used to implement clever permissions management. For example, the default permissions on /dev/dsp do not allow everybody to read from the device. This is sensible, as you don't want some remote user recording what you say at your console. However, the console user is also prevented from recording. This behaviour is not desirable. With asynchronous device open and close events, you can have devfsd run a programme or script when console devices are opened to change the ownerships for *other* device nodes (such as /dev/dsp). On closure, you can run a different script to restore permissions. An advantage of this scheme over modifying the C library tty handling is that this works even if your programme crashes (how many times have you seen the utmp database with lingering entries for non-existent logins?).
Synchronous device open events can be used to perform intelligent device access protections. Before the device driver open() method is called, the daemon must first validate the open attempt, by running an external programme or script. This is far more flexible than access control lists, as access can be determined on the basis of other system conditions instead of just the UID and GID.
Inode lookup events can be used to authenticate module autoload requests. Instead of using kmod directly, the event is sent to devfsd which can implement an arbitrary authentication before loading the module itself. Inode lookup events can also be used to construct arbitrary namespaces, without having to resort to populating devfs with symlinks to devices that don't exist.
The same application also wants to see which devices are actually available on the system. With the existing system it needs to read the /dev directory and speculatively open each /dev/sr* device to determine if the device exists or not. With a large /dev this is an inefficient operation, especially if there are many /dev/sr* nodes. A solution like scsidev could reduce the number of /dev/sr* entries (but of course that also requires all that inefficient directory scanning).
With devfs, the application can open the /dev/sr directory (which triggers the module autoloading if required), and proceed to read /dev/sr. Since only the available devices will have entries, there are no inefficiencies in directory scanning or device openings.
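From the application's point of view, discovering the available devices then reduces to a plain directory read. A minimal sketch, assuming a devfs system where /dev/sr exists as described above (the function name is illustrative):

```python
import os


def available_devices(directory):
    """Return the sorted entry names found in a device directory.

    On a devfs system only devices that really exist have entries, so
    there is no need to open each node speculatively.
    """
    return sorted(os.listdir(directory))


if __name__ == "__main__":
    # "/dev/sr" is the directory named in the text; it will be missing
    # on systems that are not set up this way.
    try:
        print(available_devices("/dev/sr"))
    except OSError:
        print("no /dev/sr directory on this system")
```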
While we shouldn't just automatically do something because others do it, we should not ignore the work of others either. FreeBSD has a lot of competent people working on it, so their opinion should not be blithely ignored.
Because of these improvements to the VFS, the multi-mount capability in devfs is no longer needed. The administrator may create a minimal device tree inside a chroot(2) gaol by using VFS bindings. As this provides most of the features of the devfs multi-mount capability, I removed the multi-mount support code (after issuing an RFC). This yielded code size reductions and simplifications.
If you want to construct a minimal chroot() gaol, the following command should suffice:
    mount -t bind /dev/null /gaol/dev/null
Repeat for other device nodes you want to expose. Simple!
Compile and install devfsd. You will be provided with a default configuration file /etc/devfsd.conf which will provide compatibility symlinks for the old naming scheme. Don't change this config file unless you know what you're doing. Even if you think you do know what you're doing, don't change it until you've followed all the steps below and booted a devfs-enabled system and verified that it works.
Now edit your main system boot script so that devfsd is started at the very beginning (before any filesystem checks). /etc/rc.d/rc.sysinit is often the main boot script on systems with SysV-style boot scripts. On systems with BSD-style boot scripts it is often /etc/rc. Also check /sbin/rc.
NOTE that the line you put into the boot script should be exactly:
    /sbin/devfsd /dev
DO NOT use some special daemon-launching programme, otherwise the boot script may not wait for devfsd to finish initialising.
    1
    2
    3
    4
    5
    6
    7
    8
This may potentially weaken security by allowing root logins over the network (a password is still required, though). However, since there are problems with dealing with symlinks, I'm suspicious of the level of security offered in any case.
A better solution is to install util-linux-2.10.h or later, which fixes a bug with ttyname handling in the login programme. Then append the following lines to your /etc/securetty file:
    vc/1
    vc/2
    vc/3
    vc/4
    vc/5
    vc/6
    vc/7
    vc/8
This will not weaken security.
--- /etc/security/console.perms.orig    Sat Apr 17 16:26:47 1999
+++ /etc/security/console.perms Fri Feb 25 23:53:55 2000
@@ -14,7 +14,7 @@
 # man 5 console.perms
 
 # file classes -- these are regular expressions
-<console>=tty[0-9][0-9]* :[0-9]\.[0-9] :[0-9]
+<console>=tty[0-9][0-9]* [0-9][0-9]* :[0-9]\.[0-9] :[0-9]
 
 # device classes -- these are shell-style globs
 <floppy>=/dev/fd[0-1]*
Alternatively, use the same technique used for unsupported drivers described above.
    append = "devfs=mount"
This will make the kernel mount devfs at boot time onto /dev.
Now you've finished all the steps required. You're now ready to boot your shiny new kernel. Enjoy.
A much better approach is to use devfsd to save and restore permissions. It may be configured to record changes in permissions and will save them in a database (in fact a directory tree), and restore these upon boot. This is an efficient method and results in immediate saving of current permissions (unlike the tar approach, which saves permissions at some unspecified future time).
The default configuration file supplied with devfsd has config entries which you may uncomment to enable persistence management.
If you decide to use the tar approach anyway, be aware that tar will first unlink(2) an inode before creating a new device node. The unlink(2) has the effect of breaking the connection between a devfs entry and the device driver. If you use the "devfs=only" boot option, you lose access to the device driver, requiring you to reload the module. I consider this a bug in tar (there is no real need to unlink(2) the inode first).
Alternatively, you can use devfsd to provide more sophisticated management of device permissions. You can use devfsd to store permissions for whole groups of devices with a single configuration entry, rather than the conventional single entry per device entry.
    mount -t bind /dev /dev-state
    mount -t devfs none /dev
    devfsd /dev
    REGISTER    .*    COPY    /dev-state/$devname $devpath
    CHANGE      .*    COPY    $devpath /dev-state/$devname
    CREATE      .*    COPY    $devpath /dev-state/$devname
Hopefully for most people devfs will have enough support so that they can mount devfs directly over /dev without losing most functionality (i.e. losing access to various devices). As of 22-JAN-1998 (devfs patch version 10) I am now running this way. All the devices I have are available in devfs, so I don't lose anything.
WARNING: if your configuration requires the old-style device names (e.g. /dev/hda1 or /dev/sda1), you must install devfsd and configure it to maintain compatibility entries. It is almost certain that you will require this. Note that the kernel creates a compatibility entry for the root device, so you don't need initrd.
Note that you no longer need to mount devpts if you use Unix98 PTYs, as devfs can manage /dev/pts itself. This saves you some RAM, as you don't need to compile and install devpts. Note that some versions of glibc have a bug with Unix98 pty handling on devfs systems. Contact the glibc maintainers for a fix. Glibc 2.1.3 has the fix.
Note also that apart from editing /etc/fstab, other things will need to be changed if you *don't* install devfsd. Some software (like the X server) hard-wires device names in its source. It really is much easier to install devfsd so that compatibility entries are created. You can then slowly migrate your system to using the new device names (for example, by starting with /etc/fstab), and then limiting the compatibility entries that devfsd creates.
MAKE SURE YOU INSTALL DEVFSD BEFORE YOU BOOT A DEVFS-ENABLED KERNEL!
Now that devfs has gone into the 2.3.46 kernel, I'm getting a lot of reports back. Many of these are because people are trying to run without devfsd, and hence some things break. Please just run devfsd if things break. I want to concentrate on real bugs rather than misconfiguration problems at the moment. If people are willing to fix bugs/false assumptions in other code (e.g. glibc, X server) and submit that to the respective maintainers, that would be great.
If you don't run devfsd, or don't enable compatibility entry management, then you will have to configure your system to use the new names. For example, you will then need to edit your /etc/fstab to use the new disc naming scheme. If you want to be able to boot non-devfs kernels, you will need compatibility symlinks in the underlying disc-based /dev pointing back to the old-style names for when you boot a kernel without devfs.
You can selectively decide which devices you want compatibility entries for. For example, you may only want compatibility entries for BSD pseudo-terminal devices (otherwise you'll have to patch your C library or use Unix98 ptys instead). It's just a matter of putting the correct regular expression into /etc/devfsd.conf.
There are other choices of naming schemes that you may prefer. For example, I don't use the kernel-supplied names, because they are too verbose. A common misconception is that the kernel-supplied names are meant to be used directly in configuration files. This is not the case. They are designed to reflect the layout of the devices attached and to provide easy classification.
If you like the kernel-supplied names, that's fine. If you don't then you should be using devfsd to construct a namespace more to your liking. Devfsd has built-in code to construct a namespace that is both logical and easy to manage. In essence, it creates a convenient abbreviation of the kernel-supplied namespace.
You are of course free to build your own namespace. Devfsd has all the infrastructure required to make this easy for you. All you need do is write a script. You can even write some C code and devfsd can load the shared object as a callable extension.
The default behaviour now is not to mount devfs onto /dev at boot time for 2.3.x and later kernels. You can correct this with the "devfs=mount" boot option. This solves any problems with init, and also prevents the dreaded:
    Cannot open initial console
message. For 2.2.x kernels where you need to apply the devfs patch, the default is to mount.
If you have automatic mounting of devfs onto /dev then you may need to create /dev/initctl in your boot scripts. The following lines should suffice:
    mknod /dev/initctl p
    kill -SIGUSR1 1       # tell init that /dev/initctl now exists
Alternatively, if you don't want the kernel to mount devfs onto /dev, then you could use the following procedure as a guideline for how to get around /dev/initctl problems:
    # cd /sbin
    # mv init init.real
    # cat > init
    #! /bin/sh
    mount -n -t devfs none /dev
    mknod /dev/initctl p
    exec /sbin/init.real "$@"
    [control-D]
    # chmod a+x init
Note that newer versions of init create /dev/initctl automatically, so you don't have to worry about this.
    LOOKUP    .*    MODLOAD
As of devfsd-v1.3.10, a generic /etc/modules.devfs configuration file is installed, which is used by the MODLOAD action. This should be sufficient for most configurations. If you require further configuration, edit your /etc/modules.conf file.
    append = "root=<device>"
Surprised? Yep, so was I. It turns out if you have (as most people do):
    root = <device>
then LILO will determine the device number of
Note that this isn't an issue if you don't pass "devfs=only".
    /dev/discs/disc0     first disc
    /dev/discs/disc1     second disc
Each of these entries is a symbolic link to the directory for that device. The device directory contains:
    disc     for the whole disc
    part*    for individual partitions
    /dev/cdroms/cdrom0   first CD-ROM
    /dev/cdroms/cdrom1   second CD-ROM
Each of these entries is a symbolic link to the real device entry for that device.
    /dev/tapes/tape0     first tape
    /dev/tapes/tape1     second tape
Each of these entries is a symbolic link to the directory for that device. The device directory contains:
    mt       for mode 0
    mtl      for mode 1
    mtm      for mode 2
    mta      for mode 3
    mtn      for mode 0, no rewind
    mtln     for mode 1, no rewind
    mtmn     for mode 2, no rewind
    mtan     for mode 3, no rewind
    controller (host adapter)
    bus (SCSI channel)
    target (SCSI ID)
    unit (Logical Unit Number)
All SCSI devices are placed under /dev/scsi (assuming devfs is mounted on /dev). Hence, a SCSI device with the following parameters: c=1,b=2,t=3,u=4 would appear as:
    /dev/scsi/host1/bus2/target3/lun4       device directory
Inside this directory, a number of device entries may be created, depending on which SCSI device-type drivers were installed.
See the section on the disc naming scheme to see what entries the SCSI disc driver creates.
See the section on the tape naming scheme to see what entries the SCSI tape driver creates.
The SCSI CD-ROM driver creates:
    cd
The SCSI generic driver creates:
    generic
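The kernel-supplied SCSI naming scheme is regular enough to be generated mechanically. A minimal sketch, assuming devfs is mounted on /dev unless stated otherwise (the function name and the mount_point parameter are illustrative, not part of devfs):

```python
def scsi_device_dir(controller, bus, target, unit, mount_point="/dev"):
    """Build the devfs directory for a SCSI device from its c, b, t, u values."""
    return (f"{mount_point}/scsi/host{controller}/bus{bus}"
            f"/target{target}/lun{unit}")


def scsi_entry(controller, bus, target, unit, entry, mount_point="/dev"):
    """Build the path of one entry (e.g. "cd" or "generic") in that directory."""
    directory = scsi_device_dir(controller, bus, target, unit, mount_point)
    return f"{directory}/{entry}"
```

For the parameters used in the text, scsi_device_dir(1, 2, 3, 4) yields the directory shown above, and scsi_entry(1, 2, 3, 4, "generic") names the entry created by the SCSI generic driver.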
    controller
    bus (aka. primary/secondary)
    target (aka. master/slave)
    unit
All IDE devices are placed under /dev/ide, and use a similar naming scheme to the SCSI subsystem.
    New name              Old-name            Device Type
    --------              --------            -----------
    /dev/tts/{0,1,...}    /dev/ttyS{0,1,...}  Serial ports
    /dev/cua/{0,1,...}    /dev/cua{0,1,...}   Call out devices
    /dev/vc/{0,1,...}     /dev/tty{1...63}    Virtual consoles
    /dev/vcc/{0,1,...}    /dev/vcs{1...63}    Virtual consoles
    /dev/pty/m{0,1,...}   /dev/ptyp??         PTY masters
    /dev/pty/s{0,1,...}   /dev/ttyp??         PTY slaves
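Part of the table above can be expressed as a set of rewrite rules. The sketch below covers only the rows whose numbering is straightforward; for the virtual console rows it assumes that the number is carried over unchanged, which the table does not spell out. The PTY rows are left out because their old names do not use plain decimal numbers. All names in the code are illustrative.

```python
import re

# (pattern for the old name, template for the new name), from the table above.
_RULES = [
    (re.compile(r"^/dev/ttyS(\d+)$"), "/dev/tts/{}"),
    (re.compile(r"^/dev/cua(\d+)$"), "/dev/cua/{}"),
    (re.compile(r"^/dev/tty(\d+)$"), "/dev/vc/{}"),   # assumption: number kept
    (re.compile(r"^/dev/vcs(\d+)$"), "/dev/vcc/{}"),  # assumption: number kept
]


def new_name(old_name):
    """Translate an old-style device name, or return None if no rule matches."""
    for pattern, template in _RULES:
        match = pattern.match(old_name)
        if match:
            return template.format(int(match.group(1)))
    return None
```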
/dev/rd/{0,1,2,...}
/dev/md/{0,1,2,...}
In order to configure devfsd to create these convenience names, the following lines should be placed in your /etc/devfsd.conf:
    REGISTER      .*    MKNEWCOMPAT
    UNREGISTER    .*    RMNEWCOMPAT
This will cause devfsd to create (and destroy) symbolic links which point to the kernel-supplied names.
    /dev/sd/c1b2t3u4        for the whole disc
    /dev/sd/c1b2t3u4p5      for the 5th partition
    /dev/sd/c1b2t3u4p5s6    for the 6th slice in the 5th partition
    /dev/st/c1b2t3u4m0      for mode 0
    /dev/st/c1b2t3u4m1      for mode 1
    /dev/st/c1b2t3u4m2      for mode 2
    /dev/st/c1b2t3u4m3      for mode 3
    /dev/st/c1b2t3u4m0n     for mode 0, no rewind
    /dev/st/c1b2t3u4m1n     for mode 1, no rewind
    /dev/st/c1b2t3u4m2n     for mode 2, no rewind
    /dev/st/c1b2t3u4m3n     for mode 3, no rewind
/dev/sr/c1b2t3u4
/dev/sg/c1b2t3u4
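The c1b2t3u4 style names listed above follow one pattern across the sd, st, sr and sg directories. The following helper is a sketch of that pattern only; the function and its parameters are invented for illustration, and slices are not handled.

```python
def short_scsi_name(kind, c, b, t, u, partition=None, mode=None, rewind=True):
    """Build a devfsd-style short name such as /dev/sd/c1b2t3u4p5.

    kind is one of "sd", "st", "sr" or "sg", matching the directories above.
    """
    name = f"/dev/{kind}/c{c}b{b}t{t}u{u}"
    if kind == "sd" and partition is not None:
        name += f"p{partition}"            # disc partition suffix
    elif kind == "st":
        name += f"m{0 if mode is None else mode}"   # tape mode, default 0
        if not rewind:
            name += "n"                    # no-rewind variant
    return name
```

For instance, short_scsi_name("st", 1, 2, 3, 4, mode=2, rewind=False) gives the mode 2, no rewind tape entry.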
    /dev/hda    /dev/ide/hd/c0b0t0u0
    /dev/hdb    /dev/ide/hd/c0b0t1u0
    /dev/hdc    /dev/ide/hd/c0b1t0u0
    /dev/hdd    /dev/ide/hd/c0b1t1u0
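The four mappings above can be derived from the drive letter: the bus is the letter's position divided by two, and the target is the remainder. The sketch below deliberately handles only hda to hdd, since the table gives no mapping for further drives; the function name is illustrative.

```python
def ide_short_name(old_name):
    """Map /dev/hda .. /dev/hdd to the names in the table above."""
    prefix = "/dev/hd"
    letters = "abcd"                      # only the four drives in the table
    suffix = old_name[len(prefix):]
    if not old_name.startswith(prefix) or len(suffix) != 1 or suffix not in letters:
        raise ValueError(f"not covered by the table: {old_name}")
    bus, target = divmod(letters.index(suffix), 2)
    return f"/dev/ide/hd/c0b{bus}t{target}u0"
```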
    scsihosts=<name_1>:<name_2>:<name_3>:...:<name_n>
where <name_1>,<name_2>,...,<name_n> are the names of drivers used in the /proc filesystem. For example:
    scsihosts=aha1542:ppa:aha1542::ncr53c7xx
means that devices connected to
- first aha1542 controller - will be c0b#t#u#
- first parallel port ZIP - will be c1b#t#u#
- second aha1542 controller - will be c2b#t#u#
- first NCR53C7xx controller - will be c4b#t#u#
- any extra controller - will be c5b#t#u#, c6b#t#u#, etc
- if any of the above controllers is not found, its reserved names will not be used by any other device
- c3b#t#u# names will never be used
You can use ',' instead of ':' as the separator character if you wish. I have used the devfsd naming scheme here.
Note that this scheme does not address the SCSI host order if you have multiple cards of the same type (such as NCR53c8xx). In this case you need to use the driver-specific boot parameters to control this.
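The reservation rules for scsihosts can be modelled as follows. This is a sketch of the behaviour described above, not the kernel's implementation: the function names are invented, driver names are compared exactly as written, and controllers are numbered in the order they are detected.

```python
def reserved_slots(scsihosts):
    """Split a scsihosts= value into a list of reserved driver names.

    An empty name reserves a controller number that is never used.
    """
    return scsihosts.replace(",", ":").split(":")


def assign_controllers(scsihosts, detected):
    """Assign a controller number to each detected driver, in detection order."""
    slots = reserved_slots(scsihosts)
    taken = set()
    next_free = len(slots)          # extra controllers follow the reserved range
    numbers = []
    for driver in detected:
        for index, name in enumerate(slots):
            if name == driver and index not in taken:
                taken.add(index)    # claim the first free slot for this driver
                numbers.append(index)
                break
        else:
            numbers.append(next_free)   # no reserved slot left for this driver
            next_free += 1
    return numbers
```

With the example value from the text, a missing controller simply leaves its reserved number unused, and any controller without a reservation is numbered from 5 upwards.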
- All miscellaneous character devices support devfs (this is done transparently through misc_register())
- SCSI discs and generic hard discs
- Character memory devices (null, zero, full and so on)
  Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
- Loop devices (/dev/loop?)
- TTY devices (console, serial ports, terminals and pseudo-terminals)
  Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
- SCSI tapes (/dev/scsi and /dev/tapes)
- SCSI CD-ROMs (/dev/scsi and /dev/cdroms)
- SCSI generic devices (/dev/scsi)
- RAMDISCS (/dev/ram?)
- Meta Devices (/dev/md*)
- Floppy discs (/dev/floppy)
- Parallel port printers (/dev/printers)
- Sound devices (/dev/sound)
  Thanks to Eric Dumas <dumas@linux.eu.org> and C. Scott Ananian <cananian@alumni.princeton.edu>
- Joysticks (/dev/joysticks)
- Sparc keyboard (/dev/kbd)
- DSP56001 digital signal processor (/dev/dsp56k)
- Apple Desktop Bus (/dev/adb)
- Coda network file system (/dev/cfs*)
- Virtual console capture devices (/dev/vcc)
  Thanks to Dennis Hou <smilax@mindmeld.yi.org>
- Frame buffer devices (/dev/fb)
- Video capture devices (/dev/v4l)
The simplest option (especially when porting drivers to devfs) is to keep using the old major and minor numbers. Devfs will take whatever values are given for major&minor and pass them onto userspace.
Alternatively, you can have devfs choose unique device numbers for
you. When you register a character or block device using
devfs_register you can provide the optional
DEVFS_FL_AUTO_DEVNUM flag, which will then automatically allocate a
unique device number (the allocation is separated for the character
and block devices).
This device number is a 16 bit number, so this leaves plenty of space
for large numbers of discs and partitions. This scheme can also be
used for character devices, in particular the tty devices, which are
currently limited to 256 pseudo-ttys (this limits the total number of
simultaneous xterms and remote logins). Note that the device number
is limited to the range 36864-61439 (majors 144-239), in order to
avoid any possible conflicts with existing official allocations.
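The quoted range corresponds to majors 144-239 when a 16 bit device number is laid out with the major in the high 8 bits and the minor in the low 8 bits. A short sketch of that arithmetic (the function and constant names are illustrative):

```python
def make_devnum(major, minor):
    """Pack 8-bit major and minor numbers into a 16-bit device number."""
    if not (0 <= major <= 255 and 0 <= minor <= 255):
        raise ValueError("major and minor must each fit in 8 bits")
    return (major << 8) | minor


AUTO_FIRST = make_devnum(144, 0)      # lowest automatically allocated number
AUTO_LAST = make_devnum(239, 255)     # highest automatically allocated number
AUTO_COUNT = AUTO_LAST - AUTO_FIRST + 1
```

AUTO_COUNT is the number of device numbers available to each of the two allocators (character and block), since the allocation is separate for the two types.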
Please note that using dynamically allocated block device numbers may break the NFS daemons (both user and kernel mode), which expect dev_t for a given device to be constant over the lifetime of remote mounts.
A final note on this scheme: since it doesn't increase the size of device numbers, there are no compatibility issues with userspace.
This has several limitations:
Problems:
Problems:
However, not all is lost. If you want to create your own naming scheme, it is a simple matter to write a standalone script, hack devfsd, or write a script called by devfsd. You can create whatever naming scheme you like.
Further, if you want to remove all traces of the devfs naming scheme
from /dev, you can mount devfs elsewhere (say
/devfs) and populate /dev with links into
/devfs. This population can be automated using devfsd if you
wish.
You can even use the VFS binding facility to make the links, rather
than using symbolic links. This way, you don't even have to see the
"destination" of these symbolic links.