GSOC: kevent part 2
Samuel J. Greear
sjg at evilcode.net
Fri May 6 10:28:57 PDT 2011
All,
During my 2010 GSoC Project, I re-implemented the select(2) and
poll(2) kernel interfaces on top of the kevent(2) interface. This
project went mostly as-expected, but having had some time to settle a
number of issues and additional opportunities have arisen. This
project aims to bring the rework of the DragonFly kernel file
descriptor event subsystems to its logical conclusion, fix any
lingering bugs and generally put the code in maintainable and portable
shape.
The first opportunity is to restructure the knote structure (kernel
struct knote). Currently, when select(2), poll(2) or kevent(2)
registers an interest in read, write or exceptional conditions
(filters) on a descriptor the internal kqueue_register() function is
called which looks up the requested knote based on the file descriptor
and the filter type.
struct knote {
SLIST_ENTRY(knote) kn_link; /* for fd */
TAILQ_ENTRY(knote) kn_kqlink; /* for kq_knlist */
SLIST_ENTRY(knote) kn_next; /* for struct kqinfo */
TAILQ_ENTRY(knote) kn_tqe; /* for kq_head */
struct kqueue *kn_kq; /* which queue we are on */
struct kevent kn_kevent;
int kn_status;
int kn_sfflags; /* saved filter flags */
intptr_t kn_sdata; /* saved data field */
union {
struct file *p_fp; /* file data pointer */
struct proc *p_proc; /* proc pointer */
} kn_ptr;
struct filterops *kn_fop;
caddr_t kn_hook;
};
Doing things in this fashion is perfectly acceptable for the kevent(2)
interface, as read, write and exceptional conditions are logically
separated. For the poll(2), select(2) and future epoll(2)/devpoll(2)
interfaces it is problematic. For these interfaces the conditions are
more closely tied to the descriptor. The code implementing select(2)
and poll(2) at present must do a lot of extra work to return combined
filter results. The epoll(2) interface would be extremely problematic
to implement in this manner, potentially requiring full scans of all
registered knotes. The logical step here seems to be to restructure
the knote structure in such a fashion that filters are grouped based
on the file descriptor. Currently I am thinking something like the
following:
/* kqueue, select, poll, epoll, devpoll (5) */
#define KEVENT_INTERFACES 5
SLIST_HEAD(klist, knote)
struct knote {
SLIST_ENTRY(knote) kn_link; /* for fd */
TAILQ_ENTRY(knote) kn_kqlink; /* for kq_knlist */
SLIST_ENTRY(knote) kn_next; /* for struct kqinfo */
TAILQ_ENTRY(knote) kn_tqe; /* for kq_head */
struct kqueue *kn_kq;
union {
struct file *p_fp; /* file data pointer */
struct proc *p_proc; /* proc pointer */
} kn_ptr;
struct knote_data *notes[KEVENT_INTERFACES];
}
struct knote_data {
struct kevent kn_kevent;
int kn_status;
int kn_sfflags; /* saved filter flags */
intptr_t sn_sdata; /* saved data field */
struct filterops *kn_fop;
caddr_t kn_hook;
}
In this or a similar structure would allow the other implementations
(select(2)/poll(2)) to do a great deal less work, allow for the
relatively easy implementation of epoll(2) and has virtually no
consequence to the speed or robustness of the original kqueue
implementation.
***** [BOLD, READ ME, PLEASE RESPOND, ETC.]
As the rest of the work hangs off of what is described above, it would
be fantastic to hear any other thoughts/ideas/viewpoints on how to
accomplish splitting or otherwise reworking struct knote.
*****
After restructuring kevent to function as described above and updating
the existing select(2) and poll(2) implementations, I would like to
implement epoll(2) replacing the current Linux implementation as well
as /dev/poll. This would uniquely differentiate DragonFly as the only
platform to support all of the major i/o multiplexing schemes
natively.
The next most pressing issue with the kevent implementation as it
stands in DragonFly (and the other BSD's as well) is one of
robustness. The beauty of kevent is that it is stateful and stateful
means fast. The drawback of this is that there are many lists and
queues that must be precisely managed in order for things not to go
awry. This is evidenced by, for example, the ums usb mouse detach bug:
http://bugs.dragonflybsd.org/issue1873 As it stands, all kqueue/kevent
implementations suffer from this problem.
The most reasonable solution, apart from going back to a stateless
approach like the former kernel poll support, is to limit the direct
list handling as much as possible and force kqueue filter implementers
to do the right thing where possible. I would propose to extend devfs
with a new API for filter implementers that manages most aspects of
device bring up, teardown and knote management for all devices which
feature a devfs device node. Reference:
http://leaf.dragonflybsd.org/mailarchive/kernel/2011-02/msg00006.html
As a next step I would like to take this approach of centralized
control of the lists and data structures even further by extending the
setup and teardown API's offered to the filters. As a part of this I
foresee making the various structures used by kqueue be allocated and
fully managed by the kqueue subsystem, especially the destruction of
any used objects should only be done after the kqueue subsystem is
certain that there are no more consumers. Reference:
http://leaf.dragonflybsd.org/mailarchive/kernel/2011-02/msg00008.html
There are several open bug reports that are obviously problems with
the current kqueue implementation or the select/poll wrappers. Once
the bulk of the above work is complete I believe it would be prudent
to both attempt to reproduce these bugs to see if they still exist and
attempt to fix them if at all possible. Additional testing should also
be done to ensure that no regressions were introduced and that the
epoll and /dev/poll interfaces work as-expected with software that
implements support for them. References:
http://bugs.dragonflybsd.org/issue1730
http://bugs.dragonflybsd.org/issue1998
http://bugs.dragonflybsd.org/issue2011
http://bugs.dragonflybsd.org/issue2028
Finally, I would like to document the finalized filter implementation
API and a porting guide for kernel developers, to ease bringing in
drivers from other platforms.
Best,
Sam
More information about the Kernel
mailing list