Software Freedom Law Center

root/tags/went-live-on-2006-01-20/stet-spec.txt

Revision 11, 21.1 kB (checked in by orion, 3 years ago)


1 Specification for stet, a system for public comment on gplv3
2 Orion Montoya <orion@diderot.uchicago.edu>
3 $Id: stet-spec.txt 46 2006-01-05 17:29:26Z orion $
4
5
6 O V E R V I E W
7
8 This software is being developed to facilitate public comment on
9 drafts of the GPLv3, with the goal of creating as much consensus as
10 possible.  The goals of the GPLv3 process are outside the scope of
11 this document, but this software will allow those goals to be met, by
12 allowing all voices and viewpoints to be heard and considered.
13
14 The system is named "stet", after the proofreader's mark meaning "let
15 it stand as it is".  Stet is a collaborative document revision system,
16 getting its users and commenters to the point where they can say "let
17 it stand as it is" about the whole document.
18
19 The present document is being developed by Orion Montoya after
20 discussions with Eben Moglen and Bradley Kuhn of the Software Freedom
21 Law Center.
22
23 BASIC BEHAVIOR
24
25 Three interfaces:
26  - PUBLIC comment-submission, querying and tracking (Perl/CGI)
27  - VOLUNTEER comment triage/categorization/description (Perl/CGI)
28  - DRAFTERS editing, implicit comment querying & overlaid
29    coloring/styling for different criteria (Perl/CGI/elisp)
30
31 PUBLIC USERS will select specific sections of the license to comment
32 on, and these comments will remain tied to the license across
33 revisions of the document.  Public users will receive notification
34 upon changes to sections they've commented on, and will also be able
35 to perform and syndicate arbitrary queries to keep track of sections
36 or issues that concern them.  Users will also indicate their level of
37 agreement with existing comments, as an aid to volunteer triage.
38
39 VOLUNTEERS will manage incoming comments: group individual comments by
40 common areas of concern, apply predefined issue categories to these
41 groups, and indicate higher-level relationships among comments and
42 groups.
43
44 DRAFTERS will read the aggregated comments processed by volunteers,
45 record notes on these comments, and will eventually make changes to
46 individual sections.  These changes will modify the master text
47 document, preserving past revisions within the text itself, and
48 maintaining consistency with the pointers to the previous revisions.
49
50 COMPONENTS
51
52 The development timeframe for this system is fairly short, so it is
53 essential to use existing packages to manage as much as possible, and
54 avoid duplicating any effort.  The specifics of these components are
55 still subject to revision and change, but various things are more or less
56 certain:
57 - obviously: Apache, GNU/Linux, Free Software
58 - certain: subversion
59 - highly likely: libxml, perl xml modules, emacs psgml/psgmlx/nxml,
60    mysql/postgres
61 - basic idea: Request Tracker or other coherent bug tracker with
62    custom fields and a usable API.  It is almost certain that this
63    will need to store its data in SQL.
64
65 ON METADATA RELATIONSHIPS
66
67 Volunteers will "indicate higher-level relationships among comments
68 and groups": a limited but exhaustive set of these will be set out
69 beforehand, but additional categories/relationships will be added if
70 they prove necessary.  It will be preferable to keep these under
71 control, but as descriptive as is reasonable.  Example relationships:
72
73 GPLv2, section 7, paragraph 1, sentence 2:
74
75    If you cannot distribute so as to satisfy simultaneously your
76    obligations under this License and any other pertinent obligations,
77    then as a consequence you may not distribute the Program at all.
78
79 Some global metadata properties that could apply to this sentence include:
80
81 Resolving Conflicting Obligations
82 Outside Conditions
83 Circumstances Precluding Distribution
84 Other Licenses:General
85
86 and from the enclosing section/preceding sentences:
87
88 Consequences Of Court Judgments
89 Consequences Of Outside Agreements
90 Patent Licensing
91 Outside Conditions
92
93 These properties may apply to multiple sections:
94
95 GPLv2 Sections concerning price, charge, royalty or fee include:
96  - Preamble, paragraph 2 sentence 1.
97  - Preamble p2 s2
98  - Preamble p4 s1
99  - Preamble p7 s3
100  - Section 1 p2 s1
101  - Section 2b p1 s1
102  - Section 3b p1 s1
103  - Section 3c p1 s1
104  - Section 7 p1 s3
105  - Section 11 p1 s1
106
107 A query for the broad category of "money" would return
108 comments/revisions on all of these sections, as well as any additional
109 comments that had been tagged by volunteers as concerning money.
110 (Alternatively, comments on these sections that do not concern money
111 -- as judged by volunteers -- might not be returned).
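The category lookup described above can be sketched as follows.  This is an illustrative Python sketch only, not the planned Perl implementation; the section identifiers, the CATEGORY_SECTIONS table and the comment records are hypothetical stand-ins for data that would live in the issue tracker.

```python
# Hypothetical data: the license locations each category is known to cover.
CATEGORY_SECTIONS = {
    "money": {"preamble.p2.s1", "section2b.p1.s1", "section3b.p1.s1"},
}

# Hypothetical comment records: where each comment was made, plus any
# categories a volunteer attached to it during triage.
COMMENTS = [
    {"id": 1, "section": "section2b.p1.s1", "tags": set()},
    {"id": 2, "section": "section7.p1.s2", "tags": {"money"}},
    {"id": 3, "section": "section7.p1.s2", "tags": {"patents"}},
]


def query_category(category, comments=COMMENTS, sections=CATEGORY_SECTIONS):
    """Return comments on sections covered by the category, plus any
    comment that a volunteer tagged with the category directly."""
    covered = sections.get(category, set())
    return [c for c in comments
            if c["section"] in covered or category in c["tags"]]
```

Here query_category("money") would return comments 1 and 2: the first because of where it was made, the second because of its volunteer tag.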
112
113 I N T E R F A C E S
114
115 PUBLIC CGI INTERFACE
116
117 Public users will be required to create a username, for authentication
118 and notification purposes.  They will initially browse a simple,
119 formatted view of the License.  They will highlight a section of text
120 for comment, and javascript will use the XPath and the value of the
121 selection to generate a persistent path to the selection in question.
122 Users will then enter the text of their comment, selecting additional
123 options/attributes as appropriate:
124
125 - Notify on change? (default yes; this may not be optional: if you
126 don't want to hear about revisions, maybe you aren't really interested
127 in the issue)
128 - Related issues (this is mostly a volunteer task, but if the
129 submitter wants to help set default values, or bring volunteer
130 attention to particular relationships, allowing user selection of
131 issues may be useful)
132 - regional scope
133
134 This user-submitted information, as well as various tracking metadata
135 (date/time/IP/current subversion revision number) will be stored in an
136 issue tracker, where it will be processed by volunteers.  The XPath
137 and the highlighted selection will uniquely identify the text that is
138 the subject of the comment; if the selection is too short to be unique
139 within the xpath, the user should receive an error that asks them to
140 select more text.
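The uniqueness check described above might look roughly like this.  It is a Python sketch under stated assumptions: the markup, the element ids and the AmbiguousSelection error are hypothetical, since the real encoding model is still to be decided, and the real check would run in the CGI layer rather than in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the actual schema has not been settled.
LICENSE_XML = """
<license>
  <section id="s7">
    <p id="s7.p1">You may not distribute the Program at all if you cannot satisfy the License and the other obligations.</p>
  </section>
</license>
"""


class AmbiguousSelection(ValueError):
    """The selection does not occur exactly once inside its element."""


def locate(root, xpath, selection):
    """Return (xpath, character offset) for a selection that occurs
    exactly once in the text of the element at xpath."""
    element = root.find(xpath)
    if element is None:
        raise LookupError("no element at " + xpath)
    text = "".join(element.itertext())
    occurrences = text.count(selection)
    if occurrences != 1:
        # The user would be asked to select more text.
        raise AmbiguousSelection(
            "selection occurs %d times; please select more text" % occurrences)
    return xpath, text.index(selection)
```

With this sample, selecting "the Program" resolves to a single position, while selecting just "the" raises AmbiguousSelection.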
141
142 Public users will also be able to view/aggregate/query comments as
143 discussed in QUERIES and in COMMON INTERFACE ELEMENTS below.
144
145 VOLUNTEER CGI INTERFACE
146
147 Trained volunteers will view (or query) a queue of new comments, which
148 they will categorize according to relevance, urgency and uniqueness --
149 grouping and summarizing comments that express the same idea -- and
150 assign them to the various issue metadata categories.
151
152 The volunteer training will focus on correctly triaging incoming
153 comments: exhaustively assigning related issues, while not assigning
154 things to irrelevant categories.
155
156 The interface will be substantially the same as any bug/issue tracking
157 interface, with each comment existing as a single bug/issue with
158 additional custom subfields.
159
160 The volunteers will determine when, for example, 300 comments have
161 come in saying essentially the same thing about the same
162 issue/section, so that the drafters may see these comments in the
163 aggregate rather than reading 300 separate notes.
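A small Python sketch of this aggregation, assuming (hypothetically) that each triaged comment carries a "section" and a volunteer-assigned "group" summary; neither field name comes from a chosen issue tracker.

```python
from collections import Counter


def breakdown(comments):
    """Count comments per (section, group summary), most frequent first."""
    counts = Counter((c["section"], c["group"]) for c in comments)
    return [(section, group, n)
            for (section, group), n in counts.most_common()]
```

A drafter-facing view could then show one line per group with its count, instead of every individual comment.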
164
165 Volunteer users will also be able to view/aggregate/query comments as
166 discussed in QUERIES and in COMMON INTERFACE ELEMENTS below.
167
168 In a conversation on 2005-10-12, Dan Ravicher made a good case for
169 using Slashdot-style moderation and agreement rather than volunteers.
170 Orion's concern with volunteers is that they might not be numerous or
171 active enough to handle all the incoming comments; with community
172 moderation, trained volunteers would still have a role --
173 e.g. rescuing good comments that had been wrongly modded down -- but
174 the whole system would not be dependent on them in the way that it is
175 currently specified.  In this system, users would be able to limit the
176 proliferation of comments by simply saying "I agree/disagree" and
177 optionally "I wish to expand on this comment."  This would be more
178 threaded-discussion style commenting, which should also be more
179 manageable generally.
180
181 DRAFTERS' EMACS MODE
182
183 Once the initial license draft has been written and encoded, all
184 further revisions of the license will be done in emacs, in a
185 project-specific "stet-mode" that will handle the necessary markup and
186 validity tasks for the drafters.  This mode will be built atop
187 existing xml modes; most likely psgmlx, which appears to have the best
188 support for XPath and XPointer.
189
190 Changes made to the License will be recorded, to the extent
191 practicable, in the document's XML format itself.  When discussion on
192 a section is judged to be settled, it will be possible to 'lock' that
193 section, to discourage or disable future comments -- but this should
194 come in only near the end of the process.
195
196 stet-mode functions will include:
197
198 - stet-remove-region: marks the selected region for deletion, by putting
199 it within xml tags that indicate that it is deprecated from the
200 current version.
201
202 - stet-add-sentence: creates a new empty sentence container with an
203   id= attribute, as well as other housekeeping attributes: date-added,
204   creator.  Issue-tracker metadata -- e.g. the reason for a change --
205   will be entered when the changes are committed, but may also be
206   entered here.
207
208 - stet-display [arg]
209 - stet-hide [arg]
210 This pair of functions toggles the display of various types of
211 metadata: deletions, insertions, hot topic notifiers,
212 comment-breakdowns, etc.
213
214 - stet-query-display-comments: creates a minibuffer to craft a query
215   to display an arbitrary selection of sections or comments, as
216   discussed below in QUERIES.
217
218 - stet-commit-changes: executes callbacks to ensure document validity,
219   section-id consistency; requests any needed drafter input; commits
220   changes to svn repository
221
222 - stet-comment-on-region: grabs the XPath and selection, and submits
223   the same sort of information as described for public users and in
224   NON-CGI SUBMISSIONS, with additional options for setting issue
225   metadata and skipping volunteer triage.
226
227 - stet-note-on-comment: adds drafters' notes on a given comment or
228   comment-group; this is presumably stored in the issue tracker's
229   database.
230
231 QUERIES
232
233 Sections and their related issues are one plane of metadata that must
234 be queryable, but there will also be several other planes of
235 information available to limit queries:
236
237 [Date|reason] for any change in [status|content]
238 [list|aggregate number] of [comments|changes] on [issue|section|same opinion]
239      [in range of time|in range of revisions|by user]
240
241 Types of issues:
242  - Scope of [comment|section] (region, copyright law, patents,
243    distribution, license compatibility)
244  - Clarifications/loopholes
245  - Events (primarily meetings/conferences; these are aggregations of
246     issues to be discussed)
247  - show comments from [time range|user|region] concerning [section|issue|event]
248  - define hot topics: count above [threshold]; [time-range]; [limit]
249  - etc. [ *discuss* or request ]
250
251 To prevent excessive proliferation or disjunction of metadata
252 properties, new properties should be added only after a reasonable
253 amount of human coordination.  But the process of adding will itself
254 be fairly fast and easy, requiring just the name of the property and a
255 changelog-style reason for the addition.  Drafters and volunteers will
256 receive a daily notification if new properties have been added -- this
257 will both ensure oversight, and keep the concerned parties informed as
258 to what properties are available to them.
259
260 The system will be built upon an existing issue/bug tracker to avoid
261 reimplementing existing software, but depending on the overhead for
262 querying the issue database through the bug tracker's API, it may be
263 necessary instead to query the underlying database; for this reason
264 the chosen bug tracker should store its data in an SQL database.
265
266 The underlying query system must return its values as a data structure
267 that may be formatted into arbitrary output formats.  Public users
268 will see results in HTML or as the results of an XmlHttpRequest;
269 Volunteers may interact directly with the issue tracker, or may use a
270 custom CGI interface.  Drafters will see their results in an emacs
271 mode.
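To illustrate the separation between query results and output format, here is a Python sketch in which one list of result rows feeds two formatters.  The row fields ("section", "count") and the function names are assumptions made for the example; a formatter for the Emacs mode could be added in the same way.

```python
import html
import json


def as_html(rows):
    """Render query results as an HTML list for the public CGI pages."""
    items = "".join(
        "<li>{} ({} comments)</li>".format(html.escape(r["section"]), r["count"])
        for r in rows)
    return "<ul>" + items + "</ul>"


def as_json(rows):
    """Serialize the same results for an XMLHttpRequest client."""
    return json.dumps(rows, sort_keys=True)
```

Because both functions take the same plain data structure, the query layer does not need to know which interface asked for the results.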
272
273 IMPLICIT QUERIES
274
275 For the default interface to be functional, a number of QUERIES must
276 take place in the background when the document is initially loaded.
277 This is particularly important for the drafters, who will have
278 up-to-date display of visual notifications whenever they load the
279 document.  For the public interface, on the other hand, it may be
280 necessary to cache these, if complicated queries impede performance.
281 See SLASHDOT NOTES.
282
283 COMMON INTERFACE ELEMENTS
284
285 - hot topic notifiers: when a particular section has received a number
286   of comments (above a given threshold) in a given period of time
287   (default: past n days/week), there will be a visual indication of
288   this, for example in Emacs, possibly on the modeline, but possibly
289   inline in the text:
290
291     [!!!] If, as a consequence of a court judgment or allegation ...
292
293 - general comment views: in addition to hot topics, comments must be
294   browseable by license section and by issue.  These will essentially
295   be returned by queries as described above.
296
297 - toggle of styled display of XML "invisible ink": deletions,
298   additions, revision history.
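The hot-topic rule in the first item above (a comment count above a threshold within a period of time) can be sketched in Python as follows.  The default threshold and window are placeholders, not values from this specification, and comments are assumed to arrive as (section id, timestamp) pairs.

```python
from collections import Counter
from datetime import datetime, timedelta


def hot_topics(comments, now, threshold=10, window=timedelta(days=7)):
    """Return {section: count} for sections with more than `threshold`
    comments in the `window` ending at `now`.

    comments: iterable of (section_id, datetime) pairs.
    """
    cutoff = now - window
    counts = Counter(section for section, when in comments
                     if cutoff <= when <= now)
    return {section: n for section, n in counts.items() if n > threshold}
```

Sections returned by such a function are the ones that would receive the "[!!!]" marker.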
299
300 Upon a given comment area, the user may toggle a view of the comment
301 breakdown on that area, with the most frequently-occurring opinions first:
302
303     [!!!] If, as a consequence of a court judgment or allegation of infringement...
304       |
305       |-> 25 comments: "What if the allegation is spurious/abusive?" [actions]
306       |-> 10 comments: "I should be able to distribute illegally if I want, as long as
307       |                      I comply with the GPL and assume any liability" [actions]
308       |-> 2 comments: "If such a judgment is made, no one should distribute any GPL
309                                    code of any kind ever" [actions]
310
311 The "[actions]" slug will offer various kinds of actions/status
312 changes that the drafter may take in response to the comment:
313 - this revision should fix -- [ping|don't ping] commenters
314 - this is necessary for compatibility with [choose] -- [ping|don't ping] commenters
315 - this is the intended/implicit behavior -- [ping|don't ping] commenters
316 - this opinion is a troll/shill -- ignore, [don't ping|ping] commenters
317 - this is on hold pending resolution of [section|issue] -- [don't ping|ping] commenters
318 - etc [*discuss* or request]
319
320 PUTTING IT ALL TOGETHER
321
322 As a consequence of all this, if someone asks a drafter about a money
323 clause in section 2b, the drafter will be able to easily find
324 yesterday's similar comment in section 3b, and he will see that the
325 money issue implicit in "noncommercial distribution" in section 3c has
326 been particularly "hot".  He will easily be able to view those
327 comments, any drafters' "changelog entries" on these sections, and
328 other drafters' notes-on-comments.  From this he will be able to
329 determine the current position on that issue, and to characterize the
330 nature of any changes or decisions not to change, etc.
331
332 NON-CGI SUBMISSIONS
333
334 For extensibility and accessibility, comments should be submittable by
335 other means than the CGI interface.  They may arrive in batches or
336 singly; a standard format for the server request allows them to come
337 from any client that writes this format.  The server script parses the
338 input and makes entries in the issue tracker, making them ready for
339 volunteer triage (unless submitted by the drafters' emacs mode).
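As an illustration of such a server-side parser, the Python sketch below assumes a batch format of one JSON object per line.  That format, the required field names and the status values are all hypothetical; this specification does not yet define the request format.

```python
import json

# Hypothetical minimum set of fields for one submitted comment.
REQUIRED = ("user", "xpath", "selection", "comment")


def parse_submissions(payload, from_drafter=False):
    """Parse a batch (one JSON object per line) into tracker entries.

    Entries are queued for volunteer triage unless they come from the
    drafters' emacs mode.
    """
    entries = []
    for lineno, line in enumerate(payload.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = [f for f in REQUIRED if f not in record]
        if missing:
            raise ValueError(
                "line {}: missing {}".format(lineno, ", ".join(missing)))
        entry = {f: record[f] for f in REQUIRED}
        entry["status"] = "triaged" if from_drafter else "needs-triage"
        entries.append(entry)
    return entries
```

A single comment is simply a batch of one line, so the same entry point serves both cases.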
340
341 OUTSTANDING PRE-IMPLEMENTATION TASKS
342
343 At this stage, only a limited amount of implementation can be done
344 before we work out certain specifics: most importantly, an encoding
345 model for the license text, and the constrained list of anticipated
346 issues and concerns.  The following tasks may be worked around
347 for the time being, but they must be completed well in advance of launch in
348 order to ensure that the system can be tested with real data.
349
350 - schema/dtd/encoding model
351
352 XML, tagging major divisions (preamble/body/applying this license),
353 sections, paragraphs, sentences.  Each of these will also have a
354 unique, human-assigned ID number, which will be persistent across
355 revisions (i.e. even if the paragraph is marked for deletion).  The
356 unique ID is provisional; if someone has a better way of keeping track
357 of changes across revisions, we should certainly adopt that instead.
358
359 - issue list
360
361 For issues/concerns, key drafters will go through the draft license
362 section-by-section, enumerating various conceptual issues that relate
363 to each clause.  See ON METADATA RELATIONSHIPS.
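For the encoding model above, the following Python sketch shows one way a persistent ID could survive a deletion: the unit is flagged rather than removed.  The element names (section, p, s), the dotted ID strings and the "status" and "deleted-in" attributes are invented for illustration; the actual schema is one of the outstanding tasks.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; not the real license markup.
SAMPLE = """<license>
  <body>
    <section id="sec7">
      <p id="sec7.p1">
        <s id="sec7.p1.s1">First sentence.</s>
        <s id="sec7.p1.s2">Second sentence.</s>
      </p>
    </section>
  </body>
</license>"""


def find_by_id(root, unit_id):
    """Return the element carrying the given id, or None."""
    for element in root.iter():
        if element.get("id") == unit_id:
            return element
    return None


def mark_deleted(root, unit_id, revision):
    """Flag a unit as deleted without removing it, so its id and text
    remain available to comments made against earlier revisions."""
    element = find_by_id(root, unit_id)
    if element is None:
        raise LookupError("no unit with id " + unit_id)
    element.set("status", "deleted")
    element.set("deleted-in", str(revision))
    return element
```

Comments that point at "sec7.p1.s2" can still be resolved after the deletion, and a styled view can choose whether to show or hide flagged units.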
364
365 SLASHDOT NOTES
366
367 At some point someone will post a link to Stet on the front page of
368 Slashdot.  This will be helpful in alerting a certain portion of the
369 concerned population to the opportunity for comment, but it will also
370 put a very heavy load on the server, and subject the system as a whole
371 to extensive scrutiny.  Although the system will be built for optimal
372 performance, it would be wise, in resource-intensive subroutines, to
373 write alternate behavior for a lightweight version that would run when
374 the server is under heavy load.  Further discussion on this topic is
375 necessary.  One idea floated in a conference on 2005-10-11 was to
376 disable new user registrations under heavy load; this seems like bad
377 PR.  But we should certainly profile to find what part of the system
378 is the heaviest, and figure out ways of limiting this when under load.
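One possible shape for the lightweight alternate behavior, sketched in Python.  The load threshold, the cache dictionary and the run_query callback are assumptions made for the example; os.getloadavg is only available on Unix-like systems, which matches the GNU/Linux deployment assumed elsewhere in this document.

```python
import os


def under_heavy_load(threshold=4.0, loadavg=None):
    """True when the 1-minute load average exceeds the threshold."""
    if loadavg is None:
        loadavg = os.getloadavg()[0]  # Unix-like systems only
    return loadavg > threshold


def comment_count(section, run_query, cache, loadavg=None):
    """Serve a cached count under heavy load when one exists; otherwise
    run the full query and refresh the cache."""
    if section in cache and under_heavy_load(loadavg=loadavg):
        return cache[section]
    cache[section] = run_query(section)
    return cache[section]
```

Under load this trades freshness for speed on the public pages, without turning any visitor away.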
379
380 DEVELOPMENT TIMELINE
381
382 Substantial portions of the functionality must be completed by the end
383 of January, 2006.  This means, at a minimum, and in roughly this order:
384
385 - initial prototype, with highlights, notes, note storage & display: 28 Oct.
386
387 - storing pointers in the issue tracker
388    in a way manageable by volunteers; (1 week) (perl, elisp, javascript)
389 - querying pointers; display & intensity highlighting (1.5 weeks) (xmlhttprequest)
390 - adding and querying relationships (1 week)
391 - user auth & comment rating/agreement (2 weeks)
392
393 - core query implementation: returning usable data structures; (2-3 weeks)
394   (sql, perl, elisp)
395 - core drafter elisp behavior: comment navigation & explicit query; notes
396    on comments (4-6 weeks) (elisp, perl)
397 - volunteer interface: customization of issue tracker; volunteer training;
398   (2-3 weeks) (meetings required) (perl, js, xslt)
399
400 [ - draft license markup/schema (core comment behavior may be
401   implemented on sample text, presumably the gplv2); (< 1 week; meeting required)
402   (xml, xslt) ONGOING, IN PROGRESS ]
403 [ - list of expected issues; their relationships to the marked-up license text.
404   (< 1 week, meeting required) (perl, legalese) ONGOING, IN PROGRESS ]
405
406 After this deadline, additional features will be implemented, probably
407 focusing on the drafters' interface first:
408
409 - stet-mode for additions and deletions; (2-3 weeks)
410 - implicit query interleaved with stet-mode; fine-tuning display;
411   (2 weeks)
412 - additional behavior for stet-mode; (4-6 weeks)
413 - non-cgi (offline) comment queuing, caching and submission; (3-5 weeks)
414 - user notification of changes (2-3 weeks)
415 - query interface for public users; alternate result formats (4-8 weeks)
416 - ongoing maintenance and improvement
417
418 Additional features discussed with Dave Turner et al. on 2005-10-11:
419 - merging/unmerging of sentences
420 - tracking alternate proposed language (to be posted in Decision
421   Statements)
422 - Translation branches and cross-branch synchronization
423
424
425 NOTES FOR SOPINSPACE
426
427 - how are the data stored?
428
429 - can we get a more abstract interface to this backend? this could
430   conceivably be good for conversation threading (I couldn't actually
431   post a comment yet) but building the comment-target stuff into this
432   would be as much of an investment as doing it any other way.
433
434 NOTES FROM EBEN TALK 2005-10-17
435
436 I have this list of issues that we're discussing; as we're navigating
437 it should pop me to the position in the document that this issue
438 concerns.
439
440 - navigating the issues in one pane, and the text follows me in another
441 - navigating through issues navigates through text
442 - open a little notes pane, type my note, move on (AJAX)
443 - make a link to a piece of text somewhere: a URI or a MSG ID.
444 - ideally, typing one key will let me enter/open a URI, and another
445   will let me enter an arbitrary command-line string (to open a msg
446   with provided msg id)
447
448 So Stallman asks, where did we write down (in email) our last revision
449 of the AGPL clause?  Stet should associate that message id with the
450 issue we're discussing.
451
452 The canonical GPL3 file will be plaintext (? ugh for me)
453
454 So by the end of October: a system that:
455
456  - navigates issues
457  - highlights parts of the document that must be highlighted, in
458     relation to location & spans
459  - a distinction between *highlighted* and *highlighted with notes*
460
461
462 *** simplified submission based on comment match via procmail and cgi ***
463
464 *** you can highlight the title of the document as an easteregg scratch buffer ***
465
466
467 MUSING
468
469 Stet will be released under the GPLv3 as soon as the license is
470 finalized -- so in some ways this process may be seen as a sort of
471 meta-licensing bootstrap operation: a program whose sole purpose is to
472 develop the terms for its own distribution.
473
474
475
476
477 Project database for 'Stet' created.
478
479  Customize settings for your project using the command:
480
481    trac-admin /var/trac
482
483  Don't forget, you also need to copy (or symlink) "trac/cgi-bin/trac.cgi"
484  to you web server's /cgi-bin/ directory, and then configure the server.
485
486  If you're using Apache, this config example snippet might be helpful:
487
488     Alias /trac "/wherever/you/installed/trac/htdocs/"
489     <Location "/cgi-bin/trac.cgi">
490         SetEnv TRAC_ENV "/var/trac"
491     </Location>
492
493     # You need something like this to authenticate users
494     <Location "/cgi-bin/trac.cgi/login">
495         AuthType Basic
496         AuthName "Stet"
497         AuthUserFile /somewhere/trac.htpasswd
498         Require valid-user
499     </Location>
500
501  The latest documentation can also always be found on the project website:
502  http://projects.edgewall.com/trac/
503
504 Congratulations!
505