regparser.layer package¶
Submodules¶
regparser.layer.def_finders module¶
Parsers for finding a term that’s being defined within a node
-
class
regparser.layer.def_finders.
DefinitionKeyterm
(parent)[source]¶ Bases:
object
Matches definitions identified by being a first-level paragraph in a section with a specific title
-
class
regparser.layer.def_finders.
ExplicitIncludes
[source]¶ Bases:
regparser.layer.def_finders.FinderBase
Definitions can be explicitly included in the settings. For example, say that a paragraph doesn’t indicate that a certain phrase is a definition; we can define INCLUDE_DEFINITIONS_IN in our settings file, which will be checked here.
-
class
regparser.layer.def_finders.
FinderBase
[source]¶ Bases:
object
Base class for all of the definition finder classes. Defines the interface they must implement
-
class
regparser.layer.def_finders.
Ref
[source]¶ Bases:
regparser.layer.def_finders.Ref
A reference to a defined term. Keeps track of the term, where it was found and the term’s position in that node’s text
-
end
¶
-
position
¶
-
-
class
regparser.layer.def_finders.
ScopeMatch
(finder)[source]¶ Bases:
regparser.layer.def_finders.FinderBase
We know these will be definitions because the scope of the definition is spelled out. E.g. ‘for the purposes of XXX, the term YYY means’
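The phrase quoted above can be illustrated with a plain regular expression. This is only a minimal sketch: `SCOPE_PATTERN` and `find_scoped_definition` are hypothetical names, and the real finder may accept many more phrasings than this single pattern does.

```python
import re

# Hypothetical pattern for the single phrasing quoted in the docstring;
# this is an illustration, not the finder's actual grammar.
SCOPE_PATTERN = re.compile(
    r"for the purposes of (?P<scope>[^,]+), the term (?P<term>.+?) means",
    re.IGNORECASE,
)


def find_scoped_definition(text):
    """Return (scope, term, term_offset) for the first match, or None."""
    match = SCOPE_PATTERN.search(text)
    if match is None:
        return None
    return match.group("scope"), match.group("term"), match.start("term")


example = (
    "For the purposes of this subpart, the term creditor means "
    "a person who lends."
)
result = find_scoped_definition(example)
```

The returned offset mirrors the idea that a reference records the term's position within the node's text.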
-
class
regparser.layer.def_finders.
SmartQuotes
(stack)[source]¶ Bases:
regparser.layer.def_finders.FinderBase
Definitions indicated via smart quotes
-
class
regparser.layer.def_finders.
XMLTermMeans
(existing_refs=None)[source]¶ Bases:
regparser.layer.def_finders.FinderBase
Namespace for a matcher for e.g. ‘<E>XXX</E> means YYY’
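As a rough illustration of the ‘<E>XXX</E> means YYY’ shape, the sketch below uses a regular expression over raw XML text. The names `TERM_MEANS` and `find_xml_terms` are hypothetical, and the attribute on the sample `E` tag is an assumption about the input; the actual matcher may work on parsed XML rather than strings.

```python
import re

# Assumption: the term sits inside an <E> tag (optionally with attributes)
# and is immediately followed by the word "means".
TERM_MEANS = re.compile(r"<E(?:\s[^>]*)?>(?P<term>[^<]+)</E>\s+means\b")


def find_xml_terms(xml_text):
    """Return the emphasized terms that are followed by 'means'."""
    return [m.group("term").strip() for m in TERM_MEANS.finditer(xml_text)]


sample = (
    '<P><E T="03">Consumer</E> means a natural person. '
    '<E T="03">Act</E> is cited often.</P>'
)
terms = find_xml_terms(sample)
```

Only the first emphasized phrase is reported, because the second is not followed by "means".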
regparser.layer.external_citations module¶
-
class
regparser.layer.external_citations.
ExternalCitationParser
(tree, **context)[source]¶ Bases:
regparser.layer.layer.Layer
External Citations are references to documents outside of eRegs. See external_types for specific types of external citations
-
shorthand
= 'external-citations'¶
-
regparser.layer.external_types module¶
Parsers for various types of external citations. Consumed by the external citation layer
-
class
regparser.layer.external_types.
CFRFinder
[source]¶ Bases:
regparser.layer.external_types.FinderBase
Code of Federal Regulations. Explicitly ignore any references within this part
-
CITE_TYPE
= 'CFR'¶
-
-
class
regparser.layer.external_types.
Cite
(cite_type, start, end, components, url)¶ Bases:
tuple
-
cite_type
¶ Alias for field number 0
-
components
¶ Alias for field number 3
-
end
¶ Alias for field number 2
-
start
¶ Alias for field number 1
-
url
¶ Alias for field number 4
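Since Cite is a tuple with named fields, an equivalent stand-in can be built with `collections.namedtuple`. The field order below follows the signature above; the `components` keys and the URL are placeholders chosen for illustration, not values the parser is known to produce.

```python
from collections import namedtuple

# Field order taken from the documented signature.
Cite = namedtuple("Cite", ["cite_type", "start", "end", "components", "url"])

text = "See 12 U.S.C. 1841 for details."
start = text.index("12 U.S.C. 1841")
cite = Cite(
    cite_type="USC",
    start=start,
    end=start + len("12 U.S.C. 1841"),
    components={"title": "12", "section": "1841"},  # hypothetical keys
    url="https://example.com/uscode/12/1841",  # placeholder URL
)

# start and end are offsets into the node's text
cited_text = text[cite.start:cite.end]
```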
-
-
class
regparser.layer.external_types.
CustomFinder
[source]¶ Bases:
regparser.layer.external_types.FinderBase
Explicitly configured citations; part of settings
-
CITE_TYPE
= 'OTHER'¶
-
-
class
regparser.layer.external_types.
FDSYSFinder
[source]¶ Bases:
object
Common parent class to Finders which generate an FDSYS url based on matching a PyParsing grammar
-
CONST_PARAMS
¶ Constant parameters we pass to the FDSYS url; a dict
-
GRAMMAR
¶ A pyparsing grammar with relevant components labeled
-
-
class
regparser.layer.external_types.
FinderBase
[source]¶ Bases:
object
Base class for all of the external citation parsers. Defines the interface they must implement.
-
CITE_TYPE
¶ A constant to represent the citations this produces.
-
-
class
regparser.layer.external_types.
PublicLawFinder
[source]¶ Bases:
regparser.layer.external_types.FDSYSFinder
,regparser.layer.external_types.FinderBase
Public Law
-
CITE_TYPE
= 'PUBLIC_LAW'¶
-
CONST_PARAMS
= {'collection': 'plaw', 'lawtype': 'public'}¶
-
GRAMMAR
= QuickSearchable:({{{{Suppress:({{WordStart 'Public'} WordEnd}) Suppress:({{WordStart 'Law'} WordEnd})} W:(0123...)} Suppress:("-")} W:(0123...)})¶
-
-
class
regparser.layer.external_types.
StatutesFinder
[source]¶ Bases:
regparser.layer.external_types.FDSYSFinder
,regparser.layer.external_types.FinderBase
Statutes at large
-
CITE_TYPE
= 'STATUTES_AT_LARGE'¶
-
CONST_PARAMS
= {'collection': 'statute'}¶
-
GRAMMAR
= QuickSearchable:({{W:(0123...) Suppress:("Stat.")} W:(0123...)})¶
-
-
class
regparser.layer.external_types.
USCFinder
[source]¶ Bases:
regparser.layer.external_types.FDSYSFinder
,regparser.layer.external_types.FinderBase
U.S. Code
-
CITE_TYPE
= 'USC'¶
-
CONST_PARAMS
= {'collection': 'uscode'}¶
-
GRAMMAR
= QuickSearchable:({{{W:(0123...) "U.S.C."} Suppress:(["Chapter"])} W:(0123...)})¶
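The three GRAMMAR values above are pyparsing reprs: `W:(0123...)` is a word made of digits, `Suppress` drops a token from the results, and the bracketed "Chapter" is optional. The sketch below approximates them with standard-library regular expressions so it runs without pyparsing. It is not the finders' implementation, and `APPROXIMATIONS` and `find_citations` are hypothetical names.

```python
import re

# Regex approximations of the documented grammars, keyed by CITE_TYPE.
APPROXIMATIONS = {
    "PUBLIC_LAW": re.compile(r"\bPublic\s+Law\s+(\d+)\s*-\s*(\d+)"),
    "STATUTES_AT_LARGE": re.compile(r"(\d+)\s*Stat\.\s*(\d+)"),
    "USC": re.compile(r"(\d+)\s*U\.S\.C\.\s*(?:Chapter\s*)?(\d+)"),
}


def find_citations(text):
    """Return (cite_type, start, end, numbers) tuples ordered by position."""
    found = []
    for cite_type, pattern in APPROXIMATIONS.items():
        for match in pattern.finditer(text):
            found.append(
                (cite_type, match.start(), match.end(), match.groups())
            )
    return sorted(found, key=lambda item: item[1])


sample = "Public Law 111-203, 124 Stat. 1376, amended 12 U.S.C. 5481."
citations = find_citations(sample)
```

Each result carries the two numeric components that the grammars label, which is the information needed to fill in a URL alongside CONST_PARAMS.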
-
regparser.layer.formatting module¶
Finds and abstracts formatting information from the regulation tree. In many ways, this is like a markdown parser.
-
class
regparser.layer.formatting.
Dashes
[source]¶ Bases:
regparser.layer.formatting.PlaintextFormatData
E.g. Some text some text_____
-
REGEX
= <_sre.SRE_Pattern object>¶
-
-
class
regparser.layer.formatting.
FencedData
[source]¶ Bases:
regparser.layer.formatting.PlaintextFormatData
E.g.
`note Line 1 Line 2 `
-
REGEX
= <_sre.SRE_Pattern object>¶
-
-
class
regparser.layer.formatting.
Footnotes
[source]¶ Bases:
regparser.layer.formatting.PlaintextFormatData
E.g. [^4](Contents of footnote) The footnote may also contain parens if they are escaped with a backslash
-
REGEX
= <_sre.SRE_Pattern object>¶
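The REGEX value is not shown in this page, so the pattern below is an assumption that follows the footnote syntax described above: a caret reference in square brackets, then the note in parentheses, with any parentheses inside the note escaped by a backslash. `FOOTNOTE` and `extract_footnotes` are hypothetical names.

```python
import re

# Assumed pattern: [^N](note), where the note may contain \( and \).
FOOTNOTE = re.compile(r"\[\^(?P<ref>\d+)\]\((?P<note>(?:\\.|[^\\()])*)\)")


def extract_footnotes(text):
    """Return (reference, note) pairs with escaped parens unescaped."""
    results = []
    for match in FOOTNOTE.finditer(text):
        note = re.sub(r"\\([()])", r"\1", match.group("note"))
        results.append((match.group("ref"), note))
    return results


sample = r"See note[^4](Applies to loans \(closed-end\) only) here."
footnotes = extract_footnotes(sample)
```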
-
-
class
regparser.layer.formatting.
Formatting
(tree, **context)[source]¶ Bases:
regparser.layer.layer.Layer
Layer responsible for tables, subscripts, and other formatting-related information
-
shorthand
= 'formatting'¶
-
-
class
regparser.layer.formatting.
HeaderStack
[source]¶ Bases:
regparser.tree.priority_stack.PriorityStack
Used to determine Table Headers – indeed, they are complicated enough to warrant their own stack
-
class
regparser.layer.formatting.
PlaintextFormatData
[source]¶ Bases:
object
Base class for formatting information which can be derived from the plaintext of a regulation node
-
REGEX
¶ Regular expression used to find matches in the plain text
-
-
class
regparser.layer.formatting.
Subscript
[source]¶ Bases:
regparser.layer.formatting.PlaintextFormatData
E.g. a_{0}
-
REGEX
= <_sre.SRE_Pattern object>¶
-
-
class
regparser.layer.formatting.
Superscript
[source]¶ Bases:
regparser.layer.formatting.PlaintextFormatData
E.g. x^{2}
-
REGEX
= <_sre.SRE_Pattern object>¶
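The Subscript and Superscript markers share a shape: a marker character followed by braces. As the compiled REGEX values are not shown here, the patterns below are assumptions built from the two examples (`a_{0}` and `x^{2}`); `SUBSCRIPT`, `SUPERSCRIPT` and `find_scripts` are hypothetical names.

```python
import re

# Assumed patterns derived from the documented examples.
SUBSCRIPT = re.compile(r"_\{(?P<content>[^}]*)\}")
SUPERSCRIPT = re.compile(r"\^\{(?P<content>[^}]*)\}")


def find_scripts(text):
    """Return (kind, content, start) tuples ordered by position."""
    found = []
    for kind, pattern in (("subscript", SUBSCRIPT), ("superscript", SUPERSCRIPT)):
        for match in pattern.finditer(text):
            found.append((kind, match.group("content"), match.start()))
    return sorted(found, key=lambda item: item[2])


sample = "a_{0} + x^{2}"
scripts = find_scripts(sample)
```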
-
-
class
regparser.layer.formatting.
TableHeaderNode
(text, level)[source]¶ Bases:
object
Represents a cell in a table’s header
-
regparser.layer.formatting.
build_header
(xml_nodes)[source]¶ Builds a TableHeaderNode tree, with an empty root. Each node in the tree includes its colspan/rowspan
-
regparser.layer.formatting.
build_header_rowspans
(tree_root, max_height)[source]¶ The following table is an example of why we need a relatively complicated approach to setting rowspan:
|R1C1     |R1C2               |
|R2C1|R2C2|R2C3     |R2C4     |
|    |    |R3C1|R3C2|R3C3|R3C4|
If we set the rowspan of each node to max_height - node.height() - node.level + 1, R1C1 will end up with a rowspan of 2 instead of 1, because of difficulties handling the implicit rowspans for R2C1 and R2C2.
Instead, we generate a list of the paths to each leaf and then set rowspan based on that.
Rowspan for leaves is max_height - node.height() - node.level + 1, and for root is simply 1. Other nodes’ rowspans are set to the level of the node after them minus their own level.
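The path-based approach described above can be worked through on the example table. This is a sketch under stated assumptions: `HeaderNode` is a hypothetical stand-in for TableHeaderNode, `height()` is assumed to be 0 for a leaf, the root sits at level 0, and `max_height` is taken to be the root's height. None of these details are confirmed by this page.

```python
class HeaderNode:
    """Hypothetical stand-in for TableHeaderNode."""

    def __init__(self, text, level):
        self.text = text
        self.level = level
        self.children = []
        self.rowspan = None

    def add(self, *children):
        self.children.extend(children)
        return self

    def height(self):
        """Number of levels below this node (assumed 0 for a leaf)."""
        if not self.children:
            return 0
        return 1 + max(child.height() for child in self.children)


def paths_to_leaves(node, prefix=()):
    """Yield every root-to-leaf path as a tuple of nodes."""
    path = prefix + (node,)
    if not node.children:
        yield path
    for child in node.children:
        yield from paths_to_leaves(child, path)


def set_rowspans(root, max_height):
    for path in paths_to_leaves(root):
        # Inner nodes: level of the next node on the path minus their own
        for node, following in zip(path, path[1:]):
            node.rowspan = following.level - node.level
        leaf = path[-1]
        leaf.rowspan = max_height - leaf.height() - leaf.level + 1
    root.rowspan = 1


# Build the header from the example table
root = HeaderNode("", 0)
r1c1, r1c2 = HeaderNode("R1C1", 1), HeaderNode("R1C2", 1)
r2c1, r2c2 = HeaderNode("R2C1", 2), HeaderNode("R2C2", 2)
r2c3, r2c4 = HeaderNode("R2C3", 2), HeaderNode("R2C4", 2)
r3 = [HeaderNode("R3C%d" % i, 3) for i in range(1, 5)]
r1c1.add(r2c1, r2c2)
r2c3.add(r3[0], r3[1])
r2c4.add(r3[2], r3[3])
r1c2.add(r2c3, r2c4)
root.add(r1c1, r1c2)

# The single-formula approach gives R1C1 the wrong rowspan
naive_r1c1 = root.height() - r1c1.height() - r1c1.level + 1
set_rowspans(root, root.height())
```

With these assumptions the single formula gives R1C1 a rowspan of 2, while the path-based version gives it 1 and lets the leaves R2C1 and R2C2 span the two remaining rows.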
-
regparser.layer.formatting.
node_to_table_xml_els
(node)[source]¶ Search in a few places for GPOTABLE xml elements
-
regparser.layer.formatting.
table_xml_to_data
(xml_node)[source]¶ Construct a data structure of the table data. We provide a different structure than the native XML as the XML encodes too much logic. This structure can be used to generate semi-complex tables which could not be generated from the markdown above
regparser.layer.graphics module¶
regparser.layer.internal_citations module¶
-
class
regparser.layer.internal_citations.
InternalCitationParser
(tree, cfr_title, **context)[source]¶ Bases:
regparser.layer.layer.Layer
-
parse
(text, label, title=None)[source]¶ Parse the provided text, pulling out all the internal (self-referential) citations.
-
remove_missing_citations
(citations, text)[source]¶ Remove any citations to labels we have not seen before (i.e. those collected in the pre_processing stage)
-
shorthand
= 'internal-citations'¶
-
regparser.layer.interpretations module¶
regparser.layer.key_terms module¶
regparser.layer.layer module¶
-
class
regparser.layer.layer.
Layer
(tree, **context)[source]¶ Bases:
object
Base class for all of the Layer generators. Defines the interface they must implement
-
static
convert_to_search_replace
(matches, text, start_fn, end_fn)[source]¶ We’ll often have a bunch of text matches based on offsets. To use the “search-replace” encoding (which is a bit more resilient to minor variations in text), we need to convert these offsets into “locations” – i.e. of all of the instances of a string in this text, which should be matched. Yields SearchReplace tuples
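To make the offset-to-location conversion concrete, the sketch below counts how many occurrences of the matched string begin before each match. The two-field `SearchReplace` tuple and the function name `offsets_to_locations` are hypothetical; the fields of the real tuple are not shown in this page.

```python
import re
from collections import defaultdict, namedtuple

# Hypothetical shape for illustration only.
SearchReplace = namedtuple("SearchReplace", ["text", "locations"])


def offsets_to_locations(matches, text, start_fn, end_fn):
    """Yield one SearchReplace per distinct matched string."""
    by_text = defaultdict(list)
    for match in matches:
        start, end = start_fn(match), end_fn(match)
        snippet = text[start:end]
        # Location = number of occurrences of snippet that begin earlier
        location = 0
        idx = text.find(snippet)
        while idx != -1 and idx < start:
            location += 1
            idx = text.find(snippet, idx + 1)
        by_text[snippet].append(location)
    for snippet, locations in by_text.items():
        yield SearchReplace(snippet, sorted(locations))


text = "the bank and the borrower; the bank may"
# Pretend only the second "the bank" was matched by some parser
matches = list(re.finditer(r"the bank", text))[1:]
results = list(
    offsets_to_locations(matches, text, lambda m: m.start(), lambda m: m.end())
)
```

Because the result names the occurrence rather than its character offsets, it still points at the right phrase if text earlier in the paragraph changes length.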
-
process
(node)[source]¶ Construct the element of the layer relevant to processing the given node. Returns (paragraph_id, layer_content), or None if there is no relevant information.
-
shorthand
¶ Unique identifier for this layer
-
static
regparser.layer.meta module¶
-
class
regparser.layer.meta.
Meta
(tree, cfr_title, version, **context)[source]¶ Bases:
regparser.layer.layer.Layer
-
process
(node)[source]¶ If this is the root element, add some ‘meta’ information about this regulation, including its cfr title, effective date, and any configured info
-
shorthand
= 'meta'¶
-
regparser.layer.model_forms_text module¶
regparser.layer.paragraph_markers module¶
-
class
regparser.layer.paragraph_markers.
ParagraphMarkers
(tree, **context)[source]¶ Bases:
regparser.layer.layer.Layer
-
shorthand
= 'paragraph-markers'¶
-
regparser.layer.scope_finder module¶
regparser.layer.section_by_section module¶
regparser.layer.table_of_contents module¶
-
class
regparser.layer.table_of_contents.
TableOfContentsLayer
(tree, **context)[source]¶ Bases:
regparser.layer.layer.Layer
-
check_toc_candidacy
(node)[source]¶ To be eligible to contain a table of contents, all of a node’s children must have a title element. If one of the children is an empty subpart, we check all its children.
-
process
(node)[source]¶ Create a table of contents for this node, if it’s eligible. We ignore subparts.
-
shorthand
= 'toc'¶
-
regparser.layer.terms module¶
-
class
regparser.layer.terms.
Inflected
(singular, plural)¶ Bases:
tuple
-
plural
¶ Alias for field number 1
-
singular
¶ Alias for field number 0
-
-
class
regparser.layer.terms.
ParentStack
[source]¶ Bases:
regparser.tree.priority_stack.PriorityStack
Used to keep track of the parents while processing nodes to find terms. This is needed as the definition may need to find its scope in parents.
-
class
regparser.layer.terms.
Terms
(*args, **kwargs)[source]¶ Bases:
regparser.layer.layer.Layer
-
ENDS_WITH_WORDCHAR
= <_sre.SRE_Pattern object>¶
-
STARTS_WITH_WORDCHAR
= <_sre.SRE_Pattern object>¶
-
applicable_terms
(label)[source]¶ Find all terms that might be applicable to nodes with this label. Note that we don’t have to deal with subparts as subpart_scope simply applies the definition to all sections in a subpart
-
calculate_offsets
(text, applicable_terms, exclusions=None, inclusions=None)[source]¶ Search for defined terms in this text, including singular and plural forms of these terms, with a preference for all larger (i.e. containing) terms.
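The preference for larger (containing) terms can be sketched by matching the longest terms first and skipping any later match that overlaps an accepted one. This is an illustration only: `find_term_offsets` is a hypothetical helper, plural forms are passed in as ordinary terms, and exclusions and inclusions are left out.

```python
import re


def find_term_offsets(text, terms):
    """Return (term, start, end) matches, preferring longer terms."""
    found = []
    for term in sorted(terms, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            overlaps = any(
                match.start() < end and start < match.end()
                for _, start, end in found
            )
            if not overlaps:
                found.append((term, match.start(), match.end()))
    return sorted(found, key=lambda item: item[1])


text = "A credit card account differs from other accounts and from credit."
terms = ["credit", "credit card account", "account", "accounts"]
offsets = find_term_offsets(text, terms)
```

Here "credit" and "account" are not reported inside "credit card account", but the standalone "credit" at the end of the sentence is.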
-
excluded_offsets
(node)[source]¶ We explicitly exclude certain chunks of text (for example, words we are defining shouldn’t have links appear within the defined term). More will be added in the future.
-
ignored_offsets
(cfr_part, text)[source]¶ Return a list of offsets corresponding to the presence of an “ignored” phrase in the text
-
is_exclusion
(term, node)[source]¶ Some definitions are exceptions/exclusions of a previously defined term. At the moment, we do not want to include these as they would replace previous (correct) definitions. We also remove terms which are inside an instance of the IGNORE_DEFINITIONS_IN setting
-
look_for_defs
(node, stack=None)[source]¶ Check a node and recursively check its children for terms which are being defined. Add these definitions to self.scoped_terms.
-
pre_process
()[source]¶ Step through every node in the tree, finding definitions. Also keep track of which subpart we are in. Finally, document all defined terms.
-
process
(node)[source]¶ Determine which (if any) definitions would apply to this node, then find if any of those terms appear in this node
-
shorthand
= u'terms'¶
-