regparser.tree.depth package¶
Submodules¶
regparser.tree.depth.derive module¶
-
class
regparser.tree.depth.derive.
ParAssignment
(typ, idx, depth)¶ Bases:
tuple
-
depth
¶ Alias for field number 2
-
idx
¶ Alias for field number 1
-
typ
¶ Alias for field number 0
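The field aliases above are what `collections.namedtuple` generates. A minimal sketch, assuming the class is built that way; the marker values below are invented for illustration:

```python
from collections import namedtuple

# Assumption: a namedtuple stand-in using the documented field order.
ParAssignment = namedtuple('ParAssignment', ('typ', 'idx', 'depth'))

# Hypothetical values: a lowercase-letter marker, third in its sequence,
# nested one level down.
par = ParAssignment(typ='lower', idx=2, depth=1)

# Fields can be read by name, by position, or by unpacking.
typ, idx, depth = par
```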
-
-
class
regparser.tree.depth.derive.
Solution
(assignment, weight=1.0)[source]¶ Bases:
object
A collection of assignments + a weight for how likely this solution is (after applying heuristics)
-
regparser.tree.depth.derive.
debug_idx
(marker_list, constraints=None)[source]¶ Binary search through the markers to find the point at which derive_depths no longer works
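The binary search can be pictured roughly as follows. This is a sketch rather than the package's code: `first_failing_length` and the `solvable` callback are hypothetical names, and the sketch assumes that once a prefix of the markers fails, every longer prefix fails as well.

```python
def first_failing_length(markers, solvable):
    """Return the shortest prefix length for which `solvable` is False,
    or None when the full list can be solved.

    Assumption: the empty prefix is solvable and failure is monotonic.
    """
    if solvable(markers):
        return None
    lo, hi = 0, len(markers)   # prefix of length lo works, length hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if solvable(markers[:mid]):
            lo = mid
        else:
            hi = mid
    return hi


# Toy stand-in for "derive_depths found at least one solution".
def no_bad_marker(prefix):
    return 'x' not in prefix


failing = first_failing_length(['a', 'b', 'x', 'c'], no_bad_marker)
```

Here `failing` is the length of the first prefix that includes the offending marker.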
-
regparser.tree.depth.derive.
derive_depths
(original_markers, additional_constraints=None)[source]¶ Use constraint programming to derive the paragraph depths associated with a list of paragraph markers. Additional constraints (e.g. expected marker types, etc.) can also be added. Such constraints are functions of two parameters, the constraint function (problem.addConstraint) and a list of all variables
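To illustrate the shape of an additional constraint (a function that receives the constraint-adding function and the list of all variables), here is a minimal sketch. `TinyProblem` is a hypothetical brute-force stand-in written for this example, not the solver the package uses; the variable names and both rules are invented.

```python
from itertools import product


class TinyProblem:
    """Hypothetical brute-force solver used only for illustration."""

    def __init__(self):
        self.domains = {}
        self.constraints = []

    def add_variable(self, name, domain):
        self.domains[name] = list(domain)

    def add_constraint(self, func, variables):
        self.constraints.append((func, list(variables)))

    def solutions(self):
        names = list(self.domains)
        for values in product(*(self.domains[n] for n in names)):
            candidate = dict(zip(names, values))
            if all(func(*(candidate[v] for v in variables))
                   for func, variables in self.constraints):
                yield candidate


def first_is_top_level(constrain, all_variables):
    """Hypothetical additional constraint: the first depth must be 0."""
    constrain(lambda depth: depth == 0, [all_variables[0]])


problem = TinyProblem()
variables = ['depth0', 'depth1']
for name in variables:
    problem.add_variable(name, range(3))
# Invented base rule: depth may grow by at most one level at a time.
problem.add_constraint(lambda prev, curr: curr <= prev + 1, variables)
# The extra constraint receives the adder and the full variable list.
first_is_top_level(problem.add_constraint, variables)
found = list(problem.solutions())
```

Because the extra constraint only touches the variables it names, it can narrow the search without knowing anything about the other rules.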
regparser.tree.depth.heuristics module¶
Set of heuristics for trimming down the set of solutions. Each heuristic works by penalizing a solution; it’s then up to the caller to grab the solution with the least penalties.
-
regparser.tree.depth.heuristics.
prefer_diff_types_diff_levels
(solutions, weight=1.0)[source]¶ Dock solutions which have different markers appearing at the same level. This does occur, but not often.
-
regparser.tree.depth.heuristics.
prefer_multiple_children
(solutions, weight=1.0)[source]¶ Dock solutions which have a paragraph with exactly one child. While this is possible, it’s unlikely.
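A penalizing heuristic of this kind might look roughly like the sketch below. The `Candidate` class, the penalty formula, and the convention that a higher weight means fewer penalties are all assumptions made for illustration, not the package's implementation.

```python
class Candidate:
    """Hypothetical solution: a list of paragraph depths plus a weight."""

    def __init__(self, depths, weight=1.0):
        self.depths = depths
        self.weight = weight


def count_only_children(depths):
    """Count paragraphs that have exactly one direct child."""
    singles = 0
    for i, depth in enumerate(depths):
        children = 0
        for later in depths[i + 1:]:
            if later <= depth:      # left this paragraph's subtree
                break
            if later == depth + 1:  # a direct child
                children += 1
        if children == 1:
            singles += 1
    return singles


def dock_single_children(candidates, weight=1.0):
    """Assumed penalty: scale by the share of single-child paragraphs."""
    for cand in candidates:
        singles = count_only_children(cand.depths)
        cand.weight -= weight * singles / max(len(cand.depths), 1)
    return candidates


flat = Candidate([0, 1, 1])    # one parent with two children
chain = Candidate([0, 1, 2])   # every parent has a single child
dock_single_children([flat, chain])
# Assumption: the caller keeps the least-penalized (highest-weight) one.
best = max([flat, chain], key=lambda cand: cand.weight)
```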
regparser.tree.depth.markers module¶
Namespace for collecting the various types of markers
regparser.tree.depth.optional_rules module¶
Depth derivation has a mechanism for _optional_ rules. This module contains a collection of such rules. All functions should accept two parameters: the first is a function which can be used to constrain the variables; the second is a list of all variables in the system. This allows us to define rules over subsets of the variables rather than all of them, should that make our constraints more useful.
-
regparser.tree.depth.optional_rules.
depth_type_inverses
(constrain, all_variables)[source]¶ If paragraphs are at the same depth, they must share the same type. If paragraphs are the same type, they must share the same depth
-
regparser.tree.depth.optional_rules.
limit_paragraph_types
(*p_types)[source]¶ Constrain paragraphs to a limited set of paragraph types. This can reduce the search space if we know (for example) that the text comes from regulations and hence does not have capitalized roman numerals.
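Since the function takes the allowed types rather than the two rule parameters, it presumably returns a rule. A minimal sketch of that factory pattern follows; `limit_types`, the `type0`/`idx0`/`depth0` variable naming, and the type names are assumptions made for illustration.

```python
def limit_types(*allowed):
    """Hypothetical factory: build a rule restricting paragraph types."""
    allowed = frozenset(allowed)

    def rule(constrain, all_variables):
        # Assumption: type variables are the ones named 'type<N>'.
        for name in all_variables:
            if name.startswith('type'):
                constrain(lambda typ: typ in allowed, [name])

    return rule


# Record the constraints instead of handing them to a real solver.
recorded = []
rule = limit_types('lower', 'ints')
rule(lambda func, variables: recorded.append((func, variables)),
     ['type0', 'idx0', 'depth0', 'type1', 'idx1', 'depth1'])
```

Each recorded entry pairs a membership check with the single type variable it applies to.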
-
regparser.tree.depth.optional_rules.
limit_sequence_gap
(size=0)[source]¶ We’ve loosened the rules around sequences of paragraphs so that paragraphs can be skipped. This allows arbitrary tightening of that rule, effectively allowing gaps of a limited size
-
regparser.tree.depth.optional_rules.
star_new_level
(constrain, all_variables)[source]¶ STARS should never have subparagraphs as it’d be impossible to determine where in the hierarchy these subparagraphs belong. @todo: This _probably_ should be a general rule, but there’s a test that this breaks in the interpretations. Revisit with CFPB regs
regparser.tree.depth.pair_rules module¶
Rules relating to two paragraph markers in sequence. The rules are “positive” in the sense that each allows for a particular scenario (rather than denying all other scenarios). They combine in the eponymous function, where, if any of the rules return True, we pass. Otherwise, we fail.
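The "pass if any rule returns True" combination can be sketched as follows. The two toy rules and the `Marker` tuple are invented for illustration and are much simpler than the rules documented in this module.

```python
from collections import namedtuple

# Hypothetical stand-in for an assigned marker.
Marker = namedtuple('Marker', ('typ', 'idx', 'depth'))


def next_in_sequence(prev, curr):
    """Toy rule: same list, next index."""
    same_list = (prev.typ, prev.depth) == (curr.typ, curr.depth)
    return same_list and curr.idx == prev.idx + 1


def starts_child_sequence(prev, curr):
    """Toy rule: one level deeper, starting at the first index."""
    return curr.depth == prev.depth + 1 and curr.idx == 0


TOY_RULES = (next_in_sequence, starts_child_sequence)


def pair_allowed(prev, curr, rules=TOY_RULES):
    # Each rule permits one scenario; the pair passes if any rule does.
    return any(rule(prev, curr) for rule in rules)


a = Marker('lower', 0, 0)
b = Marker('lower', 1, 0)
one = Marker('ints', 0, 1)
```

Adding a new permitted scenario then means appending one more rule, without touching the existing ones.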
-
class
regparser.tree.depth.pair_rules.
MarkerAssignment
[source]¶ Bases:
regparser.tree.depth.pair_rules.MarkerAssignment
-
is_inline_stars
()[source]¶ Inline stars (* * *) often behave quite differently from both STARS and other markers.
-
-
regparser.tree.depth.pair_rules.
continuing_seq
(prev, curr)[source]¶ E.g. “d, e” is good, but “e, d” is not. We also want to allow some paragraphs to be skipped, e.g. “d, g”
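A rough sketch of such a check, assuming markers carry a type, a position within their sequence, and a depth. `continues_sequence` and its `max_gap` parameter are hypothetical; `max_gap` only shows how skipping could be bounded.

```python
from collections import namedtuple

# Hypothetical stand-in for an assigned marker.
Marker = namedtuple('Marker', ('typ', 'idx', 'depth'))


def continues_sequence(prev, curr, max_gap=None):
    """True if curr comes later than prev in the same list.

    max_gap (an assumption) caps how many markers may be skipped;
    None means any number may be skipped.
    """
    same_list = prev.typ == curr.typ and prev.depth == curr.depth
    if not same_list or curr.idx <= prev.idx:
        return False
    skipped = curr.idx - prev.idx - 1
    return max_gap is None or skipped <= max_gap


# 'd', 'e' and 'g' are the 4th, 5th and 7th lowercase letters.
d, e, g = (Marker('lower', i, 1) for i in (3, 4, 6))
```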
-
regparser.tree.depth.pair_rules.
decreasing_stars
(prev, curr)[source]¶ Two stars in a row can exist if the second is shallower than the first
-
regparser.tree.depth.pair_rules.
decrement_depth
(prev, curr)[source]¶ Decrementing depth is okay unless we’re using inline stars
-
regparser.tree.depth.pair_rules.
marker_star_level
(prev, curr)[source]¶ Allow a marker to be followed by stars if those stars are deeper. If not inline, also allow the stars to be at the same depth
-
regparser.tree.depth.pair_rules.
markerless_same_level
(prev, curr)[source]¶ Markerless paragraphs can be followed by any type on the same level as long as that’s beginning a new sequence
-
regparser.tree.depth.pair_rules.
new_sequence
(prev, curr)[source]¶ Allow depth to be incremented if starting a new sequence
-
regparser.tree.depth.pair_rules.
pair_rules
(prev_typ, prev_idx, prev_depth, typ, idx, depth)[source]¶ Combine all of the above rules
-
regparser.tree.depth.pair_rules.
paragraph_markerless
(prev, curr)[source]¶ A non-markerless paragraph followed by a markerless paragraph can be one level deeper
regparser.tree.depth.rules module¶
Namespace for constraints on paragraph depth discovery.
For the purposes of this module a “symmetry” refers to two perfectly valid solutions to a problem whose differences are irrelevant. For example, the distinction between two depth placements of a STARS that follows a may not matter if we’re planning to ignore the final STARS anyway. To “break” this symmetry, we explicitly reject one solution; this dramatically reduces the number of permutations we care about.
-
regparser.tree.depth.rules.
ancestors
(all_prev)[source]¶ Given an assignment of values, construct a list of the relevant parents, e.g. 1, i, a, ii, A gives us 1, ii, A
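The example above can be worked through with a small sketch. The depths attached to each marker (1 at depth 0, i at 1, a at 2, ii at 1, A at 2) are an assumption, as are the names `Par` and `current_ancestors`.

```python
from collections import namedtuple

# Hypothetical pairing of a marker with its assigned depth.
Par = namedtuple('Par', ('marker', 'depth'))


def current_ancestors(assignments):
    """Keep, per depth, only the latest paragraph still on the path."""
    chain = []
    for par in assignments:
        # Anything at this depth or deeper is no longer an ancestor.
        del chain[par.depth:]
        chain.append(par)
    return chain


pars = [Par('1', 0), Par('i', 1), Par('a', 2), Par('ii', 1), Par('A', 2)]
path = [par.marker for par in current_ancestors(pars)]
```

When ii arrives at depth 1 it replaces i and discards a, which is why only 1, ii, A remain.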
-
regparser.tree.depth.rules.
continue_previous_seq
(typ, idx, depth, *all_prev)[source]¶ Constrain the current marker based on all markers leading up to it
-
regparser.tree.depth.rules.
depth_type_order
(order)[source]¶ Create a function which constrains paragraph depths to a particular type sequence. For example, we know a priori what regtext and interpretation markers’ order should be. Adding this constraint speeds up solution finding.
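One way to picture this is a factory that ties each depth to the type expected there. The sketch below is an assumption about the general shape only: `type_order`, the type names, and the absence of any special cases (for example for STARS) are all invented for illustration.

```python
def type_order(order):
    """Hypothetical factory: order[depth] is the type expected there."""
    order = tuple(order)

    def check(typ, depth):
        return 0 <= depth < len(order) and typ == order[depth]

    return check


# Invented ordering: letters, then integers, then roman numerals.
check = type_order(['lower', 'ints', 'roman'])
```

With the order fixed in advance, a solver can discard most type and depth combinations without exploring them.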
-
regparser.tree.depth.rules.
marker_stars_markerless_symmetry
(pprev_typ, pprev_idx, pprev_depth, prev_typ, prev_idx, prev_depth, typ, idx, depth)[source]¶ When the sequence a, STARS, MARKERLESS has three equally valid depth assignments that differ only in relative depth, prefer the middle one.
-
regparser.tree.depth.rules.
markerless_stars_symmetry
(pprev_typ, pprev_idx, pprev_depth, prev_typ, prev_idx, prev_depth, typ, idx, depth)[source]¶ Given MARKERLESS, STARS, MARKERLESS, we want to break the symmetry between two depth assignments of that same sequence, which differ only in the depth of the STARS.
Here, we don’t really care about the distinction, so we’ll opt for the former.
-
regparser.tree.depth.rules.
must_be
(value)[source]¶ A constraint that the given variable must match the value.
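A constraint like this is naturally written as a closure. A minimal sketch, with `must_equal` as a hypothetical name:

```python
def must_equal(value):
    """Hypothetical factory: build a single-variable equality check."""

    def check(candidate):
        return candidate == value

    return check


# For example, pin a depth variable to the top level.
is_top_level = must_equal(0)
```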
-
regparser.tree.depth.rules.
same_parent_same_type
(*all_vars)[source]¶ All markers in the same parent should have the same marker type. Exceptions for:
STARS, which can appear at any level
Sequences which _begin_ with markerless paragraphs
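The rule and its two exceptions can be sketched over a flat list of (type, depth) pairs. This is an illustrative reading rather than the package's implementation; the function name and the type strings are assumptions.

```python
def siblings_share_type(assignments):
    """assignments: list of (typ, depth) pairs in document order."""
    first_type = {}   # depth -> type that opened the current sibling group
    for typ, depth in assignments:
        # Moving back up closes every deeper sibling group.
        for deeper in [d for d in first_type if d > depth]:
            del first_type[deeper]
        if typ == 'stars':
            continue  # exception: STARS may appear at any level
        opener = first_type.setdefault(depth, typ)
        # Exception: a group opened by a markerless paragraph may
        # continue with another type.
        if opener != 'markerless' and typ != opener:
            return False
    return True


nested = [('lower', 0), ('ints', 1), ('ints', 1), ('lower', 0)]
mixed = [('lower', 0), ('ints', 0)]
```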
-
regparser.tree.depth.rules.
star_sandwich_symmetry
(pprev_typ, pprev_idx, pprev_depth, prev_typ, prev_idx, prev_depth, typ, idx, depth)[source]¶ Symmetry-breaking constraint that places the STARS tag at a specific depth. When a STARS falls between c and 5, it could potentially sit at any of several depths; this constraint narrows the resolution to one of two arrangements of c, STARS, 5. Stars also cannot be used to skip a level (similar to markerless sandwich, above).