Module sumeh.services.utils
SchemaDef = Dict[str, Any]
module-attribute
__compare_schemas(actual, expected)
Compare two lists of schema definitions and identify discrepancies.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
actual | List[SchemaDef] | The list of actual schema definitions. | required |
expected | List[SchemaDef] | The list of expected schema definitions. | required |
Returns:
Type | Description |
---|---|
Tuple[bool, List[Tuple[str, str]]] | A tuple where the first element is a boolean indicating whether the schemas match (True if they match, False otherwise), and the second element is a list of tuples describing the discrepancies. Each tuple contains the field name (str) and a description of the discrepancy (str), such as "missing", "type mismatch", "nullable but expected non-nullable", or "extra column". |
Notes
- A field is considered "missing" if it exists in the expected schema but not in the actual schema.
- A "type mismatch" occurs if the data type of a field in the actual schema does not match the expected data type.
- A field is considered "nullable but expected non-nullable" if it is nullable in the actual schema but not nullable in the expected schema.
- An "extra column" is a field that exists in the actual schema but not in the expected schema.
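The four rules in the notes above can be sketched as follows. This is an illustrative reimplementation, not the library's source: the function name `compare_schemas` and the dictionary keys `"field"`, `"data_type"`, and `"nullable"` are assumptions, since this page does not show the layout of a SchemaDef.

```python
from typing import Any, Dict, List, Tuple

SchemaDef = Dict[str, Any]


def compare_schemas(
    actual: List[SchemaDef], expected: List[SchemaDef]
) -> Tuple[bool, List[Tuple[str, str]]]:
    """Hypothetical sketch of the comparison rules described in the notes."""
    # Assumed keys: "field" (column name), "data_type", "nullable".
    actual_by_name = {col["field"]: col for col in actual}
    expected_by_name = {col["field"]: col for col in expected}
    errors: List[Tuple[str, str]] = []

    for name, exp in expected_by_name.items():
        act = actual_by_name.get(name)
        if act is None:
            # Present in expected, absent in actual.
            errors.append((name, "missing"))
            continue
        if act["data_type"] != exp["data_type"]:
            errors.append((name, "type mismatch"))
        if act["nullable"] and not exp["nullable"]:
            errors.append((name, "nullable but expected non-nullable"))

    for name in actual_by_name:
        if name not in expected_by_name:
            # Present in actual, absent in expected.
            errors.append((name, "extra column"))

    return (not errors, errors)


expected = [
    {"field": "id", "data_type": "int", "nullable": False},
    {"field": "name", "data_type": "string", "nullable": True},
    {"field": "created", "data_type": "date", "nullable": False},
]
actual = [
    {"field": "id", "data_type": "string", "nullable": True},
    {"field": "name", "data_type": "string", "nullable": True},
    {"field": "extra", "data_type": "int", "nullable": True},
]
matches, errors = compare_schemas(actual, expected)
```

In this sketch a single field can produce more than one discrepancy (here `id` has both a type mismatch and a nullability problem), and the boolean is True only when the list of discrepancies is empty.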
Source code in sumeh/services/utils.py
__convert_value(value)
Converts the provided value to the appropriate type (date, float, or int).
Depending on the format of the input value, it will be converted to a datetime object, a floating-point number (float), or an integer (int).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
value | str | The value to be converted, represented as a string. | required |
Returns:
Type | Description |
---|---|
Union[datetime, float, int] | The converted value, which can be a datetime object, float, or int. |
Raises:
Type | Description |
---|---|
ValueError | If the value does not match an expected format. |
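The conversion described above can be sketched as a chain of attempts that falls through to a ValueError. This is a minimal illustration under stated assumptions: the name `convert_value`, the single accepted date format (`%Y-%m-%d`), and the order in which the types are tried are all hypothetical, because this page does not list the formats the library accepts.

```python
from datetime import datetime
from typing import Union


def convert_value(value: str) -> Union[datetime, float, int]:
    """Hypothetical sketch: try date, then int, then float."""
    text = value.strip()

    # Assumed date format; the real function may accept other formats.
    try:
        return datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        pass

    # Try int before float so that "42" stays an integer.
    try:
        return int(text)
    except ValueError:
        pass

    try:
        return float(text)
    except ValueError:
        raise ValueError(f"Unsupported value format: {value!r}") from None


as_date = convert_value("2024-01-15")
as_int = convert_value("42")
as_float = convert_value("3.14")
```

Trying the strictest format first keeps a date string from being misread as a number, and trying int before float preserves integer inputs.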
Source code in sumeh/services/utils.py
__extract_params(rule)
Source code in sumeh/services/utils.py
__parse_databricks_uri(uri)
Parses a Databricks URI into its catalog, schema, and table components.
The URI is expected to follow the format protocol://catalog.schema.table or protocol://schema.table. If the catalog is not provided, it will be set to None.
If the schema is not provided, the current database from the active Spark session will be used.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
uri | str | The Databricks URI to parse. | required |
Returns:
Type | Description |
---|---|
Dict[str, Optional[str]] | A dictionary containing the parsed components: - "catalog" (Optional[str]): The catalog name, or |
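The parsing rules described above might look like the sketch below. It is not the library's code: the name `parse_databricks_uri`, the `"schema"` and `"table"` keys in the result, and the `default_schema` parameter are assumptions. In particular, `default_schema` is a hypothetical stand-in for the current database that the real function reads from the active Spark session, so the example runs without Spark.

```python
from typing import Dict, Optional


def parse_databricks_uri(
    uri: str, default_schema: Optional[str] = None
) -> Dict[str, Optional[str]]:
    """Hypothetical sketch: split protocol://[catalog.]schema.table."""
    _, sep, path = uri.partition("://")
    if not sep or not path:
        raise ValueError(f"Expected protocol://..., got {uri!r}")

    parts = path.split(".")
    if len(parts) == 3:
        catalog, schema, table = parts
    elif len(parts) == 2:
        # No catalog given: it is reported as None.
        catalog = None
        schema, table = parts
    elif len(parts) == 1:
        # No schema given: fall back to the caller-supplied default
        # (the documented function uses the Spark session's current database).
        catalog, schema, table = None, default_schema, parts[0]
    else:
        raise ValueError(f"Too many components in {uri!r}")

    return {"catalog": catalog, "schema": schema, "table": table}


full = parse_databricks_uri("databricks://main.sales.orders")
no_catalog = parse_databricks_uri("databricks://sales.orders")
table_only = parse_databricks_uri("databricks://orders", default_schema="default")
```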
Source code in sumeh/services/utils.py
__transform_date_format_in_pattern(date_format)
Source code in sumeh/services/utils.py