
Classify EU Legal Scholarship

This set of tools classifies text snippets from EU legal scholarship, determining whether a listed entity is the grammatical subject (agent) of a sentence or clause.

Basic Usage Description

✔️ VALIDATE Whether Listed Entity Is Subject in Sentence

The tool run_is_sentence_valance_basic() expects a DataFrame with a text column (sentence_text) containing a sentence or snippet and a column (entity) identifying the candidate entity. For each row, the prompt function combines the two values into a user message. The LLM returns 1 if the entity is the agent/subject of the sentence, or 0 if it is not.

from gptquery.tools.tool_classify_text import run_is_sentence_valance_basic

df = ...  # DataFrame with the required columns: sentence_text and entity
df_out = run_is_sentence_valance_basic(df, api_key="your-openai-key")
# is_agent holds 1 or 0 per row, or an error flag string
print(df_out["is_agent"].tolist())
# e.g. [1, 0, 1, 'PROCESSING_ERROR']
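
The per-row step described above, in which the prompt function combines the sentence and the candidate entity into one user message, might look roughly like the following. This is a minimal sketch under stated assumptions: build_user_message and the prompt wording are hypothetical and are not taken from gptquery. Only the field names sentence_text and entity come from the schema on this page.

```python
def build_user_message(sentence_text: str, entity: str) -> str:
    """Hypothetical prompt builder; the wording is illustrative only."""
    return (
        f"Sentence: {sentence_text}\n"
        f"Entity: {entity}\n"
        "Is the entity the grammatical subject (agent) of the sentence? "
        "Answer with 1 or 0 only."
    )


# Plain dicts stand in for DataFrame rows to keep the sketch dependency-free.
rows = [
    {
        "sentence_text": "The Commission submitted observations on the admissibility of the request.",
        "entity": "Commission",
    },
    {
        "sentence_text": "The request was submitted to the Commission by the referring court.",
        "entity": "Commission",
    },
]

# One user message per row, in row order.
messages = [build_user_message(r["sentence_text"], r["entity"]) for r in rows]
print(messages[0])
```

Asking for a bare 1 or 0 keeps the reply easy to validate, which matters because any other reply is flagged rather than coerced.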

📤 Input/Output Schema

Input Columns (required by the prompt function):

| Column | Type | Description |
|---|---|---|
| sentence_text | str | The sentence or clause to classify |
| entity | str | The candidate entity to check as grammatical subject |

Output Column:

  • is_agent: 1 if the entity is the subject/agent of the sentence, 0 if not, "PROCESSING_ERROR" if the LLM returns an unexpected value, or "API_ERROR" on request failure
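
Because is_agent can mix integer labels with error flag strings, it may help to separate the two before computing any statistics. The following is a sketch, assuming df_out is a pandas DataFrame with the columns described here. The sample rows and the names ERROR_FLAGS, failed, and labelled are illustrative and not part of the library.

```python
import pandas as pd

# The two error flags listed for the output column.
ERROR_FLAGS = {"PROCESSING_ERROR", "API_ERROR"}

# Illustrative stand-in for the tool's output; real values come from the LLM.
df_out = pd.DataFrame(
    {
        "sentence_text": ["s1", "s2", "s3", "s4"],
        "entity": ["Commission", "Commission", "Member States", "Council"],
        "is_agent": [1, 0, "API_ERROR", "PROCESSING_ERROR"],
    }
)

# Rows whose label is an error flag rather than 1 or 0.
is_error = df_out["is_agent"].isin(ERROR_FLAGS)
failed = df_out[is_error]

# Keep only clean rows and make the label column numeric.
labelled = df_out[~is_error].copy()
labelled["is_agent"] = labelled["is_agent"].astype(int)

print(len(failed), "rows to retry or inspect")
print("share of rows where the entity is the agent:", labelled["is_agent"].mean())
```

Keeping the failed rows in their own frame makes it straightforward to retry them later without relabelling rows that already succeeded.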

💾 Example DataFrame

| sentence_text | entity | is_agent |
|---|---|---|
| "The Commission submitted observations on the admissibility of the request." | Commission | 1 |
| "The request was submitted to the Commission by the referring court." | Commission | 0 |
| "Member States must transpose the Directive by 31 December." | Member States | 1 |
| (malformed/empty row) | | "PROCESSING_ERROR" |

Note: Unlike the extraction tools, this tool writes a single scalar value to is_agent for each row (not a list). Invalid LLM outputs (result not in ['1', '0']) are caught and flagged as "PROCESSING_ERROR" rather than silently coerced.
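
The validation rule in the note can be sketched as a small function. This is an illustration, not the library's code: parse_is_agent is a hypothetical name, and stripping surrounding whitespace before the check is an assumption of this sketch.

```python
def parse_is_agent(raw: str):
    """Hypothetical validator: map a raw LLM reply to 1, 0, or an error flag."""
    result = raw.strip()  # assumption: surrounding whitespace is tolerated
    if result not in ["1", "0"]:
        # Flag anything unexpected instead of guessing a label.
        return "PROCESSING_ERROR"
    return int(result)


replies = ["1", "0", " 1\n", "yes", ""]
print([parse_is_agent(r) for r in replies])
# [1, 0, 1, 'PROCESSING_ERROR', 'PROCESSING_ERROR']
```

Flagging instead of coercing means a reply such as "yes" never turns into a label by accident, so error rows stay visible in the output column.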