| | |
- is_content_word(token: spacy.tokens.token.Token) -> bool
- This function checks if a token is a content word: a noun, verb, adverb or adjective.
Parameters:
token(Token): A spaCy token to analyze.
Returns:
bool: True if the token is a content word, False otherwise.
- is_word(token: spacy.tokens.token.Token) -> bool
- This function checks if a token is a word, that is, whether all of its characters are alphabetic.
Parameters:
token(Token): A spaCy token to analyze.
Returns:
bool: True if the token is a word, False otherwise.
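The two token predicates above can be sketched as follows. This is a minimal illustration, not the library's actual code: it assumes the checks rely on the spaCy token attributes `is_alpha` and `pos_`, that content words are exactly those tagged NOUN, VERB, ADV or ADJ, and that a content word must also pass `is_word`. `FakeToken` is a hypothetical stand-in so the example runs without a spaCy model; a real `Token` exposes the same two attributes.

```python
from dataclasses import dataclass

# Coarse-grained POS tags counted as content words (an assumption of this sketch).
CONTENT_POS = {"NOUN", "VERB", "ADV", "ADJ"}


@dataclass
class FakeToken:
    """Hypothetical stand-in for the spaCy Token attributes used below."""
    text: str
    pos_: str

    @property
    def is_alpha(self) -> bool:
        # Mirrors spaCy's Token.is_alpha, which is equivalent to text.isalpha().
        return self.text.isalpha()


def is_word(token) -> bool:
    # A word consists only of alphabetic characters.
    return token.is_alpha


def is_content_word(token) -> bool:
    # Assumed rule: an alphabetic token tagged as noun, verb, adverb or adjective.
    return is_word(token) and token.pos_ in CONTENT_POS


for tok in (FakeToken("casa", "NOUN"), FakeToken("la", "DET"), FakeToken("3.14", "NUM")):
    print(tok.text, is_word(tok), is_content_word(tok))
```

With a real pipeline, the same functions would be called on each token of a `Doc`.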
- split_doc_into_sentences(doc: spacy.tokens.doc.Doc) -> List[spacy.tokens.span.Span]
- This function splits a spaCy document into sentences.
Parameters:
doc(Doc): The spaCy document to be split into sentences.
Returns:
List[Span]: A list of sentences represented by spaCy spans.
- split_text_into_paragraphs(text: str) -> List[str]
- This function splits a text into paragraphs. It assumes paragraphs are separated by two line breaks.
Parameters:
text(str): The text to be split into paragraphs.
Returns:
List[str]: A list of paragraphs.
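The paragraph split described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's implementation: it takes "two line breaks" to mean the literal separator "\n\n", and it drops empty or whitespace-only chunks, which the original description does not specify.

```python
from typing import List


def split_text_into_paragraphs(text: str) -> List[str]:
    # Paragraphs are assumed to be separated by two line breaks ("\n\n").
    # Discarding empty chunks is an assumption of this sketch.
    return [chunk for chunk in text.split("\n\n") if chunk.strip()]


sample = "Primer párrafo.\n\nSegundo párrafo,\ncon dos líneas.\n\nTercero."
for paragraph in split_text_into_paragraphs(sample):
    print(repr(paragraph))
```

A single line break inside a paragraph is preserved, so the second paragraph of the sample keeps its internal "\n".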
- split_text_into_sentences(text: str, language: str = 'es') -> List[str]
- This function splits a text into sentences.
Parameters:
text(str): The text to be split into sentences.
language(str): The language code of the text. Defaults to 'es' (Spanish).
Returns:
List[str]: A list of sentences.
|