Amazon Comprehend Medical vs Microsoft Text Analytics for Health: Which API is best?

In this article, I review the two leading cloud-based natural language processing services for unstructured health data, testing the services against a relatively standard clinic letter.

Matthew Stubbs

22nd Nov 2021

Some time ago I wrote an article discussing the Amazon Comprehend Medical service (ACM). At the time, this was the only publicly available API performing text-based entity analysis on unstructured medical text. However, now there's a new kid on the block - Microsoft Azure's Text Analytics for Health (TAH).

But with the arrival of the second-largest cloud provider into the medical text analytics space, I thought it would be useful to compare the usability and output of these two APIs, to help inform which service I should use when analysing unstructured data in the future.

In this article, I am going to test each service with a relatively simple clinic letter in order to assess how they differ in accuracy, structure, and pricing.

But first, so that everybody is up to speed, let's discuss the purpose of these APIs. If you're familiar with natural language processing (NLP) in healthcare, please feel free to skip ahead to the test methodology.

Natural Language Processing for Healthcare

As we all know, the healthcare sector is rife with unstructured medical data. A gold mine of healthcare insights lies within these prose documents, but no individual could feasibly extract this data at the scale required to achieve valuable insights, and manually sorting such text by hand introduces significant human error.

Therefore, we need a system by which the prose text can be analysed and key medical points extracted into a useful format. An example of this would be extracting a drug name and associated dose and treatment frequency from a sentence such as you see in the below figure.

extraction_example.png

This functionality is exactly what ACM and TAH are trying to achieve: allowing medical institutions, researchers, and clinicians to autonomously analyse large repositories of healthcare data.

Test Methodology

In order to test both APIs, I will use the same clinic letter I wrote when I originally reviewed ACM. I tried to make this clinic letter as realistic as possible, packing in as many medical entities and relations as I could to test each service's ability to process the text effectively. I will compare the output to a model answer containing all the entities, relations, and negations (more on these later) that I would expect an NLP service to detect.

Below is the clinic letter we will use to test the APIs.

Thank you for your referral for Ms Jane Dow, a 42 year old lady, to my Endocrinology clinic today. You referred her to my clinic due to increased anxiety and swelling within her neck. You found an increased T4 level with depressed TSH.

Ms Jane Dow has been suffering with ongoing anxiety and difficulty sleeping. In addition to this, she has noted an increased sensitivity to heat with excessive perspiration. On examination of Ms Dow I noted an increased heart rate at approximately 105 bpm. There is a large nodular goitre anterolaterally to the left of her neck. It is not hot to touch and does not move on swallow. There is no exophthalmos.

Ms Dow reports that her Mother suffered from thyroid dysfunction, though she is unsure of the exact diagnosis and whether this was a hyper- or hypothyroid condition.

I will arrange for Ms Dow to have further blood tests including thyroid stimulating immunoglobulin, stimulating hormone receptor antibody and anti-thyroid peroxidase antibody. I will also arrange an ultrasound thyroid scan.

I have started Ms Dow on Propranolol 10mg TDS to control her current symptoms until a definitive diagnosis is found.

Yours Sincerely, Dr Smith

Ease of Use

Both services are relatively easy to set up, each supplying SDKs in multiple languages. For the purposes of this review, we'll be using the Python versions of the SDKs.

ACM Code

# Import AWS Boto3 SDK
import boto3

# Load clinic letter from txt file
with open('clinic_letter.txt', 'r') as f:
    clinic_letter = f.read()

# Initiate Comprehend Medical client and detect entities
client = boto3.client(service_name='comprehendmedical', region_name='eu-west-2')
result = client.detect_entities(Text=clinic_letter)

# Get entities from result
entities = result['Entities']

For the ACM setup, we can use the Boto3 SDK. The code is incredibly simple, amounting to only a handful of lines. In the code above, after importing the Boto3 package, we load the raw .txt file containing the text we want to analyse.

Following this we initiate a client object, specifying the service we want to use as well as the AWS region. The client object contains all the functions related to our specified service, and detecting entities is as easy as calling the detect_entities() function on the client object.
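If you want a quick look at what comes back, you can loop over the returned entities and print each one with its classification and confidence. This is just a minimal sketch using the key names ACM returns (Text, Category, Type, and Score):

# Print each detected entity with its classification and confidence score
for entity in entities:
    print(f"{entity['Text']} | {entity['Category']} | {entity['Type']} | score: {entity['Score']:.2f}")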

TAH

At this point, it's important to note that there is a Python SDK available for the Azure Text Analytics service. However, I did not use this SDK for this analysis, because it does not return the raw JSON in the response. In order to compare the two services, I needed the responses to be in a similar format to avoid bloating the code base. I was able to get around this issue by using the REST API rather than the SDK, as the code below shows. I will, however, include an example of using the Azure SDK in the appendix of this post.

import json
import requests

# Load clinic letter from txt file
with open('clinic_letter.txt', 'r') as f:
    endo_letter = f.read()

# Your Azure portal details
key = "text-analysis-api-key"
endpoint = "https://your-endpoint/text/analytics/v3.1/entities/health/jobs"

# Create dictionary array for API call
documents = [{
    'id': '1', 
    'language': 'en', 
    'text': endo_letter
}]

# Custom headers and data object 
headers = {'Content-Type': 'application/json', 'Ocp-Apim-Subscription-Key': key}
data = {'documents': documents}

# Post request to submit the analysis job
try:
    response = requests.post(endpoint, headers=headers, data=json.dumps(data))
except requests.exceptions.RequestException as e:
    raise SystemExit(e)

# Get operation-location from returned headers
operation_location = response.headers['operation-location']

# Get request to fetch the result of the text analysis
# (the job is asynchronous, so in practice you may need to poll
# this URL until the returned status is 'succeeded')
try:
    response = requests.get(operation_location, headers=headers)
except requests.exceptions.RequestException as e:
    raise SystemExit(e)

# Store response
msft_response = response.json()

Using the RESTful API requires you to make two API calls. The first is a POST request that sends the text you wish to analyse to the Text Analytics service. The response headers from this request contain an operation-location header; because the analysis runs as an asynchronous job, a follow-up GET request to this URL returns the result of the operation (i.e. the entities and relations) once it has completed.

While this is a slightly more involved process than the AWS SDK (the Azure SDK is similarly more complex), it is still straightforward, and I would consider the two services similarly easy to use.

Analysis of Results

In order to compare the results of each service, I have split up the analysis into 3 key categories: entity, relation, and negation analysis. Before I move on to the results, however, I will quickly review what these mean.

Entity

EntityExplanantion.png

Entities (or named entities) are predefined terms that the natural language processing model has been trained to recognise. In this context, entities include things such as medications, symptoms, medical conditions, etc.

Relation

RelationExplanation.png

A relation refers to the extraction of text that is semantically related to an entity (although in some contexts, relations are considered entities themselves). These are often detected by recognising common patterns within the text, rather than just spotting a particular term. As you can see, relations are tied to an entity and, in this context, can include doses, frequencies, locations (anatomical or otherwise), etc.

Negation

NegationExplanation.png

A negation is similar to a relation but serves the important function of negating the entity, i.e. indicating that someone does not have or do something. This is a very important area within medicine, as doctors often quote 'important negatives', i.e. symptoms or signs that patients do not have, in order to rule out certain conditions.

Now that we understand the key criteria for this analysis, let's look at how the APIs performed when assessing the sample clinic letter.

Entity Extraction Comparison

AWSvsMSFT (5).png

At first look, it appears that ACM performs better than TAH at entity extraction. However, on closer inspection of the output, the difference between the two services was driven primarily by the fact that Microsoft's solution does not detect people's names within the text (in this case, 'Jane Dow'). When we correct for this, i.e. remove name entity detection from the model answer, we find that TAH performs better than ACM on entity detection.

You can argue whether name entity detection should be part of a medical natural language processing service. Personally, I think it is useful to have this functionality straight out of the box when handling medical data, because it gives you the ability to remove identifiable information from the text. You can also do this with other Azure APIs, and could perhaps run some pre-processing before analysing the text with TAH (it is also possible that this is an optional parameter in the TAH service, though if it is I haven't found it). However, I think having the ability to do this from one API call, as is the case in ACM, is advantageous.
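As a rough illustration of that last point, ACM exposes a separate detect_phi call on the same boto3 client (this is the PHId API that appears in the pricing section later), which returns only protected health information such as names and dates. Below is a minimal sketch of how you might use it to redact the letter before further analysis; the offset-based replacement is my own assumption about how you would apply the output, not something from the services' documentation.

# Detect protected health information (names, dates, IDs, etc.) with ACM's PHId API
phi_result = client.detect_phi(Text=clinic_letter)

# Redact each detected PHI entity, working backwards so earlier offsets stay valid
redacted = clinic_letter
for entity in sorted(phi_result['Entities'], key=lambda e: e['BeginOffset'], reverse=True):
    redacted = (redacted[:entity['BeginOffset']]
                + '[' + entity['Type'] + ']'
                + redacted[entity['EndOffset']:])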

Still, since a name isn't strictly a medical entity, we will use the scores with the 'name' entity removed from the model answer.

Below you can see the entities (not including names) that each of the services missed.

Missed_entities.png

ACM performed very well at detecting strictly medical entities, missing no significant entities relating to the patient's condition. Where it performed more poorly was in detecting more general medical information. Important entities, such as the sex of the individual, the clinic setting, and the time of the clinic appointment, were missed. While ACM did detect '42' as an age, it failed to detect whether this was 42 days, weeks, years, etc. This is an important entity to capture, as without it the ages extracted from the text are relatively unusable unless you can guarantee all ages will be quoted in years (far from a guarantee in a healthcare setting). I also tested the response after changing the age unit to months and weeks, and ACM failed to detect the unit in those cases too.

While TAH performed better overall, it failed to detect some key medical data within the text. Although it detected the term 'perspiration', it missed that the actual symptom is excessive perspiration, a point ACM was able to capture. It also struggled with the hyper- or hypothyroid entity, detecting only the latter and not the former.

Interestingly, there was no crossover between the missed entities; each service had its own medical blind spots within the text.

Relationship Extraction Comparison

Visual Chart Page Iteration 1.png

When we look at the relationships detected within the text, ACM performs more effectively. Analysis of the missed relations reveals some interesting differences between the two services. Critically, despite TAH detecting 'Mother' as a family relation, it fails to detect that the thyroid dysfunction relates to the Mother and not to Ms Jane Dow.

ACM misses 'Mother' entirely, both as an entity and as a relation. Considering family history is a very important part of a medical history, I found this quite a surprising oversight in the service.

Missed_relations.png

In addition to the missed relation around the family history, both services missed several relations of size and position, as can be seen above.

Importantly, TAH was able to detect both 'anterolaterally' and 'left' within the text as entities but failed to detect their relationship to the patient's goitre. Unfortunately, without this relation these entities are of little analytical use, as their connection to the finding they describe has been lost.

Missed Negations

AWSvsMSFT (1080 x 1080 px) (7).png

Both services performed very poorly on negations: ACM was the only service to detect any, and even then it caught only one of the three negations.

Negations are incredibly important to detect when analysing medical text. Important negatives (i.e. a patient not having a sign or symptom) are often used to rule out certain conditions. If you fail to detect the negation of a symptom or sign, your conclusions from the text will be inaccurate, as you may be reporting signs and symptoms that were actually quoted as important negatives to tell another practitioner they have been ruled out.

Missed_negations.png

As you can see from the missed negations above, TAH misses all three negations in the text, while ACM only detects the negation in reference to exophthalmos.
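For context, ACM reports negations via the Traits array attached to each entity (a trait named 'NEGATION' with its own confidence score). A minimal sketch of how you might pull negated findings out of the response from earlier:

# Collect entities that ACM has flagged with a NEGATION trait
negated = [
    entity['Text']
    for entity in entities
    if any(trait['Name'] == 'NEGATION' for trait in entity.get('Traits', []))
]
print(negated)  # for this letter, only the exophthalmos finding is returned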

This is an area both services definitely need to work on if they wish to be used for reliable and accurate medical text analysis.

API Response Structure

The two services return significantly different response structures, highlighting the different approaches and thought processes of the teams behind them. Please note, in the JSON structures below I have removed some keys and included only one entity and its associated relations to make the structure more readable.

The response in the below JSON examples refers to the 'increased heart rate' phrase within the clinic letter.

ACM Structure

{
    "Entities": [
        {
            "Id": 35,
            "BeginOffset": 459,
            "EndOffset": 469,
            "Score": 0.9953954815864563,
            "Text": "heart rate",
            "Category": "TEST_TREATMENT_PROCEDURE",
            "Type": "TEST_NAME",
            "Traits": [],
            "Attributes": [
                {
                    "Type": "TEST_VALUE",
                    "Score": 0.3008654713630676,
                    "RelationshipScore": 0.9963931441307068,
                    "RelationshipType": "TEST_VALUE",
                    "Id": 34,
                    "BeginOffset": 449,
                    "EndOffset": 458,
                    "Text": "increased",
                    "Category": "TEST_TREATMENT_PROCEDURE",
                    "Traits": []
                },
                {
                    "Type": "TEST_VALUE",
                    "Score": 0.9547999501228333,
                    "RelationshipScore": 0.999314546585083,
                    "RelationshipType": "TEST_VALUE",
                    "Id": 36,
                    "BeginOffset": 487,
                    "EndOffset": 490,
                    "Text": "105",
                    "Category": "TEST_TREATMENT_PROCEDURE",
                    "Traits": []
                },
                {
                    "Type": "TEST_UNIT",
                    "Score": 0.995649516582489,
                    "RelationshipScore": 0.9962669014930725,
                    "RelationshipType": "TEST_UNIT",
                    "Id": 37,
                    "BeginOffset": 491,
                    "EndOffset": 494,
                    "Text": "bpm",
                    "Category": "TEST_TREATMENT_PROCEDURE",
                    "Traits": []
                }
            ]
        }
    ]
}

The ACM response has a nested structure, returning an array of entities with the relations placed within an Attributes array inside each entity. All the expected keys are present, including the offset of the entity/relation within the text, the score (i.e. the certainty of the returned entity/relation), and the category and type of the entity/relation, the definitions of which can be found on the ACM documentation page.
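Because the relations sit inside each entity, mapping an entity to its related values is just an inner loop over the Attributes array. A minimal sketch using the key names shown above:

# Map each entity to its related attributes (e.g. test values and units)
for entity in result['Entities']:
    for attribute in entity.get('Attributes', []):
        print(f"{entity['Text']} -> {attribute['Text']} "
              f"({attribute['RelationshipType']}, score {attribute['RelationshipScore']:.2f})")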

TAH Response

{
    "entities": [
        {
            "offset": 449,
            "length": 9,
            "text": "increased",
            "category": "MeasurementValue",
            "confidenceScore": 0.68
        },
        {
            "offset": 459,
            "length": 10,
            "text": "heart rate",
            "category": "ExaminationName",
            "confidenceScore": 0.79,
            "name": "heart rate",
            "links": [
                {
                    "dataSource": "UMLS",
                    "id": "C0018810"
                },
                {
                    "dataSource": "AOD",
                    "id": "0000002504"
                },
                {
                    "dataSource": "CHV",
                    "id": "0000005884"
                },
                {
                    "dataSource": "CSP",
                    "id": "1394-2702"
                }
            ]
        }
    ],
    "relations": [
        {
            "relationType": "ValueOfExamination",
            "entities": [
                {
                    "ref": "#/results/documents/0/entities/20",
                    "role": "Value"
                },
                {
                    "ref": "#/results/documents/0/entities/21",
                    "role": "Examination"
                }
            ]
        }
    ]
}

TAH uses a different format for its response, returning all detected entities (including those that act as relation values) within an entities array. The relations are then held in a separate relations array that records the relationship between individual entities, each referenced by its index in the entities array (i.e. the number at the end of the ref string).
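In practice, that means resolving each relation's ref string back to an index in the document's entities array. A minimal sketch against the msft_response captured earlier, assuming the v3.1 job result layout of results -> documents -> entities/relations shown above:

# Resolve TAH relation references back to the entities they point at
document = msft_response['results']['documents'][0]
tah_entities = document['entities']

for relation in document.get('relations', []):
    # The ref string ends with the entity's index in the entities array
    members = [(member['role'], tah_entities[int(member['ref'].split('/')[-1])]['text'])
               for member in relation['entities']]
    print(relation['relationType'], members)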

It's important to note that when using the SDK the returned response is not raw JSON and may differ from this structure, but that is outside the scope of this review.

Which Structure is Best?

With regards to which structure is best, it really comes down to your personal preference.

Personally, I prefer to work with the ACM structure. I find the relation values being nested within the Attributes key of each entity more intuitive, and it makes the relations easier to map out. The ACM structure also carries an extra level of classification, having both a 'category' and a 'type' key. I haven't worked with these two APIs enough to truly understand the impact of this, so I can't say whether it is unnecessary over-complication or a useful extra level of granularity.

If you prefer a less nested structure, or like having both the entities and relations within the same array, then the TAH structure is the better one to work with. The TAH structure also includes a links array, which returns various clinical codes related to the detected entity. This is very useful for further analysis and for the purposes of clinical coding. Similar functionality is available in ACM, but it involves a slightly different API call.
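For completeness, here is a hedged sketch of the ACM equivalent: the same boto3 client exposes infer_icd10_cm (and infer_rx_norm for medications), which returns ontology codes for the conditions it detects. The field names below follow the boto3 documentation rather than anything shown earlier in this article:

# Link detected medical conditions to ICD-10-CM codes via ACM's ontology-linking API
icd10_result = client.infer_icd10_cm(Text=clinic_letter)

for entity in icd10_result['Entities']:
    codes = [(concept['Code'], concept['Description']) for concept in entity['ICD10CMConcepts']]
    print(entity['Text'], codes)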

Cost of Services

Amazon Comprehend Medical (priced per unit of 100 characters)
Medical Named Entity and Relationship Extraction (NERe) API: $0.01
Medical Protected Health Information Data Extraction and Identification (PHId) API: $0.0014
Medical ICD-10-CM Ontology Linking API: $0.0005
Medical RxNORM Ontology Linking API: $0.00025

Text Analytics for Health (priced per text record of 1,000 characters)
Text analytics for health: $100 per 1,000 text records

A key differentiator between ACM and TAH is price. Importantly, for the amateur data analyst, or an individual not wishing to analyse many records, ACM offers a free tier covering 25k units of text (2.5M characters) for the first three months of using the service. Currently, TAH does not offer a free tier for medical text analysis.

In the paid tiers for each API, you pay for what you use. In the case of ACM, there are multiple API endpoints, each offering a different level of analysis. The two main ones are Medical Named Entity and Relationship Extraction (NERe) and Protected Health Information Data Extraction and Identification (PHId). For most use cases you will use the former; however, if your use case is simply to remove protected health information, you can use the cheaper PHId API. NERe is charged at $0.01 per unit, a unit being 100 characters. If you send less than one unit to the API, you will still be charged for a full unit.

TAH is charged at $100 per 1,000 text records, a text record being 1,000 characters. If you send fewer than 1,000 characters, you will still be charged for one full text record.

3.png

This means that both services cost $0.01 per 100 characters. However, the difference in unit sizes can have a significant impact on your costs when working with smaller documents. For example, if I submit a 300-character document to ACM, I will be charged $0.03 (i.e. 3 ACM pricing units); if I submit the same document to TAH, I will be charged $0.10 (i.e. 1 full TAH pricing unit). In the worst case, i.e. submitting a document of 100 characters, you would be charged 10x more using TAH than ACM (shown in the graph).
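The arithmetic is simple enough to sketch in a few lines, assuming (as described above) that both services round up to whole pricing units:

import math

# Rough cost comparison, assuming billing rounds up to whole units
def acm_cost(characters):
    return math.ceil(characters / 100) * 0.01    # $0.01 per 100-character unit

def tah_cost(characters):
    return math.ceil(characters / 1000) * 0.10   # $100 per 1,000 text records of 1,000 characters

print(acm_cost(300), tah_cost(300))  # 0.03 vs 0.1, matching the example above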

Notwithstanding the generous free tier from ACM, its more granular pricing makes it by far the more cost-effective option.

Conclusion

It is clear that both services provide a very good solution for analysing unstructured medical data. Both achieve roughly 80%+ accuracy in entity and relation extraction, with TAH performing better on entities and ACM performing better on relation extraction. However, both performed poorly at detecting negation of the entities, with ACM detecting only one and TAH missing all negations within the text.

Both have a sensible response structure, and I don't see a significant differentiator between the services in this regard.

The main differentiator between the two services is cost. While on a cost-per-character basis the services are exactly the same ($0.01 per 100 characters), the different pricing unit sizes mean TAH can end up significantly more expensive than ACM. ACM also has a generous free tier for the first three months of using the service.

It is for this reason that I will continue to use ACM for medical text analysis: it offers a comparable level of accuracy at a potentially much lower cost.

Next Steps

With regards to the next steps, I would like to take the testing of these two services further by using more complex input. I intend to include more medical abbreviations for medical tests and conditions, as well as introducing further genetic and pharmacological relations between the entities.

Appendix

ACM Output

AWS_output.png

TAH Output

TAH_output.png

TAH SDK Example

from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Your Azure portal details
key = "paste-your-key-here"
endpoint = "paste-your-endpoint-here"

# Authenticate the client using your key and endpoint 
def authenticate_client():
    ta_credential = AzureKeyCredential(key)
    text_analytics_client = TextAnalyticsClient(
            endpoint=endpoint, 
            credential=ta_credential)
    return text_analytics_client

client = authenticate_client()

# Example function for extracting information from healthcare-related text 
def health_example(client):
    documents = [
        """
        Patient needs to take 50 mg of ibuprofen.
        """
    ]

    poller = client.begin_analyze_healthcare_entities(documents)
    result = poller.result()

    docs = [doc for doc in result if not doc.is_error]

    for idx, doc in enumerate(docs):
        for entity in doc.entities:
            print("Entity: {}".format(entity.text))
            print("...Normalized Text: {}".format(entity.normalized_text))
            print("...Category: {}".format(entity.category))
            print("...Subcategory: {}".format(entity.subcategory))
            print("...Offset: {}".format(entity.offset))
            print("...Confidence score: {}".format(entity.confidence_score))
        for relation in doc.entity_relations:
            print("Relation of type: {} has the following roles".format(relation.relation_type))
            for role in relation.roles:
                print("...Role '{}' with entity '{}'".format(role.name, role.entity.text))
        print("------------------------------------------")
health_example(client)