Thursday, 13 June 2019

Analysing Annual Report - Data Extraction and Data Analysis from PDF

Listed companies like to issue their annual reports in PDF format. PDFs are nice to look at on screen and to print, but they are a pain to extract data from. Fortunately, there are programmatic ways (using Python) to extract and analyse the data.
With the 2018 Annual Report of ASX (the Australian Securities Exchange, which is itself a listed company on its own exchange), I am going to demonstrate the following:
Part 1: Table Extraction and Data Analysis
  1. Extract the income statement, balance sheet, and cashflow statement. (Using camelot)
  2. Calculate financial ratios.
Part 2: Extractive Summary and Sentiment Analysis
  1. Extract text from the report pdf. (Using PyPDF2)
  2. Summarise the report by extracting key phrases.
  3. Conduct a sentiment analysis on the report (using TextBlob)
The techniques described in this post apply to all sorts of textual documents in PDF. For example, you may have a client or supplier who stubbornly refuses to send data in csv/xml/json/Excel and instead sends you a lot of data in one or many PDFs. Instead of entering the data by hand, you can try some of the techniques below.

The following Python libraries are required: numpy, pandas, re, nltk, PyPDF2, camelot, sumy, textblob, and tkinter (as a dependency of camelot). Ghostscript, another dependency of camelot, also needs to be installed on your computer.

Full disclosure: As at 13/06/2019, I am not a direct shareholder of ASX. However, I am exposed to the company through VanEck Vectors Australian Equal Weight ETF (MVW:ASX). ASX constitutes 1.27% of the ETF's holding as at 11/06/2019. I am most likely also exposed to the company through an industry superannuation fund.

This blog post is an HTML version of a Jupyter Notebook. You will see "In"s, which are the scripts written in Python, and "Out"s, which are the outputs.

Part 1: Table Extraction and Data Analysis

Extract the income statement, balance sheet, and cashflow statement:

The plan of attack is simple: find the pages of the PDF document on which these statements are located, pass them into camelot.read_pdf(), and load the tables into pandas for easy analysis. Loading these statements programmatically is usually better than copying and pasting tables from a PDF into Excel, where you often get misaligned columns and merged cells.
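Finding the right pages can itself be scripted once you have the text of each page (for example from the PyPDF2 extraction in Part 2). The following is only a minimal sketch: `find_pages` and the sample strings are hypothetical names and data of mine, not part of camelot or PyPDF2. It returns 1-based page numbers and joins them into the comma-separated string that the `pages` argument of camelot.read_pdf() accepts.

```python
def find_pages(page_texts, keywords):
    """Return 1-based numbers of pages containing every keyword (case-insensitive)."""
    hits = []
    for number, page_text in enumerate(page_texts, start=1):
        lowered = page_text.lower()
        if all(keyword.lower() in lowered for keyword in keywords):
            hits.append(number)
    return hits


# Hypothetical stand-in for the text extracted from three pages of a report.
sample_pages = [
    "Letter from the Chairman",
    "Consolidated statement of comprehensive income\nFor the year ended 30 June",
    "Consolidated balance sheet\nAs at 30 June",
]

income_pages = find_pages(sample_pages, ["comprehensive income", "30 June"])
pages_argument = ",".join(str(number) for number in income_pages)
print(pages_argument)
```

A table of contents or a cross-reference can match the same keywords, so pages found this way still deserve a quick visual check before they are passed to camelot.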
Income Statement:
In [1]:
import camelot
import pandas as pd

filepath = "data/ASXAnnualReport2018.pdf"
tables = camelot.read_pdf(filepath, pages = '57', flavor = 'stream')
incomeStatement = tables[0].df #index 0 for first table and .df to convert to pandas dataframe
incomeStatement
Out[1]:
0 1 2 3
0 2018 2017
1 For the year ended 30 June Note $m $m
2 Revenue
3 Listings and Issuer Services 222.9 194.8
4 Derivatives and OTC Markets 286.7 269.4
5 Trading Services 211.8 197.1
6 Equity Post-Trade Services 105.3 104.4
7 Interest income 170.9 150.5
8 Dividend income 14.2 13.9
9 Share of net (loss)/profit of equity accounted... C2 (0.4) 0.1
10 Other 1.6 1.9
11 1,013.0 932.1
12 Expenses
13 Staff (114.6) (110.6)
14 Occupancy (16.4) (14.6)
15 Equipment (29.4) (29.3)
16 Administration (40.3) (30.0)
17 Finance costs (102.4) (85.2)
18 Depreciation and amortisation D2, D3 (47.6) (46.0)
19 Other C2 (20.2) -
20 (370.9) (315.7)
21 Profit before income tax expense 642.1 616.4
22 Income tax expense A5 (197.0) (182.3)
23 Net profit for the year attributable to owners... 445.1 434.1
24 Other comprehensive income
25 Items that may be reclassified to profit or lo...
26 Change in the fair value of available-for-sale... (0.9) (0.5)
27 Change in the fair value of available-for-sale... (10.3) 39.6
28 Change in the fair value of cash flow hedges 1.2 (0.4)
29 Other comprehensive income for the year, net o... (10.0) 38.7
30 Total comprehensive income for the year attrib... 435.1 472.8
31 Earnings per share
32 Basic earnings per share (cents per share) A4 230.0 224.5
33 Diluted earnings per share (cents per share) A4 230.0 224.5
34 1 $0.2 million (2017: $0.3 million) was reclas...
Balance Sheet:
In [2]:
tables = camelot.read_pdf(filepath, pages = '58', flavor = 'stream')
balanceSheet = tables[0].df
balanceSheet
Out[2]:
0 1 2 3
0 2018 2017
1 As at 30 June Note $m $m
2 Current assets
3 Cash and funds on deposit B2 5,563.9 5,683.8
4 Available-for-sale financial assets B2 4,001.4 3,401.8
5 Receivables D1 373.2 1,124.9
6 Prepayments 17.4 16.6
7 Total current assets 9,955.9 10,227.1
8 Non-current assets
9 Available-for-sale investments C1 416.4 431.1
10 Equity accounted investments C2 53.1 66.7
11 Investments at fair value through profit or loss C3 4.8 -
12 Intangible assets D2 2,438.1 2,439.2
13 Property, plant and equipment D3 54.4 46.6
14 Prepayments 0.3 1.0
15 Total non-current assets 2,967.1 2,984.6
16 Total assets 12,923.0 13,211.7
17 Current liabilities
18 Amounts owing to participants B1 8,295.8 7,884.7
19 Payables D4 354.3 1,092.4
20 Current tax liabilities 17.1 16.3
21 Provisions D5 14.6 15.8
22 Revenue received in advance 22.4 18.2
23 Total current liabilities 8,704.2 9,027.4
24 Non-current liabilities
25 Amounts owing to participants B1 200.0 200.0
26 Net deferred tax liabilities A5 64.7 69.3
27 Provisions D5 8.5 6.8
28 Revenue received in advance 0.1 0.1
29 Total non-current liabilities 273.3 276.2
30 Total liabilities 8,977.5 9,303.6
31 Net assets 3,945.5 3,908.1
32 Equity
33 Issued capital A3 3,027.2 3,027.2
34 Retained earnings 666.7 622.2
35 Restricted capital reserve 71.5 71.5
36 Asset revaluation reserve 168.4 178.4
37 Equity compensation reserve 11.7 8.8
38 Total equity 3,945.5 3,908.1
Cashflow Statement:
In [3]:
tables = camelot.read_pdf(filepath, pages = '60', flavor = 'stream')
cashflow = tables[0].df
cashflow
Out[3]:
0 1 2 3
0 2018 2017
1 For the year ended 30 June Note $m $m
2 Cash flows from operating activities
3 Receipts from customers 891.7 835.4
4 Payments to suppliers and employees (248.0) (257.4)
5 643.7 578.0
6 Interest received 169.1 150.4
7 Interest paid (101.9) (83.9)
8 Dividends received 14.2 13.9
9 Income taxes paid (196.4) (174.8)
10 Net cash inflow from operating activities 528.7 483.6
11 Cash flows from investing activities
12 Increase in participants’ margins and commitments 404.5 2,018.9
13 Payments for available-for-sale investments - (16.2)
14 Payments for equity accounted investments C2 (7.0) -
15 Payments for investments at fair value through... B3, C3 (4.6) -
16 Payments for other non-current assets (48.3) (61.0)
17 Net cash inflow from investing activities 344.6 1,941.7
18 Cash flows from financing activities
19 Dividends paid A2 (400.6) (388.8)
20 Net cash (outflow) from financing activities (400.6) (388.8)
21 Net increase in cash and cash equivalents1 472.7 2,036.5
22 Increase/(decrease) in the fair value of cash ... 0.4 (1.3)
23 Increase/(decrease) in cash and cash equivalen... 6.6 (22.4)
24 Cash and cash equivalents at the beginning of ... 9,085.6 7,072.8
25 Cash and cash equivalents at the end of the year1 B2 9,565.3 9,085.6
26 Cash and cash equivalents consists of:
27 ASX Group funds 1,069.5 1,000.9
28 Participants’ margins and commitments B1 8,495.8 8,084.7
29 Total cash and cash equivalents1 B2 9,565.3 9,085.6
30 1 Available-for-sale financial assets pledged...
31 agreements are used to support the investment ...
32 Reconciliation of the operating profit after i...
33 Net profit after tax 445.1 434.1
34 Non-cash items:
35 Depreciation and amortisation 47.6 46.0
36 Share-based payments 2.9 -
37 Share of net (loss)/profit of equity accounted... 0.4 (0.1)
38 Tax on fair value adjustment of available-for-... 0.4 0.2
39 Tax on fair value adjustment of cash flow hedges (0.5) 0.2
40 FX revaluation on investments at fair value th... (0.2) -
41 Change in fair value on equity accounted inves... 20.2 -
42 Changes in operating assets and liabilities:
43 Increase in tax balances 0.5 7.1
44 (Increase) in receivables1 (3.3) (0.1)
45 (Increase) in prepayments (0.1) (5.0)
46 Increase in payables1 11.1 0.3
47 Increase in revenue received in advance 4.2 1.8
48 Increase/(decrease) in provisions 0.4 (0.9)
49 Net cash inflow from operating activities 528.7 483.6
50 1 Receivables and payables excludes the moveme...
At this point it is probably easiest to save the dataframes to csv to keep the complex labelling and formatting of the tables/dataframes. Unlike Excel, pandas would not let you have multiple values per column. However, we can still calculate ratios by indexing the tables.

Calculate financial ratios

Let's first create a function that converts all the values with ',' and '()' to numbers (floats):
In [4]:
def to_float(value):
    value = value.replace(',','')
    value = value.replace(')','')
    value = value.replace('(','-')
    value = value.strip()
    return float(value)
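Some cells in the tables above hold only "-" or are empty (for example the 2017 value of the "Other" expense row), and to_float would raise a ValueError on those. Below is a hedged sketch of a more forgiving variant. The name `to_float_safe` is my own, and returning NaN for such cells is an assumption about what you want, not something the report dictates.

```python
import math


def to_float_safe(value):
    """Like to_float, but returns NaN for blank cells and '-' placeholders."""
    cleaned = value.replace(',', '').strip()
    if cleaned in ('', '-'):
        return math.nan
    # Accounting-style negatives: '(20.2)' becomes '-20.2'
    if cleaned.startswith('(') and cleaned.endswith(')'):
        cleaned = '-' + cleaned[1:-1]
    return float(cleaned)


print(to_float_safe('1,013.0'))
print(to_float_safe('(370.9)'))
print(math.isnan(to_float_safe('-')))
```

NaN propagates through arithmetic, so a ratio built on a missing value shows up as NaN instead of crashing the notebook.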
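One easy way to access the "cells" in a pandas dataframe is to use the .iloc indexer and pass in the row and column positions. In the tables above, column 0 holds the line item, column 1 the note reference, column 2 the 2018 figure and column 3 the 2017 figure.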

Revenue Growth Rate:

(Rev 2018 - Rev 2017) / Rev 2017
In [5]:
revGR = (to_float(incomeStatement.iloc[11,2]) - 
        to_float(incomeStatement.iloc[11,3]))/to_float(incomeStatement.iloc[11,3])

print('The revenue growth rate was: ', round(revGR*100,2), '%')
The revenue growth rate was:  8.68 %
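The same (current - previous) / previous formula is used for every growth rate in this post, so it can be wrapped in a small helper. This is just a sketch: `growth_rate` is a name I made up, and the figures are typed in by hand from the extracted statements above rather than read from the dataframes.

```python
def growth_rate(current, previous):
    """Relative change from previous to current, as a percentage."""
    return (current - previous) / previous * 100


# Figures in $m copied from the extracted statements above.
revenue_growth = growth_rate(1013.0, 932.1)
operating_cash_growth = growth_rate(528.7, 483.6)

print(round(revenue_growth, 2))         # 8.68
print(round(operating_cash_growth, 2))  # 9.33
```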

Earnings per Share Growth Rate:

(EPS 2018 - EPS 2017) / EPS 2017
In [6]:
EPSGR = (to_float((incomeStatement.iloc[33,2])) -
         to_float((incomeStatement.iloc[33,3])))/ to_float(incomeStatement.iloc[33,3])

print('The EPS growth rate was: ', round(EPSGR*100,2), '%')
The EPS growth rate was:  2.45 %

Debt to Equity Ratio (and Growth Rate):

Debt to equity = Total Liabilities / Total Equity
Growth Rate = (DTE 2018 - DTE 2017) / DTE 2017
In [7]:
# Debt to Equity for each year (row 30 is Total liabilities, row 38 is Total equity)
DER2018 = to_float(balanceSheet.iloc[30,2]) / to_float(balanceSheet.iloc[38,2])
DER2017 = to_float(balanceSheet.iloc[30,3]) / to_float(balanceSheet.iloc[38,3])

# Growth Rate
DERGR = (DER2018-DER2017)/DER2017*100

print('Debt to Equity Ratio was {} in 2018 and was {} in 2017. The change was {}%.'.format(
    round(DER2018,2), 
    round(DER2017,2),
    round(DERGR,2)))
Debt to Equity Ratio was 2.28 in 2018 and was 2.38 in 2017. The change was -4.42%.
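Positional indexes are brittle: neighbouring rows such as "Total current liabilities" and "Total liabilities" are easy to mix up, and every position shifts if camelot picks up an extra line. One alternative is to look rows up by their label. The sketch below assumes the rows are available as plain lists (in the notebook they could come from balanceSheet.values.tolist()). `find_row` and `parse_amount` are hypothetical helpers of mine.

```python
def find_row(rows, label):
    """Return the first row whose label cell (column 0) equals `label`."""
    for row in rows:
        if row[0].strip() == label:
            return row
    raise KeyError(label)


def parse_amount(cell):
    """Convert '8,977.5' or '(400.6)' to a float."""
    cell = cell.replace(',', '').strip()
    if cell.startswith('(') and cell.endswith(')'):
        cell = '-' + cell[1:-1]
    return float(cell)


# A few rows in the layout camelot produced: label, note, 2018, 2017.
rows = [
    ['Total current liabilities', '', '8,704.2', '9,027.4'],
    ['Total liabilities', '', '8,977.5', '9,303.6'],
    ['Total equity', '', '3,945.5', '3,908.1'],
]

liabilities = find_row(rows, 'Total liabilities')
equity = find_row(rows, 'Total equity')
debt_to_equity_2018 = parse_amount(liabilities[2]) / parse_amount(equity[2])
print(round(debt_to_equity_2018, 2))
```

Matching on the exact label means "Total liabilities" cannot silently pick up the current or non-current subtotal.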

Net cash inflow from operating activities:

(Net Cash 2018 - Net Cash 2017) / Net Cash 2017
In [8]:
oc2018 = to_float(cashflow.iloc[10,2])
oc2017 = to_float(cashflow.iloc[10,3])

growth = (oc2018 - oc2017) / oc2017*100

print('The growth rate in net cash inflow from operating activities was {} %.'.format(round(growth,2)))
The growth rate in net cash inflow from operating activities was 9.33 %.
The ASX had a pretty good year in the 2018 financial year. Revenue, EPS and operating cash flow were all up, and the debt to equity ratio dropped. However, EPS growth (2.45%) lagged revenue growth (8.68%) significantly.

Part 2: Extractive Summary and Sentiment Analysis

Extract text from the report pdf

Our first step is to extract the text from our report. To do that, we use the library PyPDF2 to load the file and use the method .extractText to extract the text.
In [9]:
import PyPDF2

contents = []

# the with-block closes the file once every page has been read
with open(filepath, 'rb') as file:
    pdfObj = PyPDF2.PdfFileReader(file)
    for p in range(pdfObj.getNumPages()):
        page = pdfObj.getPage(p)
        pageContent = page.extractText()
        contents.append(pageContent)
PdfReadWarning: Xref table not zero-indexed. ID numbers for objects will be corrected. [pdf.py:1736]
Quick inspection of the content:
In [10]:
contents[0]
Out[10]:
'ASX Limited\n Annual Report 2018\n'
In [11]:
contents[7]
Out[11]:
'/  ASX Annual Report 2018\n Letter from the Chairman\n66Letter from the Chairman\nI am pleased to present ASX™s Annual Report for the ˜nancial year \nended 30 June 2018 (FY18).\nOver the past 12 months, ASX continued to embrace technological \nand operational change to strengthen our foundations, develop \nnew products and services for customers, and position the Group \nfor future growth.\nThis extends our long history of innovation within a market and \nregulatory environment that is constantly on the move. \nIn FY18, ASX increased its returns to shareholders for the ˜fth \nyear in a row. The total shareholder return was 21.9%, signi˜cantly \noutperforming the wider market as measured by the S&P/ASX \n200 Index. This continues a longer term trend, with ASX delivering \n shareholders a total return of 107.0% over the past ˜ve years \ncompared to 50.9% by the Index.\nStatutory pro˜t after tax was $445.1 million, up $11.0 million or 2.5% \non the previous 12-month period.\nThere was one $20.2 million signi˜cant item in FY18. This was \na non-cash impairment charge taken against the value of ASX™s \ninvestment in Yieldbroker Pty Limited, an electronic market\n operator for OTC debt and interest rate derivatives in Australia. \nASX acquired 49% of Yieldbroker in 2014. While the move to an \nelectronic market for these ˜nancial instruments has been slower \nthan expected, ASX remains con˜dent that the move is inevitable. \nYieldbroker continues to be an important strategic investment. \nUnderlying pro˜t after tax was $465.3 million for the period, \n $31.2 million or 7.2% higher than last year, excluding the signi˜cant \nitem.Statutory earnings per share (EPS) grew by 2.4% to 230.0 cents \nand underlying EPS rose 7.1% to 240.4 cents.\nShareholders are continuing to see the bene˜ts of ASX™s strong cash \n˚ow, steady earnings growth and commitment to pay out 90% of \nunderlying pro˜t in dividends. The Yieldbroker impairment charge \ndid not impact on dividends. 
\nTotal dividends for FY18 were 216.3 cents per share, up 7.2% on the \nprevious year. Our dividends remain 100% franked.\nTrust and con˜dence\nASX operates at the heart of Australia™s ˜nancial markets. With \nthis privilege comes great responsibility. Our products, services \nand technology power Australia™s equity, debt and futures markets. \nOver the past three years we have worked in partnership with \nour regulators to improve the robustness and resilience of all our \ninfrastructure, integrate contemporary technologies, and adopt \nnew methods and processes. This is a multi-year challenge with \nseveral more years to go before we achieve the higher standards \nwhich we aspire to, and which our customers and the wider ˜nancial \ncommunity have come to expect.\nASX works hard to earn the trust and con˜dence of its custom\n-ers and the wider community. We know this cannot be taken for \ngranted and must be renewed every day. We also recognise that \nthe standards for all ˜nancial institutions globally are being raised \nand we need to respond accordingly.\nTrust in ASX is critical to our success. We endeavour to protect and \nstrengthen our reputation through ongoing operational improve\n-ment and encouraging our people to act responsibly and be account\n-able. Overseeing how this is done is one of the core responsibilities \nof the Board, as is increasing capabilities across the Group.\nTo that end, the Board has encouraged and supported a deep \nfocus on a range of risk-based and operational activities, as well \nas a signi˜cant renewal of technology platforms and systems. \nThis involves an increase in capital expenditure, upskilling of the \nexecutive, and a reorganisation of critical functions recommended \nand led by the CEO. 
\nEmbedding a strong foundation of respect, trust and integrity \nASX seeks to build and preserve a trustworthy and responsible \nculture, and pays close attention to:\n ŁVision and strategy\n ŁCompany and community values\n ŁRemuneration incentives. \nRick Holliday-Smith\n  ChairmanShareholders are continuing to see \nthe bene˜ts of ASX™s strong cash \n˚ow, steady earnings growth and \ncommitment to pay out 90% of \nunderlying pro˜t in dividends. \n'
This technique is also useful when you want to extract parts of a PDF document. You can save the text to a .txt file; when you open the file with Notepad or other word processing software, the \n's are shown as line breaks rather than as literal characters.
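Saving the extracted pages to a text file needs nothing beyond the standard library. Here is a rough sketch. The two-item `contents` list is a hypothetical stand-in for the real one, the page separator line is my own convention, and the file goes to a temporary folder only so that the example runs anywhere.

```python
import os
import tempfile

# Hypothetical stand-in for the `contents` list (one string per page).
contents = [
    'ASX Limited\n Annual Report 2018\n',
    'Letter from the Chairman\nI am pleased to present the Annual Report.\n',
]

# Written to a temporary folder here; any path would do in practice.
output_path = os.path.join(tempfile.mkdtemp(), 'report_pages.txt')

with open(output_path, 'w', encoding='utf-8') as handle:
    for number, page_text in enumerate(contents, start=1):
        handle.write('--- page {} ---\n'.format(number))
        handle.write(page_text)
        handle.write('\n')

with open(output_path, encoding='utf-8') as handle:
    saved = handle.read()
print(saved)
```

To save only part of a document, slice the list first, for example contents[7:9].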

Summarise the report by extracting key phrases

First we can combine the list contents to one long string:
In [12]:
text = ' '.join(contents)
Next, we are going to use the LexRank algorithm from the sumy library to extract the 5 most significant sentences from our body of text. LexRank is an unsupervised, graph-based approach: it links sentences by how similar their word vectors are and ranks each sentence by how central it is in that graph, so it finds sentences that are "representative" of the document.
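To give a feel for what "central in a similarity graph" means, here is a simplified sketch in the spirit of LexRank. It is not sumy's implementation: it uses plain word-count cosine similarity instead of the IDF-weighted cosine of the original algorithm, applies no similarity threshold, and `rank_sentences` and the toy sentences are my own.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences by centrality in their similarity graph (PageRank-style)."""
    bags = [Counter(sentence.lower().split()) for sentence in sentences]
    n = len(bags)
    # Edge weight = cosine similarity; a sentence is not linked to itself.
    weights = [[0.0 if i == j else cosine(bags[i], bags[j]) for j in range(n)]
               for i in range(n)]
    row_totals = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour passes on its score in proportion to the edge weight.
            incoming = sum(scores[j] * weights[j][i] / row_totals[j]
                           for j in range(n) if row_totals[j] > 0)
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores


sentences = [
    'asx revenue increased this year',
    'asx revenue and profit increased',
    'profit increased this year',
    'the board met in sydney',
]
scores = rank_sentences(sentences)
best = max(range(len(sentences)), key=lambda i: scores[i])
print(sentences[best])
```

The first sentence shares the most vocabulary with the others, so it ends up with the highest score, while the unrelated last sentence gets the lowest.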
In [13]:
import sumy
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lex_rank import LexRankSummarizer

parser = PlaintextParser.from_string(text,Tokenizer('english'))
In [14]:
# Create Summarizer
summarizer = LexRankSummarizer()
# Summarize the text to 5 sentences
summary = summarizer(parser.document, 5)

# Print out the result
print('The following is an extractive summary of the annual report: \n')
for sentence in summary:
    print(sentence, '\n')
The following is an extractive summary of the annual report: 

TSR for S&P/ASX 200 over the same period 8.1% Total dividends per share, fully franked Œ  90% payout ratio $445.1m 216.3c21.9% 4/  ASX Annual Report 2018 Vision, strategy, execution 4Our vision The world™s most respected ˜nancial marketplace Our strategy Diverse ecosystem We provide an open system of collaboration to support partnerships, products and services across the Australian ˚nancial ecosystem We think constantly about how we can improve the experience for our customers and make their lives easier We offer innovative solutions and technology to drive ef˚ciency and deliver bene˚ts to customers, employees and the wider ˚nancial marketplace We foster collaboration and agility within our businesses, across our teams and among our people We earn trust and deliver resilience by making sure our systems and processes are stable, secure, reliable and fair, and our people act with integrity towards the market and each other Customer centric Innovative solutions and technologyCollaborative culture Enduring trust, integrity and resilience Our Licence to Operate activities focus on what we need to do to stay a resilient, reliable and leading ˜nancial marketplace. 

Board skills matrix Category Description Number of non-executive directors with these skills 12345678Executive leadership Successful career as a CEO or senior executive Strategy De˜ne strategic objectives, constructively question business plans and implement strategy Financial acumenAccounting and reporting, corporate ˜nance and internal controls, including assessing quality of ˜nancial controls Risk and compliance Forward looking, able to identify the key risks to the organisation and monitor effectiveness of risk management frameworks and practices Public policyPublic and regulatory policy, including impact on markets and corporations Information technology/ digitalUse and governance of critical information tech -nology infrastructure, digital disruption and information monetisation Business develop -ment and customer management Commercial and business experience, including development of product, service or customer management strategies, and innovation People and change management Overseeing and assessing senior management, remuneration frameworks, strategic human resource management and organisational change Corporate governance Knowledge, experience and commitment to the highest standards of governance International exchange experience International ˜nancial markets or exchange groups including post trade services and relation -ships with ˜nancial markets participants Financial services experience Broking, funds management, superannuation and/or investment banking activities Corporate governance continued 34Corporate governance continued Director independence ASX recognises that having a majority of independent directors helps to ensure that the decisions of the Board re˚ect the best interests of ASX and its shareholders generally and that those decisions are not biased towards the interests of management or any other group. 

Reconciliation of the operating pro˚t after income tax to the net cash ˜ows from operating activities Net pro˚t after tax 445.1 434.1 Non-cash items: Depreciation and amortisation 47.6 46.0 Share-based payments 2.9 -Share of net (loss)/pro˜t of equity accounted investments 0.4 (0.1) Tax on fair value adjustment of available-for-sale ˜nancial assets 0.4 0.2 Tax on fair value adjustment of cash ˚ow hedges (0.5) 0.2 FX revaluation on investments at fair value through pro˜t or loss (0.2) -Change in fair value on equity accounted investments 20.2 -Changes in operating assets and liabilities: Increase in tax balances 0.5 7.1 (Increase) in receivables 1(3.3) (0.1) (Increase) in prepayments (0.1) (5.0) Increase in payables 111.1 0.3 Increase in revenue received in advance 4.21.8Increase/(decrease) in provisions 0.4 (0.9) Net cash in˜ow from operating activities 528.7 483.6 1 Receivables and payables excludes the movement in margins receivable/payable. 

/  ASX Annual Report 2018 Performance of the Group 65ASX Annual Report 2018 Risk management /Risk management The Group is subject to a variety of risks including clearing and settlement risk, and operational risk. 

E5.3 Auditor™s remuneration The following fees were paid or payable by the Group for and on behalf of all Group entities for services provided by the auditor and its related practices during the ˜nancial year: PricewaterhouseCoopers Australia 2018$'0002017$'000Statutory audit services: Audit and review of the ˜nancial statements and other audit work under the Corporations Act 2001 627612 Audit of information technology platforms 184180Other audit services: Model validation -152Code of Practice compliance 9090Non-audit services: Tax compliance services 10574Other review services 55-Total remuneration for PricewaterhouseCoopers Australia 1,061 1,108 Group disclosures continued /  ASX Annual Report 2018 Group disclosures 84E5.4 Other accounting policies (a) New and amended standards and interpretations adopted by the Group The new standards and amendments to standards that are mandatory for the ˜rst time in the ˜nancial year commenced on 1 July 2017 do not affect any amounts recognised in the current or prior years, and are not likely to materially affect amounts in future years. 

It seems the extractive summary produced by LexRank is not very good. Let's try LSA (latent semantic analysis), an algorithm that applies singular value decomposition to a term frequency matrix, to summarise the text.
In [15]:
from sumy.summarizers.lsa import LsaSummarizer
summarizer_lsa = LsaSummarizer()
summary2 = summarizer_lsa(parser.document, 5)

print('The following is an extractive summary of the annual report: \n')
for sentence in summary2:
    print(sentence, '\n')
The following is an extractive summary of the annual report: 

An unlisted entity operating licensed electronic markets for trading Australian and New Zealand debt securities Ł7% shareholding in Digital Asset Holdings LLC, up $11.0 million inclusive of convertible notes purchased. 

Australian Liquidity Centre customers ALC customersALC service connections FY14FY15FY16FY17FY1867962281987198495891081161230200400600800100012005060708090100110120130 CHESS replacement project Building on ASX™s strong tradition of innovation, we are leading the global ˚nancial exchange industry by selecting distributed ledger technology (DLT) for our new equities clearing and settlement system. 

The dealing rules: ŁAre designed to help prevent directors and staff from contra -vening laws on insider trading ŁEstablish a best practice procedure for dealings in securities. 

Regulatory standards applying to many ˜nancial market participants have increased in recent years and there is an expectation that these may increase further over time. 

On initial adoption of the standard all debt securities other than those lodged by participants to cover margin obligations will be reclassi˜ed and measured at amortised cost. 

It seems LSA has done a much better job of finding sentences that make sense. 3 of the 5 sentences relate to regulation. Reading the sentences, it is easy to see that the Annual Report is about an exchange. The second sentence says: " ...we are leading the global ˚nancial [sic] exchange industry...", which clearly indicates that this is a document about a financial exchange.

Sentiment Analysis

Next, I am going to conduct a sentiment analysis on the whole text. The library I will be using is TextBlob, which builds on nltk, a broad library for NLP (natural language processing).
Before I feed the text into a sentiment analyser, the text first needs to be cleaned to ensure only words are analysed. It involves the following steps:
  1. Remove all numbers and other non-letter characters
  2. Make every word lowercase
  3. Remove any unnecessary space
I will create a function to clean the text:
In [16]:
import re

def clean_text(text):
    # replace every character that is not a letter (digits, punctuation) with a space
    letters_only = re.sub(r'[^A-Za-z]', ' ', text)
    # lowercase all words
    lowercase_text = letters_only.lower()
    # remove leading and trailing whitespace
    cleaned = lowercase_text.strip()
    return cleaned
In [17]:
cleaned_text = clean_text(text)
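Note that strip() only trims the two ends of the string, so every digit or punctuation mark that was replaced still leaves a space behind inside the text. If that matters to you, a variant along these lines also collapses the runs of spaces. It is only a sketch, and `clean_text_compact` is a hypothetical name.

```python
import re


def clean_text_compact(text):
    """Keep letters only, lowercase them, and collapse runs of whitespace."""
    letters_only = re.sub(r'[^A-Za-z]', ' ', text)
    # split() with no arguments drops empty pieces, so repeated spaces collapse.
    return ' '.join(letters_only.lower().split())


sample = 'Statutory EPS grew by 2.4% to 230.0 cents\nand underlying EPS rose 7.1%.'
print(clean_text_compact(sample))
```

Be aware that the same regular expression also strips the garbled ligature characters seen in the extracted text, leaving fragments such as "nancial" in place of "financial".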

Using TextBlob:

Next, we feed the text through TextBlob for sentiment scores. Note that the cell below passes the raw text rather than cleaned_text, so the scores shown are for the uncleaned text. The sentiment property of TextBlob returns two scores: polarity and subjectivity. Polarity is a sentiment score that has a range of [-1,1] where -1 is the most negative and 1 is the most positive. Subjectivity is a score for how subjective, emotional or judgemental the text is and has a range of [0,1].
In [18]:
from textblob import TextBlob

sentiment_score = TextBlob(text).sentiment
print('Sentiment score using TextBlob:',sentiment_score)
Sentiment score using TextBlob: Sentiment(polarity=0.07644454134208137, subjectivity=0.36095363469023456)
It seems the Annual Report is fairly dull, with a polarity score of 0.076 that sits close to neutral. It is also mildly subjective, with a subjectivity score of 0.36, so you should always take annual reports with a grain of salt.
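A score this close to zero is not surprising for a long document. As a rough illustration, consider a toy scorer that averages the scores of the words it recognises. This is not TextBlob's lexicon or algorithm: the words and scores below are made up, and `toy_polarity` is a hypothetical function.

```python
# Made-up scores for illustration only; a real lexicon is far larger.
TOY_LEXICON = {'strong': 0.5, 'pleased': 0.5, 'growth': 0.25,
               'impairment': -0.5, 'risk': -0.25}


def toy_polarity(text):
    """Average the scores of the words found in TOY_LEXICON (0.0 if none)."""
    scores = [TOY_LEXICON[word] for word in text.lower().split()
              if word in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


print(toy_polarity('pleased with strong growth'))
print(toy_polarity('strong growth despite impairment and risk'))
print(toy_polarity('the board met in sydney'))
```

A short upbeat sentence scores clearly positive, but as soon as positive and negative words are averaged together the result drifts towards zero, which is roughly what happens when a whole annual report is scored in one go.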

1 comment:

  1. Hello,

    Thank you for this, very useful. I am just a beginner at all this, but I am considering automating the extraction of very specific sections of the annual reports of listed firms, namely executive remuneration and the board of directors. In some cases (I think in the majority of cases) this data is in tables; in other cases it is simply in the text. What could be an approach to extract this data with the tools you have mentioned? For example, first searching for the respective sections, perhaps based on keywords, and if a table is found, retrieving it?

    Thanks!
