How to Thesis #

Setup #

Read the formalities for your degree program: Bachelor Inf., Master Inf.
Do you have the OK to register? If so:
- Remind me to create a thesis-yourname channel in Mattermost
- Remind me to create a git repository for you in our gitlab
  - All code should be there, no public repository
- Optional but helpful: Give me access to your overleaf or repository where you store your thesis.
If you do not have a second supervisor (Zweitgutachter) yet, tell me.

Thesis Formalities #

Use the University template.
How much to write depends on the topic and writing style. Most students write 30+ pages of content for a Bachelor thesis and 60+ pages of content for a Master thesis. Content does not include formalities like the table of contents, title page, bibliography, etc. Believe me, you will have enough to write, even if it feels different at the moment.
We have a regular meeting every two weeks. The meeting is optional, but if you skip it, cancel the appointment at least one day in advance.
Every week, write 1-2 sentences in the thesis channel about what you did or what your current problems are. It is totally okay to write that you did nothing.
There is an intermediate presentation for the working group after 50% of your time has passed.
Write me if you need help.

Content #

How to start #

Copy your exposé, change the title, and delete the schedule.
Start exploring your idea further by implementing and reading.
After 1-2 weeks, do the following:
- First day:
  - Write down a rough structure (Section Titles)
  - Write for each section a small text (rough notes, no polished text)
- Some other day:
  - Refine the structure with subsections
  - Write for each subsection a small text (rough notes, no polished text)
- At our next meeting:
  - We discuss your structure
Write at least 3 sentences every day (Mo-Fr). This really helps you stay in the writing flow.
The most common mistake is to only do the programming and to write the majority of the thesis in the last month. It is okay to start with programming. But when you have implemented something or found an approach that is likely to stay that way, write it down.

Structure #

Abstract

Written at the very end of your thesis.

Introduction

Rewritten at the end of your thesis.
First state what the problem is in a popular-science way, i.e., understandable to readers outside your field.
The automated use of Internet services is an essential building block of the Internet and the web. Many services depend on each other. Examples are the embedding of a weather feed into a web page or a service that provides a price comparison by automatically querying different marketplaces. Many services provide specific interfaces for other services to allow the automation of their usage. However, there are also services without such interfaces that are intended for humans only. Automated use of a service by a program, hereafter called a bot, can affect the satisfaction of human users and can cause financial or social damage. For example, the automated use of social media can be used to spread opinions and false information, which can even influence elections \cite{bessi2016social}.
If needed, set your focus
Automation of a service can be done in different ways \cite{amin2020web}. The most efficient approach is to automate the API of the service. Using the API directly only requires a script. By simply executing the script multiple times, it is possible to create a large number of bots, e.g., to influence voting opinions on social media through nationwide spamming \cite{bessi2016social}.
Explain the problem with current solutions.
In many cases, CAPTCHAs are the first and last defence against bots. However, they introduce user friction and are losing effectiveness as machine learning advances \cite{captchaBreaking,captchaStudy}. Other approaches use anti-reverse engineering techniques to oppose Man-At-The-End (MATE) attackers \cite{mate}, who have access to a client application on a controlled device. Those approaches aim at making the extraction and use of the application protocol more difficult, e.g., by embedding (unique) API keys in the client application or using obfuscation and anti-reverse engineering techniques \cite{roundy2013binary,antiDebug}. Most of these techniques only make it difficult to create the first bot. Once a bot is created, it can still be duplicated at scale. But this is what makes API bots so threatening: the ability to quickly and inexpensively spawn large numbers of bots. Because this approach can be so harmful, this paper focuses on how such automation can be restricted.
State your contribution (rewrite this at the end)
The main contribution of this paper is an approach to combat the cost-efficient duplication of bots, which can be applied with low performance and organizational overhead. In more detail, we make the following contributions:
- We propose a method to increase the cost of duplicating bots by assigning each client of the same service its own application protocol. We call this Polymorphic Protocols. While there are already many strong obfuscation techniques for binaries (Tigress, Themida) \cite{tigress,OreansTechnologiesSoftware} and protocol obfuscation techniques in the censorship resistance realm \cite{dyer2015marionette,protoObf1}, we are, to our knowledge, the first to use obfuscation of application protocols against bots.
- We implement the approach for the widely used protocol language protobuf and the programming language Java\footnote{Available open source at \url{https://github.com/UHH-ISS/polymorphic-protocols}}. It is easily applied to existing protocols and just requires the existing protobuf file as input. Everything else is generated automatically so that it can even be used in a CI/CD pipeline without the need for a developer.
- We evaluate the technical performance overhead of the approach. We also discuss the organizational overhead for developers and the additional effort for attackers to duplicate bots.
Note that polymorphic protocols are an obfuscation technique to make the scaling of bots more difficult and not to prevent the creation of bots. The approach gains from being used alongside existing anti-reverse engineering mechanisms that impede code extraction (slicing), e.g., anti-symbolic execution, virtualization, or just-in-time compilation \cite{antiSymbolicExecution,tigress}. Legitimate bots and interoperability across different services are still possible, e.g., by providing special API keys after thorough verification.
Explain the structure of your thesis
The rest of the thesis is structured as follows. Section \ref{sec:rel-work} discusses other approaches that make it harder to create bots. Section \ref{sec:system} explains how polymorphic protocols can be created and applied. Section \ref{sec:eval} describes the implementation, then evaluates the approach and discusses the results. Finally, Section \ref{sec:conclusion} concludes the thesis.
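A minimal LaTeX sketch of how the labels referenced in the structure paragraph above could be laid out. The `article` class is only a placeholder assumption (use the University template instead), and the labels `sec:intro` and `sec:background` are hypothetical names that do not appear in the example text.

```latex
% Skeleton only: swap \documentclass{article} for the University template.
\documentclass{article}
\begin{document}

\section{Introduction}\label{sec:intro}          % hypothetical label
% Problem, focus, gap in current solutions, contribution, structure.

\section{Background}\label{sec:background}       % hypothetical label
% Concepts an average bachelor graduate may not know.

\section{Requirements and Related Work}\label{sec:rel-work}
% Existing approaches, grouped into categories.

\section{Method}\label{sec:system}
% Design of the approach, with justified decisions.

\section{Evaluation}\label{sec:eval}
% Implementation, research questions, results, discussion.

\section{Conclusion}\label{sec:conclusion}
% Answers to the research questions.

\end{document}
```

Defining the labels early means the "structure of the thesis" paragraph keeps pointing at the right section numbers even when chapters are reordered later.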

Background

Includes all concepts that are not necessarily known. Think of the knowledge of an average bachelor graduate.

Requirements & Related Work

Related Work is the first chapter to fill. Never stop filling this chapter!
What is already there?
Structure the related work in categories.
Also address the weak points of the related work. Where to improve?
Pick suitable related work to compare to your method, evaluation and results.
- Ideal scenario (although rare): Your method is different, but the evaluation is in some parts the same and the results differ. This results in an easy comparison.
- Method: You should compare your method with related work. Additionally, it is beneficial to justify your method by pointing out relevant techniques that either relate directly to your work or can be adapted to your problem.
- Evaluation: Using some (or all) of the metrics and evaluation techniques from related work is the best way to ensure that your results can be compared to theirs. You should not fully copy their evaluation, but having some common ground can facilitate meaningful comparisons.
- Results: Comparing your results to related work is essential. If a direct numerical comparison is not feasible, you should explain how your results relate. You can also compare the evaluation by highlighting differences in metrics or other factors such as a stronger adversary. If none of these options are viable, you should provide a compelling argument as to why your method is superior. However, such arguments can be weaker, and it is better to have some points that are somewhat easy to compare. Note that you do not need to compare all of your results to related work, but you should at least compare some so that your approach can be compared to others.

Method

While designing your system, be sure to justify your decisions, ideally with related work! For example, if you want to use the Levenshtein distance as a metric, justify why it is a fitting metric and why you are not using other distance metrics, e.g., Affine Gap Distance, Hamming or Cosine. Another example: if you use a neural network to do something, you need to justify your decisions. Why this network, these layers and this training algorithm? Why trained on this data?
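As an illustration of such a justification, the following LaTeX sketch states the standard definitions of the Hamming and Levenshtein distances and compares them on one pair of strings. The strings are chosen purely for illustration and are not taken from any thesis; the block assumes the `amsmath` package for `cases`, `\text` and `\operatorname`.

```latex
% Requires \usepackage{amsmath}.
% Hamming distance, defined only for strings of equal length n:
\[
  d_H(a, b) = \bigl|\{\, i \in \{1,\dots,n\} : a_i \neq b_i \,\}\bigr|
\]
% Levenshtein distance between the prefixes a_1..a_i and b_1..b_j;
% [a_i \neq b_j] is 1 if the characters differ and 0 otherwise:
\[
  \operatorname{lev}(i, j) =
  \begin{cases}
    \max(i, j) & \text{if } \min(i, j) = 0, \\
    \min\bigl\{
      \operatorname{lev}(i-1, j) + 1,\;
      \operatorname{lev}(i, j-1) + 1,\;
      \operatorname{lev}(i-1, j-1) + [a_i \neq b_j]
    \bigr\} & \text{otherwise.}
  \end{cases}
\]
% Illustrative example: a = abcdef, b = bcdefg (b is a shifted by one).
\[
  d_H(\texttt{abcdef}, \texttt{bcdefg}) = 6,
  \qquad
  d_L(\texttt{abcdef}, \texttt{bcdefg}) = \operatorname{lev}(6, 6) = 2
\]
```

The Hamming distance reports that every position differs, while the Levenshtein distance needs only two edits (delete the leading `a`, append `g`). An argument of this kind, for example that the data contains insertions and deletions rather than only substitutions, is what a justification for choosing one metric over another could look like.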
While designing your method, think about how it can be evaluated!

Implementation

Not the main focus, keep it short!
You can even put this chapter as a section in the evaluation.

Evaluation

You can have the best method, idea or implementation. If you cannot evaluate it, it loses a lot of its worth.
~~Try to~~ Think about this chapter before you implement everything
Section X.0
- What are you doing in this eval?
- What are the research questions (RQ)? You should have 2-3 questions that should not be answered with yes or no.
Section X.1 Environmental Setup / Implementation (Optional)
- Describe your setup and implementation
Section X.2 Metrics (Optional)
- If you have special metrics describe them here
Section X.3 RQ1
- State the results for your first research question.
- Discuss the results. What is good, what is bad, what is notable, what is unexpected, expected, etc.
Section X.4 RQ2
- Same as above
Section X.Y Discussion (or move it to an extra chapter)
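The section layout above could be sketched in LaTeX roughly as follows. This is only one possible arrangement; the `article` class, the subsection titles and the placeholder wording of the research questions are assumptions, not a required format.

```latex
% Skeleton only: swap \documentclass{article} for the University template.
\documentclass{article}
\begin{document}

\section{Evaluation}\label{sec:eval}
% X.0: what is evaluated and which research questions are answered.
\begin{description}
  \item[RQ1] \emph{(first research question, not answerable with yes or no)}
  \item[RQ2] \emph{(second research question)}
\end{description}

\subsection{Environmental Setup}   % X.1, optional
% Hardware, software versions, datasets, implementation notes.

\subsection{Metrics}               % X.2, optional
% Only needed for special metrics.

\subsection{RQ1}                   % X.3
% Results first, then what is good, bad, notable or unexpected.

\subsection{RQ2}                   % X.4
% Same pattern as RQ1.

\subsection{Discussion}            % X.Y, or move to its own chapter
% Interpretation across all research questions.

\end{document}
```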

Limitations

Be critical. What are open problems? What are limitations?
Can be merged with the evaluation chapter.

Conclusion

Written at the very end of your thesis.
Answer each research question here again.

Grammar, spelling and style #

Use software like Grammarly to correct your spelling.
Use the passive voice. You should not write “we implemented …” or “I implemented …”.
In general, write in the present tense. ~~The evaluation showed that the source IP…~~ “The evaluation shows that the source IP …”.