Hide

Problem B
Interrelatednesses

Languages en sv

Your friend Anna is an expert in the game Interrelatednesses. In this game she has to find groups of 4 words that are connected to each other by something in common. For example the words "dog", "cat", "bird", "fish" can be connected to each other by the common word "animal".

Anna wants to spread the love of the game to her friends and has therefore sent 100 Interrelatednesses games to you. Each game contains a list of either 8, 12 or 16 words where you have to put them into groups of 4 words that are connected to each other by something in common.

You now want to prove the power of AI by not looking at the games and instead write a program that can solve the games. Like in the other problems, you are not allowed to manually label the data.

Normally you are NOT allowed to use any pre-trained models. But for this task you are allowed to use any local embedding model you want.

Resources

https://www.youtube.com/watch?v=wjZofJX0v4M&t=781s (embeddings) and https://www.youtube.com/watch?v=4b5d3muPQmA&t=34s ($k$-means).

Input

Download the file with data and the baseline solution. This can be found at the bottom under “attachments”. You will receive a zip file containing:

  • data.json – JSON file describing all games.

  • baseline.py – a baseline solution to the problem.

The file data.json contains a list of games, each in the following format:

[
  {
    "amount_groups": <number of groups in the game>,
    "words": [
      <word1>,
      <word2>,
      ... (total number of words is always 4 * amount_groups)
    ]
  },
  ...
]

Each object in the list represents a single game. The words list contains all the words for that game, shuffled.

You will also find other files with similar formatting for other tasks or split types (e.g., test set).

Output

For each game, output your proposed grouping of the words.

Your program should output the games in the same order as in data.json. For a game with amount_groups $= g$, output exactly $g$ lines. On each of these lines, first print an integer $k$ followed by $k$ distinct words from that game, all separated by single spaces.

Note that the groups you output do not necessarily need to have size 4.

Scoring

Your output does not need to match the answer exactly. Instead, Anna will compare your proposed groups to her own hidden groupings and assign a score between $0$ and $100$ based on how good your groupings are. Roughly speaking, you gain points when words that belong together in Anna’s solution are placed in the same group in your output.

The final score is calculated by comparing how many times words that belong together in Anna’s solution end up in the same group in your output. If we say that $S$ is the percentage of times you place words in the correct group, then your final score is:

\[ \text{Score} = \max \left(0, \min \left(100, \frac{S - 55}{85 - 55} \times 100 \right)\right) \]

At the end of the competition, all solutions will be retested on the remaining 70% of the data. Your final score at the end of the competition will only be based on the remaining 70% of the data; the 30% tested during the competition will have no effect. It is guaranteed that the 30% tested during the competition were chosen uniformly at random and are entirely disjoint from the 70% tested at the end. Therefore, the results on the 30% tested during the competition should be seen as a strong indicator of how well your solution performs. At the same time, it is detrimental to overfit your solution to the test data.

Please log in to submit a solution to this problem

Log in