50 PySpark Questions-final
Easy
1. Write a Spark code snippet to calculate the sum of the salary column in a DataFrame
Identify rows containing non-numeric values in the "Quantity" column, if any.
Find the hashtag count for each quote
Write a PySpark program to select every 3rd (nth) row in the dataset
Write PySpark code to rank the products by their total sales amount for each month and return the top product for each month
Employees whose salary is greater than their manager's
Find Duplicate Emails
Find the missing numbers in the column
Write a PySpark program to format values in scientific notation and show them as decimal numbers
Total Partitions and Total Rows in Each Partition
Find year, start_week_date, end_week_date, week_num
You are given a fixed-length text file. Your task is to create multiple columns from the single row
Write a PySpark program to augment this dataframe with two additional columns: 'std_code' and 'landline'.
Save Integer and Array Data into the DataFrame
24. How to combine many lists to form a PySpark DataFrame?
28. Top Selling Products
36. Write a PySpark program to remove the extra spaces between the words in the 'Full_name' column and trim any leading or trailing spaces
37. Find the first non-null value in each row
38. Collect_list and Collect_set
40. Find the ids that are odd numbers and whose description is "boring"
42. Calculate the average salary for each department
43. Add a new column named "bonus" that is 10% of the salary for all employees
44. Group the data by department and find the employee with the highest salary in each department
45. Find the top 3 departments with the highest total salary.
46. Find the top department with the highest salary
47. Filter the DataFrame to keep only employees aged 30 or above and working in the "Sales" department
48. Calculate the difference between each employee's salary and the average salary of their respective department
49. Calculate the sum of salaries for employees whose names start with the letter "J"
Medium
Find the accounts that should be banned. An account should be banned if it was logged in from two different IP addresses at any moment.
Cumulative Salary
20. Add file_name as a column in the DataFrame
19. Find the number of rows in each join
Write PySpark code for this: suppose a mobile number has 10 digits. Mask the middle 8 digits and show only the first and last 2 digits
25. Handling Null Values in PySpark
26. Find the origin and the destination of each customer.
32. How do you deal with inconsistent or erroneous data formats?
33. Write a solution to find the people who have the most friends and the number of friends they have.
35. Write PySpark Code to get the below output - Explode vs Explode_Outer
Hard
17. How to keep only the top 2 most frequent values (by job) as they are and replace everything else with "Other"
18. Flatten Nested JSON in PySpark
23. Write a spark program to return the shipped and delivered rate for each order. Return order_id, shipped percentage, and delivered percentage.
27. Data validation between source and target table
29. Find numbers that repeat consecutively at least 3 times
34. Write a solution to find the employees who are high earners in each of the departments.
41. Pivot and Unpivot
39. Write a solution to swap the seat id of every two consecutive students.
Write a SQL query to find the list of returning customers